Created attachment 145126
(cd src/ && patch -p1 ; diff is against recent-ish CURRENT)
At Isilon, efs.ko is a very large kernel module; we were seeing kldload take 12-13 seconds. Most of that time was spent repeatedly resolving the same global symbols for relocations. link_elf_obj would look up the same globals many times; each time, it would search the local module first, miss, and fall back to the dependencies, where it would find the symbol in 'kernel' with a fast hash table hit. Since ld(1) doesn't generate lookup hashes (SysV or GNU) for relocatable objects (kmods on amd64), each lookup in the local module walked the entire symtab, strcmp()ing every entry. efs.ko has 32k symbols, so this was very slow.
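To illustrate why the hashless local lookup is expensive, here is a minimal userland sketch of a linear symbol search that counts its strcmp() calls. It is not the link_elf_obj code; demo_linear_lookup and demo_strcmp_calls are hypothetical names, and the symbol table is reduced to an array of names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counter: total strcmp() calls made by the lookups below. */
static size_t demo_strcmp_calls;

/*
 * Look for 'want' in a name table that has no hash section.
 * Returns the index of the match, or -1 after scanning every entry.
 */
static long
demo_linear_lookup(const char *const *names, size_t n, const char *want)
{
	size_t i;

	assert(names != NULL && want != NULL);
	for (i = 0; i < n; i++) {
		demo_strcmp_calls++;
		if (strcmp(names[i], want) == 0)
			return ((long)i);
	}
	return (-1);
}
```

A miss always costs one strcmp() per table entry, so a module with N symbols that issues R relocations against globals it does not define pays on the order of N * R comparisons before it ever reaches the hashed lookup in its dependencies.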
Add a special ELF SHN_ value (in the SHN_LOOS to SHN_HIOS range) to represent cached SHN_UNDEF globals. Cache the results of global lookups during relocatable object load, so that each undefined global is resolved through the slow path only once.
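The caching idea can be sketched in userland as follows. This is only an illustration under stated assumptions, not the patch itself: struct demo_sym stands in for the real ELF symbol entry, demo_lookup_deps stands in for the lookup in dependencies, and DEMO_SHN_CACHED is a hypothetical marker whose value is merely chosen from the OS-specific range (0xff20 to 0xff3f).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_SHN_UNDEF	0x0000u	/* standard ELF index for an undefined symbol */
#define DEMO_SHN_CACHED	0xff20u	/* hypothetical marker in SHN_LOOS..SHN_HIOS */

/* Simplified stand-in for an ELF symbol table entry. */
struct demo_sym {
	const char	*name;
	uint16_t	 shndx;
	uintptr_t	 value;
};

/* Hypothetical counter: how often the dependency lookup was needed. */
static unsigned demo_dep_lookups;

/* Stand-in for resolving a global through the module's dependencies. */
static int
demo_lookup_deps(const struct demo_sym *globals, size_t nglobals,
    const char *name, uintptr_t *out)
{
	size_t i;

	demo_dep_lookups++;
	for (i = 0; i < nglobals; i++) {
		if (strcmp(globals[i].name, name) == 0) {
			*out = globals[i].value;
			return (0);
		}
	}
	return (-1);
}

/*
 * Resolve one symbol for a relocation.  An undefined symbol is looked up
 * once; the result is then stored in the symbol table entry itself so
 * that later relocations against the same symbol skip the lookup.
 */
static int
demo_resolve(struct demo_sym *sym, const struct demo_sym *globals,
    size_t nglobals, uintptr_t *out)
{
	assert(sym != NULL && out != NULL);
	if (sym->shndx != DEMO_SHN_UNDEF) {
		/* Defined locally, or already cached by an earlier call. */
		*out = sym->value;
		return (0);
	}
	if (demo_lookup_deps(globals, nglobals, sym->name, out) != 0)
		return (-1);
	sym->shndx = DEMO_SHN_CACHED;
	sym->value = *out;
	return (0);
}
```

Storing the result in the entry itself means no side table is needed: the relocation code already has the symbol index in hand, so a cached hit is a single array access.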
Modify DTrace to ignore these SHN_FBSD_CACHED symbols the same way it ignored SHN_UNDEF symbols before. I didn't find other users of SHN_UNDEF that might be negatively affected by this change.
Basic DTrace testing suggests it isn't broken. Performance results:
Before:
# for i in 1 2 3 ; do time kldload efs.ko ; kldunload efs.ko ; done
kldload efs.ko 0.00s user 13.00s system 103% cpu 12.607 total
kldload efs.ko 0.00s user 13.25s system 102% cpu 12.876 total
kldload efs.ko 0.00s user 13.14s system 103% cpu 12.752 total
kldload zfs.ko 0.00s user 1.77s system 102% cpu 1.729 total # Smaller, for comparison
After:
kldload efs.ko 0.00s user 0.25s system 99% cpu 0.252 total
kldload efs.ko 0.00s user 0.20s system 99% cpu 0.204 total
kldload efs.ko 0.00s user 0.21s system 99% cpu 0.211 total
kldload zfs.ko 0.00s user 0.04s system 96% cpu 0.040 total # For comparison
Sponsored by: EMC / Isilon storage division
Date: Thu Apr 2 20:14:51 2015
New Revision: 281003
Speed up symbol lookup for the amd64 kernel modules.
amd64 uses relocatable object files as the module format. This is good
in that it avoids the overhead of PIC code, in particular because there
is no unneeded GOT or PLT. The cost is that the module linking process
cannot use a hash table to speed up symbol lookup, and that every
reference to a symbol requires its own relocation, instead of a single
relocation in the GOT.
Cache successful symbol lookup results in the module symbol
table, using the newly allocated SHN_FBSD_CACHED value from the
SHN_LOOS-SHN_HIOS range as an indicator. The SHN_FBSD_CACHED marker,
together with the cached definition of the found symbol (which the
module itself does not provide), is reverted after successful
relocation. This is done under the kld_sx lock, so it should not be
visible to other consumers of the symbol table.
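
The revert step described above can be sketched as a single pass over the symbol table once relocation has finished. This is a simplified userland illustration, not the committed code: struct demo_sym, demo_uncache and the DEMO_SHN_CACHED value are hypothetical stand-ins, and the locking is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SHN_UNDEF	0x0000u	/* standard ELF index for an undefined symbol */
#define DEMO_SHN_CACHED	0xff20u	/* hypothetical marker in SHN_LOOS..SHN_HIOS */

/* Simplified stand-in for an ELF symbol table entry. */
struct demo_sym {
	const char	*name;
	uint16_t	 shndx;
	uintptr_t	 value;
};

/*
 * Restore every cached entry to a plain undefined symbol.  In the kernel
 * this would run while the linker lock is still held, so no other
 * consumer of the symbol table observes the temporary marker.
 * Returns the number of entries that were reverted.
 */
static size_t
demo_uncache(struct demo_sym *symtab, size_t nsyms)
{
	size_t i, reverted;

	assert(symtab != NULL || nsyms == 0);
	reverted = 0;
	for (i = 0; i < nsyms; i++) {
		if (symtab[i].shndx != DEMO_SHN_CACHED)
			continue;
		symtab[i].shndx = DEMO_SHN_UNDEF;
		symtab[i].value = 0;
		reverted++;
	}
	return (reverted);
}
```

Reverting keeps the marker transient, so code that walks the symbol table after the module is loaded still sees ordinary undefined symbols and needs no knowledge of the cache.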
Submitted by: Conrad Meyer
Differential Revision: https://reviews.freebsd.org/D1718
MFC after: 3 weeks