Created attachment 145126
(cd src/ && patch -p1 ; diff is against recent-ish CURRENT)

The problem:

At Isilon, efs.ko is a very large kernel module; we were seeing kldload take 12-13 seconds. Most of that time was spent repeatedly resolving the same global symbols for relocations.

link_elf_obj would look up the same globals many times; each time, it would look in the local module first, miss, and then fall back to deps, where it would find 'kernel' and get a fast hit (hash table hit).

Since ld(1) doesn't generate lookup hashes (SysV or GNU) for relocatable objects (kmods on AMD64), lookups in the local module involved walking the entire symtab and strcmp()ing each entry. efs.ko has 32k symbols. This was very slow.

Solution:

Add a special ELF SHN_ value (in the SHN_LOOS to SHN_HIOS range) to represent cached SHN_UNDEF globals. Cache the results of global lookup during relocatable object load.

Modify DTrace to ignore these SHN_FBSD_CACHED symbols the same way it ignored SHN_UNDEF symbols before. I didn't find other users of SHN_UNDEF that might be negatively affected by this change.

Testing done:

Basic DTrace testing suggests it isn't broken.

Performance results:

Before
# for i in 1 2 3 ; do time kldload efs.ko ; kldunload efs.ko ; done
kldload efs.ko  0.00s user 13.00s system 103% cpu 12.607 total
kldload efs.ko  0.00s user 13.25s system 102% cpu 12.876 total
kldload efs.ko  0.00s user 13.14s system 103% cpu 12.752 total
kldload zfs.ko  0.00s user 1.77s system 102% cpu 1.729 total  # Smaller, for comparison

After
kldload efs.ko  0.00s user 0.25s system 99% cpu 0.252 total
kldload efs.ko  0.00s user 0.20s system 99% cpu 0.204 total
kldload efs.ko  0.00s user 0.21s system 99% cpu 0.211 total
kldload zfs.ko  0.00s user 0.04s system 96% cpu 0.040 total  # For comparison

Sponsored by: EMC / Isilon storage division
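For readers who want the caching idea from the Solution in concrete form, here is a minimal userland sketch. It is not the attached patch: the toy_ names, the struct layout, the fake "kernel" table, and the choice of 0xff20 as the marker value are all assumptions made for illustration (0xff20 is only known to lie in the SHN_LOOS..SHN_HIOS range). The first lookup of an undefined global pays for the slow by-name search; later lookups of the same symbol index return the cached value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SHN_UNDEF  0       /* same value as ELF SHN_UNDEF */
#define TOY_SHN_CACHED 0xff20  /* hypothetical marker inside SHN_LOOS..SHN_HIOS */

/* Hypothetical, simplified symbol table entry (not Elf_Sym). */
struct toy_sym {
    const char *name;
    uint16_t    shndx;
    uintptr_t   value;
};

/* Stand-in for the symbols exported by dependencies such as 'kernel'. */
static const struct toy_sym toy_kernel_syms[] = {
    { "malloc", 1, 0x2000 },
    { "free",   1, 0x3000 },
};

static int toy_slow_lookups;   /* counts by-name resolutions */

/* Slow path: scan the local table with strcmp(), then fall back to deps. */
static uintptr_t
toy_lookup_by_name(const struct toy_sym *symtab, size_t nsyms, const char *name)
{
    size_t i;

    toy_slow_lookups++;
    for (i = 0; i < nsyms; i++) {
        if (symtab[i].shndx != TOY_SHN_UNDEF &&
            symtab[i].shndx != TOY_SHN_CACHED &&
            strcmp(symtab[i].name, name) == 0)
            return (symtab[i].value);
    }
    for (i = 0; i < sizeof(toy_kernel_syms) / sizeof(toy_kernel_syms[0]); i++) {
        if (strcmp(toy_kernel_syms[i].name, name) == 0)
            return (toy_kernel_syms[i].value);
    }
    return (0);
}

/* Resolve the symbol at symidx, caching successful global lookups in place. */
static uintptr_t
toy_symbol_value(struct toy_sym *symtab, size_t nsyms, size_t symidx)
{
    struct toy_sym *sym;
    uintptr_t addr;

    assert(symidx < nsyms);
    sym = &symtab[symidx];

    /* Locally defined or already cached: the stored value is the answer. */
    if (sym->shndx != TOY_SHN_UNDEF)
        return (sym->value);

    addr = toy_lookup_by_name(symtab, nsyms, sym->name);
    if (addr != 0) {
        sym->value = addr;            /* remember the result ... */
        sym->shndx = TOY_SHN_CACHED;  /* ... and mark the entry as cached */
    }
    return (addr);
}
```

Resolving the same undefined symbol twice through toy_symbol_value() should increment toy_slow_lookups only once, which is the effect the patch relies on when a module has many relocations against the same global.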
Review: https://reviews.freebsd.org/D1718
Author: kib
Date: Thu Apr 2 20:14:51 2015
New Revision: 281003
URL: https://svnweb.freebsd.org/changeset/base/281003

Log:
  Speed up symbol lookup for the amd64 kernel modules.

  Amd64 uses relocatable object files as the module format. This is good in that it avoids unneeded overhead for PIC code, in particular due to the absence of a useless GOT and PLT. But the cost is that the module linking process cannot use a hash to speed up symbol lookup, and that each reference to a symbol requires its own relocation, instead of a single-place relocation in the GOT.

  Cache the successful symbol lookup results in the module symbol table, using the newly allocated SHN_FBSD_CACHED value from the SHN_LOOS-HIOS range as an indicator. The SHN_FBSD_CACHED marker, together with the non-existent definition of the found symbol, is reverted after successful relocations, which is done under the kld_sx lock, so it should not be visible to other consumers of the symbol table.

  Submitted by: Conrad Meyer
  Differential Revision: https://reviews.freebsd.org/D1718
  MFC after: 3 weeks

Modified:
  head/sys/kern/link_elf_obj.c
  head/sys/sys/elf_common.h
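The revert step mentioned in the log can be pictured roughly as follows. This is a hedged userland sketch, not the committed code: struct demo_sym, the DEMO_ constants, and demo_uncache_symbols() are hypothetical names, 0xff20 is merely an assumed value inside the SHN_LOOS..SHN_HIOS range, and the locking the log describes (kld_sx) is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SHN_UNDEF  0       /* same value as ELF SHN_UNDEF */
#define DEMO_SHN_CACHED 0xff20  /* hypothetical marker inside SHN_LOOS..SHN_HIOS */

/* Hypothetical, simplified symbol table entry (not Elf_Sym). */
struct demo_sym {
    const char *name;
    uint16_t    shndx;
    uintptr_t   value;
};

/*
 * After relocations are done, turn every cached entry back into a plain
 * undefined symbol so later consumers of the table never see the marker
 * or the borrowed value.  Returns the number of entries reverted.
 * In the kernel this would run with the linker lock held; the sketch
 * does no locking.
 */
static size_t
demo_uncache_symbols(struct demo_sym *symtab, size_t nsyms)
{
    size_t i, reverted = 0;

    for (i = 0; i < nsyms; i++) {
        if (symtab[i].shndx == DEMO_SHN_CACHED) {
            symtab[i].shndx = DEMO_SHN_UNDEF;
            symtab[i].value = 0;
            reverted++;
        }
    }
    return (reverted);
}
```

Locally defined symbols and symbols that were never resolved are left untouched, so running the function a second time is a no-op.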