Bug 192249 - [PATCH] Load reloc modules (amd64, mips) faster by caching global syms
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Bryan Drewery
Depends on:
Reported: 2014-07-29 19:35 UTC by Conrad Meyer
Modified: 2015-04-02 20:23 UTC

See Also:

(cd src/ && patch -p1 ; diff is against recent-ish CURRENT) (2.97 KB, patch)
2014-07-29 19:35 UTC, Conrad Meyer

Description Conrad Meyer 2014-07-29 19:35:46 UTC
Created attachment 145126
(cd src/ && patch -p1 ; diff is against recent-ish CURRENT)

The problem:

At Isilon, efs.ko is a very large kernel module; we were seeing kldload take 12-13 seconds. Most of that time was spent repeatedly resolving the same global symbols for relocations. link_elf_obj would look up the same globals many times; each time, it would search the local module first, miss, and fall back to the deps, where it would find 'kernel' and get a fast hash table hit. Since ld(1) doesn't generate lookup hashes (SysV or GNU) for relocatable objects (kmods on amd64), each lookup in the local module involved walking the entire symtab and strcmp()ing every entry. efs.ko has 32k symbols, so this was very slow.


The fix:

Add a special ELF SHN_ value (in the SHN_LOOS to SHN_HIOS range) to represent cached SHN_UNDEF globals. Cache the results of global lookup during relocatable object load.

Modify DTrace to ignore these SHN_FBSD_CACHED symbols the same way it ignored SHN_UNDEF symbols before. I didn't find other users of SHN_UNDEF that might be negatively affected by this change.
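
For readers who want to see the shape of the idea, here is a minimal userland sketch of caching a resolved global in the module's own symbol table. It is not the attached patch: the sk_ names, the struct layouts, and the 0xff20 marker value are assumptions made for illustration, and the by-name search is reduced to one linear walk over a single exporter's table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SK_SHN_UNDEF  0x0000u  /* same value as ELF SHN_UNDEF */
#define SK_SHN_CACHED 0xff20u  /* assumed value inside SHN_LOOS..SHN_HIOS */

struct sk_sym {
	const char *name;
	uint16_t    shndx;  /* section index, or one of the markers above */
	uintptr_t   value;  /* resolved address once known */
};

struct sk_module {
	struct sk_sym *syms;
	size_t         nsyms;
	unsigned long  slow_lookups;  /* by-name searches performed */
};

/* Consumers such as a tracer treat cached entries like undefined ones. */
static int
sk_sym_is_undefined(const struct sk_sym *sym)
{
	return (sym->shndx == SK_SHN_UNDEF || sym->shndx == SK_SHN_CACHED);
}

/*
 * Resolve symbol 'symidx' of module 'm' against an exporter's table.
 * The first resolution searches by name; the result is stored in the
 * module's own symbol table so later relocations skip the search.
 */
static int
sk_resolve(struct sk_module *m, size_t symidx,
    const struct sk_sym *exports, size_t nexports, uintptr_t *res)
{
	struct sk_sym *sym;
	size_t i;

	assert(symidx < m->nsyms);
	sym = &m->syms[symidx];
	if (sym->shndx != SK_SHN_UNDEF) {
		/* Locally defined, or cached by an earlier call. */
		*res = sym->value;
		return (0);
	}
	m->slow_lookups++;
	for (i = 0; i < nexports; i++) {
		if (strcmp(exports[i].name, sym->name) == 0) {
			sym->value = exports[i].value;
			sym->shndx = SK_SHN_CACHED;
			*res = sym->value;
			return (0);
		}
	}
	return (-1);  /* unresolved; nothing is cached */
}

/* Usage: three relocations against one global cost one by-name search. */
static unsigned long
sk_demo(void)
{
	const struct sk_sym exports[] = { { "printf", 1, 0x1000 } };
	struct sk_sym syms[] = { { "printf", SK_SHN_UNDEF, 0 } };
	struct sk_module m = { syms, 1, 0 };
	uintptr_t addr = 0;
	int i;

	for (i = 0; i < 3; i++)
		(void)sk_resolve(&m, 0, exports, 1, &addr);
	return (addr == 0x1000 && sk_sym_is_undefined(&syms[0]) ?
	    m.slow_lookups : 0);
}
```

The marker doubles as the cache-hit test: any entry whose section index is no longer SHN_UNDEF already carries a usable address, so the fast path needs no extra field in the symbol entry.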

Testing done:

Basic DTrace testing suggests it isn't broken. Performance results:

Before the patch:

# for i in 1 2 3 ; do time kldload efs.ko ; kldunload efs.ko ; done
kldload efs.ko  0.00s user 13.00s system 103% cpu 12.607 total
kldload efs.ko  0.00s user 13.25s system 102% cpu 12.876 total
kldload efs.ko  0.00s user 13.14s system 103% cpu 12.752 total

kldload zfs.ko  0.00s user 1.77s system 102% cpu 1.729 total    # Smaller, for comparison

After the patch:

kldload efs.ko  0.00s user 0.25s system 99% cpu 0.252 total
kldload efs.ko  0.00s user 0.20s system 99% cpu 0.204 total
kldload efs.ko  0.00s user 0.21s system 99% cpu 0.211 total

kldload zfs.ko  0.00s user 0.04s system 96% cpu 0.040 total     # For comparison

Sponsored by:	EMC / Isilon storage division
Comment 1 Conrad Meyer 2015-01-29 15:25:36 UTC
Review:  https://reviews.freebsd.org/D1718
Comment 2 Bryan Drewery freebsd_committer 2015-04-02 20:23:40 UTC
Author: kib
Date: Thu Apr  2 20:14:51 2015
New Revision: 281003
URL: https://svnweb.freebsd.org/changeset/base/281003

  Speed up symbol lookup for the amd64 kernel modules.
  Amd64 uses relocatable object files as the module format.  This is good
  WRT not having unneeded overhead for PIC code, in particular, due to
  the absence of a useless GOT and PLT.  But the cost is that the module
  linking process cannot use a hash to speed up the symbol lookup, and
  that each reference to a symbol requires its own relocation, instead of
  a single-place relocation in the GOT.
  Cache the successful symbol lookup results in the module symbol
  table, using the newly allocated SHN_FBSD_CACHED value from the
  SHN_LOOS-HIOS range as an indicator.  The SHN_FBSD_CACHED together
  with the non-existent definition of the found symbol are reverted
  after successful relocations, which is done under kld_sx lock, so it
  should not be visible to other consumers of the symbol table.
  Submitted by:	Conrad Meyer
  Differential Revision:  https://reviews.freebsd.org/D1718
  MFC after:	3 weeks
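
The revert step described in the commit message could look roughly like the sketch below. This is an illustration only, not the committed code: the sk_ names, the struct layout, and the 0xff20 marker value are assumptions, and the locking (kld_sx in the kernel) is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define SK_SHN_UNDEF  0x0000u  /* same value as ELF SHN_UNDEF */
#define SK_SHN_CACHED 0xff20u  /* assumed value inside SHN_LOOS..SHN_HIOS */

struct sk_sym {
	const char *name;
	uint16_t    shndx;
	uintptr_t   value;
};

/*
 * After all relocations have been applied, turn every cached entry back
 * into a plain undefined symbol so later readers of the symbol table
 * never see a borrowed definition.  Returns the number of entries reset.
 */
static size_t
sk_cleanup_cache(struct sk_sym *syms, size_t nsyms)
{
	size_t i, n = 0;

	for (i = 0; i < nsyms; i++) {
		if (syms[i].shndx == SK_SHN_CACHED) {
			syms[i].shndx = SK_SHN_UNDEF;
			syms[i].value = 0;
			n++;
		}
	}
	return (n);
}

/* Usage: only the cached entry is reset; the local definition survives. */
static size_t
sk_cleanup_demo(void)
{
	struct sk_sym syms[] = {
		{ "local_fn", 1, 0x2000 },            /* defined locally */
		{ "printf", SK_SHN_CACHED, 0x1000 },  /* cached global */
	};
	size_t n = sk_cleanup_cache(syms, 2);

	return (syms[1].shndx == SK_SHN_UNDEF && syms[1].value == 0 &&
	    syms[0].value == 0x2000 ? n : 0);
}
```

Resetting the entries costs one linear pass per module load, which is small next to the repeated full-table walks the cache avoids.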