We have been tracking a performance regression in FreeBSD 11 stable based smbd pegs a cpu and takes much longer to list huge directories than it's FreeBSD 9.x base counterpart . Profiling showed that the time was geometrically related to file count within the directory. Samba emits a telldir for every file during a readdir; and in 9.x that seemed to run in linear time and things where fine; however this change appears to expands the workload of list scan to O(n^2) vs file count. https://svnweb.freebsd.org/base/stable/11/lib/libc/gen/telldir.c?r1=235647&r2=269204 For a directory with 64k files, the performance is as follows when driven by samba listing files. Identical hardware and zfs data sets are used: using dtrace script: BEGIN { printf("thinking, hit control-c when you are tired of it");} pid$1::$2:entry { self->st= timestamp; } pid$1::$2:return { @[execname,"delta(ns)" ] = quantize( timestamp - self->st); self->st = 0; } 9.3: #./dt_time_in_pid_func.dt `top -b | head -10 | tail -1 | chomp | cut -w -f1` telldir dtrace: script './dt_time_in_pid_func.dt' matched 3 probes CPU ID FUNCTION:NAME 7 1 :BEGIN thinking, hit control-c when you are tired of it ^C smbd delta(ns) value ------------- Distribution ------------- count 2048 | 0 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 53883 8192 |@ 730 16384 | 502 and under 11-stable(ish): #dtrace -s dtprofile.is_in_path.dt `top -b | head -10 | tail -1 | chomp | cut -w -f1` telldir dtrace: script 'dtprofile.is_in_path.dt' matched 2 probes ^C deltans value ------------- Distribution ------------- count 2048 | 0 4096 | 90 8192 | 8 16384 | 0 32768 | 0 65536 |@ 1583 131072 |@@@@@@@@@@ 12270 262144 |@@@@@@@@@@@@@@@@@@@@@@ 28159 <<- libc telldir takes how long now?! 524288 |@@@@@@@ 9299 1048576 | After reverting the telldir change shamelessly: https://github.com/freenas/os/commit/92873f3190c830302143d759411b23bd719b0ba2 Performance for the telldir returned to constant time. The change appears important to @standards however the impact is tough to explain to samba users. To conjecture, I wonder if a run time tunable could select the 'conforming' or 'fast' behaviour for telldir like LD_PRELOAD... style directives.
Notify committer and reviewer of the rather-old r269204 commit (itself based on PR 121656, Phabricator https://phabric.freebsd.org/D490.)
So this wasn't purely about standards@ compliance, but there was actual software that was broken on FreeBSD because of this (it might have involved using telldir() on a large directory accessed via NFS where the client broke. If it's the thing I'm thinking of then you would have an 'ls' on a large NFS directory that would never complete). The larger fix is to change getdirentries() to report a valid seek location with each 'struct dirent'. That depends on changing the contents of 'struct dirent' which is a non-trivial thing to do, but is included in the ongoing 'ino64' work. Once that is in place you don't need the telldir cookies at all. One thing you could do for now is replace the linear O(n) search with something better like a tree that you can do a binary search on.
I propose such a patch for this problem: https://reviews.freebsd.org/D10408 . On my synthetic tests it shows great results with minimal additional complexity.
nice.. Effect under smbd workload is good. smbd delta(ns) value ------------- Distribution ------------- count 1024 | 0 2048 |@@@@@@@@ 10912 4096 |@@@@@@@@@@@ 15434 8192 | 221 ...
A commit references this bug: Author: mav Date: Mon Apr 17 19:03:31 UTC 2017 New revision: 317064 URL: https://svnweb.freebsd.org/changeset/base/317064 Log: Optimize pathologic case of telldir() for Samba. When application reads large directory, calling telldir() for each entry, like Samba does, it creates exponential performance drop as number of entries reach tenths to hundreds of thousands. It is caused by full search through the internal list, that never finds matches in that scenario, but creates O(n^2) delays. This patch optimizes that search, limiting it to entries of the same buffer, turning time closer to O(n) in case of linear directory scan. PR: 218622 Reviewed by: jhb, jilles MFC after: 2 weeks Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D10408 Changes: head/lib/libc/gen/telldir.c
I've merged the patch into 11 and 10 stable.
FWIW I have further improved upon mav's solution with r326640. Now, telldir is O(1) in both time and space on 64-bit systems. r326640 | asomers | 2017-12-06 15:06:48 -0700 (Wed, 06 Dec 2017) | 22 lines Optimize telldir(3) Currently each call to telldir() requires a malloc and adds an entry to a linked list which must be traversed on future telldir(), seekdir(), closedir(), and readdir() calls. Applications that call telldir() for every directory entry incur O(n^2) behavior in readdir() and O(n) in telldir() and closedir(). This optimization eliminates the malloc() and linked list in most cases by packing the relevant information into a single long. On 64-bit architectures msdosfs, NFS, tmpfs, UFS, and ZFS can all use the packed representation. On 32-bit architectures msdosfs, NFS, and UFS can use the packed representation, but ZFS and tmpfs can only use it for about the first 128 files per directory. Memory savings is about 50 bytes per telldir(3) call. Speedup for telldir()-heavy directory traversals is about 20-30x for one million files per directory. Reviewed by: kib, mav, mckusick MFC after: 3 weeks Sponsored by: Spectra Logic Corp Differential Revision: https://reviews.freebsd.org/D13385