Bug 218622 - libc/gen/telldir [hack-n-PATCH] performance limited to O(n) vs file count, O(n^2) against samba ls workload
Summary: libc/gen/telldir [hack-n-PATCH] performance limited to O(n) vs file count, O...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: standards (show other bugs)
Version: 11.0-STABLE
Hardware: Any Any
: --- Affects Many People
Assignee: freebsd-standards (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-13 02:59 UTC by ash
Modified: 2018-01-17 21:54 UTC (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description ash 2017-04-13 02:59:27 UTC
We have been tracking a performance regression in FreeBSD 11 stable based smbd pegs a cpu and takes much longer to list huge directories than it's FreeBSD 9.x base counterpart .  Profiling showed that the time was geometrically related to file count within the directory. 

Samba emits a telldir for every file during a readdir; and in 9.x that seemed to run in linear time and things where fine; however this change appears to expands the workload of list scan to O(n^2) vs file count. 
https://svnweb.freebsd.org/base/stable/11/lib/libc/gen/telldir.c?r1=235647&r2=269204

For a directory with 64k files, the performance is as follows when driven by samba listing files. Identical hardware and zfs data sets are used:

using dtrace script:
BEGIN { printf("thinking, hit control-c when you are tired of it");}

pid$1::$2:entry
        { self->st= timestamp; }
pid$1::$2:return
        {
        @[execname,"delta(ns)" ] = quantize(  timestamp - self->st);
        self->st = 0;
        }

9.3: 
#./dt_time_in_pid_func.dt `top -b | head -10 | tail -1 | chomp | cut -w -f1` telldir 
dtrace: script './dt_time_in_pid_func.dt' matched 3 probes
CPU     ID                    FUNCTION:NAME
  7      1                           :BEGIN thinking, hit control-c when you are tired of it
^C

  smbd                                                delta(ns)                                         
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  53883    
            8192 |@                                        730      
           16384 |                                         502     

and under 11-stable(ish):
 #dtrace -s dtprofile.is_in_path.dt `top -b | head -10 | tail -1 | chomp | cut -w -f1` telldir  
dtrace: script 'dtprofile.is_in_path.dt' matched 2 probes
^C

  deltans                                           
           value  ------------- Distribution ------------- count    
            2048 |                                         0        
            4096 |                                         90       
            8192 |                                         8        
           16384 |                                         0        
           32768 |                                         0        
           65536 |@                                        1583     
          131072 |@@@@@@@@@@                               12270    
          262144 |@@@@@@@@@@@@@@@@@@@@@@                   28159    <<- libc telldir takes how long now?!
          524288 |@@@@@@@                                  9299     
         1048576 |      



After reverting the telldir change shamelessly:
https://github.com/freenas/os/commit/92873f3190c830302143d759411b23bd719b0ba2

Performance for the telldir returned to constant time. 

The change appears important to @standards however the impact is tough to explain to samba users.  To conjecture, I wonder if a run time tunable could select the 'conforming' or 'fast' behaviour for telldir like  LD_PRELOAD... style directives.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2017-04-13 03:18:36 UTC
Notify committer and reviewer of the rather-old r269204 commit (itself based on PR 121656, Phabricator https://phabric.freebsd.org/D490.)
Comment 2 John Baldwin freebsd_committer freebsd_triage 2017-04-14 17:37:36 UTC
So this wasn't purely about standards@ compliance, but there was actual software that was broken on FreeBSD because of this (it might have involved using telldir() on a large directory accessed via NFS where the client broke.  If it's the thing I'm thinking of then you would have an 'ls' on a large NFS directory that would never complete).  The larger fix is to change getdirentries() to report a valid seek location with each 'struct dirent'.  That depends on changing the contents of 'struct dirent' which is a non-trivial thing to do, but is included in the ongoing 'ino64' work.  Once that is in place you don't need the telldir cookies at all.  One thing you could do for now is replace the linear O(n) search with something better like a tree that you can do a binary search on.
Comment 3 Alexander Motin freebsd_committer freebsd_triage 2017-04-16 08:19:00 UTC
I propose such a patch for this problem: https://reviews.freebsd.org/D10408 .
On my synthetic tests it shows great results with minimal additional complexity.
Comment 4 ash 2017-04-17 16:05:48 UTC
nice..

Effect under smbd workload is good.

smbd                                                delta(ns)                                         
           value  ------------- Distribution ------------- count    
            1024 |                                         0        
            2048 |@@@@@@@@                         10912    
            4096 |@@@@@@@@@@@                  15434    
            8192 |                                         221      
...
Comment 5 commit-hook freebsd_committer freebsd_triage 2017-04-17 19:04:04 UTC
A commit references this bug:

Author: mav
Date: Mon Apr 17 19:03:31 UTC 2017
New revision: 317064
URL: https://svnweb.freebsd.org/changeset/base/317064

Log:
  Optimize pathologic case of telldir() for Samba.

  When application reads large directory, calling telldir() for each entry,
  like Samba does, it creates exponential performance drop as number of
  entries reach tenths to hundreds of thousands.  It is caused by full search
  through the internal list, that never finds matches in that scenario, but
  creates O(n^2) delays.  This patch optimizes that search, limiting it to
  entries of the same buffer, turning time closer to O(n) in case of linear
  directory scan.

  PR:		218622
  Reviewed by:	jhb, jilles
  MFC after:	2 weeks
  Sponsored by:	iXsystems, Inc.
  Differential Revision:	https://reviews.freebsd.org/D10408

Changes:
  head/lib/libc/gen/telldir.c
Comment 6 Alexander Motin freebsd_committer freebsd_triage 2017-05-01 06:07:45 UTC
I've merged the patch into 11 and 10 stable.
Comment 7 Alan Somers freebsd_committer freebsd_triage 2018-01-17 21:54:34 UTC
FWIW I have further improved upon mav's solution with r326640.  Now, telldir is O(1) in both time and space on 64-bit systems.

r326640 | asomers | 2017-12-06 15:06:48 -0700 (Wed, 06 Dec 2017) | 22 lines

Optimize telldir(3)

Currently each call to telldir() requires a malloc and adds an entry to a
linked list which must be traversed on future telldir(), seekdir(),
closedir(), and readdir() calls. Applications that call telldir() for every
directory entry incur O(n^2) behavior in readdir() and O(n) in telldir() and
closedir().

This optimization eliminates the malloc() and linked list in most cases by
packing the relevant information into a single long. On 64-bit architectures
msdosfs, NFS, tmpfs, UFS, and ZFS can all use the packed representation.  On
32-bit architectures msdosfs, NFS, and UFS can use the packed
representation, but ZFS and tmpfs can only use it for about the first 128
files per directory.  Memory savings is about 50 bytes per telldir(3) call.
Speedup for telldir()-heavy directory traversals is about 20-30x for one
million files per directory.

Reviewed by:    kib, mav, mckusick
MFC after:      3 weeks
Sponsored by:   Spectra Logic Corp
Differential Revision:  https://reviews.freebsd.org/D13385