Summary: | [libc] getpwent(3) functions on 5.3 with large password file extremely slow | ||
---|---|---|---|
Product: | Base System | Reporter: | bruce <bruce> |
Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
Status: | Open --- | ||
Severity: | Affects Only Me | CC: | benno, ngie |
Priority: | Normal | ||
Version: | 5.3-RELEASE | ||
Hardware: | Any | ||
OS: | Any |
Description
bruce
2005-01-05 15:50:46 UTC
I have a workaround. I copied /usr/src/lib/libc/gen/getpwent.c from a 4.7 system onto the 5.3 system, and then remade and installed libc. Load average on our mail server went from 70 to 2. -- Bruce Campbell Engineering Computing CPH-2374B University of Waterloo (519)888-4567 ext 5889 ---------------------------------------- This mail sent through www.mywaterloo.ca Adding relevant information to followup ----- Forwarded message from Bruce Campbell <bruce@engmail.uwaterloo.ca> ----- Date: Wed, 5 Jan 2005 11:08:22 -0500 From: Bruce Campbell <bruce@engmail.uwaterloo.ca> To: freebsd-questions@freebsd.org In-Reply-To: <1104932867.41dbf003405b1@www.nexusmail.uwaterloo.ca> X-Originating-IP: 129.97.50.50 Subject: Re: New FreeBSD 5.3 e-mail server extremely slow - traced to getpwnam maybe ? Quoting Bruce Campbell <bruce@engmail.uwaterloo.ca>: > > On Tue, Jan 04, 2005 at 09:27:27PM -0500, Bruce Campbell wrote: > > > > > I wrote a small program: > > > > > > #include <sys/types.h> > > > #include <pwd.h> > > > > > > main( int argc, char *argv[] ) > > > { > > > getpwuid( 13076 ); > > > } > > > > > > and ran it under truss on 5.x and it generated 178,711 lines of output. > > > (the bulk of which is those lseek/read calls as above) It looks like the overhaul of getpwent Apr/2003 to make it thread safe: http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/gen/getpwent.c may be the problem. I've tested the dbm_fetch function independently on a large file, and it is fine. I've opened a bug report, and plan to build a replacement 4.x mail server, as the most deterministic path to restoring adequate e-mail service to our users. Can anyone suggest a workaround ? -- Bruce Campbell Engineering Computing CPH-2374B University of Waterloo (519)888-4567 ext 5889 ---------------------------------------- This mail sent through www.mywaterloo.ca _______________________________________________ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org" ----- End forwarded message ----- Adding relevant info to followup ----- Forwarded message from Bruce Campbell <bruce@engmail.uwaterloo.ca> ----- Date: Wed, 05 Jan 2005 08:47:47 -0500 From: Bruce Campbell <bruce@engmail.uwaterloo.ca> Subject: Re: New FreeBSD 5.3 e-mail server extremely slow - traced to getpwnam maybe ? To: Kris Kennaway <kris@obsecurity.org> Cc: freebsd-questions@freebsd.org Quoting Kris Kennaway <kris@obsecurity.org>: > On Tue, Jan 04, 2005 at 09:27:27PM -0500, Bruce Campbell wrote: > > > I wrote a small program: > > > > #include <sys/types.h> > > #include <pwd.h> > > > > main( int argc, char *argv[] ) > > { > > getpwuid( 13076 ); > > } > > > > and ran it under truss on 5.x and it generated 178,711 lines of output. > > (the bulk of which is those lseek/read calls as above) > > ... > > Try tuning the pwd_mkdb parameters (see hash(3)) in > /usr/src/usr.sbin/pwd_mkdb/pwd_mkdb.c and recompile: > > HASHINFO openinfo = { > 4096, /* bsize */ > 32, /* ffactor */ > 256, /* nelem */ > 2048 * 1024, /* cachesize */ > NULL, /* hash() */ > 0 /* lorder */ > }; > > e.g. adjust nelem to 12000 to accomodate your > significantly-larger-than-average password database. If this helps, > please submit a PR requesting that someone make an option to pwd_mkdb > to tune this at runtime (or better yet, submit the patch to do this > yourself - it's straightforward to modify the source to do this). Thanks. That had no effect on the large number of seeks/reads to do a getpwuid of a specific uid. I tried boosting that number further, still no change. I suspect the problem is related to some change to the hash functions between 4.7 and 5.2.1 and I hope to get to the bottom of it today. I tried two getpwnam (as opposed to getpwuid) calls on 2 different userids, one took 1000 seek/reads, the other 16,000, so it's all pretty random, no doubt related to how stuff gets hashed. On 4.7 it takes just one or two reads/seeks. As each login via ipop, imap, and each sendmail, and just about everything will be doing getpwnam's I think this is our problem. -- Bruce Campbell Engineering Computing CPH-2374B University of Waterloo (519)888-4567 ext 5889 ---------------------------------------- This mail sent through www.mywaterloo.ca ----- End forwarded message ----- If anyone would like to investigate, I've just installed a new 5.3 system and created a dummy 10,000 entry password file. Please contact me for the password and hostname. -- Bruce Campbell Engineering Computing CPH-2374B University of Waterloo (519)888-4567 ext 5889 ---------------------------------------- This mail sent through www.mywaterloo.ca One of my co-workers appears to be on to something... ----- Forwarded message from Paul McKone <pdmckone at engmail uwaterloo ca> ----- After a bit of reading: in /etc/nsswitch.conf passwd: compat produces lots and lots of lseek's passwd: files produces the expected number "compat" is the compatibility database method for nis and stuff, or something like that, called by nsdispatch, which is called by getpwent.c "files" just looks at /etc/passwd and /etc/master.passwd I've changed the entries for group and passwd in ecserv14's /etc/nsswitch.conf group: files passwd: files and the output from "truss -f /root/a.out | & wc -l" has grone from 16341 to 119 lines. We are onto something here. I don't know what's going on, but it's tied into "nsdispatch" and "compat" somehow. PDM -- Paul McKone Engineering Computing Waterloo Unlimited BASc Systems Manager Development Team =========== innovation through support building a community of scholars ==== phone: 519-888-4567 ext 2757(am) ext 5140(pm) ==== email: pdmckone engmail.uwaterloo.ca paul unlimited.uwaterloo.ca ==== https: www.eng.uwaterloo.ca/~pdmckone www.unlimited.uwaterloo.ca ---------------------------------------- This mail sent through www.mywaterloo.ca ----- End forwarded message ----- -- Bruce Campbell Engineering Computing CPH-2374B University of Waterloo (519)888-4567 ext 5889 ---------------------------------------- This mail sent through www.mywaterloo.ca This problem/bug still exists in 6.3-PRERELEASE (6.3-PRERELEASE FreeBSD 6.3-PRERELEASE #0: Wed Dec 5 13:35:05 CET 2007). The more users there are in the password db, the bigger the impact on performance (we had around 74000 entries and this renders the system unusable). The mentioned workaround (nsswitch.conf) does help a lot. Performance does not match that of 4-stable, but it handles large(r) user bases well enough. Responsible Changed From-To: freebsd-bugs->adrian This tickles my fancy. Responsible Changed From-To: krion->adrian to committer We had someone run into this issue on the postfix mailing list recently with FreeBSD 8.2-PRERELEASE. Here is the link to the thread on the Postfix list: http://tech.groups.yahoo.com/group/postfix-users/message/271899 I was about to open another PR for nsswitch.conf for lack of any better idea on where to file this when someone mentioned this PR. I have no idea what could be done to make for a better experience out of the box. Here is the thread where I first asked about this in 2007: http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2007-04/msg00193.html Here is some current testing from the same box 3+ years later with a 90 line passwd file on the host and all the users in a jail on the same machine: $ uname -a FreeBSD web1.hosting.tcworks.net 7.4-PRERELEASE FreeBSD 7.4-PRERELEASE #0: Wed Dec 1 11:02:50 CST 2010 root@web1.hosting.tcworks.net:/usr/obj/usr/src/sys/GENERIC i386 $ wc -l /etc/passwd 90 /etc/passwd lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:14:16 Thu Dec 16 $ egrep '^(group|passwd):' /etc/nsswitch.conf group: files passwd: files lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:14:31 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.124s user 0m0.095s sys 0m0.040s lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:14:34 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.113s user 0m0.053s sys 0m0.071s lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:14:35 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.124s user 0m0.078s sys 0m0.056s $ sudo vi /etc/nsswitch.conf lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:15:37 Thu Dec 16 $ egrep '^(group|passwd):' /etc/nsswitch.conf group: compat passwd: compat lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:15:47 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.748s user 0m0.384s sys 0m0.355s lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:15:51 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.743s user 0m0.425s sys 0m0.317s lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:15:54 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.734s user 0m0.381s sys 0m0.351s Then from inside the jail: $ wc -l /etc/passwd 3935 /etc/passwd lambert@users.tcworks.net /home 18:04:11 Thu Dec 16 $ egrep '^(group|passwd):' /etc/nsswitch.conf group: compat passwd: compat lambert@users.tcworks.net /home 18:16:32 Thu Dec 16 $ time ls -l | wc -l 3917 real 1m0.092s user 0m11.030s sys 0m47.301s I ran top outside the jail while running ls the first time and got: 68759 997 1 116 0 4492K 2140K CPU0 0 0:25 88.96% ls later during the same run ... 68759 997 1 118 0 4492K 2240K CPU0 0 0:57 98.29% ls lambert@users.tcworks.net /home 18:17:35 Thu Dec 16 $ time ls -l | wc -l 3917 real 1m1.616s user 0m11.464s sys 0m48.228s lambert@users.tcworks.net /home 18:18:57 Thu Dec 16 $ time ls -l | wc -l 3917 real 1m0.295s user 0m12.076s sys 0m46.384s lambert@users.tcworks.net /home 18:20:08 Thu Dec 16 $ $ sudo vi /etc/nsswitch.conf lambert@users.tcworks.net /home 18:20:49 Thu Dec 16 $ egrep '^(group|passwd):' /etc/nsswitch.conf group: files passwd: files lambert@users.tcworks.net /home 18:20:53 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.148s user 0m0.086s sys 0m0.074s lambert@users.tcworks.net /home 18:20:59 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.158s user 0m0.078s sys 0m0.089s lambert@users.tcworks.net /home 18:21:00 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.157s user 0m0.035s sys 0m0.134s -- Scott Lambert KC5MLE Unix SysAdmin lambert@lambertfam.org We had someone run into this issue on the postfix mailing list recently with FreeBSD 8.2-PRERELEASE. Here is the link to the thread on the Postfix list: http://tech.groups.yahoo.com/group/postfix-users/message/271899 I was about to open another PR for nsswitch.conf for lack of any better idea on where to file this when someone mentioned this PR. I have no idea what could be done to make for a better experience out of the box. Here is the thread where I first asked about this in 2007: http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2007-04/msg00193.html Here is some current testing from the same box 3+ years later with a 90 line passwd file on the host and all the users in a jail on the same machine: $ uname -a FreeBSD web1.hosting.tcworks.net 7.4-PRERELEASE FreeBSD 7.4-PRERELEASE #0: Wed Dec 1 11:02:50 CST 2010 root@web1.hosting.tcworks.net:/usr/obj/usr/src/sys/GENERIC i386 $ wc -l /etc/passwd 90 /etc/passwd lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:14:16 Thu Dec 16 $ egrep '^(group|passwd):' /etc/nsswitch.conf group: files passwd: files lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:14:31 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.124s user 0m0.095s sys 0m0.040s lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:14:34 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.113s user 0m0.053s sys 0m0.071s lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:14:35 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.124s user 0m0.078s sys 0m0.056s $ sudo vi /etc/nsswitch.conf lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:15:37 Thu Dec 16 $ egrep '^(group|passwd):' /etc/nsswitch.conf group: compat passwd: compat lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:15:47 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.748s user 0m0.384s sys 0m0.355s lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:15:51 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.743s user 0m0.425s sys 0m0.317s lambert@web1.hosting.tcworks.net /home/ezjails/users.tcworks.net/home 18:15:54 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.734s user 0m0.381s sys 0m0.351s Then from inside the jail: $ wc -l /etc/passwd 3935 /etc/passwd lambert@users.tcworks.net /home 18:04:11 Thu Dec 16 $ egrep '^(group|passwd):' /etc/nsswitch.conf group: compat passwd: compat lambert@users.tcworks.net /home 18:16:32 Thu Dec 16 $ time ls -l | wc -l 3917 real 1m0.092s user 0m11.030s sys 0m47.301s I ran top outside the jail while running ls the first time and got: 68759 997 1 116 0 4492K 2140K CPU0 0 0:25 88.96% ls later during the same run ... 68759 997 1 118 0 4492K 2240K CPU0 0 0:57 98.29% ls lambert@users.tcworks.net /home 18:17:35 Thu Dec 16 $ time ls -l | wc -l 3917 real 1m1.616s user 0m11.464s sys 0m48.228s lambert@users.tcworks.net /home 18:18:57 Thu Dec 16 $ time ls -l | wc -l 3917 real 1m0.295s user 0m12.076s sys 0m46.384s lambert@users.tcworks.net /home 18:20:08 Thu Dec 16 $ $ sudo vi /etc/nsswitch.conf lambert@users.tcworks.net /home 18:20:49 Thu Dec 16 $ egrep '^(group|passwd):' /etc/nsswitch.conf group: files passwd: files lambert@users.tcworks.net /home 18:20:53 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.148s user 0m0.086s sys 0m0.074s lambert@users.tcworks.net /home 18:20:59 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.158s user 0m0.078s sys 0m0.089s lambert@users.tcworks.net /home 18:21:00 Thu Dec 16 $ time ls -l | wc -l 3917 real 0m0.157s user 0m0.035s sys 0m0.134s -- Scott Lambert KC5MLE Unix SysAdmin lambert@lambertfam.org Responsible Changed From-To: adrian->freebsd-bugs with bugmeister hat on, reassign back to mailing list. (adrian did not respond to a request for status). The reason for this is how the compat code scans the password table looking for compat entries. The algorithm is O(m * n) where m is the number of entries to lookup (in the case of the ls -l /home) and n is the number of lines in the password database. This is clearly pessimized and should be able to be better optimized. The current logic is something like: scan each entry in the passwd db: if it starts with '+/-' do compat checking if the key matches, return The logic for files is something like: Lookup exact entry in db, return if found. The process could probably do some amount of intelligence to avoid rescanning the password database on every lookup, but I can't think of anything off the top of my head (it's late). I'll think about it some more. This seems like something we might want to fix, benno? |