[Please note that the email address given is not valid due to security reasons. Just reply to the gnats entry, and I'll follow it via the web interface.]

Between FreeBSD 6.3 and 7.1, the behavior of the NFS server changed for the worse. Under 6.3, the load generated by the nfsd's would never exceed their number (I am using nfs_server_flags="-u -t -n 8"). With 7.1, when the client generates a lot of requests, it seems that the load on the NFS server can grow nearly without bounds, rendering the server all but unusable.

I have been testing this with two scenarios:

- running a FreeBSD buildworld -j4 with src on NFS and obj local, so there are only NFS reads, and
- compiling openoffice.org-3-devel with WRKDIRPREFIX set to an NFS-mounted directory, and MAXMODULES=3, MAXPROCESSES=3, resulting in both reads and writes.

In both cases, the NFS server reaches absurdly high load values. With 'buildworld -j4' on the client I currently read from 'systat -vm 1' on the server:

- a load just passing the 15 mark
- nearly 100% 'sys'
- there seems to be nearly no disk i/o activity
- very poor general network throughput: an xterm from the server to the client is nearly unresponsive (in fact, I again have to suspend the build on the client right now in order to be able to write this bug report in an xterm running on the server)
- resuming the build immediately sends the load skyrocketing again.

I am quite sure that this is a server issue: for a while I had the client running 7.0 and the server 6.3, and in that scenario this did not happen. BTW, the NFS mounts on the client are done via amd - this probably influences which mount type is used.

Fix: Unknown.

How-To-Repeat: Generate a high NFS load on a client, and watch the server load skyrocket.
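For reference, the server-side setup boils down to roughly the following rc.conf entries (only nfs_server_flags is quoted above; the other lines are the usual companion settings and are assumptions here):

    # /etc/rc.conf on the NFS server (sketch)
    rpcbind_enable="YES"
    mountd_enable="YES"
    nfs_server_enable="YES"
    nfs_server_flags="-u -t -n 8"   # serve UDP and TCP, run 8 nfsd servers

and the read-only reproduction case is essentially a parallel buildworld on the client with the source tree NFS-mounted and obj on a local disk (the /usr/src path is a placeholder; here the tree is actually mounted via amd):

    # on the client; src is the NFS-mounted source tree, obj is local
    cd /usr/src && make -j4 buildworld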
I just realized that there might also be an interaction with pppoa2 (from /usr/ports/net/pppoa). The excessive load seems more likely to happen if at the same time I have heavy traffic through my ADSL connection, which goes through an Alcatel ADSL modem, pppoa2, and user-mode ppp. Syslog continually shows the following messages:

Feb 4 13:56:59 gandalf pppoa2[1491]: write_dest: 11800 ENOBUFS errors
Feb 4 13:57:03 gandalf pppoa2[1491]: write_dest: 11900 ENOBUFS errors
Feb 4 13:57:08 gandalf pppoa2[1491]: write_dest: 12000 ENOBUFS errors
Feb 4 13:57:11 gandalf pppoa2[1491]: write_dest: 12100 ENOBUFS errors
Feb 4 13:57:15 gandalf pppoa2[1491]: write_dest: 12200 ENOBUFS errors

At the same time, netstat -m shows the following:

# netstat -m
233/412/645 mbufs in use (current/cache/total)
207/241/448/25600 mbuf clusters in use (current/cache/total/max)
204/180 mbuf+clusters out of packet secondary zone in use (current/cache)
1/19/20/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
476K/661K/1137K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/5/6656 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
15 requests for I/O initiated by sendfile
0 calls to protocol drain routines
Responsible Changed From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
I am now very sure that this is an interaction with pppoa, and it is also worse than I originally thought: it even leads to failed NFS transactions for the client. Here is what I have:

Machine A ('server', a mini home server) does the following:
- connecting to the Internet using usermode ppp over pppoa over an Alcatel ADSL modem
- NFS serving FreeBSD sources

Machine B does the following:
- mounting the FreeBSD sources from A (using amd), under directory /vol/SRC/FreeBSD/HEAD/src
- compiling the FreeBSD sources: make -j4 buildworld, such that the corresponding obj is local (via amd again)

Especially in the first phase of the buildworld (clean, depend, obj), there is a lot of simultaneous NFS traffic from B to A. As soon as a download is started at A (going via pppoa, of course), the load on A rises to very high values (> 20 is not uncommon). This may lead to B aborting the compile; it just did so with "directory not found".

Both machines are running 7.1.0. No such problem happened when both were running 6.3.0.
o.k. pppoa does not have (much) to do with it... even when it is not running, the excessive load happens.
o.k. more info... I have this on the server (machine A in my previous post); it is showing quite a high number for "Server Ret-Failed":

# nfsstat
Client Info:
Rpc Counts:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
  2304000         0       221       165         0         0         0         0
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
        0         0         0         0         0        22         0         0
    Mknod    Fsstat    Fsinfo  PathConf    Commit
        0        58         0         0         0
Rpc Info:
 TimedOut   Invalid X Replies   Retries  Requests
        0         0        17        17   2304466
Cache Info:
Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
        0   2304000    885499       221         0         0         0         0
BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses
   532202       165        89        11        50         0
Server Info:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
        8        19   3123468        82     34382      3156        19        11
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
        0         0        11         0         0      9522         0    868616
    Mknod    Fsstat    Fsinfo  PathConf    Commit
        0     31396        23         0       147
Server Ret-Failed
          3046139
Server Faults
                0
Server Cache Stats:
   Inprog      Idem  Non-idem    Misses
        0        51         0       412
Server Write Gathering:
 WriteOps  WriteRPC   Opsaved
     3156      3156         0

and this on the client (machine B):

# nfsstat
Client Info:
Rpc Counts:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
 48679649         0   3035823        58     26400         0         0         0
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
        0         0         0         0         0      7255         0    767940
    Mknod    Fsstat    Fsinfo  PathConf    Commit
        0     27347         1         0         0
Rpc Info:
 TimedOut   Invalid X Replies   Retries  Requests
        0         0         0         1  52543821
Cache Info:
Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
 46870061  49414909  64135869   3035823   1144711     26389         0         0
BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses
  8748412        58     43987      7218     25358         0
Server Info:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
        0         0         0         0         0         0         0         0
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
        0         0         0         0         0         0         0         0
    Mknod    Fsstat    Fsinfo  PathConf    Commit
        0         0         0         0         0
Server Ret-Failed
                0
Server Faults
                0
Server Cache Stats:
   Inprog      Idem  Non-idem    Misses
        0         0         0         0
Server Write Gathering:
 WriteOps  WriteRPC   Opsaved
        0         0         0
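To see how quickly the "Server Ret-Failed" counter grows while a build is running, the statistics can be reset and then sampled periodically - a minimal sketch, assuming the nfsstat(1) on this system supports the -z, -s and -w flags (otherwise, diffing two snapshots taken a known interval apart gives the same information):

    # on the server: zero the counters, then print server-side activity once per second
    nfsstat -z
    nfsstat -s -w 1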
Yet more info... here is output from top. Also, the following just happened:

- I am editing this mail on the NFS server.
- Together with the top output below, I was pasting a total of 1000 lines (my XTerm scroll size).
- This caused the load on the server to effectively double again (over the values shown in the paste below).

Basically, I can only continue editing this mail if I suspend the build on the client machine, in which case the server immediately becomes responsive again. So maybe it is not a pppoa interaction with NFS serving; rather, any additional load on the server combined with NFS serving makes the load go to insane values. Or maybe it is just additional TCP load, because I am displaying this XTerm on the NFS client (where the X server is running), and all the pasting has to go via the X server's TCP connection.

Also, I have the impression that as long as only one of the 8 nfsd's on the server is busy, things are mostly normal, but as soon as more than one starts doing work (as seen in the output below), the load on the server goes way up. And regarding "mostly normal": even if only one nfsd seems to be active, the load on the server is already close to one - assuming that an nfsd does not do much more than network and disk i/o, this really should not be the case (and was not under 6.3, where the load was low even under quite heavy NFS i/o). So maybe it is a ULE problem, after all?

last pid:  2527;  load averages: 14.71, 10.36,  6.13   up 0+01:04:43  08:21:08
111 processes: 9 running, 102 sleeping
CPU:  1.4% user,  0.0% nice, 90.5% system,  8.1% interrupt,  0.0% idle
Mem: 135M Active, 745M Inact, 119M Wired, 1012K Cache, 112M Buf, 248M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
  971 root        1   4    0  3128K   944K -       13:45 40.28% nfsd
  972 root        1   4    0  3128K   944K -        2:19 15.09% nfsd
  973 root        1   4    0  3128K   944K -        1:31 10.16% nfsd
  974 root        1   4    0  3128K   944K -        1:03  6.05% nfsd
  975 root        1   4    0  3128K   944K -        0:49  4.59% nfsd
  977 root        1   4    0  3128K   944K -        0:41  3.56% nfsd
  978 root        1   4    0  3128K   944K -        0:35  2.64% nfsd
  976 root        1   4    0  3128K   944K -        0:31  1.81% nfsd
 2527 root        1  96    0  3164K   992K RUN      0:00  1.54% rsh
 1471 root        1  81  -15  5032K  2716K select   0:05  0.05% ppp
  919 root        1  96    0  3128K  3148K select   2:16  0.00% amd
 1539 root        1  96    0  6508K  4964K RUN      0:10  0.00% xterm
 1140 squid       1   4    0 12000K 10152K sbwait   0:05  0.00% perl5.8.9
 1130 squid       1  96    0 15660K 10820K RUN      0:05  0.00% squid
 1141 squid       1   4    0 12000K 10148K sbwait   0:04  0.00% perl5.8.9
 1142 squid       1   4    0 12000K 10148K sbwait   0:04  0.00% perl5.8.9
 1143 squid       1   4    0 12000K 10104K sbwait   0:03  0.00% perl5.8.9
Sorry for the many posts... this problem is really nagging me, and I need to correct an error I made: sysctl kern.sched.name on the server reports '4BSD', so it is not ULE as I wrote in my previous posting.
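For anyone wanting to check the same thing: the scheduler a running kernel uses is reported by a single sysctl, and switching schedulers on 7.x means rebuilding the kernel with the other option in the kernel config - a sketch, with the usual option names (GENERIC on 7.1 ships with SCHED_ULE):

    # which scheduler is this kernel running?
    sysctl kern.sched.name

    # in a custom kernel configuration file, exactly one of:
    options SCHED_4BSD
    options SCHED_ULE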
Another top snapshot from the server; the scenario is the following:

- make -j4 buildworld running on the client, currently in "===> kerberos5/lib/libkadm5srv (all)"
- on the server, in addition to serving NFS, also running an 'svn log' command, where the repository is also served by the server via http (httpd, subversion repo, and svn log all running on the server, displaying in an XTerm running on the client).

If I do the 'svn log' without NFS load from the client (buildworld stopped), the output is nearly instantaneous. If I do it with a running buildworld on the client, I get a top output similar to the one below (actually, it is very hard to capture the "worst" moments, as the server is so unresponsive - in fact the load was something like 25, and each of the 8 nfsds consumed about 10% of CPU).

last pid:  2527;  load averages: 14.71, 10.36,  6.13   up 0+01:04:43  08:21:08
111 processes: 9 running, 102 sleeping
CPU:  1.4% user,  0.0% nice, 90.5% system,  8.1% interrupt,  0.0% idle
Mem: 135M Active, 745M Inact, 119M Wired, 1012K Cache, 112M Buf, 248M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
  971 root        1   4    0  3128K   944K -       13:45 40.28% nfsd
  972 root        1   4    0  3128K   944K -        2:19 15.09% nfsd
  973 root        1   4    0  3128K   944K -        1:31 10.16% nfsd
  974 root        1   4    0  3128K   944K -        1:03  6.05% nfsd
  975 root        1   4    0  3128K   944K -        0:49  4.59% nfsd
  977 root        1   4    0  3128K   944K -        0:41  3.56% nfsd
  978 root        1   4    0  3128K   944K -        0:35  2.64% nfsd
  976 root        1   4    0  3128K   944K -        0:31  1.81% nfsd
 2527 root        1  96    0  3164K   992K RUN      0:00  1.54% rsh
 1471 root        1  81  -15  5032K  2716K select   0:05  0.05% ppp
  919 root        1  96    0  3128K  3148K select   2:16  0.00% amd
 1539 root        1  96    0  6508K  4964K RUN      0:10  0.00% xterm
 1140 squid       1   4    0 12000K 10152K sbwait   0:05  0.00% perl5.8.9
 1130 squid       1  96    0 15660K 10820K RUN      0:05  0.00% squid
 1141 squid       1   4    0 12000K 10148K sbwait   0:04  0.00% perl5.8.9
 1142 squid       1   4    0 12000K 10148K sbwait   0:04  0.00% perl5.8.9
 1143 squid       1   4    0 12000K 10104K sbwait   0:03  0.00% perl5.8.9
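One way to capture the "worst" moments even when the machine is barely responsive is to log top output non-interactively in a loop started before the build - a sketch, assuming the usual top(1) batch-mode flags (-b for batch output, -S to include system processes; the process count, interval, and log path are arbitrary):

    # on the server: append a snapshot of the top 20 processes to a log every 10 seconds
    while :; do date; top -b -S 20; sleep 10; done >> /var/tmp/top-nfsd.log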
We experience exactly the same problem. With FreeBSD 7.0 everything was alright, but with the upgrade to 7.1 performance became abysmal. Normally it seems to work for some time; but after a few hours, the load climbs and the 4 nfsd processes each use 100% CPU (on a 4-way system). The system only serves as nfs-root (ro) for about 20 systems, so there is not too much nfs traffic going on.

I wasn't able to capture it when every nfsd was using 100% CPU (because, well, some thousands of users depend on it); this is what it looks like when it is half-working (hundreds of timeouts on the clients):

37907 root  1  4  0  4604K  1112K  -  2  46:29  33.59% nfsd
37908 root  1  4  0  4604K  1112K  -  2  15:29  17.38% nfsd
37909 root  1  4  0  4604K  1112K  -  3   5:59   5.66% nfsd
37910 root  1  4  0  4604K  1112K  -  2   2:38   0.00% nfsd

We suspect the ULE scheduler is responsible for the mess. Right now we are testing a kernel with the 4BSD scheduler; we'll know more next week. But right now it looks like this:

 1040 root  1  4  0  4604K  1068K  -  3   5:49   6.49% nfsd
 1039 root  1  4  0  4604K  1068K  -  3   2:20   1.86% nfsd
 1042 root  1  4  0  4604K  1068K  -  1   0:48   0.05% nfsd
 1041 root  1  4  0  4604K  1068K  -  2   0:15   0.00% nfsd

Promising.

Regards
Peter

-- 
"Those who give up essential liberties for temporary safety deserve neither liberty nor safety." -- Benjamin Franklin
"It's also true that those who would give up privacy for security are likely to end up with neither." -- Bruce Schneier
To add to what Peter Keel is writing: my kernels *did* still use the 4BSD scheduler, so I am quite sure that Peter will not see an improvement when switching to it from the ULE scheduler.

Next observation: my server, aside from serving NFS, also serves samba clients. Yesterday, a directory on the server containing approx. 100 files was deleted from a single Windows 98 host. During this time, the server was completely unresponsive (except that I could still ping it). It was not even possible to contact the DNS server running on it. After a few minutes (presumably when the Windows 98 host had finished deleting the directory; I did not watch this directly), things returned to normal. However, the "xload" display from the server then refreshed again and indicated a truly gigantic load peak - it must have been greater than 50, as the background of the xload window was completely filled with y-axis lines (the horizontal lines dividing load levels).

Something has been messed up horribly with multiprocessing in 7.1.
On Wed, 4 Feb 2009, Martin Birgmeier wrote:

> [Please note that the email address given is not valid due
> to security reasons. Just reply to the gnats entry, and
> I'll follow it via the web interface.]
>
> Between FreeBSD 6.3 and 7.1, the behavior of the NFS server
> changed for the worse. Under 6.3, the load generated by the
> nfsd's would never exceed their number (I am using
> nfs_server_flags="-u -t -n 8"). With 7.1, when the client
> generates a lot of requests, it seems that the load on the
> NFS server can grow nearly without bounds, rendering the
> server all but unusable.

Hi Martin:

Could I ask you to clarify a few things about the NFS server configuration:

- What device/device driver is on the server?
- Is the mount over TCP or UDP?

There are a number of changes between 7.0 and 7.1; one potential change to look at is the increased UDP parallelism resulting from read-locking in the UDP stack, which can overwhelm the capabilities of some devices/device drivers. We've had a couple of reports of this specifically with the bge driver. If you're using UDP, converting to TCP would be interesting; if it's bge, then we could give you a patch to locally revert the read-locking changes to see if that helps manage contention better (i.e., more contention higher up on a set of locks moves the contention off the single device driver lock).

Robert N M Watson
Computer Laboratory
University of Cambridge
Robert,

Thanks for taking this.

First off, the problem in the previous message I wrote (about similar issues with samba) might in fact also be caused by NFS - samba is running on this server, but it was configured to serve a directory which itself is served by amd(8) on this server.

Now to your questions:

device/driver (grep sis /var/run/dmesg.boot):

sis0: <NatSemi DP8381[56] 10/100BaseTX> port 0xa400-0xa4ff mem 0xd5800000-0xd5800fff irq 9 at device 9.0 on pci0
sis0: Silicon Revision: DP83815D
miibus0: <MII bus> on sis0
sis0: Ethernet address: 00:40:f4:1a:42:ba
sis0: [ITHREAD]
sis0: Applying short cable fix (reg=f4)
sis0: Applying short cable fix (reg=f5)

mount: It is whatever amd(8) instructs it to be, and from running tcpdump I believe it is over TCP.
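For reference, checking this with tcpdump amounts to something like the following (interface name taken from the dmesg output above; 2049 is the standard NFS port):

    # on the server: see whether incoming NFS requests arrive over TCP or UDP
    tcpdump -n -i sis0 port 2049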
Addendum:

- Changing the scheduler yielded nothing, as you expected.
- This was over TCP; we never used UDP. We mounted the systems with these options: rw,nfsv3,tcp,intr,nolockd,bg
- The Ethernet driver is "em" on all machines:
  em0: <Intel(R) PRO/1000 Network Connection 6.9.6>

Regards
Peter

-- 
"Those who give up essential liberties for temporary safety deserve neither liberty nor safety." -- Benjamin Franklin
"It's also true that those who would give up privacy for security are likely to end up with neither." -- Bruce Schneier
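For reference, with those options a typical client-side fstab entry would look roughly like this (server name, export path, and mount point are placeholders; only the option string is taken from the post above):

    # client /etc/fstab: NFSv3 over TCP with the options quoted above
    server:/export   /mnt   nfs   rw,nfsv3,tcp,intr,nolockd,bg   0   0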
You can probably close this; I cannot reproduce it any more on the current setup (RELENG_7_3_0_RELEASE).

Regards,
Martin
For bugs matching the following criteria:

  Status: In Progress
  Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped