I have jails accessing NFSv4 mounts via nullfs mounts into the jail. When a process in the jail scans the filesystem contents of the NFSv4 mount, syslogd on the jail host (prison 0) is spammed with messages and uses a lot of CPU:

Dec 8 09:00:16 skeletor nfsuserd:[71758]: req from ip=0xac10017a port=773
Dec 8 09:00:16 skeletor nfsuserd:[71757]: req from ip=0xac10017a port=773
Dec 8 09:01:16 skeletor nfsuserd:[71758]: req from ip=0xac10017a port=773
Dec 8 09:01:16 skeletor nfsuserd:[71755]: req from ip=0xac10017a port=773
Dec 8 09:02:17 skeletor nfsuserd:[71756]: req from ip=0xac10017a port=773
Dec 8 09:02:17 skeletor nfsuserd:[71757]: req from ip=0xac10017a port=773
Dec 8 09:03:17 skeletor nfsuserd:[71756]: req from ip=0xac10017a port=773
Dec 8 09:03:17 skeletor nfsuserd:[71758]: req from ip=0xac10017a port=773

That's a brief example; I've had thousands happen within seconds.
Created attachment 164098
make nfsuserd use an AF_LOCAL socket

This problem appears to be caused by the jail translating 127.0.0.1 to the primary IP address for the machine. Since nfsuserd will only accept RPC requests from 127.0.0.1, it generates log messages when an RPC request comes from any other IP address.

It appears that the best solution to this problem is to change the nfsuserd daemon so that it uses an AF_LOCAL socket, which guarantees that all requests come from the local machine.

The two patches apply to nfsuserd.c and the kernel:
nfsuserd-aflocal.patch - nfsuserd.
nfsuserd-aflocal-kern.patch - kernel.

Since the kernel can still handle the old unpatched nfsuserd, I think it can be MFC'd.
Created attachment 164099
kernel changes to support nfsuserd using an AF_LOCAL socket

Kernel changes to add support for nfsuserd using an AF_LOCAL socket.
Created attachment 164134
kernel changes to support nfsuserd using an AF_LOCAL socket

This patch is a cleaned up version of 164099, but doesn't fix any problems with 164099 (i.e. if you are using 164099 there isn't much point in using this patch).

Both this patch and 164099 are relative to "sys", so you need to cd to the root of your kernel source tree before applying them (/usr/src/sys or ???).
I'm also experiencing this issue. What would need to be done to move this work along?
If someone can test the patch(es), running nfsuserd with option "1" so that it only runs a single daemon, and confirm that it doesn't hang, then I could commit this. (I could never cause the hangs to occur during previous testing, so I couldn't tell if only running one daemon would fix the hangs.)
Note that, so long as your NFSv4 mounts are using AUTH_SYS (no Kerberos mounts), you can work around this problem by not running the nfsuserd. To do this you need to:
- Take the line that forces nfsuserd to run out of /etc/rc.d/nfsd
- Make sure nfsuserd_enable="YES" is not in your /etc/rc.conf
- On the NFS server, set sysctl vfs.nfsd.enable_stringtouid=1
  (Can be done via /etc/sysctl.conf.)
Then reboot all your machines (server and clients).

Note that killing off the nfsuserd isn't sufficient. The machine must be rebooted and never run nfsuserd after the boot. (If it has been running, the kernel code keeps trying to use it. This was done so that, if an nfsuserd died, the cached mappings would continue to work until the sysadmin restarted nfsuserd.)

Personally, I find doing this much easier than using the nfsuserd. (As of RFC-7530, this is an acceptable way to use NFSv4.0 and "unofficially" NFSv4.1 as well.)
Created attachment 183835
make nfsuserd use an AF_LOCAL socket

Update the nfsuserd.c patch, so that it ignores the "num_servers" option and only runs 1. I believe the hangs in previous testing were caused by a race in the local socket code tickled by multiple server processes using the socket concurrently.

The code still parses the "num_servers" option, so that the patch doesn't cause a POLA violation when "num_servers" is specified. If committed, a man page update (not in this patch) would indicate that "num_servers" is deprecated and no longer used.
A commit references this bug:

Author: rmacklem
Date: Tue Jul 4 22:20:31 UTC 2017
New revision: 320659
URL: https://svnweb.freebsd.org/changeset/base/320659

Log:
  Add a Bugs section that indicates that the nfsuserd doesn't work when jails are being used on the system.

  It is hoped that the patches in PR#205193 will someday get tested/debugged so that they can be committed to fix this.
  This is a content change.

  PR: 205193
  MFC after: 2 weeks

Changes:
  head/usr.sbin/nfsuserd/nfsuserd.8
A commit references this bug:

Author: rmacklem
Date: Thu Jul 6 00:53:13 UTC 2017
New revision: 320698
URL: https://svnweb.freebsd.org/changeset/base/320698

Log:
  Add support for AF_LOCAL socket upcalls to the nfsuserd daemon.

  This patch adds support for AF_LOCAL socket upcalls to an nfsuserd daemon that supports them. A future patch to the nfsuserd daemon will use AF_LOCAL sockets to avoid a problem when using upcalls to 127.0.0.1 if jails are in use.

  Suggested by: dfr
  PR: 205193

Changes:
  head/sys/fs/nfs/nfs_commonkrpc.c
  head/sys/fs/nfs/nfs_commonport.c
  head/sys/fs/nfs/nfs_commonsubs.c
  head/sys/fs/nfs/nfs_var.h
A commit references this bug:

Author: rmacklem
Date: Thu Jul 6 22:04:37 UTC 2017
New revision: 320757
URL: https://svnweb.freebsd.org/changeset/base/320757

Log:
  Modify the nfsuserd daemon so that it uses an AF_LOCAL socket for upcalls.

  This patch modifies the nfsuserd daemon so that it uses an AF_LOCAL socket for upcalls by default. This should fix the problem with using a UDP socket upcall to 127.0.0.1 when jails are used. The AF_LOCAL socket case only supports a single server daemon, since hangs were observed by the original problem reporter when multiple daemons were used. The patch adds a command line option called "-use-udpsock" which makes the daemon revert to its prepatched behaviour.

  Suggested by: dfr
  PR: 205193
  Relnotes: yes

Changes:
  head/usr.sbin/nfsuserd/nfsuserd.c
A variation of these patches has been committed to head/current. The change was to add a "-use-udpport" option for nfsuserd that makes it revert to the old/prepatched behaviour. nfsuserd also only runs a single daemon for the AF_LOCAL case, since I believe that is what caused the hangs when tested by the original reporter.

I listed dfr@ as "Suggested by:", since I couldn't remember if he had reviewed the patch two years ago or not. I do remember that he suggested switching to an AF_LOCAL socket to fix the problem.

If it goes well in head/current with no reports of hangs for a couple of months, I will then consider MFC'ing the patches. A change to the nfsuserd.8 man page, which notes in the BUGS section that nfsuserd doesn't work when jails are enabled, will be MFC'd soon.
batch change:

For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.

Note: I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed but feel free to re-open it if the issue hasn't been completely resolved. Thanks
Reopening the PR, since this has never been resolved. The AF_LOCAL patch works for some, but not others. (I've forgotten exactly how it breaks things, but it does, so it can't be committed.) I do have a patch that adds a command line option to replace 127.0.0.1 with another IP address, that works around the problem, but it is kind of an ugly fix. I may commit this, since it is at least a safe fix. rick
Created attachment 202117
modify nfsuserd so that it checks what 127.0.0.1 maps to

This patch modifies nfsuserd so that it does a bind()/getsockname() for 127.0.0.1 to see what it maps to when jails are enabled and a match with 127.0.0.1 fails. This should fix the problem and does not require kernel changes.
A commit references this bug:

Author: rmacklem
Date: Sat Apr 6 21:53:47 UTC 2019
New revision: 345992
URL: https://svnweb.freebsd.org/changeset/base/345992

Log:
  Add INET6 support for the upcalls to the nfsuserd daemon.

  The kernel code uses UDP to do upcalls to the nfsuserd(8) daemon to get updates to the username<->uid and groupname<->gid mappings. A change to AF_LOCAL last year had to be reverted, since it could result in vnode locking issues on the AF_LOCAL socket.
  This patch adds INET6 support and the required #ifdef INET and INET6 to the code.

  Requested by: bz
  PR: 205193
  Reviewed by: bz, rgrimes
  MFC after: 2 weeks
  Differential Revision: http://reviews.freebsd.org/D19218

Changes:
  head/sys/fs/nfs/nfs.h
  head/sys/fs/nfs/nfs_commonport.c
  head/sys/fs/nfs/nfs_commonsubs.c
  head/sys/fs/nfs/nfs_var.h
A commit references this bug:

Author: rmacklem
Date: Sat Apr 6 22:05:52 UTC 2019
New revision: 345994
URL: https://svnweb.freebsd.org/changeset/base/345994

Log:
  Fix nfsuserd so that it handles the mapped localhost address when jails are enabled.

  The nfsuserd(8) daemon does not function correctly when jails are enabled, since localhost gets mapped to another IP address and, as such, the upcall RPC fails.
  This patch fixes the problem by doing a getsockname(2) of a socket mapped to localhost to find out what the correct address is for the comparison test with the upcall's from IP address.
  This patch also adds INET6 support and the required #ifdef's for INET and INET6. It now uses INET6 by default for the upcalls, if the kernel has INET6 support and the daemon is also built with INET6 support.

  Tested by: freebsd@danielengel.com (earlier version)
  PR: 205193
  Reviewed by: bz, rgrimes
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D19218

Changes:
  head/usr.sbin/nfsuserd/Makefile
  head/usr.sbin/nfsuserd/nfsuserd.c
A commit references this bug:

Author: rmacklem
Date: Sat Apr 6 22:14:03 UTC 2019
New revision: 345995
URL: https://svnweb.freebsd.org/changeset/base/345995

Log:
  Delete the BUGS entry related to failing when jails are enabled.

  r345994 has finally fixed the bug that caused the nfsuserd(8) daemon to fail when jails were enabled, so delete the BUGS entry from the man page.

  PR: 205193
  MFC after: 2 weeks

Changes:
  head/usr.sbin/nfsuserd/nfsuserd.8
The patches to fix this (r345992, r345994) have finally been committed to head. This PR can be closed once these patches are MFC'd. (They are variants of the patches already attached to this PR.)
Patches that fix this have been MFC'd.