Bug 205193

Summary: jail accessing NFSv4 mount causes syslog spam
Product: Base System
Reporter: Mark Felder <feld>
Component: kern
Assignee: Rick Macklem <rmacklem>
Status: Closed FIXED
Severity: Affects Some People
CC: fabian.freyer, gonzo, julien
Priority: ---
Flags: rmacklem: mfc-stable12+
rmacklem: mfc-stable11+
rmacklem: mfc-stable10+
Version: CURRENT
Hardware: Any
OS: Any
Attachments (description, flags):
- make nfsuserd use an AF_LOCAL socket (flags: none)
- kernel changes to support nfsuserd using an AF_LOCAL socket (flags: none)
- kernel changes to support nfsuserd using an AF_LOCAL socket (flags: none)
- make nfsuserd use an AF_LOCAL socket (flags: none)
- modify nfsuserd so that it checks what 127.0.0.1 maps to (flags: none)

Description Mark Felder freebsd_committer freebsd_triage 2015-12-10 14:36:48 UTC
I have jails that access NFSv4 mounts through nullfs mounts into the jail. When a process in the jail does a filesystem scan of the contents of the NFSv4 mount, syslogd on the jail host (prison 0) is spammed with messages and uses a lot of CPU:

Dec  8 09:00:16 skeletor nfsuserd:[71758]: req from ip=0xac10017a port=773
Dec  8 09:00:16 skeletor nfsuserd:[71757]: req from ip=0xac10017a port=773
Dec  8 09:01:16 skeletor nfsuserd:[71758]: req from ip=0xac10017a port=773
Dec  8 09:01:16 skeletor nfsuserd:[71755]: req from ip=0xac10017a port=773
Dec  8 09:02:17 skeletor nfsuserd:[71756]: req from ip=0xac10017a port=773
Dec  8 09:02:17 skeletor nfsuserd:[71757]: req from ip=0xac10017a port=773
Dec  8 09:03:17 skeletor nfsuserd:[71756]: req from ip=0xac10017a port=773
Dec  8 09:03:17 skeletor nfsuserd:[71758]: req from ip=0xac10017a port=773

That's a brief example; I've had thousands happen within seconds.
Comment 1 Rick Macklem freebsd_committer freebsd_triage 2015-12-11 02:34:38 UTC
Created attachment 164098
make nfsuserd use an AF_LOCAL socket

This problem appears to be caused by the jail translating
127.0.0.1 to the primary IP address for the machine.
Since nfsuserd will only accept RPC requests from 127.0.0.1,
it generates log messages when an RPC request comes from
any other IP address.
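The rejected requests in the log above can be decoded to confirm this. Assuming the logged "ip=" value is the IPv4 source address printed in host byte order, 0xac10017a is 172.16.1.122 rather than 127.0.0.1 (0x7f000001). The following minimal C sketch shows a localhost-only source check of this kind plus the decoding; the helper names are hypothetical and this is not nfsuserd's actual code.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical check: accept an upcall only if it came from 127.0.0.1. */
static bool
upcall_from_loopback(const struct sockaddr_in *from)
{
	return (from->sin_addr.s_addr == htonl(INADDR_LOOPBACK));
}

/*
 * Convert a host-order IPv4 address (the assumed format of the "ip=0x..."
 * log field) to dotted-quad text.  buf should hold INET_ADDRSTRLEN bytes.
 */
static const char *
log_addr_to_dotted(uint32_t hostorder, char *buf, socklen_t len)
{
	struct in_addr a;

	a.s_addr = htonl(hostorder);
	return (inet_ntop(AF_INET, &a, buf, len));
}
```

For example, log_addr_to_dotted(0xac10017a, buf, sizeof(buf)) yields "172.16.1.122", and a request from that address fails upcall_from_loopback().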

It appears that the best solution to this problem is to
change the nfsuserd daemon so that it uses an AF_LOCAL
socket, which guarantees that all requests come from the
local machine.
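A minimal sketch of why an AF_LOCAL socket sidesteps the address check: the daemon binds a file-system path instead of an IP port, so a request can only come from the same machine, and there is no IP source address for a jail to rewrite. The socket path and helper names are hypothetical, a datagram socket is used only to keep the example short, and nfsuserd's real upcalls are RPCs rather than raw datagrams.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in an AF_LOCAL address for `path`; returns -1 if it is too long. */
static int
local_addr(struct sockaddr_un *addr, const char *path)
{
	if (strlen(path) >= sizeof(addr->sun_path))
		return (-1);
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_LOCAL;
	strcpy(addr->sun_path, path);	/* length checked above */
	return (0);
}

/* Daemon side: bind a datagram socket to `path`; returns the fd or -1. */
static int
local_listen(const char *path)
{
	struct sockaddr_un addr;
	int s;

	if (local_addr(&addr, path) < 0)
		return (-1);
	if ((s = socket(AF_LOCAL, SOCK_DGRAM, 0)) < 0)
		return (-1);
	(void)unlink(path);	/* remove a stale socket left by an earlier run */
	if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
		close(s);
		return (-1);
	}
	return (s);
}

/* Requester side: send one request to the socket bound at `path`. */
static int
local_send(const char *path, const void *msg, size_t len)
{
	struct sockaddr_un addr;
	ssize_t n;
	int s;

	if (local_addr(&addr, path) < 0)
		return (-1);
	if ((s = socket(AF_LOCAL, SOCK_DGRAM, 0)) < 0)
		return (-1);
	n = sendto(s, msg, len, 0, (struct sockaddr *)&addr, sizeof(addr));
	close(s);
	return (n == (ssize_t)len ? 0 : -1);
}
```

The receiving side never has to inspect a source address: reaching the path at all implies the request is local.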

The two patches apply to nfsuserd.c and the kernel:
nfsuserd-aflocal.patch - nfsuserd.
nfsuserd-aflocal-kern.patch - kernel.

Since the kernel can still handle the old unpatched
nfsuserd, I think it can be MFC'd.
Comment 2 Rick Macklem freebsd_committer freebsd_triage 2015-12-11 02:36:25 UTC
Created attachment 164099
kernel changes to support nfsuserd using an AF_LOCAL socket

Kernel changes to add support for nfsuserd using an
AF_LOCAL socket.
Comment 3 Rick Macklem freebsd_committer freebsd_triage 2015-12-11 22:01:43 UTC
Created attachment 164134
kernel changes to support nfsuserd using an AF_LOCAL socket

This patch is a cleaned-up version of 164099, but doesn't
fix any problems with 164099 (i.e., if you are already using 164099,
there isn't much point in using this patch).
Both this patch and 164099 are relative to "sys", so you
need to cd to the root of your kernel source tree (usually
/usr/src/sys) before applying them.
Comment 4 Fabian Freyer 2017-06-04 18:45:42 UTC
I'm also experiencing this issue. What would need to be done to move this work along?
Comment 5 Rick Macklem freebsd_committer freebsd_triage 2017-06-25 23:39:03 UTC
If someone can test the patch(es) with nfsuserd run with the option "1"
(so that only a single daemon runs) and confirm that it doesn't hang,
then I could commit this.
(I could never make the hangs occur during previous testing, so I
 couldn't tell whether running only one daemon would fix them.)
Comment 6 Rick Macklem freebsd_committer freebsd_triage 2017-06-26 12:54:04 UTC
Note that, so long as your NFSv4 mounts are using AUTH_SYS (no Kerberos mounts),
you can work around this problem by not running the nfsuserd.

To do this you need to:
- Remove the line in /etc/rc.d/nfsd that forces nfsuserd to run
- Make sure nfsuserd_enable="YES" is not in your /etc/rc.conf
- On the NFS server, set sysctl vfs.nfsd.enable_stringtouid=1
  (Can be done via /etc/sysctl.conf.)

Then reboot all your machines (server and clients).

Note that killing off the nfsuserd isn't sufficient. The machine
must be rebooted and never run nfsuserd after the boot.
(If it has been running, the kernel code keeps trying to use it. This
 was done so that, if an nfsuserd died, the cached mappings would continue
 to work until the sysadmin restarted nfsuserd.)

Personally, I find doing this much easier than using the nfsuserd.
(As of RFC-7530, this is an acceptable way to use NFSv4.0 and "unofficially"
 NFSv4.1 as well.)
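With vfs.nfsd.enable_stringtouid=1 and AUTH_SYS, the server accepts owner strings that are plain decimal numbers and turns them into uids without an upcall, which is why nfsuserd is no longer needed. A minimal sketch of that conversion, assuming the numeric form is simply a string of decimal digits; the function name is hypothetical and this is not the kernel's implementation.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical sketch: convert an all-digit owner string such as "1001"
 * into a uid.  Returns 0 on success, or -1 when the string is not purely
 * numeric (e.g. "user@domain"), i.e. when a name-to-uid lookup, normally
 * answered by nfsuserd, would be needed instead.
 */
static int
owner_string_to_uid(const char *s, uid_t *uidp)
{
	unsigned long v;
	const char *p;

	if (*s == '\0')
		return (-1);
	for (p = s; *p != '\0'; p++)
		if (!isdigit((unsigned char)*p))
			return (-1);
	errno = 0;
	v = strtoul(s, NULL, 10);
	/* Reject values that overflow unsigned long or do not fit in uid_t. */
	if (errno == ERANGE || v != (unsigned long)(uid_t)v)
		return (-1);
	*uidp = (uid_t)v;
	return (0);
}
```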
Comment 7 Rick Macklem freebsd_committer freebsd_triage 2017-06-27 11:36:31 UTC
Created attachment 183835
make nfsuserd use an AF_LOCAL socket

Update the nfsuserd.c patch, so that it ignores the "num_servers" option
and only runs 1. I believe the hangs in previous testing were caused by
a race in the local socket code tickled by multiple server processes using
the socket concurrently.

The code still parses the "num_servers" option, so that the patch doesn't
cause a POLA violation when "num_servers" is specified.
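The "parse but ignore" handling of "num_servers" can be sketched as follows: the argument is still validated, so existing command lines keep working, but the AF_LOCAL case always runs one server. The function and parameter names are hypothetical.

```c
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: validate the num_servers argument as before, so an
 * existing command line is not rejected, but run a single server when
 * AF_LOCAL upcalls are in use.  Returns the number of servers to start,
 * or -1 if the argument is not a positive integer.
 */
static int
effective_num_servers(const char *arg, int use_aflocal)
{
	char *end;
	long n;

	n = strtol(arg, &end, 10);
	if (end == arg || *end != '\0' || n < 1 || n > INT_MAX)
		return (-1);
	/* The value is parsed but ignored for the AF_LOCAL case. */
	return (use_aflocal ? 1 : (int)n);
}
```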
If committed, a man page update (not in this patch) would indicate that
"num_servers" is deprecated and no longer used.
Comment 8 commit-hook freebsd_committer freebsd_triage 2017-07-04 22:21:04 UTC
A commit references this bug:

Author: rmacklem
Date: Tue Jul  4 22:20:31 UTC 2017
New revision: 320659
URL: https://svnweb.freebsd.org/changeset/base/320659

Log:
  Add a Bugs section that indicates that the nfsuserd doesn't work
  when jails are being used on the system.
  It is hoped that the patches in PR#205193 will someday get tested/debugged
  so that they can be committed to fix this.

  This is a content change.

  PR:		205193
  MFC after:	2 weeks

Changes:
  head/usr.sbin/nfsuserd/nfsuserd.8
Comment 9 commit-hook freebsd_committer freebsd_triage 2017-07-06 00:53:45 UTC
A commit references this bug:

Author: rmacklem
Date: Thu Jul  6 00:53:13 UTC 2017
New revision: 320698
URL: https://svnweb.freebsd.org/changeset/base/320698

Log:
  Add support for AF_LOCAL socket upcalls to the nfsuserd daemon.

  This patch adds support for AF_LOCAL socket upcalls to an nfsuserd daemon
  that supports them. A future patch to the nfsuserd daemon will use AF_LOCAL
  sockets to avoid a problem when using upcalls to 127.0.0.1 if jails are
  in use.

  Suggested by:	dfr
  PR:		205193

Changes:
  head/sys/fs/nfs/nfs_commonkrpc.c
  head/sys/fs/nfs/nfs_commonport.c
  head/sys/fs/nfs/nfs_commonsubs.c
  head/sys/fs/nfs/nfs_var.h
Comment 10 commit-hook freebsd_committer freebsd_triage 2017-07-06 22:05:01 UTC
A commit references this bug:

Author: rmacklem
Date: Thu Jul  6 22:04:37 UTC 2017
New revision: 320757
URL: https://svnweb.freebsd.org/changeset/base/320757

Log:
  Modify the nfsuserd daemon so that it uses an AF_LOCAL socket for upcalls.

  This patch modifies the nfsuserd daemon so that it uses an AF_LOCAL socket
  for upcalls by default. This should fix the problem with using a UDP
  socket upcall to 127.0.0.1 when jails are used.
  The AF_LOCAL socket case only supports a single server daemon, since hangs
  were observed by the original problem reporter when multiple daemons
  were used.
  The patch adds a command line option called "-use-udpsock" which makes
  the daemon revert to its prepatched behaviour.

  Suggested by:	dfr
  PR:		205193
  Relnotes:	yes

Changes:
  head/usr.sbin/nfsuserd/nfsuserd.c
Comment 11 Rick Macklem freebsd_committer freebsd_triage 2017-07-07 12:24:24 UTC
A variation of these patches has been committed to head/current.
The change was to add a "-use-udpsock" option for nfsuserd that
makes it revert to the old/prepatched behaviour.
nfsuserd also only runs a single daemon for the AF_LOCAL case,
since I believe that is what caused the hangs when tested by
the original reporter.

I listed dfr@ as "Suggested by:", since I couldn't remember if
he had reviewed the patch two years ago or not. I do remember that
he suggested switching to an AF_LOCAL socket to fix the problem.

If it goes well in head/current with no reports of hangs for a
couple of months, I will then consider MFC'ing the patches.

The change to the nfsuserd.8 man page that notes in the BUGS section
that it doesn't work when jails are enabled will be MFC'd soon.
Comment 12 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:42:51 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 13 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-21 18:01:35 UTC
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed but feel free to re-open it if the issue hasn't been completely resolved.

Thanks
Comment 14 Rick Macklem freebsd_committer freebsd_triage 2019-01-21 23:01:49 UTC
Reopening the PR, since this has never been resolved.
The AF_LOCAL patch works for some, but not others.
(I've forgotten exactly how it breaks things, but it does, so it
 can't be committed.)

I do have a patch that adds a command line option to replace
127.0.0.1 with another IP address, which works around the problem,
but it is kind of an ugly fix.
I may commit this, since it is at least a safe fix.

rick
Comment 15 Rick Macklem freebsd_committer freebsd_triage 2019-02-18 02:12:30 UTC
Created attachment 202117
modify nfsuserd so that it checks what 127.0.0.1 maps to

This patch modifies nfsuserd so that it does a bind()/getsockname() for
127.0.0.1 to see what it maps to when jails are enabled and a match
with 127.0.0.1 fails.

This should fix the problem and does not require kernel changes.
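The bind()/getsockname() check can be sketched as follows. It binds a UDP socket to 127.0.0.1 with port 0 and asks the kernel which local address it actually assigned; outside a jail the answer is 127.0.0.1, while the comment above expects a different address when the jail translation applies. The function names are hypothetical and this illustrates the idea rather than reproducing the attached patch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a UDP socket to 127.0.0.1 (port 0, so the kernel picks one) and use
 * getsockname() to learn which local address was actually assigned.
 * Returns 0 and stores the address in *out, or -1 on failure.
 */
static int
mapped_localhost(struct in_addr *out)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int s, error;

	if ((s = socket(AF_INET, SOCK_DGRAM, 0)) < 0)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(0);
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	error = bind(s, (struct sockaddr *)&sin, sizeof(sin));
	if (error == 0)
		error = getsockname(s, (struct sockaddr *)&sin, &len);
	close(s);
	if (error != 0)
		return (-1);
	*out = sin.sin_addr;
	return (0);
}

/*
 * Hypothetical source check: accept 127.0.0.1 directly and, only if that
 * match fails, fall back to whatever 127.0.0.1 was mapped to.
 */
static bool
from_localhost(const struct sockaddr_in *from)
{
	struct in_addr mapped;

	if (from->sin_addr.s_addr == htonl(INADDR_LOOPBACK))
		return (true);
	return (mapped_localhost(&mapped) == 0 &&
	    from->sin_addr.s_addr == mapped.s_addr);
}
```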
Comment 16 commit-hook freebsd_committer freebsd_triage 2019-04-06 21:54:04 UTC
A commit references this bug:

Author: rmacklem
Date: Sat Apr  6 21:53:47 UTC 2019
New revision: 345992
URL: https://svnweb.freebsd.org/changeset/base/345992

Log:
  Add INET6 support for the upcalls to the nfsuserd daemon.

  The kernel code uses UDP to do upcalls to the nfsuserd(8) daemon to get
  updates to the username<->uid and groupname<->gid mappings.
  A change to AF_LOCAL last year had to be reverted, since it could result
  in vnode locking issues on the AF_LOCAL socket.
  This patch adds INET6 support and the required #ifdef INET and INET6
  to the code.

  Requested by:	bz
  PR:		205193
  Reviewed by:	bz, rgrimes
  MFC after:	2 weeks
  Differential Revision:	http://reviews.freebsd.org/D19218

Changes:
  head/sys/fs/nfs/nfs.h
  head/sys/fs/nfs/nfs_commonport.c
  head/sys/fs/nfs/nfs_commonsubs.c
  head/sys/fs/nfs/nfs_var.h
Comment 17 commit-hook freebsd_committer freebsd_triage 2019-04-06 22:06:18 UTC
A commit references this bug:

Author: rmacklem
Date: Sat Apr  6 22:05:52 UTC 2019
New revision: 345994
URL: https://svnweb.freebsd.org/changeset/base/345994

Log:
  Fix nfsuserd so that it handles the mapped localhost address when jails
  are enabled.

  The nfsuserd(8) daemon does not function correctly when jails are enabled,
  since localhost gets mapped to another IP address and, as such, the upcall
  RPC fails.
  This patch fixes the problem by doing a getsockname(2) of a socket mapped
  to localhost to find out what the correct address is for the comparison
  test with the upcall's from IP address.
  This patch also adds INET6 support and the required #ifdef's for INET and
  INET6. It now uses INET6 by default for the upcalls, if the kernel has
  INET6 support and the daemon is also built with INET6 support.

  Tested by:	freebsd@danielengel.com (earlier version)
  PR:		205193
  Reviewed by:	bz, rgrimes
  MFC after:	2 weeks
  Differential Revision:	https://reviews.freebsd.org/D19218

Changes:
  head/usr.sbin/nfsuserd/Makefile
  head/usr.sbin/nfsuserd/nfsuserd.c
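r345994 makes INET6 the default for upcalls only when both the kernel and the daemon support it. A minimal sketch of that selection, plus a family-aware localhost test, might look like this. It assumes the build defines the INET6 macro when IPv6 support is enabled and treats "kernel support" as being able to create an AF_INET6 socket; both are assumptions, and the helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Choose the upcall address family: AF_INET6 only if this program was
 * compiled with INET6 (assumed to come from the build flags) and the
 * running kernel can create an AF_INET6 socket; otherwise AF_INET.
 */
static int
upcall_family(void)
{
#ifdef INET6
	int s;

	if ((s = socket(AF_INET6, SOCK_DGRAM, 0)) >= 0) {
		close(s);
		return (AF_INET6);
	}
#endif
	return (AF_INET);
}

/* True if `sa` holds the loopback address of its family (127.0.0.1 or ::1). */
static bool
is_loopback(const struct sockaddr *sa)
{
	switch (sa->sa_family) {
	case AF_INET:
		return (((const struct sockaddr_in *)sa)->sin_addr.s_addr ==
		    htonl(INADDR_LOOPBACK));
	case AF_INET6:
		return (IN6_IS_ADDR_LOOPBACK(
		    &((const struct sockaddr_in6 *)sa)->sin6_addr));
	default:
		return (false);
	}
}
```

Probing with socket() at run time lets a daemon built with INET6 still fall back to AF_INET on a kernel without IPv6.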
Comment 18 commit-hook freebsd_committer freebsd_triage 2019-04-06 22:14:27 UTC
A commit references this bug:

Author: rmacklem
Date: Sat Apr  6 22:14:03 UTC 2019
New revision: 345995
URL: https://svnweb.freebsd.org/changeset/base/345995

Log:
  Delete the BUGS entry related to failing when jails are enabled.

  r345994 has finally fixed the bug that caused the nfsuserd(8) daemon to
  fail when jails were enabled, so delete the BUGS entry from the man page.

  PR:		205193
  MFC after:	2 weeks

Changes:
  head/usr.sbin/nfsuserd/nfsuserd.8
Comment 19 Rick Macklem freebsd_committer freebsd_triage 2019-04-06 22:21:18 UTC
The patches to fix this (r345992, r345994) have finally been committed
to head. This PR can be closed once these patches are MFC'd.
(They are variants of the patches already attached to this PR.)
Comment 20 Rick Macklem freebsd_committer freebsd_triage 2019-04-21 02:14:22 UTC
Patches that fix this have been MFC'd.