Bug 200585

Summary: [nlm] Fatal trap 9 when printing out KASSERT trying to run umount -f on an NFS share while it's trying to print out "lockd not responding" in nlm(4)
Product: Base System Reporter: Enji Cooper <ngie>
Component: kernAssignee: freebsd-fs (Nobody) <fs>
Status: Closed FIXED    
Severity: Affects Some People CC: benno, delphij, gonzo, kib, pho, rmacklem
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
This LOR might be interesting...
none
Hold the mnt credentials before vnode unlock. none

Description Enji Cooper freebsd_committer freebsd_triage 2015-06-02 04:32:09 UTC
Repro:

- Check out head@r283754
- Build/install a kernel with the following options:

include GENERIC
options DEBUG_MEMGUARD

- Add the following config items to /etc/rc.conf:

rpcbind_enable="YES"
nfs_client_enable="YES"

- Mount an NFS share to /mnt.
- Do `svn co http://svn.freebsd.org/base/head /mnt/src`. If lockd's broken then it
  should start printing out "lockd not responding".
- Let it go its course for a while.
- Run `umount -f /mnt`

Expected results:

The umount -f should succeed.

Actual results:

The kernel panics when running strlen(9) with a Fatal Trap 9 (it looks like some of the
kernel structures were being torn down while it was trying to print out the message):

# kgdb /boot/kernel/kernel vmcore.7
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 5; apic id = 05
instruction pointer	= 0x20:0xffffffff80a2e51f
stack pointer	        = 0x28:0xfffffe2f8fa0dc30
frame pointer	        = 0x28:0xfffffe2f8fa0dc40
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 1730 (svn)

#0  doadump (textdump=0) at pcpu.h:221
221	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) bt
#0  doadump (textdump=0) at pcpu.h:221
#1  0xffffffff8036503e in db_dump (dummy=<value optimized out>, dummy2=false, dummy3=0, dummy4=0x0) at /usr/src/sys/ddb/db_command.c:533
#2  0xffffffff80364bb1 in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
#3  0xffffffff80364844 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
#4  0xffffffff803673db in db_trap (type=<value optimized out>, code=0) at /usr/src/sys/ddb/db_main.c:251
#5  0xffffffff8099ef54 in kdb_trap (type=9, code=0, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#6  0xffffffff80e3b079 in trap_fatal (frame=0xfffffe2f8fa0db80, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:854
#7  0xffffffff80e3ad20 in trap (frame=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:201
#8  0xffffffff80e1bd72 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:235
#9  0xffffffff80a2e51f in strlen (str=0xdeadc0dedeadc0de <Address 0xdeadc0dedeadc0de out of bounds>) at /usr/src/sys/libkern/strlen.c:98
#10 0xffffffff809a8dea in kvprintf (fmt=0xffffffff810969ba " @ %s:%d", func=0xffffffff809a9e00 <snprintf_func>, arg=0xfffffe2f8fa0dd70, radix=Cannot access memory at address 0xa
)
    at /usr/src/sys/kern/subr_prf.c:796
#11 0xffffffff809a9de1 in vsnprintf (str=<value optimized out>, size=<value optimized out>, 
    format=0xdeadc0dedeadc0de <Address 0xdeadc0dedeadc0de out of bounds>, ap=0xfefefefefefefeff) at /usr/src/sys/kern/subr_prf.c:529
#12 0xffffffff80961efa in kassert_panic (fmt=0xffffffff8109699f "mtx_lock() of spin mutex %s @ %s:%d") at /usr/src/sys/kern/kern_shutdown.c:623
#13 0xffffffff80945ec6 in __mtx_lock_flags (c=0xfffff800220fb818, opts=0, file=0xffffffff810cd0cc "/usr/src/sys/nlm/nlm_advlock.c", line=136)
    at /usr/src/sys/kern/kern_mutex.c:216
#14 0xffffffff80b6b640 in nlm_feedback (type=<value optimized out>, proc=<value optimized out>, arg=0xfffffe2f8fa0e3c8)
    at /usr/src/sys/nlm/nlm_advlock.c:136
#15 0xffffffff80c36491 in clnt_dg_call (cl=<value optimized out>, ext=0xfffffe2f8fa0e3d8, proc=2, args=0xfffff8012cf4ab00, 
    resultsp=0xfffffe2f8fa0e038, utimeout={tv_sec = 60, tv_usec = 0}) at /usr/src/sys/rpc/clnt_dg.c:666
#16 0xffffffff80c3b6bd in clnt_call_private (cl=0xfffff801e13cb140, ext=0xfffffe2f8fa0e3d8, proc=2, xargs=<value optimized out>, 
    argsp=<value optimized out>, xresults=0xffffffff80b73a10 <xdr_nlm4_res>, resultsp=0xfffffe2f8fa0e0b8, utimeout={tv_sec = 60, tv_usec = 0})
    at /usr/src/sys/rpc/rpc_generic.c:761
#17 0xffffffff80b6ce80 in nlm4_lock_4 (argp=0xfefefefefefefeff, clnt_res=<value optimized out>, clnt=0xdeadc0dedeadc0de, ext=0x0)
    at /usr/src/sys/nlm/nlm_prot_clnt.c:212
#18 0xffffffff80b6b979 in nlm_setlock (host=0xfffff80157ab2800, ext=0xfffffe2f8fa0e3d8, vers=4, timo=0xfffffe2f8fa0e408, 
    retries=<value optimized out>, vp=0xfffff80157a2d588, fl=<value optimized out>, flags=64, svid=1730, fhlen=32, fh=0xfffffe2f8fa0e630, 
    size=<value optimized out>, reclaim=0) at /usr/src/sys/nlm/nlm_advlock.c:597
#19 0xffffffff80b6b05e in nlm_advlock_internal (vp=0xfffff80157a2d588, id=<value optimized out>, op=<value optimized out>, fl=0xfffffe2f8fa0e958, 
    flags=<value optimized out>, reclaim=0, unlock_vp=<value optimized out>) at /usr/src/sys/nlm/nlm_advlock.c:328
#20 0xffffffff80b6a58d in nlm_advlock (ap=<value optimized out>) at /usr/src/sys/nlm/nlm_advlock.c:388
#21 0xffffffff8087b4ed in nfs_advlock (ap=<value optimized out>) at /usr/src/sys/fs/nfsclient/nfs_clvnops.c:3095
#22 0xffffffff80f35e37 in VOP_ADVLOCK_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:2529
#23 0xffffffff80913ab3 in kern_fcntl (td=0xfffff80027b53960, fd=<value optimized out>, cmd=<value optimized out>, arg=<value optimized out>)
    at vnode_if.h:1041
#24 0xffffffff80912f57 in kern_fcntl_freebsd (td=0xfffff80027b53960, fd=3, cmd=<value optimized out>, arg=140737488344840)
    at /usr/src/sys/kern/kern_descrip.c:451
#25 0xffffffff80e3b8df in amd64_syscall (td=0xfffff80027b53960, traced=0) at subr_syscall.c:133
#26 0xffffffff80e1c05b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:395
#27 0x00000008037b529a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb) p version
$1 = 0xffffffff8149b370 "FreeBSD 11.0-CURRENT #0 9a1d54a(master): Sat May 30 04:02:59 UTC 2015\n    root@bakeoff-dut-101:/usr/obj/usr/src/sys/BAKEOFF\n"
(kgdb) frame 14
#14 0xffffffff80b6b640 in nlm_feedback (type=<value optimized out>, proc=<value optimized out>, arg=0xfffffe2f8fa0e3c8)
    at /usr/src/sys/nlm/nlm_advlock.c:136
136		mtx_lock(&nmp->nm_mtx);
(kgdb) p *((struct nlm_feedback_arg *)arg)->nf_nmp
$11 = {nm_com = {nmcom_mtx = {lock_object = {lo_name = 0xdeadc0dedeadc0de <Address 0xdeadc0dedeadc0de out of bounds>, lo_flags = 3735929054, 
        lo_data = 3735929054, lo_witness = 0xdeadc0dedeadc0de}, mtx_lock = 16045693110842147038}, nmcom_flag = -559038242, 
    nmcom_state = -559038242, nmcom_mountp = 0xdeadc0dedeadc0de, nmcom_timeo = -559038242, nmcom_retry = -559038242, 
    nmcom_hostname = 0xfffff800220fb838 "��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������"..., nmcom_getinfo = 0xdeadc0dedeadc0de, 
    nmcom_vinvalbuf = 0xdeadc0dedeadc0de}, nm_numgrps = -559038242, 
  nm_fh = 0xfffff800220fb8a4 "��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������"..., nm_fhsize = -559038242, nm_sotype = -559038242, 
  nm_soproto = -559038242, nm_soflags = -559038242, nm_nam = 0xdeadc0dedeadc0de, nm_deadthresh = -559038242, nm_rsize = -559038242, 
  nm_wsize = -559038242, nm_readdirsize = -559038242, nm_readahead = -559038242, nm_wcommitsize = -559038242, nm_acdirmin = -559038242, 
  nm_acdirmax = -559038242, nm_acregmin = -559038242, nm_acregmax = -559038242, 
  nm_verf = 0xfffff800220fb968 "��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������"..., nm_bufq = {tqh_first = 0xdeadc0dedeadc0de, 
    tqh_last = 0xdeadc0dedeadc0de}, nm_bufqlen = -16162, nm_bufqwant = -8531, nm_bufqiods = -559038242, nm_maxfilesize = 16045693110842147038, 
  nm_rpcops = 0xdeadc0dedeadc0de, nm_tprintf_initial_delay = -559038242, nm_tprintf_delay = -559038242, nm_secflavor = -559038242, 
  nm_client = 0xdeadc0dedeadc0de, nm_timers = 0xfffff800220fb9b0, 
  nm_principal = 0xfffff800220fb9f0 "��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������"..., nm_mech_oid = 0xdeadc0dedeadc0de, 
  nm_nametimeo = -559038242, nm_negnametimeo = -559038242, nm_clientid = 16045693110842147038, nm_fsid = {val = 0xfffff800220fba60}, 
  nm_lease_time = 3735929054, nm_last_renewal = -2401050962867404578}
(kgdb) p ((struct nlm_feedback_arg *)arg)->nf_nmp->nm_com
$12 = {nmcom_mtx = {lock_object = {lo_name = 0xdeadc0dedeadc0de <Address 0xdeadc0dedeadc0de out of bounds>, lo_flags = 3735929054, 
      lo_data = 3735929054, lo_witness = 0xdeadc0dedeadc0de}, mtx_lock = 16045693110842147038}, nmcom_flag = -559038242, nmcom_state = -559038242, 
  nmcom_mountp = 0xdeadc0dedeadc0de, nmcom_timeo = -559038242, nmcom_retry = -559038242, 
  nmcom_hostname = 0xfffff800220fb838 "��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������"..., nmcom_getinfo = 0xdeadc0dedeadc0de, 
  nmcom_vinvalbuf = 0xdeadc0dedeadc0de}
(kgdb) p ((struct nlm_feedback_arg *)arg)->nf_nmp->nm_com.nmcom_mtx
$13 = {lock_object = {lo_name = 0xdeadc0dedeadc0de <Address 0xdeadc0dedeadc0de out of bounds>, lo_flags = 3735929054, lo_data = 3735929054, 
    lo_witness = 0xdeadc0dedeadc0de}, mtx_lock = 16045693110842147038}
(kgdb) p ((struct nlm_feedback_arg *)arg)->nf_nmp->nm_com.nmcom_mtx.mtx_lock
$14 = 16045693110842147038
Comment 1 Enji Cooper freebsd_committer freebsd_triage 2015-06-02 04:56:23 UTC
Created attachment 157364 [details]
This LOR might be interesting...
Comment 2 Xin LI freebsd_committer freebsd_triage 2015-06-02 05:56:57 UTC
I think the LOR may be unrelated.

This happens because umount -f (nfs_unmount in sys/fs/nfsclient/nfs_clvfsops.c:1522) free's the mountpoint structure (1579) but nlm have kept a reference to it.

In order to fix this I think we would need a way similar to newnfs_nmcancelreqs() that cancells all lockd RPCs that are associated with the mountpoint (i.e. abort all lock requests that are related to the unmounting mountpoint) which is probably non-trivial with the current structure.
Comment 3 Peter Holm freebsd_committer freebsd_triage 2015-06-03 05:27:10 UTC
Looks like td_ucred problem:

https://people.freebsd.org/~pho/stress/log/svn.txt
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2015-06-03 07:08:12 UTC
(In reply to Peter Holm from comment #3)
Indeed, the unmount which drops the last reference on the nfs mount credentials is the culprit.  This sole issue can be easily handled by taking additional reference on the mnt_cred, before the vnode is unlocked.

If there are further problems from the mnt disappearing, we could try to busy the filesystem after the vnode is unlocked.  But this would have bad consequences for the unmount -f.  Lets try the minimal patch first.
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2015-06-03 07:10:18 UTC
Created attachment 157390 [details]
Hold the mnt credentials before vnode unlock.
Comment 6 Peter Holm freebsd_committer freebsd_triage 2015-06-03 08:33:13 UTC
I see spinning threads with this patch:

https://people.freebsd.org/~pho/stress/log/kostik813.txt
Comment 7 Rick Macklem freebsd_committer freebsd_triage 2015-06-03 12:20:41 UTC
It's been a while since I looked at it, but I
remember that there is a known issue with the
client side NLM for "umount -f". Basically it
unlocks and later relocks the vnode without
holding a refcnt on the mount point. At the time,
I didn't know enough about the mount point refcnt
stuff to fix it.

I'll try and take a look later to-day.

I don't know if this is the cause of the crash
(I can't remember what credentials are used), but
it does need to be fixed for "umount -f" to be safe
when running rpc.lockd. (I keep hoping people will
stop using rpc.lockd and rpc.statd, but it's just
a dream;-)
Comment 8 Rick Macklem freebsd_committer freebsd_triage 2015-06-05 12:17:37 UTC
Although adding a ref count to the cred struct may
help avoid some crashes, the code that prints out
the message uses both the mount structure and the
NFS specific one hanging off of mnt_data.

Unfortunately adding a ref count to the mount structure
via vfs_ref(mp) only delays freeing of *mp and not *mnt_data.
Also, vfs_destroy_mount() waits for the ref count on the mount
structure to go to 0 and, as such, "umount -f" won't complete
until the lock RPC completes.

If the ref count on the mount structure only free'd the structure
and substructure hanging off of mnt_data, then adding a ref count
to the mount structure before the VOP_UNLOCK() could avoid the
crashes.
Maybe this VFS change should be considered?

Also, doing "umount -f" on a file system using locking could
leave "dangling locks" on files. In theory, rpc.statd should
eventually notice that the host isn't responding and get rid
of these file locks, but I wouldn't bet on it.

I suppose the dangers of using "umount -f" while running
rpc.lockd should be documented. I'll come up with a man page
patch for that.
Comment 9 commit-hook freebsd_committer freebsd_triage 2015-06-17 23:25:08 UTC
A commit references this bug:

Author: rmacklem
Date: Wed Jun 17 23:24:47 UTC 2015
New revision: 284531
URL: https://svnweb.freebsd.org/changeset/base/284531

Log:
  Document that a forced dismount of an NFSv3 mount when the
  NLM (rpc.lockd) is running can crash the system.
  Unfortunately this is not easy to fix, but I have left
  PR#200585 open.

  PR:		200585
  MFC after:	3 days

Changes:
  head/sbin/umount/umount.8
Comment 10 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2019-01-21 17:54:41 UTC
There is a commit referencing this PR, but it's still not closed and has been inactive for some time. Closing the PR as fixed but feel free to re-open it if the issue hasn't been completely resolved.

Thanks