Bug 204340

Summary: [panic] nfsd, em, msix, fatal trap 9
Product: Base System Reporter: g_amanakis
Component: kernAssignee: Rick Macklem <rmacklem>
Status: Closed FIXED    
Severity: Affects Only Me CC: avg, bugs, g_amanakis, mav, net, rmacklem, sbruno
Priority: --- Keywords: IntelNetworking, crash, patch
Version: 10.2-RELEASEFlags: rmacklem: mfc-stable10+
rmacklem: mfc-stable9-
Hardware: amd64   
OS: Any   
Attachments:
Description Flags
patch that might fix this problem
none
Patch to fix sg_threadcount++ so the mutex is held when done
none
patch that locks mutex when request queue is updated
none
patch that makes NFSv4.1 server release the backchannel xprt
none
add svcpool_close so that svcpool_destroy doesn't get called when nfsd threads are killed
none
add svcpool_close (cleaned up version) none

Description g_amanakis 2015-11-06 18:11:59 UTC
On a X9SCM with em NICs restarting nfsd with mounted shares remotely results in kernel trap 9 with panic and reboot. The system is running a GENERIC kernel on an up-to-date FreeBSD 10.2-RELEASE. 

On /var/log/messages:

1406 Nov  5 16:43:00 x3200 kernel: 
1407 Nov  5 16:43:00 x3200 kernel: 
1408 Nov  5 16:43:00 x3200 kernel: Fatal trap 9: general protection fault while in kernel mode
1409 Nov  5 16:43:00 x3200 kernel: cpuid = 0; apic id = 00
1410 Nov  5 16:43:00 x3200 kernel: instruction pointer       = 0x20:0xffffffff8092e980
1411 Nov  5 16:43:00 x3200 kernel: stack pointer             = 0x28:0xfffffe01ee7ee430
1412 Nov  5 16:43:00 x3200 kernel: frame pointer             = 0x28:0xfffffe01ee7ee4b0
1413 Nov  5 16:43:00 x3200 kernel: code segment              = base rx0, limit 0xfffff, type 0x1b
1414 Nov  5 16:43:00 x3200 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
1415 Nov  5 16:43:00 x3200 kernel: processor eflags  =
1416 Nov  5 16:43:00 x3200 kernel: interrupt enabled, resume, IOPL = 0
1417 Nov  5 16:43:00 x3200 kernel: current process           = 12 (irq266: em1:rx0)
1418 Nov  5 16:45:06 x3200 syslogd: kernel boot file is /boot/kernel/kernel
1419 Nov  5 16:45:06 x3200 kernel: trap number               = 9
1420 Nov  5 16:45:06 x3200 kernel: panic: general protection fault
1421 Nov  5 16:45:06 x3200 kernel: cpuid = 0
1422 Nov  5 16:45:06 x3200 kernel: KDB: stack backtrace:
1423 Nov  5 16:45:06 x3200 kernel: #0 0xffffffff80984e30 at kdb_backtrace+0x60
1424 Nov  5 16:45:06 x3200 kernel: #1 0xffffffff809489e6 at vpanic+0x126
1425 Nov  5 16:45:06 x3200 kernel: #2 0xffffffff809488b3 at panic+0x43
1426 Nov  5 16:45:06 x3200 kernel: #3 0xffffffff80d4aadb at trap_fatal+0x36b
1427 Nov  5 16:45:06 x3200 kernel: #4 0xffffffff80d4a75c at trap+0x75c
1428 Nov  5 16:45:06 x3200 kernel: #5 0xffffffff80d307f2 at calltrap+0x8
1429 Nov  5 16:45:06 x3200 kernel: #6 0xffffffff80b4a725 at xprt_active+0x45
1430 Nov  5 16:45:06 x3200 kernel: #7 0xffffffff80b4e0a5 at svc_vc_soupcall+0x35
1431 Nov  5 16:45:06 x3200 kernel: #8 0xffffffff809bcc52 at sowakeup+0x82
1432 Nov  5 16:45:06 x3200 kernel: #9 0xffffffff80aea942 at tcp_do_segment+0x2b22
1433 Nov  5 16:45:06 x3200 kernel: #10 0xffffffff80ae7720 at tcp_input+0x12b0
1434 Nov  5 16:45:06 x3200 kernel: #11 0xffffffff80a77f57 at ip_input+0x97
1435 Nov  5 16:45:06 x3200 kernel: #12 0xffffffff80a177d2 at netisr_dispatch_src+0x62
1436 Nov  5 16:45:06 x3200 kernel: #13 0xffffffff80a0eb76 at ether_demux+0x126
1437 Nov  5 16:45:06 x3200 kernel: #14 0xffffffff80a0f81e at ether_nh_input+0x35e
1438 Nov  5 16:45:06 x3200 kernel: #15 0xffffffff80a177d2 at netisr_dispatch_src+0x62
1439 Nov  5 16:45:06 x3200 kernel: #16 0xffffffff804e121b at em_rxeof+0x2eb
1440 Nov  5 16:45:06 x3200 kernel: #17 0xffffffff804e1663 at em_msix_rx+0x33
1441 Nov  5 16:45:06 x3200 kernel: Uptime: 2m10s
1442 Nov  5 16:45:06 x3200 kernel: Dumping 460 out of 8134 MB:..4%..11%..21%..32%..42%..53%..63%..73%..84%..91%Copyright (c) 1992-201     5 The FreeBSD Project.


Running kgdb:
(kgdb) list *0xffffffff8092e980
0xffffffff8092e980 is in __mtx_lock_sleep (/usr/src/sys/kern/kern_mutex.c:438).
433                      * owner stops running or the state of the lock changes.
434                      */
435                     v = m->mtx_lock;
436                     if (v != MTX_UNOWNED) {
437                             owner = (struct thread *)(v & ~MTX_FLAGMASK);
438                             if (TD_IS_RUNNING(owner)) {
439                                     if (LOCK_LOG_TEST(&m->lock_object, 0))
440                                             CTR3(KTR_LOCK,
441                                                 "%s: spinning on %p held by %p",
442                                                 __func__, m, owner);

(kgdb) backtrace
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80948642 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff80948a25 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758
#3  0xffffffff809488b3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:687
#4  0xffffffff80d4aadb in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
#5  0xffffffff80d4a75c in trap (frame=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:203
#6  0xffffffff80d307f2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#7  0xffffffff8092e980 in __mtx_lock_sleep (c=0xfffffe0000e09198, tid=18446735277688524800, opts=0, file=0x0, line=0)
    at /usr/src/sys/kern/kern_mutex.c:437
#8  0xffffffff80b4a725 in xprt_active (xprt=0xfffff801247cae00) at /usr/src/sys/rpc/svc.c:394
#9  0xffffffff80b4e0a5 in svc_vc_soupcall (so=0xfffffe0000e09180, arg=<value optimized out>, waitflag=0)
    at /usr/src/sys/rpc/svc_vc.c:953
#10 0xffffffff809bcc52 in sowakeup (so=0xfffff8012473a570, sb=0xfffff8012473a600) at /usr/src/sys/kern/uipc_sockbuf.c:191
#11 0xffffffff80aea942 in tcp_do_segment (m=<value optimized out>, th=<value optimized out>, so=0xfffff8012473a570, 
    tp=0xfffff801b93ed810, drop_hdrlen=<value optimized out>, tlen=324, iptos=<value optimized out>, ti_locked=Cannot access memory at address 0x1
)
    at /usr/src/sys/netinet/tcp_input.c:1842
#12 0xffffffff80ae7720 in tcp_input (m=<value optimized out>, off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1376
#13 0xffffffff80a77f57 in ip_input (m=0xfffff8003e5b6500) at /usr/src/sys/netinet/ip_input.c:734
#14 0xffffffff80a177d2 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0xfac0622db6ac96f8)
    at /usr/src/sys/net/netisr.c:976
#15 0xffffffff80a0eb76 in ether_demux (ifp=<value optimized out>, m=0xfffff8003e5b6500) at /usr/src/sys/net/if_ethersubr.c:851
#16 0xffffffff80a0f81e in ether_nh_input (m=<value optimized out>) at /usr/src/sys/net/if_ethersubr.c:646
#17 0xffffffff80a177d2 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0xfac0622db6ac96f8)
    at /usr/src/sys/net/netisr.c:976
#18 0xffffffff804e121b in em_rxeof (count=99) at /usr/src/sys/dev/e1000/if_em.c:4711
#19 0xffffffff804e1663 in em_msix_rx (arg=0xfffff8000449ca00) at /usr/src/sys/dev/e1000/if_em.c:1639
#20 0xffffffff8091482b in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffff8000446fd00)
    at /usr/src/sys/kern/kern_intr.c:1264
#21 0xffffffff80914c76 in ithread_loop (arg=0xfffff800044d40a0) at /usr/src/sys/kern/kern_intr.c:1277
#22 0xffffffff8091244a in fork_exit (callout=0xffffffff80914be0 <ithread_loop>, arg=0xfffff800044d40a0, frame=0xfffffe01ee7eeac0)
    at /usr/src/sys/kern/kern_fork.c:1018
#23 0xffffffff80d30d2e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611
#24 0x0000000000000000 in ?? ()
Comment 1 g_amanakis 2015-11-06 18:13:52 UTC
Setting hw.em.enable_msix=0 in bootloader.conf resolves the symptoms.
Comment 2 Rick Macklem freebsd_committer freebsd_triage 2015-11-16 00:42:04 UTC
Created attachment 163160 [details]
patch that might fix this problem

I think this crash might have been caused by a race
between svcpool_destroy() and the socket upcall.
The code in svcpool_destroy() assumes that SVC_RELEASE(xprt)
drops the ref cnt to 0, so that SVC_DESTROY() is called.
-->SVC_DESTROY() shuts down the socket upcall.
--> If the ref cnt doesn't go to 0, svcpool_destroy() will
    mtx_destroy() the mutexes prematurely.

I am not sure, but the race might have been introduced by
r267228 since, prior to this there was a single mutex for
the pool, held while all xprt's are unregistered.
After r267228, there is a group of mutexes, where the code
only held one at a time, so I think an xprt might get re-registered
on another group after that group has had all de-registered.

The attached little patch moves the mtx_lock() calls to a
separate loop before the xprt_unregister loops, so that all
locks are held while all are de-registered.

I've added mav@ to the cc list, since he might be the guy
that actually understands this.

Anyhow, if you could test the attached patch with msi interrupts
re-enabled and see if the crashes go away, that would be great.
(I don't think that this indicates that the em(4) driver is broken.
 I suspect that it just affects timing of the interrupts that tripped
 over this race.)
Comment 3 g_amanakis 2015-11-16 18:53:01 UTC
(In reply to Rick Macklem from comment #2)
Thanks for the insight. However, I just tried the patch proposed and it still crashes.
Comment 4 Rick Macklem freebsd_committer freebsd_triage 2015-11-17 01:29:31 UTC
Created attachment 163217 [details]
Patch to fix sg_threadcount++ so the mutex is held when done

I took a closer look at svc.c and the only way that the race
I think might have happened can occur is if svc_run() doesn't
wait for all threads to terminate. I also can now see that the
first patch wouldn't have fixed anything, so I'm not surprised
it didn't work.

The only thing I can see that is broken in the code and might
allow svc_run() to return before all threads have terminated is:
- The thread count sg_threadcount is incremented when the sg_lock
  mutex isn't held.
  --> This could conceivably result in a corrupted sg_threadcount,
      which would allow svc_run() to return before all threads have
      terminated.

This second patch fixes the code so that sg_threadcount++ is always
done when the sg_lock mutex is held.

If you can test this one instead of the last one, that would be
appreciated. I do know this patch fixes the above problem, but I
don't know if this is the cause of your crashes.

Also, this bug seems to have existed in the code forever and all
that r267228 did was switch from not holding the pool mutex to
not holding the group mutex. So Alexander, you are off the hook,
I think.;-) I've left you on the cc, since you probably know this
code better than anyone else and might have some insight w.r.t. this crash.
Comment 5 g_amanakis 2015-11-17 20:40:13 UTC
(In reply to Rick Macklem from comment #4)
I am testing it, so far it seems ok. Give me a couple of days to test it thoroughly.
Comment 6 g_amanakis 2015-11-18 03:00:42 UTC
(In reply to Rick Macklem from comment #4)
It still crashes. I am going to get a dump tomorrow.
Comment 7 g_amanakis 2015-11-18 07:18:51 UTC
This a fresh dump with the second patch applied, it seems the same. The problem seems to occur in xprt_active().

Unread portion of the kernel message buffer:
stack pointer           = 0x28:0xfffffe01ee7ee430
frame pointer           = 0x28:0xfffffe01ee7ee4b0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq266: em1:rx0)
trap number             = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80984e30 at kdb_backtrace+0x60
#1 0xffffffff809489e6 at vpanic+0x126
#2 0xffffffff809488b3 at panic+0x43
#3 0xffffffff80d4ab6b at trap_fatal+0x36b
#4 0xffffffff80d4a7ec at trap+0x75c
#5 0xffffffff80d30882 at calltrap+0x8
#6 0xffffffff80b4a725 at xprt_active+0x45
#7 0xffffffff80b4e185 at svc_vc_soupcall+0x35
#8 0xffffffff809bcc52 at sowakeup+0x82
#9 0xffffffff80aea942 at tcp_do_segment+0x2b22
#10 0xffffffff80ae7720 at tcp_input+0x12b0
#11 0xffffffff80a77f57 at ip_input+0x97
#12 0xffffffff80a177d2 at netisr_dispatch_src+0x62
#13 0xffffffff80a0eb76 at ether_demux+0x126
#14 0xffffffff80a0f81e at ether_nh_input+0x35e
#15 0xffffffff80a177d2 at netisr_dispatch_src+0x62
#16 0xffffffff804e121b at em_rxeof+0x2eb
#17 0xffffffff804e1663 at em_msix_rx+0x33
Uptime: 10h5m21s
Dumping 3079 out of 8134 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/if_tap.ko.symbols...done.
Loaded symbols for /boot/kernel/if_tap.ko.symbols
Reading symbols from /boot/kernel/if_bridge.ko.symbols...done.
Loaded symbols for /boot/kernel/if_bridge.ko.symbols
Reading symbols from /boot/kernel/bridgestp.ko.symbols...done.
Loaded symbols for /boot/kernel/bridgestp.ko.symbols
Reading symbols from /boot/kernel/aio.ko.symbols...done.
Loaded symbols for /boot/kernel/aio.ko.symbols
Reading symbols from /boot/kernel/coretemp.ko.symbols...done.
Loaded symbols for /boot/kernel/coretemp.ko.symbols
Reading symbols from /boot/kernel/ipmi.ko.symbols...done.
Loaded symbols for /boot/kernel/ipmi.ko.symbols
Reading symbols from /boot/kernel/smbus.ko.symbols...done.
Loaded symbols for /boot/kernel/smbus.ko.symbols
Reading symbols from /boot/kernel/aesni.ko.symbols...done.
Loaded symbols for /boot/kernel/aesni.ko.symbols
Reading symbols from /boot/kernel/crypto.ko.symbols...done.
Loaded symbols for /boot/kernel/crypto.ko.symbols
Reading symbols from /boot/kernel/mps.ko.symbols...done.
Loaded symbols for /boot/kernel/mps.ko.symbols
Reading symbols from /boot/kernel/vmm.ko.symbols...done.
Loaded symbols for /boot/kernel/vmm.ko.symbols
Reading symbols from /boot/kernel/nmdm.ko.symbols...done.
Loaded symbols for /boot/kernel/nmdm.ko.symbols
Reading symbols from /boot/kernel/geom_eli.ko.symbols...done.
Loaded symbols for /boot/kernel/geom_eli.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
Reading symbols from /boot/kernel/ipfw.ko.symbols...done.
Loaded symbols for /boot/kernel/ipfw.ko.symbols
Reading symbols from /boot/kernel/dummynet.ko.symbols...done.
Loaded symbols for /boot/kernel/dummynet.ko.symbols
Reading symbols from /boot/kernel/ipfw_nat.ko.symbols...done.
Loaded symbols for /boot/kernel/ipfw_nat.ko.symbols
Reading symbols from /boot/kernel/libalias.ko.symbols...done.
Loaded symbols for /boot/kernel/libalias.ko.symbols
Reading symbols from /boot/kernel/ctl.ko.symbols...done.
Loaded symbols for /boot/kernel/ctl.ko.symbols
Reading symbols from /boot/kernel/iscsi.ko.symbols...done.
Loaded symbols for /boot/kernel/iscsi.ko.symbols
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
Loaded symbols for /boot/kernel/nullfs.ko.symbols
Reading symbols from /boot/kernel/fdescfs.ko.symbols...done.
Loaded symbols for /boot/kernel/fdescfs.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) list *0xfffffe01ee7ee430
No source file for address 0xfffffe01ee7ee430.
(kgdb) backtrace
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80948642 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff80948a25 in vpanic (fmt=<value optimized out>, 
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758
#3  0xffffffff809488b3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:687
#4  0xffffffff80d4ab6b in trap_fatal (frame=<value optimized out>, 
    eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
#5  0xffffffff80d4a7ec in trap (frame=<value optimized out>)
    at /usr/src/sys/amd64/amd64/trap.c:203
#6  0xffffffff80d30882 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#7  0xffffffff8092e980 in __mtx_lock_sleep (c=0xfffffe0000e09198, 
    tid=18446735277688524800, opts=0, file=0x0, line=0)
    at /usr/src/sys/kern/kern_mutex.c:437
#8  0xffffffff80b4a725 in xprt_active (xprt=0xfffff80043c93800)
    at /usr/src/sys/rpc/svc.c:394
#9  0xffffffff80b4e185 in svc_vc_soupcall (so=0xfffffe0000e09180, 
    arg=<value optimized out>, waitflag=0) at /usr/src/sys/rpc/svc_vc.c:953
#10 0xffffffff809bcc52 in sowakeup (so=0xfffff80167289570, sb=0xfffff80167289600)
    at /usr/src/sys/kern/uipc_sockbuf.c:191
#11 0xffffffff80aea942 in tcp_do_segment (m=<value optimized out>, 
    th=<value optimized out>, so=0xfffff80167289570, tp=0xfffff80167e15000, 
    drop_hdrlen=<value optimized out>, tlen=236, iptos=<value optimized out>, 
    ti_locked=Cannot access memory at address 0x1
) at /usr/src/sys/netinet/tcp_input.c:1842
    off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1376
#13 0xffffffff80a77f57 in ip_input (m=0xfffff80043661e00)
    at /usr/src/sys/netinet/ip_input.c:734
#14 0xffffffff80a177d2 in netisr_dispatch_src (proto=<value optimized out>, 
    source=<value optimized out>, m=0xce444773859b4170)
    at /usr/src/sys/net/netisr.c:976
#15 0xffffffff80a0eb76 in ether_demux (ifp=<value optimized out>, 
    m=0xfffff80043661e00) at /usr/src/sys/net/if_ethersubr.c:851
#16 0xffffffff80a0f81e in ether_nh_input (m=<value optimized out>)
    at /usr/src/sys/net/if_ethersubr.c:646
#17 0xffffffff80a177d2 in netisr_dispatch_src (proto=<value optimized out>, 
    source=<value optimized out>, m=0xce444773859b4170)
    at /usr/src/sys/net/netisr.c:976
#18 0xffffffff804e121b in em_rxeof (count=99)
    at /usr/src/sys/dev/e1000/if_em.c:4711
#19 0xffffffff804e1663 in em_msix_rx (arg=0xfffff8000449ca00)
    at /usr/src/sys/dev/e1000/if_em.c:1639
---Type <return> to continue, or q <return> to quit---
#20 0xffffffff8091482b in intr_event_execute_handlers (p=<value optimized out>, 
    ie=0xfffff8000446fd00) at /usr/src/sys/kern/kern_intr.c:1264
#21 0xffffffff80914c76 in ithread_loop (arg=0xfffff800044d40a0)
    at /usr/src/sys/kern/kern_intr.c:1277
#22 0xffffffff8091244a in fork_exit (callout=0xffffffff80914be0 <ithread_loop>, 
    arg=0xfffff800044d40a0, frame=0xfffffe01ee7eeac0)
    at /usr/src/sys/kern/kern_fork.c:1018
#23 0xffffffff80d30dbe in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#24 0x0000000000000000 in ?? ()
Current language:  auto; currently minimal
(kgdb) 
(kgdb) list *0xffffffff80b4a725
0xffffffff80b4a725 is in xprt_active (/usr/src/sys/rpc/svc.c:396).
391     {
392             SVCGROUP *grp = xprt->xp_group;
393
394             mtx_lock(&grp->sg_lock);
395
396             if (!xprt->xp_registered) {
397                     /*
398                      * Race with xprt_unregister - we lose.
399                      */
400                     mtx_unlock(&grp->sg_lock);
(kgdb)
Comment 8 Rick Macklem freebsd_committer freebsd_triage 2015-11-18 22:30:02 UTC
Created attachment 163299 [details]
patch that locks mutex when request queue is updated

This patch fixes the only other issue I can spot in the kernel
server RPC. There was a case where the request queue was being
updated, but the st_lock mutex was not held. This bug could have
conceivably resulted in a corrupted request queue.
Comment 9 Rick Macklem freebsd_committer freebsd_triage 2015-11-18 22:32:39 UTC
Created attachment 163300 [details]
patch that makes NFSv4.1 server release the backchannel xprt

This patch fixes the NFSv4.1 server so that it SVC_RELEASE()s the
backchannel xprt when the nfsd threads are terminated.
This patch only affects NFSv4.1, so it doesn't matter unless you
are running NFSv4.1 client mounts.
Comment 10 Rick Macklem freebsd_committer freebsd_triage 2015-11-18 22:44:59 UTC
I have just added 2 more patches that might be relevant to the crashes.
When the nfsd threads are terminated, this is what is supposed to happen:
- All nfsd threads running in svc_run_internal() return to svc_run().
- svc_run() waits for all these threads to return.
- After svc_run returns, the nfsd calls svcpool_destroy().
- svcpool_destroy() unregisters all the xprts (which represent the TCP sockets)
  - at this point, the reference count should be 1 for all xprts
  --> Then svcpool_destroy() calls SVC_RELEASE(xprt) for all of them, which
      drops the reference count to 0 and calls SVC_DESTROY()
     --> This actually calls svc_vc_destroy(), which shuts down the socket upcall
         and after that, destroys the mutexes.

My best guess w.r.t. the crashes is that the reference count gets messed up on
an xprt, so it doesn't get SVC_DESTROY()'d. Then a socket upcall calls xprt_active() after the mutex has been destroyed and BOOM.

The two patched should be applied along with the first one.
The second patch fixes the one other place that I can spot where the server
side krpc code isn't quite SMP safe. Although unlikely, it is conceivable
that this could cause the crashes.

The third patch makes sure that the backchannel xprt is dereferenced before
the call to svcpool_destroy(). The one seems a more likely culprit, but only
if you have clients doing NFSv4.1 mounts against the server.

If you could try the second patch (and the third if you have NFSv4.1 mounts),
that would be appreciated.

One final comment: I am assuming that you are terminating the nfsd threads
by sending a SIGUSR1 to the nfsd master. This is the only way the nfsd
threads should be terminated. (If you are using /etc/rc.d/nfsd, it should
be doing that, but you might try using "kill -USR1 <pid-of-nfsd-master>"
directly, just in case the shell script is busted.

This pretty well exhausts what I can see that might cause the crashes and
I can't reproduce a crash here, so hopefully you can make some progress
from here.

Good luck with it, rick
Comment 11 g_amanakis 2015-11-19 19:15:10 UTC
I am currently testing with the three last patches applied as I do have NFSv4.1 clients running on Linux. Give me a couple of days and I will provide feedback. Thank you for the insight.
Comment 12 g_amanakis 2015-11-21 14:49:33 UTC
The three last patches resolve the bug.
@rick thank you.
@koobs MFC?
Comment 13 Rick Macklem freebsd_committer freebsd_triage 2015-11-21 21:53:02 UTC
A variant of the first patch has already been committed to head by mav@.
He also noted to me that the second patch isn't needed. The "else" case
doesn't have any other thread manipulating the list, so the thread mutex
isn't needed.

Since you are using NFSv4.1, I'm pretty sure that the third patch is the
one that fixes the problem. I will commit this to head soon and MFC it
before closing this PR. (Oh, and the NFSv4.1 server stuff isn't in stable/9,
so the fix only applies to 10 and head.)

Thanks for doing the testing, so this could get resolved.

rick
Comment 14 Rick Macklem freebsd_committer freebsd_triage 2015-11-21 21:55:12 UTC
Also, although it was good to get this crash resolved, I would recommend
trying to avoid doing nfsd thread restarts. (If by any chance you were doing
this to avoid access problems during export/mount updates, the "-S" option
on mount suspends/resumes the nfsd threads, which should be preferable to
killing them off and restarting them.)

rick
Comment 15 commit-hook freebsd_committer freebsd_triage 2015-11-21 23:55:53 UTC
A commit references this bug:

Author: rmacklem
Date: Sat Nov 21 23:55:46 UTC 2015
New revision: 291150
URL: https://svnweb.freebsd.org/changeset/base/291150

Log:
  When the nfsd threads are terminated, the NFSv4 server state
  (opens, locks, etc) is retained, which I believe is correct behaviour.
  However, for NFSv4.1, the server also retained a reference to the xprt
  (RPC transport socket structure) for the backchannel. This caused
  svcpool_destroy() to not call SVC_DESTROY() for the xprt and allowed
  a socket upcall to occur after the mutexes in the svcpool were destroyed,
  causing a crash.
  This patch fixes the code so that the backchannel xprt structure is
  dereferenced just before svcpool_destroy() is called, so the code
  does do an SVC_DESTROY() on the xprt, which shuts down the socket upcall.

  Tested by:	g_amanakis@yahoo.com
  PR:		204340
  MFC after:	2 weeks

Changes:
  head/sys/fs/nfs/nfs_var.h
  head/sys/fs/nfsserver/nfs_nfsdkrpc.c
  head/sys/fs/nfsserver/nfs_nfsdstate.c
Comment 16 Rick Macklem freebsd_committer freebsd_triage 2015-12-06 01:22:17 UTC
The patch that fixes this has been committed to head as r291150 and
stable/10 as r291869.
Comment 17 Andriy Gapon freebsd_committer freebsd_triage 2017-01-27 15:10:34 UTC
Rick,
we got what looks like a very similar crash in FreeBSD 10.3:

db_trace_self_wrapper+0x2a
kdb_backtrace+0x37
vpanic+0xf7
panic+0x67
trap_fatal+0x264
trap_pfault+0x216
trap+0x32b
calltrap+0x8
__mtx_lock_sleep+0xa2
xprt_active+0xe7
svc_vc_soupcall+0x25
sowakeup+0x69
tcp_do_segment+0x319e
tcp_input+0x701
ip_input+0x14c
netisr_dispatch_src+0x228
ether_demux+0x1a5
ether_nh_input+0x1fc
netisr_dispatch_src+0x228
tcp_lro_flush+0x2b
ixgbe_rxeof+0x30d
ixgbe_msix_que+0x88
intr_event_execute_handlers+0x102
ithread_loop+0x9a
fork_exit+0x11f
fork_trampoline+0xe

It's this lock:
void                                                                                                                                                                                                                                   
xprt_active(SVCXPRT *xprt)                                                                                                                                                                                                             
{                                                                                                                                                                                                                                      
        SVCGROUP *grp = xprt->xp_group;                                                                                                                                                                                                
                                                                                                                                                                                                                                       
        mtx_lock(&grp->sg_lock);

Do you have any suggestions?
Thank you!
Comment 18 Rick Macklem freebsd_committer freebsd_triage 2017-01-30 03:04:24 UTC
Andriy Gapon, did your crash occur when the machine was being shut down
(or the nfsd threads were being killed off)?

If not, it is not caused by what these patches were intended for.
(They are in 10.3.)

Your crash basically indicates that either:
1 - The sg_group was free'd when a socket upcall was still in progress.
    Since the sg_group structures aren't free'd until the nfsd threads
    are killed (shutdown or ??), I don't think this can happen during
    normal operation.
OR
2 - The xprt structure that referenced the sg_group was free'd prematurely
    and the sg pointer was bogus.

If it was #2, I think I can come up with a simple patch to avoid this.
(Basically acquire a reference count on the xprt structure during the
 socket upcall.)

rick
Comment 19 Andriy Gapon freebsd_committer freebsd_triage 2017-01-30 08:55:48 UTC
(In reply to Rick Macklem from comment #18)
> Andriy Gapon, did your crash occur when the machine was being shut down
(or the nfsd threads were being killed off)?

Yes, it did.  I am told that this was easy to reproduce by SIGKILL to nfsd.
Comment 20 Rick Macklem freebsd_committer freebsd_triage 2017-01-30 21:44:50 UTC
Well, you should never kill the nfsd with SIGKILL. The way
that is intended to be "safe" is to send SIGUSR1 to the "(master)"
nfsd. It will shut things down and kill off the other threads.

I'm not sure that I know of any way to make SIGKILL safe.
I'm pretty sure "man nfsd" says this, but maybe it needs
more emphasis?

I also believe that /etc/rc.d/nfsd does this correctly.

rick
Comment 21 Rick Macklem freebsd_committer freebsd_triage 2017-01-30 21:49:49 UTC
Oh, and if you get a crash when shutting down the nfsd threads via
"kill -USR1 <pid-master-nfsd>", then let us know, since something is
still broken.
(If you are curious, the "(master)" nfsd is the one that gets new
 TCP connections and it must stop doing that before the kernel
 threads are terminated. Otherwise you can easily get a socket
 upcall after the data structures have been free'd.)

rick
Comment 22 Andriy Gapon freebsd_committer freebsd_triage 2017-01-31 08:03:05 UTC
(In reply to Rick Macklem from comment #20)

Rick, thank you very much for the explanation!  I knew that nfsd processes were special as they 'lend their stacks to kernel' or something like that.  But I failed to realise that that put restrictions in the signals as well.

I should also explain that kill -9 was not used to shutdown nfsd or as a replacement for the normal nfsd management.  It was used just to demonstrate the problem.

I think that originally the problem happened when gdb was used on an nfsd process.

I understand that the nfsd processes are special.  But the situation seems to be a bit fragile.  The current design is old and proven.  But perhaps we could switch to using kernel processes or maybe we could mark the nfsd processes with a special flag somehow as to prevent them being killed SIGKILL or stopped with SIGSTOP (i.e. prevent normal signal delivery for all signals).

Lastly, just to clarify, should we avoid using debuggers / SIGSTOP with nfsd?
Comment 23 Rick Macklem freebsd_committer freebsd_triage 2017-02-02 01:45:10 UTC
Created attachment 179512 [details]
add svcpool_close so that svcpool_destroy doesn't get called when nfsd threads are killed

This patch adds a new function to the server krpc called svcpool_close().
It is similar to svcpool_destroy(), but does not free the data structures,
so that the pool can be used again.

This function is then used instead of svcpool_destroy(), svcpool_create()
when the nfsd threads are killed.

These crashes are caused because the data structures were free'd by
svcpool_destroy() when the nfsd threads were killed off (or signalled somehow).
By avoiding the svcpool_destroy() call, the crashes should be avoided.
Comment 24 Rick Macklem freebsd_committer freebsd_triage 2017-02-02 01:48:29 UTC
Please test the patch I just attached. (4th one) I think it might make the
code less fragile to nfsd threads being signalled. I have not been able
to create a crash with the patch during limited testing.

Since avg@'s crash occurred with the other patches, I have reopened the
PR.
Comment 25 Rick Macklem freebsd_committer freebsd_triage 2017-02-06 00:58:15 UTC
Created attachment 179661 [details]
add svcpool_close (cleaned up version)

This patch implements the same logic as 179512, but is cleaned up
by factoring out the code common to svcpool_destroy() and svcpool_close()
and placing it in svcpool_cleanup().

Semantically equivalent to 179512 for testing.
Comment 26 Andriy Gapon freebsd_committer freebsd_triage 2017-02-07 09:38:28 UTC
(In reply to Rick Macklem from comment #25)
Rick, thank you very much!
We tested this patch and it makes nfsd more robust with respect to SIGKILL.
We haven't seen any regressions.
Comment 27 commit-hook freebsd_committer freebsd_triage 2017-02-14 17:49:45 UTC
A commit references this bug:

Author: avg
Date: Tue Feb 14 17:49:08 UTC 2017
New revision: 313735
URL: https://svnweb.freebsd.org/changeset/base/313735

Log:
  add svcpool_close to handle killed nfsd threads

  This patch adds a new function to the server krpc called
  svcpool_close().  It is similar to svcpool_destroy(), but does not free
  the data structures, so that the pool can be used again.

  This function is then used instead of svcpool_destroy(),
  svcpool_create() when the nfsd threads are killed.

  PR:		204340
  Reported by:	Panzura
  Approved by:	rmacklem
  Obtained from:	rmacklem
  MFC after:	1 week

Changes:
  head/sys/fs/nfsserver/nfs_nfsdkrpc.c
  head/sys/rpc/svc.c
  head/sys/rpc/svc.h
Comment 28 commit-hook freebsd_committer freebsd_triage 2017-02-21 09:30:01 UTC
A commit references this bug:

Author: avg
Date: Tue Feb 21 09:29:11 UTC 2017
New revision: 314033
URL: https://svnweb.freebsd.org/changeset/base/314033

Log:
  MFC r313735: add svcpool_close to handle killed nfsd threads

  PR:		204340
  Reported by:	Panzura
  Approved by:	rmacklem
  Obtained from:	rmacklem

Changes:
_U  stable/11/
  stable/11/sys/fs/nfsserver/nfs_nfsdkrpc.c
  stable/11/sys/rpc/svc.c
  stable/11/sys/rpc/svc.h
Comment 29 commit-hook freebsd_committer freebsd_triage 2017-02-21 09:30:03 UTC
A commit references this bug:

Author: avg
Date: Tue Feb 21 09:29:47 UTC 2017
New revision: 314034
URL: https://svnweb.freebsd.org/changeset/base/314034

Log:
  MFC r313735: add svcpool_close to handle killed nfsd threads

  PR:		204340
  Reported by:	Panzura
  Reviewed by:	rmacklem
  Approved by:	rmacklem

Changes:
_U  stable/10/
  stable/10/sys/fs/nfsserver/nfs_nfsdkrpc.c
  stable/10/sys/rpc/svc.c
  stable/10/sys/rpc/svc.h
Comment 30 Andriy Gapon freebsd_committer freebsd_triage 2017-02-21 09:36:06 UTC
Rick, thank you again!
Comment 31 Rick Macklem freebsd_committer freebsd_triage 2017-02-21 21:14:07 UTC
And thanks go to you for doing the testing and commits.