On a X9SCM with em NICs restarting nfsd with mounted shares remotely results in kernel trap 9 with panic and reboot. The system is running a GENERIC kernel on an up-to-date FreeBSD 10.2-RELEASE. On /var/log/messages: 1406 Nov 5 16:43:00 x3200 kernel: 1407 Nov 5 16:43:00 x3200 kernel: 1408 Nov 5 16:43:00 x3200 kernel: Fatal trap 9: general protection fault while in kernel mode 1409 Nov 5 16:43:00 x3200 kernel: cpuid = 0; apic id = 00 1410 Nov 5 16:43:00 x3200 kernel: instruction pointer = 0x20:0xffffffff8092e980 1411 Nov 5 16:43:00 x3200 kernel: stack pointer = 0x28:0xfffffe01ee7ee430 1412 Nov 5 16:43:00 x3200 kernel: frame pointer = 0x28:0xfffffe01ee7ee4b0 1413 Nov 5 16:43:00 x3200 kernel: code segment = base rx0, limit 0xfffff, type 0x1b 1414 Nov 5 16:43:00 x3200 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1 1415 Nov 5 16:43:00 x3200 kernel: processor eflags = 1416 Nov 5 16:43:00 x3200 kernel: interrupt enabled, resume, IOPL = 0 1417 Nov 5 16:43:00 x3200 kernel: current process = 12 (irq266: em1:rx0) 1418 Nov 5 16:45:06 x3200 syslogd: kernel boot file is /boot/kernel/kernel 1419 Nov 5 16:45:06 x3200 kernel: trap number = 9 1420 Nov 5 16:45:06 x3200 kernel: panic: general protection fault 1421 Nov 5 16:45:06 x3200 kernel: cpuid = 0 1422 Nov 5 16:45:06 x3200 kernel: KDB: stack backtrace: 1423 Nov 5 16:45:06 x3200 kernel: #0 0xffffffff80984e30 at kdb_backtrace+0x60 1424 Nov 5 16:45:06 x3200 kernel: #1 0xffffffff809489e6 at vpanic+0x126 1425 Nov 5 16:45:06 x3200 kernel: #2 0xffffffff809488b3 at panic+0x43 1426 Nov 5 16:45:06 x3200 kernel: #3 0xffffffff80d4aadb at trap_fatal+0x36b 1427 Nov 5 16:45:06 x3200 kernel: #4 0xffffffff80d4a75c at trap+0x75c 1428 Nov 5 16:45:06 x3200 kernel: #5 0xffffffff80d307f2 at calltrap+0x8 1429 Nov 5 16:45:06 x3200 kernel: #6 0xffffffff80b4a725 at xprt_active+0x45 1430 Nov 5 16:45:06 x3200 kernel: #7 0xffffffff80b4e0a5 at svc_vc_soupcall+0x35 1431 Nov 5 16:45:06 x3200 kernel: #8 0xffffffff809bcc52 at sowakeup+0x82 1432 Nov 5 16:45:06 x3200 kernel: #9 0xffffffff80aea942 at tcp_do_segment+0x2b22 1433 Nov 5 16:45:06 x3200 kernel: #10 0xffffffff80ae7720 at tcp_input+0x12b0 1434 Nov 5 16:45:06 x3200 kernel: #11 0xffffffff80a77f57 at ip_input+0x97 1435 Nov 5 16:45:06 x3200 kernel: #12 0xffffffff80a177d2 at netisr_dispatch_src+0x62 1436 Nov 5 16:45:06 x3200 kernel: #13 0xffffffff80a0eb76 at ether_demux+0x126 1437 Nov 5 16:45:06 x3200 kernel: #14 0xffffffff80a0f81e at ether_nh_input+0x35e 1438 Nov 5 16:45:06 x3200 kernel: #15 0xffffffff80a177d2 at netisr_dispatch_src+0x62 1439 Nov 5 16:45:06 x3200 kernel: #16 0xffffffff804e121b at em_rxeof+0x2eb 1440 Nov 5 16:45:06 x3200 kernel: #17 0xffffffff804e1663 at em_msix_rx+0x33 1441 Nov 5 16:45:06 x3200 kernel: Uptime: 2m10s 1442 Nov 5 16:45:06 x3200 kernel: Dumping 460 out of 8134 MB:..4%..11%..21%..32%..42%..53%..63%..73%..84%..91%Copyright (c) 1992-201 5 The FreeBSD Project. Running kgdb: (kgdb) list *0xffffffff8092e980 0xffffffff8092e980 is in __mtx_lock_sleep (/usr/src/sys/kern/kern_mutex.c:438). 433 * owner stops running or the state of the lock changes. 434 */ 435 v = m->mtx_lock; 436 if (v != MTX_UNOWNED) { 437 owner = (struct thread *)(v & ~MTX_FLAGMASK); 438 if (TD_IS_RUNNING(owner)) { 439 if (LOCK_LOG_TEST(&m->lock_object, 0)) 440 CTR3(KTR_LOCK, 441 "%s: spinning on %p held by %p", 442 __func__, m, owner); (kgdb) backtrace #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff80948642 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff80948a25 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff809488b3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:687 #4 0xffffffff80d4aadb in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851 #5 0xffffffff80d4a75c in trap (frame=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:203 #6 0xffffffff80d307f2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236 #7 0xffffffff8092e980 in __mtx_lock_sleep (c=0xfffffe0000e09198, tid=18446735277688524800, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:437 #8 0xffffffff80b4a725 in xprt_active (xprt=0xfffff801247cae00) at /usr/src/sys/rpc/svc.c:394 #9 0xffffffff80b4e0a5 in svc_vc_soupcall (so=0xfffffe0000e09180, arg=<value optimized out>, waitflag=0) at /usr/src/sys/rpc/svc_vc.c:953 #10 0xffffffff809bcc52 in sowakeup (so=0xfffff8012473a570, sb=0xfffff8012473a600) at /usr/src/sys/kern/uipc_sockbuf.c:191 #11 0xffffffff80aea942 in tcp_do_segment (m=<value optimized out>, th=<value optimized out>, so=0xfffff8012473a570, tp=0xfffff801b93ed810, drop_hdrlen=<value optimized out>, tlen=324, iptos=<value optimized out>, ti_locked=Cannot access memory at address 0x1 ) at /usr/src/sys/netinet/tcp_input.c:1842 #12 0xffffffff80ae7720 in tcp_input (m=<value optimized out>, off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1376 #13 0xffffffff80a77f57 in ip_input (m=0xfffff8003e5b6500) at /usr/src/sys/netinet/ip_input.c:734 #14 0xffffffff80a177d2 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0xfac0622db6ac96f8) at /usr/src/sys/net/netisr.c:976 #15 0xffffffff80a0eb76 in ether_demux (ifp=<value optimized out>, m=0xfffff8003e5b6500) at /usr/src/sys/net/if_ethersubr.c:851 #16 0xffffffff80a0f81e in ether_nh_input (m=<value optimized out>) at /usr/src/sys/net/if_ethersubr.c:646 #17 0xffffffff80a177d2 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0xfac0622db6ac96f8) at /usr/src/sys/net/netisr.c:976 #18 0xffffffff804e121b in em_rxeof (count=99) at /usr/src/sys/dev/e1000/if_em.c:4711 #19 0xffffffff804e1663 in em_msix_rx (arg=0xfffff8000449ca00) at /usr/src/sys/dev/e1000/if_em.c:1639 #20 0xffffffff8091482b in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffff8000446fd00) at /usr/src/sys/kern/kern_intr.c:1264 #21 0xffffffff80914c76 in ithread_loop (arg=0xfffff800044d40a0) at /usr/src/sys/kern/kern_intr.c:1277 #22 0xffffffff8091244a in fork_exit (callout=0xffffffff80914be0 <ithread_loop>, arg=0xfffff800044d40a0, frame=0xfffffe01ee7eeac0) at /usr/src/sys/kern/kern_fork.c:1018 #23 0xffffffff80d30d2e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611 #24 0x0000000000000000 in ?? ()
Setting hw.em.enable_msix=0 in bootloader.conf resolves the symptoms.
Created attachment 163160 [details] patch that might fix this problem I think this crash might have been caused by a race between svcpool_destroy() and the socket upcall. The code in svcpool_destroy() assumes that SVC_RELEASE(xprt) drops the ref cnt to 0, so that SVC_DESTROY() is called. -->SVC_DESTROY() shuts down the socket upcall. --> If the ref cnt doesn't go to 0, svcpool_destroy() will mtx_destroy() the mutexes prematurely. I am not sure, but the race might have been introduced by r267228 since, prior to this there was a single mutex for the pool, held while all xprt's are unregistered. After r267228, there is a group of mutexes, where the code only held one at a time, so I think an xprt might get re-registered on another group after that group has had all de-registered. The attached little patch moves the mtx_lock() calls to a separate loop before the xprt_unregister loops, so that all locks are held while all are de-registered. I've added mav@ to the cc list, since he might be the guy that actually understands this. Anyhow, if you could test the attached patch with msi interrupts re-enabled and see if the crashes go away, that would be great. (I don't think that this indicates that the em(4) driver is broken. I suspect that it just affects timing of the interrupts that tripped over this race.)
(In reply to Rick Macklem from comment #2) Thanks for the insight. However, I just tried the patch proposed and it still crashes.
Created attachment 163217 [details] Patch to fix sg_threadcount++ so the mutex is held when done I took a closer look at svc.c and the only way that the race I think might have happened can occur is if svc_run() doesn't wait for all threads to terminate. I also can now see that the first patch wouldn't have fixed anything, so I'm not surprised it didn't work. The only thing I can see that is broken in the code and might allow svc_run() to return before all threads have terminated is: - The thread count sg_threadcount is incremented when the sg_lock mutex isn't held. --> This could conceivably result in a corrupted sg_threadcount, which would allow svc_run() to return before all threads have terminated. This second patch fixes the code so that sg_threadcount++ is always done when the sg_lock mutex is held. If you can test this one instead of the last one, that would be appreciated. I do know this patch fixes the above problem, but I don't know if this is the cause of your crashes. Also, this bug seems to have existed in the code forever and all that r267228 did was switch from not holding the pool mutex to not holding the group mutex. So Alexander, you are off the hook, I think.;-) I've left you on the cc, since you probably know this code better than anyone else and might have some insight w.r.t. this crash.
(In reply to Rick Macklem from comment #4) I am testing it, so far it seems ok. Give me a couple of days to test it thoroughly.
(In reply to Rick Macklem from comment #4) It still crashes. I am going to get a dump tomorrow.
This a fresh dump with the second patch applied, it seems the same. The problem seems to occur in xprt_active(). Unread portion of the kernel message buffer: stack pointer = 0x28:0xfffffe01ee7ee430 frame pointer = 0x28:0xfffffe01ee7ee4b0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (irq266: em1:rx0) trap number = 9 panic: general protection fault cpuid = 0 KDB: stack backtrace: #0 0xffffffff80984e30 at kdb_backtrace+0x60 #1 0xffffffff809489e6 at vpanic+0x126 #2 0xffffffff809488b3 at panic+0x43 #3 0xffffffff80d4ab6b at trap_fatal+0x36b #4 0xffffffff80d4a7ec at trap+0x75c #5 0xffffffff80d30882 at calltrap+0x8 #6 0xffffffff80b4a725 at xprt_active+0x45 #7 0xffffffff80b4e185 at svc_vc_soupcall+0x35 #8 0xffffffff809bcc52 at sowakeup+0x82 #9 0xffffffff80aea942 at tcp_do_segment+0x2b22 #10 0xffffffff80ae7720 at tcp_input+0x12b0 #11 0xffffffff80a77f57 at ip_input+0x97 #12 0xffffffff80a177d2 at netisr_dispatch_src+0x62 #13 0xffffffff80a0eb76 at ether_demux+0x126 #14 0xffffffff80a0f81e at ether_nh_input+0x35e #15 0xffffffff80a177d2 at netisr_dispatch_src+0x62 #16 0xffffffff804e121b at em_rxeof+0x2eb #17 0xffffffff804e1663 at em_msix_rx+0x33 Uptime: 10h5m21s Dumping 3079 out of 8134 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Reading symbols from /boot/kernel/zfs.ko.symbols...done. Loaded symbols for /boot/kernel/zfs.ko.symbols Reading symbols from /boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /boot/kernel/opensolaris.ko.symbols Reading symbols from /boot/kernel/if_tap.ko.symbols...done. Loaded symbols for /boot/kernel/if_tap.ko.symbols Reading symbols from /boot/kernel/if_bridge.ko.symbols...done. Loaded symbols for /boot/kernel/if_bridge.ko.symbols Reading symbols from /boot/kernel/bridgestp.ko.symbols...done. Loaded symbols for /boot/kernel/bridgestp.ko.symbols Reading symbols from /boot/kernel/aio.ko.symbols...done. Loaded symbols for /boot/kernel/aio.ko.symbols Reading symbols from /boot/kernel/coretemp.ko.symbols...done. Loaded symbols for /boot/kernel/coretemp.ko.symbols Reading symbols from /boot/kernel/ipmi.ko.symbols...done. Loaded symbols for /boot/kernel/ipmi.ko.symbols Reading symbols from /boot/kernel/smbus.ko.symbols...done. Loaded symbols for /boot/kernel/smbus.ko.symbols Reading symbols from /boot/kernel/aesni.ko.symbols...done. Loaded symbols for /boot/kernel/aesni.ko.symbols Reading symbols from /boot/kernel/crypto.ko.symbols...done. Loaded symbols for /boot/kernel/crypto.ko.symbols Reading symbols from /boot/kernel/mps.ko.symbols...done. Loaded symbols for /boot/kernel/mps.ko.symbols Reading symbols from /boot/kernel/vmm.ko.symbols...done. Loaded symbols for /boot/kernel/vmm.ko.symbols Reading symbols from /boot/kernel/nmdm.ko.symbols...done. Loaded symbols for /boot/kernel/nmdm.ko.symbols Reading symbols from /boot/kernel/geom_eli.ko.symbols...done. Loaded symbols for /boot/kernel/geom_eli.ko.symbols Reading symbols from /boot/kernel/ums.ko.symbols...done. Loaded symbols for /boot/kernel/ums.ko.symbols Reading symbols from /boot/kernel/ipfw.ko.symbols...done. Loaded symbols for /boot/kernel/ipfw.ko.symbols Reading symbols from /boot/kernel/dummynet.ko.symbols...done. Loaded symbols for /boot/kernel/dummynet.ko.symbols Reading symbols from /boot/kernel/ipfw_nat.ko.symbols...done. Loaded symbols for /boot/kernel/ipfw_nat.ko.symbols Reading symbols from /boot/kernel/libalias.ko.symbols...done. Loaded symbols for /boot/kernel/libalias.ko.symbols Reading symbols from /boot/kernel/ctl.ko.symbols...done. Loaded symbols for /boot/kernel/ctl.ko.symbols Reading symbols from /boot/kernel/iscsi.ko.symbols...done. Loaded symbols for /boot/kernel/iscsi.ko.symbols Reading symbols from /boot/kernel/nullfs.ko.symbols...done. Loaded symbols for /boot/kernel/nullfs.ko.symbols Reading symbols from /boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for /boot/kernel/fdescfs.ko.symbols #0 doadump (textdump=<value optimized out>) at pcpu.h:219 219 pcpu.h: No such file or directory. in pcpu.h (kgdb) list *0xfffffe01ee7ee430 No source file for address 0xfffffe01ee7ee430. (kgdb) backtrace #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff80948642 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff80948a25 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff809488b3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:687 #4 0xffffffff80d4ab6b in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851 #5 0xffffffff80d4a7ec in trap (frame=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:203 #6 0xffffffff80d30882 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236 #7 0xffffffff8092e980 in __mtx_lock_sleep (c=0xfffffe0000e09198, tid=18446735277688524800, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:437 #8 0xffffffff80b4a725 in xprt_active (xprt=0xfffff80043c93800) at /usr/src/sys/rpc/svc.c:394 #9 0xffffffff80b4e185 in svc_vc_soupcall (so=0xfffffe0000e09180, arg=<value optimized out>, waitflag=0) at /usr/src/sys/rpc/svc_vc.c:953 #10 0xffffffff809bcc52 in sowakeup (so=0xfffff80167289570, sb=0xfffff80167289600) at /usr/src/sys/kern/uipc_sockbuf.c:191 #11 0xffffffff80aea942 in tcp_do_segment (m=<value optimized out>, th=<value optimized out>, so=0xfffff80167289570, tp=0xfffff80167e15000, drop_hdrlen=<value optimized out>, tlen=236, iptos=<value optimized out>, ti_locked=Cannot access memory at address 0x1 ) at /usr/src/sys/netinet/tcp_input.c:1842 off0=<value optimized out>) at /usr/src/sys/netinet/tcp_input.c:1376 #13 0xffffffff80a77f57 in ip_input (m=0xfffff80043661e00) at /usr/src/sys/netinet/ip_input.c:734 #14 0xffffffff80a177d2 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0xce444773859b4170) at /usr/src/sys/net/netisr.c:976 #15 0xffffffff80a0eb76 in ether_demux (ifp=<value optimized out>, m=0xfffff80043661e00) at /usr/src/sys/net/if_ethersubr.c:851 #16 0xffffffff80a0f81e in ether_nh_input (m=<value optimized out>) at /usr/src/sys/net/if_ethersubr.c:646 #17 0xffffffff80a177d2 in netisr_dispatch_src (proto=<value optimized out>, source=<value optimized out>, m=0xce444773859b4170) at /usr/src/sys/net/netisr.c:976 #18 0xffffffff804e121b in em_rxeof (count=99) at /usr/src/sys/dev/e1000/if_em.c:4711 #19 0xffffffff804e1663 in em_msix_rx (arg=0xfffff8000449ca00) at /usr/src/sys/dev/e1000/if_em.c:1639 ---Type <return> to continue, or q <return> to quit--- #20 0xffffffff8091482b in intr_event_execute_handlers (p=<value optimized out>, ie=0xfffff8000446fd00) at /usr/src/sys/kern/kern_intr.c:1264 #21 0xffffffff80914c76 in ithread_loop (arg=0xfffff800044d40a0) at /usr/src/sys/kern/kern_intr.c:1277 #22 0xffffffff8091244a in fork_exit (callout=0xffffffff80914be0 <ithread_loop>, arg=0xfffff800044d40a0, frame=0xfffffe01ee7eeac0) at /usr/src/sys/kern/kern_fork.c:1018 #23 0xffffffff80d30dbe in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:611 #24 0x0000000000000000 in ?? () Current language: auto; currently minimal (kgdb) (kgdb) list *0xffffffff80b4a725 0xffffffff80b4a725 is in xprt_active (/usr/src/sys/rpc/svc.c:396). 391 { 392 SVCGROUP *grp = xprt->xp_group; 393 394 mtx_lock(&grp->sg_lock); 395 396 if (!xprt->xp_registered) { 397 /* 398 * Race with xprt_unregister - we lose. 399 */ 400 mtx_unlock(&grp->sg_lock); (kgdb)
Created attachment 163299 [details] patch that locks mutex when request queue is updated This patch fixes the only other issue I can spot in the kernel server RPC. There was a case where the request queue was being updated, but the st_lock mutex was not held. This bug could have conceivably resulted in a corrupted request queue.
Created attachment 163300 [details] patch that makes NFSv4.1 server release the backchannel xprt This patch fixes the NFSv4.1 server so that it SVC_RELEASE()s the backchannel xprt when the nfsd threads are terminated. This patch only affects NFSv4.1, so it doesn't matter unless you are running NFSv4.1 client mounts.
I have just added 2 more patches that might be relevant to the crashes. When the nfsd threads are terminated, this is what is supposed to happen: - All nfsd threads running in svc_run_internal() return to svc_run(). - svc_run() waits for all these threads to return. - After svc_run returns, the nfsd calls svcpool_destroy(). - svcpool_destroy() unregisters all the xprts (which represent the TCP sockets) - at this point, the reference count should be 1 for all xprts --> Then svcpool_destroy() calls SVC_RELEASE(xprt) for all of them, which drops the reference count to 0 and calls SVC_DESTROY() --> This actually calls svc_vc_destroy(), which shuts down the socket upcall and after that, destroys the mutexes. My best guess w.r.t. the crashes is that the reference count gets messed up on an xprt, so it doesn't get SVC_DESTROY()'d. Then a socket upcall calls xprt_active() after the mutex has been destroyed and BOOM. The two patched should be applied along with the first one. The second patch fixes the one other place that I can spot where the server side krpc code isn't quite SMP safe. Although unlikely, it is conceivable that this could cause the crashes. The third patch makes sure that the backchannel xprt is dereferenced before the call to svcpool_destroy(). The one seems a more likely culprit, but only if you have clients doing NFSv4.1 mounts against the server. If you could try the second patch (and the third if you have NFSv4.1 mounts), that would be appreciated. One final comment: I am assuming that you are terminating the nfsd threads by sending a SIGUSR1 to the nfsd master. This is the only way the nfsd threads should be terminated. (If you are using /etc/rc.d/nfsd, it should be doing that, but you might try using "kill -USR1 <pid-of-nfsd-master>" directly, just in case the shell script is busted. This pretty well exhausts what I can see that might cause the crashes and I can't reproduce a crash here, so hopefully you can make some progress from here. Good luck with it, rick
I am currently testing with the three last patches applied as I do have NFSv4.1 clients running on Linux. Give me a couple of days and I will provide feedback. Thank you for the insight.
The three last patches resolve the bug. @rick thank you. @koobs MFC?
A variant of the first patch has already been committed to head by mav@. He also noted to me that the second patch isn't needed. The "else" case doesn't have any other thread manipulating the list, so the thread mutex isn't needed. Since you are using NFSv4.1, I'm pretty sure that the third patch is the one that fixes the problem. I will commit this to head soon and MFC it before closing this PR. (Oh, and the NFSv4.1 server stuff isn't in stable/9, so the fix only applies to 10 and head.) Thanks for doing the testing, so this could get resolved. rick
Also, although it was good to get this crash resolved, I would recommend trying to avoid doing nfsd thread restarts. (If by any chance you were doing this to avoid access problems during export/mount updates, the "-S" option on mount suspends/resumes the nfsd threads, which should be preferable to killing them off and restarting them.) rick
A commit references this bug: Author: rmacklem Date: Sat Nov 21 23:55:46 UTC 2015 New revision: 291150 URL: https://svnweb.freebsd.org/changeset/base/291150 Log: When the nfsd threads are terminated, the NFSv4 server state (opens, locks, etc) is retained, which I believe is correct behaviour. However, for NFSv4.1, the server also retained a reference to the xprt (RPC transport socket structure) for the backchannel. This caused svcpool_destroy() to not call SVC_DESTROY() for the xprt and allowed a socket upcall to occur after the mutexes in the svcpool were destroyed, causing a crash. This patch fixes the code so that the backchannel xprt structure is dereferenced just before svcpool_destroy() is called, so the code does do an SVC_DESTROY() on the xprt, which shuts down the socket upcall. Tested by: g_amanakis@yahoo.com PR: 204340 MFC after: 2 weeks Changes: head/sys/fs/nfs/nfs_var.h head/sys/fs/nfsserver/nfs_nfsdkrpc.c head/sys/fs/nfsserver/nfs_nfsdstate.c
The patch that fixes this has been committed to head as r291150 and stable/10 as r291869.
Rick, we got what looks like a very similar crash in FreeBSD 10.3: db_trace_self_wrapper+0x2a kdb_backtrace+0x37 vpanic+0xf7 panic+0x67 trap_fatal+0x264 trap_pfault+0x216 trap+0x32b calltrap+0x8 __mtx_lock_sleep+0xa2 xprt_active+0xe7 svc_vc_soupcall+0x25 sowakeup+0x69 tcp_do_segment+0x319e tcp_input+0x701 ip_input+0x14c netisr_dispatch_src+0x228 ether_demux+0x1a5 ether_nh_input+0x1fc netisr_dispatch_src+0x228 tcp_lro_flush+0x2b ixgbe_rxeof+0x30d ixgbe_msix_que+0x88 intr_event_execute_handlers+0x102 ithread_loop+0x9a fork_exit+0x11f fork_trampoline+0xe It's this lock: void xprt_active(SVCXPRT *xprt) { SVCGROUP *grp = xprt->xp_group; mtx_lock(&grp->sg_lock); Do you have any suggestions? Thank you!
Andriy Gapon, did your crash occur when the machine was being shut down (or the nfsd threads were being killed off)? If not, it is not caused by what these patches were intended for. (They are in 10.3.) Your crash basically indicates that either: 1 - The sg_group was free'd when a socket upcall was still in progress. Since the sg_group structures aren't free'd until the nfsd threads are killed (shutdown or ??), I don't think this can happen during normal operation. OR 2 - The xprt structure that referenced the sg_group was free'd prematurely and the sg pointer was bogus. If it was #2, I think I can come up with a simple patch to avoid this. (Basically acquire a reference count on the xprt structure during the socket upcall.) rick
(In reply to Rick Macklem from comment #18) > Andriy Gapon, did your crash occur when the machine was being shut down (or the nfsd threads were being killed off)? Yes, it did. I am told that this was easy to reproduce by SIGKILL to nfsd.
Well, you should never kill the nfsd with SIGKILL. The way that is intended to be "safe" is to send SIGUSR1 to the "(master)" nfsd. It will shut things down and kill off the other threads. I'm not sure that I know of any way to make SIGKILL safe. I'm pretty sure "man nfsd" says this, but maybe it needs more emphasis? I also believe that /etc/rc.d/nfsd does this correctly. rick
Oh, and if you get a crash when shutting down the nfsd threads via "kill -USR1 <pid-master-nfsd>", then let us know, since something is still broken. (If you are curious, the "(master)" nfsd is the one that gets new TCP connections and it must stop doing that before the kernel threads are terminated. Otherwise you can easily get a socket upcall after the data structures have been free'd.) rick
(In reply to Rick Macklem from comment #20) Rick, thank you very much for the explanation! I knew that nfsd processes were special as they 'lend their stacks to kernel' or something like that. But I failed to realise that that put restrictions in the signals as well. I should also explain that kill -9 was not used to shutdown nfsd or as a replacement for the normal nfsd management. It was used just to demonstrate the problem. I think that originally the problem happened when gdb was used on an nfsd process. I understand that the nfsd processes are special. But the situation seems to be a bit fragile. The current design is old and proven. But perhaps we could switch to using kernel processes or maybe we could mark the nfsd processes with a special flag somehow as to prevent them being killed SIGKILL or stopped with SIGSTOP (i.e. prevent normal signal delivery for all signals). Lastly, just to clarify, should we avoid using debuggers / SIGSTOP with nfsd?
Created attachment 179512 [details] add svcpool_close so that svcpool_destroy doesn't get called when nfsd threads are killed This patch adds a new function to the server krpc called svcpool_close(). It is similar to svcpool_destroy(), but does not free the data structures, so that the pool can be used again. This function is then used instead of svcpool_destroy(), svcpool_create() when the nfsd threads are killed. These crashes are caused because the data structures were free'd by svcpool_destroy() when the nfsd threads were killed off (or signalled somehow). By avoiding the svcpool_destroy() call, the crashes should be avoided.
Please test the patch I just attached. (4th one) I think it might make the code less fragile to nfsd threads being signalled. I have not been able to create a crash with the patch during limited testing. Since avg@'s crash occurred with the other patches, I have reopened the PR.
Created attachment 179661 [details] add svcpool_close (cleaned up version) This patch implements the same logic as 179512, but is cleaned up by factoring out the code common to svcpool_destroy() and svcpool_close() and placing it in svcpool_cleanup(). Semantically equivalent to 179512 for testing.
(In reply to Rick Macklem from comment #25) Rick, thank you very much! We tested this patch and it makes nfsd more robust with respect to SIGKILL. We haven't seen any regressions.
A commit references this bug: Author: avg Date: Tue Feb 14 17:49:08 UTC 2017 New revision: 313735 URL: https://svnweb.freebsd.org/changeset/base/313735 Log: add svcpool_close to handle killed nfsd threads This patch adds a new function to the server krpc called svcpool_close(). It is similar to svcpool_destroy(), but does not free the data structures, so that the pool can be used again. This function is then used instead of svcpool_destroy(), svcpool_create() when the nfsd threads are killed. PR: 204340 Reported by: Panzura Approved by: rmacklem Obtained from: rmacklem MFC after: 1 week Changes: head/sys/fs/nfsserver/nfs_nfsdkrpc.c head/sys/rpc/svc.c head/sys/rpc/svc.h
A commit references this bug: Author: avg Date: Tue Feb 21 09:29:11 UTC 2017 New revision: 314033 URL: https://svnweb.freebsd.org/changeset/base/314033 Log: MFC r313735: add svcpool_close to handle killed nfsd threads PR: 204340 Reported by: Panzura Approved by: rmacklem Obtained from: rmacklem Changes: _U stable/11/ stable/11/sys/fs/nfsserver/nfs_nfsdkrpc.c stable/11/sys/rpc/svc.c stable/11/sys/rpc/svc.h
A commit references this bug: Author: avg Date: Tue Feb 21 09:29:47 UTC 2017 New revision: 314034 URL: https://svnweb.freebsd.org/changeset/base/314034 Log: MFC r313735: add svcpool_close to handle killed nfsd threads PR: 204340 Reported by: Panzura Reviewed by: rmacklem Approved by: rmacklem Changes: _U stable/10/ stable/10/sys/fs/nfsserver/nfs_nfsdkrpc.c stable/10/sys/rpc/svc.c stable/10/sys/rpc/svc.h
Rick, thank you again!
And thanks go to you for doing the testing and commits.