Summary: | [panic] [kevent] mutex nm_kn_lock not owned, netmap with INVARIANTS+WITNESS | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Harald Schmalzbauer <bugzilla.freebsd> | ||||
Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> | ||||
Status: | Closed FIXED | ||||||
Severity: | Affects Some People | CC: | afedorov, vmaffione | ||||
Priority: | --- | Keywords: | patch | ||||
Version: | 11.0-STABLE | ||||||
Hardware: | Any | ||||||
OS: | Any | ||||||
See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206053 | ||||||
Attachments: |
|
Description
Harald Schmalzbauer
2017-06-07 16:24:09 UTC
Vincezzo Maffione already tried to find a solution, but he mentioned that help from kevent experts was needed. (You can find the discussion and tested fixes here https://github.com/luigirizzo/netmap/issues/318) Thanks, -harry Like kib@ suggested in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206053, I already tried to replacing 1 with 0 in the sys/dev/netmap/netmap_freebsd.c:netmap_kqfilter() at the call to knlist_add(), which does indeed prevent the initial panic (when the guest boots), but ends up in another panic short afterwards: (quoting my github report) panic: mutex nm_kn_lock owned at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2176 cpuid = 6 KDB: stack backtrace: #0 0xffffffff805cb1a7 at kdb_backtrace+0x67 #1 0xffffffff8058b0e6 at vpanic+0x186 #2 0xffffffff8058b163 at panic+0x43 #3 0xffffffff8056c216 at __mtx_assert+0xc6 now: #4 0xffffffff80545219 at knote+0x39 #5 0xffffffff8041ebc7 at nm_os_selwakeup+0x87 #6 0xffffffff8041c94d at netmap_notify+0x1d #7 0xffffffff8041c701 at netmap_poll+0x821 #8 0xffffffff8041f63c at netmap_knrw+0x6c #9 0xffffffff805443d7 at kqueue_kevent+0x397 #10 0xffffffff80543fd6 at kern_kevent_fp+0x96 #11 0xffffffff80543eef at kern_kevent+0x9f #12 0xffffffff80543cf8 at sys_kevent+0x138 #13 0xffffffff80881b5a at amd64_syscall+0x57a #14 0xffffffff8086563b at Xfast_syscall+0xfb #0 doadump (textdump=<value optimized out>) at pcpu.h:222 #1 0xffffffff8058ab60 in kern_reboot (howto=260) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:366 #2 0xffffffff8058b120 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:759 #3 0xffffffff8058b163 in panic (fmt=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:690 #4 0xffffffff8056c216 in __mtx_assert (c=<value optimized out>, what=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_mutex.c:1013 #5 0xffffffff80545219 in knote (list=<value optimized out>, hint=256, lockflags=0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2034 #6 0xffffffff8041ebc7 in nm_os_selwakeup (si=0xfffffe00088b94c0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:1259 #7 0xffffffff8041c94d in netmap_notify (kring=<value optimized out>, flags=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap.c:2777 #8 0xffffffff8041c701 in netmap_poll (priv=<value optimized out>, events=<value optimized out>, sr=0x0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap.c:2735 #9 0xffffffff8041f63c in netmap_knrw (kn=<value optimized out>, hint=<value optimized out>, events=1) at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:1313 #10 0xffffffff805443d7 in kqueue_kevent (kq=0xfffff8001aa08600, td=0xfffff80141470560, nchanges=<value optimized out>, nevents=<value optimized out>, k_ops=0xfffffe045b5028a0, timeout=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1700 #11 0xffffffff80543fd6 in kern_kevent_fp (td=0xfffff80141470560, fp=<value optimized out>, nchanges=0, nevents=<value optimized out>, k_ops=<value optimized out>, timeout=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1050 #12 0xffffffff80543eef in kern_kevent (td=0xfffff80141470560, fd=6, nchanges=0, nevents=64, k_ops=0xfffffe045b5028a0, timeout=0x0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:993 #13 0xffffffff80543cf8 in sys_kevent (td=0xfffff80141470560, uap=0xfffffe045b502a30) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:925 #14 0xffffffff80881b5a in amd64_syscall (td=0xfffff80141470560, traced=0) at subr_syscall.c:135 #15 0xffffffff8086563b in Xfast_syscall () at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:396 #16 0x000000080122813a in ?? () I still have to setup a test environment for this bug, but please note that now stable/11 contains the latest netmap code (as well as stable/12 and HEAD). Created attachment 201116 [details]
kernel trace
(In reply to Vincenzo Maffione from comment #3) I can reproduce the panic with HEAD r342550 (two bhyve vm + VALE switch). This patch fix the panic: Index: sys/dev/netmap/netmap_freebsd.c =================================================================== --- sys/dev/netmap/netmap_freebsd.c (revision 342550) +++ sys/dev/netmap/netmap_freebsd.c (working copy) @@ -1321,7 +1321,12 @@ /* use a non-zero hint to tell the notification from the * call done in kqueue_scan() which uses 0 */ - KNOTE_UNLOCKED(&si->si.si_note, 0x100 /* notification */); + + if (mtx_owned(&si->m)) { + KNOTE_LOCKED(&si->si.si_note, 0x100 /* notification */); + } else { + KNOTE_UNLOCKED(&si->si.si_note, 0x100 /* notification */); + } } void It seems there are several paths to call the function nm_os_selwakeup(): Trace from bhyve: dtrace: script './nm_os_selwakeup.d' matched 1 probe CPU ID FUNCTION:NAME 18 39521 nm_os_selwakeup:entry kernel`netmap_notify+0x1d kernel`nm_vale_flush+0xb70 kernel`netmap_vale_vp_txsync+0x51d kernel`netmap_ioctl+0x4e1 kernel`freebsd_netmap_ioctl+0x88 kernel`devfs_ioctl+0xca kernel`VOP_IOCTL_APV+0x63 kernel`vn_ioctl+0x124 kernel`devfs_ioctl_f+0x1f kernel`kern_ioctl+0x29b kernel`sys_ioctl+0x15d kernel`amd64_syscall+0x276 kernel`0xffffffff8107a81d 20 39521 nm_os_selwakeup:entry kernel`netmap_notify+0x1d kernel`netmap_poll+0x5e6 kernel`netmap_knrw+0x6f kernel`kqueue_kevent+0x56d kernel`kern_kevent_fp+0x97 kernel`kern_kevent+0x9f kernel`kern_kevent_generic+0x70 kernel`sys_kevent+0x61 kernel`amd64_syscall+0x276 kernel`0xffffffff8107a81d The first does not hold the mutex ("nm_kn_lock"), the second does. bhyve trace with userstack: dtrace: script './nm_os_selwakeup.d' matched 1 probe CPU ID FUNCTION:NAME 20 39521 nm_os_selwakeup:entry kernel`netmap_notify+0x1d kernel`nm_vale_flush+0xb70 kernel`netmap_vale_vp_txsync+0x51d kernel`netmap_ioctl+0x4e1 kernel`freebsd_netmap_ioctl+0x88 kernel`devfs_ioctl+0xca kernel`VOP_IOCTL_APV+0x63 kernel`vn_ioctl+0x124 kernel`devfs_ioctl_f+0x1f kernel`kern_ioctl+0x29b kernel`sys_ioctl+0x15d kernel`amd64_syscall+0x276 kernel`0xffffffff8107a81d libc.so.7`ioctl+0xa bhyve`pci_vtnet_netmap_tx+0x91 bhyve`pci_vtnet_proctx+0x1a4 bhyve`pci_vtnet_tx_thread+0x1ce libthr.so.3`0x800689756 19 39521 nm_os_selwakeup:entry kernel`netmap_notify+0x1d kernel`netmap_poll+0x5e6 kernel`netmap_knrw+0x6f kernel`kqueue_kevent+0x56d kernel`kern_kevent_fp+0x97 kernel`kern_kevent+0x9f kernel`kern_kevent_generic+0x70 kernel`sys_kevent+0x61 kernel`amd64_syscall+0x276 kernel`0xffffffff8107a81d libc.so.7`_kevent+0xa bhyve`mevent_dispatch+0x2d4 bhyve`main+0x848 bhyve`_start+0x11b ld-elf.so.1`0x80029c000 Just a quick note that I'm unfortunately busy with completely different things :-( And for the records, here's Vincenzo's approach which I usesd (successfully as far as I remember) back in 11.2: +++ src/sys/dev/netmap/netmap_freebsd.c 2017-06-17 11:22:53.685116000 +0200 @@ -1374,7 +1374,7 @@ kn->kn_fop = (ev == EVFILT_WRITE) ? &netmap_wfiltops : &netmap_rfiltops; kn->kn_hook = priv; - knlist_add(&si->si.si_note, kn, 1); + knlist_add(&si->si.si_note, kn, 0); // XXX unlock(priv) ND("register %p %s td %p priv %p kn %p np_nifp %p kn_fp/fpop %s", na, na->ifp->if_xname, curthread, priv, kn, No skills to contribute and no time to test at the moment, sorry. Thanks for taking care, will get back to it :-) -harry Hi Aleksandr, Thanks a lot for the debugging and the fix. Your patch looks the correct solution to this problem. I tried to make it more compact here https://reviews.freebsd.org/D18876 What do you think? Could you test it? Thanks, Vincenzo This has been fixed by r343344. |