The following panic happens on stable/11, r319654, after creating a vale(4) port as byhve(8)-virtio backend, but only if kernel was compiled with WITNESS and INVARIANTS options: panic: mutex nm_kn_lock not owned at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2169 cpuid = 6 KDB: stack backtrace: #0 0xffffffff805bf327 at kdb_backtrace+0x67 #1 0xffffffff8057f266 at vpanic+0x186 #2 0xffffffff8057f2e3 at panic+0x43 #3 0xffffffff805603d4 at __mtx_assert+0xb4 #4 0xffffffff805395f0 at knlist_add+0x20 #5 0xffffffff80418d80 at netmap_kqfilter+0x110 #6 0xffffffff8045a667 at devfs_kqfilter_f+0x77 #7 0xffffffff8053787b at kqueue_register+0x78b #8 0xffffffff805382a2 at kqueue_kevent+0x92 #9 0xffffffff805381a6 at kern_kevent_fp+0x96 #10 0xffffffff805380bf at kern_kevent+0x9f #11 0xffffffff80537ec8 at sys_kevent+0x138 #12 0xffffffff80875b7a at amd64_syscall+0x57a #13 0xffffffff8085965b at Xfast_syscall+0xfb (kgdb) bt #0 doadump (textdump=<value optimized out>) at pcpu.h:222 #1 0xffffffff8057ece0 in kern_reboot (howto=260) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:366 #2 0xffffffff8057f2a0 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:759 #3 0xffffffff8057f2e3 in panic (fmt=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:690 #4 0xffffffff805603d4 in __mtx_assert (c=0x0, what=0, file=0x0, line=0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_mutex.c:1000 #5 0xffffffff805395f0 in knlist_add (knl=0xfffffe0008a6c448, kn=0xfffff801be72ee80, islocked=1) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2089 #6 0xffffffff80418d80 in netmap_kqfilter (dev=<value optimized out>, kn=0xfffff801be72ee80) at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:781 #7 0xffffffff8045a667 in devfs_kqfilter_f (fp=0xfffff8002441d3c0, kn=0xfffff801be72ee80) at /usr/local/share/deploy-tools/RELENG_11/src/sys/fs/devfs/devfs_vnops.c:837 #8 0xffffffff8053787b in kqueue_register (kq=0xfffff8001a65ba00, kev=0xfffffe0446ceb650, td=0xfffff8003d11e560, waitok=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1334 #9 0xffffffff805382a2 in kqueue_kevent (kq=0xfffff8001a65ba00, td=0xfffff8003d11e560, nchanges=4, nevents=<value optimized out>, k_ops=0xfffffe0446ceb8a0, timeout=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1019 #10 0xffffffff805381a6 in kern_kevent_fp (td=0xfffff8003d11e560, fp=<value optimized out>, nchanges=4, nevents=<value optimized out>, k_ops=<value optimized out>, timeout=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1050 #11 0xffffffff805380bf in kern_kevent (td=0xfffff8003d11e560, fd=7, nchanges=4, nevents=0, k_ops=0xfffffe0446ceb8a0, timeout=0x0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:993 #12 0xffffffff80537ec8 in sys_kevent (td=0xfffff8003d11e560, uap=0xfffffe0446ceba30) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:925 #13 0xffffffff80875b7a in amd64_syscall (td=0xfffff8003d11e560, traced=0) at subr_syscall.c:135 #14 0xffffffff8085965b in Xfast_syscall () at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:396 #15 0x000000080122713a in ?? () Previous frame inner to this frame (corrupt stack?) Without WITNESS and INVARIANTS, the bhyve guest gets booted and runs fine up to some point. Reported via ML also for HEAD netmap code merged to stable, but this trace is from stable/11 netmap code! (https://lists.freebsd.org/pipermail/freebsd-net/2017-June/048124.html) -harry
Vincezzo Maffione already tried to find a solution, but he mentioned that help from kevent experts was needed. (You can find the discussion and tested fixes here https://github.com/luigirizzo/netmap/issues/318) Thanks, -harry
Like kib@ suggested in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=206053, I already tried to replacing 1 with 0 in the sys/dev/netmap/netmap_freebsd.c:netmap_kqfilter() at the call to knlist_add(), which does indeed prevent the initial panic (when the guest boots), but ends up in another panic short afterwards: (quoting my github report) panic: mutex nm_kn_lock owned at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2176 cpuid = 6 KDB: stack backtrace: #0 0xffffffff805cb1a7 at kdb_backtrace+0x67 #1 0xffffffff8058b0e6 at vpanic+0x186 #2 0xffffffff8058b163 at panic+0x43 #3 0xffffffff8056c216 at __mtx_assert+0xc6 now: #4 0xffffffff80545219 at knote+0x39 #5 0xffffffff8041ebc7 at nm_os_selwakeup+0x87 #6 0xffffffff8041c94d at netmap_notify+0x1d #7 0xffffffff8041c701 at netmap_poll+0x821 #8 0xffffffff8041f63c at netmap_knrw+0x6c #9 0xffffffff805443d7 at kqueue_kevent+0x397 #10 0xffffffff80543fd6 at kern_kevent_fp+0x96 #11 0xffffffff80543eef at kern_kevent+0x9f #12 0xffffffff80543cf8 at sys_kevent+0x138 #13 0xffffffff80881b5a at amd64_syscall+0x57a #14 0xffffffff8086563b at Xfast_syscall+0xfb #0 doadump (textdump=<value optimized out>) at pcpu.h:222 #1 0xffffffff8058ab60 in kern_reboot (howto=260) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:366 #2 0xffffffff8058b120 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:759 #3 0xffffffff8058b163 in panic (fmt=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_shutdown.c:690 #4 0xffffffff8056c216 in __mtx_assert (c=<value optimized out>, what=<value optimized out>, file=<value optimized out>, line=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_mutex.c:1013 #5 0xffffffff80545219 in knote (list=<value optimized out>, hint=256, lockflags=0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:2034 #6 0xffffffff8041ebc7 in nm_os_selwakeup (si=0xfffffe00088b94c0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:1259 #7 0xffffffff8041c94d in netmap_notify (kring=<value optimized out>, flags=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap.c:2777 #8 0xffffffff8041c701 in netmap_poll (priv=<value optimized out>, events=<value optimized out>, sr=0x0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap.c:2735 #9 0xffffffff8041f63c in netmap_knrw (kn=<value optimized out>, hint=<value optimized out>, events=1) at /usr/local/share/deploy-tools/RELENG_11/src/sys/dev/netmap/netmap_freebsd.c:1313 #10 0xffffffff805443d7 in kqueue_kevent (kq=0xfffff8001aa08600, td=0xfffff80141470560, nchanges=<value optimized out>, nevents=<value optimized out>, k_ops=0xfffffe045b5028a0, timeout=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1700 #11 0xffffffff80543fd6 in kern_kevent_fp (td=0xfffff80141470560, fp=<value optimized out>, nchanges=0, nevents=<value optimized out>, k_ops=<value optimized out>, timeout=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:1050 #12 0xffffffff80543eef in kern_kevent (td=0xfffff80141470560, fd=6, nchanges=0, nevents=64, k_ops=0xfffffe045b5028a0, timeout=0x0) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:993 #13 0xffffffff80543cf8 in sys_kevent (td=0xfffff80141470560, uap=0xfffffe045b502a30) at /usr/local/share/deploy-tools/RELENG_11/src/sys/kern/kern_event.c:925 #14 0xffffffff80881b5a in amd64_syscall (td=0xfffff80141470560, traced=0) at subr_syscall.c:135 #15 0xffffffff8086563b in Xfast_syscall () at /usr/local/share/deploy-tools/RELENG_11/src/sys/amd64/amd64/exception.S:396 #16 0x000000080122813a in ?? ()
I still have to setup a test environment for this bug, but please note that now stable/11 contains the latest netmap code (as well as stable/12 and HEAD).
Created attachment 201116 [details] kernel trace
(In reply to Vincenzo Maffione from comment #3) I can reproduce the panic with HEAD r342550 (two bhyve vm + VALE switch).
This patch fix the panic: Index: sys/dev/netmap/netmap_freebsd.c =================================================================== --- sys/dev/netmap/netmap_freebsd.c (revision 342550) +++ sys/dev/netmap/netmap_freebsd.c (working copy) @@ -1321,7 +1321,12 @@ /* use a non-zero hint to tell the notification from the * call done in kqueue_scan() which uses 0 */ - KNOTE_UNLOCKED(&si->si.si_note, 0x100 /* notification */); + + if (mtx_owned(&si->m)) { + KNOTE_LOCKED(&si->si.si_note, 0x100 /* notification */); + } else { + KNOTE_UNLOCKED(&si->si.si_note, 0x100 /* notification */); + } } void It seems there are several paths to call the function nm_os_selwakeup(): Trace from bhyve: dtrace: script './nm_os_selwakeup.d' matched 1 probe CPU ID FUNCTION:NAME 18 39521 nm_os_selwakeup:entry kernel`netmap_notify+0x1d kernel`nm_vale_flush+0xb70 kernel`netmap_vale_vp_txsync+0x51d kernel`netmap_ioctl+0x4e1 kernel`freebsd_netmap_ioctl+0x88 kernel`devfs_ioctl+0xca kernel`VOP_IOCTL_APV+0x63 kernel`vn_ioctl+0x124 kernel`devfs_ioctl_f+0x1f kernel`kern_ioctl+0x29b kernel`sys_ioctl+0x15d kernel`amd64_syscall+0x276 kernel`0xffffffff8107a81d 20 39521 nm_os_selwakeup:entry kernel`netmap_notify+0x1d kernel`netmap_poll+0x5e6 kernel`netmap_knrw+0x6f kernel`kqueue_kevent+0x56d kernel`kern_kevent_fp+0x97 kernel`kern_kevent+0x9f kernel`kern_kevent_generic+0x70 kernel`sys_kevent+0x61 kernel`amd64_syscall+0x276 kernel`0xffffffff8107a81d The first does not hold the mutex ("nm_kn_lock"), the second does.
bhyve trace with userstack: dtrace: script './nm_os_selwakeup.d' matched 1 probe CPU ID FUNCTION:NAME 20 39521 nm_os_selwakeup:entry kernel`netmap_notify+0x1d kernel`nm_vale_flush+0xb70 kernel`netmap_vale_vp_txsync+0x51d kernel`netmap_ioctl+0x4e1 kernel`freebsd_netmap_ioctl+0x88 kernel`devfs_ioctl+0xca kernel`VOP_IOCTL_APV+0x63 kernel`vn_ioctl+0x124 kernel`devfs_ioctl_f+0x1f kernel`kern_ioctl+0x29b kernel`sys_ioctl+0x15d kernel`amd64_syscall+0x276 kernel`0xffffffff8107a81d libc.so.7`ioctl+0xa bhyve`pci_vtnet_netmap_tx+0x91 bhyve`pci_vtnet_proctx+0x1a4 bhyve`pci_vtnet_tx_thread+0x1ce libthr.so.3`0x800689756 19 39521 nm_os_selwakeup:entry kernel`netmap_notify+0x1d kernel`netmap_poll+0x5e6 kernel`netmap_knrw+0x6f kernel`kqueue_kevent+0x56d kernel`kern_kevent_fp+0x97 kernel`kern_kevent+0x9f kernel`kern_kevent_generic+0x70 kernel`sys_kevent+0x61 kernel`amd64_syscall+0x276 kernel`0xffffffff8107a81d libc.so.7`_kevent+0xa bhyve`mevent_dispatch+0x2d4 bhyve`main+0x848 bhyve`_start+0x11b ld-elf.so.1`0x80029c000
Just a quick note that I'm unfortunately busy with completely different things :-( And for the records, here's Vincenzo's approach which I usesd (successfully as far as I remember) back in 11.2: +++ src/sys/dev/netmap/netmap_freebsd.c 2017-06-17 11:22:53.685116000 +0200 @@ -1374,7 +1374,7 @@ kn->kn_fop = (ev == EVFILT_WRITE) ? &netmap_wfiltops : &netmap_rfiltops; kn->kn_hook = priv; - knlist_add(&si->si.si_note, kn, 1); + knlist_add(&si->si.si_note, kn, 0); // XXX unlock(priv) ND("register %p %s td %p priv %p kn %p np_nifp %p kn_fp/fpop %s", na, na->ifp->if_xname, curthread, priv, kn, No skills to contribute and no time to test at the moment, sorry. Thanks for taking care, will get back to it :-) -harry
Hi Aleksandr, Thanks a lot for the debugging and the fix. Your patch looks the correct solution to this problem. I tried to make it more compact here https://reviews.freebsd.org/D18876 What do you think? Could you test it? Thanks, Vincenzo
This has been fixed by r343344.