I have experienced multiple hard reboots of my FreeBSD 11.2-RELEASE system, occurring when I try to deactivate a WireGuard interface with wg-quick. This does not always happen; on occasion I am able to activate and deactivate all interfaces without issue. I have been unable to determine which specific conditions cause the hard reboot.

After creating/configuring the appropriate keypairs and configuration files, a new interface is created on a specific routing table with:

# setfib $FIB route add default $DEFAULTGATEWAY
# setfib $FIB wg-quick up wg$N

After the connection has been active for a short period, it is deactivated with:

# setfib $FIB wg-quick down wg$N

The last few messages seen in debug.log prior to the reboot are of the form:

Dec 12 10:50:16 $hostname kernel: ifa_maintain_loopback_route: deletion failed for interface wg0: 3

The configuration files are simple:

[Interface]
Address = ${PRIVATE_IP}/24
PrivateKey = $PRIVATEKEY
DNS = 127.0.0.1

[Peer]
PublicKey = $PUBLICKEY
Endpoint = ${PUBLIC_IP}:51820
AllowedIPs = 0.0.0.0/0
PersistentKeepalive = 30
There is currently a report at OPNsense, including a stacktrace, which seems to show a kernel panic in UFS code: https://github.com/opnsense/plugins/pull/1049

I tried to reproduce it in multiple environments, but without luck so far. It would help a lot if you could check whether you have a kernel crashdump in /var/crash and can obtain a stacktrace. The handbook has instructions: https://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html
If it helps, I am using ZFS-on-root; I do not have any UFS filesystems.

I have checked /var/crash and the directory is empty apart from a 'minfree' file containing only the text "2048". 'dumpdev' was set to "AUTO" in my /etc/rc.conf; 'dumpdir' was not defined, so I have now defined it as '/var/crash' and set the permissions to 700. I will check for kernel dumps if another reboot occurs.

I did notice that if I deactivated the WireGuard interfaces almost immediately after activation, they would deactivate without issue. Otherwise, when running about a dozen WireGuard instances that have been active for more than a few minutes, deactivating the interfaces sequentially could result in a hard reboot in an unpredictable manner: some interfaces would deactivate fine, but one would cause a hard reboot.
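For anyone else setting up crash dump collection, the knobs mentioned above amount to roughly the following (a sketch based on this comment and on dumpon(8)/savecore(8); adjust paths to taste):

```shell
# /etc/rc.conf -- minimal kernel crash dump setup (sketch)
dumpdev="AUTO"          # let rc(8) pick a suitable swap device for dumps at boot
dumpdir="/var/crash"    # where savecore(8) writes vmcore.N after the reboot

# After the next panic, something like this should yield a backtrace:
#   kgdb /boot/kernel/kernel /var/crash/vmcore.0
#   (kgdb) bt
```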
I have seen the reboot happen twice more; still nothing in /var/crash.
Here's a new report with a bit of crashdump:

https://twitter.com/genneko217/status/1090218028480921600
https://gist.github.com/genneko/755f6160ba2594c5945b8fc18940ea71

HTH
Michael
This appears to be a kernel bug rather than an issue with the port itself.
Looks like the issue is seen on both 11.2 and 12.0?
I was able to reproduce on 11.2 and 12.0; sometimes it takes up to 100 daemon restarts, sometimes it happens after the second one.
I see this on FreeBSD 12.0-RELEASE-p3 without setfib as well. Every 'wg-quick down wg0' command has resulted in a hard reboot.

wireguard-0.0.20190123
wireguard-go-0.0.20181222
12-STABLE panicked once for me while deactivating a wireguard interface (in fib 0). I haven't seen such an issue on 13-STABLE yet, despite the fact that I use wireguard there more often. I can provide neither crash dumps nor additional information.
(In reply to Marek Zarychta from comment #9) 13-CURRENT not STABLE of course
This stacktrace is from https://gist.github.com/genneko/755f6160ba2594c5945b8fc18940ea71 and I copied it here in case it vanishes on github.

dumped core - see /var/crash/vmcore.0

Tue Jan 29 11:09:03 UTC 2019

FreeBSD 12.0-RELEASE-p2 FreeBSD 12.0-RELEASE-p2 GENERIC amd64

panic: page fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
<6>in_scrubprefix: err=65, prefix delete failed
<6>wg0: deletion failed: 3

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80cc3fe3
stack pointer           = 0x28:0xfffffe001de86300
frame pointer           = 0x28:0xfffffe001de86450
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3813 (wireguard-go)
trap number             = 12
panic: page fault
cpuid = 1
time = 1548760075
KDB: stack backtrace:
#0 0xffffffff80be7977 at kdb_backtrace+0x67
#1 0xffffffff80b9b563 at vpanic+0x1a3
#2 0xffffffff80b9b3b3 at panic+0x43
#3 0xffffffff8107496f at trap_fatal+0x35f
#4 0xffffffff810749c9 at trap_pfault+0x49
#5 0xffffffff81073fee at trap+0x29e
#6 0xffffffff8104f315 at calltrap+0x8
#7 0xffffffff80de0f73 at in6_purgeaddr+0x463
#8 0xffffffff80c9662f at if_purgeaddrs+0x21f
#9 0xffffffff80ca79c1 at tunclose+0x1f1
#10 0xffffffff80a518ca at devfs_close+0x3ba
#11 0xffffffff811f89b8 at VOP_CLOSE_APV+0x78
#12 0xffffffff80c7b6bf at vn_close1+0xdf
#13 0xffffffff80c7a3c0 at vn_closefile+0x50
#14 0xffffffff80a5224c at devfs_close_f+0x2c
#15 0xffffffff80b4363a at _fdrop+0x1a
#16 0xffffffff80b466e4 at closef+0x244
#17 0xffffffff80b43b69 at closefp+0x99
Uptime: 5m14s
Dumping 190 out of 2005 MB:..9%..17%..26%..34%..43%..51%..68%..76%..85%..93%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/modules/vboxguest.ko...done.
Loaded symbols for /boot/modules/vboxguest.ko
Reading symbols from /boot/kernel/intpm.ko...Reading symbols from /usr/lib/debug//boot/kernel/intpm.ko.debug...done.
done.
Loaded symbols for /boot/kernel/intpm.ko
Reading symbols from /boot/kernel/smbus.ko...Reading symbols from /usr/lib/debug//boot/kernel/smbus.ko.debug...done.
done.
Loaded symbols for /boot/kernel/smbus.ko
#0  doadump (textdump=<value optimized out>) at pcpu.h:230
230     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:230
#1  0xffffffff80b9b14b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:446
#2  0xffffffff80b9b5c3 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:872
#3  0xffffffff80b9b3b3 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:799
#4  0xffffffff8107496f in trap_fatal (frame=0xfffffe001de86240, eva=0) at /usr/src/sys/amd64/amd64/trap.c:929
#5  0xffffffff810749c9 in trap_pfault (frame=0xfffffe001de86240, usermode=0) at pcpu.h:230
#6  0xffffffff81073fee in trap (frame=0xfffffe001de86240) at /usr/src/sys/amd64/amd64/trap.c:441
#7  0xffffffff8104f315 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
#8  0xffffffff80cc3fe3 in rtsock_addrmsg (cmd=2, ifa=0xfffff80062f71200, fibnum=-1) at /usr/src/sys/net/rtsock.c:1337
#9  0xffffffff80de0f73 in in6_purgeaddr (ifa=0xfffff80062f71200) at /usr/src/sys/netinet6/in6.c:193
#10 0xffffffff80c9662f in if_purgeaddrs (ifp=0xfffff80062845000) at /usr/src/sys/net/if.c:995
#11 0xffffffff80ca79c1 in tunclose (dev=<value optimized out>, foo=<value optimized out>, bar=<value optimized out>, td=<value optimized out>) at /usr/src/sys/net/if_tun.c:478
#12 0xffffffff80a518ca in devfs_close (ap=<value optimized out>) at /usr/src/sys/fs/devfs/devfs_vnops.c:650
#13 0xffffffff811f89b8 in VOP_CLOSE_APV (vop=<value optimized out>, a=0xfffffe001de86788) at vnode_if.c:534
#14 0xffffffff80c7b6bf in vn_close1 (vp=0xfffff8006291ad20, flags=7, file_cred=0xfffff80062849a00, td=0xfffff8001da8d000, keep_ref=false) at vnode_if.h:225
#15 0xffffffff80c7a3c0 in vn_closefile (fp=0xfffff8006a031050, td=<value optimized out>) at /usr/src/sys/kern/vfs_vnops.c:1563
#16 0xffffffff80a5224c in devfs_close_f (fp=0xfffff8006a031050, td=<value optimized out>) at /usr/src/sys/fs/devfs/devfs_vnops.c:669
#17 0xffffffff80b4363a in _fdrop (fp=0xfffff8006a031050, td=<value optimized out>) at file.h:353
#18 0xffffffff80b466e4 in closef (fp=0xfffff8006a031050, td=0xfffff8001da8d000) at /usr/src/sys/kern/kern_descrip.c:2528
#19 0xffffffff80b43b69 in closefp (fdp=0xfffff8006a04d450, fd=<value optimized out>, fp=0xfffff8006a031050, td=0xfffff8001da8d000, holdleaders=0) at /usr/src/sys/kern/kern_descrip.c:1199
#20 0xffffffff81075449 in amd64_syscall (td=0xfffff8001da8d000, traced=0) at subr_syscall.c:135
#21 0xffffffff8104fbfd in fast_syscall_common () at /usr/src/sys/amd64/amd64/exception.S:504
#22 0x000000000048bdb0 in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb)
Set version to the original (and earliest) version the issue was identified in.
(In reply to Kubilay Kocak from comment #12)

IMO tagging an issue with the earliest version where it's reproducible, rather than the latest, is likely to somewhat reduce the likelihood that it is noticed/triaged/addressed by the appropriate developer. (For this specific PR it doesn't matter, as that's already happened.)
Can somebody test this and see if it "fixes" the issue: https://git.zx2c4.com/wireguard-go/patch/?id=3fafe92382d6231ee066f62ac946fbc909aeac5d This obviously doesn't address a rather grave kernel bug, but perhaps it's enough to work around the issue for now.
Sorry, wrong link. Try this, rather: https://git.zx2c4.com/wireguard-go/patch/?h=jd/rancid-freebsd-hack
(In reply to Jason A. Donenfeld from comment #15)

As I've been recently playing with WireGuard on FreeBSD again, I quickly tested the patch on a 4-core FreeBSD 12.0p3 VM and found it almost worked around the kernel issue. With the patched wireguard-go, only 2 out of 25000+ "service wireguard restart" invocations caused a kernel panic, while a panic occurred every 5 to 50 restarts without the patch.

As a side note, I also noticed in my recent testing:
- No kernel panic on single-core FreeBSD 12.0p3 / 13-CURRENT VMs with the unpatched wireguard-go-0.0.20181222 / 20190409 and 10000+ restarts.
- No kernel panic on a 4-core FreeBSD 13-CURRENT r346132 VM with the unpatched wireguard-go-0.0.20190409 and 40000+ restarts.

A stacktrace of the panic with the patch is as follows. (Panics without the patch are the same as the one mentioned in comment #4 and #11.) Hope this helps.

dumped core - see /var/crash/vmcore.2

Sat Apr 20 12:07:44 UTC 2019

FreeBSD 12.0-RELEASE-p3 FreeBSD 12.0-RELEASE-p3 GENERIC amd64

panic: page fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: page fault
cpuid = 1
time = 1555762025
KDB: stack backtrace:
#0 0xffffffff80be7977 at kdb_backtrace+0x67
#1 0xffffffff80b9b563 at vpanic+0x1a3
#2 0xffffffff80b9b3b3 at panic+0x43
#3 0xffffffff8107496f at trap_fatal+0x35f
#4 0xffffffff810749c9 at trap_pfault+0x49
#5 0xffffffff81073fee at trap+0x29e
#6 0xffffffff8104f435 at calltrap+0x8
#7 0xffffffff80ca90d7 at tunifioctl+0x257
#8 0xffffffff80c9a072 at ifhwioctl+0x2f2
#9 0xffffffff80c9c05f at ifioctl+0x45f
#10 0xffffffff80c04f3d at kern_ioctl+0x26d
#11 0xffffffff80c04c5e at sys_ioctl+0x15e
#12 0xffffffff81075449 at amd64_syscall+0x369
#13 0xffffffff8104fd1d at fast_syscall_common+0x101
Uptime: 22m44s
Dumping 171 out of 469 MB:..10%..19%..29%..38%..47%..57%..66%..75%..85%..94%

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /usr/lib/debug//boot/kernel/zfs.ko.debug...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /usr/lib/debug//boot/kernel/opensolaris.ko.debug...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/modules/vboxguest.ko...done.
Loaded symbols for /boot/modules/vboxguest.ko
Reading symbols from /boot/kernel/intpm.ko...Reading symbols from /usr/lib/debug//boot/kernel/intpm.ko.debug...done.
done.
Loaded symbols for /boot/kernel/intpm.ko
Reading symbols from /boot/kernel/smbus.ko...Reading symbols from /usr/lib/debug//boot/kernel/smbus.ko.debug...done.
done.
Loaded symbols for /boot/kernel/smbus.ko
#0  doadump (textdump=<value optimized out>) at pcpu.h:230
230     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump (textdump=<value optimized out>) at pcpu.h:230
#1  0xffffffff80b9b14b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:446
#2  0xffffffff80b9b5c3 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:872
#3  0xffffffff80b9b3b3 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:799
#4  0xffffffff8107496f in trap_fatal (frame=0xfffffe000fe94590, eva=1040) at /usr/src/sys/amd64/amd64/trap.c:929
#5  0xffffffff810749c9 in trap_pfault (frame=0xfffffe000fe94590, usermode=0) at pcpu.h:230
#6  0xffffffff81073fee in trap (frame=0xfffffe000fe94590) at /usr/src/sys/amd64/amd64/trap.c:441
#7  0xffffffff8104f435 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:232
#8  0xffffffff80b7ad4c in __mtx_lock_sleep (c=0xfffff8001045bc98, v=4) at /usr/src/sys/kern/kern_mutex.c:577
#9  0xffffffff80ca90d7 in tunifioctl (ifp=<value optimized out>, cmd=<value optimized out>, data=0xfffff80002f98c00 "wg0") at /usr/src/sys/net/if_tun.c:543
#10 0xffffffff80c9a072 in ifhwioctl (cmd=<value optimized out>, ifp=<value optimized out>, data=<value optimized out>, td=0xfffff80002f22000) at /usr/src/sys/net/if.c:2881
#11 0xffffffff80c9c05f in ifioctl (so=0xfffff8000969b6d0, cmd=3274795323, data=<value optimized out>, td=0xfffff80002f22000) at /usr/src/sys/net/if.c:3086
#12 0xffffffff80c04f3d in kern_ioctl (td=0xfffff80002f22000, fd=3, com=3274795323, data=<value optimized out>) at file.h:330
#13 0xffffffff80c04c5e in sys_ioctl (td=0xfffff80002f22000, uap=0xfffff80002f223c0) at /usr/src/sys/kern/sys_generic.c:712
#14 0xffffffff81075449 in amd64_syscall (td=0xfffff80002f22000, traced=0) at subr_syscall.c:135
#15 0xffffffff8104fd1d in fast_syscall_common () at /usr/src/sys/amd64/amd64/exception.S:504
#16 0x000000080046611a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
(kgdb)
Interesting. That looks like *another*, separate, race. Yikes, this kernel driver... Let's see if this works around it: https://git.zx2c4.com/WireGuard/patch/?id=69ffe5b7f58ce6f55dda2b9e13ff364a0d9b3dcd
(That commit should be combined with the previous one: https://git.zx2c4.com/wireguard-go/patch/?h=jd/rancid-freebsd-hack )
I applied both patches on OPNsense 19.1 (based on HardenedBSD 11.2) and it seems to work. Nice work!
Now it crashed, after around 1000 restarts, sorry.
Stack trace, please.
Alright, I've now spent a bit of time tracking down these race conditions and reproducing them in a VM. It looks like there are two separate kernel race conditions:

- SIOCGIFSTATUS races with SIOCIFDESTROY, which was being triggered by the call to ifconfig(8) in the route monitor script. This should now be fixed by: https://git.zx2c4.com/WireGuard/patch/?id=90c546598c0a9d9da82c138c6c9c1396c453368e

- The asynchronous callback of IPv6 link-local DAD conflicts with both SIOCIFDESTROY and the /dev/tun cloning mechanism, resulting in a wide variety of crashes with dangling pointers on IPv6 address lists. I'm able to trigger this by just running `while true; do ifconfig tun0 create; ifconfig tun0 destroy; done` and after a while there's one sort of crash or another. This should now be fixed by: https://git.zx2c4.com/wireguard-go/patch/?id=bb42ec7d185ab5f5cd3867ac1258edff86b7f307

I'd appreciate it if Michael Muenz could test these and make sure they fix his own reproducer. After that, Bernhard Froehlich can backport those two patches into the ports tree.

Finally, THIS BUG SHOULD REMAIN OPEN until the FreeBSD kernel team actually gets the man power to fix these race conditions; the above only represents a few workarounds and does not address the underlying issue of this bug at all.
Also, given the success of `while true; do ifconfig tun0 create; ifconfig tun0 destroy; done` at crashing the kernel, I'm pretty sure you can remove "triggered by net/wireguard" from the title of the bug report.
Potentially more vicious reproducer:

(while true; do ifconfig tun0 create; ifconfig tun0 destroy; done)&
(while true; do for i in {1..30}; do ifconfig tun0 & done; wait; done)&
(In reply to Jason A. Donenfeld from comment #22)

Thank you for the various workarounds. Those patches work so far. I noticed wg-quick down occasionally hangs at piperd, but that now seems to be patched in upstream master. Really quick! I'm testing it now.
Indeed, those patches I posted here are (already) out of date. But upstream master now has patches that appear to work around the kernel bugs of this report. I think we're done here on the WireGuard front, and it should be time to ship the fixes in the official package. However, do pipe up if you're able to crash things again in relation to WireGuard with yet-even-more FreeBSD kernel race conditions, and I'll take out the hatchet and try to hack around it.
A commit references this bug:

Author: decke
Date: Tue Apr 23 12:33:45 UTC 2019
New revision: 499754
URL: https://svnweb.freebsd.org/changeset/ports/499754

Log:
  net/wireguard: work around numerous kernel panics on shutdown in tun(4)

  There are numerous race conditions. But even this will crash it:
  while true; do ifconfig tun0 create; ifconfig tun0 destroy; done

  It seems like LLv6 is related, which we're not using anyway, so
  explicitly disable it on the interface.

  PR: 233955

Changes:
  head/net/wireguard-go/Makefile
  head/net/wireguard-go/files/
  head/net/wireguard-go/files/patch-bb42ec7d185ab5f5cd3867ac1258edff86b7f307
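For anyone wanting to try the same LLv6 mitigation by hand rather than via the patched port, it presumably amounts to something like the following (a sketch, not taken from the patch; the interface name and the use of the nd6 `auto_linklocal` flag documented in ifconfig(8) are my assumptions):

```shell
# Sketch: prevent IPv6 link-local address auto-configuration on the tunnel
# interface, sidestepping the DAD callback race described above.
# nd6 flag name per FreeBSD ifconfig(8); "wg0" is a hypothetical interface.
ifconfig wg0 inet6 -auto_linklocal
```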
A commit references this bug:

Author: decke
Date: Tue Apr 23 12:36:30 UTC 2019
New revision: 499755
URL: https://svnweb.freebsd.org/changeset/ports/499755

Log:
  net/wireguard: workaround SIOCGIFSTATUS race in FreeBSD kernel

  PR: 233955

Changes:
  head/net/wireguard/Makefile
  head/net/wireguard/files/patch-b3e1a1b07d3631bd816f9bfc27452a89dc29fa28
A commit references this bug:

Author: kevans
Date: Tue Apr 23 17:28:28 UTC 2019
New revision: 346602
URL: https://svnweb.freebsd.org/changeset/base/346602

Log:
  tun(4): Defer clearing TUN_OPEN until much later

  tun destruction will not continue until TUN_OPEN is cleared. There are
  brief moments in tunclose where the mutex is dropped and we've already
  cleared TUN_OPEN, so tun_destroy would be able to proceed while we're in
  the middle of cleaning up the tun still. tun_destroy should be blocked
  until these parts (address/route purges, mostly) are complete.

  PR: 233955
  MFC after: 2 weeks

Changes:
  head/sys/net/if_tun.c
r346602 probably fixes the in6_purgeaddr panic noted here. The tun mtx is dropped while we purge addresses/routes, and the TUN_OPEN flag had already been cleared, giving tun_destroy a chance to rototill the interface somewhere in the middle of that. I've opened https://reviews.freebsd.org/D20027 to try and settle the ioctl race.
A commit references this bug:

Author: kevans
Date: Thu Apr 25 12:44:08 UTC 2019
New revision: 346670
URL: https://svnweb.freebsd.org/changeset/base/346670

Log:
  tun/tap: close race between destroy/ioctl handler

  It seems that there should be a better way to handle this, but this
  seems to be the more common approach and it should likely get replaced
  in all of the places it happens...

  Basically, thread 1 is in the process of destroying the tun/tap while
  thread 2 is executing one of the ioctls that requires the tun/tap mutex
  and the mutex is destroyed before the ioctl handler can acquire it.

  This is only one of the races described/found in PR 233955.

  PR: 233955
  Reviewed by: ae
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D20027

Changes:
  head/sys/net/if_tap.c
  head/sys/net/if_tun.c
> I'm pretty sure you can remove "triggered by net/wireguard" from the title of the bug report.

Indeed, updated. I think it's good to keep wireguard in the title, as it seems to be the common user-facing impact, but the title shouldn't imply that wireguard is somehow at fault.
A commit references this bug:

Author: kevans
Date: Thu May 9 03:51:35 UTC 2019
New revision: 347378
URL: https://svnweb.freebsd.org/changeset/base/347378

Log:
  MFC r346602, r346670-r346671, r347183: tun/tap race fixes

  r346602: tun(4): Defer clearing TUN_OPEN until much later

  tun destruction will not continue until TUN_OPEN is cleared. There are
  brief moments in tunclose where the mutex is dropped and we've already
  cleared TUN_OPEN, so tun_destroy would be able to proceed while we're in
  the middle of cleaning up the tun still. tun_destroy should be blocked
  until these parts (address/route purges, mostly) are complete.

  r346670: tun/tap: close race between destroy/ioctl handler

  It seems that there should be a better way to handle this, but this
  seems to be the more common approach and it should likely get replaced
  in all of the places it happens...

  Basically, thread 1 is in the process of destroying the tun/tap while
  thread 2 is executing one of the ioctls that requires the tun/tap mutex
  and the mutex is destroyed before the ioctl handler can acquire it.

  This is only one of the races described/found in PR 233955.

  r346671: tun(4): Don't allow open of open or dying devices

  Previously, a pid check was used to prevent open of the tun(4); this
  works, but may not make the most sense as we don't prevent the owner
  process from opening the tun device multiple times.

  The potential race described near tun_pid should not be an issue: if a
  tun(4) is to be handed off, its fd has to have been sent via control
  message or some other mechanism that duplicates the fd to the receiving
  process so that it may set the pid. Otherwise, the pid gets cleared when
  the original process closes it and you have no effective handoff
  mechanism.

  Close up another potential issue with handing a tun(4) off by not
  clobbering state if the closer isn't the controller anymore. If we want
  some state to be cleared, we should do that a little more surgically.

  Additionally, nothing prevents a dying tun(4) from being "reopened" in
  the middle of tun_destroy as soon as the mutex is unlocked, quickly
  leading to a bad time. Return EBUSY if we're marked for destruction, as
  well, and the consumer will need to deal with it. The associated
  character device will be destroyed in short order.

  r347183: geom: fix initialization order

  There's a race between the initialization of devsoftc.mtx (by devinit)
  and the creation of the geom worker thread g_run_events, which calls
  devctl_queue_data_f. Both of those are initialized at SI_SUB_DRIVERS and
  SI_ORDER_FIRST, which means the geom worker thread can be created before
  the mutex has been initialized, leading to the panic below:

  panic: mtx_lock() of spin mutex (null) @ /usr/home/osstest/build.135317.build-amd64-freebsd/freebsd/sys/kern/subr_bus.c:620
  cpuid = 3
  time = 1
  KDB: stack backtrace:
  db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe003b968710
  vpanic() at vpanic+0x19d/frame 0xfffffe003b968760
  panic() at panic+0x43/frame 0xfffffe003b9687c0
  __mtx_lock_flags() at __mtx_lock_flags+0x145/frame 0xfffffe003b968810
  devctl_queue_data_f() at devctl_queue_data_f+0x6a/frame 0xfffffe003b968840
  g_dev_taste() at g_dev_taste+0x463/frame 0xfffffe003b968a00
  g_load_class() at g_load_class+0x1bc/frame 0xfffffe003b968a30
  g_run_events() at g_run_events+0x197/frame 0xfffffe003b968a70
  fork_exit() at fork_exit+0x84/frame 0xfffffe003b968ab0
  fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe003b968ab0
  --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
  KDB: enter: panic
  [ thread pid 13 tid 100029 ]
  Stopped at kdb_enter+0x3b: movq $0,kdb_why

  Fix this by initializing geom at SI_ORDER_SECOND instead of
  SI_ORDER_FIRST.

  PR: 233955

Changes:
  _U  stable/11/
  stable/11/sys/geom/geom.h
  stable/11/sys/net/if_tap.c
  stable/11/sys/net/if_tun.c
  _U  stable/12/
  stable/12/sys/geom/geom.h
  stable/12/sys/net/if_tap.c
  stable/12/sys/net/if_tun.c
tun(4) should now be in decent enough shape on all supported branches. If anyone has an unpatched wireguard package laying around and some time to try and reproduce any of these issues on either of the -STABLE snapshots being built tomorrow (or -STABLE built past r347378, of course) I'd greatly appreciate it. I'm tentatively closing this as FIXED, since the more obvious races have been addressed and MFC'd.