Created attachment 222473 [details] panic backtrace When a (configured) wg interface is destroyed, the system panics. #ifconfig wg0 destroy panic: page fault cpuid = 1 time = 1613404414 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00e7cb3540 vpanic() at vpanic+0x181/frame 0xfffffe00e7cb3590 panic() at panic+0x43/frame 0xfffffe00e7cb35f0 trap_fatal() at trap_fatal+0x387/frame 0xfffffe00e7cb3650 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00e7cb36b0 trap() at trap+0x27d/frame 0xfffffe00e7cb37c0 calltrap() at calltrap+0x8/frame 0xfffffe00e7cb37c0 --- trap 0xc, rip = 0xffffffff805011a1, rsp = 0xfffffe00e7cb3890, rbp = 0xfffffe00e7cb38b0 --- grouptask_block() at grouptask_block+0x11/frame 0xfffffe00e7cb38b0 taskqgroup_detach() at taskqgroup_detach+0x18/frame 0xfffffe00e7cb38e0 wg_detach() at wg_detach+0xc8/frame 0xfffffe00e7cb3910 iflib_pseudo_deregister() at iflib_pseudo_deregister+0xfe/frame 0xfffffe00e7cb3940 if_clone_destroyif() at if_clone_destroyif+0x1b5/frame 0xfffffe00e7cb3980 if_clone_destroy() at if_clone_destroy+0x195/frame 0xfffffe00e7cb39d0 ifioctl() at ifioctl+0x4a4/frame 0xfffffe00e7cb3a90 kern_ioctl() at kern_ioctl+0x26d/frame 0xfffffe00e7cb3b00 sys_ioctl() at sys_ioctl+0xf6/frame 0xfffffe00e7cb3bc0 amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe00e7cb3cf0 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00e7cb3cf0 --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80041f5aa, rsp = 0x7fffffffe148, rbp = 0x7fffffffe160 --- Some more data in the attached file, I'm not sure if it helps. vmcore and matching debug kernel/sources are available on demand. version: FreeBSD 14.0-CURRENT, commit 88db1cc9f197a376817ce27ba269348666bbd4b7 (freebsd/main, freebsd/HEAD, main) with minor local changes
It also happens on stable/13-7973db046, plain GENERIC, no changes Fatal trap 12: page fault while in kernel mode cpuid = 1; apic id = 01 fault virtual address = 0xa4 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80c53541 stack pointer = 0x28:0xfffffe00a8548790 frame pointer = 0x28:0xfffffe00a85487b0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 753 (ifconfig) trap number = 12 panic: page fault cpuid = 1 time = 1613565900 KDB: stack backtrace: #0 0xffffffff80c56875 at kdb_backtrace+0x65 #1 0xffffffff80c09441 at vpanic+0x181 #2 0xffffffff80c092b3 at panic+0x43 #3 0xffffffff810891a7 at trap_fatal+0x387 #4 0xffffffff810891ff at trap_pfault+0x4f #5 0xffffffff8108885d at trap+0x27d #6 0xffffffff8105fe28 at calltrap+0x8 #7 0xffffffff80c54758 at taskqgroup_detach+0x18 #8 0xffffffff829249f8 at wg_detach+0xc8 #9 0xffffffff80d33c1e at iflib_pseudo_deregister+0xfe #10 0xffffffff80d20ff5 at if_clone_destroyif+0x1b5 #11 0xffffffff80d20de6 at if_clone_destroy+0x196 #12 0xffffffff80d1e31e at ifioctl+0x44e #13 0xffffffff80c7621d at kern_ioctl+0x26d #14 0xffffffff80c75f16 at sys_ioctl+0xf6 #15 0xffffffff81089aac at amd64_syscall+0x10c #16 0xffffffff8106074e at fast_syscall_common+0xf8
I see same panic on my FreeBSD Current box. I've tried bisect this problem. It have been caused since 38bfc6dee33bedb290e1ea2540f97a86fe3caee0 commit. https://cgit.freebsd.org/src/commit/?id=38bfc6dee33bedb290e1ea2540f97a86fe3caee0
(In reply to Yuichiro NAITO from comment #2) Thanks for tracking it down, looking.
The problem is basically that the added an IFDI_DETACH() call to iflib_pseudo_deregister(), but that was already taken care of by the DEVICE_DETACH method of a device created by iflib_clone_create(). It looks like the device is only created in the first place because iflib wants a device_t associated with driver contexts. Even with the commit reverted you can crash a system with "devctl detach wg0" or so. I think iflib_pseudo_detach() should rely on the ifnet cloning framework to handle teardown and simply return an error if it detects that something is detaching the device without having destroyed the interface first.
https://reviews.freebsd.org/D28774
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=0f9544d03e89d180f94a7a84b110ec7d2b6c625a commit 0f9544d03e89d180f94a7a84b110ec7d2b6c625a Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-02-19 22:08:34 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-02-19 22:10:41 +0000 iflib: Fix detach of pseudo interfaces In commit 38bfc6dee33b we added an IFDI_DETACH() call to iflib_pseudo_deregister() since it looked like it was missing. One is present in the error-handling path of iflib_pseudo_register(). However, the detach actually comes from the DEVICE_DETACH() method for the above-mentioned device_t, so now we're calling IFDI_DETACH() twice when destroying a pseudo interface. Fix the problem by not calling IFDI_DETACH() from the device detach routine. This way we can ensure that iflib de-initialization always happens in a consistent order. It also ensures that you can't do silly things like "devctl detach <pseudo ifnet>", which would previously detach the driver without tearing down the corresponding ifnet. PR: 253541 Reviewed by: erj MFC after: 1 week Fixes: 38bfc6dee33b Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28774 sys/net/iflib_clone.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
(In reply to commit-hook from comment #6) Mark, thanks for your effort and fast response! Meanwhile running freebsd/main 504e64af32ba6c62fdcc894a3b1da76061c64796 I can confirm, that the problem has been solved, the system does not longer panic. From my point of view the ticket may be closed. BTW, the interface destruction was my attempt to reconfigure the interface, I could not find another way. But that is another problem... Thanks! Frank
(In reply to Frank Behrens from comment #7) Thanks for confirming and sorry for the breakage. I'll close the PR once this change has been merged to stable/13.
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=4664afc05402abb8e572975ec65210bdcd8b2ea1 commit 4664afc05402abb8e572975ec65210bdcd8b2ea1 Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2021-02-19 22:08:34 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2021-02-24 15:20:55 +0000 iflib: Fix detach of pseudo interfaces In commit 38bfc6dee33b we added an IFDI_DETACH() call to iflib_pseudo_deregister() since it looked like it was missing. One is present in the error-handling path of iflib_pseudo_register(). However, the detach actually comes from the DEVICE_DETACH() method for the above-mentioned device_t, so now we're calling IFDI_DETACH() twice when destroying a pseudo interface. Fix the problem by not calling IFDI_DETACH() from the device detach routine. This way we can ensure that iflib de-initialization always happens in a consistent order. It also ensures that you can't do silly things like "devctl detach <pseudo ifnet>", which would previously detach the driver without tearing down the corresponding ifnet. PR: 253541 Reviewed by: erj Fixes: 38bfc6dee33b Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28774 (cherry picked from commit 0f9544d03e89d180f94a7a84b110ec7d2b6c625a) sys/net/iflib_clone.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-)
Thanks for the report.
Can anyone please merge this into releng/13.0 since it fixes an easy to trigger panic?
(In reply to Bernhard Froehlich from comment #11) The regression (see comment 2) should not be present in 13.0. Can you show the exact panic you're seeing?
I could not trigger it with FreeBSD 13.0-RC1 or 12.2p4 # ifconfig wg0 destroy the process hangs in D-state. ------------- # truss -p 28527 shows no activity ------------- # kill -9 28527 doesn't kill the process ------------- procstat 28527 PID PPID PGID SID TSID THR LOGIN WCHAN EMUL COMM 28527 3161 28527 92965 92965 1 chris tun_cond FreeBSD ELF64 ifconfig ------------- But if after the process "ifconfig wg0 destroy" hangs I run: service wireguard stop then the "ifconfig wg0 destroy" stops running and truss shows: close(3) = 0 (0x0) sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0) sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0) exit(0x0) process exit, rval = 0
(In reply to Mark Johnston from comment #12) I was able to reliably panic 13.0-BETA4 with an ifconfig wg0 destroy if a wireguard connection was established already. Since this bug description looked very similar I assumed it is the same. Just tried now on a test VM with RC1 and I cannot reproduce this panic at the moment (but found two other panics, will create PRs for them).
(In reply to Bernhard Froehlich from comment #14) My wg0 interface was up but with no connection established, maybe that's why it didn't panic.
I created PR 254114 with my details but the stacktrace looks different so definitely not the same issue.