Created attachment 201173 [details] vnet_epair_test.sh: Script for reproducing vnet jail epair destroy panic

When an epair interface pair is created for a VNET-enabled jail and ifconfig is then used within this jail, the kernel will often panic later, when the jail and finally the epair interface are destroyed. This does not happen when ifconfig is not used within the jail or when it is used only outside of the jail, and it does not happen every time. When it does happen, it is always at the moment the ifconfig destroy of the epair is done.

This has been tested and reproduced on 12.0-RELEASE-p2 and 13.0-CURRENT r343065.

I have included a script which reproduces this. It is based on an older script which tested for a similar issue. I changed it so that it runs the test 999 times, with an optional 'panic' argument that triggers the critical ifconfig command that makes the difference here. With the panic argument it reliably panics my system on every run, at worst after a couple hundred loops or so (perhaps it is some kind of race condition?). Without the panic argument the system never crashes. I have also included the kernel trace I obtained from the 13.0-CURRENT system, and can supply a kernel memory dump if you need it.

So what side effect does this innocent ifconfig command have that affects a later ifconfig destroy command? It does not matter which interface is queried, for example ifconfig lo0 or something else; as long as I use ifconfig at least once inside the jail, I can trigger this.
Created attachment 201174 [details] trace_13.0-CURRENT-r343065.txt: kernel trace
Interesting fact after doing some more testing: All is well also when I do not remove the jails, or when I remove the jail *after* destroying the epair interface. Only when I remove the jail *before* destroying the epair interface *and* I run the ifconfig command inside the jail, I can trigger the panic. I hope I provided enough info, let me know if I can test and/or provide anything else to pinpoint the issue here.
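The ordering described above can be sketched as a small shell function. This is only an illustration of the sequence from the report, not the attached vnet_epair_test.sh; the helper names (run, trigger_once), the DRY_RUN switch and the jail name are assumptions made for the example. With DRY_RUN=1 the commands are only printed, which is the safe way to try it, since the real sequence is reported to panic affected kernels.

```sh
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# trigger_once EPAIR_A JAIL_NAME
# EPAIR_A is the host end of an existing epair (for example epair0a, as
# printed by "ifconfig epair create"); the matching b end goes into the jail.
trigger_once() {
    epair_a=$1
    epair_b="${epair_a%a}b"
    jail_name=$2

    run jail -c name="$jail_name" persist vnet vnet.interface="$epair_b"
    run jexec "$jail_name" ifconfig      # any ifconfig call inside the jail
    run jail -r "$jail_name"             # remove the jail first ...
    run ifconfig "$epair_a" destroy      # ... then destroy the epair (panic reported here)
}
```

For example, `DRY_RUN=1; trigger_once epair0a panic_test` prints the four commands in the order that is reported to trigger the panic.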
With r349853 I don't get a panic with your script, but if I assign an IP inside the jail (jexec <id> ifconfig inet 1.2.3.4) instead of just listing the interfaces, it panics on destroy.

(kgdb)
#0  __curthread () at /space/system/usr_src/sys/amd64/include/pcpu.h:246
#1  doadump (textdump=1) at /space/system/usr_src/sys/kern/kern_shutdown.c:392
#2  0xffffffff8050cf70 in kern_reboot (howto=260) at /space/system/usr_src/sys/kern/kern_shutdown.c:479
#3  0xffffffff8050d3e9 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /space/system/usr_src/sys/kern/kern_shutdown.c:905
#4  0xffffffff8050d123 in panic (fmt=<unavailable>) at /space/system/usr_src/sys/kern/kern_shutdown.c:832
#5  0xffffffff807e758c in trap_fatal (frame=0xfffffe01598227c0, eva=0) at /space/system/usr_src/sys/amd64/amd64/trap.c:943
#6  0xffffffff807e698c in trap (frame=0xfffffe01598227c0) at /space/system/usr_src/sys/amd64/amd64/trap.c:221
#7  <signal handler called>
#8  0xffffffff805f2045 in strncmp (s1=<optimized out>, s2=<optimized out>, n=<optimized out>) at /space/system/usr_src/sys/libkern/strncmp.c:44
#9  0xffffffff80605d31 in ifunit_ref (name=0xfffffe0159822a20 "panic_test1b") at /space/system/usr_src/sys/net/if.c:2434
#10 0xffffffff80607ef8 in ifioctl (so=0xfffff809a1afd368, cmd=3223349536, data=0xfffffe0159822a20 "panic_test1b", td=0xfffff8014c83e5a0) at /space/system/usr_src/sys/net/if.c:3093
#11 0xffffffff8057658d in fo_ioctl (fp=<optimized out>, com=3223349536, data=0xfffff800020e2180, active_cred=0x0, td=0xfffff8014c83e5a0) at /space/system/usr_src/sys/sys/file.h:333
#12 kern_ioctl (td=0xfffff8014c83e5a0, fd=3, com=3223349536, data=0xfffff800020e2180 "") at /space/system/usr_src/sys/kern/sys_generic.c:800
#13 0xffffffff805762ad in sys_ioctl (td=0xfffff8014c83e5a0, uap=0xfffff8014c83e968) at /space/system/usr_src/sys/kern/sys_generic.c:712
#14 0xffffffff807e801a in syscallenter (td=0xfffff8014c83e5a0) at /space/system/usr_src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#15 amd64_syscall (td=0xfffff8014c83e5a0, traced=0) at /space/system/usr_src/sys/amd64/amd64/trap.c:1181
This is almost certainly the same problem as the one discussed in #238870 and https://reviews.freebsd.org/D20868. The patches in https://reviews.freebsd.org/D20868 and https://reviews.freebsd.org/D20869 work around the panic, but are not fully correct fixes.
I have had the same problem quite early on. I believe it is a bug in the VNET cleanup code. It has an easy workaround, which works quite well in my setup: Before destroying the interface, remove it from the jail (maybe use a prestop hook in the jail.conf). Use this command on the host: ifconfig $interfaceName -vnet $jailName It will remove the interface from the jail's VNET. Then you can destroy the epair on the host.
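A minimal sketch of that workaround as a teardown function, assuming the interface is reclaimed first, the jail is removed next, and the epair is destroyed last. The helper names (run, safe_teardown) and the DRY_RUN switch are hypothetical; only the `ifconfig $interfaceName -vnet $jailName` command comes from the comment above.

```sh
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# safe_teardown INTERFACE JAIL_NAME
# INTERFACE is the epair end that was moved into the jail (for example epair0b).
safe_teardown() {
    iface=$1
    jail_name=$2

    run ifconfig "$iface" -vnet "$jail_name"   # reclaim the interface from the jail's vnet
    run jail -r "$jail_name"                   # now remove the jail
    run ifconfig "$iface" destroy              # destroying one end removes the whole epair
}
```

The first step could equally live in a prestop hook in jail.conf, as suggested above; the function form just keeps the three steps in one place.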
The problem persists in recent CURRENT (FreeBSD 13.0-CURRENT #26 r356437: Tue Jan 7 07:19:34 CET 2020 amd64). See also PR 238326 and PR 219901, a bug known since 2017.
^Triage: Track earliest reported/reproducible/affected branch
My case seems similar:
* Using 12.2-RELEASE
* Jail defined through ezjail
* Using vnet, and jib (/usr/share/examples/jails/jib) to manage the interface

ezjail-admin (one)start works without problems. When logged in as root, ezjail-admin (one)stop works. However, when logged in as another user, sudo ezjail-admin (one)stop or su -; ezjail-admin (one)stop causes a panic (page fault) of the host. I have attached the crashinfo output, the definition of the jail, and the rc.conf of the host and of the jail. If needed, I can provide the vmcrash file or even the VirtualBox disk image used to reproduce the bug.
Created attachment 219930 [details] bug through ezjail : host rc.conf
Created attachment 219931 [details] bug through ezjail : jail definition
Created attachment 219932 [details] bug through ezjail : jail rc.conf
Created attachment 219933 [details] bug through ezjail : crashinfo output
(In reply to freebsd from comment #8) I tested this some time ago, and it looks like ezjail is not able to shut down the guest system correctly. If the child interfaces of epairb in the jail are destroyed before the epair is withdrawn from the jail and the jail is shut down, the panic doesn't occur. On the other hand, ezjail officially doesn't support vnet and has been unmaintained for a while.
(In reply to Marek Zarychta from comment #13) This isn't an ezjail bug. It's a kernel issue. I'm working on a fix. Some discussion in https://reviews.freebsd.org/D27279 but that's not going to be the final fix.
A commit references this bug:

Author: kp
Date: Tue Dec 1 16:24:00 UTC 2020
New revision: 368237
URL: https://svnweb.freebsd.org/changeset/base/368237

Log:
  if: Fix panic when destroying vnet and epair simultaneously

  When destroying a vnet and an epair (with one end in the vnet) we often panicked. This was the result of the destruction of the epair, which destroys both ends simultaneously, happening while vnet_if_return() was moving the struct ifnet to its home vnet. This can result in a freed ifnet being re-added to the home vnet V_ifnet list. That in turn panics the next time the ifnet is used.

  Prevent this race by ensuring that vnet_if_return() cannot run at the same time as if_detach() or epair_clone_destroy().

  PR: 238870, 234985, 244703, 250870
  MFC after: 2 weeks
  Sponsored by: Modirum MDPay
  Differential Revision: https://reviews.freebsd.org/D27378

Changes:
  head/sys/net/if.c
A commit references this bug:

Author: kp
Date: Tue Dec 15 15:33:29 UTC 2020
New revision: 368663
URL: https://svnweb.freebsd.org/changeset/base/368663

Log:
  MFC r368237:

  if: Fix panic when destroying vnet and epair simultaneously

  When destroying a vnet and an epair (with one end in the vnet) we often panicked. This was the result of the destruction of the epair, which destroys both ends simultaneously, happening while vnet_if_return() was moving the struct ifnet to its home vnet. This can result in a freed ifnet being re-added to the home vnet V_ifnet list. That in turn panics the next time the ifnet is used.

  Prevent this race by ensuring that vnet_if_return() cannot run at the same time as if_detach() or epair_clone_destroy().

  PR: 238870, 234985, 244703, 250870
  Sponsored by: Modirum MDPay

Changes:
  _U  stable/12/
  stable/12/sys/net/if.c
Thanks for the fix. It looks promising, but the panic still occurs when the jail is stopped without removing all child interfaces of epairb within the jail. Perhaps it's a step forward in the right direction, but there is still no way to use VLAN subinterfaces inside VNET jails. Tested on FreeBSD 12.2-STABLE r368664.
(In reply to Marek Zarychta from comment #17) What setup do you use and what panic do you see?
For a while now, I have been testing a setup with epair(4) bridged to an LACP lagg(4) with a few VLANs. I am able to create, use, and destroy vlan(4) subinterfaces on epairb within the VNET jail. The only drawback is that all vlan(4) interfaces created on epairb have to be destroyed prior to stopping the VNET jail. If that is done manually, everything works fine; if the jail is stopped by sysutils/ezjail, a panic occurs. I am aware that sysutils/ezjail is not actively maintained and is not capable of supporting the VNET framework. Please don't get me wrong, I am not complaining about the patch, but I believed that it solved all issues regarding VNET jail panics, which was the wrong assumption.
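The manual procedure described here (destroy every vlan(4) child inside the jail, then stop the jail) could be scripted roughly as follows. This is a sketch under assumptions: the helper names (run, stop_vnet_jail) and the DRY_RUN switch are hypothetical, and the vlan interface names are passed in explicitly rather than discovered automatically.

```sh
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# stop_vnet_jail JAIL_NAME [VLAN_IF ...]
# Destroys the listed vlan(4) interfaces inside the jail, then removes the jail.
stop_vnet_jail() {
    jail_name=$1
    shift

    for vlan_if in "$@"; do
        run jexec "$jail_name" ifconfig "$vlan_if" destroy
    done
    run jail -r "$jail_name"
}
```

For example, `stop_vnet_jail myjail vlan0 vlan1` removes both child interfaces before the jail itself goes away.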
(In reply to Marek Zarychta from comment #19) Yes, and if you'd describe your setup and show the panic you're running into, maybe we could fix that problem too. The if_vlan:basic test does this: it creates a vlan on top of an epair (actually in two jails, to do a basic ping test) and then the jails and epairs get destroyed. That does not panic. So, clearly you're doing something different, so please tell us what that is!
Thanks for the patch. It solves the issue of the clean removal of orphaned interfaces. I had a deeper look into this and it turned out that netgraph(3) was the culprit. In the meantime, the kernel was upgraded to 12.2-STABLE r368671. I confirm that with the netgraph modules not loaded the panic doesn't happen. This machine has swap on a ZFS zvol, so I was not able to get a local core dump. The network interfaces (VLAN over LACP lagg(4)) don't allow the use of a netdump(4) server, but with slow motion on the serial console I was able to capture the panic. I can share the screencasts if still relevant.
(In reply to Marek Zarychta from comment #21) Yes! Panics are relevant! Explain how you trigger it and show the crashdump! Please!
(In reply to Kristof Provost from comment #22) Prior to the panic, these messages appear:

ng node vlan0 needs NGF_REALLY_DIE
ng node vlan1 needs NGF_REALLY_DIE

These messages refer to the interfaces created atop epairb in the jail. I have sent you a link to the screencast but don't want to disclose it here. Thank you for your patience and for solving this old bug. It looks like VNET jails can now be widely supported and gain the ability to use vlan(4) interfaces when netgraph(3) is not required on the host.
(In reply to Marek Zarychta from comment #23) You're running into #233622, which is a different bug.
A commit in branch releng/12.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e0c15f45abd4bd5165e11b557a8c90d0faf5cfeb

commit e0c15f45abd4bd5165e11b557a8c90d0faf5cfeb
Author: Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2021-01-18 21:55:53 +0000
Commit: Ed Maste <emaste@FreeBSD.org>
CommitDate: 2021-01-29 00:58:55 +0000

    MFC r368237:

    if: Fix panic when destroying vnet and epair simultaneously

    When destroying a vnet and an epair (with one end in the vnet) we often panicked. This was the result of the destruction of the epair, which destroys both ends simultaneously, happening while vnet_if_return() was moving the struct ifnet to its home vnet. This can result in a freed ifnet being re-added to the home vnet V_ifnet list. That in turn panics the next time the ifnet is used.

    Prevent this race by ensuring that vnet_if_return() cannot run at the same time as if_detach() or epair_clone_destroy().

    PR: 238870, 234985, 244703, 250870
    Sponsored by: Modirum MDPay
    Approved by: so

 sys/net/if.c     | 147 +++++++++++++++++++++++++++++++++++++------------------
 sys/net/if_var.h |  24 ++-------
 2 files changed, 104 insertions(+), 67 deletions(-)
A commit in branch releng/12.2 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e682b62c96e94c60d830e4414215032e0d4f8dad

commit e682b62c96e94c60d830e4414215032e0d4f8dad
Author: Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2020-09-12 16:33:05 +0000
Commit: Ed Maste <emaste@FreeBSD.org>
CommitDate: 2021-01-29 01:14:24 +0000

    MFC r368237:

    if: Fix panic when destroying vnet and epair simultaneously

    When destroying a vnet and an epair (with one end in the vnet) we often panicked. This was the result of the destruction of the epair, which destroys both ends simultaneously, happening while vnet_if_return() was moving the struct ifnet to its home vnet. This can result in a freed ifnet being re-added to the home vnet V_ifnet list. That in turn panics the next time the ifnet is used.

    Prevent this race by ensuring that vnet_if_return() cannot run at the same time as if_detach() or epair_clone_destroy().

    PR: 238870, 234985, 244703, 250870
    Sponsored by: Modirum MDPay
    Approved by: so

 sys/net/if.c     | 147 +++++++++++++++++++++++++++++++++++++------------------
 sys/net/if_var.h |  24 ++-------
 2 files changed, 104 insertions(+), 67 deletions(-)