Bug 234985 - epair: Kernel panic when destroying epair interface of vnet jail after using ifconfig inside the jail
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-RELEASE
Hardware: amd64 Any
Importance: Normal Affects Some People
Assignee: freebsd-net (Nobody)
Keywords: crash, needs-patch, needs-qa, vimage
Depends on:
Reported: 2019-01-16 00:57 UTC by Henno Schooljan
Modified: 2020-09-29 00:21 UTC

See Also:
koobs: mfc-stable12?

vnet_epair_test.sh: Script for reproducing vnet jail epair destroy panic (2.42 KB, application/x-sh)
2019-01-16 00:57 UTC, Henno Schooljan
trace_13.0-CURRENT-r343065.txt: kernel trace (1.90 KB, text/plain)
2019-01-16 00:58 UTC, Henno Schooljan

Description Henno Schooljan 2019-01-16 00:57:34 UTC
Created attachment 201173
vnet_epair_test.sh: Script for reproducing vnet jail epair destroy panic

When an epair interface pair is created for a VNET-enabled jail and ifconfig is then run inside that jail, the kernel will often panic later, when the jail is removed and the epair interface is subsequently destroyed. This does not happen when ifconfig is not run inside the jail, or when it is run only outside the jail, and it does not happen every time. When it does happen, it is always at the moment the ifconfig destroy of the epair is executed.

This has been tested and reproduced on 12.0-RELEASE-p2 and 13.0-CURRENT r343065.

I have included a script which reproduces this. It is based on an older script which tested for a similar issue; I changed it so that it runs the test 999 times, with an optional 'panic' argument that enables the critical ifconfig command that makes the difference here.
With the panic argument it reliably panics my system on every run, at worst after a couple hundred loops or so (perhaps it is some kind of race condition?). Without the panic argument the system never crashes.

I have also included the kernel trace I obtained from the 13.0-CURRENT system, and can supply a kernel memory dump if you need it.

So what side effect does this seemingly innocent ifconfig command have that affects a later ifconfig destroy command? It also does not matter which interface is queried, for example ifconfig lo0 or something else; as long as ifconfig is used at least once inside the jail, I can trigger this.
Comment 1 Henno Schooljan 2019-01-16 00:58:43 UTC
Created attachment 201174
trace_13.0-CURRENT-r343065.txt: kernel trace
Comment 2 Henno Schooljan 2019-01-16 01:20:04 UTC
Interesting fact after some more testing: all is also well when I do not remove the jails, or when I remove the jail *after* destroying the epair interface.

I can trigger the panic only when I remove the jail *before* destroying the epair interface *and* run the ifconfig command inside the jail.

I hope I have provided enough info; let me know if I can test and/or provide anything else to pinpoint the issue here.
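
The ordering described in this comment and in the description can be sketched roughly as follows. This is not the attached vnet_epair_test.sh; the epair unit number, the jail name, and the helper names run and repro_once are hypothetical placeholders chosen for illustration. By default the sketch only prints the commands, because running them as root on an affected kernel may panic the machine.

```shell
#!/bin/sh
# Sketch of the ordering that triggers the panic (NOT the attached script).
# The epair unit number and jail name are hypothetical placeholders.
# By default the commands are only printed; DRY_RUN=0 runs them for real,
# which needs root and may panic an affected kernel.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# repro_once <epair unit> <jail name>
repro_once() {
    unit=$1
    jail_name=$2
    # Create the pair epair<unit>a / epair<unit>b on the host.
    run ifconfig "epair${unit}" create
    # Start a persistent VNET jail and hand it the "b" side.
    run jail -c name="$jail_name" persist vnet vnet.interface="epair${unit}b"
    # The step that makes the difference: run ifconfig inside the jail.
    run jexec "$jail_name" ifconfig
    # Remove the jail first ...
    run jail -r "$jail_name"
    # ... and only then destroy the epair from the host.
    run ifconfig "epair${unit}a" destroy
}
```

Calling repro_once 0 testjail prints the five commands in order. Swapping the last two steps, or dropping the jexec line, corresponds to the variants reported above as not panicking.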
Comment 3 Alexander Leidinger freebsd_committer 2019-07-09 18:14:52 UTC
With r349853 I don't get a panic with your script, but if I assign an IP address inside the jail (jexec <id> ifconfig inet) instead of just listing the interfaces, it panics on destroy.

(kgdb) #0  __curthread () at /space/system/usr_src/sys/amd64/include/pcpu.h:246
#1  doadump (textdump=1) at /space/system/usr_src/sys/kern/kern_shutdown.c:392
#2  0xffffffff8050cf70 in kern_reboot (howto=260)
    at /space/system/usr_src/sys/kern/kern_shutdown.c:479
#3  0xffffffff8050d3e9 in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /space/system/usr_src/sys/kern/kern_shutdown.c:905
#4  0xffffffff8050d123 in panic (fmt=<unavailable>)
    at /space/system/usr_src/sys/kern/kern_shutdown.c:832
#5  0xffffffff807e758c in trap_fatal (frame=0xfffffe01598227c0, eva=0)
    at /space/system/usr_src/sys/amd64/amd64/trap.c:943
#6  0xffffffff807e698c in trap (frame=0xfffffe01598227c0)
    at /space/system/usr_src/sys/amd64/amd64/trap.c:221
#7  <signal handler called>
#8  0xffffffff805f2045 in strncmp (s1=<optimized out>, s2=<optimized out>,
    n=<optimized out>) at /space/system/usr_src/sys/libkern/strncmp.c:44
#9  0xffffffff80605d31 in ifunit_ref (name=0xfffffe0159822a20 "panic_test1b")
    at /space/system/usr_src/sys/net/if.c:2434
#10 0xffffffff80607ef8 in ifioctl (so=0xfffff809a1afd368, cmd=3223349536,
    data=0xfffffe0159822a20 "panic_test1b", td=0xfffff8014c83e5a0)
    at /space/system/usr_src/sys/net/if.c:3093
#11 0xffffffff8057658d in fo_ioctl (fp=<optimized out>, com=3223349536,
    data=0xfffff800020e2180, active_cred=0x0, td=0xfffff8014c83e5a0)
    at /space/system/usr_src/sys/sys/file.h:333
#12 kern_ioctl (td=0xfffff8014c83e5a0, fd=3, com=3223349536,
    data=0xfffff800020e2180 "")
    at /space/system/usr_src/sys/kern/sys_generic.c:800
#13 0xffffffff805762ad in sys_ioctl (td=0xfffff8014c83e5a0,
    uap=0xfffff8014c83e968) at /space/system/usr_src/sys/kern/sys_generic.c:712
#14 0xffffffff807e801a in syscallenter (td=0xfffff8014c83e5a0)
    at /space/system/usr_src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#15 amd64_syscall (td=0xfffff8014c83e5a0, traced=0)
    at /space/system/usr_src/sys/amd64/amd64/trap.c:1181
Comment 4 Kristof Provost freebsd_committer 2019-07-09 18:27:57 UTC
This is almost certainly the same problem as the one discussed in #238870 and https://reviews.freebsd.org/D20868.

The patches in https://reviews.freebsd.org/D20868 and https://reviews.freebsd.org/D20869 work around the panic, but are not fully correct fixes.
Comment 5 Rocco 2019-09-26 16:42:17 UTC
I have had the same problem quite early on. I believe it is a bug in the VNET cleanup code.
It has an easy workaround, which works quite well in my setup:

Before destroying the interface, remove it from the jail (maybe use a prestop hook in the jail.conf). Use this command on the host:

ifconfig $interfaceName -vnet $jailName

It will remove the interface from the jail's VNET. Then you can destroy the epair on the host.
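
As a rough sketch, the two steps of this workaround could be wrapped like this. The interface and jail names passed in, and the helper names run and release_and_destroy, are hypothetical; only the ifconfig -vnet and ifconfig destroy invocations come from the comment above. By default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of the workaround: take the interface out of the jail's VNET
# before destroying the epair. All names are hypothetical placeholders.
# By default the commands are only printed; DRY_RUN=0 runs them (needs root).

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# release_and_destroy <jail-side interface> <jail name> <host-side interface>
release_and_destroy() {
    jail_if=$1
    jail_name=$2
    host_if=$3
    # Run on the host: reclaim the interface from the jail's VNET.
    run ifconfig "$jail_if" -vnet "$jail_name"
    # With the interface back on the host, destroy the epair.
    run ifconfig "$host_if" destroy
}
```

For example, release_and_destroy epair0b testjail epair0a prints the two ifconfig commands in the order they would be run. The first command alone is what would go into a prestop hook, assuming the hook runs on the host while the jail still exists.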
Comment 6 O. Hartmann 2020-01-08 09:48:28 UTC
The problem is still present in recent CURRENT (FreeBSD 13.0-CURRENT #26 r356437: Tue Jan  7 07:19:34 CET 2020 amd64).

See also PR 238326 and PR 219901, a bug known since 2017.
Comment 7 Kubilay Kocak freebsd_committer freebsd_triage 2020-01-25 03:18:24 UTC
^Triage: Track earliest reported/reproducible/affected branch