This is reproducible for me - any time I stop this jail the system panics. I appreciate your patience as I am new to kernel debugging, so if I have omitted necessary information it is out of ignorance and not malice. :-)

===============================================
Unread portion of the kernel message buffer:
<6>in6_purgeaddr: err=65, destination address delete failed
ng node ng0_unifi_1 needs NGF_REALLY_DIE

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x0
fault code = supervisor write data, page not present
instruction pointer = 0x20:0xffffffff8263dba6
stack pointer = 0x28:0xfffffe008caeb6c0
frame pointer = 0x28:0xfffffe008caeb6e0
code segment = base rx0, limit 0xfffff, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 60579 (jail)
trap number = 12
panic: page fault
cpuid = 0
time = 1543479102
KDB: stack backtrace:
#0 0xffffffff80be74a7 at kdb_backtrace+0x67
#1 0xffffffff80b9b093 at vpanic+0x1a3
#2 0xffffffff80b9aee3 at panic+0x43
#3 0xffffffff8107394f at trap_fatal+0x35f
#4 0xffffffff810739a9 at trap_pfault+0x49
#5 0xffffffff81072fce at trap+0x29e
#6 0xffffffff8104e865 at calltrap+0x8
#7 0xffffffff80ca0dd5 at ether_ifdetach+0x35
#8 0xffffffff80caab14 at vlan_clone_destroy+0x24
#9 0xffffffff80c9ea26 at if_clone_destroyif+0x116
#10 0xffffffff80c9f338 at if_clone_detach+0xc8
#11 0xffffffff80cc7b3c at vnet_destroy+0x13c
#12 0xffffffff80b63480 at prison_deref+0x2b0
#13 0xffffffff80b64d04 at sys_jail_remove+0x364
#14 0xffffffff81074429 at amd64_syscall+0x369
#15 0xffffffff8104f14d at fast_syscall_common+0x101
Uptime: 1m36s
Dumping 768 out of 16178 MB:..3%..11%..21%..32%..42%..53%..61%..71%..82%..92%

__curthread () at ./machine/pcpu.h:230
230 ./machine/pcpu.h: No such file or directory.
(kgdb) list *0xffffffff8263dba6
0xffffffff8263dba6 is in ng_ether_detach (/usr/src/sys/netgraph/ng_ether.c:367).
362 /usr/src/sys/netgraph/ng_ether.c: No such file or directory.
(kgdb) backtrace
#0 __curthread () at ./machine/pcpu.h:230
#1 doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:366
#2 0xffffffff80b9ac7b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:446
#3 0xffffffff80b9b0f3 in vpanic (fmt=<optimized out>, ap=0xfffffe008bcf2410) at /usr/src/sys/kern/kern_shutdown.c:872
#4 0xffffffff80b9aee3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:799
#5 0xffffffff8107394f in trap_fatal (frame=0xfffffe008bcf2600, eva=0) at /usr/src/sys/amd64/amd64/trap.c:929
#6 0xffffffff810739a9 in trap_pfault (frame=0xfffffe008bcf2600, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:765
#7 0xffffffff81072fce in trap (frame=0xfffffe008bcf2600) at /usr/src/sys/amd64/amd64/trap.c:441
#8 <signal handler called>
#9 ng_ether_detach (ifp=0xfffff800038a6800) at /usr/src/sys/netgraph/ng_ether.c:367
#10 0xffffffff80ca0dd5 in ether_ifdetach (ifp=0xfffff800038a6800) at /usr/src/sys/net/if_ethersubr.c:981
#11 0xffffffff80caab14 in vlan_clone_destroy (ifc=0xfffff80013c55600, ifp=0xfffff800038a6800) at /usr/src/sys/net/if_vlan.c:1106
#12 0xffffffff80c9ea26 in if_clone_destroyif (ifc=0xfffff80013c55600, ifp=0xfffff800038a6800) at /usr/src/sys/net/if_clone.c:330
#13 0xffffffff80c9f338 in if_clone_detach (ifc=0xfffff80013c55600) at /usr/src/sys/net/if_clone.c:451
#14 0xffffffff80cc7b3c in vnet_sysuninit () at /usr/src/sys/net/vnet.c:597
#15 vnet_destroy (vnet=0xfffff80013c7dd00) at /usr/src/sys/net/vnet.c:284
#16 0xffffffff80b63480 in prison_deref (pr=0xffffffff81b0b3c0 <prison0>, flags=19) at /usr/src/sys/kern/kern_jail.c:2634
#17 0xffffffff80b64d04 in sys_jail_remove (td=<optimized out>, uap=<optimized out>) at /usr/src/sys/kern/kern_jail.c:2257
#18 0xffffffff81074429 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#19 amd64_syscall (td=0xfffff80117757000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1076
#20 <signal handler called>
#21 0x000000080030f0aa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffea28
Can you describe your setup? It looks like you might have a vlan interface involved somewhere, but knowing how it's all set up will likely make reproducing this easier.
Yes, I can go into some more detail about the networking. On the host system, only igb1 is currently active. It is a trunk interface, so on the host I have igb1.1 configured. I am using Devin Teske's jng to create a netgraph bridge to igb1, exposing the trunked interface to the jail. This jail only needs one VLAN, but I anticipate adding other jails later that will use more, and this seemed more elegant than cloning the VLAN interfaces individually. In the jail, I replicate the same setup as on the host to access the tagged interface.

host rc.conf:
=============================
ifconfig_igb1="up"
vlans_igb1="1"
ifconfig_igb1_1="192.168.1.2 netmask 255.255.255.0"
=============================

jail rc.conf:
=============================
vlans_ng0_unifi="1"
ifconfig_ng0_unifi="up"
ifconfig_ng0_unifi_1="inet 192.168.1.3 netmask 255.255.255.0"
=============================

For this particular jail the configuration is overkill (I suppose I could just bridge to igb1.1 on the host). But I am proving out this strategy for some of the other services that I will need to host on this machine later. Otherwise, I will have to go back to the strategy of mangling multiple FIBs.
I can probably have a quick look this evening unless anyone beats me to it. Looking at the backtrace I have a suspicion of what's going on.
Hi Bjoern & Kristof, I wanted to provide some additional information that I have gathered today. I have upgraded this system to 12.0-RC3 and can confirm the issue remains. I also believe it is related to the netgraph module. I have converted a jail to use jib and mod if_bridge/if_epair. I can now stop the jail without experiencing the kernel panic. I hope this helps confirm or direct your suspicions. Let me know if I can do any additional tests. I will look into it further when I can, but it may be some time before I can do so, and it is entirely possible that the networking code exceeds my skill in C. Best, Jordan
Sorry, I got side-tracked while I was looking at this. Releasing it to net@ in case someone else beats me to fixing it.
This PR lacks many valuable technical details. If the problem is really in netgraph, please describe the nodes used, their hooks, and their settings, because merely mentioning some "jng" (whatever it is) is not enough. A plain ng0 is a point-to-point interface, and vlan is not. You should also supply any additional details that may be relevant, and do not forget to show the output of "ngctl list" while the jail is up and running, before it panics the kernel at shutdown.
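
For anyone gathering the details requested above, a minimal sketch of a collection script follows. It is an illustration only: the output file name and the `collect` helper are hypothetical, and the node name `igb1:` is assumed from the rc.conf excerpts earlier in this report. Run it on the host while the jail is still up.

```shell
#!/bin/sh
# Hypothetical helper: gather netgraph state while the jail is still running.
# The output file name and the igb1 node name are assumptions based on the
# rc.conf excerpts in this report; adjust both for the real system.
out="${1:-netgraph-state.txt}"

# Print a header, then run the command if it exists on this system.
collect() {
    echo "### $*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "($1 not found on this system)"
    fi
    echo
}

{
    collect ngctl list        # all netgraph nodes known to the host
    collect ngctl show igb1:  # hooks on the ether node of the trunk interface
    collect ifconfig -a       # host-side view of the interfaces
    collect jls               # running jails
} > "$out"

echo "wrote $out"
```

Attaching the resulting file to the PR would cover the node and hook layout; per-node settings may still need to be described by hand.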