Bug 233622 - panic: page not present fault when stopping VIMAGE jail on 12.0-RC2, netgraph
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Bjoern A. Zeeb
Keywords: crash, panic, vimage
Reported: 2018-11-29 08:19 UTC by Jordan Boland
Modified: 2018-12-13 22:03 UTC
Description Jordan Boland 2018-11-29 08:19:29 UTC
This is reproducible for me - any time I stop this jail the system panics.  I appreciate your patience as I am new to kernel debugging, so if I have omitted necessary information it is out of ignorance and not malice.  :-)

===============================================

Unread portion of the kernel message buffer:
<6>in6_purgeaddr: err=65, destination address delete failed
ng node ng0_unifi_1 needs NGF_REALLY_DIE


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff8263dba6
stack pointer           = 0x28:0xfffffe008caeb6c0
frame pointer           = 0x28:0xfffffe008caeb6e0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 60579 (jail)
trap number             = 12
panic: page fault
cpuid = 0
time = 1543479102
KDB: stack backtrace:
#0 0xffffffff80be74a7 at kdb_backtrace+0x67
#1 0xffffffff80b9b093 at vpanic+0x1a3
#2 0xffffffff80b9aee3 at panic+0x43
#3 0xffffffff8107394f at trap_fatal+0x35f
#4 0xffffffff810739a9 at trap_pfault+0x49
#5 0xffffffff81072fce at trap+0x29e
#6 0xffffffff8104e865 at calltrap+0x8
#7 0xffffffff80ca0dd5 at ether_ifdetach+0x35
#8 0xffffffff80caab14 at vlan_clone_destroy+0x24
#9 0xffffffff80c9ea26 at if_clone_destroyif+0x116
#10 0xffffffff80c9f338 at if_clone_detach+0xc8
#11 0xffffffff80cc7b3c at vnet_destroy+0x13c
#12 0xffffffff80b63480 at prison_deref+0x2b0
#13 0xffffffff80b64d04 at sys_jail_remove+0x364
#14 0xffffffff81074429 at amd64_syscall+0x369
#15 0xffffffff8104f14d at fast_syscall_common+0x101
Uptime: 1m36s
Dumping 768 out of 16178 MB:..3%..11%..21%..32%..42%..53%..61%..71%..82%..92%

__curthread () at ./machine/pcpu.h:230
230     ./machine/pcpu.h: No such file or directory.
(kgdb) list *0xffffffff8263dba6
0xffffffff8263dba6 is in ng_ether_detach (/usr/src/sys/netgraph/ng_ether.c:367).
362     /usr/src/sys/netgraph/ng_ether.c: No such file or directory.
(kgdb) backtrace
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80b9ac7b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:446
#3  0xffffffff80b9b0f3 in vpanic (fmt=<optimized out>, ap=0xfffffe008bcf2410) at /usr/src/sys/kern/kern_shutdown.c:872
#4  0xffffffff80b9aee3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:799
#5  0xffffffff8107394f in trap_fatal (frame=0xfffffe008bcf2600, eva=0) at /usr/src/sys/amd64/amd64/trap.c:929
#6  0xffffffff810739a9 in trap_pfault (frame=0xfffffe008bcf2600, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:765
#7  0xffffffff81072fce in trap (frame=0xfffffe008bcf2600) at /usr/src/sys/amd64/amd64/trap.c:441
#8  <signal handler called>
#9  ng_ether_detach (ifp=0xfffff800038a6800) at /usr/src/sys/netgraph/ng_ether.c:367
#10 0xffffffff80ca0dd5 in ether_ifdetach (ifp=0xfffff800038a6800) at /usr/src/sys/net/if_ethersubr.c:981
#11 0xffffffff80caab14 in vlan_clone_destroy (ifc=0xfffff80013c55600, ifp=0xfffff800038a6800)
    at /usr/src/sys/net/if_vlan.c:1106
#12 0xffffffff80c9ea26 in if_clone_destroyif (ifc=0xfffff80013c55600, ifp=0xfffff800038a6800)
    at /usr/src/sys/net/if_clone.c:330
#13 0xffffffff80c9f338 in if_clone_detach (ifc=0xfffff80013c55600) at /usr/src/sys/net/if_clone.c:451
#14 0xffffffff80cc7b3c in vnet_sysuninit () at /usr/src/sys/net/vnet.c:597
#15 vnet_destroy (vnet=0xfffff80013c7dd00) at /usr/src/sys/net/vnet.c:284
#16 0xffffffff80b63480 in prison_deref (pr=0xffffffff81b0b3c0 <prison0>, flags=19) at /usr/src/sys/kern/kern_jail.c:2634
#17 0xffffffff80b64d04 in sys_jail_remove (td=<optimized out>, uap=<optimized out>) at /usr/src/sys/kern/kern_jail.c:2257
#18 0xffffffff81074429 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#19 amd64_syscall (td=0xfffff80117757000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1076
#20 <signal handler called>
#21 0x000000080030f0aa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffea28
Comment 1 Kristof Provost freebsd_committer 2018-11-29 08:26:49 UTC
Can you describe your setup? It looks like you might have a vlan interface involved somewhere, but knowing how it's all set up will likely make reproducing this easier.
Comment 2 Jordan Boland 2018-11-29 08:37:57 UTC
Yes, I can go into some more detail about the networking.

On the host system, only igb1 is currently active.  It is a trunk interface, so on the host I have igb1.1 configured.

I am utilizing Devin Teske's jng to create a netgraph bridge to igb1, exposing the trunked interface to the jail.  This jail only needs 1 VLAN, but I anticipate adding others later that will utilize more, and this seemed more elegant than cloning the VLAN interfaces individually.

In the jail, I replicate the host's setup to access the tagged interface.

host rc.conf:
=============================
ifconfig_igb1="up"
vlans_igb1="1"
ifconfig_igb1_1="192.168.1.2 netmask 255.255.255.0"
=============================


jail rc.conf:
=============================
vlans_ng0_unifi="1"
ifconfig_ng0_unifi="up"
ifconfig_ng0_unifi_1="inet 192.168.1.3 netmask 255.255.255.0"
=============================


For this particular jail the configuration is overkill (I suppose I could just bridge to igb1.1 on the host).  But I am proving out this strategy for some of the other services that I will need to host on this machine later.  Otherwise, I will have to go back to the strategy of mangling multiple FIBs.
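
For reference, a jail.conf sketch of the jng arrangement described above might look roughly like the following. It is an illustration under assumptions, not the configuration actually in use: the jail name "unifi" is inferred from the ng0_unifi interface name, the hostname and path are hypothetical, and the exec.prestart/exec.poststop lines follow the usage notes in the jng example script.

jail.conf (sketch, assumed):
=============================
unifi {
	host.hostname = "unifi.example";	# hypothetical
	path = "/usr/jails/unifi";		# hypothetical
	vnet;
	vnet.interface = "ng0_unifi";		# created by jng on the host
	exec.prestart += "jng bridge unifi igb1";
	exec.start = "/bin/sh /etc/rc";
	exec.stop = "/bin/sh /etc/rc.shutdown";
	exec.poststop += "jng shutdown unifi";
}
=============================

With this layout the panic would be expected when the jail is removed and its vnet is torn down, which matches the vnet_destroy frames in the backtrace.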
Comment 3 Bjoern A. Zeeb freebsd_committer 2018-11-29 11:33:29 UTC
I can probably have a quick look this evening unless anyone beats me to it.
Looking at the backtrace I have a suspicion of what's going on.
Comment 4 Jordan Boland 2018-12-04 09:03:01 UTC
Hi Bjoern & Kristof,

I wanted to provide some additional information that I gathered today.  I have upgraded this system to 12.0-RC3 and can confirm that the issue remains.  I also believe it is related to the netgraph module: I converted a jail to use jib with the if_bridge/if_epair modules instead, and I can now stop that jail without triggering the kernel panic.
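
As a rough sketch of what the jib-based variant could look like in jail.conf: the jail name "unifi", the path, and the choice of igb1 as the bridged interface are all assumptions here, and the exec lines follow the usage notes in the jib example script rather than the configuration actually tested.

jail.conf (sketch, assumed, jib variant):
=============================
unifi {
	path = "/usr/jails/unifi";		# hypothetical
	vnet;
	vnet.interface = "e0b_unifi";		# epair end handed to the jail
	exec.prestart += "jib addm unifi igb1";
	exec.start = "/bin/sh /etc/rc";
	exec.stop = "/bin/sh /etc/rc.shutdown";
	exec.poststop += "jib destroy unifi";
}
=============================

In this variant the jail sees an epair interface rather than a netgraph eiface, so any VLAN settings in the jail's rc.conf would presumably need to refer to e0b_unifi instead of ng0_unifi.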

I hope this helps confirm or direct your suspicions.  Let me know if I can do any additional tests.  I will look into it further when I can, but it may be some time before I can do so, and it is entirely possible that the networking code exceeds my skill in C.

Best,

Jordan