Bug 233622 - panic: page not present fault when stopping VIMAGE jail on 12.0-RC2, netgraph
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net mailing list
URL:
Keywords: crash, panic, vimage
Depends on:
Blocks:
 
Reported: 2018-11-29 08:19 UTC by Jordan Boland
Modified: 2019-09-03 23:37 UTC

Description Jordan Boland 2018-11-29 08:19:29 UTC
This is reproducible for me - any time I stop this jail the system panics.  I appreciate your patience as I am new to kernel debugging, so if I have omitted necessary information it is out of ignorance and not malice.  :-)

===============================================

Unread portion of the kernel message buffer:
<6>in6_purgeaddr: err=65, destination address delete failed
ng node ng0_unifi_1 needs NGF_REALLY_DIE


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff8263dba6
stack pointer           = 0x28:0xfffffe008caeb6c0
frame pointer           = 0x28:0xfffffe008caeb6e0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 60579 (jail)
trap number             = 12
panic: page fault
cpuid = 0
time = 1543479102
KDB: stack backtrace:
#0 0xffffffff80be74a7 at kdb_backtrace+0x67
#1 0xffffffff80b9b093 at vpanic+0x1a3
#2 0xffffffff80b9aee3 at panic+0x43
#3 0xffffffff8107394f at trap_fatal+0x35f
#4 0xffffffff810739a9 at trap_pfault+0x49
#5 0xffffffff81072fce at trap+0x29e
#6 0xffffffff8104e865 at calltrap+0x8
#7 0xffffffff80ca0dd5 at ether_ifdetach+0x35
#8 0xffffffff80caab14 at vlan_clone_destroy+0x24
#9 0xffffffff80c9ea26 at if_clone_destroyif+0x116
#10 0xffffffff80c9f338 at if_clone_detach+0xc8
#11 0xffffffff80cc7b3c at vnet_destroy+0x13c
#12 0xffffffff80b63480 at prison_deref+0x2b0
#13 0xffffffff80b64d04 at sys_jail_remove+0x364
#14 0xffffffff81074429 at amd64_syscall+0x369
#15 0xffffffff8104f14d at fast_syscall_common+0x101
Uptime: 1m36s
Dumping 768 out of 16178 MB:..3%..11%..21%..32%..42%..53%..61%..71%..82%..92%

__curthread () at ./machine/pcpu.h:230
230     ./machine/pcpu.h: No such file or directory.
(kgdb) list *0xffffffff8263dba6
0xffffffff8263dba6 is in ng_ether_detach (/usr/src/sys/netgraph/ng_ether.c:367).
362     /usr/src/sys/netgraph/ng_ether.c: No such file or directory.
(kgdb) backtrace
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80b9ac7b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:446
#3  0xffffffff80b9b0f3 in vpanic (fmt=<optimized out>, ap=0xfffffe008bcf2410) at /usr/src/sys/kern/kern_shutdown.c:872
#4  0xffffffff80b9aee3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:799
#5  0xffffffff8107394f in trap_fatal (frame=0xfffffe008bcf2600, eva=0) at /usr/src/sys/amd64/amd64/trap.c:929
#6  0xffffffff810739a9 in trap_pfault (frame=0xfffffe008bcf2600, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:765
#7  0xffffffff81072fce in trap (frame=0xfffffe008bcf2600) at /usr/src/sys/amd64/amd64/trap.c:441
#8  <signal handler called>
#9  ng_ether_detach (ifp=0xfffff800038a6800) at /usr/src/sys/netgraph/ng_ether.c:367
#10 0xffffffff80ca0dd5 in ether_ifdetach (ifp=0xfffff800038a6800) at /usr/src/sys/net/if_ethersubr.c:981
#11 0xffffffff80caab14 in vlan_clone_destroy (ifc=0xfffff80013c55600, ifp=0xfffff800038a6800)
    at /usr/src/sys/net/if_vlan.c:1106
#12 0xffffffff80c9ea26 in if_clone_destroyif (ifc=0xfffff80013c55600, ifp=0xfffff800038a6800)
    at /usr/src/sys/net/if_clone.c:330
#13 0xffffffff80c9f338 in if_clone_detach (ifc=0xfffff80013c55600) at /usr/src/sys/net/if_clone.c:451
#14 0xffffffff80cc7b3c in vnet_sysuninit () at /usr/src/sys/net/vnet.c:597
#15 vnet_destroy (vnet=0xfffff80013c7dd00) at /usr/src/sys/net/vnet.c:284
#16 0xffffffff80b63480 in prison_deref (pr=0xffffffff81b0b3c0 <prison0>, flags=19) at /usr/src/sys/kern/kern_jail.c:2634
#17 0xffffffff80b64d04 in sys_jail_remove (td=<optimized out>, uap=<optimized out>) at /usr/src/sys/kern/kern_jail.c:2257
#18 0xffffffff81074429 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#19 amd64_syscall (td=0xfffff80117757000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1076
#20 <signal handler called>
#21 0x000000080030f0aa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffea28
Comment 1 Kristof Provost freebsd_committer 2018-11-29 08:26:49 UTC
Can you describe your setup? It looks like you might have a vlan interface involved somewhere, but knowing how it's all set up will likely make reproducing this easier.
Comment 2 Jordan Boland 2018-11-29 08:37:57 UTC
Yes, I can go into some more detail about the networking.

On the host system, only igb1 is currently active.  It is a trunk interface, so on the host I have igb1.1 configured.

I am utilizing Devin Teske's jng to create a netgraph bridge to igb1, exposing the trunked interface to the jail.  This jail only needs 1 VLAN, but I anticipate adding others later that will utilize more, and this seemed more elegant than cloning the VLAN interfaces individually.

In the jail, I replicate the same setup as on the host to access the tagged interface.

host rc.conf:
=============================
ifconfig_igb1="up"
vlans_igb1="1"
ifconfig_igb1_1="192.168.1.2 netmask 255.255.255.0"
=============================


jail rc.conf:
=============================
vlans_ng0_unifi="1"
ifconfig_ng0_unifi="up"
ifconfig_ng0_unifi_1="inet 192.168.1.3 netmask 255.255.255.0"
=============================


For this particular jail the configuration is overkill (I suppose I could just bridge to igb1.1 on the host).  But I am proving out this strategy for some of the other services that I will need to host on this machine later.  Otherwise, I will have to go back to the strategy of mangling multiple FIBs.
Comment 3 Bjoern A. Zeeb freebsd_committer 2018-11-29 11:33:29 UTC
I can probably have a quick look this evening unless anyone beats me to it.
Looking at the backtrace I have a suspicion of what's going on.
Comment 4 Jordan Boland 2018-12-04 09:03:01 UTC
Hi Bjoern & Kristof,

I wanted to provide some additional information that I have gathered today.  I have upgraded this system to 12.0-RC3 and can confirm the issue remains.  I also believe it is related to the netgraph module.  I have converted a jail to use jib with the if_bridge/if_epair modules instead, and I can now stop that jail without experiencing the kernel panic.

I hope this helps confirm or direct your suspicions.  Let me know if I can do any additional tests.  I will look into it further when I can, but it may be some time before I can do so, and it is entirely possible that the networking code exceeds my skill in C.

Best,

Jordan
Comment 5 Bjoern A. Zeeb freebsd_committer 2019-01-15 23:20:05 UTC
Sorry, I got side-tracked while I was looking at this.  Releasing it to net@ in case someone else beats me to fixing it.
Comment 6 Eugene Grosbein freebsd_committer 2019-01-16 09:11:45 UTC
This PR lacks many valuable technical details. If the problem really is in netgraph, please describe the nodes used, their hooks, and their settings, because merely mentioning some "jng" (whatever it is) is not enough. A plain ng0 is a point-to-point interface, and a vlan is not. You should also supply any additional details that may be relevant, and do not forget to show the output of "ngctl list" while the jail is up and running, before it panics the kernel at shutdown.
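
The details requested above could be gathered with something like the following. This is only a sketch: the function name collect_ng_state is made up for illustration, the jail name "unifi" is an assumption inferred from the ng0_unifi interface name, and igb1 is the parent interface from comment 2. It has to be run as root on the host while the jail is up.

```sh
#!/bin/sh
# Sketch: dump netgraph and interface state for a running VNET jail.
# The default jail name "unifi" is an assumption based on "ng0_unifi".
collect_ng_state() {
    jail="${1:-unifi}"

    echo "== host: ngctl list =="
    ngctl list                      # all netgraph nodes visible on the host

    echo "== host: ngctl show igb1: =="
    ngctl show igb1:                # hooks of the ng_ether node for igb1

    echo "== jail ${jail}: ngctl list =="
    jexec "${jail}" ngctl list      # nodes visible inside the jail's vnet

    echo "== jail ${jail}: ifconfig -a =="
    jexec "${jail}" ifconfig -a     # interfaces inside the jail, including the vlan
}
```

For example, "collect_ng_state unifi > ng-state.txt" before stopping the jail would save the output to a file that can be attached to this PR.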
Comment 7 Arne Steinkamm 2019-09-03 23:37:09 UTC
I can reproduce this panic with 12.0-RELEASE-p7 r350232 and a pretty straightforward, out-of-the-handbook jail setup.

This should be enough to get a nasty panic:

Host:
/etc/rc.conf:
[...]
vlans_em0="vlcx0"
create_args_vlcx0="vlan 18"
ifconfig_em0="up"
ifconfig_vlcx0="inet 10.8.8.110 netmask 255.255.255.0"
[...]
jail_enable="YES"
jail_confwarn="YES"
jail_parallel_start="NO"
jail_list="jv"
jail_reverse_stop="YES"

---------------------------------------------------------------------

/etc/jail.conf:
exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.clean;

jv {
        host.hostname = "julesverne.stk.cx";
        path = "/var/local/prison/jv";
        exec.clean;
        exec.system_user = "root";
        exec.jail_user = "root";
        vnet;
        exec.clean;
        vnet.interface = "ng0_jv";
        exec.system_user = "root";
        exec.jail_user = "root";
        exec.prestart += "/l/om/sbin/jng bridge jv em0";
        exec.poststop += "/l/om/sbin/jng shutdown jv";

        # Standard stuff
        exec.consolelog = "/var/local/log/jails/jv_console.log";
        mount.devfs;          #mount devfs
        allow.raw_sockets;    #allow ping-pong
        devfs_ruleset="5";    #devfs ruleset for this jail
        mount.devfs;
}

--------------------------------------------------------------------

Jail /etc/rc.conf:
[...]
ifconfig_ng0_jv="up"
vlans_ng0_jv="jjvcx0"
create_args_jjvcx0="vlan 18"
ifconfig_jjvcx0="inet 10.8.8.190 netmask 255.255.255.0"
[...]


/l/om/sbin/jng is a copy of /usr/src/share/examples/jails/jng

This should be everything you need to get exactly the panic described in this bug report.
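
With the configuration above in place, the trigger is presumably the same as in the original report, namely stopping the jail. A minimal sketch of the sequence follows; the function name reproduce_panic is hypothetical, "jv" is the jail from the jail.conf above, and this should only be tried on a test machine because the last step is expected to panic the kernel.

```sh
#!/bin/sh
# Sketch: start the VNET jail, confirm the interfaces exist, then stop it.
# WARNING: on an affected system the stop step is expected to panic the host.
reproduce_panic() {
    jail="${1:-jv}"

    service jail start "${jail}" || return 1   # runs the exec.prestart jng bridge
    jexec "${jail}" ifconfig                   # ng0_jv and the vlan should be listed
    service jail stop "${jail}"                # vnet teardown happens here
}
```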