Bug 233622 - panic: page not present fault when stopping VIMAGE jail on 12.0-RC2, netgraph
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net mailing list
Keywords: crash, panic, vimage
Reported: 2018-11-29 08:19 UTC by Jordan Boland
Modified: 2020-02-12 19:56 UTC

Description Jordan Boland 2018-11-29 08:19:29 UTC
This is reproducible for me - any time I stop this jail the system panics.  I appreciate your patience as I am new to kernel debugging, so if I have omitted necessary information it is out of ignorance and not malice.  :-)

===============================================

Unread portion of the kernel message buffer:
<6>in6_purgeaddr: err=65, destination address delete failed
ng node ng0_unifi_1 needs NGF_REALLY_DIE


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x0
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff8263dba6
stack pointer           = 0x28:0xfffffe008caeb6c0
frame pointer           = 0x28:0xfffffe008caeb6e0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 60579 (jail)
trap number             = 12
panic: page fault
cpuid = 0
time = 1543479102
KDB: stack backtrace:
#0 0xffffffff80be74a7 at kdb_backtrace+0x67
#1 0xffffffff80b9b093 at vpanic+0x1a3
#2 0xffffffff80b9aee3 at panic+0x43
#3 0xffffffff8107394f at trap_fatal+0x35f
#4 0xffffffff810739a9 at trap_pfault+0x49
#5 0xffffffff81072fce at trap+0x29e
#6 0xffffffff8104e865 at calltrap+0x8
#7 0xffffffff80ca0dd5 at ether_ifdetach+0x35
#8 0xffffffff80caab14 at vlan_clone_destroy+0x24
#9 0xffffffff80c9ea26 at if_clone_destroyif+0x116
#10 0xffffffff80c9f338 at if_clone_detach+0xc8
#11 0xffffffff80cc7b3c at vnet_destroy+0x13c
#12 0xffffffff80b63480 at prison_deref+0x2b0
#13 0xffffffff80b64d04 at sys_jail_remove+0x364
#14 0xffffffff81074429 at amd64_syscall+0x369
#15 0xffffffff8104f14d at fast_syscall_common+0x101
Uptime: 1m36s
Dumping 768 out of 16178 MB:..3%..11%..21%..32%..42%..53%..61%..71%..82%..92%

__curthread () at ./machine/pcpu.h:230
230     ./machine/pcpu.h: No such file or directory.
(kgdb) list *0xffffffff8263dba6
0xffffffff8263dba6 is in ng_ether_detach (/usr/src/sys/netgraph/ng_ether.c:367).
362     /usr/src/sys/netgraph/ng_ether.c: No such file or directory.
(kgdb) backtrace
#0  __curthread () at ./machine/pcpu.h:230
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:366
#2  0xffffffff80b9ac7b in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:446
#3  0xffffffff80b9b0f3 in vpanic (fmt=<optimized out>, ap=0xfffffe008bcf2410) at /usr/src/sys/kern/kern_shutdown.c:872
#4  0xffffffff80b9aee3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:799
#5  0xffffffff8107394f in trap_fatal (frame=0xfffffe008bcf2600, eva=0) at /usr/src/sys/amd64/amd64/trap.c:929
#6  0xffffffff810739a9 in trap_pfault (frame=0xfffffe008bcf2600, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:765
#7  0xffffffff81072fce in trap (frame=0xfffffe008bcf2600) at /usr/src/sys/amd64/amd64/trap.c:441
#8  <signal handler called>
#9  ng_ether_detach (ifp=0xfffff800038a6800) at /usr/src/sys/netgraph/ng_ether.c:367
#10 0xffffffff80ca0dd5 in ether_ifdetach (ifp=0xfffff800038a6800) at /usr/src/sys/net/if_ethersubr.c:981
#11 0xffffffff80caab14 in vlan_clone_destroy (ifc=0xfffff80013c55600, ifp=0xfffff800038a6800)
    at /usr/src/sys/net/if_vlan.c:1106
#12 0xffffffff80c9ea26 in if_clone_destroyif (ifc=0xfffff80013c55600, ifp=0xfffff800038a6800)
    at /usr/src/sys/net/if_clone.c:330
#13 0xffffffff80c9f338 in if_clone_detach (ifc=0xfffff80013c55600) at /usr/src/sys/net/if_clone.c:451
#14 0xffffffff80cc7b3c in vnet_sysuninit () at /usr/src/sys/net/vnet.c:597
#15 vnet_destroy (vnet=0xfffff80013c7dd00) at /usr/src/sys/net/vnet.c:284
#16 0xffffffff80b63480 in prison_deref (pr=0xffffffff81b0b3c0 <prison0>, flags=19) at /usr/src/sys/kern/kern_jail.c:2634
#17 0xffffffff80b64d04 in sys_jail_remove (td=<optimized out>, uap=<optimized out>) at /usr/src/sys/kern/kern_jail.c:2257
#18 0xffffffff81074429 in syscallenter (td=<optimized out>) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#19 amd64_syscall (td=0xfffff80117757000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1076
#20 <signal handler called>
#21 0x000000080030f0aa in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffea28
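
For readers comparing the two traces above, here is a minimal Python sketch of how the "KDB: stack backtrace" lines can be split into frames and trimmed to the call path below the trap handling. The helper names (parse_kdb_frames, frames_below_trap) are hypothetical and not part of any FreeBSD tool; the sample text is copied from the panic message in this report. Note that the faulting function itself (ng_ether_detach, per kgdb) does not appear in the KDB list, so the first frame after calltrap is its caller.

```python
import re

# Matches frame lines such as "#7 0xffffffff80ca0dd5 at ether_ifdetach+0x35".
FRAME_RE = re.compile(
    r"#(\d+)\s+(0x[0-9a-f]+)\s+at\s+([A-Za-z0-9_.]+)\+(0x[0-9a-f]+)"
)


def parse_kdb_frames(text):
    """Return a list of (index, address, symbol, offset) tuples."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                (int(m.group(1)), int(m.group(2), 16), m.group(3), int(m.group(4), 16))
            )
    return frames


def frames_below_trap(frames):
    """Drop the panic/trap frames up to and including calltrap."""
    names = [f[2] for f in frames]
    if "calltrap" not in names:
        return frames
    return frames[names.index("calltrap") + 1:]


# Frames #0 to #13 copied from the panic message in this report.
SAMPLE = """\
KDB: stack backtrace:
#0 0xffffffff80be74a7 at kdb_backtrace+0x67
#1 0xffffffff80b9b093 at vpanic+0x1a3
#2 0xffffffff80b9aee3 at panic+0x43
#3 0xffffffff8107394f at trap_fatal+0x35f
#4 0xffffffff810739a9 at trap_pfault+0x49
#5 0xffffffff81072fce at trap+0x29e
#6 0xffffffff8104e865 at calltrap+0x8
#7 0xffffffff80ca0dd5 at ether_ifdetach+0x35
#8 0xffffffff80caab14 at vlan_clone_destroy+0x24
#9 0xffffffff80c9ea26 at if_clone_destroyif+0x116
#10 0xffffffff80c9f338 at if_clone_detach+0xc8
#11 0xffffffff80cc7b3c at vnet_destroy+0x13c
#12 0xffffffff80b63480 at prison_deref+0x2b0
#13 0xffffffff80b64d04 at sys_jail_remove+0x364
"""

# Symbols on the call path, innermost caller first.
call_path = [f[2] for f in frames_below_trap(parse_kdb_frames(SAMPLE))]
print(" <- ".join(call_path))
```

Because the regular expression is applied with search(), the same helper should also accept frame lines that carry a syslog prefix, as in /var/log/messages.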
Comment 1 Kristof Provost freebsd_committer 2018-11-29 08:26:49 UTC
Can you describe your setup? It looks like you might have a vlan interface involved somewhere, but knowing how it's all set up will likely make reproducing this easier.
Comment 2 Jordan Boland 2018-11-29 08:37:57 UTC
Yes, I can go into some more detail about the networking.

On the host system, only igb1 is currently active.  It is a trunk interface, so on the host I have igb1.1 configured.

I am utilizing Devin Teske's jng to create a netgraph bridge to igb1, exposing the trunked interface to the jail.  This jail only needs 1 VLAN, but I anticipate adding others later that will utilize more, and this seemed more elegant than cloning the VLAN interfaces individually.

In the jail, I replicate the same setup as on the host to access the tagged interface.

host rc.conf:
=============================
ifconfig_igb1="up"
vlans_igb1="1"
ifconfig_igb1_1="192.168.1.2 netmask 255.255.255.0"
=============================


jail rc.conf:
=============================
vlans_ng0_unifi="1"
ifconfig_ng0_unifi="up"
ifconfig_ng0_unifi_1="inet 192.168.1.3 netmask 255.255.255.0"
=============================


For this particular jail the configuration is overkill (I suppose I could just bridge to igb1.1 on the host).  But I am proving out this strategy for some of the other services that I will need to host on this machine later.  Otherwise, I will have to go back to the strategy of mangling multiple FIBs.
Comment 3 Bjoern A. Zeeb freebsd_committer 2018-11-29 11:33:29 UTC
I can probably have a quick look this evening unless anyone beats me to it.
Looking at the backtrace I have a suspicion of what's going on.
Comment 4 Jordan Boland 2018-12-04 09:03:01 UTC
Hi Bjoern & Kristof,

I wanted to provide some additional information that I have gathered today.  I have upgraded this system to 12.0-RC3 and can confirm the issue remains.  I also believe it is related to the netgraph module.  I have converted a jail to use jib with the if_bridge/if_epair modules instead.  I can now stop the jail without experiencing the kernel panic.

I hope this helps confirm or direct your suspicions.  Let me know if I can do any additional tests.  I will look into it further when I can, but it may be some time before I can do so, and it is entirely possible that the networking code exceeds my skill in C.

Best,

Jordan
Comment 5 Bjoern A. Zeeb freebsd_committer 2019-01-15 23:20:05 UTC
Sorry, I got side-tracked while I was looking at this.  Releasing it to net@ in case someone else beats me to fixing it.
Comment 6 Eugene Grosbein freebsd_committer 2019-01-16 09:11:45 UTC
This PR lacks many valuable technical details. If the problem is really in netgraph, please describe the nodes used, their hooks and their settings, because merely mentioning some "jng" (whatever it is) is not enough. A plain ng0 is a point-to-point interface and a vlan is not. You should also supply any additional details that may be relevant, and do not forget to show the output of "ngctl list" while the jail is up and running, before it panics the kernel at shutdown.
Comment 7 Arne Steinkamm 2019-09-03 23:37:09 UTC
I can reproduce this panic with 12.0-RELEASE-p7 r350232 and a pretty straightforward, out-of-the-handbook jail setup.

This should be enough to get a nasty panic:

Host:
/etc/rc.conf:
[...]
vlans_em0="vlcx0"
create_args_vlcx0="vlan 18"
ifconfig_em0="up"
ifconfig_vlcx0="inet 10.8.8.110 netmask 255.255.255.0"
[...]
jail_enable="YES"
jail_confwarn="YES"
jail_parallel_start="NO"
jail_list="jv"
jail_reverse_stop="YES"

---------------------------------------------------------------------

/etc/jail.conf:
exec.start = "/bin/sh /etc/rc";
exec.stop = "/bin/sh /etc/rc.shutdown";
exec.clean;

jv {
        host.hostname = "julesverne.stk.cx";
        path = "/var/local/prison/jv";
        exec.clean;
        exec.system_user = "root";
        exec.jail_user = "root";
        vnet;
        exec.clean;
        vnet.interface = "ng0_jv";
        exec.system_user = "root";
        exec.jail_user = "root";
        exec.prestart += "/l/om/sbin/jng bridge jv em0";
        exec.poststop += "/l/om/sbin/jng shutdown jv";

        # Standard stuff
        exec.consolelog = "/var/local/log/jails/jv_console.log";
        mount.devfs;          #mount devfs
        allow.raw_sockets;    #allow ping-pong
        devfs_ruleset="5";    #devfs ruleset for this jail
        mount.devfs;
}

--------------------------------------------------------------------

Jail /etc/rc.conf:
[...]
ifconfig_ng0_jv="up"
vlans_ng0_jv="jjvcx0"
create_args_jjvcx0="vlan 18"
ifconfig_jjvcx0="inet 10.8.8.190 netmask 255.255.255.0"
[...]


/l/om/sbin/jng is a copy of /usr/src/share/examples/jails/jng

This should be everything you need to get exactly the panic described in this bug report.
Comment 8 xsan 2019-11-24 15:42:32 UTC
I have the same problem, and there is a very easy way to reproduce it.
I use the `qjail` tool to manage jails.

# First create the jail and enable vnet for it.
qjail create -4 192.168.1.101 testjail
qjail config -w em0 -v none testjail

# Repeat the following commands; a page fault happens on the stop command and the system reboots.
qjail start testjail
qjail stop testjail

System: FreeBSD 12.1-RELEASE amd64

Logs:

Nov 24 21:44:09 FingerAge kernel: epair3a: link state changed to DOWN
Nov 24 21:44:09 FingerAge kernel: epair3b: link state changed to DOWN
Nov 24 21:44:52 FingerAge syslogd: kernel boot file is /boot/kernel/kernel
Nov 24 21:44:52 FingerAge kernel:
Nov 24 21:44:52 FingerAge syslogd: last message repeated 1 times
Nov 24 21:44:52 FingerAge kernel: Fatal trap 12: page fault while in kernel mode
Nov 24 21:44:52 FingerAge kernel: cpuid = 7; apic id = 07
Nov 24 21:44:52 FingerAge kernel: fault virtual address = 0x410
Nov 24 21:44:52 FingerAge kernel: fault code            = supervisor read data, page not present
Nov 24 21:44:52 FingerAge kernel: instruction pointer   = 0x20:0xffffffff80baff2d
Nov 24 21:44:52 FingerAge kernel: stack pointer         = 0x28:0xfffffe00403c3940
Nov 24 21:44:52 FingerAge kernel: frame pointer         = 0x28:0xfffffe00403c39c0
Nov 24 21:44:52 FingerAge kernel: code segment          = base rx0, limit 0xfffff, type 0x1b
Nov 24 21:44:52 FingerAge kernel:                       = DPL 0, pres 1, long 1, def32 0, gran 1
Nov 24 21:44:52 FingerAge kernel: processor eflags      = interrupt enabled, resume, IOPL = 0
Nov 24 21:44:52 FingerAge kernel: current process               = 0 (thread taskq)
Nov 24 21:44:52 FingerAge kernel: trap number           = 12
Nov 24 21:44:52 FingerAge kernel: panic: page fault
Nov 24 21:44:52 FingerAge kernel: cpuid = 7
Nov 24 21:44:52 FingerAge kernel: time = 1574603049
Nov 24 21:44:52 FingerAge kernel: KDB: stack backtrace:
Nov 24 21:44:52 FingerAge kernel: #0 0xffffffff80c1d297 at kdb_backtrace+0x67
Nov 24 21:44:52 FingerAge kernel: #1 0xffffffff80bd05cd at vpanic+0x19d
Nov 24 21:44:52 FingerAge kernel: #2 0xffffffff80bd0423 at panic+0x43
Nov 24 21:44:52 FingerAge kernel: #3 0xffffffff810a7dcc at trap_fatal+0x39c
Nov 24 21:44:52 FingerAge kernel: #4 0xffffffff810a7e19 at trap_pfault+0x49
Nov 24 21:44:52 FingerAge kernel: #5 0xffffffff810a740f at trap+0x29f
Nov 24 21:44:52 FingerAge kernel: #6 0xffffffff81081a0c at calltrap+0x8
Nov 24 21:44:52 FingerAge kernel: #7 0xffffffff80ccd5e1 at if_detach_internal+0x261
Nov 24 21:44:52 FingerAge kernel: #8 0xffffffff80cd490c at if_vmove+0x3c
Nov 24 21:44:52 FingerAge kernel: #9 0xffffffff80cd48b8 at vnet_if_return+0x48
Nov 24 21:44:52 FingerAge kernel: #10 0xffffffff80cfe2b4 at vnet_destroy+0x124
Nov 24 21:44:52 FingerAge kernel: #11 0xffffffff80b98870 at prison_deref+0x2a0
Nov 24 21:44:52 FingerAge kernel: #12 0xffffffff80c2fa74 at taskqueue_run_locked+0x154
Nov 24 21:44:52 FingerAge kernel: #13 0xffffffff80c30da8 at taskqueue_thread_loop+0x98
Nov 24 21:44:52 FingerAge kernel: #14 0xffffffff80b90c23 at fork_exit+0x83
Nov 24 21:44:52 FingerAge kernel: #15 0xffffffff81082a4e at fork_trampoline+0xe
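
To help judge whether this trace is the same panic as the one in the original description, here is a small Python sketch that compares the two call paths. The symbol lists are copied from the two KDB backtraces in this report (frames below calltrap, innermost first); the helper names shared_frames and unique_frames are hypothetical. Treat the outcome as a hint only: the kernels differ (12.0-RC2 versus 12.1-RELEASE), so this shows where the paths diverge, not why.

```python
# Call path from the original description (netgraph + vlan, 12.0-RC2).
NETGRAPH_PATH = [
    "ether_ifdetach", "vlan_clone_destroy", "if_clone_destroyif",
    "if_clone_detach", "vnet_destroy", "prison_deref",
    "sys_jail_remove", "amd64_syscall", "fast_syscall_common",
]

# Call path from the qjail log above (12.1-RELEASE).
QJAIL_PATH = [
    "if_detach_internal", "if_vmove", "vnet_if_return",
    "vnet_destroy", "prison_deref", "taskqueue_run_locked",
    "taskqueue_thread_loop", "fork_exit", "fork_trampoline",
]


def shared_frames(first, second):
    """Symbols present in both paths, in the order of the first path."""
    second_set = set(second)
    return [sym for sym in first if sym in second_set]


def unique_frames(first, second):
    """Symbols of the first path that never occur in the second path."""
    second_set = set(second)
    return [sym for sym in first if sym not in second_set]


shared = shared_frames(NETGRAPH_PATH, QJAIL_PATH)
only_netgraph = unique_frames(NETGRAPH_PATH, QJAIL_PATH)
only_qjail = unique_frames(QJAIL_PATH, NETGRAPH_PATH)

print("shared:       ", ", ".join(shared))
print("only netgraph:", ", ".join(only_netgraph))
print("only qjail:   ", ", ".join(only_qjail))
```

Both paths pass through vnet_destroy and prison_deref, but the frames above them differ: the first goes through the vlan clone teardown, this one through vnet_if_return and if_vmove. Together with the different fault addresses (a write to 0x0 there, a read from 0x410 here), this may point to a separate problem in the same vnet teardown, although the traces alone cannot confirm that.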