Bug 238326

Summary: Kernel crash on jail stop (VIMAGE/VNET)
Product: Base System
Reporter: paul.le.gauret
Component: kern
Assignee: freebsd-jail (Nobody) <jail>
Status: New
Severity: Affects Only Me
CC: alexx, alfa, bz, chris, freebsd, graudeejs, markus, mason, ohartmann, paul.le.gauret, pprocacci, reshadpatuck1, sascha.folie, sigsys, spam123, trashcan, yp2008cn
Priority: ---
Version: 12.1-RELEASE
Hardware: Any
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234985
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219901

Description paul.le.gauret 2019-06-04 21:05:26 UTC
Jails are managed through the native jail tools with the attached jail.conf file (so-called "thin jails", VNET). I can reliably crash the system by issuing "service jail restart". The crash happens while the jails are being stopped.

[root@beastie ~]# uname -a
FreeBSD beastie 12.0-RELEASE-p4 FreeBSD 12.0-RELEASE-p4 GENERIC  amd64

[root@beastie ~]# service jail restart
Stopping jails: dns db ldap web nextcloud imap smtp testpacket_write_wait: Connection to xxx.xxx.xxx.xxx port 22: Broken pipe


As a workaround I can avoid the crash by inserting a

exec.poststop = "sleep 2";

statement in jail.conf. 1 second was not enough to avoid the crash.

Also worth noting that the issue does not appear if I totally disable networking, hence my guess this is somehow VIMAGE/VNET related.

I've managed to obtain a core dump, but my attempts at debugging didn't get far as I'm not experienced with this. I'm of course happy to do more debugging if it helps identify the issue. The system is not yet live, so I can try pretty much anything on it.

[root@beastie /var/crash]# kgdb /boot/kernel/kernel /var/crash/vmcore.0
GNU gdb (GDB) 8.2.1 [GDB v8.2.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...(no debugging symbols found)...done.
0xffffffff80bcd0bd in sched_switch ()
(kgdb) bt
#0  0xffffffff80bcd0bd in sched_switch ()
#1  0xffffffff80ba6de1 in mi_switch ()
#2  0xffffffff80bf554c in sleepq_wait ()
#3  0xffffffff80ba6817 in _sleep ()
#4  0xffffffff80bfae71 in taskqueue_thread_loop ()
#5  0xffffffff80b5bf33 in fork_exit ()
#6  <signal handler called>

My jail.conf file below. Note that the crash happens regardless of whether the jails with special permissions (db, builder) are configured. I also tried tweaking the mount/umount order without success.

$release="12.0-RELEASE"; # Release used to create the jail if not already existing
host.hostname = "${name}";
path = "/jails/${name}";
exec.consolelog = "/var/log/jail.${name}.console.log";
vnet = "new";
vnet.interface = "epair${jailnum}b";
### Create the jail if not already existing ###
exec.prestart = "if [ ! -d "/jails/THINJAILS/${name}" ]; then zfs clone zroot/jails/TEMPLATES/skeleton-${release}@skeleton zroot/jails/THINJAILS/${name}; fi";
exec.prestart += "if [ ! -d "/jails/${name}" ]; then mkdir /jails/${name}; fi";
### Mount filesystems for shared base (RO) and individual jail skeleton (RW)
exec.prestart += "mount_nullfs -o ro /jails/TEMPLATES/base-${release} /jails/${name}";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/etc /jails/${name}/etc";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/home /jails/${name}/home";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/root /jails/${name}/root";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/tmp /jails/${name}/tmp";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/var /jails/${name}/var";
exec.prestart += "mount_nullfs -o rw /jails/THINJAILS/${name}/usr/local /jails/${name}/usr/local";
### Mount /usr/ports from hosts
exec.prestart += "mount_nullfs -o ro /usr/ports /jails/${name}/usr/ports";
### Mount data filesystem if exists
exec.prestart += "if [ -d "/jails/DATA/${name}" ]; then mount_nullfs -o rw /jails/DATA/${name} /jails/${name}/data; fi";
### Mount devfs with the jail ruleset (4) now that the jail filesystems are mounted
exec.prestart += "mount -t devfs -o ruleset=4 devfs /jails/${name}/dev";
### Create an ethernet pair and add to bridge0 (internal bridge)
exec.prestart += "ifconfig epair${jailnum} create up";
exec.prestart += "ifconfig epair${jailnum}a description '${name} - host interface'";
exec.prestart += "ifconfig epair${jailnum}b description '${name} - jail interface'";
exec.prestart += "ifconfig bridge0 addm epair${jailnum}a";
exec.start = "ifconfig epair${jailnum}b inet 192.168.1.${jailnum}/24";
exec.start += "ifconfig epair${jailnum}b inet6 xxx:xxx:xxx:xxx::${jailnum}:1/64";
exec.start += "route add default 192.168.1.1";
exec.start += "route -6 add default xxx:xxx:xxx:xxx::1:1";
### Proceed with boot through rc
exec.start += "/bin/sh /etc/rc";
### Shutdown the jail
exec.stop = "/bin/sh /etc/rc.shutdown";
### Give the jail a couple of seconds to shutdown (avoid issues unmounting filesystems)
#exec.poststop = "sleep 2";
### Destroy jail  network interface
exec.poststop += "ifconfig epair${jailnum}a destroy";
### Unmount filesystems
exec.poststop += "if [ -d "/jails/DATA/${name}" ]; then umount /jails/${name}/data; fi";
exec.poststop += "umount -f /jails/${name}/dev";
exec.poststop += "umount -f /jails/${name}/usr/ports";
exec.poststop += "umount -f /jails/${name}/etc";
exec.poststop += "umount -f /jails/${name}/home";
exec.poststop += "umount -f /jails/${name}/root";
exec.poststop += "umount -f /jails/${name}/tmp";
exec.poststop += "umount -f /jails/${name}/var";
exec.poststop += "umount -f /jails/${name}/usr/local";
exec.poststop += "umount -f /jails/${name}";

dns { $jailnum=2; };

db {
    $jailnum=3;
    allow.sysvipc;
};

ldap { $jailnum=4; };

web { $jailnum=5; };

nextcloud { $jailnum=6; };

imap { $jailnum=7; };

smtp { $jailnum=8; };

test { $jailnum=98; };

builder {
    $jailnum=99;
    enforce_statfs=0;
    allow.mount;
    allow.mount.nullfs;
    allow.mount.tmpfs;
    allow.mount.devfs;
    allow.chflags;
};
Comment 1 paul.le.gauret 2019-11-09 13:49:30 UTC
Issue had somehow disappeared from 12.0-RELEASE with one of the subsequent patches (think around -p3 or -p4). 

It is unfortunately back after upgrading to 12.1-RELEASE. Adding back the 2 second sleep in jail.conf still works as a workaround though.
Comment 2 Bjoern A. Zeeb freebsd_committer 2019-11-09 14:04:01 UTC
(In reply to paul.le.gauret from comment #1)

If you have a core dump, check whether you have the text files as well. A panic string etc. would be helpful; the 12.0 output above was not.

man crashinfo can help; crashinfo might run automatically on the boot following the crash (/etc/rc.d/savecore).


If you are running a release please make sure debug symbols are installed in /usr/lib/debug/boot/kernel.

https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-gdb.html   also has some info.
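
For anyone following along, here is a minimal sketch of the steps suggested above. It assumes the default dump directory /var/crash and dump number 0, and that the debug symbols, if installed, live at /usr/lib/debug/boot/kernel/kernel.debug. The helper names pick_kernel and crash_cmds are made up for illustration; the script only prints the commands and does not run them.

```shell
#!/bin/sh
# Sketch only: the paths and the dump number are assumptions, and
# pick_kernel / crash_cmds are hypothetical helper names, not FreeBSD tools.

# Print the kernel image kgdb should load: the debug-symbol file if it is
# installed under the given root, otherwise the plain kernel.
pick_kernel() {
    root="${1:-}"
    if [ -f "${root}/usr/lib/debug/boot/kernel/kernel.debug" ]; then
        echo "${root}/usr/lib/debug/boot/kernel/kernel.debug"
    else
        echo "${root}/boot/kernel/kernel"
    fi
}

# Print (do not run) the commands that summarise a dump and then open it.
crash_cmds() {
    dumpnr="${1:-0}"
    crashdir="${2:-/var/crash}"
    echo "crashinfo -d ${crashdir} -n ${dumpnr}"
    echo "kgdb $(pick_kernel) ${crashdir}/vmcore.${dumpnr}"
}

crash_cmds 0
```

The crashinfo summary should contain the panic string and a backtrace, which is usually the most useful thing to paste into a report.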
Comment 3 O. Hartmann 2019-11-13 07:15:03 UTC
A couple of times now (since July, I think) I have seen a similar phenomenon on a very new Fujitsu server with 13-CURRENT (FreeBSD 13.0-CURRENT #25 r354673: Wed Nov 13 06:47:48 CET 2019 amd64); we manage the jails with FreeBSD's native onboard tools and configure them via /etc/jail.conf. Stopping jails brings down the box 100% of the time, and so does a shutdown, which I guess triggers a clean jail stop. In most cases I can circumvent the crash by rebooting via "reboot". The box is a dual-socket NUMA system, equipped with only one CPU and only one RAM bank populated with DIMMs. I'll append the dmesg output afterwards.

Due to a toolchain corruption on that system, compiling a debugging kernel isn't possible, so the information I have so far is the panic string from two core dumps:

Version String: FreeBSD 13.0-CURRENT #15 r354144: Tue Oct 29 06:21:38 CET 2019
Panic String: page fault

and 

Version String: FreeBSD 13.0-CURRENT #11 r353877: Tue Oct 22 11:02:32 CEST 2019
Panic String: m_getzone: invalid cluster size 0

The cores are too old to compare with the kernel currently running, and at the moment I do not dare trigger a crash, because the box is needed for several tasks and because of the severe corruption such crashes cause on the UFS/FFS SSD holding the OS.

Maybe the issues on 12-STABLE and 13-CURRENT are linked; I regret not having hardware running 12-STABLE on the same CPU type right now.
Comment 4 pprocacci 2020-01-08 02:19:23 UTC
I too am receiving a kernel panic given options similar to the reporter.
I've used a screen recorder to capture the panic.
If anyone is interested in the video file I'll post it somewhere.
If not, here is my transcription of the video.


The panic text:

Freed UMA keg (rentry) was not empty (17 items).  Lost 1 pages of memory.



Stack trace looks as follows:

#0 0xffffffff80c1d967 at kdb_backtrace+0x67
#1 0xffffffff80bd0dcd at vpanic+0x19d
#2 0xffffffff80bd0c23 at panic+0x43
#3 0xffffffff810aab6c at trap_fatal+0x39c
#4 0xffffffff810aabbf at trap_pfault+0x4f
#5 0xffffffff810aa1f1 at trap+0x2a1
#6 0xffffffff8108373c at calltrap+0x8
#7 0xffffffff80bcb470 at _rm_rlock_hard+0x3b9
#8 0xffffffff80cfb5fe at rtinit+0x2ee
#9 0xffffffff80d4d39c at in_scrubprefix+0x23c
#10 0xffffffff80d64d7d at rip_ctlinput+0x9d
#11 0xffffffff80c5cb7c at pfctlinput+0x5c
#12 0xffffffff80cd0cea at if_down+0x13a
#13 0xffffffff80cce53a at if_detach_internal+0x87a
#14 0xffffffff80ccdcae at if_detach+0x2e
#15 0xffffffff82bc7c01 at epair_clone_destroy+0x81
#16 0xffffffff80cd64dd at if_clone_destroyif+0x10d
#17 0xffffffff80cd636e at if_clone_destroy+0x1be
Comment 5 pprocacci 2020-01-08 02:32:46 UTC
For completeness, here are my jail.conf and the pertinent rc.conf settings.

jail.conf:
++++++++++++++++++++++++++++++++++
$bridge = "bridge${vlan}";
$epair  = "epair${vlan}";
path   = "/jails/hosts/$name";

exec.prestart  = "ifconfig $bridge create up";
exec.prestart += "ifconfig $bridge addm $name";
exec.prestart += "ifconfig $epair create up";
exec.prestart += "ifconfig $bridge addm ${epair}a";
exec.clean;
exec.start     = "/bin/sh /etc/rc";
exec.stop      = "/bin/sh /etc/rc.shutdown";
exec.poststop  = "ifconfig $bridge deletem ${epair}a";
exec.poststop  = "ifconfig ${epair}a destroy";

vnet;
vnet.interface = "${epair}b";

resolver1 {
  $vlan   = "50";
}

++++++++++++++++++++++++++++++++++


rc.conf:
++++++++++++++++++++++++++++++++++
vlans_igb1="resolver1"
create_args_resolver1="vlan 50"
ifconfig_resolver1="inet 192.168.50.1 netmask 255.255.255.252"
++++++++++++++++++++++++++++++++++
Comment 6 pprocacci 2020-01-08 02:44:54 UTC
Also this seems to be related to 

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234985
Comment 7 pprocacci 2020-01-08 02:55:30 UTC
I've fixed the problem with the following workaround:

exec.prestop   = "ifconfig ${epair}b -vnet $name";

This is taken nearly verbatim from the link I just posted.

$name in the command above can be either the name of the jail or the jail id.

This is a bug in the VNET cleanup code and it's necessary to remove the epair interface from the jail before stopping it.
Comment 8 Reshad Patuck 2020-01-08 03:19:58 UTC
(In reply to pprocacci from comment #7)
Hi,

Your backtrace looks very similar to mine at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219901

Can you get it to fail consistently?

I have been running a script that:
- brings the epair interfaces up
- attaches one end to a bridge
- brings a jail up
- adds the other epair interface to the jail
- kills the jail
- kills the epair interface
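
The steps listed above could be sketched roughly as follows. This is not the actual script from this comment or from bug 234985; the jail name, bridge name and the DRY_RUN switch are assumptions for illustration. By default it only prints the commands, so it is safe to run anywhere.

```shell
#!/bin/sh
# Sketch only. The jail name, bridge name and DRY_RUN switch are assumptions
# for illustration; this is not the script referred to in the thread.
DRY_RUN="${DRY_RUN:-1}"   # default: only print commands, never touch the host

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One create/destroy cycle for epair<n> and a throwaway vnet jail.
cycle() {
    n="$1"; jail="$2"; bridge="$3"
    run ifconfig "epair${n}" create up          # bring the epair up
    run ifconfig "$bridge" addm "epair${n}a"    # attach one end to a bridge
    run jail -c name="$jail" vnet persist       # bring a jail up
    run ifconfig "epair${n}b" vnet "$jail"      # move the other end into it
    run jail -r "$jail"                         # kill the jail
    run ifconfig "epair${n}a" destroy           # kill the epair
}

cycle 0 crashtest bridge0
```

Setting DRY_RUN=0 and calling cycle in a loop on a scratch machine would exercise the jail teardown path repeatedly; expect a panic on affected kernels.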

It only dies randomly in dev/prod boxes :(
Comment 9 pprocacci 2020-01-08 03:28:11 UTC
exec.prestop   = "ifconfig ${epair}b -vnet $name";

Before adding the above, it would kernel panic every single time.

The key is removing the vnet interface from the jail prior to shutting the jail down so the VNET cleanup code essentially has no interface to worry about.

If you're working on some sort of shell script; on the host you'd:

# ifconfig interface_name_inside_of_jail -vnet $jail_name_or_id

.... and then proceed to kill off the jail.  It shouldn't panic any more in relation to the VNET cleanup code.
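
As a slightly fuller sketch of the same idea, assuming the jail is stopped with "jail -r": the function name stop_vnet_jail and the DRY_RUN switch are made up for illustration, and the jail and interface names in the example call are taken from the jail.conf in comment 5. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: stop_vnet_jail and DRY_RUN are hypothetical names.
DRY_RUN="${DRY_RUN:-1}"   # default: only print commands, never touch the host

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: stop_vnet_jail <jail name or id> <interface inside the jail>...
# Reclaims every listed interface from the jail first, then removes the jail,
# so the VNET cleanup code has no interface left to tear down.
stop_vnet_jail() {
    jail="$1"; shift
    for ifn in "$@"; do
        run ifconfig "$ifn" -vnet "$jail"
    done
    run jail -r "$jail"
}

stop_vnet_jail resolver1 epair50b
```

The ordering is the whole point: every interface is reclaimed on the host before the jail goes away.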
Comment 10 O. Hartmann 2020-01-08 09:45:11 UTC
This issue is persistent on recent CURRENT ( FreeBSD 13.0-CURRENT #26 r356437: Tue Jan  7 07:19:34 CET 2020 amd64). The only reliable way to reboot the host without violent and destructive crashes is to issue "reboot" on the shell/console as root.
Comment 11 xsan 2020-02-29 16:07:22 UTC
The bug is very easy to reproduce in a virtual machine (e.g. VirtualBox, Hyper-V, VMware or ESXi), but not on a real machine.
Comment 12 graudeejs 2020-05-28 10:07:23 UTC
For the record: I can easily replicate this issue on a physical server at work on 12.1-RELEASE-p5.

This server is:

Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz 
FreeBSD/SMP: Multiprocessor System Detected: 48 CPUs
FreeBSD/SMP: 2 package(s) x 12 core(s) x 2 hardware threads


exec.prestop   = "ifconfig ${epair}b -vnet $name";
Comment 13 graudeejs 2020-05-28 10:08:33 UTC
Sorry lost a line

exec.prestop   = "ifconfig ${epair}b -vnet $name";

Mitigates the issue
Comment 14 Markus Stoff 2020-06-28 06:36:40 UTC
I can also reliably reproduce this on a physical machine using the vnet_epair_test.sh script at bug #234985.

Server:

CPU: AMD Ryzen 5 3600 6-Core Processor               (3593.32-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 12 CPUs
FreeBSD/SMP: 1 package(s) x 2 cache groups x 3 core(s) x 2 hardware threads


Running 'ifconfig ${epair}b -vnet ${jid}' before removing the jail avoids the kernel panic. However, I would prefer to shut my jails down in a clean way rather than just pulling the (network) plug.
Comment 15 O. Hartmann 2020-07-03 07:04:06 UTC
This problem is still present in 12-STABLE, CURRENT and 12.1-RELENG.

(In reply to pprocacci from comment #7)

On 12.1-RELENG (most recent), 12-STABLE and CURRENT (r362906), using the workaround as suggested in comment #7 (see above), using

exec.prestop=          "ifconfig ${if_vnet}a -vnet ${name}"; 

where ${if_vnet} is expanded to my epair interface and its subinterface is "a" instead of "b" (a is the interface owned by the jail on the inside), I receive

variable if_net not known error

It seems that only the exec.poststop command is affected; all other commands, both the stop/start ones targeting the running jail and those targeting the non-running jail (poststop/prestart etc.), do not show the error.
Comment 16 Mason Loring Bliss freebsd_triage 2020-08-09 20:30:02 UTC
Markus Stoff wrote:

> Running 'ifconfig ${epair}b -vnet ${jid}' before removing the jail avoids 
> the kernel panic. However, I would prefer to shut my jails down in a 
> clean way rather than just pulling the (network) plug.

While it's a little awkward-looking, you can do something like this to make
sure you've cleanly shut down and detached:

    exec.prestop = "/usr/sbin/jexec ${name} /bin/sh /etc/rc.shutdown";
    exec.prestop += "/sbin/ifconfig epair${ep}b -vnet ${name}";

    exec.poststop = "ifconfig $bridge deletem epair${ep}a";
    exec.poststop += "ifconfig epair${ep}a destroy";

The notable thing is that exec.prestop and exec.poststop run in system 
context, not jail context, so you need the jexec to execute the clean 
shutdown - but it works.
Comment 17 Markus Stoff 2020-09-12 13:17:59 UTC
(In reply to Mason Loring Bliss from comment #16)

Yes, this will work. It still feels a bit hacky, though... ;-)
Comment 18 rob2g2 2020-10-10 14:33:25 UTC
Same problem here on FreeBSD 12.1 p10.