Created attachment 176290 [details]
screenshot of the crash

Upgraded from FreeBSD 10.3-RELEASE-p10 to FreeBSD 11-RELEASE-p2, used generic configuration file with options below:

    options IPFIREWALL
    options IPFIREWALL_VERBOSE
    options IPFIREWALL_DEFAULT_TO_ACCEPT
    options DUMMYNET
    options IPDIVERT
    options CONSPEED=115200
    options HZ=1000
    options VIMAGE

the installkernel ended with

    --- afterinstall ---
    kldxref /boot/kernel
    kldxref: unknown metadata record 4 in file atacard.ko
    kldxref: unknown metadata record 4 in file atp.ko
    kldxref: unknown metadata record 4 in file atp.ko
    kldxref: unknown metadata record 4 in file cmx.ko
    kldxref: unknown metadata record 4 in file fdc.ko
    kldxref: unknown metadata record 4 in file if_an.ko
    kldxref: unknown metadata record 4 in file if_aue.ko
    kldxref: unknown metadata record 4 in file if_axe.ko
    kldxref: unknown metadata record 4 in file if_axge.ko
    kldxref: unknown metadata record 4 in file if_cdce.ko
    kldxref: unknown metadata record 4 in file if_cdce.ko
    kldxref: unknown metadata record 4 in file if_cdce.ko
    kldxref: unknown metadata record 4 in file if_cs.ko
    kldxref: unknown metadata record 4 in file if_cue.ko
    kldxref: unknown metadata record 4 in file if_ed.ko
    kldxref: unknown metadata record 4 in file if_ed.ko
    kldxref: unknown metadata record 4 in file if_ed.ko
    kldxref: unknown metadata record 4 in file if_ep.ko
    kldxref: unknown metadata record 4 in file if_fe.ko
    kldxref: unknown metadata record 4 in file if_ipheth.ko
    kldxref: unknown metadata record 4 in file if_kue.ko
    kldxref: unknown metadata record 4 in file if_mos.ko
    kldxref: unknown metadata record 4 in file if_rsu.ko
    kldxref: unknown metadata record 4 in file if_rue.ko
    kldxref: unknown metadata record 4 in file if_rum.ko
    kldxref: unknown metadata record 4 in file if_run.ko
    kldxref: unknown metadata record 4 in file if_smsc.ko
    kldxref: unknown metadata record 4 in file if_sn.ko
    kldxref: unknown metadata record 4 in file if_uath.ko
    kldxref: unknown metadata record 4 in file if_udav.ko
    kldxref: unknown metadata record 4 in file if_upgt.ko
    kldxref: unknown metadata record 4 in file if_ural.ko
    kldxref: unknown metadata record 4 in file if_urndis.ko
    kldxref: unknown metadata record 4 in file if_urtw.ko
    kldxref: unknown metadata record 4 in file if_urtwn.ko
    kldxref: unknown metadata record 4 in file if_wi.ko
    kldxref: unknown metadata record 4 in file if_xe.ko
    kldxref: unknown metadata record 4 in file if_zyd.ko
    kldxref: unknown metadata record 4 in file kernel
    kldxref: unknown metadata record 4 in file kernel
    kldxref: unknown metadata record 4 in file kernel
    kldxref: unknown metadata record 4 in file kernel
    kldxref: unknown metadata record 4 in file kernel
    kldxref: unknown metadata record 4 in file kernel
    kldxref: unknown metadata record 4 in file kernel
    kldxref: unknown metadata record 4 in file kernel
    kldxref: unknown metadata record 4 in file ng_bt3c.ko
    kldxref: unknown metadata record 4 in file ng_ubt.ko
    kldxref: unknown metadata record 4 in file snd_uaudio.ko
    kldxref: unknown metadata record 4 in file u3g.ko
    kldxref: unknown metadata record 4 in file uark.ko
    kldxref: unknown metadata record 4 in file uart.ko
    kldxref: unknown metadata record 4 in file ubsa.ko
    kldxref: unknown metadata record 4 in file ubtbcmfw.ko
    kldxref: unknown metadata record 4 in file uchcom.ko
    kldxref: unknown metadata record 4 in file ucycom.ko
    kldxref: unknown metadata record 4 in file udbp.ko
    kldxref: unknown metadata record 4 in file uep.ko
    kldxref: unknown metadata record 4 in file ufm.ko
    kldxref: unknown metadata record 4 in file ufoma.ko
    kldxref: unknown metadata record 4 in file uftdi.ko
    kldxref: unknown metadata record 4 in file ugensa.ko
    kldxref: unknown metadata record 4 in file ugold.ko
    kldxref: unknown metadata record 4 in file uhid.ko
    kldxref: unknown metadata record 4 in file uhso.ko
    kldxref: unknown metadata record 4 in file uipaq.ko
    kldxref: unknown metadata record 4 in file ukbd.ko
    kldxref: unknown metadata record 4 in file uled.ko
    kldxref: unknown metadata record 4 in file ulpt.ko
    kldxref: unknown metadata record 4 in file umass.ko
    kldxref: unknown metadata record 4 in file umcs.ko
    kldxref: unknown metadata record 4 in file umct.ko
    kldxref: unknown metadata record 4 in file umodem.ko
    kldxref: unknown metadata record 4 in file umodem.ko
    kldxref: unknown metadata record 4 in file umoscom.ko
    kldxref: unknown metadata record 4 in file ums.ko
    kldxref: unknown metadata record 4 in file uplcom.ko
    kldxref: unknown metadata record 4 in file urio.ko
    kldxref: unknown metadata record 4 in file usie.ko
    kldxref: unknown metadata record 4 in file uslcom.ko
    kldxref: unknown metadata record 4 in file uvisor.ko
    kldxref: unknown metadata record 4 in file uvscom.ko
    kldxref: unknown metadata record 4 in file wsp.ko
I am running FreeBSD 11-RELEASE-p1 installed from scratch using cdrom.iso. I have tested ipfw on the host and in a vimage jail without any problems. My custom kernel only has vimage compiled in. The host is running ipfw without using DUMMYNET, IPDIVERT or IPFIREWALL_NAT. The vimage jail is also running ipfw without using those same functions. The only problem with ipfw is that the vimage jail's ipfw log messages get intermingled into the host's ipfw log file.

I also tested with

    options VIMAGE
    options IPFIREWALL
    options IPFIREWALL_NAT # ipfw kernel nat support
    options IPDIVERT # divert sockets
    options LIBALIAS # required by IPFIREWALL_NAT

compiled into the kernel. The host system booted fine, and ipfw on the host and in the vimage jail worked the same as when ipfw was NOT compiled in. I did not test ipfw using the functions listed above on the host or in the vimage jail.

The only reason to compile ipfw into the kernel is if the host is not running ipfw. A vimage jail does not kldload modules on first reference like the host does, so you have to compile them into the kernel. An alternative is to configure your vimage jail's jail.conf with an exec.prestart option to kldload the ipfw modules used by the vimage jail.

I didn't get any error messages from the installkernel task during the vimage kernel compile. My guess is that nospam@ofloo.net has a problem with his upgrade to 11.0, or had existing kernel compile problems before the upgrade which left his updated system messed up. I suggest that an install of 11.0 to a blank disk will correct this problem.
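As a rough illustration of the exec.prestart alternative mentioned above, the host could run a small helper that loads a module only when it is missing. This is a sketch, not a tested configuration: the function name ensure_kmod is made up for illustration, and it assumes the module name given to kldstat -m matches the file name given to kldload (which holds for ipfw but is not guaranteed for every module).

```shell
#!/bin/sh
# Hypothetical helper: load a kernel module only if it is not already present.
# "kldstat -q -m NAME" exits 0 when a module called NAME is registered.
# Assumption: the module name and the .ko file name are the same.
ensure_kmod() {
    mod="$1"
    if kldstat -q -m "$mod"; then
        echo "$mod already loaded"
    else
        kldload "$mod" && echo "$mod loaded"
    fi
}

# Example call (commented out so that sourcing this file has no side effects):
# ensure_kmod ipfw
```

A script like this would have to run on the host (for example from exec.prestart), since the jail itself cannot load kernel modules.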
No, I reinstalled FreeBSD 10.3 and I had the exact same error. I'll try to install FreeBSD 11 and see what happens; however, I'd rather help find the problem than work around it. When I read your suggestion to reinstall, I got Windows flashbacks, where you reboot if something doesn't work and reinstall when you think your system is a little slow. But I guess that's just me. So nothing left over from a previous install is involved.
The issue also happens when I compile a vimage kernel under FreeBSD 11; however, this time there are no compile errors. I have a video of the boot process if that would be useful. Let me know.
% kldstat
Id Refs Address            Size     Name
 1   47 0xffffffff80200000 200eb88  kernel
 2    1 0xffffffff82210000 30aec0   zfs.ko
 3    2 0xffffffff8251b000 adc0     opensolaris.ko
 4    2 0xffffffff82526000 9d50     bridgestp.ko
 5    1 0xffffffff82530000 127b0    if_bridge.ko
 6    1 0xffffffff82543000 15af8    if_lagg.ko
 7    1 0xffffffff82559000 1620     accf_data.ko
 8    1 0xffffffff8255b000 2710     accf_http.ko
 9    1 0xffffffff8255e000 4c60     coretemp.ko
10    1 0xffffffff82563000 b3e8     aesni.ko
11    3 0xffffffff8256f000 2e20     smbus.ko
12    1 0xffffffff82572000 6688     ichsmb.ko
13    1 0xffffffff82579000 115b8    ipmi.ko
14    1 0xffffffff82621000 10582    geom_eli.ko
15    1 0xffffffff82632000 587b     fdescfs.ko
16    1 0xffffffff82638000 3710     ums.ko
17    1 0xffffffff8263c000 4485     if_epair.ko

Also, it appears that only one jail in particular has issues, and I'm not entirely sure why. I don't see much difference between the jails; the only difference is that the one that is crashing has two vlans running rather than one. I'm not sure how this can be an issue, though.
I disabled all the daemons except sshd and it still crashed.
I'm having the same problem and can reliably reproduce it. I noticed it when testing vmadm (https://github.com/project-fifo/r-vmadm) and starting and stopping a jail a few times.

The basic steps for start are:

    rctl -a \
        jail:d0f4fea3-e368-4346-b44c-50cfbcffa287:memoryuse:deny=1024M \
        jail:d0f4fea3-e368-4346-b44c-50cfbcffa287:memorylocked:deny=1024M \
        jail:d0f4fea3-e368-4346-b44c-50cfbcffa287:shmsize:deny=1024M \
        jail:d0f4fea3-e368-4346-b44c-50cfbcffa287:pcpu:deny=100 \
        jail:d0f4fea3-e368-4346-b44c-50cfbcffa287:maxproc:deny=2000

    mount -t devfs devfs /zroot/jails/d0f4fea3-e368-4346-b44c-50cfbcffa287/root/dev
    mount -t devfs devfs /zroot/jails/d0f4fea3-e368-4346-b44c-50cfbcffa287/root/jail/dev

    jail -i -c persist name=d0f4fea3-e368-4346-b44c-50cfbcffa287 \
        path=/zroot/jails/d0f4fea3-e368-4346-b44c-50cfbcffa287/root \
        host.hostuuid=d0f4fea3-e368-4346-b44c-50cfbcffa287 host.hostname=test \
        devfs_ruleset=4 securelevel=2 sysvmsg=new sysvsem=new sysvshm=new \
        allow.raw_sockets children.max=1 vnet=new vnet.interface=epair0b \
        exec.start="/sbin/ifconfig epair0b name net0p; /sbin/ifconfig net0p.5 create vlan 5 vlandev net0p; /sbin/ifconfig net0p.5 name net0; /sbin/ifconfig net0 inet 192.168.1.234 255.255.255.0; /sbin/route add default -gateway 192.168.1.1; /sbin/ifconfig lo0 127.0.0.1 up; jail -c persist name=d0f4fea3-e368-4346-b44c-50cfbcffa287 host.hostname=test path=/jail ip4=inherit devfs_ruleset=4 securelevel=2 sysvmsg=new sysvsem=new sysvshm=new allow.raw_sockets exec.start='sh /etc/rc'"

    ifconfig epair0a name j1:net0

and destroying the jail works the same way in reverse (stop, unmount, remove rctl entries, destroy j1:net0).

The kernel is FreeBSD fifo-bsd 11.0-RELEASE-p1, built from the standard kernel config plus:

    nooptions SCTP # Stream Control Transmission Protocol
    options VIMAGE # VNET/Vimage support
    options RACCT # Resource containers
    options RCTL # same as above

I've uploaded the kernel dump here (it's too big to attach): https://www.dropbox.com/s/73mb8e64cb7zwbe/crash.tar.xz?dl=0
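The teardown order described above (stop, unmount, remove rctl entries, destroy the host interface) could be sketched roughly as follows. This is an illustration only, not the code vmadm actually runs: the function names run and teardown_jail and the DRY_RUN variable are invented here, and the paths follow the mount commands from the start steps.

```shell
#!/bin/sh
# Hypothetical sketch of tearing a jail down in reverse order of the start steps.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

teardown_jail() {
    uuid="$1"      # jail name as given to "jail -c name=..."
    root="$2"      # jail root, e.g. /zroot/jails/<uuid>/root
    host_if="$3"   # host side of the epair, e.g. j1:net0

    run jail -r "$uuid"              # stop the jail
    run umount "$root/jail/dev"      # unmount devfs, last mounted first
    run umount "$root/dev"
    run rctl -r "jail:$uuid"         # remove the rctl rules for this jail
    run ifconfig "$host_if" destroy  # destroy the host-side epair interface
}
```

Running it with DRY_RUN=1 first, for example `DRY_RUN=1; teardown_jail d0f4fea3-e368-4346-b44c-50cfbcffa287 /zroot/jails/d0f4fea3-e368-4346-b44c-50cfbcffa287/root j1:net0`, prints the five commands without touching the system.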
Created attachment 184394 [details]
core.txt from the kernel panic

I'll attach the core.txt from one of those crashes directly.
Adding more context, the bug seems to be the same as discussed here: http://mpc.lists.freebsd.current.narkive.com/Wotl1Q0o/panic-possibly-on-on-bridge-member-removal
Created attachment 184403 [details]
Test case to introduce this bug.

Adding a test case, it works nearly 100% reliably for me when run as one of the first commands on the system.
I used a similar script to reproduce the bug and noticed it only occurs when the host's epair nic was brought up before destroying the jail. This snippet manually attaches the nic to the jail after it was started and takes "yes" as first argument to change the host's nic state.

    $ ./crash-demo.sh no
    ...
    > done
    $ ./crash-demo.sh yes
    > crash

--

    #!/bin/sh
    # First argument: "yes" to bring the host-side epair up and down again.
    UPDOWNIF="$1"
    BRIDGE_IF=bridge1

    ifconfig $BRIDGE_IF create
    set -x

    for i in $(seq 0 200); do
        #jail -c vnet persist path=$RELEASE_FOLDER name=jail-vnet
        jail -c vnet persist name=jail-vnet

        # Create the epair; derive the b side by swapping the trailing "a".
        epair_a="$(ifconfig epair create)"
        epair_b="$(echo $epair_a | rev | cut -c2- | rev)b"
        mac_a=$(openssl rand -hex 6 | sed 's/\(..\)/\1:/g; s/.$//')

        ifconfig $epair_a name a-$i
        ifconfig a-$i ether "$mac_a"
        if [ "$UPDOWNIF" = "yes" ]; then
            ifconfig a-$i up
        fi
        ifconfig $BRIDGE_IF addm a-$i

        # Move the b side into the jail and bring it up there.
        ifconfig $epair_b vnet jail-vnet
        jexec jail-vnet /sbin/ifconfig $epair_b name vnet0
        jexec jail-vnet /sbin/ifconfig vnet0 up
        jexec jail-vnet /sbin/ifconfig

        # Tear everything down again.
        jail -r jail-vnet
        if [ "$UPDOWNIF" = "yes" ]; then
            ifconfig a-$i down
        fi
        ifconfig $BRIDGE_IF deletem a-$i
        ifconfig a-$i destroy
    done

    echo "done"
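The script above derives the name of the b side of the epair with rev | cut | rev. As a side note, the same result can be had with plain sh parameter expansion. The following is only a sketch, and the function name epair_peer is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: given the "a" side of an epair (e.g. epair0a),
# print the name of the "b" side (epair0b).
epair_peer() {
    a_side="$1"
    case "$a_side" in
        *a)
            # ${a_side%a} strips one trailing "a"; then append "b".
            printf '%sb\n' "${a_side%a}"
            ;;
        *)
            echo "epair_peer: '$a_side' does not end in 'a'" >&2
            return 1
            ;;
    esac
}
```

In the loop this would read `epair_b="$(epair_peer "$epair_a")"`, which avoids spawning three extra processes per iteration and fails loudly if ifconfig returned something unexpected.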
Given how easy this is to reproduce, and that there are three of us in here now, is there any chance of changing the importance from 'affects only me' to 'affects some people'?
Flip the "affects some people" switch. IIUC FreeBSD really doesn't pay much attention to that field; it's a default field in Bugzilla. e.g. there's no formal triage procedure.
I'm not sure if this is the cause, but I can't upgrade 2 other machines with 10 jails because of it. So to me it is important as well.
I should have paid more attention to what was said. Never mind, disregard that last comment.
Created attachment 184726 [details]
core.0.txt on a VM with HyperThreading disabled on the host

The core.0.txt after crashing the system on an 11.0-RELEASE-p11 with VIMAGE enabled.
I've posted a proposed patch in https://reviews.freebsd.org/D11782 The panic in the last comment happens because ifp->if_bpf is NULL, which happens due to a race in bpf_if cleanup (as described in the patch). With this patch the script in Comment #10 no longer panics.
The following code (run in bash right after startup)

    while true
    do
        PAIR=`ifconfig epair create | sed 's/a\$//'`
        ifconfig bridge0 addm ${PAIR}a
        jail -i -c name=crash persist vnet=new vnet.interface=${PAIR}b exec.start="/sbin/ifconfig ${PAIR}b name net0p"
        jail -r crash
        ifconfig ${PAIR}a destroy
    done

still panics the system for me after applying the patch. The panic is so 'bad' that not even a coredump seems to be written.
(In reply to Heinz N. Gies from comment #17) I cannot produce a crash with that script. Do you have access to the console of that machine? Without more information there's not much we can do here...
Hi, sorry, I stand corrected: the kernel didn't panic, which is why no core dump was created. However, it seems the network stack of the bridge device gets killed by it. If I run the script on a console (thanks for the hint!) the host becomes completely unreachable until the script stops. I'm not sure if this is the same problem or whether it should be in its own issue.

The ifconfig on the host in question is:

    em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
            ether 00:25:90:a6:3b:c7
            inet 192.168.1.22 netmask 0xffffff00 broadcast 192.168.1.255
            nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
            media: Ethernet autoselect (1000baseT <full-duplex>)
            status: active
    em1: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
            options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
            ether 00:25:90:a6:3b:c6
            nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
            media: Ethernet autoselect
            status: no carrier
    lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
            options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
            inet6 ::1 prefixlen 128
            inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
            inet 127.0.0.1 netmask 0xff000000
            nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
            groups: lo
    bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
            ether 02:6a:d8:fe:02:00
            nd6 options=9<PERFORMNUD,IFDISABLED>
            groups: bridge
            id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
            maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
            root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
            member: em0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                    ifmaxaddr 0 port 1 priority 128 path cost 2000000

Note: Removing em0 from bridge0 does resolve the issue.
Please open a separate PR for that one. Are you accessing the device through the bridge interface?
Done, submitted as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221122
(In reply to Kristof Provost from comment #18) I cannot reproduce the issue either. All networking on 500 jails appears to be stable and even without further delay while running the script from comment #17. Although, I've noticed the main network to go down for 3-6 seconds when the epair device added to a bridge is the first device on the bridge. Might be related to comment #19.
in addition to comment #22: This was the case on unpatched kernels as well. So what I've described is a separate issue in any case.
A commit references this bug:

Author: kp
Date: Wed Aug 16 19:40:07 UTC 2017
New revision: 322590
URL: https://svnweb.freebsd.org/changeset/base/322590

Log:
  bpf: Fix incorrect cleanup

  Cleaning up a bpf_if is a two stage process. We first move it to the
  bpf_freelist (in bpfdetach()) and only later do we actually free it (in
  bpf_ifdetach()).

  We cannot set the ifp->if_bpf to NULL from bpf_ifdetach() because it's
  possible that the ifnet has already gone away, or that it has been assigned
  a new bpf_if. This can lead to a struct ifnet which is up, but has if_bpf
  set to NULL, which will panic when we try to send the next packet.

  Keep track of the pointer to the bpf_if (because it's not always
  ifp->if_bpf), and NULL it immediately in bpfdetach().

  PR: 213896
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D11782

Changes:
  head/sys/net/bpf.c
A commit references this bug:

Author: kp
Date: Wed Aug 30 21:18:57 UTC 2017
New revision: 323034
URL: https://svnweb.freebsd.org/changeset/base/323034

Log:
  MFC r322590:

  bpf: Fix incorrect cleanup

  Cleaning up a bpf_if is a two stage process. We first move it to the
  bpf_freelist (in bpfdetach()) and only later do we actually free it (in
  bpf_ifdetach()).

  We cannot set the ifp->if_bpf to NULL from bpf_ifdetach() because it's
  possible that the ifnet has already gone away, or that it has been assigned
  a new bpf_if. This can lead to a struct ifnet which is up, but has if_bpf
  set to NULL, which will panic when we try to send the next packet.

  Keep track of the pointer to the bpf_if (because it's not always
  ifp->if_bpf), and NULL it immediately in bpfdetach().

  PR: 213896
  Differential Revision: https://reviews.freebsd.org/D11782

Changes:
_U  stable/11/
  stable/11/sys/net/bpf.c