Created attachment 150437
dump txt

I'm using CURRENT (r275684) amd64.

How to reproduce:
1 - ifconfig bridge0 create up
2 - shutdown -r now

I'm running one jail with vnet enabled but it is not in use.

I can upload the kernel(.symbols) and the vmcore if necessary.
Can you apply this patch, rebuild your system and see if it fixes the problem?

Index: if_bridge.c
===================================================================
--- if_bridge.c (revision 275555)
+++ if_bridge.c (working copy)
@@ -1812,6 +1812,7 @@
         /* Check if the interface is a span port */
         BRIDGE_LIST_LOCK();
+        CURVNET_SET(ifp->if_vnet);
         LIST_FOREACH(sc, &V_bridge_list, sc_list) {
                 BRIDGE_LOCK(sc);
                 LIST_FOREACH(bif, &sc->sc_spanlist, bif_next)
@@ -1822,6 +1823,7 @@
                 BRIDGE_UNLOCK(sc);
         }
+        CURVNET_RESTORE();
         BRIDGE_LIST_UNLOCK();
 }
(In reply to Craig Rodrigues from comment #1)
Same panic after applying the patch. But in my configuration I have 2 jails with epairs in the bridge.

#1 0xffffffff80966327 in kern_reboot (howto=Unhandled dwarf expression opcode 0x93 ) at /usr/src/sys/kern/kern_shutdown.c:447
#2 0xffffffff80966918 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:746
#3 0xffffffff80966742 in kassert_panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:634
#4 0xffffffff8095024a in __mtx_lock_flags (c=0xfffffe0069a26278, opts=Unhandled dwarf expression opcode 0x93 ) at /usr/src/sys/kern/kern_mutex.c:214
#5 0xffffffff8241457f in bridge_ifdetach (arg=<value optimized out>, ifp=0xfffff80105552800) at /usr/src/sys/modules/if_bridge/../../net/if_bridge.c:1814
#6 0xffffffff80a3794d in if_detach_internal (ifp=0xfffff80105552800, vmove=0) at /usr/src/sys/net/if.c:932
#7 0xffffffff80a375ea in if_detach (ifp=0x0) at /usr/src/sys/net/if.c:863
#8 0xffffffff80a43d16 in lo_clone_destroy (ifp=0xfffff80105552800) at /usr/src/sys/net/if_loop.c:116
#9 0xffffffff80a3f220 in if_clone_destroyif (ifc=0xfffff8000f00da80, ifp=<value optimized out>) at /usr/src/sys/net/if_clone.c:676
#10 0xffffffff80a3fae8 in if_clone_detach (ifc=<value optimized out>) at /usr/src/sys/net/if_clone.c:450
#11 0xffffffff80a43ba6 in vnet_loif_uninit (unused=<value optimized out>) at /usr/src/sys/net/if_loop.c:167
#12 0xffffffff80a55437 in vnet_destroy (vnet=0xfffff80105509380) at /usr/src/sys/net/vnet.c:593
#13 0xffffffff80939b9d in prison_deref (pr=0xffffffff81516eb0, flags=Cannot access memory at address 0x3 ) at /usr/src/sys/kern/kern_jail.c:2582
#14 0xffffffff8093b224 in sys_jail_remove (td=<value optimized out>, uap=<value optimized out>) at /usr/src/sys/kern/kern_jail.c:2248
#15 0xffffffff80d9a90a in amd64_syscall (td=0xfffff8000f445940, traced=0) at subr_syscall.c:133
#16 0xffffffff80d78e6b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:395
#17 0x0000000800ec3b6a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language: auto; currently minimal
(kgdb) f 5
#5 0xffffffff8241457f in bridge_ifdetach (arg=<value optimized out>, ifp=0xfffff80105552800) at /usr/src/sys/modules/if_bridge/../../net/if_bridge.c:1814
1814            BRIDGE_LIST_LOCK();
Current language: auto; currently minimal
(kgdb) list
1809                    BRIDGE_UNLOCK(sc);
1810                    return;
1811            }
1812
1813            /* Check if the interface is a span port */
1814            BRIDGE_LIST_LOCK();
1815            CURVNET_SET(ifp->if_vnet);
1816            LIST_FOREACH(sc, &V_bridge_list, sc_list) {
1817                    BRIDGE_LOCK(sc);
1818                    LIST_FOREACH(bif, &sc->sc_spanlist, bif_next)
Panic String: mtx_lock() of destroyed mutex @ /usr/src/sys/modules/if_bridge/../../net/if_bridge.c:1814
1. This panic has been happening since r272568:

Author: hrs
Date: Sun Oct 5 19:43:37 2014
New Revision: 272568
URL: https://svnweb.freebsd.org/changeset/base/272568

Log:
  Virtualize if_bridge(4) cloner.

Modified:
  head/sys/net/if_bridge.c

Maybe hrs@ can be added to/comment on this bug report?

2. On a recent kernel you can work around this bug by adding "device if_bridge" to your kernel configuration.
Herbert found that he could reproduce the problem by creating VNET jails and then stopping them. Here are the steps he used:

/etc/rc.conf:

hostname="beastie.home.lan"
ifconfig_em0="inet 192.168.1.25 netmask 0xffff0000"
defaultrouter="192.168.1.255"
cloned_interfaces="bridge0"
ifconfig_bridge0="inet 10.0.0.1 netmask 0xff000000"
sshd_enable="YES"
fsck_y_enable="YES"
background_fsck="NO"
syslogd_flags="-ss"
gateway_enable="YES"
pf_enable="NO"
pflog_enable="NO"
jail_enable="NO"
jail_list="jail01 jail02 jail03 jail04"
devfs_load_rulesets="YES"

/etc/jail.conf:

jail01 {
    name = "jail01";
    path = /usr/local/jails/jail01;
    mount.devfs;
    host.hostname = jail01.home.lan;
    vnet = "new";
    vnet.interface = "epair0b";
    persist;
    exec.prestart = "ifconfig epair0 create";
    exec.prestart += "ifconfig bridge0 addm epair0a";
    exec.prestart += "ifconfig epair0a up";
    exec.start = "";
    #exec.start = "/bin/sh /etc/rc";
    exec.poststart = "jexec $name ifconfig epair0b 10.0.0.10 netmask 255.0.0.0 up";
    exec.poststart += "jexec $name route add default 10.0.0.1";
    exec.poststart += "jexec $name sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown";
    exec.poststop = "ifconfig bridge0 deletem epair0a";
    exec.poststop += "ifconfig epair0a destroy";
}

/etc/rc.conf in jail01:

hostname="jail01.home.lan"
sshd_enable="YES"
sendmail_enable="NONE"

Starting jail with "/etc/rc.d/jail onestart jail01" or "jail -c jail01".
Stopping jail with "/etc/rc.d/jail onestop jail01" or "jail -r jail01".
Created attachment 151803
dump2.txt
Herbert provided a traceback from his kernel panic. This looks like the source of the problem: panic: mtx_lock() of destroyed mutex @ /usr/src/sys/modules/if_bridge/../../net/if_bridge.c:1814 It looks like after destroying a jail, a mutex is destroyed, but this destroyed mutex is used later on in another jail.
(In reply to Craig Rodrigues from comment #1)
That patch would be bogus, as the CURVNET_SET()/RESTORE() would have to be before/after locking because that lock is virtualised as well. But it's also not the real problem.
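To illustrate why the order matters, here is a minimal userspace sketch in C. It is only a toy model, not kernel code: struct vnet_model, model_curvnet, model_list_lock(), lock_then_set() and set_then_lock() are all hypothetical names invented for this illustration. The one assumption it encodes is the point made above, that the bridge list lock is per-vnet and is therefore looked up through whatever vnet context is current at the time the lock is taken.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's types or macros. */
struct vnet_model {
    int list_lock_held;   /* stands in for the per-vnet bridge list mutex */
};

/* Stands in for the thread's current vnet context. */
static struct vnet_model *model_curvnet;

/* Models BRIDGE_LIST_LOCK(): the lock is found through the current context. */
static struct vnet_model *model_list_lock(void)
{
    assert(model_curvnet != NULL);
    model_curvnet->list_lock_held = 1;
    return model_curvnet;   /* report whose lock was actually taken */
}

static void model_list_unlock(struct vnet_model *locked)
{
    locked->list_lock_held = 0;
}

/* Order used in the patch from comment #1: lock first, then switch context. */
struct vnet_model *lock_then_set(struct vnet_model *prev,
    struct vnet_model *target)
{
    struct vnet_model *locked;

    model_curvnet = prev;
    locked = model_list_lock();   /* resolves against prev, not target */
    model_curvnet = target;       /* models CURVNET_SET(ifp->if_vnet) */
    /* A list walk here would read target's list under prev's lock. */
    model_curvnet = prev;         /* models CURVNET_RESTORE() */
    model_list_unlock(locked);
    return locked;
}

/* Order suggested in the comment above: switch context, then lock. */
struct vnet_model *set_then_lock(struct vnet_model *prev,
    struct vnet_model *target)
{
    struct vnet_model *locked;

    model_curvnet = prev;
    model_curvnet = target;       /* models CURVNET_SET(ifp->if_vnet) */
    locked = model_list_lock();   /* resolves against target */
    model_list_unlock(locked);
    model_curvnet = prev;         /* models CURVNET_RESTORE() */
    return locked;
}
```

In this model, lock_then_set(&a, &b) returns &a while set_then_lock(&a, &b) returns &b, which is the difference the comment is pointing at: with the lock taken first, the lock and the list being walked can belong to different vnets.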
(In reply to Craig Rodrigues from comment #8)
No, it's still used in the same jail. What seems to happen is:
(a) the bridges get destroyed (all members detached, etc.), and the lock gets destroyed.
(b) the loopback interface in the same jail gets destroyed.
(c) the globally registered eventhandler in if_bridge is called for the interface (lo) disappearing.
(d) we get to the point where we try to acquire the lock which we previously destroyed.

Either extra checks in bridge_ifdetach() need to be implemented to catch that case (and I think that's not possible without adding extra bandaid information), or proper handling of net cloned interfaces and startup/teardown ordering needs to be implemented "as a whole".

With all that, the CURVNET_SET/RESTORE question from comment #1 remains: what happens if bridge_members in the normal case reside in different VNETs (child jails)?
Created attachment 151999
a patch to fix this panic

This patch should fix the panic. As Bjoern pointed out, the ifnet_departure event on the lo0 interface calls bridge_ifdetach() when a vnet jail is destroyed. The problem is that vnet_bridge_uninit() can be called before it. The patch relies on the fact that a vnet whose V_bridge_cloner == NULL is being torn down (or is not initialized). I think it is safe to ignore this detach handler in that case.
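The idea can be sketched in userspace C as follows. This is not the attached patch and not kernel code; it is a small model of the description above, and every name in it (struct bridge_vnet_model, model_bridge_init(), model_bridge_uninit(), model_bridge_ifdetach(), the detach_result values) is hypothetical. It assumes only what the thread states: uninit destroys the per-vnet lock and leaves the cloner pointer NULL, and the departure handler may still run afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the per-vnet bridge state. */
struct bridge_vnet_model {
    void *cloner;           /* models V_bridge_cloner; NULL after uninit */
    bool  list_lock_valid;  /* false once the list mutex is destroyed */
    int   lock_acquisitions;
};

enum detach_result {
    DETACH_SKIPPED,      /* handler returned early, lock never touched */
    DETACH_HANDLED,      /* handler took the lock and walked the list */
    DETACH_WOULD_PANIC   /* handler reached a destroyed lock */
};

static int model_cloner_token;   /* any non-NULL address will do */

void model_bridge_init(struct bridge_vnet_model *v)
{
    v->cloner = &model_cloner_token;
    v->list_lock_valid = true;
    v->lock_acquisitions = 0;
}

/* Models vnet_bridge_uninit(): cloner detached, lock destroyed. */
void model_bridge_uninit(struct bridge_vnet_model *v)
{
    v->cloner = NULL;
    v->list_lock_valid = false;
}

/*
 * Models the departure handler. With guarded == true the handler first
 * checks the cloner pointer and ignores the event during teardown.
 */
enum detach_result model_bridge_ifdetach(struct bridge_vnet_model *v,
    bool guarded)
{
    assert(v != NULL);
    if (guarded && v->cloner == NULL)
        return DETACH_SKIPPED;
    if (!v->list_lock_valid)
        return DETACH_WOULD_PANIC;   /* "mtx_lock() of destroyed mutex" */
    v->lock_acquisitions++;
    return DETACH_HANDLED;
}
```

Calling model_bridge_uninit() and then model_bridge_ifdetach() reproduces the ordering from the backtrace: the unguarded handler reaches the destroyed lock, while the guarded one returns without touching it. Before uninit both variants behave the same, so the check only changes the teardown case.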
(In reply to Bjoern A. Zeeb from comment #10)
> With all that the CURVNET_SET/RESTORE question from comment #1 remains,
> as to what happens if bridge_members in the normal case reside
> in different VNETs (child jails)?

Is it possible to have bridge members across different vnets? As long as if_vmove() is used, member interfaces cannot be moved without being detached from the parent if_bridge(4) interface.
At revision 277518: with your patch I no longer get a panic, but the system freezes when I stop the jail(s).
(In reply to h.skuhra from comment #13)
Can you give me more specifics? I tried to reproduce it on the latest current but could not. Did the whole system stop responding, or did only the "jail -r" command fail to return?
The whole system stopped responding. But today I get a panic again:

# kgdb kernel.debug /var/crash/vmcore.8
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
CURVNET_SET() recursion in in_leavegroup_locked() line 1284, prev in vnet_destroy()
0xfffff80005267dc0 -> 0xfffff80005267dc0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0093ff3860
in_leavegroup_locked() at in_leavegroup_locked+0xcc/frame 0xfffffe0093ff38b0
in_difaddr_ioctl() at in_difaddr_ioctl+0x2e7/frame 0xfffffe0093ff3900
in_control() at in_control+0xba/frame 0xfffffe0093ff3980
if_purgeaddrs() at if_purgeaddrs+0xa4/frame 0xfffffe0093ff3a10
<5>bridge0: link state changed to DOWN
<6>epair0a: promiscuous mode disabled
if_detach_internal() at if_detach_internal+0x1e3/frame 0xfffffe0093ff3a70
if_vmove() at if_vmove+0x1e/frame 0xfffffe0093ff3ab0
vnet_destroy() at vnet_destroy+0x148/frame 0xfffffe0093ff3af0
prison_deref() at prison_deref+0x1fd/frame 0xfffffe0093ff3b20
taskqueue_run_locked() at taskqueue_run_locked+0xf0/frame 0xfffffe0093ff3b80
taskqueue_thread_loop() at <5>epair0a: link state changed to DOWN
<5>epair0b: link state changed to DOWN
taskqueue_thread_loop+0x9b/frame 0xfffffe0093ff3bb0
fork_exit() at fork_exit+0x84/frame 0xfffffe0093ff3bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0093ff3bf0
--- trap 0, rip = 0, rsp = 0xfffffe0093ff3cb0, rbp = 0 ---

Fatal trap 9: general protection fault while in kernel mode
cpuid = 1; apic id = 01
instruction pointer = 0x20:0xffffffff805ee977
stack pointer = 0x28:0xfffffe0093ff3830
frame pointer = 0x28:0xfffffe0093ff3860
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 (thread taskq)

Reading symbols from /boot/kernel/fdescfs.ko.symbols...done.
Loaded symbols for /boot/kernel/fdescfs.ko.symbols
Reading symbols from /boot/kernel/if_bridge.ko.symbols...done.
Loaded symbols for /boot/kernel/if_bridge.ko.symbols
Reading symbols from /boot/kernel/bridgestp.ko.symbols...done.
Loaded symbols for /boot/kernel/bridgestp.ko.symbols
Reading symbols from /boot/kernel/if_epair.ko.symbols...done.
Loaded symbols for /boot/kernel/if_epair.ko.symbols
#0 doadump (textdump=Unhandled dwarf expression opcode 0x93 ) at pcpu.h:219
219 __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0 doadump (textdump=Unhandled dwarf expression opcode 0x93 ) at pcpu.h:219
#1 0xffffffff802dfe1e in db_dump (dummy=<value optimized out>, dummy2=Unhandled dwarf expression opcode 0x93 ) at /usr/src/sys/ddb/db_command.c:533
#2 0xffffffff802df8bc in db_command (cmd_table=0x0) at /usr/src/sys/ddb/db_command.c:440
#3 0xffffffff802df624 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
#4 0xffffffff802e2160 in db_trap (type=<value optimized out>, code=Unhandled dwarf expression opcode 0x93 ) at /usr/src/sys/ddb/db_main.c:251
#5 0xffffffff8052d22e in kdb_trap (type=Unhandled dwarf expression opcode 0x93 ) at /usr/src/sys/kern/subr_kdb.c:654
#6 0xffffffff80732cc9 in trap_fatal (frame=0xfffffe0093ff3780, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:856
#7 0xffffffff8073297e in trap (frame=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:201
#8 0xffffffff80712c32 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:235
#9 0xffffffff805ee977 in igmp_change_state (inm=0xfffff8000533b800) at /usr/src/sys/netinet/igmp.c:2302
#10 0xffffffff805f4f54 in in_leavegroup_locked (inm=0xfffff8000533b800, imf=<value optimized out>) at /usr/src/sys/netinet/in_mcast.c:1285
#11 0xffffffff805f1cb7 in in_difaddr_ioctl (data=<value optimized out>, ifp=0xfffff8000578f000, td=<value optimized out>) at /usr/src/sys/netinet/in.c:603
#12 0xffffffff805f0d7a in in_control (so=<value optimized out>, cmd=<value optimized out>, data=0xfffffe0093ff3998 "", ifp=0xfffff8000578f000, td=0x0) at /usr/src/sys/netinet/in.c:219
#13 0xffffffff805c2454 in if_purgeaddrs (ifp=0xfffff8000578f000) at /usr/src/sys/net/if.c:816
#14 0xffffffff805c2853 in if_detach_internal (ifp=0xfffff8000578f000, vmove=1) at /usr/src/sys/net/if.c:913
#15 0xffffffff805c310e in if_vmove (ifp=0xfffff8000578f000, new_vnet=0xfffff8000272b0c0) at /usr/src/sys/net/if.c:1007
#16 0xffffffff805e06a8 in vnet_destroy (vnet=0xfffff80005267dc0) at /usr/src/sys/net/vnet.c:287
#17 0xffffffff804c455d in prison_deref (pr=0xffffffff80aa0fe0, flags=Cannot access memory at address 0x3 ) at /usr/src/sys/kern/kern_jail.c:2576
#18 0xffffffff8053db90 in taskqueue_run_locked (queue=0xfffff80002869b00) at /usr/src/sys/kern/subr_taskqueue.c:431
#19 0xffffffff8053e7db in taskqueue_thread_loop (arg=<value optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:695
#20 0xffffffff804bd284 in fork_exit (callout=0xffffffff8053e740 <taskqueue_thread_loop>, arg=0xffffffff80dc6a60, frame=0xfffffe0093ff3c00) at /usr/src/sys/kern/kern_fork.c:996
#21 0xffffffff8071316e in fork_trampoline () at /usr/src/sys/amd64/amd64/exception.S:610
#22 0x0000000000000000 in ?? ()
Current language: auto; currently minimal

FreeBSD build.home.lan 11.0-CURRENT FreeBSD 11.0-CURRENT #1 r277827M: Thu Jan 29 16:19:57 CET 2015 herbert@build.home.lan:/usr/obj/usr/src/sys/VM amd64

This system is running in Virtualbox on Fedora 21.
(In reply to h.skuhra from comment #15) Thank you. Can I have information about your network interface configuration, too?
Sure, here they are:

% ifconfig -a
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether 08:00:27:cf:2c:a6
        inet 172.16.4.251 netmask 0xffff0000 broadcast 172.16.255.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:86:b3:c7:23:00
        inet 10.0.0.1 netmask 0xff000000 broadcast 10.255.255.255
        nd6 options=9<PERFORMNUD,IFDISABLED>
        groups: bridge
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: epair1a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 5 priority 128 path cost 2000
        member: epair0a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 4 priority 128 path cost 2000
epair0a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:ff:80:00:04:0a
        inet6 fe80::ff:80ff:fe00:40a%epair0a prefixlen 64 scopeid 0x4
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
epair1a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:ff:80:00:05:0a
        inet6 fe80::ff:80ff:fe00:50a%epair1a prefixlen 64 scopeid 0x5
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
Sorry, the last panic was obviously a Virtualbox problem, although I haven't touched the VM settings. After disabling nested paging for the VM I only get a panic when pf is enabled. But I think this is a different issue! I probably have to run CURRENT on real hardware as well! Thanks.
Now in the log:

ifa_del_loopback_route: deletion failed: 3
ifa_del_loopback_route: deletion failed: 48
Freed UMA keg (tcp_inpcb) was not empty (63 items). Lost 7 pages of memory.
Freed UMA keg (tcpcb) was not empty (18 items). Lost 6 pages of memory.

kernel applied on r277815+1a00ddf(master)-dirty
jail start

epair2a: Ethernet address: 02:ff:40:00:05:0a
epair2b: Ethernet address: 02:ff:90:00:06:0b
epair2a: link state changed to UP
epair2b: link state changed to UP
epair2a: promiscuous mode enabled
em0: link state changed to DOWN
em0: link state changed to UP

jail stop

ifa_del_loopback_route: deletion failed: 3
ifa_del_loopback_route: deletion failed: 48
Freed UMA keg (tcp_inpcb) was not empty (63 items). Lost 7 pages of memory.
Freed UMA keg (tcpcb) was not empty (18 items). Lost 6 pages of memory.
epair2a: promiscuous mode disabled
em0: link state changed to DOWN
epair2a: link state changed to DOWN
epair2b: link state changed to DOWN
em0: link state changed to UP
@max.n.boyarov: are you using PF inside or outside your jail? The KEG warnings are known. With this latest patch are you getting kernel panics, or is your system hanging when you delete your jail?
(In reply to Craig Rodrigues from comment #21)
>> @max.n.boyarov: are you using PF inside or outside your jail?

No, I don't have a firewall at all, but if it is needed for testing I could set up pf or ipfw.

>> With this latest patch
>> are you getting kernel panics, or is your system hanging when you delete
>> your jail?

No panics or hangs. I can start and stop jails (during testing I restarted the jails 5 times).
@hrs, can you commit this fix? It does solve the problem. The earlier comments in this thread about hangs are unrelated to your fix.
A commit references this bug:

Author: hrs
Date: Sat Feb 14 18:15:15 UTC 2015
New revision: 278766
URL: https://svnweb.freebsd.org/changeset/base/278766

Log:
  Fix a panic when tearing down a vnet on a VIMAGE-enabled kernel.
  There was a race that bridge_ifdetach() could be called via
  ifnet_departure event handler after vnet_bridge_uninit().

  PR: 195859
  Reported by: Danilo Egea Gondolfo

Changes:
  head/sys/net/if_bridge.c
(In reply to Craig Rodrigues from comment #23) Committed just now. Please reopen this PR if this problem persists. I will merge virtualization of if_bridge cloner to -stable including this fix later.