Hello,

here's an iflib-related panic I get on my real-world cold-standby setup with 12.1-PRERELEASE and a debug kernel. It happens when creating a vlan(4) child with an if_igb(4) pair as the lagg(4) parent:

<6>vlan0: link state changed to UP
panic: sleeping in an epoch section
cpuid = 1
time = 1568620268
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000058b380
vpanic() at vpanic+0x19d/frame 0xfffffe000058b3d0
panic() at panic+0x43/frame 0xfffffe000058b430
_sleep() at _sleep+0x466/frame 0xfffffe000058b4d0
pause_sbt() at pause_sbt+0x10f/frame 0xfffffe000058b510
e1000_reset_hw_82580() at e1000_reset_hw_82580+0x1cc/frame 0xfffffe000058b550
em_if_stop() at em_if_stop+0x1b/frame 0xfffffe000058b570
iflib_stop() at iflib_stop+0xc3/frame 0xfffffe000058b5c0
iflib_vlan_register() at iflib_vlan_register+0xad/frame 0xfffffe000058b600
lagg_register_vlan() at lagg_register_vlan+0xda/frame 0xfffffe000058b660
vlan_config() at vlan_config+0x50b/frame 0xfffffe000058b6c0
vlan_clone_create() at vlan_clone_create+0x29b/frame 0xfffffe000058b730
if_clone_createif() at if_clone_createif+0x4a/frame 0xfffffe000058b780
ifioctl() at ifioctl+0x6fe/frame 0xfffffe000058b850
kern_ioctl() at kern_ioctl+0x2b0/frame 0xfffffe000058b8b0
sys_ioctl() at sys_ioctl+0x15d/frame 0xfffffe000058b980
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe000058bab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe000058bab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80047439a, rsp = 0x7fffffffe348, rbp = 0x7fffffffe350 ---
KDB: enter: panic

#9 0xffffffff805cf4ca in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/kern_shutdown.c:866
#10 0xffffffff805cf273 in panic (fmt=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/kern_shutdown.c:804
#11 0xffffffff805da0b6 in _sleep (ident=0xffffffff80ef0941, lock=0x0, priority=0, wmesg=<value optimized out>, sbt=42949672, pr=0, flags=256) at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/kern_synch.c:150
#12 0xffffffff805da4af in pause_sbt (wmesg=<value optimized out>, sbt=42949672, pr=<value optimized out>, flags=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/kern_synch.c:332
#13 0xffffffff81b3e7cc in e1000_reset_hw_82580 (hw=0xfffffe004b7eb008) at RELENG_12/src/sys/dev/e1000/e1000_osdep.h:97
#14 0xffffffff81b0c86b in em_if_stop (ctx=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_12/src/sys/dev/e1000/if_em.c:1867
#15 0xffffffff806f74f3 in iflib_stop (ctx=0xfffff8000291f800) at ifdi_if.h:268
#16 0xffffffff80704e3d in iflib_vlan_register (arg=0xfffff8000291f800, ifp=0xfffff8000295a800, vtag=232) at /usr/local/share/deploy-tools/RELENG_12/src/sys/net/iflib.c:3883
#17 0xffffffff806eb94a in lagg_register_vlan (arg=<value optimized out>, ifp=<value optimized out>, vtag=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_12/src/sys/net/if_lagg.c:452
#18 0xffffffff806f68fb in vlan_config (ifv=0xfffff80002555c00, p=0xfffff80002895000, vid=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_12/src/sys/net/if_vlan.c:1431
#19 0xffffffff806f596b in vlan_clone_create (ifc=0xfffff800024dec00, name=0xfffffe000058b8d0 "vlan0", len=18446735277655190528, params=<value optimized out>) at /usr/local/share/deploy-tools/RELENG_12/src/sys/net/if_vlan.c:1066
#20 0xffffffff806e1c3a in if_clone_createif (ifc=0xfffff800024dec00, name=0xfffffe000058b8d0 "vlan0", len=16, params=0x22db40 <Address 0x22db40 out of bounds>) at /usr/local/share/deploy-tools/RELENG_12/src/sys/net/if_clone.c:229
#21 0xffffffff806d90be in ifioctl (so=<value optimized out>, cmd=3223349628, data=0xfffffe000058b8d0 "vlan0", td=0xfffff800037a55e0) at /usr/local/share/deploy-tools/RELENG_12/src/sys/net/if.c:3097
#22 0xffffffff8063d870 in kern_ioctl (td=0xfffff800037a55e0, fd=<value optimized out>, com=3223349628, data=<value optimized out>) at RELENG_12/src/sys/sys/file.h:337
#23 0xffffffff8063d54d in sys_ioctl (td=0xfffff800037a55e0, uap=0xfffff800037a59a0) at /usr/local/share/deploy-tools/RELENG_12/src/sys/kern/sys_generic.c:712
#24 0xffffffff8093abe6 in amd64_syscall (td=0xfffff800037a55e0, traced=0) at RELENG_12/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
#25 0xffffffff80912550 in fast_syscall_common () at /usr/local/share/deploy-tools/RELENG_12/src/sys/amd64/amd64/exception.S:581
#26 0x000000080047439a in ?? ()
Previous frame inner to this frame (corrupt stack?)

It's almost identical to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=232362
The only difference is that this line/function from the old report isn't listed with the new hardware (then Kawela, 82576 -> now StonyLake, i350):

#13 0xffffffff80569511 in e1000_disable_pcie_master_generic (hw=0xfffffe0000790008)

I'll mark the old one as a duplicate.

Thanks,
-harry
*** Bug 232362 has been marked as a duplicate of this bug. ***
I'm not an expert on locking issues, but it appears that lagg_register_vlan() enters an epoch section (via the now confusingly named LAGG_RLOCK() macro), iflib_vlan_register() runs inside that section, and the msec_delay() -> safe_pause_ms() -> pause() path in the em driver then triggers the "sleeping in an epoch section" panic. A quick fix would be to make em *not sleep* in e1000_reset_hw_82580(), but that doesn't seem ideal; why shouldn't it be allowed to sleep?
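To make the failure mode concrete, here is a minimal userland model in C. It is a sketch under stated assumptions, not kernel code: `model_epoch_enter()`, `model_pause_ms()`, `model_em_if_stop()` and `model_lagg_register_vlan()` are hypothetical stand-ins for the epoch section, `pause()`, and the lagg -> iflib -> em call chain in the backtrace. The only rule modelled is the one the panic enforces: a thread inside an epoch section must not sleep. Where the kernel would panic, the model just counts the violation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userland model of the call chain in the backtrace.
 * None of these model_* functions are FreeBSD kernel APIs.
 */
static int epoch_depth;   /* > 0 while inside the simulated epoch section */
static int epoch_panics;  /* sleeps attempted inside the epoch (kernel: panic) */
static int sleeps;        /* sleeps that were allowed */

static void model_epoch_enter(void) { epoch_depth++; }
static void model_epoch_exit(void) { assert(epoch_depth > 0); epoch_depth--; }
static bool model_in_epoch(void) { return epoch_depth > 0; }

/* Stand-in for pause(): sleeping inside the epoch is a violation. */
static void model_pause_ms(int ms)
{
	(void)ms;
	if (model_in_epoch())
		epoch_panics++;
	else
		sleeps++;
}

/* Stand-in for em_if_stop() -> e1000_reset_hw_*() -> msec_delay(); 10 ms is arbitrary. */
static void model_em_if_stop(void) { model_pause_ms(10); }

/* Stand-in for lagg_register_vlan() before the fix: ports are walked under the epoch. */
static void model_lagg_register_vlan(int nports)
{
	model_epoch_enter();
	for (int i = 0; i < nports; i++)
		model_em_if_stop();   /* iflib_vlan_register() -> iflib_stop() */
	model_epoch_exit();
}
```

Calling `model_lagg_register_vlan(2)` (a two-port lagg) records two would-be panics and no permitted sleeps, while calling `model_em_if_stop()` on its own, outside the epoch, sleeps normally. The driver's sleep is harmless by itself; it becomes fatal only because of the caller's context.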
Created attachment 208207
Possible workaround

It might be possible to simply test for epoch. In my experience, if_ioctl() handlers that sleep usually cause problems.
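The "test for epoch" idea could look roughly like the following userland model. This is an assumption-laden sketch and *not* the contents of attachment 208207: `model_in_epoch()`, `model_delay_ms()`, `model_busy_wait_ms()` and the counters are hypothetical. They stand in for checking whether the current thread is inside an epoch section and, if so, using a busy-wait (in the kernel, something like `DELAY()`) instead of sleeping with `pause()`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userland model of the "test for epoch" workaround idea.
 * The model_* names are illustrative only; they are not FreeBSD APIs.
 */
static int epoch_depth;   /* > 0 while inside the simulated epoch section */
static int epoch_panics;  /* sleeps attempted inside the epoch (kernel: panic) */
static int sleeps;        /* sleeps performed outside the epoch */
static int busy_waits;    /* delays served by spinning instead of sleeping */

static void model_epoch_enter(void) { epoch_depth++; }
static void model_epoch_exit(void) { assert(epoch_depth > 0); epoch_depth--; }
static bool model_in_epoch(void) { return epoch_depth > 0; }

/* Stand-in for pause(): sleeping is forbidden inside the epoch. */
static void model_pause_ms(int ms)
{
	(void)ms;
	if (model_in_epoch())
		epoch_panics++;
	else
		sleeps++;
}

/* Stand-in for a busy-wait such as DELAY(): it never sleeps. */
static void model_busy_wait_ms(int ms)
{
	(void)ms;
	busy_waits++;
}

/* Workaround idea: pick the delay primitive based on the epoch state. */
static void model_delay_ms(int ms)
{
	if (model_in_epoch())
		model_busy_wait_ms(ms);
	else
		model_pause_ms(ms);
}
```

With this check, a delay requested inside the epoch spins and one requested outside it still sleeps, so no violation is recorded. The trade-off is that the driver occupies the CPU for the whole reset delay instead of yielding it, and every driver reached from such a path would need the same check; that presumably favours fixing the caller instead.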
(In reply to Hans Petter Selasky from comment #3) Happy to confirm that your patch prevents the machine from panicking during vlan(4) child setup. Haven't done further tests, but to my very limited understanding of the change, any side effects are very unlikely. Thanks, -harry
Just a "me too" here. I'm running 12.1-p3/amd64 (vlan + lagg + em, with a bridge thrown in as well) and I'm experiencing deadlocks (VFS-related, I suspect). So I turned on INVARIANTS, WITNESS, etc., but could not boot without this patch.
Created attachment 217889
Suggested patch for lagg

Here is a patch (against head) that stops lagg reconfiguration from using the epoch and uses a sleepable lock instead.
Hi, the panic is still here. It's easily reproduced using bhyve with e1000 device emulation:

# sh /usr/share/examples/bhyve/vmrun.sh -c 2 -m 1024M -n e1000 -t tap0 -t tap1 -d head.img freebsd-head

root@vm-13:~ # ifconfig em0 up
root@vm-13:~ # ifconfig em1 up
root@vm-13:~ # ifconfig lagg create
lagg0
root@vm-13:~ # ifconfig lagg0 laggproto lacp laggport em0 laggport em1 192.168.1.1 netmask 255.255.255.0
root@vm-13:~ # ifconfig vlan create vlan 1001 vlandev lagg0
panic: sleepq_add: td 0xfffffe00497c6e00 to sleep on wchan 0xffffffff815ac3d1 with sleeping prohibited
cpuid = 1
time = 1607417902
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0049b0a470
vpanic() at vpanic+0x181/frame 0xfffffe0049b0a4c0
panic() at panic+0x43/frame 0xfffffe0049b0a520
sleepq_add() at sleepq_add+0x359/frame 0xfffffe0049b0a570
_sleep() at _sleep+0x20c/frame 0xfffffe0049b0a620
pause_sbt() at pause_sbt+0xfe/frame 0xfffffe0049b0a650
e1000_reset_hw_82540() at e1000_reset_hw_82540+0x177/frame 0xfffffe0049b0a680
em_if_stop() at em_if_stop+0x1b/frame 0xfffffe0049b0a6a0
iflib_stop() at iflib_stop+0xbd/frame 0xfffffe0049b0a6f0
iflib_vlan_register() at iflib_vlan_register+0xe8/frame 0xfffffe0049b0a730
lagg_register_vlan() at lagg_register_vlan+0x102/frame 0xfffffe0049b0a790
vlan_config() at vlan_config+0x553/frame 0xfffffe0049b0a7f0
vlan_clone_create() at vlan_clone_create+0x2a2/frame 0xfffffe0049b0a860
if_clone_createif() at if_clone_createif+0x4a/frame 0xfffffe0049b0a8b0
ifioctl() at ifioctl+0x783/frame 0xfffffe0049b0a980
kern_ioctl() at kern_ioctl+0x289/frame 0xfffffe0049b0a9f0
sys_ioctl() at sys_ioctl+0x12a/frame 0xfffffe0049b0aac0
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe0049b0abf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0049b0abf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80042629a, rsp = 0x7fffffffe168, rbp = 0x7fffffffe180 ---
KDB: enter: panic
[ thread pid 891 tid 100072 ]
Stopped at kdb_enter+0x37: movq $0,0x10aa456(%rip)
db>

The patch from Gleb solves the issue. So, maybe commit it?
With the patch provided by Gleb I no longer observe this panic. This is 13-CURRENT, r366075, configured with:

cloned_interfaces="lagg0 vlan101"
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 212.8.x.y netmask 255.255.255.240"
ifconfig_vlan101="vlan 101 vlandev lagg0 192.168.1.29/24"
A commit references this bug:

Author: glebius
Date: Tue Dec 8 16:46:01 UTC 2020
New revision: 368448
URL: https://svnweb.freebsd.org/changeset/base/368448

Log:
  The list of ports in configuration path shall be protected by locks,
  epoch shall be used only for fast path. Thus use LAGG_XLOCK() in
  lagg_[un]register_vlan. This fixes sleeping in epoch panic.

  PR: 240609

Changes:
  head/sys/net/if_lagg.c
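The shape of the committed fix can be illustrated with the same kind of userland model. This is a sketch, not the actual if_lagg.c change: `model_xlock()` and `model_xunlock()` are hypothetical stand-ins for taking and releasing `LAGG_XLOCK()`. In the kernel that is a sleepable exclusive lock (per the patch description); here it is only a flag with non-recursion asserts. The other `model_*` names are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userland model of the committed approach: the configuration
 * path holds an exclusive, sleepable lock instead of entering the epoch.
 * None of these model_* functions are FreeBSD kernel APIs.
 */
static int epoch_depth;   /* stays 0: the config path no longer enters the epoch */
static bool xlock_held;   /* simulated exclusive configuration lock */
static int epoch_panics;  /* sleeps attempted inside the epoch (kernel: panic) */
static int sleeps;        /* sleeps that were allowed */

static bool model_in_epoch(void) { return epoch_depth > 0; }

static void model_xlock(void) { assert(!xlock_held); xlock_held = true; }
static void model_xunlock(void) { assert(xlock_held); xlock_held = false; }

/* Stand-in for pause(): only the epoch forbids sleeping, a sleepable lock does not. */
static void model_pause_ms(int ms)
{
	(void)ms;
	if (model_in_epoch())
		epoch_panics++;
	else
		sleeps++;
}

/* Stand-in for em_if_stop() -> e1000_reset_hw_*() -> msec_delay(); 10 ms is arbitrary. */
static void model_em_if_stop(void) { model_pause_ms(10); }

/* Stand-in for lagg_register_vlan() after the fix: ports walked under the exclusive lock. */
static void model_lagg_register_vlan(int nports)
{
	model_xlock();
	for (int i = 0; i < nports; i++)
		model_em_if_stop();   /* iflib_vlan_register() -> iflib_stop() */
	model_xunlock();
}
```

Running `model_lagg_register_vlan(2)` now records two permitted sleeps and no violations. The design point, as the commit log puts it, is the split of responsibilities: the configuration path is protected by locks, and the epoch is reserved for the fast path, where nothing is expected to sleep.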
Could someone kindly confirm whether this seems to be the same bug? I've hit this with TrueNAS 12.0-U1 and now my system is offline because of it.

https://ibb.co/xLnnYmn

It seems that TrueNAS 12.0-U1 is built against 12.2-RELEASE-p2:

FreeBSD freenas.win.versatushpc.com.br 12.2-RELEASE-p2 FreeBSD 12.2-RELEASE-p2 663e6b09467(HEAD) TRUENAS amd64

Sorry for not posting the output as text, but I only have this console right now and the serial console isn't working for whatever reason.
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4058265d605de7e6e66d9ad5153ac496f4f3c628

commit 4058265d605de7e6e66d9ad5153ac496f4f3c628
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2020-12-08 16:46:00 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2021-03-09 22:39:06 +0000

    The list of ports in configuration path shall be protected by locks,
    epoch shall be used only for fast path. Thus use LAGG_XLOCK() in
    lagg_[un]register_vlan. This fixes sleeping in epoch panic.

    PR: 240609

    (cherry picked from commit e1074ed6a08033ee571b4bedb3ffe6049a4a7361)

 sys/net/if_lagg.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
Looks like the problem was fixed a long time ago. Please re-open if I'm mistaken.