Bug 232362

Summary: if_lagg(4)+if_vlan(4) panic: sleeping in an epoch section (igb, e1000_82575, actually 82576 in use)
Product: Base System
Reporter: Harald Schmalzbauer <bugzilla.freebsd>
Component: kern
Assignee: freebsd-net (Nobody) <net>
Status: Closed DUPLICATE
Severity: Affects Some People
CC: hselasky, jailbird, mmacy, pi
Priority: ---
Keywords: IntelNetworking, crash, needs-qa
Version: CURRENT   
Hardware: amd64   
OS: Any   

Description Harald Schmalzbauer 2018-10-17 18:07:40 UTC
Hello,

Unfortunately it's me again, with another error report and no patch:
FreeBSD 12.0-FP0_ALPHA10 #0 r339388M: Tue Oct 16 23:55:39 CEST 2018
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff81932340
vpanic() at vpanic+0x1a3/frame 0xffffffff819323a0
panic() at panic+0x43/frame 0xffffffff81932400
_sleep() at _sleep+0x456/frame 0xffffffff819324a0
pause_sbt() at pause_sbt+0x11f/frame 0xffffffff819324e0
e1000_disable_pcie_master_generic() at e1000_disable_pcie_master_generic+0x151/frame 0xffffffff81932520
e1000_reset_hw_82575() at e1000_reset_hw_82575+0x17/frame 0xffffffff81932560
em_if_stop() at em_if_stop+0x1b/frame 0xffffffff81932580
iflib_stop() at iflib_stop+0xb1/frame 0xffffffff819325d0
iflib_vlan_register() at iflib_vlan_register+0xad/frame 0xffffffff81932610
lagg_register_vlan() at lagg_register_vlan+0xda/frame 0xffffffff81932670
vlan_config() at vlan_config+0x51e/frame 0xffffffff819326d0
vlan_clone_create() at vlan_clone_create+0x2c4/frame 0xffffffff81932740
if_clone_createif() at if_clone_createif+0x4a/frame 0xffffffff81932790
ifioctl() at ifioctl+0x80c/frame 0xffffffff81932850
kern_ioctl() at kern_ioctl+0x2ba/frame 0xffffffff819328b0
sys_ioctl() at sys_ioctl+0x16a/frame 0xffffffff81932980
amd64_syscall() at amd64_syscall+0x28c/frame 0xffffffff81932ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xffffffff81932ab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800485f6a, rsp = 0x7fffffffe368, rbp = 0x7fffffffe370 ---
KDB: enter: panic

#7  0xffffffff80ab8915 in calltrap () at /usr/local/share/deploy-tools/HEAD/src/sys/amd64/amd64/exception.S:232
#8  0xffffffff807b717b in kdb_enter (why=0xffffffff80c25469 "panic", msg=<value optimized out>) at cpufunc.h:65
#9  0xffffffff80770670 in vpanic (fmt=<value optimized out>, ap=0xffffffff819323e0)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_shutdown.c:861
#10 0xffffffff80770413 in panic (fmt=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_shutdown.c:799
#11 0xffffffff8077b256 in _sleep (ident=0xffffffff81101763, lock=0x0, priority=0, 
    wmesg=0x300000003 <Address 0x300000003 out of bounds>, sbt=<value optimized out>, pr=-2135615130, flags=256)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_synch.c:150
#12 0xffffffff8077b65f in pause_sbt (wmesg=<value optimized out>, sbt=4294967, pr=<value optimized out>, 
    flags=<value optimized out>) at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_synch.c:332
#13 0xffffffff80569511 in e1000_disable_pcie_master_generic (hw=0xfffffe0000790008) at e1000_osdep.h:88
#14 0xffffffff80558797 in e1000_reset_hw_82575 (hw=0xfffffe0000790008)
    at /usr/local/share/deploy-tools/HEAD/src/sys/dev/e1000/e1000_82575.c:1356
#15 0xffffffff8054014b in em_if_stop (ctx=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/dev/e1000/if_em.c:1842
#16 0xffffffff80896b01 in iflib_stop (ctx=0xfffff80002a2c000) at ifdi_if.h:268
#17 0xffffffff808a41ad in iflib_vlan_register (arg=0xfffff80002a2c000, ifp=0xfffff80002a32800, vtag=132)
    at /usr/local/share/deploy-tools/HEAD/src/sys/net/iflib.c:3949
#18 0xffffffff8088a7ea in lagg_register_vlan (arg=<value optimized out>, ifp=<value optimized out>, vtag=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/net/if_lagg.c:451
#19 0xffffffff8089600e in vlan_config (ifv=0xfffff80015916200, p=0xfffff80015874000, vid=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/net/if_vlan.c:1439
#20 0xffffffff80895064 in vlan_clone_create (ifc=<value optimized out>, name=0xffffffff819328d0 "vlan4", 
    len=18446735277834331904, params=<value optimized out>) at /usr/local/share/deploy-tools/HEAD/src/sys/net/if_vlan.c:1079
#21 0xffffffff808808ba in if_clone_createif (ifc=0xfffff8000cfb6700, name=0xffffffff819328d0 "vlan4", len=16, 
    params=0x22cb40 <Address 0x22cb40 out of bounds>) at /usr/local/share/deploy-tools/HEAD/src/sys/net/if_clone.c:229
#22 0xffffffff8087835c in ifioctl (so=0xfffff800158f56d0, cmd=3223349628, data=<value optimized out>, td=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/net/if.c:3046
#23 0xffffffff807dc82a in kern_ioctl (td=0xfffff800153a7000, fd=3, com=<value optimized out>, data=<value optimized out>)
    at file.h:330
#24 0xffffffff807dc4ea in sys_ioctl (td=0xfffff800153a7000, uap=0xfffff800153a73c0)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/sys_generic.c:712
#25 0xffffffff80adfd7c in amd64_syscall (td=0xfffff800153a7000, traced=0) at subr_syscall.c:135
#26 0xffffffff80ab91fd in fast_syscall_common () at /usr/local/share/deploy-tools/HEAD/src/sys/amd64/amd64/exception.S:504
#27 0x0000000800485f6a in ?? ()
Previous frame inner to this frame (corrupt stack?)

(kgdb) frame 14
#14 0xffffffff80558797 in e1000_reset_hw_82575 (hw=0xfffffe0000790008)
    at /usr/local/share/deploy-tools/HEAD/src/sys/dev/e1000/e1000_82575.c:1356
1356		ret_val = e1000_disable_pcie_master_generic(hw);

Actually, my hardware is an 82576 (Kawela), but that most likely isn't relevant.
The core dump and source are available, so I can easily provide more information.

The panic happens at boot (rc(8)/init). I haven't yet checked whether running the same commands manually avoids the panic, as was the case with https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230510, if I remember correctly. I will check tomorrow.

Thanks for looking into it,
-harry
Comment 2 Harald Schmalzbauer 2018-10-20 11:15:03 UTC
The problem seems to be specific to iflib/igb, I guess.
Using if_lagg(4) as the parent with one USB Ethernet (axge(4)/ue) port doesn't crash the system!
Manually configuring an if_vlan(4) child on an if_lagg(4) parent with an if_igb(4) port also leads to a panic, just as when rc(8) does it. I tested with i210 and 82576.

Here's the i210 backtrace:
panic: sleeping in an epoch section
cpuid = 2
time = 1539941673
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff819e2360
vpanic() at vpanic+0x1a3/frame 0xffffffff819e23c0
panic() at panic+0x43/frame 0xffffffff819e2420
_sleep() at _sleep+0x456/frame 0xffffffff819e24c0
pause_sbt() at pause_sbt+0x11f/frame 0xffffffff819e2500
e1000_reset_hw_82580() at e1000_reset_hw_82580+0x1d6/frame 0xffffffff819e2540
em_if_stop() at em_if_stop+0x1b/frame 0xffffffff819e2560
iflib_stop() at iflib_stop+0xb1/frame 0xffffffff819e25b0
iflib_vlan_register() at iflib_vlan_register+0xad/frame 0xffffffff819e25f0
lagg_register_vlan() at lagg_register_vlan+0xda/frame 0xffffffff819e2650
vlan_config() at vlan_config+0x51e/frame 0xffffffff819e26b0
vlan_ioctl() at vlan_ioctl+0x32e/frame 0xffffffff819e2710
ifhwioctl() at ifhwioctl+0x20e/frame 0xffffffff819e2790
ifioctl() at ifioctl+0x757/frame 0xffffffff819e2850
kern_ioctl() at kern_ioctl+0x2ba/frame 0xffffffff819e28b0
sys_ioctl() at sys_ioctl+0x16a/frame 0xffffffff819e2980
amd64_syscall() at amd64_syscall+0x28c/frame 0xffffffff819e2ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xffffffff819e2ab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800485f6a, rsp = 0x7fffffffe0e8, rbp = 0x7fffffffe120 ---
KDB: enter: panic
#5  0xffffffff807b7aa3 in kdb_trap (type=3, code=0, tf=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/subr_kdb.c:693
#6  0xffffffff80adef00 in trap (frame=0xffffffff819e2290) at /usr/local/share/deploy-tools/HEAD/src/sys/amd64/amd64/trap.c:619
#7  0xffffffff80ab8915 in calltrap () at /usr/local/share/deploy-tools/HEAD/src/sys/amd64/amd64/exception.S:232
#8  0xffffffff807b717b in kdb_enter (why=0xffffffff80c25469 "panic", msg=<value optimized out>) at cpufunc.h:65
#9  0xffffffff80770670 in vpanic (fmt=<value optimized out>, ap=0xffffffff819e2400)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_shutdown.c:861
#10 0xffffffff80770413 in panic (fmt=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_shutdown.c:799
#11 0xffffffff8077b256 in _sleep (ident=0xffffffff81101762, lock=0x0, priority=0, wmesg=0xc5 <Address 0xc5 out of bounds>, 
    sbt=<value optimized out>, pr=-2136234978, flags=256) at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_synch.c:150
#12 0xffffffff8077b65f in pause_sbt (wmesg=<value optimized out>, sbt=42949670, pr=<value optimized out>, 
    flags=<value optimized out>) at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_synch.c:332
#13 0xffffffff80558516 in e1000_reset_hw_82580 (hw=0xffffffff81b84008) at e1000_osdep.h:97
#14 0xffffffff8054014b in em_if_stop (ctx=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/dev/e1000/if_em.c:1842
#15 0xffffffff80896b01 in iflib_stop (ctx=0xfffff80002b52c00) at ifdi_if.h:268
#16 0xffffffff808a41ad in iflib_vlan_register (arg=0xfffff80002b52c00, ifp=0xfffff80002a9e000, vtag=232)
    at /usr/local/share/deploy-tools/HEAD/src/sys/net/iflib.c:3949
#17 0xffffffff8088a7ea in lagg_register_vlan (arg=<value optimized out>, ifp=<value optimized out>, vtag=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/net/if_lagg.c:451
#18 0xffffffff8089600e in vlan_config (ifv=0xfffff800150f4180, p=0xfffff80002a25000, vid=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/net/if_vlan.c:1439
#19 0xffffffff8089579e in vlan_ioctl (ifp=<value optimized out>, cmd=<value optimized out>, data=0xffffffff819e28d0 "lagg1egn")
    at /usr/local/share/deploy-tools/HEAD/src/sys/net/if_vlan.c:1828
#20 0xffffffff8087640e in ifhwioctl (cmd=2149607737, ifp=<value optimized out>, data=0xffffffff819e28d0 "lagg1egn", 
    td=0xfffff80015496000) at /usr/local/share/deploy-tools/HEAD/src/sys/net/if.c:2861
#21 0xffffffff808782a7 in ifioctl (so=0xfffff8001580ca38, cmd=2149607737, data=<value optimized out>, td=0xfffff80015496000)
    at /usr/local/share/deploy-tools/HEAD/src/sys/net/if.c:3081
#22 0xffffffff807dc82a in kern_ioctl (td=0xfffff80015496000, fd=3, com=<value optimized out>, data=<value optimized out>)
    at file.h:330
#23 0xffffffff807dc4ea in sys_ioctl (td=0xfffff80015496000, uap=0xfffff800154963c0)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/sys_generic.c:712
#24 0xffffffff80adfd7c in amd64_syscall (td=0xfffff80015496000, traced=0) at subr_syscall.c:135
#25 0xffffffff80ab91fd in fast_syscall_common () at /usr/local/share/deploy-tools/HEAD/src/sys/amd64/amd64/exception.S:504
#26 0x0000000800485f6a in ?? ()

I don't think this can be fixed with a one-liner...
Can somebody with more knowledge estimate when this can be fixed? Hopefully before 12.0-RELEASE?
I don't have any other NICs available for testing, such as bnxt(4), which is also iflib-based. Can somebody with mlx4en(4) or cxgb(4) check? I guess these are not iflib-based.

Thanks,
-harry
Comment 3 Harald Schmalzbauer 2018-10-28 10:20:26 UTC
Just a short note: r339547 (vlan: Fix panic with lagg and vlan) has, as expected, no effect on this problem; the panic is unchanged (tested with the change locally merged to 12-stable).
Likewise r339587 (Resolve deadlock between epoch(9) and various network interface SX-locks,).

Tell me if I can provide more info.

-harry
Comment 4 Harald Schmalzbauer 2019-09-16 08:25:16 UTC

*** This bug has been marked as a duplicate of bug 240609 ***