If I use rc.network(8) to create a vlan(4) child of an iflib em0(4) parent device, I get a huge number of LORs (they scroll by too fast to read them all). I estimate 500 to 2000 LORs before the panic. Here's the backtrace:

#1  0xffffffff803ecb2b in db_dump (dummy=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/ddb/db_command.c:574
#2  0xffffffff803ec8f9 in db_command (cmd_table=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/ddb/db_command.c:481
#3  0xffffffff803ec674 in db_command_loop ()
    at /usr/local/share/deploy-tools/HEAD/src/sys/ddb/db_command.c:534
#4  0xffffffff803ef8ff in db_trap (type=<value optimized out>, code=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/ddb/db_main.c:252
#5  0xffffffff80834923 in kdb_trap (type=3, code=0, tf=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/subr_kdb.c:693
#6  0xffffffff80b4a3ef in trap (frame=0xfffffe00751e8560)
    at /usr/local/share/deploy-tools/HEAD/src/sys/amd64/amd64/trap.c:605
#7  0xffffffff80b25d95 in calltrap ()
    at /usr/local/share/deploy-tools/HEAD/src/sys/amd64/amd64/exception.S:232
#8  0xffffffff80833ffb in kdb_enter (why=0xffffffff80ca81cc "panic", msg=<value optimized out>)
    at cpufunc.h:65
#9  0xffffffff807e9c00 in vpanic (fmt=<value optimized out>, ap=0xfffffe00751e86d0)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_shutdown.c:852
#10 0xffffffff807e9c93 in panic (fmt=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_shutdown.c:790
#11 0xffffffff8084bab5 in propagate_priority (td=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/subr_turnstile.c:228
#12 0xffffffff8084c56d in turnstile_wait (ts=0xfffff80003089e40, owner=0xfffff800035da580, queue=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/subr_turnstile.c:783
#13 0xffffffff807c7cf1 in __mtx_lock_sleep (c=0xfffff80003636ee0, v=<value optimized out>, opts=<value optimized out>, file=<value optimized out>, line=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_mutex.c:639
#14 0xffffffff807c7a79 in __mtx_lock_flags (c=0xfffff80003636ee0, opts=<value optimized out>, file=<value optimized out>, line=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_mutex.c:255
#15 0xffffffff807e3725 in _rm_wlock (rm=0xfffff80003636e88)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_rmlock.c:540
#16 0xffffffff807e3a94 in _rm_wlock_debug (rm=0xfffff80003636e88, file=0xffffffff80cb26fe "/usr/local/share/deploy-tools/HEAD/src/sys/net/if_vlan.c", line=1639)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_rmlock.c:605
#17 0xffffffff80904f9b in vlan_link_state (ifp=0xfffff80003ba3000)
    at /usr/local/share/deploy-tools/HEAD/src/sys/net/if_vlan.c:1639
#18 0xffffffff808efd90 in do_link_state_change (arg=0xfffff80003ba3000, pending=1)
    at /usr/local/share/deploy-tools/HEAD/src/sys/net/if.c:2332
#19 0xffffffff808484bc in taskqueue_run_locked (queue=0xfffff800030b3500)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/subr_taskqueue.c:465
#20 0xffffffff8084832a in taskqueue_run (queue=0xfffff800030b3500)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/subr_taskqueue.c:484
#21 0xffffffff807ab180 in ithread_loop (arg=<value optimized out>)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_intr.c:1043
#22 0xffffffff807a8004 in fork_exit (callout=0xffffffff807ab040 <ithread_loop>, arg=0xfffff8000359e080, frame=0xfffffe00751e8ac0)
    at /usr/local/share/deploy-tools/HEAD/src/sys/kern/kern_fork.c:1057
#23 0xffffffff80b26d6e in fork_trampoline ()
    at /usr/local/share/deploy-tools/HEAD/src/sys/amd64/amd64/exception.S:990
#24 0x0000000000000000 in ?? ()

Unfortunately I don't have the source easily accessible on that system, and there is also no easy way to capture the console output.
Please let me know if you need additional information to analyze the problem, and I'll provide the missing parts.
Thanks,
-harry

P.S.: If I manually create the vlan(4) child from a multi-user shell, there are LORs but _no_ panic happens. The vlan(4) device also works afterwards.
P.P.S.: The backtrace is from today's sources, but I saw the panic long before that, probably since around April this year. I hadn't found the time to report it earlier, sorry. -harry
https://reviews.freebsd.org/D16808
Kudos! With D16808 applied against r338093 there are no more LORs, and the described problem is solved.
But vlan(4) doesn't work as expected: I have to reduce the MTU on the vlan(4) device to 1468 to get frames passed out. I haven't checked much yet, since there were some offloading changes recently and I'm not sure whether if_vlan(4) is known to be under rework or broken.
I have if_em(4) (I217-V) as the parent device. For the moment I haven't disabled any offloading features, so the interfaces involved look like this:

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
	options=81249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER>
	ether 56:be:f7:0b:d7:4e
	hwaddr
	inet 192.0.2.1 netmask 0xffffff00 broadcast 192.0.2.255
	inet6 2001:db8:1::3:1 prefixlen 64
	inet6 fe80::54be:f7ff:fe0b:d74e%em0 prefixlen 64 scopeid 0x1
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vlegn: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1468
	options=403<RXCSUM,TXCSUM,LRO>
	ether 56:be:f7:0b:d7:4e
	inet 169.254.0.1 netmask 0xffffff00 broadcast 169.254.0.255
	inet6 2001:db8:2::3:2 prefixlen 64
	inet6 fe80::54be:f7ff:fe0b:d74e%vlegn prefixlen 64 scopeid 0x3
	groups: vlan
	vlan: 1234 vlanpcp: 0 parent interface: em0
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

Normally vlegn (the if_vlan(4) child of em0) should work with the MTU of 9000 inherited from its parent.
Shall I file a separate PR? And should I test different offloading scenarios before doing so?
Thanks,
-harry
(In reply to Harald Schmalzbauer from comment #3) I just wanted to confirm that the functionality/MTU problem was unrelated to the tested D16808 and appears to be fixed in r338305. Quick local if_vlan(4) tests passed with D16808 applied to r338305. Thanks!
A commit references this bug:

Author: mmacy
Date: Fri Sep 21 01:37:09 UTC 2018
New revision: 338850
URL: https://svnweb.freebsd.org/changeset/base/338850

Log:
  fix vlan locking to permit sx acquisition in ioctl calls
  - update vlan(9) to handle changes earlier this year in multicast locking

  Tested by:	np@, darkfiberu at gmail.com
  PR:	230510
  Reviewed by:	mjoras@, shurd@, sbruno@
  Approved by:	re (gjb@)
  Sponsored by:	Limelight Networks
  Differential Revision:	https://reviews.freebsd.org/D16808

Changes:
  head/sys/net/if_var.h
  head/sys/net/if_vlan.c
*** Bug 230655 has been marked as a duplicate of this bug. ***