Bug 220980

Summary: [panic] panic when destroying vlan interface with traffic
Product: Base System
Reporter: Kun Xie <kxie>
Component: kern
Assignee: Matt Joras <mjoras>
Status: Closed FIXED
Severity: Affects Only Me
CC: bugzilla.freebsd, dv, emaste, mjoras
Priority: ---
Keywords: crash, needs-qa
Version: 11.0-RELEASE
Flags: koobs: mfc-stable11?, mjoras: mfc-stable10-
Hardware: amd64
OS: Any
URL: https://reviews.freebsd.org/D11370

Description Kun Xie 2017-07-24 20:28:03 UTC
While running about 30 MBps of bridge-over-VLAN traffic on a VirtualBox VM booted from the FreeBSD 11-RELEASE live CD, I tried to destroy one of the bridge's VLAN interfaces with "ifconfig vlanX destroy". Roughly 25% of the time this triggered a panic. The panic output is as follows:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x3b0
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80ab48ca
stack pointer           = 0x28:0xfffffe003c947550
frame pointer           = 0x28:0xfffffe003c9475d0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq19: virtio_pci0)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff80b24077 at kdb_backtrace+0x67
#1 0xffffffff80ad93e2 at vpanic+0x182
#2 0xffffffff80ad9253 at panic+0x43
#3 0xffffffff80fa0d31 at trap_fatal+0x351
#4 0xffffffff80fa0f23 at trap_pfault+0x1e3
#5 0xffffffff80fa04cc at trap+0x26c
#6 0xffffffff80f84141 at calltrap+0x8
#7 0xffffffff80ad35a6 at _rm_rlock+0x3c6
#8 0xffffffff80beb6d9 at vlan_input+0xb9
#9 0xffffffff80be2a65 at ether_demux+0x95
#10 0xffffffff80be3752 at ether_nh_input+0x322
#11 0xffffffff80bfa095 at netisr_dispatch_src+0xa5
#12 0xffffffff80be2d76 at ether_input+0x26
#13 0xffffffff8091ea8c at vtnet_rxq_eof+0x84c
#14 0xffffffff8091f643 at vtnet_rx_vq_intr+0x93
#15 0xffffffff8091b4d0 at vtpci_legacy_intr+0xb0
#16 0xffffffff80a9340f at intr_event_execute_handlers+0x20f
#17 0xffffffff80a93676 at ithread_loop+0xc6

Bug #198580 looks similar to mine, but it was reported against 9.2-STABLE, so I am filing a new one.
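
A note on reading the trap output: the fault virtual address 0x3b0 is very small, which is consistent with code reading a structure member through a NULL base pointer, since the faulting address is then just the member's offset. The sketch below only illustrates that arithmetic. The struct is hypothetical and its padding is chosen to land on 0x3b0; it is not the layout of any kernel structure, and the trace alone does not say which pointer was NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical structure, invented for illustration only. The padding is
 * sized so that `field` sits at offset 0x3b0, matching the fault address
 * in the report. It does not mirror any real kernel struct.
 */
struct hypothetical_state {
	char     pad[0x3b0];
	uint64_t field;
};

/*
 * Address the CPU touches when reading a member at `member_offset`
 * through a structure pointer whose value is `base`.
 */
static uintptr_t
fault_address(uintptr_t base, size_t member_offset)
{
	/* Guard against wraparound so the sum is always meaningful. */
	assert(member_offset <= UINTPTR_MAX - base);
	return (base + member_offset);
}
```

With a NULL base, fault_address(0, offsetof(struct hypothetical_state, field)) evaluates to the member offset itself, which is why small fault addresses in a page-fault panic often point at a stale or cleared pointer rather than random memory corruption.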
Comment 1 Matt Joras freebsd_committer freebsd_triage 2017-07-24 21:37:05 UTC
This should be fixed by https://reviews.freebsd.org/D11370

Feel free to test the diff if you'd like.
Comment 2 Matt Joras freebsd_committer freebsd_triage 2017-07-24 21:38:12 UTC
I should note, the fix is not committed yet.
Comment 3 Kun Xie 2017-07-24 21:45:12 UTC
(In reply to Matt Joras from comment #2)

Great! I'll try it. Thanks!
Comment 4 Harald Schmalzbauer 2017-07-26 07:23:01 UTC
(In reply to Matt Joras from comment #1)

Just wanted to drop a note that I've been running the patch for a month on a semi-production machine.
It's working fine and I haven't found any regressions, but I wasn't suffering from the former locking deficiencies either.
So this note doesn't carry much weight, but I hope it's worth posting ;-)

-harry
Comment 5 Matt Joras freebsd_committer freebsd_triage 2017-07-26 15:12:54 UTC
Thanks Harry! I appreciate the note and your running the patch :).
Comment 6 Matt Joras freebsd_committer freebsd_triage 2017-07-26 15:14:18 UTC
*** Bug 198580 has been marked as a duplicate of this bug. ***
Comment 7 commit-hook freebsd_committer freebsd_triage 2017-08-15 17:53:11 UTC
A commit references this bug:

Author: mjoras
Date: Tue Aug 15 17:52:37 UTC 2017
New revision: 322548
URL: https://svnweb.freebsd.org/changeset/base/322548

Log:
  Rework vlan(4) locking.

  Previously the locking of vlan(4) interfaces was not very comprehensive.
  Particularly there was very little protection against the destruction of
  active vlan(4) interfaces or concurrent modification of a vlan(4)
  interface. The former readily produced several different panics.

  The changes can be summarized as using two global vlan locks (an
  rmlock(9) and an sx(9)) to protect accesses to the if_vlantrunk field of
  struct ifnet, in addition to other places where global exclusive access
  is required. vlan(4) should now be much more resilient to the destruction
  of active interfaces and concurrent calls into the configuration path.

  PR:	220980
  Reviewed by:	ae, markj, mav, rstone
  Approved by:	rstone (mentor)
  MFC after:	4 weeks
  Sponsored by:	Dell EMC Isilon
  Differential Revision:	https://reviews.freebsd.org/D11370

Changes:
  head/sys/net/if_vlan.c
Comment 8 Matt Joras freebsd_committer freebsd_triage 2017-08-15 18:02:33 UTC
I will MFC this to 11 since it is a trivial merge. I am not going to MFC to 10 or 9 because I would have to MFC it together with some other changes to the vlan locking that were never MFCed.