Bug 244168

Summary: sys.net.if_lagg_test.lacp_linkstate_destroy_stress panics kernel
Product: Base System
Reporter: Li-Wen Hsu <lwhsu>
Component: tests
Assignee: freebsd-testing (Nobody) <testing>
Status: Open ---
Severity: Affects Only Me
CC: bdrewery
Priority: ---
Version: CURRENT
Hardware: Any
OS: Any

Description Li-Wen Hsu freebsd_committer 2020-02-16 16:46:15 UTC
sys.net.if_lagg_test.lacp_linkstate_destroy_stress panics the i386 kernel

https://ci.freebsd.org/job/FreeBSD-head-i386-test/8445/consoleFull

if_lagg_test:lacp_linkstate_destroy_stress  ->  panic: vm_fault_lookup: fault on nofault entry, addr: 0
cpuid = 5
time = 1581628143
KDB: stack backtrace:
db_trace_self_wrapper(69,198e3e00,14f9e44,198e3e00,d8c6ff4,...) at db_trace_self_wrapper+0x2a/frame 0xd8c6fc0
kdb_backtrace(ffdf,8bc7000,8be2000,8be2018,18c7198,...) at kdb_backtrace+0x2e/frame 0xd8c7020
vpanic(14f9e44,d8c7068,d8c7068,d8c7138,12c8fbc,...) at vpanic+0x11f/frame 0xd8c7048
panic(14f9e44,14dc991,0,2,0,...) at panic+0x14/frame 0xd8c705c
vm_fault(8bc7000,0,1,0,0) at vm_fault+0x1c9c/frame 0xd8c7138
vm_fault_trap(8bc7000,4,1,0,0,0) at vm_fault_trap+0x52/frame 0xd8c7160
trap_pfault(4,0,0) at trap_pfault+0x161/frame 0xd8c71b4
trap(d8c728c,8,28,28,7d3,...) at trap+0x3a8/frame 0xd8c7280
calltrap() at 0xffc0316f/frame 0xd8c7280
--- trap 0xc, eip = 0x10316ea, esp = 0xd8c72cc, ebp = 0xd8c72fc ---
witness_checkorder(1ba15e30,9,2489d2d7,7d3,0) at witness_checkorder+0x5a/frame 0xd8c72fc
_sx_xlock(1ba15e30,0,2489d2d7,7d3,2696b000,...) at _sx_xlock+0x4d/frame 0xd8c7324
lagg_port_state(2696b000,1) at lagg_port_state+0x2f/frame 0xd8c7348
do_link_state_change(2696b000,1) at do_link_state_change+0xb4/frame 0xd8c7370
taskqueue_run_locked(b8abe00,0,d8c73e4,1027b5e,bd55000,...) at taskqueue_run_locked+0x97/frame 0xd8c73c8
taskqueue_run(bd55000) at taskqueue_run+0x44/frame 0xd8c73d8
taskqueue_swi_run(0) at taskqueue_swi_run+0xe/frame 0xd8c73e4
ithread_loop(196abfa0,d8c7468) at ithread_loop+0x283/frame 0xd8c7434
fork_exit(f985c0,196abfa0,d8c7468,0,0,...) at fork_exit+0x69/frame 0xd8c7454
fork_trampoline() at 0xffc033de/frame 0xd8c7454
--- kthread start
KDB: enter: panic
[ thread pid 12 tid 100025 ]
Stopped at      kdb_enter+0x35: movl    $0,kdb_why
db:0:kdb.enter.panic> show pcpu
cpuid        = 5
dynamic pcpu = 0x9db3180
curthread    = 0x1962d180: pid 12 tid 100025 critnest 1 "swi6: task queue"
curpcb       = 0xd8c74c0
fpcurthread  = none
idlethread   = 0x19619c00: tid 100008 "idle: cpu5"
APIC ID      = 5
currentldt   = 0x50
trampstk     = 0xffc12ff0
kesp0        = 0xd8c74b0
common_tssp  = 0xffc01b88
tlb gen      = 186060
curvnet      = 0xbd06000
spin locks held:
db:0:kdb.enter.panic>
Comment 1 commit-hook freebsd_committer 2020-02-16 16:50:13 UTC
A commit references this bug:

Author: lwhsu
Date: Sun Feb 16 16:49:29 UTC 2020
New revision: 358003
URL: https://svnweb.freebsd.org/changeset/base/358003

Log:
  Temporarily skip sys.net.if_lagg_test.lacp_linkstate_destroy_stress on i386 CI

  It panics the kernel

  PR:		244168
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/tests/sys/net/if_lagg_test.sh
Comment 2 Li-Wen Hsu freebsd_committer 2020-06-11 18:57:49 UTC
Also happens on amd64:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/15477/consoleFull

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x8
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80c31cc5
stack pointer           = 0x28:0xfffffe002c701940
frame pointer           = 0x28:0xfffffe002c7019c0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (swi6: task queue)
trap number             = 12
panic: page fault
cpuid = 0
time = 1591900613
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe002c7015f0
vpanic() at vpanic+0x182/frame 0xfffffe002c701640
panic() at panic+0x43/frame 0xfffffe002c7016a0
trap_fatal() at trap_fatal+0x387/frame 0xfffffe002c701700
trap_pfault() at trap_pfault+0x99/frame 0xfffffe002c701760
trap() at trap+0x2a5/frame 0xfffffe002c701870
calltrap() at calltrap+0x8/frame 0xfffffe002c701870
--- trap 0xc, rip = 0xffffffff80c31cc5, rsp = 0xfffffe002c701940, rbp = 0xfffffe002c7019c0 ---
witness_checkorder() at witness_checkorder+0x65/frame 0xfffffe002c7019c0
_sx_xlock() at _sx_xlock+0x67/frame 0xfffffe002c701a00
lagg_port_state() at lagg_port_state+0x39/frame 0xfffffe002c701a30
do_link_state_change() at do_link_state_change+0xcb/frame 0xfffffe002c701a80
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe002c701b00
taskqueue_run() at taskqueue_run+0x4d/frame 0xfffffe002c701b20
ithread_loop() at ithread_loop+0x279/frame 0xfffffe002c701bb0
fork_exit() at fork_exit+0x80/frame 0xfffffe002c701bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe002c701bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Comment 3 commit-hook freebsd_committer 2020-06-11 19:00:35 UTC
A commit references this bug:

Author: lwhsu
Date: Thu Jun 11 18:59:57 UTC 2020
New revision: 362072
URL: https://svnweb.freebsd.org/changeset/base/362072

Log:
  Skip sys.net.if_lagg_test.lacp_linkstate_destroy_stress in CI because of panic

  PR:		244168
  Sponsored by:	The FreeBSD Foundation

Changes:
  head/tests/sys/net/if_lagg_test.sh
Comment 4 Bryan Drewery freebsd_committer 2020-06-11 19:26:43 UTC
I have a potential fix for this that I've been testing. I will link the
review when I put it out.
Comment 5 Bryan Drewery freebsd_committer 2020-06-15 17:36:58 UTC
https://reviews.freebsd.org/D25284
Comment 6 commit-hook freebsd_committer 2020-08-13 22:07:21 UTC
A commit references this bug:

Author: bdrewery
Date: Thu Aug 13 22:06:27 UTC 2020
New revision: 364220
URL: https://svnweb.freebsd.org/changeset/base/364220

Log:
  lagg: Avoid adding a port to a lagg device being destroyed.

  The lagg_clone_destroy() handles detach and waiting for ifconfig callers
  to drain already.

  This narrows the race for 2 panics that the tests triggered. Both were a
  consequence of adding a port to the lagg device after it had already detached
  from all of its ports. The link state task would run after lagg_clone_destroy()
  freed the lagg softc.

      kernel:trap_fatal+0xa4
      kernel:trap_pfault+0x61
      kernel:trap+0x316
      kernel:witness_checkorder+0x6d
      kernel:_sx_xlock+0x72
      if_lagg.ko:lagg_port_state+0x3b
      kernel:if_down+0x144
      kernel:if_detach+0x659
      if_tap.ko:tap_destroy+0x46
      kernel:if_clone_destroyif+0x1b7
      kernel:if_clone_destroy+0x8d
      kernel:ifioctl+0x29c
      kernel:kern_ioctl+0x2bd
      kernel:sys_ioctl+0x16d
      kernel:amd64_syscall+0x337

      kernel:trap_fatal+0xa4
      kernel:trap_pfault+0x61
      kernel:trap+0x316
      kernel:witness_checkorder+0x6d
      kernel:_sx_xlock+0x72
      if_lagg.ko:lagg_port_state+0x3b
      kernel:do_link_state_change+0x9b
      kernel:taskqueue_run_locked+0x10b
      kernel:taskqueue_run+0x49
      kernel:ithread_loop+0x19c
      kernel:fork_exit+0x83

  PR:		244168
  Reviewed by:	markj
  MFC after:	2 weeks
  Sponsored by:	Dell EMC
  Differential Revision:	https://reviews.freebsd.org/D25284

Changes:
  head/sys/net/if_lagg.c
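A minimal userland sketch of the idea described in the r364220 log ("avoid adding a port to a lagg device being destroyed"), under a simplified model that is an assumption, not the actual if_lagg.c code: the softc is reduced to a hypothetical `struct lagg_model`, a pthread mutex stands in for the kernel sx lock, and a hypothetical `detaching` flag is set by the destroy path before it detaches the ports. The add-port path checks that flag under the same lock, so a port can no longer be attached after the detach sweep has run. The function names and the `ENXIO` return value are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of a lagg softc (not the real
 * if_lagg.c structure). The mutex stands in for the kernel sx lock.
 */
struct lagg_model {
	pthread_mutex_t lock;
	bool detaching;	/* hypothetical flag: set once destroy begins */
	int nports;	/* number of attached member ports */
};

void
lagg_model_init(struct lagg_model *sc)
{
	pthread_mutex_init(&sc->lock, NULL);
	sc->detaching = false;
	sc->nports = 0;
}

/* Attach a port unless the device is already being destroyed. */
int
lagg_model_port_add(struct lagg_model *sc)
{
	int error = 0;

	pthread_mutex_lock(&sc->lock);
	if (sc->detaching)
		error = ENXIO;	/* assumed error code: refuse late adds */
	else
		sc->nports++;
	pthread_mutex_unlock(&sc->lock);
	return (error);
}

/*
 * Destroy path: mark the device as detaching and detach every port
 * while holding the same lock, so a concurrent port_add either runs
 * entirely before the sweep or sees the flag and fails.
 */
void
lagg_model_destroy(struct lagg_model *sc)
{
	pthread_mutex_lock(&sc->lock);
	assert(!sc->detaching);	/* destroy is expected to run only once */
	sc->detaching = true;
	sc->nports = 0;		/* detach all member ports */
	pthread_mutex_unlock(&sc->lock);
}
```

As both the commit log and Comment 7 note, this approach only narrows the race: in this model, nothing covers a link state task that was already queued against a port before the flag was set.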
Comment 7 Bryan Drewery freebsd_committer 2020-08-13 22:07:28 UTC
I mention this PR in my commit r364220, but note that the change is not 100%
reliable at fixing the tests.