sys.net.if_lagg_test.lacp_linkstate_destroy_stress panics i386 kernel

https://ci.freebsd.org/job/FreeBSD-head-i386-test/8445/consoleFull

if_lagg_test:lacp_linkstate_destroy_stress -> panic: vm_fault_lookup: fault on nofault entry, addr: 0 [1742/1858]
cpuid = 5
time = 1581628143
KDB: stack backtrace:
db_trace_self_wrapper(69,198e3e00,14f9e44,198e3e00,d8c6ff4,...) at db_trace_self_wrapper+0x2a/frame 0xd8c6fc0
kdb_backtrace(ffdf,8bc7000,8be2000,8be2018,18c7198,...) at kdb_backtrace+0x2e/frame 0xd8c7020
vpanic(14f9e44,d8c7068,d8c7068,d8c7138,12c8fbc,...) at vpanic+0x11f/frame 0xd8c7048
panic(14f9e44,14dc991,0,2,0,...) at panic+0x14/frame 0xd8c705c
vm_fault(8bc7000,0,1,0,0) at vm_fault+0x1c9c/frame 0xd8c7138
vm_fault_trap(8bc7000,4,1,0,0,0) at vm_fault_trap+0x52/frame 0xd8c7160
trap_pfault(4,0,0) at trap_pfault+0x161/frame 0xd8c71b4
trap(d8c728c,8,28,28,7d3,...) at trap+0x3a8/frame 0xd8c7280
calltrap() at 0xffc0316f/frame 0xd8c7280
--- trap 0xc, eip = 0x10316ea, esp = 0xd8c72cc, ebp = 0xd8c72fc ---
witness_checkorder(1ba15e30,9,2489d2d7,7d3,0) at witness_checkorder+0x5a/frame 0xd8c72fc
_sx_xlock(1ba15e30,0,2489d2d7,7d3,2696b000,...) at _sx_xlock+0x4d/frame 0xd8c7324
lagg_port_state(2696b000,1) at lagg_port_state+0x2f/frame 0xd8c7348
do_link_state_change(2696b000,1) at do_link_state_change+0xb4/frame 0xd8c7370
taskqueue_run_locked(b8abe00,0,d8c73e4,1027b5e,bd55000,...) at taskqueue_run_locked+0x97/frame 0xd8c73c8
taskqueue_run(bd55000) at taskqueue_run+0x44/frame 0xd8c73d8
taskqueue_swi_run(0) at taskqueue_swi_run+0xe/frame 0xd8c73e4
ithread_loop(196abfa0,d8c7468) at ithread_loop+0x283/frame 0xd8c7434
fork_exit(f985c0,196abfa0,d8c7468,0,0,...) at fork_exit+0x69/frame 0xd8c7454
fork_trampoline() at 0xffc033de/frame 0xd8c7454
--- kthread start
KDB: enter: panic
[ thread pid 12 tid 100025 ]
Stopped at kdb_enter+0x35: movl $0,kdb_why
db:0:kdb.enter.panic> show pcpu
cpuid = 5
dynamic pcpu = 0x9db3180
curthread = 0x1962d180: pid 12 tid 100025 critnest 1 "swi6: task queue"
curpcb = 0xd8c74c0
fpcurthread = none
idlethread = 0x19619c00: tid 100008 "idle: cpu5"
APIC ID = 5
currentldt = 0x50
trampstk = 0xffc12ff0
kesp0 = 0xd8c74b0
common_tssp = 0xffc01b88
tlb gen = 186060
curvnet = 0xbd06000
spin locks held:
db:0:kdb.enter.panic>
A commit references this bug:

Author: lwhsu
Date: Sun Feb 16 16:49:29 UTC 2020
New revision: 358003
URL: https://svnweb.freebsd.org/changeset/base/358003

Log:
  Temporarily skip sys.net.if_lagg_test.lacp_linkstate_destroy_stress on i386 CI

  It panics kernel

  PR: 244168
  Sponsored by: The FreeBSD Foundation

Changes:
  head/tests/sys/net/if_lagg_test.sh
This also happens on amd64:

https://ci.freebsd.org/job/FreeBSD-head-amd64-test/15477/consoleFull

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80c31cc5
stack pointer = 0x28:0xfffffe002c701940
frame pointer = 0x28:0xfffffe002c7019c0
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (swi6: task queue)
trap number = 12
panic: page fault
cpuid = 0
time = 1591900613
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe002c7015f0
vpanic() at vpanic+0x182/frame 0xfffffe002c701640
panic() at panic+0x43/frame 0xfffffe002c7016a0
trap_fatal() at trap_fatal+0x387/frame 0xfffffe002c701700
trap_pfault() at trap_pfault+0x99/frame 0xfffffe002c701760
trap() at trap+0x2a5/frame 0xfffffe002c701870
calltrap() at calltrap+0x8/frame 0xfffffe002c701870
--- trap 0xc, rip = 0xffffffff80c31cc5, rsp = 0xfffffe002c701940, rbp = 0xfffffe002c7019c0 ---
witness_checkorder() at witness_checkorder+0x65/frame 0xfffffe002c7019c0
_sx_xlock() at _sx_xlock+0x67/frame 0xfffffe002c701a00
lagg_port_state() at lagg_port_state+0x39/frame 0xfffffe002c701a30
do_link_state_change() at do_link_state_change+0xcb/frame 0xfffffe002c701a80
taskqueue_run_locked() at taskqueue_run_locked+0xaa/frame 0xfffffe002c701b00
taskqueue_run() at taskqueue_run+0x4d/frame 0xfffffe002c701b20
ithread_loop() at ithread_loop+0x279/frame 0xfffffe002c701bb0
fork_exit() at fork_exit+0x80/frame 0xfffffe002c701bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe002c701bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
A commit references this bug:

Author: lwhsu
Date: Thu Jun 11 18:59:57 UTC 2020
New revision: 362072
URL: https://svnweb.freebsd.org/changeset/base/362072

Log:
  Skip sys.net.if_lagg_test.lacp_linkstate_destroy_stress in CI because of panic

  PR: 244168
  Sponsored by: The FreeBSD Foundation

Changes:
  head/tests/sys/net/if_lagg_test.sh
I have a potential fix for this that I've been testing. I will link the review when I post it.
https://reviews.freebsd.org/D25284
A commit references this bug:

Author: bdrewery
Date: Thu Aug 13 22:06:27 UTC 2020
New revision: 364220
URL: https://svnweb.freebsd.org/changeset/base/364220

Log:
  lagg: Avoid adding a port to a lagg device being destroyed.

  The lagg_clone_destroy() handles detach and waiting for ifconfig callers
  to drain already.

  This narrows the race for 2 panics that the tests triggered. Both were a
  consequence of adding a port to the lagg device after it had already
  detached from all of its ports. The link state task would run after
  lagg_clone_destroy() free'd the lagg softc.

  kernel:trap_fatal+0xa4
  kernel:trap_pfault+0x61
  kernel:trap+0x316
  kernel:witness_checkorder+0x6d
  kernel:_sx_xlock+0x72
  if_lagg.ko:lagg_port_state+0x3b
  kernel:if_down+0x144
  kernel:if_detach+0x659
  if_tap.ko:tap_destroy+0x46
  kernel:if_clone_destroyif+0x1b7
  kernel:if_clone_destroy+0x8d
  kernel:ifioctl+0x29c
  kernel:kern_ioctl+0x2bd
  kernel:sys_ioctl+0x16d
  kernel:amd64_syscall+0x337

  kernel:trap_fatal+0xa4
  kernel:trap_pfault+0x61
  kernel:trap+0x316
  kernel:witness_checkorder+0x6d
  kernel:_sx_xlock+0x72
  if_lagg.ko:lagg_port_state+0x3b
  kernel:do_link_state_change+0x9b
  kernel:taskqueue_run_locked+0x10b
  kernel:taskqueue_run+0x49
  kernel:ithread_loop+0x19c
  kernel:fork_exit+0x83

  PR: 244168
  Reviewed by: markj
  MFC after: 2 weeks
  Sponsored by: Dell EMC
  Differential Revision: https://reviews.freebsd.org/D25284

Changes:
  head/sys/net/if_lagg.c
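The commit log describes a use-after-free window: a port can be added after the lagg device has detached from all of its ports, so a later link state task (lagg_port_state() taking the softc sx lock) runs against a softc that lagg_clone_destroy() has already freed. A common way to close this kind of window is to set a "being destroyed" flag under the same lock that guards port attachment, and to refuse new ports once it is set. The sketch below is a simplified user-space model of that idea, not the actual if_lagg.c change: `struct lagg_model`, its `destroying` flag, the `lagg_model_*` functions, and the ENXIO return value are all hypothetical, and a pthread mutex stands in for the softc's sx lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lagg softc; not the real struct lagg_softc. */
struct lagg_model {
	pthread_mutex_t lock;	/* models the softc sx lock */
	bool destroying;	/* set once destruction has begun */
	int nports;		/* number of attached ports */
};

static void
lagg_model_init(struct lagg_model *sc)
{
	pthread_mutex_init(&sc->lock, NULL);
	sc->destroying = false;
	sc->nports = 0;
}

/*
 * Attach a port unless destruction has already started.  The flag check
 * and the attach happen under the same lock, so a concurrent destroy
 * cannot slip in between them.
 */
static int
lagg_model_add_port(struct lagg_model *sc)
{
	int error = 0;

	pthread_mutex_lock(&sc->lock);
	if (sc->destroying)
		error = ENXIO;	/* hypothetical errno choice */
	else
		sc->nports++;
	pthread_mutex_unlock(&sc->lock);
	return (error);
}

/* Mark the device as being destroyed, then detach every port. */
static void
lagg_model_destroy(struct lagg_model *sc)
{
	pthread_mutex_lock(&sc->lock);
	sc->destroying = true;
	sc->nports = 0;		/* detach all ports */
	pthread_mutex_unlock(&sc->lock);
}

/*
 * Before the softc would be freed: no port may still be attached,
 * otherwise a queued link state task could touch freed memory.
 */
static void
lagg_model_fini(struct lagg_model *sc)
{
	assert(sc->destroying && sc->nports == 0);
	pthread_mutex_destroy(&sc->lock);
}
```

In this model, calling lagg_model_add_port() after lagg_model_destroy() fails and leaves nports at 0, so the invariant checked in lagg_model_fini() holds. As the commit log itself states, the real kernel change narrows the race rather than claiming to remove it entirely.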
I referenced this PR in my commit r364220, but note that the change only narrows the race; it does not make the tests pass 100% reliably.