Bug 255104 - FreeBSD 13.0-RELEASE panic/crash with ipfw/dummynet/divert & wlan
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: crash, needs-patch, panic, regression
Duplicates: 255295
Depends on:
Blocks:
 
Reported: 2021-04-16 00:43 UTC by Joshua Kinard
Modified: 2021-04-26 20:15 UTC
CC List: 3 users

See Also:


Attachments
Config for my CUSTOM-13_0 kernel (autogenerated dump from a crash) (2.63 KB, text/plain)
2021-04-16 00:45 UTC, Joshua Kinard
compressed crashlog (29.04 KB, application/vnd.rar)
2021-04-25 05:13 UTC, Michael Meiszl

Description Joshua Kinard 2021-04-16 00:43:02 UTC
I have upgraded my router appliance to FreeBSD 13.0-RELEASE, and when using IPFW + dummynet(4) + divert(4), I can trigger a kernel panic at random, unpredictable times.

Background on my setup:

  - Hardware is a Protectli FW6C (https://protectli.com/product/fw6c/)
    * 16GB RAM
    * KINGSTON SUV500MS120G on /dev/ada0
    * 6x Intel 82583V GbE network ports supported by em(4) [em0 to em5]
    * Custom-added Qualcomm AR9462 on ath0/wlan0

  - Custom kernel config installed in /boot/kernel.custom
    * Also a /boot/CUSTOM symlink pointing to /boot/kernel.custom
  - em0 is WAN, DHCP via dhclient(8) to my cable modem
  - em1 is LAN, connected to a Netgear switch
  - wlan0 is wireless LAN on a separate RFC1918 subnet from em1
  - Firewall setup is IPFW-based
    * Uses in-kernel NAT for em1 and wlan0 subnets
    * Uses dummynet(4) for fq_codel shaping
    * Uses divert(4) socket to route packets to Snort for inline inspection

Synopsis of what causes the crash:

  - Having Snort up and running in a tmux session
  - wlan0 is active and has a client station connected
  - ipfw divert(4) socket is active, feeding packets to Snort
  - Sending/receiving WLAN traffic will eventually cause a random panic/reboot
  - Traffic on the LAN on em1 does NOT appear to trigger a crash (but see Crash #4)

Here are samples of the crashes.  I do not have the original kernel for some of these, so I cannot generate full backtraces, but I do have several of the core dumps under /var/crash.  Let me know what is needed to help debug this.  Note: I feel that the issue highlighted in PR#255069 may be related somehow.  I also tried patch D29772 posted in PR#255041, and it had no effect.  Crash #6 occurred on this patched kernel, so I can run kgdb against it if needed.

Crash #1 (Only kgdb backtrace is available):
    #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    #1  doadump (textdump=<optimized out>) at ../../../kern/kern_shutdown.c:399
    #2  0xffffffff8074e645 in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:486
    #3  0xffffffff8074eac0 in vpanic (fmt=<optimized out>, ap=<optimized out>) at ../../../kern/kern_shutdown.c:919
    #4  0xffffffff8074e8c3 in panic (fmt=<unavailable>) at ../../../kern/kern_shutdown.c:843
    #5  0xffffffff80ad2037 in trap_fatal (frame=0xfffffe00dc46d8e0, eva=8) at ../../../amd64/amd64/trap.c:915
    #6  0xffffffff80ad2089 in trap_pfault (frame=frame@entry=0xfffffe00dc46d8e0, usermode=false, signo=<optimized out>, signo@entry=0x0, ucode=<optimized out>, ucode@entry=0x0) at ../../../amd64/amd64/trap.c:732
    #7  0xffffffff80ad1709 in trap (frame=0xfffffe00dc46d8e0) at ../../../amd64/amd64/trap.c:398
    #8  <signal handler called>
    #9  0xffffffff814f00a5 in dummynet_task () from /boot/CUSTOM/dummynet.ko
    #10 0xffffffff807aeda1 in taskqueue_run_locked (queue=0x8962c, queue@entry=0xfffff8000b02d300) at ../../../kern/subr_taskqueue.c:476
    #11 0xffffffff807b00bc in taskqueue_thread_loop (arg=<optimized out>, arg@entry=0xffffffff814fa048 <dn_tq>) at ../../../kern/subr_taskqueue.c:793
    #12 0xffffffff8070e05d in fork_exit (callout=0xffffffff807b0010 <taskqueue_thread_loop>, arg=0xffffffff814fa048 <dn_tq>, frame=0xfffffe00dc46db00) at ../../../kern/kern_fork.c:1069
    #13 <signal handler called>


Crash #2 (kgdb backtrace data unavailable):
    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address   = 0x8
    fault code              = supervisor read data, page not present
    instruction pointer     = 0x20:0xffffffff814f00a5
    stack pointer           = 0x28:0xfffffe00dc46d9a0
    frame pointer           = 0x28:0xfffffe00dc46da00
    code segment            = base rx0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 0 (dummynet)
    trap number             = 12
    panic: page fault
    cpuid = 0
    time = 1618402444
    KDB: stack backtrace:
    #0 0xffffffff8079b0b5 at kdb_backtrace+0x65
    #1 0xffffffff8074ea51 at vpanic+0x181
    #2 0xffffffff8074e8c3 at panic+0x43
    #3 0xffffffff80ad2037 at trap_fatal+0x387
    #4 0xffffffff80ad2089 at trap_pfault+0x49
    #5 0xffffffff80ad1709 at trap+0x259
    #6 0xffffffff80aaa4e8 at calltrap+0x8
    #7 0xffffffff807aeda1 at taskqueue_run_locked+0x181
    #8 0xffffffff807b00bc at taskqueue_thread_loop+0xac
    #9 0xffffffff8070e05d at fork_exit+0x7d
    #10 0xffffffff80aab4ee at fork_trampoline+0xe
    Uptime: 9m23s
    Dumping 787 out of 16144 MB: (CTRL-C to abort) ..3%..11%..21%..31%..41%..51%..61%..72%..82%..92%


Crash #3 (this happened when sending Ctrl+C to the Snort process):
    Fatal trap 12: page fault while in kernel mode
    cpuid = 0; apic id = 00
    fault virtual address   = 0x8
    fault code              = supervisor read data, page not present
    instruction pointer     = 0x20:0xffffffff807ec20c
    stack pointer           = 0x28:0xfffffe011d7d07d0
    frame pointer           = 0x28:0xfffffe011d7d0810
    code segment            = base rx0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 86334 (snort)
    trap number             = 12
    panic: page fault
    cpuid = 0
    time = 1618439898
    KDB: stack backtrace:
    #0 0xffffffff8079e8f5 at kdb_backtrace+0x65
    #1 0xffffffff80752291 at vpanic+0x181
    #2 0xffffffff80752103 at panic+0x43
    #3 0xffffffff80b05a37 at trap_fatal+0x387
    #4 0xffffffff80b05a89 at trap_pfault+0x49
    #5 0xffffffff80b05109 at trap+0x259
    #6 0xffffffff80addee8 at calltrap+0x8
    #7 0xffffffff807eaf68 at sbdestroy+0x18
    #8 0xffffffff807edd39 at sofree+0x309
    #9 0xffffffff807ee824 at soclose+0x2e4
    #10 0xffffffff806f8a91 at _fdrop+0x11
    #11 0xffffffff806fbdcb at closef+0x24b
    #12 0xffffffff806f8d92 at closefp+0x82
    #13 0xffffffff80b0621c at amd64_syscall+0x10c
    #14 0xffffffff80ade80e at fast_syscall_common+0xf8
    Uptime: 21m57s
    Dumping 786 out of 16146 MB:..3%..11%..21%..31%..41%..51%..62%..72%..82%..92%
    
    __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    55	/usr/src/sys/amd64/include/pcpu_aux.h: No such file or directory.
    (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    #1  doadump (textdump=<optimized out>) at ../../../kern/kern_shutdown.c:399
    #2  0xffffffff80751e85 in kern_reboot (howto=260)
        at ../../../kern/kern_shutdown.c:486
    #3  0xffffffff80752300 in vpanic (fmt=<optimized out>, ap=<optimized out>)
        at ../../../kern/kern_shutdown.c:919
    #4  0xffffffff80752103 in panic (fmt=<unavailable>)
        at ../../../kern/kern_shutdown.c:843
    #5  0xffffffff80b05a37 in trap_fatal (frame=0xfffffe011d7d0710, eva=8)
        at ../../../amd64/amd64/trap.c:915
    #6  0xffffffff80b05a89 in trap_pfault (frame=frame@entry=0xfffffe011d7d0710, 
        usermode=false, signo=<optimized out>, signo@entry=0x0, 
        ucode=<optimized out>, ucode@entry=0x0) at ../../../amd64/amd64/trap.c:732
    #7  0xffffffff80b05109 in trap (frame=0xfffffe011d7d0710)
        at ../../../amd64/amd64/trap.c:398
    #8  <signal handler called>
    #9  sbcut_internal (sb=sb@entry=0xfffff802fa2d68a8, len=3404)
        at ../../../kern/uipc_sockbuf.c:1491
    #10 0xffffffff807eaf68 in sbflush_internal (sb=0xfffff802fa2d68a8, 
        sb@entry=0xfffff802fa2d6760) at ../../../kern/uipc_sockbuf.c:1431
    #11 sbrelease_internal (sb=0xfffff802fa2d68a8, sb@entry=0xfffff802fa2d6760, 
        so=0xfffff802fa2d6760, so@entry=0xfffff802fa2d68a8)
        at ../../../kern/uipc_sockbuf.c:721
    #12 sbdestroy (sb=sb@entry=0xfffff802fa2d68a8, so=so@entry=0xfffff802fa2d6760)
        at ../../../kern/uipc_sockbuf.c:749
    #13 0xffffffff807edd39 in sofree (so=so@entry=0xfffff802fa2d6760)
        at ../../../kern/uipc_socket.c:1158
    #14 0xffffffff807ee824 in soclose (so=0xfffff802fa2d6760)
        at ../../../kern/uipc_socket.c:1235
    #15 0xffffffff806f8a91 in fo_close (fp=fp@entry=0xfffff80010895500, td=0xd4c, 
        td@entry=0xfffffe012053a000) at ../../../sys/file.h:377
    #16 _fdrop (fp=fp@entry=0xfffff80010895500, td=0xd4c, 
        td@entry=0xfffffe012053a000) at ../../../kern/kern_descrip.c:3510
    #17 0xffffffff806fbdcb in closef (fp=fp@entry=0xfffff80010895500, 
        td=td@entry=0xfffffe012053a000) at ../../../kern/kern_descrip.c:2828
    #18 0xffffffff806f8d92 in closefp_impl (fdp=<optimized out>, fd=4, 
        fp=0xfffff80010895500, td=0xfffffe012053a000, audit=true)
        at ../../../kern/kern_descrip.c:1271
    #19 closefp (fdp=<optimized out>, fd=4, fp=0xfffff80010895500, 
        td=0xfffffe012053a000, holdleaders=<optimized out>, audit=true)
        at ../../../kern/kern_descrip.c:1328
    #20 0xffffffff80b0621c in syscallenter (td=0xfffffe012053a000)
        at ../../../amd64/amd64/../../kern/subr_syscall.c:189
    #21 amd64_syscall (td=0xfffffe012053a000, traced=0)
        at ../../../amd64/amd64/trap.c:1156
    #22 <signal handler called>
    #23 0x000000080915b40a in ?? ()
    Backtrace stopped: Cannot access memory at address 0x7fffff4b1458


Crash #4 (based on the stack trace, this may have been caused by emX traffic):
    NOTE: I use an out-of-tree copy of em-7.7.8 from Intel upstream, modified
          to compile under FreeBSD 13.0 (the changes are trivial).
    Fatal trap 9: general protection fault while in kernel mode
    cpuid = 1; apic id = 02
    instruction pointer     = 0x20:0xffffffff8086e9dc
    stack pointer           = 0x28:0xfffffe00c5b9f840
    frame pointer           = 0x28:0xfffffe00c5b9f890
    code segment            = base rx0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 0 (em0 que)
    trap number             = 9
    panic: general protection fault
    cpuid = 1
    time = 1618440500
    KDB: stack backtrace:
    #0 0xffffffff8079e8f5 at kdb_backtrace+0x65
    #1 0xffffffff80752291 at vpanic+0x181
    #2 0xffffffff80752103 at panic+0x43
    #3 0xffffffff80b05a37 at trap_fatal+0x387
    #4 0xffffffff80b055cf at trap+0x71f
    #5 0xffffffff80addee8 at calltrap+0x8
    #6 0xffffffff8088c488 at netisr_dispatch_src+0xc8
    #7 0xffffffff8086ddd9 at ether_input+0x69
    #8 0xffffffff8086a69a at if_input+0xa
    #9 0xffffffff81b1f000 at em_rxeof+0x260
    #10 0xffffffff81b20380 at em_handle_que+0x40
    #11 0xffffffff807b25e1 at taskqueue_run_locked+0x181
    #12 0xffffffff807b38fc at taskqueue_thread_loop+0xac
    #13 0xffffffff8071189d at fork_exit+0x7d
    #14 0xffffffff80adeeee at fork_trampoline+0xe
    Uptime: 9m14s
    Dumping 819 out of 16146 MB:..2%..12%..22%..32%..42%..51%..61%..71%..81%..92%
    
    __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    55	/usr/src/sys/amd64/include/pcpu_aux.h: No such file or directory.
    (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    #1  doadump (textdump=<optimized out>) at ../../../kern/kern_shutdown.c:399
    #2  0xffffffff80751e85 in kern_reboot (howto=260)
        at ../../../kern/kern_shutdown.c:486
    #3  0xffffffff80752300 in vpanic (fmt=<optimized out>, ap=<optimized out>)
        at ../../../kern/kern_shutdown.c:919
    #4  0xffffffff80752103 in panic (fmt=<unavailable>)
        at ../../../kern/kern_shutdown.c:843
    #5  0xffffffff80b05a37 in trap_fatal (frame=0xfffffe00c5b9f780, eva=0)
        at ../../../amd64/amd64/trap.c:915
    #6  0xffffffff80b055cf in trap (frame=0xfffffe00c5b9f780)
        at ../../../amd64/amd64/trap.c:576
    #7  <signal handler called>
    #8  ether_input_internal (ifp=0x5f48844900310210, m=0xfffff8039a9e9d00)
        at ../../../net/if_ethersubr.c:524
    #9  ether_nh_input (m=0xfffff8039a9e9d00) at ../../../net/if_ethersubr.c:739
    #10 0xffffffff8088c488 in netisr_dispatch_src (proto=proto@entry=5, 
        source=<optimized out>, source@entry=0, m=m@entry=0xfffff8039a9e9d00)
        at ../../../net/netisr.c:1143
    #11 0xffffffff8088c76f in netisr_dispatch (proto=2594086144, proto@entry=5, 
        m=0x2d, m@entry=0xfffff8039a9e9d00) at ../../../net/netisr.c:1234
    #12 0xffffffff8086ddd9 in ether_input (ifp=<optimized out>, 
        m=0xfffff8039a9e9d00) at ../../../net/if_ethersubr.c:830
    #13 0xffffffff8086a69a in if_input (ifp=0xfffff8039a9e9d00, sendmp=0x0)
        at ../../../net/if.c:4391
    #14 0xffffffff81b1f000 in em_rxeof () from /boot/modules/if_em_updated.ko
    #15 0xffffffff81b20380 in em_handle_que () from /boot/modules/if_em_updated.ko
    #16 0xffffffff807b25e1 in taskqueue_run_locked (queue=0xfffff80017500200, 
        queue@entry=0xfffff80002bdfa00) at ../../../kern/subr_taskqueue.c:476
    #17 0xffffffff807b38fc in taskqueue_thread_loop (arg=<optimized out>, 
        arg@entry=0xfffffe002014e6a0) at ../../../kern/subr_taskqueue.c:793
    #18 0xffffffff8071189d in fork_exit (
        callout=0xffffffff807b3850 <taskqueue_thread_loop>, 
        arg=0xfffffe002014e6a0, frame=0xfffffe00c5b9fb00)
        at ../../../kern/kern_fork.c:1069
    #19 <signal handler called>


Crash #5:
    Fatal trap 12: page fault while in kernel mode
    cpuid = 1; apic id = 02
    fault virtual address   = 0x0
    fault code              = supervisor read data, page not present
    instruction pointer     = 0x20:0xffffffff8047ae0d
    stack pointer           = 0x28:0xfffffe001d3fc550
    frame pointer           = 0x28:0xfffffe001d3fc590
    code segment            = base rx0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 12 (swi1: netisr 1)
    trap number             = 12
    panic: page fault
    cpuid = 1
    time = 1618441084
    KDB: stack backtrace:
    #0 0xffffffff8079e8f5 at kdb_backtrace+0x65
    #1 0xffffffff80752291 at vpanic+0x181
    #2 0xffffffff80752103 at panic+0x43
    #3 0xffffffff80b05a37 at trap_fatal+0x387
    #4 0xffffffff80b05a89 at trap_pfault+0x49
    #5 0xffffffff80b05109 at trap+0x259
    #6 0xffffffff80addee8 at calltrap+0x8
    #7 0xffffffff808a73a3 at ieee80211_parent_xmitpkt+0x13
    #8 0xffffffff808b988e at ieee80211_vap_pkt_send_dest+0x25e
    #9 0xffffffff808ba606 at ieee80211_vap_transmit+0x1d6
    #10 0xffffffff8086d82b at ether_output_frame+0xab
    #11 0xffffffff8086d727 at ether_output+0x6b7
    #12 0xffffffff808eb2e9 at ip_output_send+0x109
    #13 0xffffffff808eb062 at ip_output+0x12a2
    #14 0xffffffff808e8164 at ip_forward+0x394
    #15 0xffffffff808e7d89 at ip_input+0x6c9
    #16 0xffffffff8088cc1b at swi_net+0x12b
    #17 0xffffffff80714abd at ithread_loop+0x24d
    Uptime: 3m18s
    Dumping 849 out of 16146 MB:..2%..12%..21%..31%..42%..51%..61%..72%..81%..91%
    
    __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    55	/usr/src/sys/amd64/include/pcpu_aux.h: No such file or directory.
    (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    #1  doadump (textdump=<optimized out>) at ../../../kern/kern_shutdown.c:399
    #2  0xffffffff80751e85 in kern_reboot (howto=260)
        at ../../../kern/kern_shutdown.c:486
    #3  0xffffffff80752300 in vpanic (fmt=<optimized out>, ap=<optimized out>)
        at ../../../kern/kern_shutdown.c:919
    #4  0xffffffff80752103 in panic (fmt=<unavailable>)
        at ../../../kern/kern_shutdown.c:843
    #5  0xffffffff80b05a37 in trap_fatal (frame=0xfffffe001d3fc490, eva=0)
        at ../../../amd64/amd64/trap.c:915
    #6  0xffffffff80b05a89 in trap_pfault (frame=frame@entry=0xfffffe001d3fc490, 
        usermode=false, signo=<optimized out>, signo@entry=0x0, 
        ucode=<optimized out>, ucode@entry=0x0) at ../../../amd64/amd64/trap.c:732
    #7  0xffffffff80b05109 in trap (frame=0xfffffe001d3fc490)
        at ../../../amd64/amd64/trap.c:398
    #8  <signal handler called>
    #9  ath_transmit (ic=<optimized out>, m=0xfffff801ed556200)
        at ../../../dev/ath/if_ath.c:3516
    #10 0xffffffff808a73a3 in ieee80211_parent_xmitpkt (ic=0x0, 
        ic@entry=0xfffffe00d844f000, m=m@entry=0xfffff8001e808300)
        at ../../../net80211/ieee80211_freebsd.c:717
    #11 0xffffffff808b988e in ieee80211_vap_pkt_send_dest (
        vap=vap@entry=0xfffff8001e266000, m=m@entry=0xfffff8001e808300, 
        ni=ni@entry=0xfffffe012c7b1000)
        at ../../../net80211/ieee80211_output.c:317
    #12 0xffffffff808ba606 in ieee80211_start_pkt (vap=0xfffff8001e266000, 
        m=0xfffff8001e808300) at ../../../net80211/ieee80211_output.c:474
    #13 ieee80211_vap_transmit (ifp=<optimized out>, m=<optimized out>)
        at ../../../net80211/ieee80211_output.c:534
    #14 0xffffffff8086d82b in ether_output_frame (
        ifp=ifp@entry=0xfffff8001e188000, m=0xfffffe012c7b1000)
        at ../../../net/if_ethersubr.c:511
    #15 0xffffffff8086d727 in ether_output (ifp=<optimized out>, 
        m=0xfffffe012c7b1000, dst=0xfffffe001d3fc8e0, ro=<optimized out>)
        at ../../../net/if_ethersubr.c:438
    #16 0xffffffff808eb2e9 in ip_output_send (inp=inp@entry=0x0, 
        ifp=0xfffff8001e188000, m=m@entry=0xfffff8001e808300, gw=<optimized out>, 
        gw@entry=0xfffffe001d3fc8e0, ro=<optimized out>, 
        ro@entry=0xfffffe001d3fc8c0, stamp_tag=<optimized out>)
        at ../../../netinet/ip_output.c:275
    #17 0xffffffff808eb062 in ip_output (m=m@entry=0xfffff8001e808300, 
        opt=<optimized out>, opt@entry=0x0, ro=<optimized out>, 
        ro@entry=0xfffffe001d3fc8c0, flags=flags@entry=1, imo=imo@entry=0x0, 
        inp=<optimized out>, inp@entry=0x0) at ../../../netinet/ip_output.c:812
    #18 0xffffffff808e8164 in ip_forward (m=0xfffff8001e808300, 
        srcrt=<optimized out>) at ../../../netinet/ip_input.c:1067
    #19 0xffffffff808e7d89 in ip_input (m=0x0) at ../../../netinet/ip_input.c:789
    #20 0xffffffff8088cc1b in netisr_process_workstream_proto (
        nwsp=<optimized out>, proto=1) at ../../../net/netisr.c:919
    #21 swi_net (arg=<optimized out>) at ../../../net/netisr.c:966
    #22 0xffffffff80714abd in intr_event_execute_handlers (p=<optimized out>, 
        ie=0xfffff80002826b00) at ../../../kern/kern_intr.c:1168
    #23 ithread_execute_handlers (p=<optimized out>, ie=0xfffff80002826b00)
        at ../../../kern/kern_intr.c:1181
    #24 ithread_loop (arg=arg@entry=0xfffff80002833ac0)
        at ../../../kern/kern_intr.c:1269
    #25 0xffffffff8071189d in fork_exit (
        callout=0xffffffff80714870 <ithread_loop>, arg=0xfffff80002833ac0, 
        frame=0xfffffe001d3fcb00) at ../../../kern/kern_fork.c:1069
    #26 <signal handler called>


Crash #6:
    Fatal trap 12: page fault while in kernel mode
    cpuid = 1; apic id = 02
    fault virtual address   = 0x388
    fault code              = supervisor read data, page not present
    instruction pointer     = 0x20:0xffffffff8088cc07
    stack pointer           = 0x28:0xfffffe001d3fc9c0
    frame pointer           = 0x28:0xfffffe001d3fca20
    code segment            = base rx0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 12 (swi1: netisr 1)
    trap number             = 12
    panic: page fault
    cpuid = 1
    time = 1618528473
    KDB: stack backtrace:
    #0 0xffffffff8079e8f5 at kdb_backtrace+0x65
    #1 0xffffffff80752291 at vpanic+0x181
    #2 0xffffffff80752103 at panic+0x43
    #3 0xffffffff80b05d07 at trap_fatal+0x387
    #4 0xffffffff80b05d59 at trap_pfault+0x49
    #5 0xffffffff80b053d9 at trap+0x259
    #6 0xffffffff80ade1b8 at calltrap+0x8
    #7 0xffffffff80714abd at ithread_loop+0x24d
    #8 0xffffffff8071189d at fork_exit+0x7d
    #9 0xffffffff80adf1be at fork_trampoline+0xe
    Uptime: 2m28s
    Dumping 781 out of 16146 MB:..3%..11%..21%..31%..41%..52%..62%..72%..82%..91%
    
    __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    55	/usr/src/sys/amd64/include/pcpu_aux.h: No such file or directory.
    (kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
    #1  doadump (textdump=<optimized out>) at ../../../kern/kern_shutdown.c:399
    #2  0xffffffff80751e85 in kern_reboot (howto=260)
        at ../../../kern/kern_shutdown.c:486
    #3  0xffffffff80752300 in vpanic (fmt=<optimized out>, ap=<optimized out>)
        at ../../../kern/kern_shutdown.c:919
    #4  0xffffffff80752103 in panic (fmt=<unavailable>)
        at ../../../kern/kern_shutdown.c:843
    #5  0xffffffff80b05d07 in trap_fatal (frame=0xfffffe001d3fc900, eva=904)
        at ../../../amd64/amd64/trap.c:915
    #6  0xffffffff80b05d59 in trap_pfault (frame=frame@entry=0xfffffe001d3fc900, 
        usermode=false, signo=<optimized out>, signo@entry=0x0, 
        ucode=<optimized out>, ucode@entry=0x0) at ../../../amd64/amd64/trap.c:732
    #7  0xffffffff80b053d9 in trap (frame=0xfffffe001d3fc900)
        at ../../../amd64/amd64/trap.c:398
    #8  <signal handler called>
    #9  0xffffffff8088cc07 in netisr_process_workstream_proto (
        nwsp=<optimized out>, proto=1) at ../../../net/netisr.c:918
    #10 swi_net (arg=<optimized out>) at ../../../net/netisr.c:966
    #11 0xffffffff80714abd in intr_event_execute_handlers (p=<optimized out>, 
        ie=0xfffff80002826b00) at ../../../kern/kern_intr.c:1168
    #12 ithread_execute_handlers (p=<optimized out>, ie=0xfffff80002826b00)
        at ../../../kern/kern_intr.c:1181
    #13 ithread_loop (arg=arg@entry=0xfffff80002833ac0)
        at ../../../kern/kern_intr.c:1269
    #14 0xffffffff8071189d in fork_exit (
        callout=0xffffffff80714870 <ithread_loop>, arg=0xfffff80002833ac0, 
        frame=0xfffffe001d3fcb00) at ../../../kern/kern_fork.c:1069
    #15 <signal handler called>

-----------------------------------------------------------------------

I suspect the underlying flaw is somehow tied to an interaction between divert(4), dummynet(4), and the wlan0 adapter.  Standard LAN traffic does not seem to trigger the panic, or at least not as easily.  But WLAN traffic triggers it very easily, usually within a minute or two of enabling the divert(4) rule, connecting a wireless station, and generating some wireless traffic.  I also suspect Snort is somehow applying memory pressure.  I am using the standard Talos ruleset (30-day delayed release, several months old).
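
Every page fault above reports a fault virtual address inside the first 4 KiB page: 0x8 in Crashes #2 and #3 (and eva=8 in the Crash #1 trap_fatal frame), 0x0 in Crash #5, and 0x388 in Crash #6, which kgdb prints in decimal as eva=904. An address that small usually means kernel code followed a NULL pointer and read a structure member at that offset. Crash #4 is different: its ifp value (0x5f48844900310210) is not a canonical amd64 address, which is consistent with that crash being a general protection fault rather than a page fault. The sketch below only illustrates the first-page check; the function name is_near_null is hypothetical and the 4096-byte page size is an assumption for amd64.

```sh
#!/bin/sh
# Hypothetical helper: report whether a fault virtual address lies in the
# first page of memory. PAGE_SIZE=4096 is an assumption (amd64 base pages).
PAGE_SIZE=4096

is_near_null() {
    hex=${1#0x}                      # strip the 0x prefix
    hex=${hex#"${hex%%[!0]*}"}       # strip leading zeros
    [ -n "$hex" ] || hex=0           # an all-zero address is just 0
    # More than 8 significant hex digits is far above the first page; checking
    # the width first also keeps shell arithmetic clear of 64-bit overflow.
    [ ${#hex} -le 8 ] || return 1
    [ $(( 0x$hex )) -lt "$PAGE_SIZE" ]
}

# Fault virtual addresses copied from the panic messages above.
for addr in 0x0 0x8 0x388; do
    if is_near_null "$addr"; then
        printf '%s: NULL pointer + %d bytes\n' "$addr" $(( $addr ))
    fi
done
```

For 0x388 this prints an offset of 904 bytes, the same value kgdb shows as eva in Crash #6. Instruction pointers such as 0xffffffff8088cc07 are correctly rejected by the width check.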

This is how I start Snort-2.9.17:
snort -c /usr/local/etc/snort/snort.conf -i em0 -k none -A console -Q --daq ipfw --daq-mode inline --daq-var port=8000

And this is the divert(4) rule:
ipfw add 00049 divert 8000 all from any to any via em0
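
With `--daq ipfw --daq-mode inline`, Snort reads packets from a divert(4) socket bound to the port given by `--daq-var port=`, and the ipfw `divert 8000` action hands matching packets to whatever socket is bound to that port, so the two numbers have to agree (both are 8000 here). The following is a hypothetical sketch (the function names divert_port_of_rule and daq_port_of_args are made up) that only checks that agreement, assuming the rule and the Snort command line are passed as single strings exactly as quoted above.

```sh
#!/bin/sh
# Hypothetical helpers: extract the divert port from an ipfw rule and the
# port=N value from a Snort command line so the two can be compared.

divert_port_of_rule() {
    set -f; set -- $1; set +f        # split into words without globbing
    while [ $# -gt 1 ]; do
        if [ "$1" = divert ]; then   # the word after "divert" is the port
            printf '%s\n' "$2"
            return 0
        fi
        shift
    done
    return 1                         # no divert action in this rule
}

daq_port_of_args() {
    set -f; set -- $1; set +f
    while [ $# -gt 1 ]; do
        case "$1 $2" in
            "--daq-var port="*) printf '%s\n' "${2#port=}"; return 0 ;;
        esac
        shift
    done
    return 1                         # no --daq-var port=N pair found
}

rule='ipfw add 00049 divert 8000 all from any to any via em0'
args='snort -c /usr/local/etc/snort/snort.conf -i em0 -k none -A console -Q --daq ipfw --daq-mode inline --daq-var port=8000'

if rport=$(divert_port_of_rule "$rule") &&
   dport=$(daq_port_of_args "$args") &&
   [ "$rport" = "$dport" ]; then
    echo "divert port $rport matches the Snort DAQ port"
else
    echo "divert port and Snort DAQ port differ (or one is missing)"
fi
```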

This is my NAT/dummynet configuration from the firewall:
/sbin/ipfw nat 1 config if em0 deny_in same_ports unreg_only reset
/sbin/ipfw pipe 1 config bw 294MBit/s burst 1048576        # Download pipe
/sbin/ipfw pipe 2 config bw 12MBit/s                       # Upload pipe
/sbin/ipfw sched 1 config pipe 1 type fq_codel target 5ms quantum 6000 flows 2048 interval 300 limit 15360 ecn
/sbin/ipfw sched 2 config pipe 2 type fq_codel ecn
/sbin/ipfw queue 01 config sched 2 weight 100              # Outbound TCP ACK
/sbin/ipfw queue 02 config sched 1 weight 100              # Inbound TCP ACK
/sbin/ipfw queue 03 config sched 2 weight  90              # Outbound HTTP/HTTPS/RSYNC
/sbin/ipfw queue 04 config sched 1 weight  90              # Inbound HTTP/HTTPS/RSYNC
/sbin/ipfw queue 05 config sched 2 weight  85              # Outbound DNS
/sbin/ipfw queue 06 config sched 1 weight  85              # Inbound DNS
/sbin/ipfw queue 07 config sched 2 weight  65              # Outbound Steam Client
/sbin/ipfw queue 08 config sched 1 weight  65              # Inbound Steam Client
/sbin/ipfw queue 09 config sched 2 weight  55              # Outbound IMAP/POP3/SMTP
/sbin/ipfw queue 10 config sched 1 weight  55              # Inbound IMAP/POP3/SMTP

That's about all I can think of that is relevant.  Please let me know if any additional information is needed.  The system has been rolled back to FreeBSD 12.2, but I am keeping the FreeBSD 13.0 boot environment, so I can easily reboot into 13.0 and try out any patches.
Comment 1 Joshua Kinard 2021-04-16 00:45:44 UTC
Created attachment 224146
Config for my CUSTOM-13_0 kernel (autogenerated dump from a crash)

Copied from one of the /var/crash/core.txt.* files
Comment 2 Michael Meiszl 2021-04-21 09:44:30 UTC
(In reply to Joshua Kinard from comment #1)
I just reported an error that seems similar to yours.
Here it's more than just a router, so it's harder to pin down. But basically I have also tracked it down to ipfw panicking on incoming packets.

Fatal trap 12: page fault while in kernel mode
cpuid = 4; apic id = 04
fault virtual address	= 0x388
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80d3fa67
stack pointer	        = 0x28:0xfffffe00df2feac0
frame pointer	        = 0x28:0xfffffe00df2feb20
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 12 (swi1: netisr 0)
trap number		= 12
panic: page fault
cpuid = 4
time = 1618988377
KDB: stack backtrace:
#0 0xffffffff80c57345 at kdb_backtrace+0x65
#1 0xffffffff80c09d21 at vpanic+0x181
#2 0xffffffff80c09b93 at panic+0x43
#3 0xffffffff8108b187 at trap_fatal+0x387
#4 0xffffffff8108b1df at trap_pfault+0x4f
#5 0xffffffff8108a83d at trap+0x27d
#6 0xffffffff810617a8 at calltrap+0x8
#7 0xffffffff80bcae5d at ithread_loop+0x24d
#8 0xffffffff80bc7c5e at fork_exit+0x7e
#9 0xffffffff8106282e at fork_trampoline+0xe
Uptime: 2m6s

Time to crash is random (it seems to depend on how much traffic is incoming, and since this is a central router and tunnel endpoint, it does not take long to crash).

The system stays alive with ipfw disabled, but of course that is not a valid option for this machine.

(at least we are TWO now with this problem....)
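
Michael's panic matches Crash #6 from the original description on both the fault virtual address (0x388) and the faulting thread type (swi1: netisr, inside ithread_loop). With several reporters attaching crashes, a one-line summary per report makes such matches easier to spot. This is a hypothetical helper (the name summarize_crash is made up), assuming the crashinfo(8) text reports under /var/crash contain the panic lines in the form quoted in this bug.

```sh
#!/bin/sh
# Hypothetical helper: print one summary line per crashinfo(8) report
# (e.g. /var/crash/core.txt.N). Assumes the report contains lines of the form
# "fault virtual address = ...", "current process = ..." and "panic: ...".
summarize_crash() {
    for f in "$@"; do
        awk -v file="$f" '
            # Keep only the first occurrence of each field.
            fva == ""  && /^[ \t]*fault virtual address/ { sub(/.*= */, ""); fva = $0 }
            proc == "" && /^[ \t]*current process/       { sub(/.*= */, ""); proc = $0 }
            msg == ""  && /^[ \t]*panic:/                { sub(/.*panic: */, ""); msg = $0 }
            END {
                if (fva == "") fva = "-"    # no such line, e.g. a protection fault
                printf "%s: %s | fault va %s | %s\n", file, msg, fva, proc
            }
        ' "$f"
    done
}
```

On an affected machine it could be run as `summarize_crash /var/crash/core.txt.*`. A general protection fault such as Crash #4 has no fault-address line, so that column shows `-`.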
Comment 3 Michael Meiszl 2021-04-21 15:00:50 UTC
*** Bug 255295 has been marked as a duplicate of this bug. ***
Comment 4 Michael Meiszl 2021-04-21 15:04:38 UTC
Because experimenting on a mission-critical machine is not really an option, I have set up some old hardware directly with 13.0 and am trying to reproduce the crashes. But it will take some time, because I have to find some more parts to simulate a heavy-traffic client behind this new router...
Comment 5 Joshua Kinard 2021-04-23 04:12:27 UTC
(In reply to Michael Meiszl from comment #4)

Yeah, in my case, using a divert(4) rule seemed to be part of the trigger condition, and that may be related to a similar bug reported in Bug #255164, which has a commit (652908599b6f) that addresses issues in ipfw and divert with the new unmapped mbuf feature.  I tried applying that patch, along with several others (Bug #254309, 703419774f86; Bug #255041, 9bacbf1ae243).  No dice.  I can still reproduce random kernel crashes (not always a page fault; this time it was a general protection fault).
Comment 6 Michael Meiszl 2021-04-23 04:29:44 UTC
The sad news is that my freshly installed test machine does not show any problems (yet).
So far, ipfw does not really have anything to filter on it; the packets are just passed on.
I will try to create a more challenging setup, maybe even moving the main (real) v6 tunnel to that machine. But this will interrupt all internet services in case of a crash. Maybe I had better wait for the weekend (or you might soon read "admin got killed by rioting users").

My test hardware:
i7-6400 / 32GB
NVMe 250GB
3x Realtek (reX) cards
FreeBSD 13.0 (GENERIC, installed from ISO/USB stick)

The "real" machine:
Ryzen 3400G / 32GB
NVMe 250GB / NVMe 2TB
1x Intel Pro/10Gb (ix0, ix1)
1x Realtek 1Gb (re0)
FreeBSD 12.2-RELEASE-p6 (GENERIC) works; after updating to 13.0 it crashes
Comment 7 Jack 2021-04-23 04:33:43 UTC
I'm also getting random crashes with a similar setup.

I have these options in my custom kernel:
device         if_bridge
options        LIBALIAS
options        IPFIREWALL
options        IPFIREWALL_DEFAULT_TO_ACCEPT
options        IPFIREWALL_NAT
options        IPDIVERT
options        IPSTEALTH

My ipfw rules
00101 allow ip from any to any via lo0
00102 divert 8668 ip from any to me in via igb1
00103 divert 8668 ip4 from 10.100.0.0/23 to not me out via igb1
00104 deny ip from any to any 25 via igb0
00200 deny ip from any to 127.0.0.0/8
00300 deny ip from 127.0.0.0/8 to any
65535 allow ip from any to any

rc.conf
natd_enable="YES"
natd_flags="-f /etc/natd.conf"
natd_interface="igb1"
gateway_enable="YES"
firewall_enable="YES"
firewall_type="OPEN"

/etc/natd.conf
use_sockets yes
same_ports yes
dynamic yes

I don't have debugging turned on, but ever since upgrading from 12.2 to 13.0-STABLE, it has been crashing randomly every few hours. The server is an NFS and PXE server, so it doesn't see much external traffic, only lots of internal traffic.
Comment 8 Michael Meiszl 2021-04-24 05:27:15 UTC
Updated info: I once more tried to update the "real machine", disabling ipfw during the update process.
This worked (as it did before).
Then I deleted all "table" rules, converting the more than 1000 entries into "normal" rules, and restarted ipfw.
This is still working (for some hours now; before, the machine did not survive even 5 minutes).

So I guess the problem is within ipfw's table processing.

I'll let it run for some more days; then we can say whether this is the right spot to look.
Comment 9 Michael Meiszl 2021-04-24 06:39:40 UTC
Yeah, I was too optimistic :-(
It worked, but only until I put firewall_enable="YES" into rc.conf and rebooted :-(
Soon after the reboot the machine locked up again; only the reset button helped (even the console was dead).

But, and this is really strange, it works if I set firewall_enable="NO", reboot, log in manually, and run "service ipfw onestart".
Then it runs for hours and honors all those rules.

I don't see any real difference between those two ways of starting it, except that everything else is already up and running when I start the firewall later on.

Maybe there is a strange, as yet unknown dependency in 13?
Comment 10 Michael Meiszl 2021-04-25 05:13:27 UTC
Created attachment 224413
compressed crashlog
Comment 11 Michael Meiszl 2021-04-25 05:14:09 UTC
Update: started manually, the machine ran for 24 hours like a charm!

Maybe somebody else could try this simple trick (set firewall_enable="NO" in rc.conf, reboot, log in, and run "service ipfw onestart") and check whether it works for them too?

This morning I was curious and thought I might get the same effect by delaying the firewall loading at boot time, by editing the /etc/rc.d/ipfw file. I tried "Required: LOGIN" and rebooted.
Sadly, it did not work; the machine crashed again after 2 minutes :-(

Turning it off and starting it manually as described above worked once more; it has been up and running for over an hour now (and I don't expect it to crash anymore).
But of course, this is not really an option. Usually the box runs forever unless a major update comes in with a new kernel. Then it reboots automatically, and I now always have to remember to start the firewall by hand afterwards.

I have attached the latest kernel crash listing; maybe somebody can take a look at it and find out what is happening? I have more boxes to update, but I won't start until this bug is fixed.
Comment 12 Jack 2021-04-26 19:10:03 UTC
My machine has been crashing repeatedly, every few hours. I changed net.link.ether.ipfw from 1 to 0, and it hasn't crashed in 19 hours so far. Maybe it's related to that sysctl?
Comment 13 Jack 2021-04-26 20:15:40 UTC
I retract my previous statement about net.link.ether.ipfw=0 preventing the crash; the server just mysteriously crashed again even with that setting changed.