Bug 255775 - panic with ipfw turned on at boot time
Summary: panic with ipfw turned on at boot time
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-ipfw (Nobody)
URL:
Keywords: crash, ipfilter
Depends on:
Blocks:
 
Reported: 2021-05-11 05:09 UTC by Michael Meiszl
Modified: 2021-06-19 16:51 UTC
CC List: 3 users

See Also:


Attachments
crashlog full version (32.08 KB, application/x-zip-compressed)
2021-05-11 05:11 UTC, Michael Meiszl
ruleset mam (12.30 KB, text/plain)
2021-06-19 16:45 UTC, Michael Meiszl

Description Michael Meiszl 2021-05-11 05:09:25 UTC
As suggested by Mark Johnston, I am filing this as a "new" bug because, after some tests, it does not seem to be related to #255104.

Description: the stock 13.0 kernel crashes within a few minutes if ipfw has been turned on in rc.conf.
If it is turned off in rc.conf and started manually by root later on, ipfw works flawlessly and the machine is stable for weeks!

There is no fancy setup for ipfw, no divert, no nat, just plain "deny if it comes from addr x" rules.

As I was told to, I built a kernel with the latest patches (including the one for 255104) and with INVARIANTS turned on.

The core.txt file is attached at the end; here is a brief summary:

panic: Assertion m->m_nextpkt == NULL failed at /root/src/sys/net/iflib.c:4087
cpuid = 0
time = 1620674444
KDB: stack backtrace:
#0 0xffffffff80c400e5 at kdb_backtrace+0x65
#1 0xffffffff80bf5be1 at vpanic+0x181
#2 0xffffffff80bf59b3 at panic+0x43
#3 0xffffffff80d29c5b at iflib_if_transmit+0x15b
#4 0xffffffff80d0fb9b at ether_output_frame+0xab
#5 0xffffffff80d0faa1 at ether_output+0x6b1
#6 0xffffffff80da58ef at ip_output_send+0x8f
#7 0xffffffff80da55a5 at ip_output+0x1495
#8 0xffffffff80d12350 at gif_transmit+0x2f0
#9 0xffffffff80df2b9b at ip6_forward+0x95b
#10 0xffffffff80df4414 at ip6_input+0xf04
#11 0xffffffff80d2cb11 at netisr_dispatch_src+0xb1
#12 0xffffffff80d0fd3e at ether_demux+0x17e
#13 0xffffffff80d113cc at ether_nh_input+0x40c
#14 0xffffffff80d2cb11 at netisr_dispatch_src+0xb1
#15 0xffffffff80d10231 at ether_input+0xa1
#16 0xffffffff80d28bd7 at iflib_rxeof+0xe07
#17 0xffffffff80d2274a at _task_fn_rx+0x7a
Uptime: 25s
Dumping 1160 out of 32617 MB:..2%..12%..21%..31%..42%..51%..61%..71%..82%..91%

__curthread () at /root/src/sys/amd64/include/pcpu_aux.h:55
55		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /root/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>)
    at /root/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80bf580b in kern_reboot (howto=260)
    at /root/src/sys/kern/kern_shutdown.c:486
#3  0xffffffff80bf5c50 in vpanic (fmt=<optimized out>, ap=<optimized out>)
    at /root/src/sys/kern/kern_shutdown.c:919
#4  0xffffffff80bf59b3 in panic (fmt=<unavailable>)
    at /root/src/sys/kern/kern_shutdown.c:843
#5  0xffffffff80d29c5b in iflib_if_transmit (ifp=0xfffff80003dff800, 
    m=0xfffff8005ce3ce00) at /root/src/sys/net/iflib.c:4087
#6  0xffffffff80d0fb9b in ether_output_frame (
    ifp=ifp@entry=0xfffff80003dff800, m=<unavailable>)
    at /root/src/sys/net/if_ethersubr.c:511
#7  0xffffffff80d0faa1 in ether_output (ifp=<optimized out>, 
    ifp@entry=<error reading variable: value is not available>, 
    m=<unavailable>, 
    m@entry=<error reading variable: value is not available>, 
    dst=0xfffffe003499c5a0, 
    dst@entry=<error reading variable: value is not available>, 
    ro=<optimized out>, 
    ro@entry=<error reading variable: value is not available>)
    at /root/src/sys/net/if_ethersubr.c:438
#8  0xffffffff80da58ef in ip_output_send (inp=inp@entry=0x0, 
    ifp=<unavailable>, ifp@entry=0xfffff80003dff800, 
    m=m@entry=0xfffff8005ce3ce00, gw=gw@entry=0xfffffe003499c5a0, 
    ro=<unavailable>, ro@entry=0x0, stamp_tag=<optimized out>)
    at /root/src/sys/netinet/ip_output.c:275
#9  0xffffffff80da55a5 in ip_output (m=0xfffff8005ce3ce00, m@entry=0x0, 
    opt=opt@entry=0x0, ro=<optimized out>, ro@entry=0x0, 
    flags=<optimized out>, flags@entry=0, imo=imo@entry=0x0, 
    inp=<optimized out>, inp@entry=0x0)
    at /root/src/sys/netinet/ip_output.c:812
#10 0xffffffff80d92c59 in in_gif_output (ifp=ifp@entry=0xfffff80134802000, 
    m=<optimized out>, m@entry=0xfffff8005cc87200, proto=<optimized out>, 
    ecn=<optimized out>) at /root/src/sys/netinet/in_gif.c:306
#11 0xffffffff80d12350 in gif_transmit (ifp=0xfffff80134802000, 
    m=0xfffff8005cc87200) at /root/src/sys/net/if_gif.c:380
#12 0xffffffff80df2b9b in ip6_forward (m=<unavailable>, srcrt=srcrt@entry=0)
    at /root/src/sys/netinet6/ip6_forward.c:387
#13 0xffffffff80df4414 in ip6_input (m=<unavailable>, 
    m@entry=<error reading variable: value is not available>)
    at /root/src/sys/netinet6/ip6_input.c:897
#14 0xffffffff80d2cb11 in netisr_dispatch_src (proto=6, 
    source=source@entry=0, m=0xfffff8005cc87200)
    at /root/src/sys/net/netisr.c:1143
#15 0xffffffff80d2ce5f in netisr_dispatch (proto=<unavailable>, 
    m=<unavailable>) at /root/src/sys/net/netisr.c:1234
#16 0xffffffff80d0fd3e in ether_demux (ifp=ifp@entry=0xfffff80003dff800, 
    m=<unavailable>) at /root/src/sys/net/if_ethersubr.c:923
#17 0xffffffff80d113cc in ether_input_internal (ifp=0xfffff80003dff800, 
    m=<unavailable>) at /root/src/sys/net/if_ethersubr.c:709
#18 ether_nh_input (m=<optimized out>, 
    m@entry=<error reading variable: value is not available>)
    at /root/src/sys/net/if_ethersubr.c:739
#19 0xffffffff80d2cb11 in netisr_dispatch_src (proto=proto@entry=5, 
    source=source@entry=0, m=m@entry=0xfffff8005cc87200)
    at /root/src/sys/net/netisr.c:1143
#20 0xffffffff80d2ce5f in netisr_dispatch (proto=<unavailable>, 
    proto@entry=5, m=<unavailable>, m@entry=0xfffff8005cc87200)
    at /root/src/sys/net/netisr.c:1234
#21 0xffffffff80d10231 in ether_input (ifp=0xfffff80003dff800, 
    ifp@entry=<error reading variable: value is not available>, 
    m=0xfffff8005cc87200, 
    m@entry=<error reading variable: value is not available>)
    at /root/src/sys/net/if_ethersubr.c:830
#22 0xffffffff80d28bd7 in iflib_rxeof (rxq=<optimized out>, 
    rxq@entry=0xfffff80003dcc000, budget=<optimized out>)
    at /root/src/sys/net/iflib.c:3006
#23 0xffffffff80d2274a in _task_fn_rx (context=0xfffff80003dcc000)
    at /root/src/sys/net/iflib.c:3949
#24 0xffffffff80c3ea77 in gtaskqueue_run_locked (
    queue=queue@entry=0xfffff80003988100)
    at /root/src/sys/kern/subr_gtaskqueue.c:371
#25 0xffffffff80c3e874 in gtaskqueue_thread_loop (
    arg=arg@entry=0xfffffe00379de008)
    at /root/src/sys/kern/subr_gtaskqueue.c:547
#26 0xffffffff80bb1f00 in fork_exit (
    callout=0xffffffff80c3e7e0 <gtaskqueue_thread_loop>, 
    arg=0xfffffe00379de008, frame=0xfffffe003499cc00)
    at /root/src/sys/kern/kern_fork.c:1069
#27 <signal handler called>
(kgdb)
Comment 1 Michael Meiszl 2021-05-11 05:11:22 UTC
Created attachment 224828
crashlog full version
Comment 2 Michael Meiszl 2021-05-11 05:27:50 UTC
I was wondering what the difference is between ipfw being started by rc.conf and being started manually by root afterwards.

Scrolling through the crashlog showed me that with the rc start, the tunnel interface gif0 has not yet been created and attached. But ipfw contains a lot of rules for gif0 (actually, its main job is to keep the baddies from the net out of this machine and to grant access only to certain services/machines).

Maybe this is a hint about where to look for the bad mbuf?
Comment 3 Andrey V. Elsukov freebsd_committer 2021-05-12 09:11:01 UTC
Your panic doesn't seem to be related to ipfw. The backtrace shows that your host received an IPv6 packet that was forwarded into an IP-IP tunnel. The panic was then triggered by a KASSERT in iflib because the packet's mbuf has an unexpected non-NULL m_nextpkt field.
Comment 4 Michael Meiszl 2021-05-12 09:16:29 UTC
Thanks for the analysis, but what am I supposed to do now?

This mbuf problem only occurs if ipfw is loaded BEFORE the tunnel is even created.

(Or maybe the tunnel has just been created and the first packets are coming in; I can't tell. It looks like the machine survives until the tunnel starts receiving.)
Comment 5 Andrey V. Elsukov freebsd_committer 2021-05-12 09:36:04 UTC
(In reply to Michael Meiszl from comment #4)
Can you show some output from kgdb?
I think these commands should work to obtain the needed info:

# cd /var/crash/
# kgdb -q /boot/kernel/kernel vmcore.0

f 11
p/x *m
f 8
p/x *m

You can also try patching the kernel as a workaround, as a test:

--- a/sys/netinet/ip_output.c
+++ b/sys/netinet/ip_output.c
@@ -807,6 +807,7 @@ ip_output(struct mbuf *m, struct mbuf *opt, struct route *ro, int flags,
                 * Reset layer specific mbuf flags
                 * to avoid confusing lower layers.
                 */
+               m->m_nextpkt = NULL;
                m_clrprotoflags(m);
                IP_PROBE(send, NULL, NULL, ip, ifp, ip, NULL);
                error = ip_output_send(inp, ifp, m, gw, ro,
Comment 6 Michael Meiszl 2021-05-12 11:30:54 UTC
The data you asked for:
[root@l3router ~]# cd /var/crash/
[root@l3router /var/crash]# kgdb -q /boot/kernel/kernel vmcore.0
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
__curthread () at /root/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) f 11
#11 0xffffffff80d12350 in gif_transmit (ifp=0xfffff80134802000, m=0xfffff8005cc87200) at /root/src/sys/net/if_gif.c:380
380                     error = in_gif_output(ifp, m, proto, ecn);
(kgdb) p/x *m
$1 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = {sle_next = 0x0},
    m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffff8005cc872a2, m_len = 0x14, m_type = 0x1, m_flags = 0x0, {{{m_pkthdr = {{
            snd_tag = 0xfffff80003dff800, rcvif = 0xfffff80003dff800}, tags = {slh_first = 0x0}, len = 0x50, flowid = 0x7a3d245c,
          csum_flags = 0xc000000, fibnum = 0x0, numa_domain = 0xff, rsstype = 0xbf, {rcv_tstmp = 0x0, {l2hlen = 0x0, l3hlen = 0x0,
              l4hlen = 0x0, l5hlen = 0x0, inner_l2hlen = 0x0, inner_l3hlen = 0x0, inner_l4hlen = 0x0, inner_l5hlen = 0x0}}, PH_per = {
            eight = {0x0, 0x0, 0x0, 0x0, 0x1c, 0x0, 0x0, 0x0}, sixteen = {0x0, 0x0, 0x1c, 0x0}, thirtytwo = {0x0, 0x1c}, sixtyfour = {
              0x1c00000000}, unintptr = {0x1c00000000}, ptr = 0x1c00000000}, PH_loc = {eight = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
            sixteen = {0x0, 0x0, 0x0, 0x0}, thirtytwo = {0x0, 0x0}, sixtyfour = {0x0}, unintptr = {0x0}, ptr = 0x0}}, {m_epg_npgs = 0x0,
          m_epg_nrdy = 0xf8, m_epg_hdrlen = 0xdf, m_epg_trllen = 0x3, m_epg_1st_off = 0xf800, m_epg_last_len = 0xffff, m_epg_flags = 0x0,
          m_epg_record_type = 0x0, __spare = {0x0, 0x0}, m_epg_enc_cnt = 0x0, m_epg_tls = 0x7a3d245c00000050,
          m_epg_so = 0xbfff00000c000000, m_epg_seqno = 0x0, m_epg_stailq = {stqe_next = 0x1c00000000}}}, {m_ext = {{
            ext_count = 0x209f36a0, ext_cnt = 0x26702e30209f36a0}, ext_size = 0x1c1fd705, ext_type = 0x86, ext_flags = 0x60dd, {{
              ext_buf = 0x1203f0628000000, ext_arg2 = 0xb41d0100af707004}, {extpg_pa = {0x1203f0628000000, 0xb41d0100af707004,
                0x2aa924398bd2db, 0x2f0801405014, 0xb2a1032000000000}, extpg_trail = {0x1, 0xbb, 0xbf, 0xf9, 0x9e, 0xae, 0x0, 0x0, 0x0,
                0x0, 0xa0, 0x2, 0xff, 0xff, 0xb8, 0xf1, 0x0, 0x0, 0x2, 0x4, 0x5, 0xa0, 0x4, 0x2, 0x8, 0xa, 0x1, 0x74, 0xa6, 0x8b, 0x0,
                0x0, 0x0, 0x0, 0x1, 0x3, 0x3, 0x6, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde,
                0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde}, extpg_hdr = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0,
                0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad}}},
          ext_free = 0xdeadc0dedeadc0de, ext_arg1 = 0xdeadc0dedeadc0de}, m_pktdat = 0xfffff8005cc87258}}, m_dat = 0xfffff8005cc87220}}
(kgdb) f 8
#8  0xffffffff80da58ef in ip_output_send (inp=inp@entry=0x0, ifp=<unavailable>, ifp@entry=0xfffff80003dff800,
    m=m@entry=0xfffff8005ce3ce00, gw=gw@entry=0xfffffe003499c5a0, ro=<unavailable>, ro@entry=0x0, stamp_tag=<optimized out>)
    at /root/src/sys/netinet/ip_output.c:275
275             error = (*ifp->if_output)(ifp, m, (const struct sockaddr *)gw, ro);
(kgdb) p/x *m
$2 = {{m_next = 0xdeadc0dedeadc0de, m_slist = {sle_next = 0xdeadc0dedeadc0de}, m_stailq = {stqe_next = 0xdeadc0dedeadc0de}}, {
    m_nextpkt = 0xdeadc0dedeadc0de, m_slistpkt = {sle_next = 0xdeadc0dedeadc0de}, m_stailqpkt = {stqe_next = 0xdeadc0dedeadc0de}},
  m_data = 0xdeadc0dedeadc0de, m_len = 0xdeadc0de, m_type = 0xde, m_flags = 0xdeadc0, {{{m_pkthdr = {{snd_tag = 0xdeadc0dedeadc0de,
            rcvif = 0xdeadc0dedeadc0de}, tags = {slh_first = 0xdeadc0dedeadc0de}, len = 0xdeadc0de, flowid = 0xdeadc0de,
          csum_flags = 0xdeadc0de, fibnum = 0xc0de, numa_domain = 0xad, rsstype = 0xde, {rcv_tstmp = 0xdeadc0dedeadc0de, {l2hlen = 0xde,
              l3hlen = 0xc0, l4hlen = 0xad, l5hlen = 0xde, inner_l2hlen = 0xde, inner_l3hlen = 0xc0, inner_l4hlen = 0xad,
              inner_l5hlen = 0xde}}, PH_per = {eight = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde}, sixteen = {0xc0de, 0xdead,
              0xc0de, 0xdead}, thirtytwo = {0xdeadc0de, 0xdeadc0de}, sixtyfour = {0xdeadc0dedeadc0de}, unintptr = {0xdeadc0dedeadc0de},
            ptr = 0xdeadc0dedeadc0de}, PH_loc = {eight = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde}, sixteen = {0xc0de, 0xdead,
              0xc0de, 0xdead}, thirtytwo = {0xdeadc0de, 0xdeadc0de}, sixtyfour = {0xdeadc0dedeadc0de}, unintptr = {0xdeadc0dedeadc0de},
            ptr = 0xdeadc0dedeadc0de}}, {m_epg_npgs = 0xde, m_epg_nrdy = 0xc0, m_epg_hdrlen = 0xad, m_epg_trllen = 0xde,
          m_epg_1st_off = 0xc0de, m_epg_last_len = 0xdead, m_epg_flags = 0xde, m_epg_record_type = 0xc0, __spare = {0xad, 0xde},
          m_epg_enc_cnt = 0xdeadc0de, m_epg_tls = 0xdeadc0dedeadc0de, m_epg_so = 0xdeadc0dedeadc0de, m_epg_seqno = 0xdeadc0dedeadc0de,
          m_epg_stailq = {stqe_next = 0xdeadc0dedeadc0de}}}, {m_ext = {{ext_count = 0xdeadc0de, ext_cnt = 0xdeadc0dedeadc0de},
          ext_size = 0xdeadc0de, ext_type = 0xde, ext_flags = 0xdeadc0, {{ext_buf = 0xdeadc0dedeadc0de, ext_arg2 = 0xdeadc0dedeadc0de}, {
              extpg_pa = {0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de, 0xdeadc0dedeadc0de},
              extpg_trail = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0,
                0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0,
                0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0,
                0xad, 0xde, 0xde, 0xc0, 0xad, 0xde}, extpg_hdr = {0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde,
                0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad, 0xde, 0xde, 0xc0, 0xad}}}, ext_free = 0xdeadc0dedeadc0de,
          ext_arg1 = 0xdeadc0dedeadc0de}, m_pktdat = 0xfffff8005ce3ce58}}, m_dat = 0xfffff8005ce3ce20}}
(kgdb)

I'm afraid I can't try out the kernel patch today; the machine is busy and cannot be rebooted for now (otherwise you might hear "admin got killed by upset users" in the TV news tomorrow).
I will try it as soon as possible and report back.
Comment 7 Michael Meiszl 2021-05-12 11:33:16 UTC
btw: cute :-)
I don't understand anything in the output, but "deadcode" sounds funny, even written in hex :-)))
Comment 8 Andrey V. Elsukov freebsd_committer 2021-05-12 11:43:46 UTC
(In reply to Michael Meiszl from comment #7)

I think there is no need to test the suggested patch. The problem seems to be a "use after free": the mbuf has already been freed and its memory was filled with the 0xdeadc0de pattern. This is why the KASSERT was triggered.
Comment 9 Michael Meiszl 2021-05-12 12:16:30 UTC
ok, as you wish.

But tracking down a "use after free" may be very complicated; sorry to have triggered it somehow.

If you need more kgdb output, just tell me what to run, or I can upload the whole contents of /var/crash somewhere if you want to look at more details.
Comment 10 Michael Meiszl 2021-05-12 12:28:58 UTC
Just a note: in the file you pointed me to for the patch, a few lines further down (at line 834) you find
   error = ip_fragment(ip, &m, mtu, ifp->if_hwassist);
        if (error)
                goto bad;
        for (; m; m = m0) {
                m0 = m->m_nextpkt;
                m->m_nextpkt = 0; <<<<<<<!!!!
                if (error == 0) {
                        /* Record statistics for this interface address. */
                        if (ia != NULL) {
                                counter_u64_add(ia->ia_ifa.ifa_opackets, 1);
                                counter_u64_add(ia->ia_ifa.ifa_obytes,
                                    m->m_pkthdr.len);
                        }

Although legal, "m->m_nextpkt = 0;" does not look right. I think it should rather be "m->m_nextpkt = NULL;".

but 0 is surely not 0xdeadc0de...
Comment 11 Michael Meiszl 2021-05-15 07:24:51 UTC
After running for some days (with the firewall started manually), it crashed again yesterday, but in a totally different function:
Unread portion of the kernel message buffer:
panic: Assertion stp->st_flags == 0 failed at /root/src/sys/kern/sys_generic.c:1942
cpuid = 1
time = 1620999784
KDB: stack backtrace:
#0 0xffffffff80c400e5 at kdb_backtrace+0x65
#1 0xffffffff80bf5be1 at vpanic+0x181
#2 0xffffffff80bf59b3 at panic+0x43
#3 0xffffffff80c63b20 at seltdfini+0xa0
#4 0xffffffff80bac8fa at exit1+0x49a
#5 0xffffffff80bbddda at kproc_exit+0xaa
#6 0xffffffff82b5116e at smb_iod_thread+0x37e
#7 0xffffffff80bb1f00 at fork_exit+0x80
#8 0xffffffff8105c6ae at fork_trampoline+0xe
Uptime: 2d2h27m29s

I guess this has nothing to do with the main issue, but it made me revert to the original, unpatched 13.0 kernel for now.

My current approach is to start the fw with a combination of cron and at:

CronEntry: @reboot at -f /root/startfirewall now+3min

Startfirewall script: 
#!/bin/sh
/usr/sbin/service ipfw onestart

Totally simple, but it seems to work for now. I did not notice yesterday's panic, so the whole net ran without firewall protection for almost a day. This is not acceptable. Cron plus at limits the dangerous window after a panic or reboot to 3 minutes.
Comment 12 Mark Johnston freebsd_committer 2021-06-16 13:37:26 UTC
Do you have the net.link.ether.ipfw sysctl set to 1 by any chance?
Comment 13 Michael Meiszl 2021-06-16 13:42:28 UTC
(In reply to Mark Johnston from comment #12)
No, it is 0 here.
Comment 14 Michael Meiszl 2021-06-16 14:10:55 UTC
My suggested workaround still works flawlessly. When started "later", ipfw does not crash here. The machine has now been up for weeks without any problems, and it also comes back up properly after a reboot.
So I did not investigate any further during the last few weeks.
But if you tell me to try out some new changes, I will.

Sadly, there is no L2 filtering going on here at all.
Comment 15 Mark Johnston freebsd_committer 2021-06-16 14:16:49 UTC
(In reply to Michael Meiszl from comment #14)
I think to make progress on this we'd have to look at a vmcore from one of the panics.  For this I'd also need a copy of the matching /boot/kernel and /usr/lib/debug/boot/kernel directories.
Comment 16 Michael Meiszl 2021-06-16 15:36:39 UTC
Yes, it still happens if I switch on ipfw at boot time.

Where do you want all those files to be uploaded to?
Comment 17 Mark Johnston freebsd_committer 2021-06-19 14:51:58 UTC
(In reply to Michael Meiszl from comment #16)
I don't have a good place to upload vmcores, sorry.  Google drive is used sometimes.  Reading through again, I'm not sure that a vmcore will be very useful.  I suspect your comment 2 is a good clue.  I don't quite follow though: the firewall rules reference gif0, and the rules are loaded before gif0 is created?  I would have assumed that this would not be permitted.  It may be more useful to share the exact ruleset you're using.
Comment 18 Michael Meiszl 2021-06-19 16:44:12 UTC
(In reply to Mark Johnston from comment #17)
No big problem, the rules are straightforward. See the attached rules file.
(I needed to blur out some IPs; they are not meant for the public.)

Originally I had tables 1, 2 and 3 filled by fail2ban, but I converted them to static rules in the hope that the bug was inside the table management (that did not help, sniff).

Anyway, it's a very restrictive setup. 80% of the world is locked out. For V6, only certain ports on certain hosts are offered to the outside (there is no V4 restriction on ports because that is handled by a different machine).
Comment 19 Michael Meiszl 2021-06-19 16:45:07 UTC
Created attachment 225933
ruleset mam
Comment 20 Michael Meiszl 2021-06-19 16:51:24 UTC
About the load sequence: I was also surprised when I checked it with "rcorder ...".
But it does not seem to matter. I tried changing the order to ensure that gif0 is created and up before ipfw is loaded, and it made no difference to the crashes.