I have a standard pf deployment, active/passive firewalls with carp and pfsync. Most of the firewall clusters are fine, but I have one system that repeatedly crashes when pfsync is enabled. I can reliably repeat it by running puppet agent on the system in question. I don't have this particular condition on any of my other 13.1-based systems, with very similar configurations. kgdb /usr/lib/debug/boot/kernel/kernel.debug /var/crash/vmcore.0 Fatal trap 12: page fault while in kernel mode cpuid = 0; apic id = 00 fault virtual address = 0x18 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80cadb90 stack pointer = 0x28:0xfffffe0204794bc0 frame pointer = 0x28:0xfffffe0204794c20 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (swi1: pfsync) trap number = 12 panic: page fault cpuid = 0 time = 1670433489 KDB: stack backtrace: #0 0xffffffff80c694a5 at kdb_backtrace+0x65 #1 0xffffffff80c1bb5f at vpanic+0x17f #2 0xffffffff80c1b9d3 at panic+0x43 #3 0xffffffff810afdf5 at trap_fatal+0x385 #4 0xffffffff810afe4f at trap_pfault+0x4f #5 0xffffffff810875b8 at calltrap+0x8 #6 0xffffffff80dca82f at ip_fragment+0x24f #7 0xffffffff80dca1e3 at ip_output+0x1163 #8 0xffffffff8225a851 at pfsyncintr+0x151 #9 0xffffffff80bdbcfa at ithread_loop+0x25a #10 0xffffffff80bd8a9e at fork_exit+0x7e #11 0xffffffff8108862e at fork_trampoline+0xe Uptime: 43m36s Dumping 7356 out of 130983 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
I don't know if it's relevant, but this system is a backup firewall at this time, when it panics.
I've not seen that panic before. It's indeed related to pfsync. pfsyncintr() is in pfsync's output path, and it seems to be in the process of sending a packet out, during which ip_fragment() ends up apparently dereferencing a NULL pointer. It's not immediately obvious how this'd happen. A few bit of extra information to gather: ifconfig output for pfsync0 and whatever interface it's using as syndev. (I'm especially interested in MTU mismatches, but let's get the full output.) Given that you have a core dump it'd also be good if you could run it through kgdb (kgdb /boot/kernel/kernel /var/crash/vmcore.last) and get a backtrace (bt) for starters. Hang on to the core dump, because odds are we're going to want to look at more things later.
We have quite a few core files to play with, don't worry. (kgdb) bt #0 0xffffffff80c1b91e in doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:414 #1 0xffffffff80c1b75c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:490 #2 0xffffffff80c1bbce in kproc_shutdown (arg=0x0, howto=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:970 #3 0xffffffff80c1b9d3 in vpanic (fmt=0xffffffff811b4fb9 "%s", ap=0x100) at /usr/src/sys/kern/kern_shutdown.c:865 #4 0xffffffff810afdf5 in trap_fatal (frame=0xfffffe0204794b00, eva=18446741883351288384) at /usr/src/sys/amd64/amd64/trap.c:944 #5 0xffffffff810afe4f in trap_pfault (frame=0xfffffe0204794b00, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763 #6 <signal handler called> #7 0xffffffff80cadb90 in m_copym (m=0x0, off0=<optimized out>, len=8192, wait=1) at /usr/src/sys/kern/uipc_mbuf.c:517 #8 0xffffffff80dca82f in ip_fragment (ip=<optimized out>, m_frag=0xfffffe0204794d80, mtu=<optimized out>, if_hwassist_flags=<optimized out>) at /usr/src/sys/netinet/ip_output.c:1006 #9 0xffffffff80dca1e3 in ip_output (m=0x2000, opt=<optimized out>, ro=<optimized out>, flags=<optimized out>, imo=<optimized out>, inp=<optimized out>) at /usr/src/sys/netinet/ip_output.c:819 #10 0xffffffff8225a851 in pfsyncintr () from /boot/kernel/pfsync.ko #11 0xffffffff80bdbcfa in atomic_fcmpset_long (dst=0xfffffe0204081900, src=18446735282280295824, expect=<optimized out>) at /usr/src/sys/amd64/include/atomic.h:225 #12 intr_event_execute_handlers (ie=0xfffff8011310eb00, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1116 #13 ithread_execute_handlers (ie=<optimized out>, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1181 #14 ithread_loop (arg=0xfffff80115f50e20) at /usr/src/sys/kern/kern_intr.c:1269 #15 0xffffffff80bd8a9e in fork_exit (callout=0xffffffff80bdbaa0 <ithread_loop+64>, arg=0xfffff80115f50e20, frame=0xfffffe0204794f40) at /usr/src/sys/kern/kern_fork.c:1085 #16 <signal handler called> #17 0xffffffff80ba970f in mi_startup () at /usr/src/sys/kern/init_main.c:329 Backtrace stopped: Cannot access memory at address 0x8
<backup> pfsync0: flags=41<UP,RUNNING> metric 0 mtu 1500 pfsync: syncdev: igb0 syncpeer: 10.30.30.1 maxupd: 128 defer: on syncok: 0 groups: pfsync ifconfig igb0 igb0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=4e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP> ether b8:ca:3a:65:2d:f4 inet 10.30.30.2 netmask 0xfffffff8 broadcast 10.30.30.7 media: Ethernet autoselect (1000baseT <full-duplex>) status: active nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> <primary> pfsync0: flags=41<UP,RUNNING> metric 0 mtu 1500 pfsync: syncdev: igb0 syncpeer: 10.30.30.2 maxupd: 128 defer: on groups: pfsync (currently FBSD 13.0) ifconfig igb0 igb0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 options=4e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP> ether ec:f4:bb:c8:0a:2c inet 10.30.30.1 netmask 0xfffffff8 broadcast 10.30.30.7 media: Ethernet autoselect (1000baseT <full-duplex>) status: active nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
(In reply to jjasen from comment #4) Can you confirm the version you're running (`uname -a`)? The line numbers don't quite match up for 13.1. My initial theory was that this was related to an MTU mismatch, but that appears to not be the case.
backup: crashing FreeBSD gw2 13.1-RELEASE-p3 FreeBSD 13.1-RELEASE-p3 GENERIC amd64 (just did the same thing doing an update check) primary: not crashing FreeBSD gw1 13.0-RELEASE-p4 FreeBSD 13.0-RELEASE-p4 #0: Tue Aug 24 07:33:27 UTC 2021 root@amd64-builder.daemonology.net:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
We seem to have two different crashdumps relating to pfsync. I posted the ones where pfsyncintr seems to be involved, but the ones overnight look different: Fatal trap 12: page fault while in kernel mode cpuid = 6; apic id = 0c fault virtual address = 0x388 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff826ed5e4 stack pointer = 0x28:0xfffffe017cbb1d40 frame pointer = 0x28:0xfffffe017cbb1da0 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (swi4: clock (0)) trap number = 12 panic: page fault cpuid = 6 time = 1670920384 KDB: stack backtrace: #0 0xffffffff80c694a5 at kdb_backtrace+0x65 #1 0xffffffff80c1bb5f at vpanic+0x17f #2 0xffffffff80c1b9d3 at panic+0x43 #3 0xffffffff810afdf5 at trap_fatal+0x385 #4 0xffffffff810afe4f at trap_pfault+0x4f #5 0xffffffff810875b8 at calltrap+0x8 #6 0xffffffff80c3884d at softclock_call_cc+0x13d #7 0xffffffff80c38cd9 at softclock+0x79 #8 0xffffffff80bdbcfa at ithread_loop+0x25a #9 0xffffffff80bd8a9e at fork_exit+0x7e #10 0xffffffff8108862e at fork_trampoline+0xe Uptime: 16h46m12s Dumping 8003 out of 130983 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% 0xffffffff80c1b91e in doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:414 414 /usr/src/sys/kern/kern_shutdown.c: No such file or directory. (kgdb) bt #0 0xffffffff80c1b91e in doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:414 #1 0xffffffff80c1b75c in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:490 #2 0xffffffff80c1bbce in kproc_shutdown (arg=0x0, howto=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:970 #3 0xffffffff80c1b9d3 in vpanic (fmt=0xffffffff811b4fb9 "%s", ap=0x100) at /usr/src/sys/kern/kern_shutdown.c:865 #4 0xffffffff810afdf5 in trap_fatal (frame=0xfffffe017cbb1c80, eva=18446741881073900480) at /usr/src/sys/amd64/amd64/trap.c:944 #5 0xffffffff810afe4f in trap_pfault (frame=0xfffffe017cbb1c80, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763 #6 <signal handler called> #7 0xffffffff826ed5e4 in pfsync_defer_tmo () from /boot/kernel/pfsync.ko #8 0xffffffff80c3884d in softclock_call_cc (c=0xfffff80d088882a0, cc=0xfffffe017cf393a0, direct=0) at /usr/src/sys/kern/kern_timeout.c:717 #9 0xffffffff80c38cd9 in softclock (arg=0xffffffff81cadc00 <cc_cpu>) at /usr/src/sys/kern/kern_timeout.c:812 #10 0xffffffff80bdbcfa in atomic_fcmpset_long (dst=0xfffff80104968858, src=18446741881077601184, expect=<optimized out>) at /usr/src/sys/amd64/include/atomic.h:225 #11 intr_event_execute_handlers (ie=0xfffff80104968800, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1116 #12 ithread_execute_handlers (ie=<optimized out>, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1181 #13 ithread_loop (arg=0xfffff80104895b00) at /usr/src/sys/kern/kern_intr.c:1269 #14 0xffffffff80bd8a9e in fork_exit (callout=0xffffffff80bdbaa0 <ithread_loop+64>, arg=0xfffff80104895b00, frame=0xfffffe017cbb1f40) at /usr/src/sys/kern/kern_fork.c:1085 #15 <signal handler called> #16 0xffffffff80ba970f in mi_startup () at /usr/src/sys/kern/init_main.c:329 Backtrace stopped: Cannot access memory at address 0x8
I updated another firewall from 13.1-p2 to 13.1-p3, to match the failing firewall. I was unable to replicate the failure condition so far. Disabling pfsync on the failing firewall seems to keep it stable, but has a negative impact on utility. Is there anything else I can provide in resolving this?
(In reply to jjasen from comment #8) > Is there anything else I can provide in resolving this? A way to reproduce this would be ideal. Short of that, whatever information you can extract from the crash in #7. The line number as well as the local variables would be good. It's not at all clear why it panics right now. Getting the local variables (starting with arg, *arg, pd, *pd, i.e. the order in which the code loads them, and checking what's sane and valid as we go) would be good too.
The only difference between the two 13.1-p3 firewalls is one uses ZFS root, where the other uses UFS. I don't think that's the problem. I can recreate this at will on the crashing firewall, by doing enough that it will attempt to write pfsync. I am stumped. I'll work on getting what I can out of the dump.
(In reply to jjasen from comment #7) Any of this help? If not, let me know what you'd like me to run (kgdb) frame 7 #7 0xffffffff821e85e4 in pfsync_defer_tmo () from /boot/kernel/pfsync.ko (kgdb) select-frame 0xfffffe00dd8b3da8 No frame at level 0xfffffe00dd8b3da8. (kgdb) select-frame fffffe00dd8b3da8 No symbol "fffffe00dd8b3da8" in current context. (kgdb) frame 7 #7 0xffffffff821e85e4 in pfsync_defer_tmo () from /boot/kernel/pfsync.ko (kgdb) info frame Stack level 7, frame at 0xfffffe00dd8b3db0: rip = 0xffffffff821e85e4 in pfsync_defer_tmo; saved rip = 0xffffffff80c3884d called by frame at 0xfffffe00dd8b3e50, caller of frame at 0xfffffe00dd8b3d40 Arglist at 0xfffffe00dd8b3da0, args: Locals at 0xfffffe00dd8b3da0, Previous frame's sp is 0xfffffe00dd8b3db0 Saved registers: rbp at 0xfffffe00dd8b3da0, rip at 0xfffffe00dd8b3da8
(In reply to jjasen from comment #11) No, that doesn't help. Try 'info locals' and 'print arg' in frame 7.
(In reply to Kristof Provost from comment #12) Unfortunately zilch. kgdb) info locals No symbol table info available. (kgdb) print arg No symbol "arg" in current context. What else can I try?
(In reply to jjasen from comment #13) At that point, rebuild a GENERIC-DEBUG version of the kernel and reproduce with that. Hopefully that one will have more information.
(In reply to Kristof Provost from comment #14) I'll post the additions to compile this shortly -- These pertain to to the pfsyncintr induced crash info locals et = {et_link = {tqe_next = 0x0, tqe_prev = 0xfffffe00dd853ad8}, et_td = 0xfffffe0105a4d900, et_section = {bucket = 0}, et_old_priority = 28 '\034'} sc = 0xfffff8007df94d80 saved_vnet = 0x0 c = <optimized out> m = 0xfffff8007df2b700 b = 0xfffffe0105bdc848 n = 0x0 frame 11 #11 0xffffffff82a58851 in pfsyncintr (arg=0xfffff8007df94d80) at /usr/src/sys/netpfil/pf/if_pfsync.c:2349 2349 ip_output(m, NULL, NULL, 0, NULL, NULL); frame 10 #10 0xffffffff80de8373 in ip_output (m=0xfffff8007df2b700, opt=<optimized out>, opt@entry=0x0, ro=<optimized out>, ro@entry=0x0, flags=flags@entry=0, imo=<optimized out>, imo@entry=0x0, inp=<optimized out>, inp@entry=0x0) at /usr/src/sys/netinet/ip_output.c:816 816 error = ip_fragment(ip, &m, mtu, ifp->if_hwassist); frame 9 0xffffffff80de89bf in ip_fragment (ip=<optimized out>, ip@entry=0xfffff8007df2b768, m_frag=m_frag@entry=0xfffffe0105c91d80, mtu=mtu@entry=9000, if_hwassist_flags=<optimized out>) at /usr/src/sys/netinet/ip_output.c:992 992 m->m_next = m_copym(m0, off, len, M_NOWAIT); frame 8 #8 0xffffffff80cc3a40 in m_copym (m=0x0, m@entry=0xfffff8007df2b700, off0=8268, len=8192, wait=wait@entry=1) at /usr/src/sys/kern/uipc_mbuf.c:510 510 if (off < m->m_len) frame 11 print arg print arg $1 = (void *) 0xfffff8007df94d80 frame 8 locals: info locals copyhdr = 0 off = 8188 np = <optimized out> top = <optimized out> n = <optimized out> frame 9: mhip = 0xfffff80509917268 m = 0xfffff80509917200 mhlen = <optimized out> m0 = <optimized out> len = 8192 hlen = 20 error = 0 ip_len = <optimized out> ip_off = 1600 off = 8268 nfrags = 2 mnext = <optimized out> firstlen = 56 frame 10: in_ifa_tracker = {rmp_cpuQueue = {rmq_next = 0x5, rmq_prev = 0xfffff8007df94d80}, rmp_rmlock = 0xfffff8009aa7d83c, rmp_thread = 0x0, rmp_flags = -1512796824, rmp_qentry = {le_next = 0x0, le_prev = 0xfffffe0105c91d40}} iproute = {ro_nh = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 128, ro_mtu = 0, spare = 0, ro_dst = {sa_len = 16 '\020', sa_family = 2 '\002', sa_data = '\000' <repeats 13 times>}} no_route_but_check_spd = <optimized out> ia = 0xfffff80088782480 vlan_pcp = -1 error = <optimized out> mtu = 9000 hlen = 20 ifp = <optimized out> ip = 0xfffff8007df2b768 ip_len = <optimized out> fibnum = 0 dst = 0xfffffe0105c91db0 gw = 0xfffff800b4023404 src = <optimized out> isbroadcast = <optimized out> m0 = <optimized out> ip_off = <optimized out> frame 11: et = {et_link = {tqe_next = 0x0, tqe_prev = 0xfffffe00dd853ad8}, et_td = 0xfffffe0105a4d900, et_section = {bucket = 0}, et_old_priority = 28 '\034'} sc = 0xfffff8007df94d80 saved_vnet = 0x0 c = <optimized out> m = 0xfffff8007df2b700 b = 0xfffffe0105bdc848 n = 0x0 amd64/conf file: more GENERIC-DEBUG ident GENERIC-DEBUG include GENERIC options KDB_UNATTENDED options DDB
So my current guess is that there's something wrong with the mbuf that the pfsync code produced. Presumably that it's not as long as we're expecting it to be, which causes the fragmentation code to run off the end of the mbuf chain and blow up. It's not clear to me how that'd happen, but it would be useful to experiment with the pfsync interface MTU. Try setting it to less than 4k (maybe back down to 1500) and see if the panic goes away. m_get2() allocates external storage for > MCLBYTES (4k) allocations, but that should still just work, at least according to my reading of the relevant code. It may be that I'm missing something. The above experiment should provide a nice data point for that.
(In reply to Kristof Provost from comment #16) Dropped the underlying interface and the pfsync interface to mtu 1300. Ran puppet, and that caused a pfsyncintr type crash. Rough night for this box, as it crashed a dozen times with the pfsync_defer_tmo type overnight. I'm working on getting that in a debug crash.
Just to rule out the improbable, this system has 128GB ram, where the others are 32GB, and uses zfs as its filesystem as opposed to UFS.
bt of pfsync_defer_tmo edited and snipped #8 0xffffffff82a595e4 in pfsync_defer_tmo (arg=0xfffff80480631500) at /usr/src/sys/netpfil/pf/if_pfsync.c:1823 frame 8 #8 0xffffffff82a595e4 in pfsync_defer_tmo (arg=0xfffff80480631500) at /usr/src/sys/netpfil/pf/if_pfsync.c:1823 1823 CURVNET_SET(m->m_pkthdr.rcvif->if_vnet); (kgdb) print arg $1 = (void *) 0xfffff80480631500 info frame Stack level 8, frame at 0xfffffe00dd8eadb0: rip = 0xffffffff82a595e4 in pfsync_defer_tmo (/usr/src/sys/netpfil/pf/if_pfsync.c:1823); saved rip = 0xffffffff80c4a14d called by frame at 0xfffffe00dd8eae50, caller of frame at 0xfffffe00dd8ead40 source language c. Arglist at 0xfffffe00dd8eada0, args: arg=0xfffff80480631500 Locals at 0xfffffe00dd8eada0, Previous frame's sp is 0xfffffe00dd8eadb0 Saved registers: rbx at 0xfffffe00dd8ead78, rbp at 0xfffffe00dd8eada0, r12 at 0xfffffe00dd8ead80, r13 at 0xfffffe00dd8ead88, r14 at 0xfffffe00dd8ead90, r15 at 0xfffffe00dd8ead98, rip at 0xfffffe00dd8eada8 info local et = {et_link = {tqe_next = 0x0, tqe_prev = 0xfffffe00dd84aad8}, et_td = 0xfffffe00dd709720, et_section = { bucket = 1}, et_old_priority = 40 '('} pd = 0xfffff80480631500 sc = <optimized out> st = 0xfffff80210e29750 m = 0xfffff800a5176500 b = 0xfffffe0105bcc470 saved_vnet = 0x0 Please let me know if there is anything else I can provide from a pfsync_defer_tmo crashdump.
(In reply to jjasen from comment #19) Try printing *m, m->m_pkthdr.rcvif, *m->m_pkthdr.rcvif and m->m_pkthdr.rcvif->if_vnet. That looks to be panicking in the line that sets the current vnet context, which is done based on the rcvif pointer in the mbuf, so if that one's NULL that might explain the panic at least. It's still a step away from the real bug, but it'd be some progress at least.
(In reply to Kristof Provost from comment #20) print *m $2 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, {m_nextpkt = 0x0, m_slistpkt = { sle_next = 0x0}, m_stailqpkt = {stqe_next = 0x0}}, m_data = 0xfffff800a5176568 "E", m_len = 60, m_type = 1, m_flags = 32770, {{{m_pkthdr = {{snd_tag = 0x0, rcvif = 0x0}, tags = {slh_first = 0x0}, len = 60, flowid = 0, csum_flags = 4, fibnum = 0, numa_domain = 255 '\377', rsstype = 0 '\000', {rcv_tstmp = 0, { l2hlen = 0 '\000', l3hlen = 0 '\000', l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = { eight = "\000\000\000\000\020\000\000", sixteen = {0, 0, 16, 0}, thirtytwo = {0, 16}, sixtyfour = { 68719476736}, unintptr = {68719476736}, ptr = 0x1000000000}, PH_loc = { eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = {0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}}, {m_epg_npgs = 0 '\000', m_epg_nrdy = 0 '\000', m_epg_hdrlen = 0 '\000', m_epg_trllen = 0 '\000', m_epg_1st_off = 0, m_epg_last_len = 0, m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x3c, m_epg_so = 0xff000000000004, m_epg_seqno = 0, m_epg_stailq = {stqe_next = 0x1000000000}}}, {m_ext = {{ ext_count = 939581509, ext_cnt = 0x4000003800e045}, ext_size = 2833084671, ext_type = 192, ext_flags = 1178792, {{ext_buf = 0x4000003c000045 <error: Cannot access memory at address 0x4000003c000045>, ext_arg2 = 0x3fca8c056510640}, {extpg_pa = {18014399516115013, 287290020280206912, 4838908228444246282, 2713560791, 163732743193248}, extpg_trail = "\002\004#\000\001\003\003\016\004\002\b\ny.\227\352\000\000\000\000to-112 36\n\000\000\004mon1\004nccs\004nasa\003gov\000\000\001\000\001\204\204\245\300q@\271h", extpg_hdr = "\v\000\000\000x|\247\000q@\271h\f\000\000\001\204\203\322\336q@\271"}}, ext_free = 0x0, ext_arg1 = 0x0}, m_pktdat = 0xfffff800a5176558 <incomplete sequence \340>}}, m_dat = print m->m_pkthdr.rcvif $3 = (struct ifnet *) print *m->m_pkthdr.rcvif Cannot access memory at address 0x0 print m->m_pkthdr.rcvif->if_vnet Cannot access memory at address 0x388
(In reply to jjasen from comment #21) Okay, so I think I see what's going on here. Essentially what's happening here is that we need to know what vnet we're in because we're in a callout and those don't belong to specific vnets. We got from that the mbuf's received interface, but that's not set for locally originated traffic, so we panic dereferencing a NULL pointer. Happily we also have a pointer to the pfsync interface, and we can just get the vnet pointer from there. I've not been able to meaningfully test this myself, because today there are not a lot of internets around where I'm staying. This patch is against main, but should just apply on stable/13 as well. diff --git a/sys/netpfil/pf/if_pfsync.c b/sys/netpfil/pf/if_pfsync.c index a2baf477873e..05c80ae725ea 100644 --- a/sys/netpfil/pf/if_pfsync.c +++ b/sys/netpfil/pf/if_pfsync.c @@ -1819,8 +1819,11 @@ pfsync_defer_tmo(void *arg) PFSYNC_BUCKET_LOCK_ASSERT(b); + if (sc->sc_sync_if == NULL) + return; + NET_EPOCH_ENTER(et); - CURVNET_SET(m->m_pkthdr.rcvif->if_vnet); + CURVNET_SET(sc->sc_sync_if->if_vnet); TAILQ_REMOVE(&b->b_deferrals, pd, pd_entry); b->b_deferred--; The other issue I still don't understand, but once you've tested the above patch it'd also be useful to test pfsync without defer mode. I wonder if the problem is related to that.
I turned off defer as a first step, and tested a puppet run. Another pfsyncintr crash. I'll try to get to the patch today.
(In reply to Kristof Provost from comment #22) With defer on, a patched kernel went into a pfsyncintr panic upon a puppet run. I had defer off initially, and was able to puppet, pkg and freebsd-update without catastrophe.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=fd02192c3acaefeb62db11e0c10ab36240b79ba2 commit fd02192c3acaefeb62db11e0c10ab36240b79ba2 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-01-13 03:34:20 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-01-13 19:41:25 +0000 pf: fix panic on deferred packets The pfsync_defer_tmo() callout needs to set the correct vnet before it can transmit packets. It used the rcvif in the mbuf to get this vnet, but that doesn't work for locally originated traffic. In that case the rcvif pointer is NULL, and the dereference leads to a panic. Instead use the sc_sync_if, which is always set (if pfsync is enabled, at least). PR: 268246 MFC after: 2 weeks sys/netpfil/pf/if_pfsync.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
I remain confused about the pfsyncintr() panic. Can you take another crash dump, with the above patch built-in, and the MTU set to 1500, and deferred mode disabled? Also get the same information you got in comment #15. There's a small difference in pfsyncintr() code path for deferred packets, and I want to rule out that the issue is related to that.
(In reply to Kristof Provost from comment #26) With -defer set on the pfsync interface, I can't get it to crash trying to tickle pfsyncintr. With defer toggled back on, it crashed almost immediately injecting local traffic. Do you still want those values?
(In reply to jjasen from comment #27) Yes. This is progress already, because now we know to look at the defer code paths but the other information will be useful too.
(In reply to Kristof Provost from comment #28) These help? bt #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=textdump@entry=1) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0xffffffff80c2bf41 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487 #3 0xffffffff80c2c3be in vpanic (fmt=0xffffffff811ddd9e "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920 #4 0xffffffff80c2c1c3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844 #5 0xffffffff810d5855 in trap_fatal (frame=0xfffffe020479eb00, eva=24) at /usr/src/sys/amd64/amd64/trap.c:944 #6 0xffffffff810d58af in trap_pfault (frame=0xfffffe020479eb00, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763 #7 <signal handler called> #8 0xffffffff80cc3a40 in m_copym (m=0x0, m@entry=0xfffff819c9e4a600, off0=8268, len=8192, wait=wait@entry=1) at /usr/src/sys/kern/uipc_mbuf.c:510 #9 0xffffffff80de89bf in ip_fragment (ip=<optimized out>, ip@entry=0xfffff819c9e4a668, m_frag=m_frag@entry=0xfffffe020479ed80, mtu=mtu@entry=9000, if_hwassist_flags=<optimized out>) at /usr/src/sys/netinet/ip_output.c:992 #10 0xffffffff80de8373 in ip_output (m=0xfffff819c9e4a600, opt=<optimized out>, opt@entry=0x0, ro=<optimized out>, ro@entry=0x0, flags=flags@entry=0, imo=<optimized out>, imo@entry=0x0, inp=<optimized out>, inp@entry=0x0) at /usr/src/sys/netinet/ip_output.c:816 #11 0xffffffff829d9851 in pfsyncintr (arg=0xfffff80118f98d80) at /usr/src/sys/netpfil/pf/if_pfsync.c:2353 #12 0xffffffff80beb31a in intr_event_execute_handlers (ie=0xfffff80118f74c00, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1168 #13 ithread_execute_handlers (ie=<optimized out>, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1181 #14 ithread_loop (arg=arg@entry=0xfffff80118f1de40) at /usr/src/sys/kern/kern_intr.c:1269 #15 0xffffffff80be7d1e in fork_exit (callout=0xffffffff80beb0c0 <ithread_loop>, arg=0xfffff80118f1de40, frame=0xfffffe020479ef40) at /usr/src/sys/kern/kern_fork.c:1093 #16 <signal handler called> #17 mi_startup () at /usr/src/sys/kern/init_main.c:322 Backtrace stopped: Cannot access memory at address 0x8 frame 11 #11 0xffffffff829d9851 in pfsyncintr (arg=0xfffff80118f98d80) at /usr/src/sys/netpfil/pf/if_pfsync.c:2353 2353 ip_output(m, NULL, NULL, 0, NULL, NULL); (kgdb) print arg $1 = (void *) 0xfffff80118f98d80 (kgdb) info locals et = {et_link = {tqe_next = 0x0, tqe_prev = 0xfffffe017cb5dad8}, et_td = 0xfffffe0204095900, et_section = { bucket = 0}, et_old_priority = 28 '\034'} sc = 0xfffff80118f98d80 saved_vnet = 0x0 c = <optimized out> m = 0xfffff819c9e4a600 b = 0xfffffe0212b453e0 n = 0x0 frame 10 #10 0xffffffff80de8373 in ip_output (m=0xfffff819c9e4a600, opt=<optimized out>, opt@entry=0x0, ro=<optimized out>, ro@entry=0x0, flags=flags@entry=0, imo=<optimized out>, imo@entry=0x0, inp=<optimized out>, inp@entry=0x0) at /usr/src/sys/netinet/ip_output.c:816 816 error = ip_fragment(ip, &m, mtu, ifp->if_hwassist); (kgdb) info locals in_ifa_tracker = {rmp_cpuQueue = {rmq_next = 0xc, rmq_prev = 0xfffff80118f98d80}, rmp_rmlock = 0xfffff803e7ed883c, rmp_thread = 0x0, rmp_flags = -1416921440, rmp_qentry = {le_next = 0x0, le_prev = 0xfffffe020479ed40}} iproute = {ro_nh = 0x0, ro_lle = 0x0, ro_prepend = 0x0, ro_plen = 0, ro_flags = 128, ro_mtu = 0, spare = 0, ro_dst = {sa_len = 16 '\020', sa_family = 2 '\002', sa_data = '\000' <repeats 13 times>}} no_route_but_check_spd = <optimized out> ia = 0xfffff8010514c300 vlan_pcp = -1 error = <optimized out> mtu = 9000 hlen = 20 ifp = <optimized out> ip = 0xfffff819c9e4a668 ip_len = <optimized out> fibnum = 0 dst = 0xfffffe020479edb0 gw = 0xfffff8034b0aa404 src = <optimized out> isbroadcast = <optimized out> m0 = <optimized out> ip_off = <optimized out> frame 9 #9 0xffffffff80de89bf in ip_fragment (ip=<optimized out>, ip@entry=0xfffff819c9e4a668, m_frag=m_frag@entry=0xfffffe020479ed80, mtu=mtu@entry=9000, if_hwassist_flags=<optimized out>) at /usr/src/sys/netinet/ip_output.c:992 992 m->m_next = m_copym(m0, off, len, M_NOWAIT); (kgdb) info locals mhip = 0xfffff8051c43b068 m = 0xfffff8051c43b000 mhlen = <optimized out> m0 = <optimized out> len = 8192 hlen = 20 error = 0 ip_len = <optimized out> ip_off = 1600 off = 8268 nfrags = 2 mnext = <optimized out> firstlen = 56 frame 8 #8 0xffffffff80cc3a40 in m_copym (m=0x0, m@entry=0xfffff819c9e4a600, off0=8268, len=8192, wait=wait@entry=1) at /usr/src/sys/kern/uipc_mbuf.c:510 510 if (off < m->m_len) (kgdb) info locals copyhdr = 0 off = 8188 np = <optimized out> top = <optimized out> n = <optimized out>
(In reply to jjasen from comment #29) We're looking at a deferred packet here, that's being allowed through, presumably after the pfsync peer ack'd the state. That should all just work, and I can't make it break in my test setup here. It looks like you have a setup where the packet is getting forwarded to an interface with a 9000 byte mtu, possibly after arriving on one with a 1500 byte mtu. Does that sound familiar? I think it's getting confused about the IP length somehow. Can you do a 'p *ip' and 'p/x *ip' in frame 10? That should print the IP header and may contain more clues.
(In reply to Kristof Provost from comment #30) p *ip $1 = {ip_hl = 5 '\005', ip_v = 4 '\004', ip_tos = 1 '\001', ip_len = 8011, ip_id = 42943, ip_off = 16390, ip_ttl = 32 ' ', ip_p = 1 '\001', ip_sum = 53252, ip_src = {s_addr = 1761679396}, ip_dst = {s_addr = 0}} p/x *ip $2 = {ip_hl = 0x5, ip_v = 0x4, ip_tos = 0x1, ip_len = 0x1f4b, ip_id = 0xa7bf, ip_off = 0x4006, ip_ttl = 0x20, ip_p = 0x1, ip_sum = 0xd004, ip_src = {s_addr = 0x69011824}, ip_dst = {s_addr = 0x0}} Our traffic interfaces are all 9000 MTU. Our pfsync interfaces are 1500 MTU. I am stumped as well. I have 6 more systems almost identical, so in theory this should be all of them, or none of them. The only variance is that all the others use VLANs, where this one has one physical interface in use.
(In reply to jjasen from comment #31) Do all machines use the same NIC hardware for the pfsync interfaces? It may also be interesting to experiment with LRO/TSO flags, on the off chance that they affect things. I will see if I can produce a patch to verify that the IP header length matches the mbuf length, so we can see if that's what's going on.
(In reply to Kristof Provost from comment #32) Of the other 6, yes. If they were any more identical, they'd have sequential serial numbers. -lro -tso4 -tso6 had no effect.
(In reply to jjasen from comment #33) Can you try this patch? diff --git a/sys/netpfil/pf/if_pfsync.c b/sys/netpfil/pf/if_pfsync.c index 61308a35a7e1..54824227da57 100644 --- a/sys/netpfil/pf/if_pfsync.c +++ b/sys/netpfil/pf/if_pfsync.c @@ -1734,6 +1738,7 @@ pfsync_defer(struct pf_kstate *st, struct mbuf *m) struct pfsync_softc *sc = V_pfsyncif; struct pfsync_deferral *pd; struct pfsync_bucket *b; + struct ip *ip; if (m->m_flags & (M_BCAST|M_MCAST)) return (0); @@ -1751,6 +1756,13 @@ pfsync_defer(struct pf_kstate *st, struct mbuf *m) return (0); } + ip = mtod(m, struct ip *); + if (ip->ip_v == 4) { + int len = ntohs(ip->ip_len); + if (m_length(m, NULL) != len) + printf("%s() defer len %d / %d\n", __func__, m_length(m, NULL), len); + } + PFSYNC_BUCKET_LOCK(b); PFSYNC_UNLOCK(sc); That won't fix anything, but if we're looking at an incorrect IP packet that should trigger a log at least. I can reproduce your backtrace by setting an incorrect ip_len here.
(In reply to Kristof Provost from comment #34) I have a core dump with this patch installed. What do you want out of it?
(In reply to jjasen from comment #35) Mostly just the kernel output just before the panic. I'd expect to see 'pfsync_defer(): defer len ...' lines, and those are the ones I'm interested in.
(In reply to Kristof Provost from comment #36) any of this help? #8 0xffffffff80cc3a40 in m_copym (m=0x0, m@entry=0xfffff8062624c500, off0=8268, len=8192, wait=wait@entry=1) at /usr/src/sys/kern/uipc_mbuf.c:510 510 if (off < m->m_len) frame 9 #9 0xffffffff80de89bf in ip_fragment (ip=<optimized out>, ip@entry=0xfffff8062624c568, m_frag=m_frag@entry=0xfffffe0204784d80, mtu=mtu@entry=9000, if_hwassist_flags=<optimized out>) at /usr/src/sys/netinet/ip_output.c:992 992 m->m_next = m_copym(m0, off, len, M_NOWAIT); frame 10 #10 0xffffffff80de8373 in ip_output (m=0xfffff8062624c500, opt=<optimized out>, opt@entry=0x0, ro=<optimized out>, ro@entry=0x0, flags=flags@entry=0, imo=<optimized out>, imo@entry=0x0, inp=<optimized out>, inp@entry=0x0) at /usr/src/sys/netinet/ip_output.c:816 816 error = ip_fragment(ip, &m, mtu, ifp->if_hwassist); frame 11 #11 0xffffffff82919851 in pfsyncintr (arg=0xfffff80118fbbd80) at /usr/src/sys/netpfil/pf/if_pfsync.c:2361 2361 ip_output(m, NULL, NULL, 0, NULL, NULL); (kgdb) p ip_output $1 = {int (struct mbuf *, struct mbuf *, struct route *, int, struct ip_moptions *, struct inpcb *)} 0xffffffff80de7210 <ip_output>
(In reply to jjasen from comment #37) No, that's not what I'm looking for. I need the kernel's log messages. They should be in /var/crash/msgbuf.txt
(In reply to Kristof Provost from comment #38) Sorry, no such file. Nothing in core.txt either.
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=8edf0b52c40762d64b3d4318235b58ae5d4eff58 commit 8edf0b52c40762d64b3d4318235b58ae5d4eff58 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-01-13 03:34:20 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-01-28 01:34:38 +0000 pf: fix panic on deferred packets The pfsync_defer_tmo() callout needs to set the correct vnet before it can transmit packets. It used the rcvif in the mbuf to get this vnet, but that doesn't work for locally originated traffic. In that case the rcvif pointer is NULL, and the dereference leads to a panic. Instead use the sc_sync_if, which is always set (if pfsync is enabled, at least). PR: 268246 MFC after: 2 weeks (cherry picked from commit fd02192c3acaefeb62db11e0c10ab36240b79ba2) sys/netpfil/pf/if_pfsync.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=76e1ec4e10fa9a85bb8090cdf61e54c7a19508ee commit 76e1ec4e10fa9a85bb8090cdf61e54c7a19508ee Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-01-13 03:34:20 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-01-27 03:16:43 +0000 pf: fix panic on deferred packets The pfsync_defer_tmo() callout needs to set the correct vnet before it can transmit packets. It used the rcvif in the mbuf to get this vnet, but that doesn't work for locally originated traffic. In that case the rcvif pointer is NULL, and the dereference leads to a panic. Instead use the sc_sync_if, which is always set (if pfsync is enabled, at least). PR: 268246 MFC after: 2 weeks (cherry picked from commit fd02192c3acaefeb62db11e0c10ab36240b79ba2) sys/netpfil/pf/if_pfsync.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
(In reply to jjasen from comment #39) Let's be a bit less subtle then: diff --git a/sys/netpfil/pf/if_pfsync.c b/sys/netpfil/pf/if_pfsync.c index 61308a35a7e1..d0bc699e4d29 100644 --- a/sys/netpfil/pf/if_pfsync.c +++ b/sys/netpfil/pf/if_pfsync.c @@ -1734,6 +1738,7 @@ pfsync_defer(struct pf_kstate *st, struct mbuf *m) struct pfsync_softc *sc = V_pfsyncif; struct pfsync_deferral *pd; struct pfsync_bucket *b; + struct ip *ip; if (m->m_flags & (M_BCAST|M_MCAST)) return (0); @@ -1751,6 +1756,13 @@ pfsync_defer(struct pf_kstate *st, struct mbuf *m) return (0); } + ip = mtod(m, struct ip *); + if (ip->ip_v == 4) { + int len = ntohs(ip->ip_len); + if (m_length(m, NULL) != len) + panic("Incorrect ip_len %d != m_length %d", len, m_length(m, NULL)); + } + PFSYNC_BUCKET_LOCK(b); PFSYNC_UNLOCK(sc); If that panics we should have a pretty good idea of how we can end up in that situation. If it doesn't we have another mystery.
(In reply to Kristof Provost from comment #42) bt #0 __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 #1 doadump (textdump=textdump@entry=1) at /usr/src/sys/kern/kern_shutdown.c:399 #2 0xffffffff80c2bf41 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:487 #3 0xffffffff80c2c3be in vpanic (fmt=0xffffffff811ddd9e "%s", ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920 #4 0xffffffff80c2c1c3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:844 #5 0xffffffff810d5855 in trap_fatal (frame=0xfffffe0204794b00, eva=24) at /usr/src/sys/amd64/amd64/trap.c:944 #6 0xffffffff810d58af in trap_pfault (frame=0xfffffe0204794b00, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /usr/src/sys/amd64/amd64/trap.c:763 #7 <signal handler called> #8 0xffffffff80cc3a40 in m_copym (m=0x0, m@entry=0xfffff8049ee49a00, off0=8268, len=8192, wait=wait@entry=1) at /usr/src/sys/kern/uipc_mbuf.c:510 #9 0xffffffff80de89bf in ip_fragment (ip=<optimized out>, ip@entry=0xfffff8049ee49a68, m_frag=m_frag@entry=0xfffffe0204794d80, mtu=mtu@entry=9000, if_hwassist_flags=<optimized out>) at /usr/src/sys/netinet/ip_output.c:992 #10 0xffffffff80de8373 in ip_output (m=0xfffff8049ee49a00, opt=<optimized out>, opt@entry=0x0, ro=<optimized out>, ro@entry=0x0, flags=flags@entry=0, imo=<optimized out>, imo@entry=0x0, inp=<optimized out>, inp@entry=0x0) at /usr/src/sys/netinet/ip_output.c:816 #11 0xffffffff82a58851 in pfsyncintr (arg=0xfffff80118fb0d80) at /usr/src/sys/netpfil/pf/if_pfsync.c:2361 #12 0xffffffff80beb31a in intr_event_execute_handlers (ie=0xfffff80116107b00, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1168 #13 ithread_execute_handlers (ie=<optimized out>, p=<optimized out>) at /usr/src/sys/kern/kern_intr.c:1181 #14 ithread_loop (arg=arg@entry=0xfffff80118f4ae60) at /usr/src/sys/kern/kern_intr.c:1269 #15 0xffffffff80be7d1e in fork_exit (callout=0xffffffff80beb0c0 <ithread_loop>, arg=0xfffff80118f4ae60, frame=0xfffffe0204794f40) at /usr/src/sys/kern/kern_fork.c:1093 #16 <signal handler called> #17 mi_startup () at /usr/src/sys/kern/init_main.c:322 Backtrace stopped: Cannot access memory at address 0x8 frame 8 #8 0xffffffff80cc3a40 in m_copym (m=0x0, m@entry=0xfffff8049ee49a00, off0=8268, len=8192, wait=wait@entry=1) at /usr/src/sys/kern/uipc_mbuf.c:510 510 if (off < m->m_len) What else can I give you?
(In reply to jjasen from comment #43) That .. that doesn't make a whole lot of sense. With that patch the system should have panicked before that point. We don't modify mbufs while they're deferred, so if it's bad coming out it had to have been bad going in. Yet we didn't panic on insertion. So either the bad mbuf took the other path, for which I've extended the test patch now: diff --git a/sys/netpfil/pf/if_pfsync.c b/sys/netpfil/pf/if_pfsync.c index 61308a35a7e1..ae913185ec8a 100644 --- a/sys/netpfil/pf/if_pfsync.c +++ b/sys/netpfil/pf/if_pfsync.c @@ -604,7 +604,11 @@ pfsync_state_import(struct pfsync_state *sp, int flags) if (!(flags & PFSYNC_SI_IOCTL)) { st->state_flags &= ~PFSTATE_NOSYNC; if (st->state_flags & PFSTATE_ACK) { + struct pfsync_bucket *b = pfsync_get_bucket(sc, st); + PFSYNC_BUCKET_LOCK(b); pfsync_q_ins(st, PFSYNC_S_IACK, true); + PFSYNC_BUCKET_UNLOCK(b); + pfsync_push_all(sc); } } @@ -1690,6 +1694,17 @@ pfsync_sendout(int schedswi, int c) if_inc_counter(sc->sc_ifp, IFCOUNTER_OBYTES, m->m_pkthdr.len); b->b_len = PFSYNC_MINPKT; + { + struct ip *ip; + + ip = mtod(m, struct ip *); + if (ip->ip_v == 4) { + int len = ntohs(ip->ip_len); + if (m_length(m, NULL) != len) + panic("Incorrect ip_len %d != m_length %d", len, m_length(m, NULL)); + } + } + if (!_IF_QFULL(&b->b_snd)) _IF_ENQUEUE(&b->b_snd, m); else { @@ -1734,6 +1749,7 @@ pfsync_defer(struct pf_kstate *st, struct mbuf *m) struct pfsync_softc *sc = V_pfsyncif; struct pfsync_deferral *pd; struct pfsync_bucket *b; + struct ip *ip; if (m->m_flags & (M_BCAST|M_MCAST)) return (0); @@ -1751,6 +1767,13 @@ pfsync_defer(struct pf_kstate *st, struct mbuf *m) return (0); } + ip = mtod(m, struct ip *); + if (ip->ip_v == 4) { + int len = ntohs(ip->ip_len); + if (m_length(m, NULL) != len) + panic("Incorrect ip_len %d != m_length %d", len, m_length(m, NULL)); + } + PFSYNC_BUCKET_LOCK(b); PFSYNC_UNLOCK(sc); Or ... something else is wrong with that particular machine. Have you run a memory test on it?
(In reply to Kristof Provost from comment #44) I ran memory tests, disk tests, and even completely rebuilt the machine before opening a bug report. I'll drop this patch in today, and get back to you.
am seeing this panic verbatim on 13.1-RELEASE-p3 (backup) with 12.3-RELEASE-p1 primary. Both instances running as xen VMs, xn driver.
primary upgraded to 13.1-RELEASE-p3 and now it's crashing too, within seconds of enabling pfsync (i.e. panic with kernel uptime 25s) rc config variables: -- ifconfig_xn3="mtu 1504 up" ifconfig_vlan140="inet 10.x.y.1/24 vlan 140 vlandev xn3 up" pf_enable="YES" pflog_enable="YES" pfsync_enable="YES" pfsync_syncdev="vlan140" pfsync_syncpeer="10.x.y.2" pfsync_ifconfig="defer" -- the vlan140 mtu is 1500 on both devices. vlan140: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
(In reply to Nick Hilliard from comment #47) Do you have other non-1500 byte MTU interfaces on that machine? I'm still unable to reproduce this panic. We do know it's related to defer mode (which you have, so that much matches at least), but I wonder if there's an MTU factor or not.
> Do you have other non-1500 byte MTU interfaces on that machine? not in service, no. xn4 is up and also has mtu 1504, but there is nothing configured on the interface (i.e. no vlans, no ip addresses, no traffic) # ifconfig | egrep 'mtu|parent' lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384 xn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 xn1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 xn2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 xn3: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1504 xn4: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1504 vlan100: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500 vlan: 100 vlanproto: 802.1q vlanpcp: 0 parent interface: xn3 vlan110: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500 vlan: 110 vlanproto: 802.1q vlanpcp: 0 parent interface: xn3 vlan120: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500 vlan: 120 vlanproto: 802.1q vlanpcp: 0 parent interface: xn3 vlan130: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500 vlan: 130 vlanproto: 802.1q vlanpcp: 0 parent interface: xn3 vlan140: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500 vlan: 140 vlanproto: 802.1q vlanpcp: 0 parent interface: xn3 pflog0: flags=141<UP,RUNNING,PROMISC> metric 0 mtu 33160 #
kernel panic hot off the presses db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe020479e980 vpanic() at vpanic+0x17f/frame 0xfffffe020479e9d0 panic() at panic+0x43/frame 0xfffffe020479ea30 trap_fatal() at trap_fatal+0x385/frame 0xfffffe020479ea90 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe020479eaf0 calltrap() at calltrap+0x8/frame 0xfffffe020479eaf0 --- trap 0xc, rip = 0xffffffff80cc3a40, rsp = 0xfffffe020479ebc0, rbp = 0xfffffe020479ec20 --- m_copym() at m_copym+0x30/frame 0xfffffe020479ec20 ip_fragment() at ip_fragment+0x24f/frame 0xfffffe020479ecc0 ip_output() at ip_output+0x1163/frame 0xfffffe020479edf0 pfsyncintr() at pfsyncintr+0x151/frame 0xfffffe020479ee60 ithread_loop() at ithread_loop+0x25a/frame 0xfffffe020479eef0 fork_exit() at fork_exit+0x7e/frame 0xfffffe020479ef30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe020479ef30 --- trap 0, rip = 0xffffffff80bb7bcf, rsp = 0, rbp = 0 --- mi_startup() at mi_startup+0xdf Uptime: 2h11m9s Dumping 7420 out of 130981 MB:..1%..11%..21%..31%..41%..51%..61% What do you need from the dump?
(In reply to jjasen from comment #50) That's still the same backtrace. If that's running with the patch from comment #47 I am perplexed. That would mean the mbuf is fine going into the queue, and broken coming out of it. That makes no sense at all.
This is new: __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55 55 __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
But now we at least have someone else potentially seeing the same thing. It's a relief to not be the only crazy one. How can I continue to help?
(In reply to jjasen from comment #53) I really need to be able to reproduce this, or at least poke at a machine that can trigger this somewhat reliably. Until then we're stuck doing test-kernels with added assertions. I've extended the patch to also check the mbuf when it comes out of the queue again. I'm kind of hoping it'll look good then, because then we might be looking at a bug in ip_fragment() rather than corruption while it's queued. I've uploaded the patch here, because it's getting a bit unwieldy for the comment field. https://people.freebsd.org/~kp/0001-pfsync-debug-hacks-for-PR268246.patch
I've pasted the output for #comment54 into http://p.ip.fi/FE7p
btw which source version is that patch committed to? I manually patched it against releng/13.1. # git branch -l main * releng/13.1 #
(In reply to Nick Hilliard from comment #55) That one's already fixed. See Comment #22 or https://cgit.freebsd.org/src/commit/?id=fd02192c3acaefeb62db11e0c10ab36240b79ba2 The debug patch is against main, but the code is basically identical in stable/13 or releng/13.1 and the patch will just work there too.
> That one's already fixed. See Comment #22 or https://cgit.freebsd.org/src/commit/?id=fd02192c3acaefeb62db11e0c10ab36240b79ba2 ok, i've committed that and enabled pfsync. Will let you know if it still crashes.
(In reply to Nick Hilliard from comment #58) I've found outbound traffic from the secondary firewall will trigger the pfsyncintr issue, versus the pfsync_defer_tmo issue you posted. You may want to try, depending on your environment, a puppet agent run on your backup firewall, a remote backup, scp'ing a crash dump to another machine, a 'pkg update', a freebsd-install fetch, and/or other remote connections forcing a pf state matching entry forcing a pfsync from the backup. Hope this helps!
(In reply to Kristof Provost from comment #57) no panics since that patch, so that seems to have resolved things. I'm not seeing jjasen@'s panic.
(In reply to Kristof Provost from comment #54) Compiling now. Also upgraded to STABLE src versus RELEASE for this one.
fresh from the dump -- What do you need from it? Fatal trap 12: page fault while in kernel mode cpuid = 10; apic id = 24 fault virtual address = 0x18 fault code = supervisor read data, page not present instruction pointer = 0x20:0xffffffff80cd15d0 stack pointer = 0x28:0xfffffe020478fbc0 frame pointer = 0x28:0xfffffe020478fc20 code segment = base rx0, limit 0xfffff, type 0x1b = DPL 0, pres 1, long 1, def32 0, gran 1 processor eflags = interrupt enabled, resume, IOPL = 0 current process = 12 (swi1: pfsync) trap number = 12 panic: page fault cpuid = 10 time = 1675713418 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe020478f980 vpanic() at vpanic+0x151/frame 0xfffffe020478f9d0 panic() at panic+0x43/frame 0xfffffe020478fa30 trap_fatal() at trap_fatal+0x385/frame 0xfffffe020478fa90 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe020478faf0 calltrap() at calltrap+0x8/frame 0xfffffe020478faf0 --- trap 0xc, rip = 0xffffffff80cd15d0, rsp = 0xfffffe020478fbc0, rbp = 0xfffffe020478fc20 --- m_copym() at m_copym+0x30/frame 0xfffffe020478fc20 ip_fragment() at ip_fragment+0x24f/frame 0xfffffe020478fcc0 ip_output() at ip_output+0x1152/frame 0xfffffe020478fdf0 pfsyncintr() at pfsyncintr+0x151/frame 0xfffffe020478fe60 ithread_loop() at ithread_loop+0x25a/frame 0xfffffe020478fef0 fork_exit() at fork_exit+0x7e/frame 0xfffffe020478ff30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe020478ff30 --- trap 0x82c4efe0, rip = 0xffffffff80bc2cbf, rsp = 0, rbp = 0 --- mi_startup() at mi_startup+0xdf Uptime: 1m58s
(In reply to jjasen from comment #62) So that's still the same backtrace, and with the patch from comment #54 we can assume that the mbuf length and IP length actually match. 'info locals' in ip_fragment() would be a good start, as would prints of 'p *ip' and 'p/x *ip', much like we've done before. That'll probably look the same, but let's confirm anyway. That now looks more like a bug in ip_fragment() rather than any of the pfsync code, but let's see where the data takes us.
(In reply to Kristof Provost from comment #63) Depressing. (kgdb) frame 11 #11 0xffffffff80dfe81f in ip_fragment (ip=<optimized out>, ip@entry=0xfffff802d9447a68, m_frag=m_frag@entry=0xfffffe020478fd80, mtu=mtu@entry=9000, if_hwassist_flags=<optimized out>) at /root/usr/src/sys/netinet/ip_output.c:992 992 m->m_next = m_copym(m0, off, len, M_NOWAIT); (kgdb) info locals mhip = 0xfffff8036055c468 m = 0xfffff8036055c400 mhlen = <optimized out> m0 = <optimized out> len = 8192 hlen = 20 error = 0 ip_len = <optimized out> ip_off = 1600 off = 8268 nfrags = 2 mnext = <optimized out> firstlen = 56 (kgdb) p *ip value has been optimized out (kgdb) p/x *ip value has been optimized out
(In reply to jjasen from comment #64) Try 'p *mhip' and 'p/x *mhip'.
(In reply to Kristof Provost from comment #65) a -- I went to explore frame 12 a bit: frame 12 #12 0xffffffff80dfe1d2 in ip_output (m=0xfffff802d9447a00, opt=<optimized out>, opt@entry=0x0, ro=<optimized out>, ro@entry=0x0, flags=flags@entry=0, imo=<optimized out>, imo@entry=0x0, inp=<optimized out>, inp@entry=0x0) at /root/usr/src/sys/netinet/ip_output.c:816 816 error = ip_fragment(ip, &m, mtu, ifp->if_hwassist); (kgdb) p ip $5 = (struct ip *) 0xfffff802d9447a68 (kgdb) p/x *ip $6 = {ip_hl = 0x5, ip_v = 0x4, ip_tos = 0xd, ip_len = 0x9887, ip_id = 0xfb57, ip_off = 0x4006, ip_ttl = 0x20, ip_p = 0x1, ip_sum = 0xd004, ip_src = {s_addr = 0x69011824}, ip_dst = { s_addr = 0x0}} b -- back to frame 11, per request: (kgdb) frame 11 #11 0xffffffff80dfe81f in ip_fragment (ip=<optimized out>, ip@entry=0xfffff802d9447a68, m_frag=m_frag@entry=0xfffffe020478fd80, mtu=mtu@entry=9000, if_hwassist_flags=<optimized out>) at /root/usr/src/sys/netinet/ip_output.c:992 992 m->m_next = m_copym(m0, off, len, M_NOWAIT); (kgdb) p *mhip $7 = {ip_hl = 5 '\005', ip_v = 4 '\004', ip_tos = 13 '\r', ip_len = 5152, ip_id = 64343, ip_off = 10823, ip_ttl = 32 ' ', ip_p = 1 '\001', ip_sum = 53252, ip_src = {s_addr = 1761679396}, ip_dst = {s_addr = 0}} (kgdb) p/x *mhip $8 = {ip_hl = 0x5, ip_v = 0x4, ip_tos = 0xd, ip_len = 0x1420, ip_id = 0xfb57, ip_off = 0x2a47, ip_ttl = 0x20, ip_p = 0x1, ip_sum = 0xd004, ip_src = {s_addr = 0x69011824}, ip_dst = { s_addr = 0x0}}
That destination address is a bit odd... Do you have IPv6 traffic? pfsync always uses ip_output() for deferred traffic, which might perhaps explain the panic if we're shoving IPv6 packets through that. So let's try to skip IPv6 for now: diff --git a/sys/netpfil/pf/if_pfsync.c b/sys/netpfil/pf/if_pfsync.c index 47c3217f399c..a9e6988ff7af 100644 --- a/sys/netpfil/pf/if_pfsync.c +++ b/sys/netpfil/pf/if_pfsync.c @@ -2345,6 +2345,15 @@ pfsyncintr(void *arg) n = m->m_nextpkt; m->m_nextpkt = NULL; + { + struct ip *ip = mtod(m, struct ip *); + if (ip->ip_v != IPVERSION) { + printf("%s() skipping !IPv4 traffic\n", __func__); + m_freem(m); + continue; + } + } + /* * We distinguish between a deferral packet and our * own pfsync packet based on M_SKIP_FIREWALL Obviously that's not a fix, but if that stops the panic at least we'll know and we can work on a real fix later.
Yes, it's crashing on ip6 traffic. 2023-02-09T12:56:35.997491-05:00 extgw2 kernel: pfsyncintr() skipping !IPv4 traffic 2023-02-09T12:56:38.320484-05:00 extgw2 kernel: pfsyncintr() skipping !IPv4 traffic 2023-02-09T12:56:42.084492-05:00 extgw2 kernel: pfsyncintr() skipping !IPv4 traffic 2023-02-09T12:57:51.981479-05:00 extgw2 kernel: pfsyncintr() skipping !IPv4 traffic 2023-02-09T12:57:53.455473-05:00 extgw2 kernel: pfsyncintr() skipping !IPv4 traffic 2023-02-09T12:57:54.509227-05:00 extgw2 kernel: pfsyncintr() skipping !IPv4 traffic 2023-02-09T12:57:58.280231-05:00 extgw2 kernel: pfsyncintr() skipping !IPv4 traffic
OK, that's weird .... I just got nailed by a pfsync_defer_tmo while I was typing this up. It should be fixed in 13.1-STABLE source? db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe017cbedaf0 vpanic() at vpanic+0x151/frame 0xfffffe017cbedb40 panic() at panic+0x43/frame 0xfffffe017cbedba0 trap_fatal() at trap_fatal+0x385/frame 0xfffffe017cbedc00 trap_pfault() at trap_pfault+0x4f/frame 0xfffffe017cbedc60 calltrap() at calltrap+0x8/frame 0xfffffe017cbedc60 --- trap 0xc, rip = 0xffffffff8235c624, rsp = 0xfffffe017cbedd30, rbp = 0xfffffe0 --- pfsync_defer_tmo() at pfsync_defer_tmo+0xa4/frame 0xfffffe017cbedd90 softclock_call_cc() at softclock_call_cc+0x133/frame 0xfffffe017cbede40 softclock() at softclock+0x79/frame 0xfffffe017cbede60 ithread_loop() at ithread_loop+0x25a/frame 0xfffffe017cbedef0 fork_exit() at fork_exit+0x7e/frame 0xfffffe017cbedf30 fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe017cbedf30 --- trap 0x82c76fe0, rip = 0xffffffff80bc2cbf, rsp = 0, rbp = 0 --- mi_startup() at mi_startup+0xdf
(In reply to jjasen from comment #68) Okay, excellent. I think we've figured out what's going on here. I also think I see why I can't reproduce it yet. It's interpreting the IPv6 header as an IPv4 header (and we didn't notice so far, because the ip_fragment() code overwrites the version field in the header), and it's reading the flow label as length. We're defaulting to a flow label of 0, and that's putting us in a code path that avoids ip_fragment(). (The "If small enough for interface" case in ip_output()) I'm having an argument with some of the test tools to try to get a flow label set so I can make it panic locally. It should be relatively straightforward to fix this, but I'm going to be on poor connectivity for a few days, so it won't be for tomorrow. The pfsync_defer_tmo fix is in stable/13 and will be part of 13.2, but it's not in releng/13.1 (and never will be).
(In reply to Kristof Provost from comment #70) Yeah, according to the EOL schedule, 13.1 has maybe six months to live. Thanks for reminding me.
I'm still failing to reproduce, but this should be close to a real fix: diff --git a/sys/netpfil/pf/if_pfsync.c b/sys/netpfil/pf/if_pfsync.c index 47c3217f399c..4ebd304b1c13 100644 --- a/sys/netpfil/pf/if_pfsync.c +++ b/sys/netpfil/pf/if_pfsync.c @@ -102,6 +102,9 @@ __FBSDID("$FreeBSD$"); #include <netinet/tcp_fsm.h> #include <netinet/tcp_seq.h> +#include <netinet/ip6.h> +#include <netinet6/ip6_var.h> + #define PFSYNC_MINPKT ( \ sizeof(struct ip) + \ sizeof(struct pfsync_header) + \ @@ -2325,7 +2328,8 @@ pfsyncintr(void *arg) struct pfsync_softc *sc = arg; struct pfsync_bucket *b; struct mbuf *m, *n; - int c; + struct ip *ip; + int c, error; NET_EPOCH_ENTER(et); CURVNET_SET(sc->sc_ifp->if_vnet); @@ -2345,15 +2349,26 @@ pfsyncintr(void *arg) n = m->m_nextpkt; m->m_nextpkt = NULL; + ip = mtod(m, struct ip *); + /* * We distinguish between a deferral packet and our * own pfsync packet based on M_SKIP_FIREWALL * flag. This is XXX. */ - if (m->m_flags & M_SKIP_FIREWALL) - ip_output(m, NULL, NULL, 0, NULL, NULL); - else if (ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, - NULL) == 0) + if (m->m_flags & M_SKIP_FIREWALL) { + if (ip->ip_v == IPVERSION) + error = ip_output(m, NULL, NULL, 0, NULL, NULL); + else + error = ip6_output(m, NULL, NULL, 0, NULL, NULL, NULL); + } else { + if (ip->ip_v == IPVERSION) + error = ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, + NULL); + else + error = ENOTSUP; // When we add pfsync over IPv6 + } + if (error == 0) V_pfsyncstats.pfsyncs_opackets++; else V_pfsyncstats.pfsyncs_oerrors++;
(In reply to Kristof Provost from comment #72) I speculate cooking up a case where an IPv4 attempt from the firewall fails, it reverts to IPv6, and attempts to insert that state. I am able to reproduce via puppet runs, which invoke GPG key refreshes -- and they sporadically fail over IPv4 and try IPv6, which triggers the panic. I was previously able to reproduce via pkg commands as well, but this is unreliable now. Pulling down your patch now. Thanks!
Ooopsie! Did I goof up? mno-avx -std=iso9899:1999 -c /root/usr/src/sys/netpfil/pf/if_pfsync.c -o if_pfsync.o /root/usr/src/sys/netpfil/pf/if_pfsync.c:2368:48: error: implicit declaration of function 'ip6_output' is invalid in C99 [-Werror,-Wimplicit-function-declaration] error = ip6_output(m, NULL, NULL, 0, NULL, NULL, NULL); ^ /root/usr/src/sys/netpfil/pf/if_pfsync.c:2368:48: note: did you mean 'ip_output'? /root/usr/src/sys/netinet/ip_var.h:223:5: note: 'ip_output' declared here int ip_output(struct mbuf *,
(In reply to jjasen from comment #74) Did the patch apply fully? The ip6_output() prototype should be in `#include <netinet6/ip6_var.h>`.
(In reply to Kristof Provost from comment #75) No. I stuffed it in manually. Was this against clean -stable or against what we were working on?
(In reply to jjasen from comment #76) Yeah, that's against stable/13.
(In reply to Kristof Provost from comment #77) OK. I compiled a kernel with this in. I need to add the pfsync_defer_tmo patch as well. On the upside, I was able to generate traffic on this system repeatedly, which previously caused crashes. Three puppet runs successful, then put it in a while loop until I tripped over defer_tmo.
(In reply to jjasen from comment #78) defer_tmo patch added as well, started test bombardment. So far, no issues.
Good news. We may have fixed pfsyncintr. Bad news. I don't think we defeated pfsync_defer_tmo Want the usual bt, frame and prints?
Yes. Also show me what code you have for that function, so we’re sure we’re looking at the same thing.
(In reply to Kristof Provost from comment #81) I'm using this: https://cgit.freebsd.org/src/commit/?id=8edf0b52c40762d64b3d4318235b58ae5d4eff58 Other stuff up shortly.
BT: #0 __curthread () at /root/usr/src/sys/amd64/include/pcpu_aux.h:55 #1 dump_savectx () at /root/usr/src/sys/kern/kern_shutdown.c:394 #2 0xffffffff80c38ae8 in dumpsys (di=0x0) at /root/usr/src/sys/x86/include/dump.h:87 #3 doadump (textdump=<optimized out>) at /root/usr/src/sys/kern/kern_shutdown.c:423 #4 kern_reboot (howto=260) at /root/usr/src/sys/kern/kern_shutdown.c:497 #5 0xffffffff80c38f6e in vpanic (fmt=<optimized out>, ap=ap@entry=0xfffffe017cbed940) at /root/usr/src/sys/kern/kern_shutdown.c:930 #6 0xffffffff80c38da3 in panic (fmt=<unavailable>) at /root/usr/src/sys/kern/kern_shutdown.c:854 #7 0xffffffff81101c95 in trap_fatal (frame=0xfffffe017cbeda30, eva=24) at /root/usr/src/sys/amd64/amd64/trap.c:942 #8 0xffffffff81101cef in trap_pfault (frame=0xfffffe017cbeda30, usermode=false, signo=<optimized out>, ucode=<optimized out>) at /root/usr/src/sys/amd64/amd64/trap.c:761 #9 <signal handler called> #10 0xffffffff80cd15d0 in m_copym (m=0x0, m@entry=0xfffff80115340200, off0=8268, len=8192, wait=wait@entry=1) at /root/usr/src/sys/kern/uipc_mbuf.c:510 #11 0xffffffff80dfe81f in ip_fragment (ip=<optimized out>, ip@entry=0xfffff80115340268, m_frag=m_frag@entry=0xfffffe017cbedcb0, mtu=mtu@entry=9000, if_hwassist_flags=<optimized out>) at /root/usr/src/sys/netinet/ip_output.c:992 #12 0xffffffff80dfe1d2 in ip_output (m=m@entry=0xfffff80115340200, opt=<optimized out>, opt@entry=0x0, ro=<optimized out>, ro@entry=0x0, flags=flags@entry=0, imo=<optimized out>, imo@entry=0x0, inp=<optimized out>, inp@entry=0x0) at /root/usr/src/sys/netinet/ip_output.c:816 #13 0xffffffff823516dc in pfsync_defer_tmo (arg=0xfffff806ea395280) at /root/usr/src/sys/netpfil/pf/if_pfsync.c:1838 #14 0xffffffff80c57263 in softclock_call_cc (c=<optimized out>, cc=cc@entry=0xffffffff81ebac00 <cc_cpu>, direct=direct@entry=0) at /root/usr/src/sys/kern/kern_timeout.c:692 #15 0xffffffff80c576c9 in softclock (arg=0xffffffff81ebac00 <cc_cpu>) at /root/usr/src/sys/kern/kern_timeout.c:812 #16 0xffffffff80bf6eea in intr_event_execute_handlers (ie=0xfffff80104901300, p=<optimized out>) at /root/usr/src/sys/kern/kern_intr.c:1169 #17 ithread_execute_handlers (ie=<optimized out>, p=<optimized out>) at /root/usr/src/sys/kern/kern_intr.c:1182 #18 ithread_loop (arg=arg@entry=0xfffff801048949c0) at /root/usr/src/sys/kern/kern_intr.c:1270 #19 0xffffffff80bf38de in fork_exit (callout=0xffffffff80bf6c90 <ithread_loop>, arg=0xfffff801048949c0, frame=0xfffffe017cbedf40) at /root/usr/src/sys/kern/kern_fork.c:1093 #20 <signal handler called> #21 mi_startup () at /root/usr/src/sys/kern/init_main.c:322 Backtrace stopped: Cannot access memory at address 0x8 (kgdb) frame 13 #13 0xffffffff823516dc in pfsync_defer_tmo (arg=0xfffff806ea395280) at /root/usr/src/sys/netpfil/pf/if_pfsync.c:1838 1838 ip_output(m, NULL, NULL, 0, NULL, NULL); p ip_output $1 = {int (struct mbuf *, struct mbuf *, struct route *, int, struct ip_moptions *, struct inpcb *)} 0xffffffff80dfd080 <ip_output> kgdb) print m $4 = (struct mbuf *) 0xfffff80115340200 (kgdb) print *m $5 = {{m_next = 0x0, m_slist = {sle_next = 0x0}, m_stailq = {stqe_next = 0x0}}, { m_nextpkt = 0xfffff802ff9f1d00, m_slistpkt = {sle_next = 0xfffff802ff9f1d00}, m_stailqpkt = { stqe_next = 0xfffff802ff9f1d00}}, m_data = 0xfffff80115340268 "E", m_len = 80, m_type = 1, m_flags = 32770, {{{m_pkthdr = {{snd_tag = 0x0, rcvif = 0x0}, tags = {slh_first = 0x0}, len = 80, flowid = 0, csum_flags = 1025, fibnum = 0, numa_domain = 255 '\377', rsstype = 0 '\000', {rcv_tstmp = 0, {l2hlen = 0 '\000', l3hlen = 0 '\000', l4hlen = 0 '\000', l5hlen = 0 '\000', inner_l2hlen = 0 '\000', inner_l3hlen = 0 '\000', inner_l4hlen = 0 '\000', inner_l5hlen = 0 '\000'}}, PH_per = {eight = "\000\000\000\000\020\000\000", sixteen = {0, 0, 16, 0}, thirtytwo = { 0, 16}, sixtyfour = {68719476736}, unintptr = {68719476736}, ptr = 0x1000000000}, PH_loc = {eight = "\000\000\000\000\000\000\000", sixteen = {0, 0, 0, 0}, thirtytwo = { 0, 0}, sixtyfour = {0}, unintptr = {0}, ptr = 0x0}}, {m_epg_npgs = 0 '\000', m_epg_nrdy = 0 '\000', m_epg_hdrlen = 0 '\000', m_epg_trllen = 0 '\000', m_epg_1st_off = 0, m_epg_last_len = 0, m_epg_flags = 0 '\000', m_epg_record_type = 0 '\000', __spare = "\000", m_epg_enc_cnt = 0, m_epg_tls = 0x50, m_epg_so = 0xff000000000401, m_epg_seqno = 0, m_epg_stailq = { stqe_next = 0x1000000000}}}, {m_ext = {{ext_count = 742590208, ext_cnt = 0x70050a62c430700}, ext_size = 2954964035, ext_type = 8, ext_flags = 17664, { { ext_buf = 0x400646585e8d0045 <error: Cannot access memory at address 0x400646585e8d0045>, ext_arg2 = 0x69011824d0040120}, {extpg_pa = {4613452213645017157, 7566355395346956576, 720575940379279360, 11029316931009191974, 216172782113797907}, extpg_trail = "͡\000P\002;.\373\000\000\000\000\240\002\377\377\345\232\000\000\002\004\"\354\001\003\003\016\004\002\b\n\257<3\346\000\000\000\000\001\034P\001\020\231\023\067\000\000\000\000\000\003\000\000\000\000\000\000\000\000\000", extpg_hdr = '\000' <repeats 22 times>}}, ext_free = 0xffffffff82f67210 <rxb_free>, ext_arg1 = 0xfffff802ef19bfc0}, m_pktdat = 0xfffff80115340258 ""}}, m_dat = 0xfffff80115340220 ""}}
Ah, that's the same issue, but in the tmo function now. Try this: diff --git a/sys/netpfil/pf/if_pfsync.c b/sys/netpfil/pf/if_pfsync.c index 47c3217f399c..fd5be82367aa 100644 --- a/sys/netpfil/pf/if_pfsync.c +++ b/sys/netpfil/pf/if_pfsync.c @@ -102,6 +102,9 @@ __FBSDID("$FreeBSD$"); #include <netinet/tcp_fsm.h> #include <netinet/tcp_seq.h> +#include <netinet/ip6.h> +#include <netinet6/ip6_var.h> + #define PFSYNC_MINPKT ( \ sizeof(struct ip) + \ sizeof(struct pfsync_header) + \ @@ -1819,6 +1822,7 @@ pfsync_defer_tmo(void *arg) struct mbuf *m = pd->pd_m; struct pf_kstate *st = pd->pd_st; struct pfsync_bucket *b = pfsync_get_bucket(sc, st); + struct ip *ip; PFSYNC_BUCKET_LOCK_ASSERT(b); @@ -1833,9 +1837,14 @@ pfsync_defer_tmo(void *arg) pd->pd_st->state_flags &= ~PFSTATE_ACK; /* XXX: locking! */ if (pd->pd_refs == 0) free(pd, M_PFSYNC); - PFSYNC_BUCKET_UNLOCK(b); + PFSYNC_BUCKET_UNLOCK(b); - ip_output(m, NULL, NULL, 0, NULL, NULL); + ip = mtod(m, struct ip *); + + if (ip->ip_v == IPVERSION) + ip_output(m, NULL, NULL, 0, NULL, NULL); + else + ip6_output(m, NULL, NULL, 0, NULL, NULL, NULL); pf_release_state(st); @@ -2325,7 +2334,8 @@ pfsyncintr(void *arg) struct pfsync_softc *sc = arg; struct pfsync_bucket *b; struct mbuf *m, *n; - int c; + struct ip *ip; + int c, error; NET_EPOCH_ENTER(et); CURVNET_SET(sc->sc_ifp->if_vnet); @@ -2345,15 +2355,26 @@ pfsyncintr(void *arg) n = m->m_nextpkt; m->m_nextpkt = NULL; + ip = mtod(m, struct ip *); + /* * We distinguish between a deferral packet and our * own pfsync packet based on M_SKIP_FIREWALL * flag. This is XXX. */ - if (m->m_flags & M_SKIP_FIREWALL) - ip_output(m, NULL, NULL, 0, NULL, NULL); - else if (ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, - NULL) == 0) + if (m->m_flags & M_SKIP_FIREWALL) { + if (ip->ip_v == IPVERSION) + error = ip_output(m, NULL, NULL, 0, NULL, NULL); + else + error = ip6_output(m, NULL, NULL, 0, NULL, NULL, NULL); + } else { + if (ip->ip_v == IPVERSION) + error = ip_output(m, NULL, NULL, IP_RAWOUTPUT, &sc->sc_imo, + NULL); + else + error = ENOTSUP; // When we add pfsync over IPv6 + } + if (error == 0) V_pfsyncstats.pfsyncs_opackets++; else V_pfsyncstats.pfsyncs_oerrors++;
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=9a1cab6d79b7286e5f650f57ed95625e6ddb8e4b commit 9a1cab6d79b7286e5f650f57ed95625e6ddb8e4b Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-02-14 06:11:38 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-02-16 06:43:04 +0000 pfsync: support deferring IPv6 packets When we send out a deferred packet we must make sure to call ip6_output() for IPv6 packets. If not we might end up attempting to ip_fragment() an IPv6 packet, which could lead to us reading outside of the mbuf. PR: 268246 Reviewed by: melifaro, zlei MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D38586 sys/netpfil/pf/if_pfsync.c | 91 +++++++++++++++++++++++++++++----------------- 1 file changed, 58 insertions(+), 33 deletions(-)
(In reply to Kristof Provost from comment #84) Compiling now.
(In reply to jjasen from comment #86) iterative tests started. Fingers crossed!
(In reply to jjasen from comment #87) So far, we held together overnight without any issue!
Hello :) I just want to add that we are watching this bug with great interest and will be upgrading our pf firewall cluster to latest stable/13 as soon as the ipv6 defer MFC lands. We need defer because they are two active/active routers, but they reboot as soon as we enable it. I've been trying to reproduce in a lab for a while and finally found this bug today. Thank you all for your work on this, it is very appreciated. Best regards, Thomas Steen Rasmussen / Tykling
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=dacffdd4dc511ae73e8fd3eb19f9efe4ecb26ba1 commit dacffdd4dc511ae73e8fd3eb19f9efe4ecb26ba1 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-02-14 06:11:38 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-03-02 11:15:28 +0000 pfsync: support deferring IPv6 packets When we send out a deferred packet we must make sure to call ip6_output() for IPv6 packets. If not we might end up attempting to ip_fragment() an IPv6 packet, which could lead to us reading outside of the mbuf. PR: 268246 Reviewed by: melifaro, zlei MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D38586 (cherry picked from commit 9a1cab6d79b7286e5f650f57ed95625e6ddb8e4b) sys/netpfil/pf/if_pfsync.c | 71 ++++++++++++++++++++++++++++++++++++---------- 1 file changed, 56 insertions(+), 15 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3dec62eded04eaf431bf0948f4e6412deede87d5 commit 3dec62eded04eaf431bf0948f4e6412deede87d5 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-02-14 06:11:38 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-03-02 16:21:59 +0000 pfsync: support deferring IPv6 packets When we send out a deferred packet we must make sure to call ip6_output() for IPv6 packets. If not we might end up attempting to ip_fragment() an IPv6 packet, which could lead to us reading outside of the mbuf. PR: 268246 Reviewed by: melifaro, zlei MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D38586 (cherry picked from commit 9a1cab6d79b7286e5f650f57ed95625e6ddb8e4b) sys/netpfil/pf/if_pfsync.c | 71 ++++++++++++++++++++++++++++++++++++---------- 1 file changed, 56 insertions(+), 15 deletions(-)
Hello, I have the same symptoms (crash of the kernel when having IPv6 packets over a pfsync interface with defer on). I managed to reproduce the problem/crash with only four lines from a fresh install : kldload pfsync echo "pass all" | pfctl -ef - ifconfig pfsync0 defer up ping ::1 From my tests the problem is present only starting from the 13.1 kernel. I tried with the latest stable/13 (containing your patches) and it still crash.
(In reply to mikael.le-bohec from comment #92) You're tripping over a different panic. (And it would have been helpful to actually include the backtrace in your report.) > panic: _mtx_lock_sleep: recursed on non-recursive mutex pfsync bucket @ /usr/src/sys/netpfil/pf/if_pfsync.c:2295 > > cpuid = 0 > time = 1679262157 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00c72cbc50 > vpanic() at vpanic+0x152/frame 0xfffffe00c72cbca0 > panic() at panic+0x43/frame 0xfffffe00c72cbd00 > __mtx_lock_sleep() at __mtx_lock_sleep+0x43d/frame 0xfffffe00c72cbd90 > __mtx_lock_flags() at __mtx_lock_flags+0xe5/frame 0xfffffe00c72cbde0 > pfsync_timeout() at pfsync_timeout+0x64/frame 0xfffffe00c72cbe10 > softclock_call_cc() at softclock_call_cc+0x14b/frame 0xfffffe00c72cbec0 > softclock_thread() at softclock_thread+0xc6/frame 0xfffffe00c72cbef0 > fork_exit() at fork_exit+0x80/frame 0xfffffe00c72cbf30 > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00c72cbf30 > --- trap 0, rip = 0, rsp = 0, rbp = 0 --- > KDB: enter: panic Which I can reproduce but do not understand. There's no obvious way for us to end up recursing on the bucket lock there, nor is there a clear path where we could fail to unlock a bucket.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=53247cdf12449e90f6736ae563e4cce8315c923f commit 53247cdf12449e90f6736ae563e4cce8315c923f Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-03-20 13:29:55 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-03-20 15:39:14 +0000 pfsync: fix pfsync_undefer_state() locking pfsync_undefer_state() takes the bucket lock, but could get called from places (e.g. from pfsync_update_state() or pfsync_delete_state()) where we already held the lock. As it can also be called from places where we don't yet hold the lock create new locked variant for use when the lock is already held. Keep using pfsync_undefer_state() where the lock must still be taken. PR: 268246 MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) sys/netpfil/pf/if_pfsync.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a6719858a48019aa54e1ea3be57d17fa88b080c6 commit a6719858a48019aa54e1ea3be57d17fa88b080c6 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-03-20 13:58:34 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-03-20 15:39:14 +0000 pf tests: try to trigger locking issues in pfsync_timeout() PR: 268246 MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) tests/sys/netpfil/pf/pfsync.sh | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=844ad2828a35c434b893af4274b1f6c50332dd70 commit 844ad2828a35c434b893af4274b1f6c50332dd70 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-03-20 13:26:33 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-03-20 15:39:14 +0000 pfsync: add missing unlock in pfsync_defer_tmo() The callout for pfsync_defer_tmo() is created with CALLOUT_RETURNUNLOCKED, because while the callout framework takes care of taking the lock we want to run a few operations outside of the lock, so we unlock ourselves. However, if `sc->sc_sync_if == NULL` we return without releasing the lock, and leak the lock, causing later deadlocks. Ensure we always release the bucket lock when we exit pfsync_defer_tmo() PR: 268246 MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) sys/netpfil/pf/if_pfsync.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=5d6b503ed097733af83003cf675f3d55c2b90756 commit 5d6b503ed097733af83003cf675f3d55c2b90756 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-03-20 13:58:34 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-03-27 08:16:03 +0000 pf tests: try to trigger locking issues in pfsync_timeout() PR: 268246 MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) (cherry picked from commit a6719858a48019aa54e1ea3be57d17fa88b080c6) tests/sys/netpfil/pf/pfsync.sh | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=6b916bc91d76a4e91629074779a89b22207ea053 commit 6b916bc91d76a4e91629074779a89b22207ea053 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-03-20 13:58:34 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-03-27 00:29:59 +0000 pf tests: try to trigger locking issues in pfsync_timeout() PR: 268246 MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) (cherry picked from commit a6719858a48019aa54e1ea3be57d17fa88b080c6) tests/sys/netpfil/pf/pfsync.sh | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3eb2341caaa307a8d067333c8aebe3e269ade2fd commit 3eb2341caaa307a8d067333c8aebe3e269ade2fd Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-03-20 13:26:33 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-03-27 08:16:03 +0000 pfsync: add missing unlock in pfsync_defer_tmo() The callout for pfsync_defer_tmo() is created with CALLOUT_RETURNUNLOCKED, because while the callout framework takes care of taking the lock we want to run a few operations outside of the lock, so we unlock ourselves. However, if `sc->sc_sync_if == NULL` we return without releasing the lock, and leak the lock, causing later deadlocks. Ensure we always release the bucket lock when we exit pfsync_defer_tmo() PR: 268246 MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) (cherry picked from commit 844ad2828a35c434b893af4274b1f6c50332dd70) sys/netpfil/pf/if_pfsync.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=09b7cab88f652e3a3682749e38c043d25dd7d0ed commit 09b7cab88f652e3a3682749e38c043d25dd7d0ed Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-03-20 13:29:55 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-03-27 00:28:32 +0000 pfsync: fix pfsync_undefer_state() locking pfsync_undefer_state() takes the bucket lock, but could get called from places (e.g. from pfsync_update_state() or pfsync_delete_state()) where we already held the lock. As it can also be called from places where we don't yet hold the lock create new locked variant for use when the lock is already held. Keep using pfsync_undefer_state() where the lock must still be taken. PR: 268246 MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) (cherry picked from commit 53247cdf12449e90f6736ae563e4cce8315c923f) sys/netpfil/pf/if_pfsync.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=1d6ece4351a01179e3a5ebbdb3fa2e6053a2d7aa commit 1d6ece4351a01179e3a5ebbdb3fa2e6053a2d7aa Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-03-20 13:26:33 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-03-27 00:28:37 +0000 pfsync: add missing unlock in pfsync_defer_tmo() The callout for pfsync_defer_tmo() is created with CALLOUT_RETURNUNLOCKED, because while the callout framework takes care of taking the lock we want to run a few operations outside of the lock, so we unlock ourselves. However, if `sc->sc_sync_if == NULL` we return without releasing the lock, and leak the lock, causing later deadlocks. Ensure we always release the bucket lock when we exit pfsync_defer_tmo() PR: 268246 MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) (cherry picked from commit 844ad2828a35c434b893af4274b1f6c50332dd70) sys/netpfil/pf/if_pfsync.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=44af4103e422029026e9bb1a5784550365831cf3 commit 44af4103e422029026e9bb1a5784550365831cf3 Author: Kristof Provost <kp@FreeBSD.org> AuthorDate: 2023-03-20 13:29:55 +0000 Commit: Kristof Provost <kp@FreeBSD.org> CommitDate: 2023-03-27 08:16:03 +0000 pfsync: fix pfsync_undefer_state() locking pfsync_undefer_state() takes the bucket lock, but could get called from places (e.g. from pfsync_update_state() or pfsync_delete_state()) where we already held the lock. As it can also be called from places where we don't yet hold the lock create new locked variant for use when the lock is already held. Keep using pfsync_undefer_state() where the lock must still be taken. PR: 268246 MFC after: 1 week Sponsored by: Rubicon Communications, LLC (Netgate) (cherry picked from commit 53247cdf12449e90f6736ae563e4cce8315c923f) sys/netpfil/pf/if_pfsync.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)