Bug 272153

Summary: [pf] [pfsync] Incomplete state sync causing null pointer dereference
Product: Base System Reporter: adam.stradtner
Component: kern Assignee: freebsd-net (Nobody) <net>
Status: Closed Unable to Reproduce    
Severity: Affects Only Me CC: franco, freebsd, kevans, kp
Priority: ---    
Version: 13.1-RELEASE   
Hardware: amd64   
OS: Any   

Description adam.stradtner 2023-06-22 22:10:41 UTC
I have a pair of OPNsense firewalls, based on FreeBSD 13.1. They are configured as an HA pair with state synchronization via pfsync. I am experiencing random crashes stating:

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x0
fault code      = supervisor read data, page not present

One of the OPNsense maintainers looked at a core dump and found the cause to be a null pointer dereference:

#17 0xffffffff8237ed0f in pf_test_state_udp (state=<optimized out>, state@entry=0xfffffe001099b828,
    direction=<optimized out>, kif=<optimized out>, kif@entry=0xfffff800245b3a00, m=m@entry=0xfffff801e9409800,
    off=20, h=<optimized out>, pd=pd@entry=0xfffffe001099b758) at /usr/src/sys/netpfil/pf/pf.c:5086
5086         if (PF_ANEQ(pd->src, &nk->addr[pd->sidx], pd->af) ||
(kgdb) list
5081   
5082      /* translate source/destination address, if necessary */
5083      if ((*state)->key[PF_SK_WIRE] != (*state)->key[PF_SK_STACK]) {
5084         struct pf_state_key *nk = (*state)->key[pd->didx];
5085   
5086         if (PF_ANEQ(pd->src, &nk->addr[pd->sidx], pd->af) ||
5087             nk->port[pd->sidx] != uh->uh_sport)
5088            pf_change_ap(m, pd->src, &uh->uh_sport, pd->ip_sum,
5089                &uh->uh_sum, &nk->addr[pd->sidx],
5090                nk->port[pd->sidx], 1, pd->af);
(kgdb) p nk
$10 = (struct pf_state_key *) 0x0

I subsequently disabled pfsync and that has resolved my crashes. It appears the state sync is bringing invalid states with it, which eventually causes a kernel panic.
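
For illustration only, the failure mode in the backtrace above can be sketched outside the kernel. This is a minimal userland sketch, not a proposed patch: the mock_* types and functions are hypothetical stand-ins for struct pf_state and struct pf_state_key, and the addresses are reduced to plain integers. It assumes the state has reached the check at pf.c:5083 with one of its two keys unset, which is what the kgdb output (nk = 0x0) suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real pf definitions. */
enum { SK_WIRE = 0, SK_STACK = 1 };

struct mock_state_key {
	unsigned int	addr[2];
	unsigned short	port[2];
};

struct mock_state {
	struct mock_state_key	*key[2];
};

/* A state is only safe to inspect if both of its keys are attached. */
static bool
mock_state_keys_valid(const struct mock_state *st)
{
	assert(st != NULL);
	return (st->key[SK_WIRE] != NULL && st->key[SK_STACK] != NULL);
}

/*
 * Mirrors the shape of the check at pf.c:5083-5087.
 * Returns -1 for a state with a missing key instead of dereferencing it,
 * 0 if no translation is needed, 1 if the source would be rewritten.
 */
static int
mock_needs_translation(const struct mock_state *st, int didx, int sidx,
    unsigned int src, unsigned short sport)
{
	const struct mock_state_key *nk;

	if (!mock_state_keys_valid(st))
		return (-1);
	if (st->key[SK_WIRE] == st->key[SK_STACK])
		return (0);
	nk = st->key[didx];
	return (src != nk->addr[sidx] || sport != nk->port[sidx]);
}
```

Without the validity check, a state whose wire key is set and whose stack key is NULL passes the key[PF_SK_WIRE] != key[PF_SK_STACK] comparison and then faults on the first use of nk, matching the fault virtual address of 0x0 reported above.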
Comment 1 Franco Fichtner 2023-06-23 05:42:56 UTC
Adam sent me a crash dump for further analysis that can be shared as well. Apparently it crashes where no sanity checking takes place in state inspection (so far UDP and ICMP seem to be affected, but I wouldn't be surprised if it's similar for TCP), leading to the assumption that states are not properly synced for one reason or another. The pfsync code has the latest updates from stable/13.


Cheers,
Franco
Comment 2 Kristof Provost freebsd_committer freebsd_triage 2023-06-23 07:32:02 UTC
Can you reproduce this problem on FreeBSD?
Comment 3 Franco Fichtner 2023-06-23 07:33:14 UTC
Ok, let's make this longer than necessary shall we ;) Which kernel would you prefer? 13.1-RELEASE, 13.2-RELEASE or stable/13?
Comment 4 Kristof Provost freebsd_committer freebsd_triage 2023-06-23 07:46:32 UTC
(In reply to Franco Fichtner from comment #3)
main, ideally, but I'll take information for any supported branch.

So include the (minimised!) reproduction scenario, any required non-default settings and any additional information that can be extracted from the core dump (i.e. backtrace, local variables).
Comment 5 Franco Fichtner 2023-06-23 07:48:28 UTC
Let's start with 13.2-RELEASE then. Will report back.
Comment 6 adam.stradtner 2023-08-23 19:11:26 UTC
Update: Since rebooting both systems with their identical kernels, I have not been able to reproduce this issue. Somehow a corrupt state (or maybe several corrupt states) was being persisted by always having at least 1 firewall online. At this time I don't know how to reproduce the original issue, but would recommend that some thought be put into validating states that are being synchronized. It seems to be an edge case that I ran into.
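
As a rough idea of what validating synchronized states could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the mock_sync_record layout, the field names, and the constant values are hypothetical and do not correspond to the real pfsync wire format, and it makes no claim about which checks pfsync already performs on import.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants; the values are arbitrary, not the kernel's. */
enum mock_af { MOCK_AF_INET = 1, MOCK_AF_INET6 = 2 };
enum mock_dir { MOCK_DIR_IN = 1, MOCK_DIR_OUT = 2 };
#define	MOCK_TIMEOUT_MAX	20

/* Hypothetical, heavily reduced view of one received state record. */
struct mock_sync_record {
	unsigned char	af;		/* address family */
	unsigned char	direction;	/* in or out */
	unsigned char	timeout;	/* index into a timeout table */
};

/*
 * Reject a record from the peer if any field is outside the range the
 * receiver knows how to handle, so that it is never inserted as a state.
 */
static bool
mock_sync_record_ok(const struct mock_sync_record *r)
{
	assert(r != NULL);
	if (r->af != MOCK_AF_INET && r->af != MOCK_AF_INET6)
		return (false);
	if (r->direction != MOCK_DIR_IN && r->direction != MOCK_DIR_OUT)
		return (false);
	return (r->timeout < MOCK_TIMEOUT_MAX);
}
```

The point of checking at import time is that a rejected record is dropped on the receiving firewall rather than being re-announced to the peer, which is how a bad state could otherwise survive as long as at least one node stays online.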
Comment 7 Franco Fichtner 2023-11-22 11:48:50 UTC
Appears to be a false positive, please close.
Comment 8 Kyle Evans freebsd_committer freebsd_triage 2023-11-22 18:26:09 UTC
(In reply to Franco Fichtner from comment #7)

Done. Feel free to reopen if new information comes to light.