Created attachment 184239 [details]
core.txt

I've encountered this panic now for the 10th time. After enabling pf and loading the rules, it takes a couple of days (sometimes 2, another time 8) to hit this bug. I also had this bug on FreeBSD 10.1.

============================================================================
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80a0c71b
stack pointer           = 0x28:0xfffffe0000359fa0
frame pointer           = 0x28:0xfffffe0000359fb0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq269: igb0:que 3)
trap number             = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0xffffffff8098e7e0 at kdb_backtrace+0x60
#1 0xffffffff809514b6 at vpanic+0x126
#2 0xffffffff80951383 at panic+0x43
#3 0xffffffff80d5646b at trap_fatal+0x36b
#4 0xffffffff80d5676d at trap_pfault+0x2ed
#5 0xffffffff80d55dea at trap+0x47a
#6 0xffffffff80d3bdb2 at calltrap+0x8
#7 0xffffffff80995ecc at kvprintf+0xf9c
#8 0xffffffff8099692d at _vprintf+0x8d
#9 0xffffffff80994c2c at log+0x5c
#10 0xffffffff80b24a17 at ip6_forward+0x107
#11 0xffffffff819d94be at pf_refragment6+0x16e
#12 0xffffffff819cb1f3 at pf_test6+0x1023
#13 0xffffffff819d315d at pf_check6_out+0x1d
#14 0xffffffff80a25354 at pfil_run_hooks+0x84
#15 0xffffffff820187db at bridge_pfil+0x25b
#16 0xffffffff8201970b at bridge_broadcast+0x22b
#17 0xffffffff820193ef at bridge_forward+0x20f

Thanks,
Jeroen
Can you describe the setup of the machine? (Is it a gateway? Where does it route to? ...)
This server is running in the DMZ, so it receives all kinds of spooky internet traffic. It only speaks IPv4 to the outside world, though.

It has a lagg0 interface running in failover mode on top of igb0 and igb1; these are the only two physical interfaces. Around 20 jails have a virtual IP on this lagg interface. There's a vlan0 virtual interface sending/receiving tagged traffic for VLAN 178; this is the route towards the internet. I'm using bridge1 for attaching bhyve VMs: besides one to three tap interfaces, lagg0 is also a member of this bridge (which itself has no IP). Furthermore, there are two OpenVPN point-to-point tun interfaces, plus a tun for the OpenVPN server. Besides lo0 there's an additional loopback interface with an RFC1918 address.

With regard to forwarding, it routes traffic for the OpenVPN clients, for which it also masquerades the source IP. It also forwards packets between the local jails and the point-to-point tun interfaces; in that case nothing is NATted. When the jails go to the internet, however, it SNATs to the lagg0 IP.

Let me know if you need any more extensive/detailed information.
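For illustration, the interface side of my rc.conf looks roughly like this (simplified and reconstructed from memory; the addresses, prefix lengths, and tap names here are placeholders, not my real values):

ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0 vlan0 bridge1 lo1"
ifconfig_lagg0="laggproto failover laggport igb0 laggport igb1 inet <public-ip>/24"
create_args_vlan0="vlan 178 vlandev lagg0"
ifconfig_vlan0="inet <uplink-ip>/30"               # route towards the internet
ifconfig_bridge1="addm lagg0 addm tap0 up"         # bhyve VMs attach here; bridge has no IP
ifconfig_lo1="inet 192.168.0.1/32"                 # additional RFC1918 loopback (example address)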
As the panic occurs in a pf_test6() path, I'd expect you to be able to work around the problem by dropping all incoming and outgoing v6 traffic. Sadly, the v6 fragment code will not work as expected on a transparent bridge. Fixing that is on my (long-term) todo list. I'll still try to see if there's something I can do about this panic, but getting it to work correctly in transparent bridge mode is not going to happen short-term.
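Something along these lines in pf.conf should do it (an untested sketch; place it before your other rules and adapt to your ruleset):

# Workaround sketch: drop all IPv6 before pf tries to reassemble/refragment it.
block drop in quick inet6 all
block drop out quick inet6 all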
Okay, so the problem is that rcvif is NULL in the mbuf, but because the packet cannot be forwarded we try to log the receiving interface name in an error message.

pfSense dealt with this problem too:
https://github.com/pfsense/FreeBSD-src/commit/4204c9f01d2ab439f6e0b9454ab22d4ffcca8cc4

I think the pfSense fix is incorrect, because ifp is the outbound interface, not the receiving interface. It does address the panic, because m->m_pkthdr.rcvif is no longer NULL. The log message is probably wrong, but that's better than a panic.

I think it'd be better to set rcvif to the original rcvif, like so:

diff --git a/sys/netpfil/pf/pf_norm.c b/sys/netpfil/pf/pf_norm.c
index 81ef54d47a0..45e7b938d40 100644
--- a/sys/netpfil/pf/pf_norm.c
+++ b/sys/netpfil/pf/pf_norm.c
@@ -731,6 +731,7 @@ pf_refragment6(struct ifnet *ifp, struct mbuf **m0, struct m_tag *mtag)
 	struct mbuf		*m = *m0, *t;
 	struct pf_fragment_tag	*ftag = (struct pf_fragment_tag *)(mtag + 1);
 	struct pf_pdesc		 pd;
+	struct ifnet		*rcvif;
 	uint32_t		 frag_id;
 	uint16_t		 hdrlen, extoff, maxlen;
 	uint8_t			 proto;
@@ -776,6 +777,7 @@ pf_refragment6(struct ifnet *ifp, struct mbuf **m0, struct m_tag *mtag)
 	error = ip6_fragment(ifp, m, hdrlen, proto, maxlen, frag_id);
 	m = (*m0)->m_nextpkt;
 	(*m0)->m_nextpkt = NULL;
+	rcvif = (*m0)->m_pkthdr.rcvif;
 	if (error == 0) {
 		/* The first mbuf contains the unfragmented packet. */
 		m_freem(*m0);
@@ -789,6 +791,7 @@ pf_refragment6(struct ifnet *ifp, struct mbuf **m0, struct m_tag *mtag)
 	for (t = m; m; m = t) {
 		t = m->m_nextpkt;
 		m->m_nextpkt = NULL;
+		m->m_pkthdr.rcvif = rcvif;
 		m->m_flags |= M_SKIP_FIREWALL;
 		memset(&pd, 0, sizeof(pd));
 		pd.pf_mtag = pf_find_mtag(m);

This is mostly untested though. Also note that this just fixes the panic, not the underlying issue with IPv6 fragments on filtering bridges.
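To illustrate why the backtrace faults inside kvprintf() rather than in ip6_forward() itself: if_name() is a macro that expands to a member access, so with a NULL rcvif it only computes a bogus pointer; the dereference happens later, when the %s conversion walks the string. Here's a small self-contained userspace sketch of that mechanism (the struct ifnet layout here is simplified and the member offset is purely illustrative):

#include <stdio.h>
#include <stddef.h>

#define IFNAMSIZ 16

/*
 * Simplified stand-in for the kernel's struct ifnet; the real struct
 * and the exact offset of if_xname differ.
 */
struct ifnet {
	void	*if_softc;
	void	*if_l2com;
	char	 if_xname[IFNAMSIZ];	/* interface name, e.g. "igb0" */
};

/* The kernel's if_name() macro is effectively this member access. */
#define if_name(ifp)	((ifp)->if_xname)

int
main(void)
{
	struct ifnet *rcvif = NULL;

	/*
	 * Formally undefined behaviour, but in practice no memory access
	 * happens here: the array member decays to a pointer, so this
	 * merely computes NULL + offsetof(struct ifnet, if_xname).
	 */
	char *name = if_name(rcvif);

	printf("bogus pointer %p, offsetof(if_xname) = %zu\n",
	    (void *)name, offsetof(struct ifnet, if_xname));

	/*
	 * The actual dereference only happens when a %s conversion reads
	 * the string, i.e. inside (k)vprintf -- matching the kvprintf()
	 * frame and the small fault virtual address (0x28, presumably
	 * the kernel's offsetof(struct ifnet, if_xname)) in the panic.
	 */
	/* printf("%s\n", name); */	/* <- this line would crash */

	return (0);
}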
On IRC I got some help with preventing the kernel panic: the suggestion was to disable scrubbing. It's now been running stable for a few months, and I can live without scrub. Still, I'll see if I can find some time to test your proposed patch.
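Concretely, disabling scrubbing just meant commenting out the scrub rule in pf.conf; mine looked something like this (from memory, the exact options may have differed):

# scrub in all fragment reassemble    <- disabled to avoid the pf_refragment6() panic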
This should be fixed with the introduction of PFIL_FWD. Please reopen if you can reproduce this on 12.0.