Bug 220611 - Fatal trap 12: page fault while in kernel mode (ip6_forward -> log)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Kristof Provost
Reported: 2017-07-10 19:37 UTC by Jeroen Schutrup
Modified: 2019-06-04 11:21 UTC
CC List: 1 user



Attachments
core.txt (596.85 KB, text/plain)
2017-07-10 19:37 UTC, Jeroen Schutrup

Description Jeroen Schutrup 2017-07-10 19:37:22 UTC
Created attachment 184239 [details]
core.txt

I've encountered this panic now for the 10th time. After enabling pf and loading the rules, it takes a couple of days (sometimes 2, another time 8) to hit this bug. I also had this bug on FreeBSD 10.1.


============================================================================
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x28
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80a0c71b
stack pointer           = 0x28:0xfffffe0000359fa0
frame pointer           = 0x28:0xfffffe0000359fb0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq269: igb0:que 3)
trap number             = 12
panic: page fault
cpuid = 3
KDB: stack backtrace:
#0 0xffffffff8098e7e0 at kdb_backtrace+0x60
#1 0xffffffff809514b6 at vpanic+0x126
#2 0xffffffff80951383 at panic+0x43
#3 0xffffffff80d5646b at trap_fatal+0x36b
#4 0xffffffff80d5676d at trap_pfault+0x2ed
#5 0xffffffff80d55dea at trap+0x47a
#6 0xffffffff80d3bdb2 at calltrap+0x8
#7 0xffffffff80995ecc at kvprintf+0xf9c
#8 0xffffffff8099692d at _vprintf+0x8d
#9 0xffffffff80994c2c at log+0x5c
#10 0xffffffff80b24a17 at ip6_forward+0x107
#11 0xffffffff819d94be at pf_refragment6+0x16e
#12 0xffffffff819cb1f3 at pf_test6+0x1023
#13 0xffffffff819d315d at pf_check6_out+0x1d
#14 0xffffffff80a25354 at pfil_run_hooks+0x84
#15 0xffffffff820187db at bridge_pfil+0x25b
#16 0xffffffff8201970b at bridge_broadcast+0x22b
#17 0xffffffff820193ef at bridge_forward+0x20f


Thanks,
Jeroen
Comment 1 Kristof Provost 2017-07-11 07:56:01 UTC
Can you describe the setup of the machine? (Is it a gateway? Where does it route to? ...)
Comment 2 Jeroen Schutrup 2017-07-11 17:56:22 UTC
This server is running in the DMZ, so it receives all kinds of spooky internet traffic. It only speaks IPv4 to the outside world, though. It has a lagg0 interface running in failover mode on top of igb0 and igb1; these are the only two physical interfaces. Around 20 jails have a virtual IP on this lagg interface. There's a vlan0 virtual interface sending/receiving tagged traffic for VLAN 178; this is the route towards the internet. I'm using bridge1 for attaching bhyve VMs. Besides one to three tap interfaces, lagg0 is also a member of this bridge (which itself has no IP). Furthermore, there are two OpenVPN P2P tun interfaces running, and a tun for the OpenVPN server. Besides lo0 there's an additional loopback interface with an RFC1918 address.

With regard to forwarding, it only routes traffic for the OpenVPN clients, for which it also masquerades the source IP. It also forwards packets between the local jails and the P2P tun interfaces; in this case nothing is NATted. When the jails go to the internet, however, it SNATs to the lagg0 IP.

Let me know if you need any more extensive/detailed information.
Comment 3 Kristof Provost 2017-07-11 18:12:30 UTC
As the panic occurs in a pf_test6() path, I'd expect you to be able to work around the problem by dropping all incoming and outgoing v6 traffic.

Sadly the v6 fragment code will not work as expected on a transparent bridge. Fixing that is on my (long-term) todo list. I'll still try to see if there's something I can do about this panic, but getting it to work correctly in transparent bridge mode is not going to happen short-term.
Comment 4 Kristof Provost 2017-07-11 21:01:01 UTC
Okay, so the problem is that rcvif is NULL in the mbuf, but we try to log the receiving interface name in an error message, because the forwarding is incorrect.
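
For illustration, here is a minimal userland sketch of the failure mode, using hypothetical, simplified stand-ins for struct ifnet and the mbuf packet header rather than the real FreeBSD definitions. if_name() is only a field access on the interface pointer, so formatting the log message with a NULL rcvif dereferences NULL at the offset of the name field -- presumably why the fault virtual address in the trap above is the small value 0x28.

#include <stdio.h>

/*
 * Hypothetical, simplified stand-ins for struct ifnet and the mbuf
 * packet header -- not the real FreeBSD definitions.
 */
struct fake_ifnet {
	void	*if_softc;		/* fields preceding the name ... */
	void	*if_l2com;
	char	 if_xname[16];		/* interface name, e.g. "igb0" */
};

#define	if_name(ifp)	((ifp)->if_xname)

struct fake_pkthdr {
	struct fake_ifnet *rcvif;	/* receiving interface; never set here */
};

int
main(void)
{
	struct fake_pkthdr hdr = { .rcvif = NULL };

	/*
	 * In the kernel this is a log() call in ip6_forward(); a printf()
	 * is enough to show the crash when rcvif was left NULL on the
	 * refragmented mbuf.
	 */
	printf("cannot forward, received on %s\n", if_name(hdr.rcvif));
	return (0);
}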

pfSense dealt with this problem too:
https://github.com/pfsense/FreeBSD-src/commit/4204c9f01d2ab439f6e0b9454ab22d4ffcca8cc4

I think the pfSense fix is incorrect, because ifp is the outbound interface, not the receiving interface. It does address the panic, because m->m_pkthdr.rcvif is no longer NULL. The log message is probably wrong, but that's better than a panic.

I think it'd be better to set rcvif to the original rcvif, like so:

diff --git a/sys/netpfil/pf/pf_norm.c b/sys/netpfil/pf/pf_norm.c
index 81ef54d47a0..45e7b938d40 100644
--- a/sys/netpfil/pf/pf_norm.c
+++ b/sys/netpfil/pf/pf_norm.c
@@ -731,6 +731,7 @@ pf_refragment6(struct ifnet *ifp, struct mbuf **m0, struct m_tag *mtag)
        struct mbuf             *m = *m0, *t;
        struct pf_fragment_tag  *ftag = (struct pf_fragment_tag *)(mtag + 1);
        struct pf_pdesc          pd;
+       struct ifnet    *rcvif;
        uint32_t                 frag_id;
        uint16_t                 hdrlen, extoff, maxlen;
        uint8_t                  proto;
@@ -776,6 +777,7 @@ pf_refragment6(struct ifnet *ifp, struct mbuf **m0, struct m_tag *mtag)
        error = ip6_fragment(ifp, m, hdrlen, proto, maxlen, frag_id);
        m = (*m0)->m_nextpkt;
        (*m0)->m_nextpkt = NULL;
+       rcvif = (*m0)->m_pkthdr.rcvif;
        if (error == 0) {
                /* The first mbuf contains the unfragmented packet. */
                m_freem(*m0);
@@ -789,6 +791,7 @@ pf_refragment6(struct ifnet *ifp, struct mbuf **m0, struct m_tag *mtag)
        for (t = m; m; m = t) {
                t = m->m_nextpkt;
                m->m_nextpkt = NULL;
+               m->m_pkthdr.rcvif = rcvif;
                m->m_flags |= M_SKIP_FIREWALL;
                memset(&pd, 0, sizeof(pd));
                pd.pf_mtag = pf_find_mtag(m);

This is mostly untested though.
Also note that this just fixes the panic, not the underlying issue with IP6 fragments on filtering bridges.
Comment 5 Jeroen Schutrup 2017-11-04 10:25:33 UTC
On IRC I got some advice on how to prevent the kernel panic, which was to disable scrubbing. It's now been running stable for a few months, and I can live without using scrub. That said, I'll see if I can find some time to test your proposed workaround.
Comment 6 Kristof Provost 2019-06-04 11:21:08 UTC
This should be fixed with the introduction of PFIL_FWD. Please reopen if you can reproduce this on 12.0.