| Summary: | Fatal trap 12: page fault while in kernel mode (ip6_forward -> log) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | Jeroen Schutrup <jeroenschutrup> | ||||
| Component: | kern | Assignee: | Kristof Provost <kp> | ||||
| Status: | Closed FIXED | ||||||
| Severity: | Affects Only Me | CC: | kp | ||||
| Priority: | --- | ||||||
| Version: | 10.3-RELEASE | ||||||
| Hardware: | amd64 | ||||||
| OS: | Any | ||||||
|
Description
Jeroen Schutrup
2017-07-10 19:37:22 UTC
> Can you describe the setup of the machine? (Is it a gateway? Where does it route to? ...)

This server is running in the DMZ, so it receives all kinds of spooky internet traffic. It only speaks IPv4 to the outside world though. It has a lagg0 interface running in failover mode on top of igb0 and igb1. These are the only two physical interfaces. Around 20 jails have a virtual IP on this lagg interface. There's a vlan0 virtual interface sending/receiving tagged traffic for VLAN 178. This is the route towards the internet. I'm using bridge1 for attaching bhyve VMs. Besides one to three tap interfaces, lagg0 is also a member of this bridge (which itself has no IP). Furthermore there are two OpenVPN P2P tun interfaces running, and a tun for the OpenVPN server. Besides lo0 there's an additional loopback interface with an RFC1918 address.

With regard to forwarding, it routes traffic for the OpenVPN clients, for which it also masquerades the source IP. It also forwards packets between the local jails and the P2P tun interfaces. In this case nothing is NATted. When the jails go to the internet however, it SNATs to the lagg0 IP.

Let me know if you need any more extensive/detailed information.

As the panic occurs in a pf_test6() path I'd expect you to be able to work around the problem by dropping all incoming and outgoing v6 traffic.

Sadly the v6 fragment code will not work as expected on a transparent bridge. Fixing that is on my (long-term) todo list. I'll still try to see if there's something I can do about this panic, but getting it to work correctly in transparent bridge mode is not going to happen short-term.

Okay, so the problem is that rcvif is NULL in the mbuf, but we try to log the receiving interface name in an error log, because the forwarding is incorrect.
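The failure mode described above can be illustrated outside the kernel. What follows is a minimal userland sketch, not the FreeBSD code: `struct fake_ifnet`, `struct fake_pkthdr` and `rcvif_name()` are hypothetical stand-ins for `struct ifnet`, the mbuf packet header and whatever the log call reads. It only shows why formatting the receiving interface name goes wrong when `rcvif` is NULL, and what a defensive guard could look like.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct fake_ifnet {
	char if_xname[16];		/* interface name, e.g. "lagg0" */
};

struct fake_pkthdr {
	struct fake_ifnet *rcvif;	/* receiving interface, may be NULL */
};

/*
 * Return a printable name for the receiving interface.
 * Reading hdr->rcvif->if_xname without the NULL check is the
 * userland analogue of the page fault in this report.
 */
const char *
rcvif_name(const struct fake_pkthdr *hdr)
{
	assert(hdr != NULL);

	if (hdr->rcvif == NULL)
		return ("<unknown>");
	return (hdr->rcvif->if_xname);
}
```

A caller would pass the result to its logging routine instead of dereferencing `rcvif` directly. Note that a guard like this only hides the symptom; it does not restore the missing interface pointer.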
pfSense dealt with this problem too: https://github.com/pfsense/FreeBSD-src/commit/4204c9f01d2ab439f6e0b9454ab22d4ffcca8cc4

I think the pfSense fix is incorrect, because ifp is the outbound interface, not the receiving interface. It does address the panic, because m->m_pkthdr.rcvif is no longer NULL. The log message is probably wrong, but that's better than a panic.

I think it'd be better to set rcvif to the original rcvif, like so:

diff --git a/sys/netpfil/pf/pf_norm.c b/sys/netpfil/pf/pf_norm.c
index 81ef54d47a0..45e7b938d40 100644
--- a/sys/netpfil/pf/pf_norm.c
+++ b/sys/netpfil/pf/pf_norm.c
@@ -731,6 +731,7 @@ pf_refragment6(struct ifnet *ifp, struct mbuf **m0, struct m_tag *mtag)
 	struct mbuf *m = *m0, *t;
 	struct pf_fragment_tag *ftag = (struct pf_fragment_tag *)(mtag + 1);
 	struct pf_pdesc pd;
+	struct ifnet *rcvif;
 	uint32_t frag_id;
 	uint16_t hdrlen, extoff, maxlen;
 	uint8_t proto;
@@ -776,6 +777,7 @@ pf_refragment6(struct ifnet *ifp, struct mbuf **m0, struct m_tag *mtag)
 	error = ip6_fragment(ifp, m, hdrlen, proto, maxlen, frag_id);
 	m = (*m0)->m_nextpkt;
 	(*m0)->m_nextpkt = NULL;
+	rcvif = (*m0)->m_pkthdr.rcvif;
 	if (error == 0) {
 		/* The first mbuf contains the unfragmented packet. */
 		m_freem(*m0);
@@ -789,6 +791,7 @@ pf_refragment6(struct ifnet *ifp, struct mbuf **m0, struct m_tag *mtag)
 	for (t = m; m; m = t) {
 		t = m->m_nextpkt;
 		m->m_nextpkt = NULL;
+		m->m_pkthdr.rcvif = rcvif;
 		m->m_flags |= M_SKIP_FIREWALL;
 		memset(&pd, 0, sizeof(pd));
 		pd.pf_mtag = pf_find_mtag(m);

This is mostly untested though. Also note that this just fixes the panic, not the underlying issue with IPv6 fragments on filtering bridges.

On IRC I got some help preventing the kernel panic: disable scrubbing. It has now been running stable for a few months. I can live without using scrub, though I'll see if I can find some time to test your proposed workaround.

This should be fixed with the introduction of PFIL_FWD. Please reopen if you can reproduce this on 12.0. |
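The idea behind the proposed pf_refragment6() patch (remember the original packet's receiving interface before the original is released, then stamp it onto every fragment) can be sketched in isolation. This is a userland illustration under stated assumptions, not kernel code: `struct fake_pkt`, `struct fake_ifnet` and `inherit_rcvif()` are hypothetical names standing in for `struct mbuf`, `struct ifnet` and the loop in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real code uses struct mbuf and struct ifnet. */
struct fake_ifnet {
	const char *name;
};

struct fake_pkt {
	struct fake_ifnet *rcvif;	/* receiving interface */
	struct fake_pkt *nextpkt;	/* next packet in the chain */
};

/*
 * Copy the receiving interface of the original packet onto every
 * fragment hanging off its nextpkt chain, unlinking each fragment
 * as it goes. Returns the number of fragments updated.
 */
int
inherit_rcvif(struct fake_pkt *orig)
{
	struct fake_pkt *frag, *next;
	struct fake_ifnet *rcvif;
	int n = 0;

	assert(orig != NULL);

	/* Save the pointer first: in the kernel, orig may be freed next. */
	rcvif = orig->rcvif;
	frag = orig->nextpkt;
	orig->nextpkt = NULL;

	for (; frag != NULL; frag = next) {
		next = frag->nextpkt;
		frag->nextpkt = NULL;
		frag->rcvif = rcvif;
		n++;
	}
	return (n);
}
```

The ordering is the point of the sketch: the interface pointer is read while the original packet is still valid, so the fragments never depend on a packet that has already been released.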