Bug 246951

Summary: carp(4): Active CARP member crashes: panic, trap_pfault, ip_input || ip_output when using ipSec, AES-NI (on Intel I350)
Product: Base System Reporter: freebsd-bugzilla
Component: kernAssignee: Mark Johnston <markj>
Status: Closed FIXED    
Severity: Affects Some People CC: crest, freqlabs, igor-fbsdbugs, markj, net
Priority: --- Keywords: crash, needs-qa, regression
Version: 11.3-STABLEFlags: markj: mfc-stable12-
markj: mfc-stable11+
Hardware: amd64   
OS: Any   
URL: https://forum.netgate.com/topic/151329/pfsense-active-carp-member-crashed-aesni_process-crypto_dispatch/12
Attachments:
Description Flags
Compilation of crash dump information
none
Compilation of crash dump information with msgbuf.txt appended to each crash
none
dmesg.boot written out following the last kernel panic
none
output from the ifconfig command with public ips obfuscated
none
interfaces and LAGGs specified in pfSense config.xml
none
sysctl tunables from pfSense config.xml none

Description freebsd-bugzilla 2020-06-03 09:46:56 UTC
Created attachment 215186 [details]
Compilation of crash dump information

Hello,

We're using FreeBSD 11.3 Stable (pfSense). We have encountered numerous kernel panics during recent months (see attached). They are all trap_pfault and usually feature ip_input or ip_output.

The pfSense boxes are in a CARP configuration (the crash is always on the active firewall). We are using IPSec (AES256-GCM, SHA256) and AES-NI. Many but not all of the stack traces include IPSec and AES-NI calls.All interfaces are using switch independent failover LAGG.

Using FreeBSD 11.2-RELEASE-p3 without producing this problem. 

There is one other person who appears to have the same problem: https://forum.netgate.com/topic/151329/pfsense-active-carp-member-crashed-aesni_process-crypto_dispatch/12

Hardware is as follows:
* Dell PowerEdge R330
* Intel Xeon E3-1270 v5
* Intel I350 Quad Port 1GbE network cards (x2)

We are just wondering if you can give any pointers as to the potential cause/resolution?

Thank you
Comment 1 Mark Johnston freebsd_committer 2020-06-03 14:44:41 UTC
I looked at the ip_input() and ip_output() faults.  ip_input() is crashing here:

 709   if (ifp != NULL && ifp->if_flags & IFF_BROADCAST) {                                                                              
 710           IF_ADDR_RLOCK(ifp);
 711           TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
 712                   if (ifa->ifa_addr->sa_family != AF_INET) <- null ifa_addr
 713                           continue;

and ip_output() is crashing here:

 678                 if (error == 0) {                                                                                                        
 679                         /* Record statistics for this interface address. */
 680                         if (ia != NULL) {
 681                                 counter_u64_add(ia->ia_ifa.ifa_opackets, 1);
 682                                 counter_u64_add(ia->ia_ifa.ifa_obytes,
 683                                     m->m_pkthdr.len);
 684                         }

So there is some race involving interface address changes.

It would be useful to see the dmesg leading up to the panic.
Comment 2 freebsd-bugzilla 2020-06-04 09:42:26 UTC
Created attachment 215217 [details]
Compilation of crash dump information with msgbuf.txt appended to each crash
Comment 3 freebsd-bugzilla 2020-06-04 09:52:01 UTC
The textdump.tar created with each kernel panic contains a file called msgbuf.txt which looks like the content in /var/run/dmesg.boot, so I've appended that to the crash dump compilation.

Apologies, it is not brief.

There earlier dumps do show an IP address flapping between two different MAC addresses - but that is absent from the later dumps.

Thanks for the assistance so far.
Comment 4 Kubilay Kocak freebsd_committer freebsd_triage 2020-06-07 01:30:33 UTC
@Reporter Could you please add additional information, including:

- /var/run/dmesg.boot output (as an attachment)
- /etc/rc.conf network configuration (minimal reproducer configuration, sanitised where necessary, as an attachment)
- Details of reproducibility changes isolating any of the components of the configuration (ipsec, different interface, not incl lagg(4) etc)
- Any indications of events/traffic/workloads that might be relevent to triggering the crash
Comment 5 freebsd-bugzilla 2020-06-08 11:01:46 UTC
Created attachment 215352 [details]
dmesg.boot written out following the last kernel panic
Comment 6 freebsd-bugzilla 2020-06-08 11:02:40 UTC
Created attachment 215353 [details]
output from the ifconfig command with public ips obfuscated
Comment 7 freebsd-bugzilla 2020-06-08 11:03:32 UTC
Created attachment 215354 [details]
interfaces and LAGGs specified in pfSense config.xml
Comment 8 freebsd-bugzilla 2020-06-08 11:04:09 UTC
Created attachment 215355 [details]
sysctl tunables from pfSense config.xml
Comment 9 freebsd-bugzilla 2020-06-08 11:51:36 UTC
(In reply to Kubilay Kocak from comment #4)

Thank you.

I've attached dmesg.boot output.

This is pfSense so there is no rc.conf. I've attached instead:
 - ifconfig output
 - interfaces and LAGGs specified in pfSense config.xml
 - sysctl tunables specified in pfSense config.xml

The uptime between crashes varies between 2 and 30 days. It does not seem to correlate to any specific event that we are aware of or even peak throughput. The only additional package installed on these firewalls is NRPE.

In terms of workload:
 - HTTP/s traffic too and from customers
 - TCP load blancing of customer HTTP/s with 10 pools, 4 virtual servers per pool. Total of around 1.5 - 2 million active sessions
 - ipSec site-to-site tunnel for replication to our standby data centre
 - CARP / pfSync with bandwith/packet rates of 22-80 Mb/s, 2-8 Kpps
 - AES-NI enabled for IpSec (AES256-GCM)

The firewalls are handling:
 - 20-45 Mb/s (13-45 Kpps) inbound ipSec
 - 30-150 Mb/s (14-55 Kpps) outbound ipSec
 - 20-90 Mb/s (15-60 Kpps) inbound IP traffic
 - 50-250 Mb/s (15-60 Kpps) outbound IP traffic
 - 30-90k states
 - ~66k Mbuf Clusters utilised (out of 1M total)

The only other thing of note, that I can think of, is that we have a Cassandra cluster replicating over the IpSec tunnel. That's around 256 constantly changing states as data is replicated from one data centre to another.

We have now disabled IpSec and switched to OpenVPN for the site-to-site VPN, in order to see whether the crash is reproducable without IpSec

Additionally, I had setup a couple of FreeBSD 11.3 VMs with a site-to-site IpSec connection. I had continuous iperf running over the tunnel for 7 days without issue.

If there is any further information that I can provide, or anything I can do to assist, please don't hesitate.
Comment 10 Mark Johnston freebsd_committer 2020-06-09 15:15:00 UTC
(In reply to freebsd-bugzilla from comment #9)
So from the panic sites it looks like a linked ifaddr structure has been freed.  (Or zeroed, but I cannot see any code which does that.)  One possible cause that we can try to rule out is a refcount overflow.  You said that the uptime between panics is at least several days - is that consistently the case?  In other words, have you ever seen back-to-back crashes in a short timeframe?

I'm not sure if pfsense enables DTrace, but if so we can try to probe ifa_ref(), which is helpfully not inlined.  Please try this command (you might need to kldload dtraceall first):

# dtrace -n 'fbt::ifa_ref:entry {@[args[0]->ifa_refcnt] = count();} tick-300s {exit(0);}'

It will sleep for 300s and dump a table of reference counts.
Comment 11 igor-fbsdbugs 2020-06-25 15:57:44 UTC
unfortunately dtrace is not available on pfsense. is there another way to get the refcounts?

I can confirm, however, that it does take a few days for this crash to happen. usually within 4-8 days, but never immediately.

I have a handful of crash backtraces from previous crashes.
I will paste the first 3 few lines from each. let me know if you need the entire dumps:

Tracing pid 12 tid 100285 td 0xfffff8000ccf6620
ip_input() at ip_input+0x60e/frame 0xfffffe0451afee70
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451afeec0
ether_demux() at ether_demux+0x173/frame 0xfffffe0451afeef0


Tracing pid 12 tid 100263 td 0xfffff8000c88f620
ip_output() at ip_output+0x1418/frame 0xfffffe0451a15a60
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe0451a15ab0
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe0451a15b10


db:0:kdb.enter.default>  bt
Tracing pid 12 tid 100283 td 0xfffff8000c67b000
ip_output() at ip_output+0x1418/frame 0xfffffe0451af4a60
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe0451af4ab0
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe0451af4b10


Tracing pid 12 tid 100270 td 0xfffff8000c8cf000
ip_input() at ip_input+0x60e/frame 0xfffffe0451a38e70
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451a38ec0
ether_demux() at ether_demux+0x173/frame 0xfffffe0451a38ef0


Tracing pid 12 tid 100267 td 0xfffff8000c3f3620
ip_input() at ip_input+0x60e/frame 0xfffffe0451a29e70
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe0451a29ec0
ether_demux() at ether_demux+0x173/frame 0xfffffe0451a29ef0


Tracing pid 18 tid 100138 td 0xfffff800085d1620
ip_output() at ip_output+0x1418/frame 0xfffffe044f4181c0
ipsec_process_done() at ipsec_process_done+0x1c8/frame 0xfffffe044f418210
esp_output_cb() at esp_output_cb+0xeb/frame 0xfffffe044f418270


Tracing pid 12 tid 100264 td 0xfffff8000c15f000
in_broadcast() at in_broadcast+0x43/frame 0xfffffe0451a1ac40
ip_output() at ip_output+0x7be/frame 0xfffffe0451a1ad70
ip_forward() at ip_forward+0x2b5/frame 0xfffffe0451a1ae10

our pfsense host is relatively busy pushing about 300mbit of traffic through, with about half of it to the ipsec tunnel.
Comment 12 freebsd-bugzilla 2020-06-30 09:01:28 UTC
Hello,

Just to add we haven't seen a crash for 28 days. We have switched out IpSec for OpenVPN.

The issue looks to be fairly conclusively IpSec related.
Thanks
Comment 13 Mark Johnston freebsd_committer 2020-06-30 14:08:42 UTC
(In reply to freebsd-bugzilla from comment #12)
I see at least one such bug in ip_forward() in stable/11: in the IPSEC_ENABLED case there is a missing ifa_free() call.  Do you have IPv4 forwarding enabled, or do you use any forward actions in your firewall rule set.
Comment 14 igor-fbsdbugs 2020-06-30 15:58:04 UTC
sorry, it is not clear whom you are referring to. 
we might be dealing with two(or more) distinct bugs here.
in my case ipv4 forwarding is enabled (net.inet.ip.forwarding=1)
and I see ip_forward() in 50% of the backtraces I get when pfsense crashes.
Comment 15 Ryan Moeller freebsd_committer 2020-06-30 16:04:12 UTC
I have a FreeNAS bug report where ifa_addr is NULL on 11.3, too. I was suspecting some local patches we have for dealing with various IGMP / multicast issues, but I did not find anything in auditing those patches. I haven't found any way to explain ifa_addr being NULL in an ifma that is still linked into a list.

In my user's case the panic is cropping up in igmp_fasttimo. Though the circumstances are different, I wonder if these may be caused by the same underlying problem.

One thing I have thought to double-check next is making sure that the maddr locks match up with the correct ifp everywhere the list is modified.
Comment 16 Mark Johnston freebsd_committer 2020-06-30 16:15:33 UTC
(In reply to igor-fbsdbugs from comment #14)
Oops, I did not realize that there are multiple reporters.  I posted a patch for HEAD here: https://reviews.freebsd.org/D25519 .  There is an analogous bug in stable/11 and 12.  If forwarding is enabled (default in pfsense) and ipsec is in use, then it is possible that this bug is responsible for the crashes I was looking at in comment 1.

(In reply to Ryan Moeller from comment #15)
If you're able to use DTrace to track ifa refs, then you might try looking for leaks as suggested in comment 10.  Your case sounds like a different bug though.
Comment 17 commit-hook freebsd_committer 2020-07-01 15:43:27 UTC
A commit references this bug:

Author: markj
Date: Wed Jul  1 15:42:49 UTC 2020
New revision: 362840
URL: https://svnweb.freebsd.org/changeset/base/362840

Log:
  Fix a possible next-hop refcount leak when handling IPSec traffic.

  It may be possible to fix this by deferring the lookup, but let's
  keep the initial change simple to make MFCs easier.

  PR:		246951
  Reviewed by:	melifaro
  MFC after:	1 week
  Sponsored by:	The FreeBSD Foundation
  Differential Revision:	https://reviews.freebsd.org/D25519

Changes:
  head/sys/netinet/ip_input.c
Comment 18 commit-hook freebsd_committer 2020-07-08 13:41:19 UTC
A commit references this bug:

Author: markj
Date: Wed Jul  8 13:40:28 UTC 2020
New revision: 363009
URL: https://svnweb.freebsd.org/changeset/base/363009

Log:
  MFC r362840:
  Fix a possible refcount leak when handling IPSec traffic.

  PR:	246951

Changes:
  stable/11/sys/netinet/ip_input.c
Comment 19 Mark Johnston freebsd_committer 2020-07-08 13:43:26 UTC
I'm going to resolve this for now since I suspect that the patch fixes the original bug.  I'll give a heads-up to the pfsense folks so that it can be integrated there.
Comment 20 Mark Johnston freebsd_committer 2020-07-08 15:05:43 UTC
Note, the change was intentionally not merged to stable/12, as the bug does not exist there.  In particular, if you are seeing these panics on a stable/12-based branch, then the culprit lies elsewhere.