Bug 240944 - em(4): Crash with Intel 82571EB NIC with AMD Piledriver and Steamroller APUs
URL: https://www.reddit.com/r/PFSENSE/comm...
Keywords: IntelNetworking, crash, needs-qa
Reported: 2019-09-30 16:50 UTC by tinfever
Modified: 2020-05-08 23:09 UTC
koobs: mfc-stable12?
Description tinfever 2019-09-30 16:50:46 UTC
When using an Intel 82571EB quad-port gigabit NIC (HP NC364T) in a system with an AMD Piledriver or Steamroller APU, any taxing Ethernet workload will cause the entire system to either lock up permanently (recoverable only by holding down the power button for a hard reset) or crash and reboot.

Initial discussion of this issue can be found in the pfSense subreddit: https://www.reddit.com/r/PFSENSE/comments/da6nh7/multiport_intel_82571ebbased_network_cards_not/

I have confirmed this issue is present even on the FreeBSD 12.0 release so it is not an issue introduced by pfSense or OPNsense.

Hardware used for testing and reproduction:

HP T730 thin client
AMD RX-427BB APU (4 core)
HP NC364T Network Controller (Uses Intel 82571EB chipset)

Steps to reproduce:

1. Download "FreeBSD-12.0-RELEASE-amd64-mini-memstick.img"
2. Use Etcher to write the image to a flash drive.
3. Boot from the flash drive in UEFI or Legacy mode.
4. Run as live CD instead of starting the installer.
5. Run "dhclient em3" to get network access.
6. Run command "fetch http://speedtest.tele2.net/1GB.zip -o /dev/null"
7. Crash happens within the first 60 seconds.
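
Steps 5 and 6 can be wrapped in a small script so the workload is started the same way on every test run. This is only a sketch: the interface name and URL are the ones given above, while the function name "reproduce", the IFACE/URL variables, and the "run" argument are illustrative and not part of the original test. The -o option is placed before the URL, the conventional order for fetch(1).

```shell
#!/bin/sh
# Sketch of steps 5 and 6. IFACE and URL default to the values from the
# report; the function name and the "run" guard are illustrative only.

IFACE="${IFACE:-em3}"
URL="${URL:-http://speedtest.tele2.net/1GB.zip}"

reproduce() {
    # Step 5: obtain a DHCP lease on the 82571EB port.
    dhclient "$IFACE" || return 1
    # Step 6: download a large file and discard it to load the NIC.
    fetch -o /dev/null "$URL"
}

# Start the workload only when called with "run", so the file can also
# be sourced without touching the network.
if [ "${1:-}" = "run" ]; then
    reproduce
fi
```

Saved as, for example, repro.sh, it would be started with "sh repro.sh run"; a different port can be tested with "IFACE=em0 sh repro.sh run".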

When it crashes, the system goes 100% unresponsive. No amount of Ctrl-Alt-Del will do anything. I have to hold the power button to do a hard reboot. The system has been left in the unresponsive state for 12+ hours with no change. There are no log entries or bug checks present (that I have seen).

Troubleshooting done:

1. Issue reoccurs using both UEFI and Legacy boot methods
2. Issue reoccurs with onboard NIC disabled
3. Issue reoccurs using included 7.6.1 and latest 7.7.5 Intel drivers
4. Issue reoccurs when using either USB or SSD boot media
5. Hardware works perfectly fine when booting to Ubuntu live USB for testing
6. Hardware passed memtest64+ @ 4 passes

Other user reports (see Reddit discussion link) indicate that:

1. Issue reoccurs on every NC364T card when testing several of them
2. Issue occurs on NC360T (dual-port Intel 82571EB NIC)
3. Issue does not occur with NC112T (Intel 82574L-based single port NIC)
4. Issue occurs with other versions of pfSense including 2.1.5, 2.2.4, 2.3.3, and 2.5.0 (20190928). I have not looked up the correlation between pfSense and FreeBSD versions.

I'm no Linux/Unix guru, nor do I have very much time I can allocate to seeing this through to a resolution, but I will try to assist in any testing or data collection needed to hopefully address this issue. Thanks.
Comment 1 Krzysztof Galazka 2019-09-30 19:11:30 UTC
(In reply to tinfever from comment #0)
Could you please provide the output of "pciconf -l -vbc" and dmesg for that NIC?
Comment 2 Ron 2019-10-01 03:47:28 UTC
Created attachment 207973
pciconf -l -vbc and dmesg from AMD Piledriver APU and HP NC364T NIC

pciconf -l -vbc and dmesg as requested in comment #1
Comment 3 tinfever 2019-10-01 05:57:04 UTC
Created attachment 207977
pciconf -l -vbc from NC364T and AMD RX-427BB

Attached copy of requested output of "pciconf -l -vbc" on system using NC364T and AMD RX-427BB.
Comment 4 tinfever 2019-10-01 06:01:30 UTC
Created attachment 207978
dmesg from NC364T and AMD RX-427BB Steamroller

Attached copy of requested output of dmesg on system using NC364T and AMD RX-427BB.

I had surprising difficulty getting these logs off the machine since even an ssh session seems to be enough to crash everything sometimes. I've also noticed that if you catch it crashing and start mashing buttons on the keyboard, you can see it register the key presses really slowly for a second until it eventually registers nothing at all.
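
For anyone else gathering the same data, one way to avoid the network entirely is to write the output to local storage and flush it to disk straight away. The following is a minimal sketch, assuming a writable directory is available; the path /mnt/usb, the OUTDIR variable, and the function name "collect_nic_info" are hypothetical and not taken from this report.

```shell
#!/bin/sh
# Sketch: save the diagnostics requested in comment #1 to local files
# instead of copying them over SSH. OUTDIR is a hypothetical mount point.

OUTDIR="${OUTDIR:-/mnt/usb}"

collect_nic_info() {
    dir="$1"
    mkdir -p "$dir" || return 1
    # Same commands as requested in comment #1.
    pciconf -l -vbc > "$dir/pciconf.txt" 2>&1
    dmesg > "$dir/dmesg.txt" 2>&1
    # Flush buffers so the files are on disk if the system hangs later.
    sync
}

# Collect only when called with "run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    collect_nic_info "$OUTDIR"
fi
```

Running this before bringing up any em(4) interface should leave both files on the storage device, which can then be read on another machine.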
Comment 5 bhertenstein97 2020-05-08 23:09:37 UTC
Dropping in to state that I am seeing the same issue with my HP t730 and both an HP NC364T and an NC360T. I followed the same reproduction steps as tinfever but with the 12.1 release. Also tested with pfSense 2.5 (based on FreeBSD 12.1) running as an iperf server.

I do also see this with pfSense 2.4.5 (based on FreeBSD 11.3).

If any further information is needed, please let me know.