When using an Intel 82571EB quad-port gigabit NIC (HP NC364T) in a system with an AMD Piledriver or Steamroller APU, any taxing Ethernet workload causes the entire system either to lock up permanently (requiring a hard reset by holding down the power button) or to crash and reboot.
Initial discussion of this issue can be found in the pfSense subreddit: https://www.reddit.com/r/PFSENSE/comments/da6nh7/multiport_intel_82571ebbased_network_cards_not/
I have confirmed this issue is present even on the FreeBSD 12.0 release, so it was not introduced by pfSense or OPNsense.
Hardware used for testing and reproduction:
HP T730 thin client
AMD RX-427BB APU (4-core)
HP NC364T network controller (uses the Intel 82571EB chipset)
Steps to reproduce:
1. Download "FreeBSD-12.0-RELEASE-amd64-mini-memstick.img"
2. Write the image to a flash drive with Etcher
3. Boot from the flash drive in either UEFI or Legacy mode
4. Run it as a live CD instead of the installer
5. Run "dhclient em3" to get network access
6. Run the command "fetch http://speedtest.tele2.net/1GB.zip -o /dev/null"
7. The crash happens within the first 60 seconds
When it crashes, the system becomes 100% unresponsive. No amount of Ctrl-Alt-Del does anything; I have to hold the power button to do a hard reboot. The system has been left in the unresponsive state for 12+ hours with no change. As far as I have seen, there are no log entries or bug checks.
1. The issue recurs with both UEFI and Legacy boot methods
2. The issue recurs with the onboard NIC disabled
3. The issue recurs with both the included 7.6.1 and the latest 7.7.5 Intel drivers
4. The issue recurs with either USB or SSD boot media
5. The hardware works perfectly fine when booted from an Ubuntu live USB for testing
6. The hardware passed memtest86+ (4 passes)
Other user reports (see Reddit discussion link) indicate that:
1. The issue recurs on every NC364T card when testing several of them
2. Issue occurs on NC360T (dual-port Intel 82571EB NIC)
3. Issue does not occur with NC112T (Intel 82574L-based single port NIC)
4. Issue occurs with other versions of pfSense including 2.1.5, 2.2.4, 2.3.3, and 2.5.0 (20190928). I have not looked up the correlation between pfSense and FreeBSD versions.
I'm no Linux/Unix guru, nor do I have much time to see this through to a resolution, but I will try to assist with any testing or data collection needed to help address this issue. Thanks.
(In reply to tinfever from comment #0)
Could you please provide the output of "pciconf -l -vbc" and dmesg for that NIC?
Created attachment 207973
pciconf -l -vbc and dmesg from AMD Piledriver APU and HP NC364T NIC
pciconf -l -vbc and dmesg as requested in comment #1
Created attachment 207977
pciconf -l -vbc from NC364T and AMD RX-427BB
Attached copy of requested output of "pciconf -l -vbc" on system using NC364T and AMD RX-427BB.
Created attachment 207978
dmesg from NC364T and AMD RX-427BB Steamroller
Attached copy of requested output of dmesg on system using NC364T and AMD RX-427BB.
I had surprising difficulty getting these logs off the machine, since even an SSH session is sometimes enough to crash everything. I've also noticed that if you catch it as it crashes and start mashing keys, you can see it register the key presses very slowly for a second until it stops registering anything at all.
Dropping in to state that I am seeing the same issue with my HP t730 and both a HP NC364T and NC360T. I followed the same reproduction steps as tinfever but with the 12.1 release. Also tested with pfSense 2.5 (based on FreeBSD 12.1) running as an iperf server.
I do also see this with pfSense 2.4.5 (based on FreeBSD 11.3).
If any further information is needed, please let me know.
The same issue happens to me :| I am running it on an HP T730 with an HP NC365T network controller, a 32GB SSD, and 2x4GB RAM (brand new).
I've tried to make it work with pfSense 2.4.5 and 2.5 (FreeBSD 12.2-STABLE), swapping in different RAM sticks, SSDs, and NICs. The only thing I have not changed is the CPU.
It works for about an hour, maybe less, and then becomes unresponsive and requires a hard reboot.
Let me know if you've found any solution or workaround, or whether I need to repurpose the box for something else.
Stumbled upon the same issue today. Took the card out and it works fine again. Happy to provide any details if necessary.
(In reply to Ace from comment #7)
Please do, I'd like to see the output of 'ifconfig em0' and a 'dmesg'.
Created attachment 227688
Created attachment 227689
ifconfig em0 output
Created attachment 227690
(In reply to Kevin Bowling from comment #8)
Thanks, I've attached them. I've also included the pciconf -l -vbc output for good measure, as requested in comment #1.
(In reply to Ace from comment #12)
> ecap 0001 = AER 1 0 fatal 1 non-fatal 2 corrected
You have some fatal PCI errors occurring on the card, and that looks consistent with the other pciconf reports. As a low-effort first guess, can you try disabling PCIe link power management (ASPM) and/or AER (Advanced Error Reporting) in the system's firmware and see what happens?
Beyond that, there are a number of relevant errata we may need to check against the driver to see if we are missing a mitigation: http://iommu.com/datasheets/e1000-datasheets/82571eb-82572ei-gbe-controller-spec-update.pdf. The two firmware changes above stand out to me as ways to eliminate some possible causes.
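For anyone who wants to compare their own "pciconf -l -vbc" output against the AER line quoted above, here is a minimal Python sketch that extracts the AER fields from a dump. It assumes the line format shown in this thread, and assumes (as I understand FreeBSD's pciconf) that the first number after "AER" is the capability version, followed by counts of the fatal, non-fatal and corrected error status bits currently set (not running totals of events). The names AerStatus, parse_aer and scan are hypothetical helpers, not part of any FreeBSD tool.

```python
import re
from typing import List, NamedTuple, Optional


class AerStatus(NamedTuple):
    """Fields from a pciconf 'ecap 0001 ... = AER ...' line (assumed format)."""
    version: int
    fatal: int
    non_fatal: int
    corrected: int


# Assumed pattern for the AER extended-capability line; the optional
# "[offset]" group also accepts lines such as "ecap 0001[100] = AER ...".
_AER_RE = re.compile(
    r"ecap\s+0001(?:\[[0-9a-fA-F]+\])?\s*=\s*AER\s+(\d+)"
    r"\s+(\d+)\s+fatal\s+(\d+)\s+non-fatal\s+(\d+)\s+corrected"
)


def parse_aer(line: str) -> Optional[AerStatus]:
    """Return the AER fields from one pciconf line, or None if it is not an AER line."""
    m = _AER_RE.search(line)
    if m is None:
        return None
    version, fatal, non_fatal, corrected = (int(g) for g in m.groups())
    return AerStatus(version, fatal, non_fatal, corrected)


def scan(text: str) -> List[AerStatus]:
    """Collect the AER status of every AER line in a full pciconf dump."""
    return [s for s in map(parse_aer, text.splitlines()) if s is not None]


if __name__ == "__main__":
    # The line quoted in comment #13.
    sample = "    ecap 0001 = AER 1 0 fatal 1 non-fatal 2 corrected"
    print(parse_aer(sample))  # AerStatus(version=1, fatal=0, non_fatal=1, corrected=2)
```

Piping a saved dump through scan() gives one entry per device that exposes AER, which makes it easier to spot which port is accumulating status bits.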
> try disabling PCI Link Power management (ASPM) and/or AER (advanced error reporting) in the system's firmware and see what happens?
Sorry mate, I'm struggling to figure out how to do this, and sorry if the following sounds dumb in this context. I'm using UEFI but don't see any such option, nor can I find anything online on how to disable these.
(In reply to Ace from comment #14)
If the UEFI had options for it, I think they would be obvious, so it may simply not expose those knobs. It will be tricky to proceed and make any fixes without a card to test with.
(In reply to Kevin Bowling from comment #15)
I just bought a new Intel i350-T4, which users online have reported works without issues in combination with the HP T730 and OPNsense, so fingers crossed that it will fare better.
Having said that, if you are UK-based, I'd be happy to post the HP NC364T card to you if it helps other users, since I'll have no use for it. Please contact me directly via email if you're up for that.