| Summary: | Intel 82571EB quad port adapter causing ifconfig response delays when only one interface plugged in at boot | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | Timothy Pearson <tpearson> | ||||
| Component: | kern | Assignee: | freebsd-net (Nobody) <net> | ||||
| Status: | New --- | ||||||
| Severity: | Affects Some People | CC: | pkubaj, zlei | ||||
| Priority: | --- | Keywords: | IntelNetworking | ||||
| Version: | 13.1-STABLE | ||||||
| Hardware: | powerpc | ||||||
| OS: | Any | ||||||
| Attachments: |
Digging further, it appears that at some point in this process an invalid PCIe transaction occurs, forcing the controller off the bus. The subsequent hang may be ppc64le-specific, since x86 doesn't have an equivalent PCIe firewalling mechanism; however, the underlying bug is likely present on both platforms.

I can confirm the igb driver is also affected. It appears that as soon as CARP transitions from BACKUP to MASTER, something happens that causes invalid DMA to/from the card. X520 cards, which use the ixgbe driver, are confirmed not affected.

Sorry, but unfortunately we only have amd64 servers for testing. Can you try rebuilding the driver with CFLAGS+=-DVERBOSE_DEBUG and with #define DBG 1 (changed from 0) in e1000_osdep.h? You may also want to try the newer version of the driver from https://www.intel.com/content/www/us/en/download/15187/intel-network-adapter-gigabit-base-driver-for-freebsd.html.

Assigning to net@, but note that this may be powerpc-specific.

Created attachment 240807
Kernel output with debugging enabled
Here is a capture of the kernel log with debugging enabled. Note there are actually three identical 82571EB cards installed in this box, but only one has a network cable attached to one of its ports (em11) for this test.
I suspect the device is firewalled by the PCIe controller somewhere between the "carp: 6@em11: BACKUP -> MASTER (master timed out)" and "Unable to establish link!!!" lines.
(In reply to Timothy Pearson from comment #5) Can you please test without CARP?

(In reply to Zhenlei Huang from comment #6) Yes, I'll just need a bit of time to shuffle the configuration around. Two other data points: an Intel I340-T4 I had available to test with appears to work correctly, including with CARP, and the driver linked above does not compile under FreeBSD 13.1.

OK, it looks like the failure is triggered by data transmission; CARP was just the first transmission on the device after it was set up. With CARP disabled the log looks similar: everything looks OK, then it dies somewhere in the first few lines here:

    e1000_put_hw_semaphore
    __e1000_read_phy_reg_igp
    e1000_get_hw_semaphore
    e1000_read_phy_reg_mdic
    e1000_put_hw_semaphore
    e1000_get_laa_state_82571
    e1000_rar_set_generic
    e1000_get_laa_state_82571
    e1000_rar_set_generic
    e1000_check_for_copper_link
    e1000_phy_has_link_generic
    __e1000_read_phy_reg_igp
    e1000_get_hw_semaphore
    Driver can't access device - SMBI bit is set.
    e1000_put_hw_semaphore
    e1000_read_phy_reg_mdic
    MDI Error
    e1000_put_hw_semaphore
    __e1000_read_phy_reg_igp
    e1000_get_hw_semaphore
    Driver can't access device - SMBI bit is set.
    e1000_put_hw_semaphore
    e1000_read_phy_reg_mdic
    MDI Error
    e1000_put_hw_semaphore

(In reply to Timothy Pearson from comment #8) Note that when the device is firewalled, every access returns 0xff, so the "SMBI bit is set" complaint about the driver lock is probably just that bit reading back as 1 as part of an all-0xff response from the PCIe controller.

I have tested the card I have (dual-port Intel(R) PRO/1000 PT 82571EB/82571GB (Copper)) with both big-endian and little-endian systems on a Talos II, and it seems to work correctly. The only difference is that I am running FreeBSD 13.2-RC2, which I would recommend anyway if you're on POWER9, since starting with 13.2 radix MMU mode is enabled by default on POWER9. If you're on POWER8, compatibility is of course retained.
When using an 82571EB quad-port Ethernet adapter under FreeBSD 13.1, I am observing a persistent and significant delay in ifconfig responses if the machine was booted without all four Ethernet ports connected and active. The delay occurs while ifconfig is gathering media information and link state, and lasts for ~50 seconds.

Once the device enters this state, it no longer detects link state changes on Ethernet ports connected after boot. Connecting a network cable to such a port still registers link down, even though the card's lights show activity on that port.

Bad link state output:

    media: Ethernet autoselect
    status: no carrier

Correct link state output:

    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active

Even with all four ports connected, subsequent link state changes are not registered. I suspect there may be an issue communicating with the PHYs after initial driver setup has completed.

I am using CARP on these interfaces. Other Ethernet adapter models I have tested do not appear to exhibit this problem. Rarely (less than a 1 in 100 chance), the system starts normally on its own and all link state changes are reflected correctly until the next reboot. No abnormal messages appear in dmesg. The cards operate normally under Linux.