Bug 270163

Summary: Intel 82571EB quad port adapter causing ifconfig response delays when only one interface plugged in at boot
Product: Base System
Reporter: Timothy Pearson <tpearson>
Component: kern
Assignee: freebsd-net (Nobody) <net>
Status: New ---
Severity: Affects Some People
CC: pkubaj, zlei
Priority: ---
Keywords: IntelNetworking
Version: 13.1-STABLE
Hardware: powerpc
OS: Any
Attachments:
Description: Kernel output with debugging enabled
Flags: none

Description Timothy Pearson 2023-03-12 21:47:19 UTC
When using an 82571EB quad port Ethernet adapter under FreeBSD 13.1, I am observing a persistent and significant delay in ifconfig responses if the machine was booted without all four Ethernet ports connected and active.  This delay occurs while ifconfig is gathering media information and link state, and lasts for ~50 seconds.

Once the device enters this state, it no longer detects link state changes on Ethernet ports that have been connected after boot.  Connecting a network cable to the port continues to register link down, even though the card lights show activity on that port.

Bad link state output:
        media: Ethernet autoselect
        status: no carrier

Correct link state output:
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

Even with all four ports connected, subsequent link state changes are not registered.  I suspect there may be an issue communicating with the PHYs after initial driver setup has completed.

I am using CARP on these interfaces.  Other Ethernet adapter models I have tested do not appear to exhibit this problem.

Rarely (less than 1 in 100 chance), the system starts normally on its own and all link state changes are reflected correctly until the next reboot.

No abnormal messages appear in dmesg.

The cards operate normally under Linux.
Comment 1 Timothy Pearson 2023-03-12 22:08:02 UTC
Digging further, it appears that at some point in this process an invalid PCIe transaction occurs, forcing the controller off the bus.  The subsequent hang may be ppc64le specific, as x86 doesn't have an equivalent PCIe firewalling mechanism.  However, the underlying bug is likely present on both platforms.
Comment 2 Timothy Pearson 2023-03-13 00:11:22 UTC
I can confirm the igb driver is also affected.  It appears that as soon as CARP transitions from BACKUP to MASTER, something causes invalid DMA to or from the card.  X520 cards, which use the ixgbe driver, are confirmed not affected.
Comment 3 Piotr Kubaj freebsd_committer freebsd_triage 2023-03-13 00:35:05 UTC
Unfortunately, we only have amd64 servers for testing. Can you try rebuilding the driver with CFLAGS+=-DVERBOSE_DEBUG and with DBG defined to 1 (changed from 0) in e1000_osdep.h?
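
For readers unfamiliar with this kind of switch, here is a minimal C sketch of how a compile-time debug flag can gate trace output. It illustrates the general pattern only and is not the contents of e1000_osdep.h: the SKETCH_* macros, the counter, and sketch_read_phy_reg are hypothetical names made up for this example, and only the name DBG comes from the request above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the switch: 0 makes the macros below print nothing. */
#ifndef DBG
#define DBG 1
#endif

/* Bookkeeping for this sketch only, so the effect of DBG can be checked. */
static int debug_lines_emitted;

/* Print a formatted message only when DBG is nonzero. */
#define SKETCH_DEBUGOUT(...) \
    do { if (DBG) { printf(__VA_ARGS__); debug_lines_emitted++; } } while (0)

/* Print the name of the function being entered, one name per line. */
#define SKETCH_DEBUGFUNC(name) SKETCH_DEBUGOUT("%s\n", (name))

/* Hypothetical register read that announces itself when tracing is on. */
static int sketch_read_phy_reg(unsigned reg)
{
    SKETCH_DEBUGFUNC("sketch_read_phy_reg");
    assert(reg < 32);   /* MDIO register addresses are 5 bits wide */
    return 0;
}
```

With DBG set to 0 the same call compiles to a silent function, which is why a rebuild is needed to get a function-by-function trace.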

You may also want to try using the newer version of that driver from https://www.intel.com/content/www/us/en/download/15187/intel-network-adapter-gigabit-base-driver-for-freebsd.html.
Comment 4 Mark Linimon freebsd_committer freebsd_triage 2023-03-13 02:34:36 UTC
Assign to net@, but note that this may be powerpc specific.
Comment 5 Timothy Pearson 2023-03-13 03:29:07 UTC
Created attachment 240807: Kernel output with debugging enabled

Here is a capture of the kernel log with debugging enabled.  Note there are actually three identical 82571EB cards installed in this box, but only one has a network cable attached to one of its ports (em11) for this test.

I suspect the device is firewalled by the PCIe controller somewhere between the "carp: 6@em11: BACKUP -> MASTER (master timed out)" and "Unable to establish link!!!" lines.
Comment 6 Zhenlei Huang freebsd_committer freebsd_triage 2023-03-13 03:33:24 UTC
(In reply to Timothy Pearson from comment #5)
Can you please test without CARP?
Comment 7 Timothy Pearson 2023-03-13 03:58:15 UTC
(In reply to Zhenlei Huang from comment #6)

Yes, I'll just need a bit of time to shuffle the configuration around.

Two other data points:

An Intel I340-T4 I had available to test with appears to work correctly, including with CARP.

The driver linked above does not compile under FreeBSD 13.1.
Comment 8 Timothy Pearson 2023-03-13 05:04:02 UTC
OK, it looks like the failure occurs on data transmission; CARP was just the first transmission on the device after it was set up.  With CARP disabled, the log looks similar: everything looks OK, then it dies somewhere in the first few lines here:

e1000_put_hw_semaphore
__e1000_read_phy_reg_igp
e1000_get_hw_semaphore
e1000_read_phy_reg_mdic
e1000_put_hw_semaphore
e1000_get_laa_state_82571
e1000_rar_set_generic
e1000_get_laa_state_82571
e1000_rar_set_generic
e1000_check_for_copper_link
e1000_phy_has_link_generic
__e1000_read_phy_reg_igp
e1000_get_hw_semaphore
Driver can't access device - SMBI bit is set.
e1000_put_hw_semaphore
e1000_read_phy_reg_mdic
MDI Error
e1000_put_hw_semaphore
__e1000_read_phy_reg_igp
e1000_get_hw_semaphore
Driver can't access device - SMBI bit is set.
e1000_put_hw_semaphore
e1000_read_phy_reg_mdic
MDI Error
e1000_put_hw_semaphore
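
To narrow down where a trace like this goes wrong, it can help to locate the first error line mechanically. The following is a small C sketch under that assumption: first_line_with and trace_excerpt are hypothetical names for this example, and the excerpt is copied from the trace above rather than read from a real log file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the first line containing `needle`, or -1 if none does. */
static int first_line_with(const char *const lines[], size_t count,
                           const char *needle)
{
    assert(lines != NULL && needle != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strstr(lines[i], needle) != NULL)
            return (int)i;
    }
    return -1;
}

/* Excerpt copied from the trace above, starting at the link check. */
static const char *const trace_excerpt[] = {
    "e1000_check_for_copper_link",
    "e1000_phy_has_link_generic",
    "__e1000_read_phy_reg_igp",
    "e1000_get_hw_semaphore",
    "Driver can't access device - SMBI bit is set.",
    "e1000_put_hw_semaphore",
    "e1000_read_phy_reg_mdic",
    "MDI Error",
};
```

Searching the excerpt for "SMBI bit is set" points at the semaphore acquisition that follows e1000_check_for_copper_link, so the device must have become unreachable at or before that read.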
Comment 9 Timothy Pearson 2023-03-13 05:05:02 UTC
(In reply to Timothy Pearson from comment #8)

Note that when the device is firewalled, every access returns 0xff, so the "SMBI bit is set" message (a reference to a driver lock) probably just reflects that bit reading back as 1 within the all-ones response from the PCIe controller.
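
The effect described here can be shown with a short C sketch. It assumes the semaphore flag is bit 0 of a 32-bit register, which is an assumption made for this example; the driver's headers hold the real definition. All SKETCH_* and sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION for this sketch: the semaphore flag occupies bit 0. */
#define SKETCH_SMBI_BIT 0x00000001u

/* What a 32-bit read looks like when every byte comes back as 0xff. */
#define SKETCH_ALL_ONES 0xFFFFFFFFu

enum sketch_sem_state {
    SKETCH_SEM_FREE,        /* flag clear: the semaphore can be taken */
    SKETCH_SEM_HELD,        /* flag set in an otherwise plausible value */
    SKETCH_SEM_DEVICE_GONE  /* all ones: the read likely never reached the device */
};

/* A check that looks only at the flag bit. */
static int sketch_flag_is_set(uint32_t reg_value)
{
    return (reg_value & SKETCH_SMBI_BIT) != 0;
}

/* A check that first rules out the all-ones pattern. */
static enum sketch_sem_state sketch_classify(uint32_t reg_value)
{
    if (reg_value == SKETCH_ALL_ONES)
        return SKETCH_SEM_DEVICE_GONE;
    if (sketch_flag_is_set(reg_value))
        return SKETCH_SEM_HELD;
    return SKETCH_SEM_FREE;
}

/* Usage example: an all-ones read trips the flag test but is not a real lock. */
static void sketch_self_check(void)
{
    assert(sketch_flag_is_set(SKETCH_ALL_ONES));
    assert(sketch_classify(SKETCH_ALL_ONES) == SKETCH_SEM_DEVICE_GONE);
    assert(sketch_classify(0x1u) == SKETCH_SEM_HELD);
    assert(sketch_classify(0x0u) == SKETCH_SEM_FREE);
}
```

A flag-only test cannot tell a held semaphore from an unreachable device, which fits the reading that the lock message is a symptom rather than the cause.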
Comment 10 Piotr Kubaj freebsd_committer freebsd_triage 2023-03-16 23:38:48 UTC
I have tested the card I have (a dual-port Intel(R) PRO/1000 PT 82571EB/82571GB (Copper)) on a Talos II with both big-endian and little-endian systems, and it seems to work correctly. The only difference is that I am running FreeBSD 13.2-RC2, which I would recommend anyway if you're on POWER9, since radix is enabled by default on POWER9 starting with 13.2. If you're on POWER8, compatibility with it is of course retained.