Bug 234766 - em(4) Intel 82579LM regression on Supermicro X9SCM-F
Summary: em(4) Intel 82579LM regression on Supermicro X9SCM-F
Status: Closed Feedback Timeout
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Kevin Bowling
URL:
Keywords: IntelNetworking, regression
Depends on:
Blocks:
 
Reported: 2019-01-08 18:44 UTC by Henry
Modified: 2021-09-25 19:21 UTC

Description Henry 2019-01-08 18:44:48 UTC
12.0-RELEASE-p1 on a Supermicro X9SCM-F using the on board em0 which is a 82579LM.

MSI-X is enabled by default despite the driver initialisation showing a problem with MSI-X: "Unable to map MSIX table"

em0: <Intel(R) PRO/1000 Network Connection> port 0xf020-0xf03f mem 0xfbb00000-0xfbb1ffff,0xfbb24000-0xfbb24fff irq 20 at device 25.0 on pci0
em0: attach_pre capping queues at 1
em0: using 1024 tx descriptors and 1024 rx descriptors
em0: msix_init qsets capped at 1
em0: Unable to map MSIX table 
em0: Using an MSI interrupt
em0: allocated for 1 tx_queues
em0: allocated for 1 rx_queues
em0: Ethernet address: xx:xx:xx:xx:xx:xx
em0: netmap queues/slots: TX 1/1024, RX 1/1024

# pciconf -lv
[...]
em0@pci0:0:25:0:        class=0x020000 card=0x150215d9 chip=0x15028086 rev=0x05 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82579LM Gigabit Network Connection (Lewisville)'
    class      = network
    subclass   = ethernet

This eventually leads to the interface going down. The box itself remains up and responsive, but the only way to restore the network is a reboot.

em0: TX(0) desc avail = 42, pidx = 988
em0: link state changed to DOWN
em0: TX(0) desc avail = 1024, pidx = 0
em0: TX(0) desc avail = 1024, pidx = 0
[...]
em0: TX(0) desc avail = 1024, pidx = 0


The workaround appears to be to disable MSI-X:

# sysctl dev.em.0.iflib.disable_msix=1
Comment 1 Henry 2019-01-08 18:52:19 UTC
It's worth adding that this interface worked flawlessly on 11.2-RELEASE.

dmesg portion from 11.2:
em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0xf020-0xf03f mem 0xfbb00000-0xfbb1ffff,0xfbb24000-0xfbb24fff irq 20 at device 25.0 on pci0
em0: Using an MSI interrupt
em0: Ethernet address: xx:xx:xx:xx:xx:xx
em0: netmap queues/slots: TX 1/1024, RX 1/1024
Comment 2 Henry 2019-01-09 19:50:14 UTC
So disabling MSI-X does not actually prevent this from happening; it just appears to increase the time until it's triggered.

I've now disabled LRO to see if that helps.
Comment 3 Eric Joyner freebsd_committer freebsd_triage 2019-01-09 21:24:42 UTC
(In reply to Henry David Bartholomew from comment #2)

I was going to post a comment about how setting that tunable doesn't really change anything other than suppressing that error message, because the only difference is whether iflib tries (and fails) to map the MSI-X BAR.

Regardless, that error message shouldn't be appearing at all on most em(4) devices, since only the 82574 supports MSI-X, so mapping the MSI-X BAR shouldn't even be attempted on the 82579.

I don't know what to do about the queue hangs, though. Have you tried disabling TSO, if it is enabled?
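
For anyone checking the offload state asked about above, here is a minimal sketch. It assumes the capabilities are listed in the "options=<...>" line that ifconfig prints for the interface; offload_enabled is a hypothetical helper name, not part of ifconfig.

```sh
#!/bin/sh
# Sketch: succeed (exit 0) if the named capability, e.g. TSO4, TSO6 or LRO,
# appears in the "options=<...>" line of ifconfig output read from stdin.
# offload_enabled is a hypothetical helper, not an ifconfig feature.
offload_enabled() {
    cap="$1"
    awk -v cap="$cap" '
        /^[ \t]*options=/ {
            # Keep only the text between "<" and ">", then split on commas.
            s = $0
            sub(/^[^<]*</, "", s)
            sub(/>.*$/, "", s)
            n = split(s, caps, ",")
            for (i = 1; i <= n; i++)
                if (caps[i] == cap) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Possible usage: "ifconfig em0 | offload_enabled TSO4 && ifconfig em0 -tso" and "ifconfig em0 | offload_enabled LRO && ifconfig em0 -lro". The -tso and -lro flags only change the running configuration, so they would need to be repeated after a reboot.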
Comment 4 Henry 2019-01-10 01:56:00 UTC
Disabling LRO doesn't help.

TSO wasn't enabled.

I've switched to using the kernel module from net/intel-em-kmod in the hope it's less buggy.
Comment 5 Henry 2019-01-24 23:56:27 UTC
I've been using the net/intel-em-kmod driver for the last two weeks, and it has performed flawlessly.
Comment 6 ncrogers 2019-02-05 02:39:59 UTC
It's possible this was fixed by:

https://svnweb.freebsd.org/base?view=revision&revision=343578
Comment 7 freebsd 2019-05-22 21:43:24 UTC
I'm seeing similar behavior on a different Intel chip:

$ freebsd-version -kr
12.0-RELEASE-p4
12.0-RELEASE-p4

$ pciconf -lv
...
em0@pci0:0:31:6:	class=0x020000 card=0x86721043 chip=0x15b88086 rev=0x31 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection (2) I219-V'
    class      = network
    subclass   = ethernet

I will suddenly get the following kernel messages (the avail and pidx values differ at each failure):
   kernel: em0: TX(0) desc avail = 42, pidx = 208
   kernel: em0: link state changed to DOWN

Followed by a constant stream of the following messages:
   kernel: em0: TX(0) desc avail = 1024, pidx = 0
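
The signature above (a link-down message followed by repeated "pidx = 0" lines) can be checked for mechanically. This is a minimal sketch that assumes the messages keep exactly the wording quoted in this report; detect_em_hang and its "hang"/"ok" output are hypothetical names chosen for illustration.

```sh
#!/bin/sh
# Sketch: read kernel log text on stdin and print "hang" if a
# "<if>: link state changed to DOWN" line is later followed by a
# "<if>: TX(n) desc avail = ..., pidx = 0" line, otherwise print "ok".
# detect_em_hang is a hypothetical helper; the interface defaults to em0.
detect_em_hang() {
    ifname="${1:-em0}"
    awk -v ifn="$ifname" '
        index($0, ifn ": link state changed to DOWN") { down = 1; next }
        index($0, ifn ": link state changed to UP")   { down = 0; next }
        down && index($0, ifn ": TX(") && /pidx = 0[ \t]*$/ { hang = 1 }
        END { print (hang ? "hang" : "ok") }
    '
}
```

Possible usage: "dmesg | detect_em_hang em0". Matching on the log text alone is a heuristic; it cannot tell whether the queue has actually stalled.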

This seems to be triggered by periods of high network activity. For example, another computer has a backup task set to run once daily, which backs up a Samba share hosted by this computer. This network failure often happens shortly after that task begins.

I used 11.2-RELEASE on this computer for some time and never encountered this behavior with it.

I'll give net/intel-em-kmod a try.
Comment 8 freebsd 2019-05-25 23:52:20 UTC
It looks like using net/intel-em-kmod also works around the issue on my card.
Comment 9 Kevin Bowling freebsd_committer freebsd_triage 2021-04-15 06:54:47 UTC
Is this still problematic on 12.2 or 13.0?
Comment 10 Kevin Bowling freebsd_committer freebsd_triage 2021-09-25 19:21:22 UTC
Please re-open if you see this with supported versions of FreeBSD.