Summary: | em(4) driver not working for Intel 82583V Gigabit chip | ||
---|---|---|---|
Product: | Base System | Reporter: | Joshua Kinard <freebsd> |
Component: | kern | Assignee: | Marius Strobl <marius> |
Status: | Closed FIXED | ||
Severity: | Affects Only Me | CC: | brent, brewmeister2112, krzysztof.galazka, luigiitaliano, mail, marius, naito.yuichiro, ncrogers, net, wolfgang |
Priority: | --- | Keywords: | IntelNetworking, regression |
Version: | 12.0-RELEASE | ||
Hardware: | Any | ||
OS: | Any |
Description
Joshua Kinard
2019-01-23 10:09:46 UTC
    # pciconf -lvBbcV
    (for em0 only, repeats through em5, only the BARs change)

    em0@pci0:1:0:0: class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = '82583V Gigabit Network Connection'
        class      = network
        subclass   = ethernet
        bar   [10] = type Memory, range 32, base 0xdfe00000, size 131072, enabled
        bar   [18] = type I/O Port, range 32, base 0xe000, size 32, enabled
        bar   [1c] = type Memory, range 32, base 0xdfe20000, size 16384, enabled
        cap 01[c8] = powerspec 2  supports D0 D3  current D0
        cap 05[d0] = MSI supports 1 message, 64 bit
        cap 10[e0] = PCI-Express 1 endpoint max data 128(256) RO NS
                     link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
        cap 11[a0] = MSI-X supports 7 messages, enabled
                     Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
        ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
        ecap 0003[140] = Serial 1 xxxxxxffffxxxxxx

(MAC redacted on ecap 0003)

I'm having the exact same problem with the Minisys IBOX-501 N13. Looks like the hardware is identical.

Same here. Asrock 990FX Extreme 9 integrated LAN

(In reply to Joshua Kinard from comment #1)
According to datasheet (https://www.intel.com/content/www/us/en/embedded/products/networking/82583v-gbe-controller-datasheet.html) 82583V supports only MSI and legacy interrupts. That might be the cause why it's not working when iflib tries to use MSI-X. I think this patch https://reviews.freebsd.org/rS343864 in 12-STABLE should fix that.

(In reply to Krzysztof Galazka from comment #4)
I gave it a really quick test by trying to apply the diff from D18980 via 'cat <patch> | patch -p2' while in /usr/src/sys, and rebuilding a GENERIC 12.0-RELEASE-p3 kernel on a secondary machine. Copied the newer /boot/kernel over to the networking appliance via USB and booted into the new kernel.
Doesn't seem to have had any effect:

    FreeBSD 12.0-RELEASE-p3 GENERIC amd64
    FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)
    [snip]
    pcib1: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
    pci1: <ACPI PCI bus> on pcib1
    em0: <Intel(R) PRO/1000 Network Connection> port 0xe000-0xe01f mem 0xdfe00000-0xdfe1ffff,0xdfe20000-0xdfe23fff irq 16 at device 0.0 on pci1
    em0: Using 1024 tx descriptors and 1024 rx descriptors
    em0: Using 1 rx queues 1 tx queues
    em0: Using MSI-X interrupts with 2 vectors
    em0: Ethernet address: xx:xx:xx:xx:xx:xx
    em0: netmap queues/slots: TX 1/1024, RX 1/1024
    [snip]

    # ifconfig -a
    em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=81249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER>
        ether xx:xx:xx:xx:xx:xx
        inet a.b.c.d netmask 0xffffff00 broadcast a.b.c.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

    # ping 8.8.8.8
    PING 8.8.8.8 (8.8.8.8): 56 data bytes
    ping: sendto: No route to host
    ping: sendto: No route to host
    ping: sendto: No route to host

I can't rule out that I need to apply other parts of that patch, but I find that Phabricator interface to be incredibly confusing. Is there a classic gitweb sitting around somewhere that will cough up a standard unidiff patch of the whole commit? Or maybe the patch alone doesn't fix it?

---

So that said, the info on this chipset not being compatible w/ MSI-X was very helpful, thank you. I had to dig around, as the iflib change made obsolete the old "hw.x.*" tunables in /boot/loader.conf (and that's all that stupid Google wants to find right now), but I found the correct tunable names for newer iflib stuff, and disabled MSI-X there.
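The comment above does not name the tunables it used. As a rough sketch only: iflib(4) exposes a per-device `disable_msix` knob under each device's sysctl tree, so the lines for /boot/loader.conf might look like the output of the script below. The tunable name `dev.em.<unit>.iflib.disable_msix` and the helper `emit_disable_msix_tunables` are assumptions for illustration, not something quoted from this thread; the unit count of six comes from the report (em0 through em5).

```shell
#!/bin/sh
# Print loader.conf(5) lines that ask iflib(4) not to use MSI-X on
# em0 .. em(N-1).
# ASSUMPTION: the tunable is named dev.em.<unit>.iflib.disable_msix;
# the thread itself does not spell out the name.
emit_disable_msix_tunables() {
    count=$1
    unit=0
    while [ "$unit" -lt "$count" ]; do
        printf 'dev.em.%d.iflib.disable_msix="1"\n' "$unit"
        unit=$((unit + 1))
    done
}

# Six ports, as on the reporter's appliance (em0 through em5).
emit_disable_msix_tunables 6
```

The output would be appended to /boot/loader.conf and takes effect on the next boot. A later comment in this thread reports that the global `hw.pci.enable_msix=0` also works, at the cost of disabling MSI-X for every PCI device rather than only the em(4) ports.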
After another reboot, we have contact:

    # ping 8.8.8.8
    PING 8.8.8.8 (8.8.8.8): 56 data bytes
    64 bytes from 8.8.8.8: icmp_seq=0 ttl=120 time=12.144 ms
    64 bytes from 8.8.8.8: icmp_seq=1 ttl=120 time=12.667 ms
    64 bytes from 8.8.8.8: icmp_seq=2 ttl=120 time=10.681 ms
    64 bytes from 8.8.8.8: icmp_seq=3 ttl=120 time=10.519 ms
    ^C
    --- 8.8.8.8 ping statistics ---
    4 packets transmitted, 4 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 10.519/11.503/12.667/0.923 ms
    root@armenelos:~ #

So! It looks like the fix for these specific chips would be for em(4) to automatically disable MSI-X when it detects the specific PCI ID of these chipsets.

(In reply to Joshua Kinard from comment #5)
There is a new patch under review, which disables MSI-X support in em(4) for devices other than 82574: https://reviews.freebsd.org/D19108. Thank you for all the details you have provided and thank Marius for the patch.

A commit references this bug:

Author: marius
Date: Sat Feb 9 11:58:41 UTC 2019
New revision: 343934
URL: https://svnweb.freebsd.org/changeset/base/343934

Log:
  - Remove the redundant device disabled hint handling; ever since r241119 that's performed globally by device_attach(9).
  - As for the EM-class of devices, em(4) supports multiple queues and MSI-X respectively only with 82574 devices. However, since the conversion to iflib(4), em(4) relies on the interrupt type fallback mechanism, i. e. MSI-X -> MSI -> INTx, of iflib(4) to figure out the interrupt type to use for the EM-class (as well as the IGB-class) of MACs. Moreover, despite the datasheet for 82583V not mentioning any support of MSI-X, there actually are 82583V devices out there that report a varying number of MSI-X messages as supported.
    The interrupt type fallback of iflib(4) is causing two failure modes depending on the actual number of MSI-X messages supported for such instances of 82583V:
    1) With only one MSI-X message supported, none is left for the RX/TX queues as that one message gets assigned to the admin interrupt. Worse, later on - which will be addressed with a separate fix - iflib(4) interprets that one messages as MSI or INTx to be set up, but fails to actually do so as it has previously called pci_alloc_msix(9). [1, 2]
    2) With more message supported, their distribution is okay but then em_if_msix_intr_assign() doesn't work for 82583V, with the interface being left in a non-working state, too. [3]
    Thus, let em_if_attach_pre() indicate to iflib(4) to try MSI-X with 82574 only, and at most MSI for the remainder of EM-class devices. While at it, remove "try_second_bar" as it's polarity inverted and not actually needed.
  - Remove code from em_if_timer() that effectively is a NOP since the conversion to iflib(4) ("trigger" is no longer read). While at it, let the comment for em_if_timer() reflect reality after said conversion.
  - Implement an ifdi_watchdog_reset method which only updates the em(4) "watchdog_events" counter but doesn't perform any reset, so that the em(4) "watchdog_timeouts" SYSCTL (iflib(4) doesn't provide a counterpart) reflects reality and these timeouts add to IFCOUNTER_OERRORS again after the iflib(4) conversion.
  - Remove the "mbuf_defrag_fail" and "tx_dma_fail" SYSCTLS; since the iflib(4) conversion, associated counters are disconnected, but iflib(4) provides "mbuf_defrag_failed" and "tx_map_failed" respectively as equivalents.
  - Move the description preceding lem_smartspeed() to the correct spot before em_reset() and bring back appropriate comments for {igb,em}_initialize_rss_mapping() and lem_smartspeed() lost in the iflib(4) conversion.
  - Adapt some other function descriptions and INIT_DEBUGOUT() use to match reality after the iflib(4) conversion.
  - Put the debugging message of em_enable_vectors_82574() (missed in r343578) under bootverbose, too.

  PR: 219428 [1], 235246 [2], 235147 [3]
  Reviewed by: erj (previous version)
  Differential Revision: https://reviews.freebsd.org/D19108

Changes:
  head/sys/dev/e1000/if_em.c
  head/sys/dev/e1000/if_em.h

I had the same problem on probably the same hardware (Axiomtek NA-320) after updating from 11.2-STABLE to 12-STABLE r343942

Adding hw.pci.enable_msix=0 to /boot/loader.conf and rebooting fixed it for me.

A commit references this bug:

Author: marius
Date: Wed Feb 13 14:39:17 UTC 2019
New revision: 344098
URL: https://svnweb.freebsd.org/changeset/base/344098

Log:
  MFC: r343934

  - Remove the redundant device disabled hint handling; ever since r241119 that's performed globally by device_attach(9).
  - As for the EM-class of devices, em(4) supports multiple queues and MSI-X respectively only with 82574 devices. However, since the conversion to iflib(4), em(4) relies on the interrupt type fallback mechanism, i. e. MSI-X -> MSI -> INTx, of iflib(4) to figure out the interrupt type to use for the EM-class (as well as the IGB-class) of MACs. Moreover, despite the datasheet for 82583V not mentioning any support of MSI-X, there actually are 82583V devices out there that report a varying number of MSI-X messages as supported.
    The interrupt type fallback of iflib(4) is causing two failure modes depending on the actual number of MSI-X messages supported for such instances of 82583V:
    1) With only one MSI-X message supported, none is left for the RX/TX queues as that one message gets assigned to the admin interrupt. Worse, later on - which will be addressed with a separate fix - iflib(4) interprets that one messages as MSI or INTx to be set up, but fails to actually do so as it has previously called pci_alloc_msix(9).
       [1, 2]
    2) With more message supported, their distribution is okay but then em_if_msix_intr_assign() doesn't work for 82583V, with the interface being left in a non-working state, too. [3]
    Thus, let em_if_attach_pre() indicate to iflib(4) to try MSI-X with 82574 only, and at most MSI for the remainder of EM-class devices. While at it, remove "try_second_bar" as it's polarity inverted and not actually needed.
  - Remove code from em_if_timer() that effectively is a NOP since the conversion to iflib(4) ("trigger" is no longer read). While at it, let the comment for em_if_timer() reflect reality after said conversion.
  - Implement an ifdi_watchdog_reset method which only updates the em(4) "watchdog_events" counter but doesn't perform any reset, so that the em(4) "watchdog_timeouts" SYSCTL (iflib(4) doesn't provide a counterpart) reflects reality and these timeouts add to IFCOUNTER_OERRORS again after the iflib(4) conversion.
  - Remove the "mbuf_defrag_fail" and "tx_dma_fail" SYSCTLS; since the iflib(4) conversion, associated counters are disconnected, but iflib(4) provides "mbuf_defrag_failed" and "tx_map_failed" respectively as equivalents.
  - Move the description preceding lem_smartspeed() to the correct spot before em_reset() and bring back appropriate comments for {igb,em}_initialize_rss_mapping() and lem_smartspeed() lost in the iflib(4) conversion.
  - Adapt some other function descriptions and INIT_DEBUGOUT() use to match reality after the iflib(4) conversion.
  - Put the debugging message of em_enable_vectors_82574() (missed in r343578) under bootverbose, too.

  PR: 219428 [1], 235246 [2], 235147 [3]
  Reviewed by: erj (previous version)
  Differential Revision: https://reviews.freebsd.org/D19108

Changes:
  _U stable/12/
  stable/12/sys/dev/e1000/if_em.c
  stable/12/sys/dev/e1000/if_em.h

Close; thanks for the report!
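For anyone checking whether a machine carries the same part, the chip ID from the original pciconf output (chip=0x150c8086) can be matched from userland. This is only a sketch: the function name `is_82583v` is hypothetical, the sample line is copied from the report, and whether other device IDs are affected is not something this thread establishes.

```shell
#!/bin/sh
# Succeed if a `pciconf -l` device line carries the chip ID that the
# report shows for the 82583V (chip=0x150c8086).
# The function name is illustrative, not part of any FreeBSD tool.
is_82583v() {
    case "$1" in
        *chip=0x150c8086*) return 0 ;;
        *) return 1 ;;
    esac
}

# Sample line copied from the report; on a live system the input
# would come from `pciconf -l` instead.
line='em0@pci0:1:0:0: class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00'

if is_82583v "$line"; then
    echo "82583V found: needs r343934 (head) or r344098 (stable/12), or MSI-X disabled"
else
    echo "not an 82583V"
fi
```

On a kernel that already includes either commit, no tunable should be needed, since em(4) then offers at most MSI for this device.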