Trying to load FreeBSD 12.0-RELEASE on a small, six-port firewall appliance, a Protectli FW6A (https://protectli.com/product/fw6a/). The device's six ports are powered by Intel's 82583V gigabit chipset, which is supposed to be supported by the em(4) driver. I've opened a ticket with Protectli support; they have confirmed that 11.2-RELEASE works and have verified my observation that 12.0-RELEASE does not. My suspicion is that this is a regression from the iflib updates done between 11.2 and 12.0.

I've tried a number of things found in other bugs for the em(4) driver, including disabling TSO, several sysctl tunables, disabling MSI-X, different Ethernet cables, different ports, etc. Nothing seems to work. I also tried forcing the igb(4) driver to see if that would pick the ports up, but no go on that. Both the em(4) and igb(4) man pages say they can support the 82580 chipsets.

The port will take a statically assigned IP address, but cannot obtain one via DHCP. It does seem capable of seeing ARP "Who am I?" requests, but cannot see the responses and does not update the ARP table with new MAC addresses, even after fresh ping attempts (MACs and IPs below are redacted). It doesn't appear to process any ethertype other than ARP, though I have not yet verified that thoroughly with tcpdump.

# arp -a
? (192.168.w.x) at xx:xx:xx:xx:xx:xx on em0 permanent [ethernet]
? (192.168.w.y) at (incomplete) on em0 expired [ethernet]
? (192.168.w.z) at (incomplete) on em0 expired [ethernet]

Some additional info from various utilities, with addresses masked:

# ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=81249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER>
    ether xx:xx:xx:xx:xx:xx
    inet 192.168.w.x netmask 0xffffff00 broadcast 192.168.w.255
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

dmesg:

pcib1: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pci1: <ACPI PCI bus> on pcib1
em0: <Intel(R) PRO/1000 Network Connection> port 0xe000-0xe01f mem 0xdfe00000-0xdfe1ffff,0xdfe20000-0xdfe23fff irq 16 at device 0.0 on pci1
em0: attach_pre capping queues at 1
em0: using 1024 tx descriptors and 1024 rx descriptors
em0: msix_init qsets capped at 1
em0: pxm cpus: 2 queue msgs: 6 admincnt: 1
em0: using 1 rx queues 1 tx queues
em0: Using MSIX interrupts with 2 vectors
em0: allocated for 1 tx_queues
em0: allocated for 1 rx_queues
em0: Ethernet address: xx:xx:xx:xx:xx:xx
em0: netmap queues/slots: TX 1/1024, RX 1/1024
(repeat five more times to em5)
em0: link state changed to UP

If any additional information is needed to debug this, please let me know.
# pciconf -lvBbcV
(for em0 only, repeats through em5, only the BARs change)

em0@pci0:1:0:0: class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82583V Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xdfe00000, size 131072, enabled
    bar   [18] = type I/O Port, range 32, base 0xe000, size 32, enabled
    bar   [1c] = type Memory, range 32, base 0xdfe20000, size 16384, enabled
    cap 01[c8] = powerspec 2 supports D0 D3 current D0
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) RO NS link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[a0] = MSI-X supports 7 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 xxxxxxffffxxxxxx

(MAC redacted on ecap 0003)
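For anyone comparing boards, the relevant facts in that output are the PCI IDs packed into chip= and the MSI / MSI-X message counts. Below is a minimal Python sketch that pulls them out of pciconf text. The function name summarize and the trimmed SAMPLE string are my own for illustration; the regular expressions assume the output format shown above and nothing more.

```python
import re

# Trimmed copy of the em0 entry from the pciconf output above.
SAMPLE = """\
em0@pci0:1:0:0: class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 11[a0] = MSI-X supports 7 messages, enabled
"""


def summarize(text):
    """Return (vendor_id, device_id, msi_messages, msix_messages).

    Hypothetical helper: assumes one device entry in the format shown above.
    """
    chip = int(re.search(r"chip=0x([0-9a-fA-F]{8})", text).group(1), 16)
    vendor_id = chip & 0xFFFF   # low 16 bits hold the PCI vendor ID
    device_id = chip >> 16      # high 16 bits hold the PCI device ID
    msi = re.search(r"= MSI supports (\d+) message", text)
    msix = re.search(r"= MSI-X supports (\d+) message", text)
    return (
        vendor_id,
        device_id,
        int(msi.group(1)) if msi else 0,    # 0 when the capability is absent
        int(msix.group(1)) if msix else 0,
    )


if __name__ == "__main__":
    vendor, device, msi, msix = summarize(SAMPLE)
    print(f"vendor=0x{vendor:04x} device=0x{device:04x} MSI={msi} MSI-X={msix}")
```

On the entry above this reports vendor 0x8086, device 0x150c, one MSI message and seven MSI-X messages, i.e. the board advertises MSI-X in its capability list.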
I'm having the exact same problem with the Minisys IBOX-501 N13. Looks like the hardware is identical.
Same here. Asrock 990FX Extreme 9 integrated LAN
(In reply to Joshua Kinard from comment #1) According to the datasheet (https://www.intel.com/content/www/us/en/embedded/products/networking/82583v-gbe-controller-datasheet.html), the 82583V supports only MSI and legacy interrupts. That might be why it's not working when iflib tries to use MSI-X. I think this patch in 12-STABLE should fix that: https://reviews.freebsd.org/rS343864
(In reply to Krzysztof Galazka from comment #4)

I gave it a really quick test by trying to apply the diff from D18980 via 'cat <patch> | patch -p2' while in /usr/src/sys, and rebuilding a GENERIC 12.0-RELEASE-p3 kernel on a secondary machine. Copied the newer /boot/kernel over to the networking appliance via USB and booted into the new kernel. Doesn't seem to have had any effect:

FreeBSD 12.0-RELEASE-p3 GENERIC amd64
FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)
[snip]
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pci1: <ACPI PCI bus> on pcib1
em0: <Intel(R) PRO/1000 Network Connection> port 0xe000-0xe01f mem 0xdfe00000-0xdfe1ffff,0xdfe20000-0xdfe23fff irq 16 at device 0.0 on pci1
em0: Using 1024 tx descriptors and 1024 rx descriptors
em0: Using 1 rx queues 1 tx queues
em0: Using MSI-X interrupts with 2 vectors
em0: Ethernet address: xx:xx:xx:xx:xx:xx
em0: netmap queues/slots: TX 1/1024, RX 1/1024
[snip]

# ifconfig -a
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=81249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER>
    ether xx:xx:xx:xx:xx:xx
    inet a.b.c.d netmask 0xffffff00 broadcast a.b.c.255
    media: Ethernet autoselect (1000baseT <full-duplex>)
    status: active
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
ping: sendto: No route to host
ping: sendto: No route to host
ping: sendto: No route to host

I can't rule out that I need to apply other parts of that patch, but I find that Phabricator interface to be incredibly confusing. Is there a classic gitweb sitting around somewhere that will cough up a standard unidiff patch of the whole commit? Or maybe the patch alone doesn't fix it?

---

So that said, the info on this chipset not being compatible w/ MSI-X was very helpful, thank you. I had to dig around, as the iflib change made obsolete the old "hw.x.*" tunables in /boot/loader.conf (and that's all that stupid Google wants to find right now), but I found the correct tunable names for newer iflib stuff, and disabled MSI-X there. After another reboot, we have contact:

# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=120 time=12.144 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=120 time=12.667 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=120 time=10.681 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=120 time=10.519 ms
^C
--- 8.8.8.8 ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 10.519/11.503/12.667/0.923 ms
root@armenelos:~ #

So! It looks like the fix for these specific chips would be for em(4) to automatically disable MSI-X when it detects the specific PCI ID of these chipsets.
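To make the suggestion concrete, here is a minimal Python sketch of a PCI-ID quirk check of the kind proposed above. It is an illustration only, not em(4) code: the NO_MSIX table and the msix_permitted function are hypothetical names, and the only real values are the vendor and device IDs taken from chip=0x150c8086 in the pciconf output earlier in this report.

```python
INTEL_VENDOR_ID = 0x8086
DEVICE_82583V = 0x150C  # from chip=0x150c8086 in the pciconf output

# Hypothetical quirk table: (vendor, device) pairs that must not use MSI-X,
# even if the device advertises an MSI-X capability.
NO_MSIX = {
    (INTEL_VENDOR_ID, DEVICE_82583V),
}


def msix_permitted(vendor_id, device_id):
    """Return False for devices listed in the quirk table, True otherwise."""
    return (vendor_id, device_id) not in NO_MSIX


if __name__ == "__main__":
    # The FW6A ports would be steered away from MSI-X by this check.
    print(msix_permitted(0x8086, 0x150C))
```

A blocklist like this has to be extended for every affected part; the alternative is an allowlist that names only the devices known to work with MSI-X.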
(In reply to Joshua Kinard from comment #5) There is a new patch under review, which disables MSI-X support in em(4) for devices other than 82574: https://reviews.freebsd.org/D19108. Thank you for all the details you have provided, and thanks to Marius for the patch.
A commit references this bug:

Author: marius
Date: Sat Feb 9 11:58:41 UTC 2019
New revision: 343934
URL: https://svnweb.freebsd.org/changeset/base/343934

Log:
  - Remove the redundant device disabled hint handling; ever since r241119 that's performed globally by device_attach(9).
  - As for the EM-class of devices, em(4) supports multiple queues and MSI-X respectively only with 82574 devices. However, since the conversion to iflib(4), em(4) relies on the interrupt type fallback mechanism, i. e. MSI-X -> MSI -> INTx, of iflib(4) to figure out the interrupt type to use for the EM-class (as well as the IGB-class) of MACs. Moreover, despite the datasheet for 82583V not mentioning any support of MSI-X, there actually are 82583V devices out there that report a varying number of MSI-X messages as supported. The interrupt type fallback of iflib(4) is causing two failure modes depending on the actual number of MSI-X messages supported for such instances of 82583V:
    1) With only one MSI-X message supported, none is left for the RX/TX queues as that one message gets assigned to the admin interrupt. Worse, later on - which will be addressed with a separate fix - iflib(4) interprets that one messages as MSI or INTx to be set up, but fails to actually do so as it has previously called pci_alloc_msix(9). [1, 2]
    2) With more message supported, their distribution is okay but then em_if_msix_intr_assign() doesn't work for 82583V, with the interface being left in a non-working state, too. [3]
    Thus, let em_if_attach_pre() indicate to iflib(4) to try MSI-X with 82574 only, and at most MSI for the remainder of EM-class devices. While at it, remove "try_second_bar" as it's polarity inverted and not actually needed.
  - Remove code from em_if_timer() that effectively is a NOP since the conversion to iflib(4) ("trigger" is no longer read). While at it, let the comment for em_if_timer() reflect reality after said conversion.
  - Implement an ifdi_watchdog_reset method which only updates the em(4) "watchdog_events" counter but doesn't perform any reset, so that the em(4) "watchdog_timeouts" SYSCTL (iflib(4) doesn't provide a counterpart) reflects reality and these timeouts add to IFCOUNTER_OERRORS again after the iflib(4) conversion.
  - Remove the "mbuf_defrag_fail" and "tx_dma_fail" SYSCTLS; since the iflib(4) conversion, associated counters are disconnected, but iflib(4) provides "mbuf_defrag_failed" and "tx_map_failed" respectively as equivalents.
  - Move the description preceding lem_smartspeed() to the correct spot before em_reset() and bring back appropriate comments for {igb,em}_initialize_rss_mapping() and lem_smartspeed() lost in the iflib(4) conversion.
  - Adapt some other function descriptions and INIT_DEBUGOUT() use to match reality after the iflib(4) conversion.
  - Put the debugging message of em_enable_vectors_82574() (missed in r343578) under bootverbose, too.

  PR: 219428 [1], 235246 [2], 235147 [3]
  Reviewed by: erj (previous version)
  Differential Revision: https://reviews.freebsd.org/D19108

Changes:
  head/sys/dev/e1000/if_em.c
  head/sys/dev/e1000/if_em.h
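The interrupt-type decision described in that log can be modelled in a few lines. The Python sketch below is a simplified illustration under my own assumptions, not the driver code: pick_interrupt, its parameters and the string MAC names are hypothetical, and it models the intended MSI-X -> MSI -> INTx fallback rather than the iflib(4) bug from failure mode 1. The restrict_msix flag stands in for the new behaviour of trying MSI-X with 82574 only.

```python
from enum import Enum


class Intr(Enum):
    MSIX = "MSI-X"
    MSI = "MSI"
    INTX = "INTx"


def pick_interrupt(mac_type, msix_msgs, msi_msgs, restrict_msix=True):
    """Choose an interrupt type following the MSI-X -> MSI -> INTx order.

    Hypothetical model: restrict_msix=True mimics the behaviour described
    for r343934, where only 82574 is allowed to try MSI-X.
    """
    msix_allowed = (mac_type == "82574") or not restrict_msix
    # One MSI-X message is taken by the admin interrupt, so at least two
    # are needed to leave one for an RX/TX queue.
    if msix_allowed and msix_msgs >= 2:
        return Intr.MSIX
    if msi_msgs >= 1:
        return Intr.MSI
    return Intr.INTX


if __name__ == "__main__":
    # Reporter's 82583V: 7 MSI-X messages and 1 MSI message advertised.
    print(pick_interrupt("82583V", 7, 1, restrict_msix=False).value)  # unrestricted
    print(pick_interrupt("82583V", 7, 1).value)                       # restricted
```

With the restriction off, the model picks MSI-X for the reporter's 82583V, which matches the "Using MSIX interrupts with 2 vectors" line in the original dmesg; with it on, the same device lands on MSI.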
I had the same problem on probably the same hardware (Axiomtek NA-320) after updating from 11.2-STABLE to 12-STABLE r343942.

Adding hw.pci.enable_msix=0 to /boot/loader.conf and rebooting fixed it for me.
A commit references this bug:

Author: marius
Date: Wed Feb 13 14:39:17 UTC 2019
New revision: 344098
URL: https://svnweb.freebsd.org/changeset/base/344098

Log:
  MFC: r343934

  - Remove the redundant device disabled hint handling; ever since r241119 that's performed globally by device_attach(9).
  - As for the EM-class of devices, em(4) supports multiple queues and MSI-X respectively only with 82574 devices. However, since the conversion to iflib(4), em(4) relies on the interrupt type fallback mechanism, i. e. MSI-X -> MSI -> INTx, of iflib(4) to figure out the interrupt type to use for the EM-class (as well as the IGB-class) of MACs. Moreover, despite the datasheet for 82583V not mentioning any support of MSI-X, there actually are 82583V devices out there that report a varying number of MSI-X messages as supported. The interrupt type fallback of iflib(4) is causing two failure modes depending on the actual number of MSI-X messages supported for such instances of 82583V:
    1) With only one MSI-X message supported, none is left for the RX/TX queues as that one message gets assigned to the admin interrupt. Worse, later on - which will be addressed with a separate fix - iflib(4) interprets that one messages as MSI or INTx to be set up, but fails to actually do so as it has previously called pci_alloc_msix(9). [1, 2]
    2) With more message supported, their distribution is okay but then em_if_msix_intr_assign() doesn't work for 82583V, with the interface being left in a non-working state, too. [3]
    Thus, let em_if_attach_pre() indicate to iflib(4) to try MSI-X with 82574 only, and at most MSI for the remainder of EM-class devices. While at it, remove "try_second_bar" as it's polarity inverted and not actually needed.
  - Remove code from em_if_timer() that effectively is a NOP since the conversion to iflib(4) ("trigger" is no longer read). While at it, let the comment for em_if_timer() reflect reality after said conversion.
  - Implement an ifdi_watchdog_reset method which only updates the em(4) "watchdog_events" counter but doesn't perform any reset, so that the em(4) "watchdog_timeouts" SYSCTL (iflib(4) doesn't provide a counterpart) reflects reality and these timeouts add to IFCOUNTER_OERRORS again after the iflib(4) conversion.
  - Remove the "mbuf_defrag_fail" and "tx_dma_fail" SYSCTLS; since the iflib(4) conversion, associated counters are disconnected, but iflib(4) provides "mbuf_defrag_failed" and "tx_map_failed" respectively as equivalents.
  - Move the description preceding lem_smartspeed() to the correct spot before em_reset() and bring back appropriate comments for {igb,em}_initialize_rss_mapping() and lem_smartspeed() lost in the iflib(4) conversion.
  - Adapt some other function descriptions and INIT_DEBUGOUT() use to match reality after the iflib(4) conversion.
  - Put the debugging message of em_enable_vectors_82574() (missed in r343578) under bootverbose, too.

  PR: 219428 [1], 235246 [2], 235147 [3]
  Reviewed by: erj (previous version)
  Differential Revision: https://reviews.freebsd.org/D19108

Changes:
_U  stable/12/
  stable/12/sys/dev/e1000/if_em.c
  stable/12/sys/dev/e1000/if_em.h
Close; thanks for the report!