Summary: | em network driver broken in current | |
---|---|---|---
Product: | Base System | Reporter: | andy
Component: | kern | Assignee: | freebsd-net (Nobody) <net>
Status: | Closed FIXED | |
Severity: | Affects Some People | CC: | admin-freebsd-bugzilla, arkadiusz.majewski, erj, freebsdbug, gitdev, kaho, kbowling, madpilot, mmacy, ncrogers, nleibert87, ohartmann, pi, robert, russ.haley, sbruno, sergey.dyatko, shurd, thurners, tom, tom
Priority: | --- | Keywords: | regression
Version: | CURRENT | |
Hardware: | amd64 | |
OS: | Any | |
Bug Depends on: | | |
Bug Blocks: | 220004 | |
Attachments: | | |
Description
andy
2017-05-20 21:14:18 UTC
This report should be self-contained; it is difficult to dig up and align ML threads.

No need to look through ML threads; I only mentioned them to show that there are more reports than just mine. Not much to report: install CURRENT on a physical machine with an Intel gigabit NIC. With both DHCP and static address assignment, the interface fails to configure and cannot ping another machine.

Not sure what I can add. Did you try to install on a physical machine? I have no other way to tell you how to reproduce the problem.

This is what I sent to the mailing list recently; I simply copied and pasted it here for convenience to document the problem, which is still present and serious. The problem has both worsened and eased since the introduction of IFLIB. That is, recent CURRENT now sometimes recovers by itself after being "dead" for more than a minute (but then loses connections, e.g. ssh, due to timeouts), while in the now more frequent cases it gets worse and the device is lost entirely: I know of no method to revive the NIC other than rebooting, which is disastrous in some situations.
[...]
Since the introduction of IFLIB, I have had big trouble with one type of NIC in particular, namely those formerly known as igb and em. The worst device is an Intel NIC known as i217-LM:
em0@pci0:0:25:0: class=0x020000 card=0x11ed1734 chip=0x153a8086 rev=0x05 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Connection I217-LM'
    class = network
    subclass = ethernet
    bar [10] = type Memory, range 32, base rxfb300000, size 131072, enabled
    bar [14] = type Memory, range 32, base rxfb339000, size 4096, enabled
    bar [18] = type I/O Port, range 32, base rxf020, size 32, enabled
This NIC is widely used in Fujitsu's CELSIUS M740 workstations and, as fate would have it, I have to use one of these.
When syncing data over the network from the workstation to an older C2D/bce-based server via NFSv4, since the introduction of IFLIB the connection to the NFS server gets stuck and I receive console messages like
em0: TX(0) desc avail = 1024, pidx = 0
em0: TX(0) desc avail = 42, pidx = 985
Hitting "Ctrl-T" on the terminal running the sync via "rsync", I then see this message:
load: 0.01 cmd: rsync 68868 [nfsaio] 395.68r 4.68u 4.64s 0% 3288k
(just for the record)
Server and client(s) are on 12-CURRENT: ~ FreeBSD 12.0-CURRENT #38 r318285: Mon May 15 12:27:29 CEST 2017 amd64, with customised kernels and "netmap" enabled (just for the record, in case that matters).
In the past, I was able to revive the connection by simply putting the NIC down and then up again. While I had a ping running as a trace indication of the state of the NIC, I very often got
ping: sendto: No buffer space available
Well, today I checked the dmesg output to gather those messages again and realised that the dmesg is garbled:
[...]
nfs nfs servnnfs servefs r server19 2.19162n.fs snerver fs1 s9nfs s2er.nfs server er192.168.0.31:/pool/packages: not responding v er 192.168.0.31ver :/po1ol/packages9: 2.168.0.31:/pool/packagesn: noot responding t <6>n fs serverespondinngf s server 192.168.1rn nfs server 192.168.0.31:/pool/packages: not1 responding 9 2.168.1f7s 0.31:/pool/packagenfs sesrver 19serv2er .168.0.31:/poo: not respolnding / packages: not responding nfs server 19192.168.0.31:/pool/pa2c.k168.0.31:a/gpserver ne1s92.168.0.31:/pool/pac: knot respaof1s68 gs.e17rve8r.2 3192.168.0.31:/pool/packa1:/pool/packages: not responding o goes: nl/packages: not responding o t responding nfs server 192.168.0.31:/poes: ol/packages: nfns server 192.168.0.31:/pool/paot responding c kages: not respondinnfs server n192.1f68.0.31:/pool/packagess: ndi server 192.168.0.31:/pool/packages: not responding
[...]
Earlier this year, after the introduction of IFLIB, I checked servers equipped with Intel's very popular i350T2v2 NIC and had similar problems when dd'ing large files over NFSv4 (ZFS backed) from a client (em0, an older consumer-grade NIC from 2010, I forgot its ID) towards a server with an i350; the server side got stuck with messages similar to those reported with the i217-LM. Since my department uses lots of those server-grade NICs, I will swap the i217 for an i350T2 and check again.

I confirm the problem with em on FreeBSD 12-CURRENT r319167. The board is an Atom D525 with an ICH8M system chip. pciconf shows
em5@pci0:7:0:0: class=0x020000 card=0x00008086 chip=0x150c8086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82583V Gigabit Network Connection'
    class = network
    subclass = ethernet
Created attachment 183184 [details]
dmesg from affected system
Boot output from affected system, amd64 with atom D525 booting 12-CURRENT r319167. There should be em0-em5 ethernet devices recognized, and em3-em5 configured as a bridge.
Created attachment 183187 [details]
debug output
Debug output for r319481 on amd64
panic: Assertion adapter->tx_num_queues > 0 failed at /usr/src/sys/dev/e1000/if_em.c:2664
Created attachment 183588 [details]
boot msg, panic and backtrace
FreeBSD 12.0-CURRENT #0 r319859 generates panic:
em1: Using MSIX interrupts with 1 vectors
panic: Assertion adapter->tx_num_queues > 0 failed at /usr/src/sys/dev/e1000/if_em.c:2664
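For readers following the panic above: the r343934 commit log later in this report says that with only one MSI-X message supported, that message gets assigned to the admin interrupt and none is left for the RX/TX queues. The boot message "Using MSIX interrupts with 1 vectors" appears consistent with that case. A minimal sketch of the accounting, assuming one admin vector; the helper names queue_vectors_left() and describe() are hypothetical and are not functions from if_em.c or iflib:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper, not code from if_em.c or iflib: models the
 * accounting described in the r343934 log, where admin_count MSI-X
 * messages are reserved for the admin interrupt and only the rest
 * can serve RX/TX queues.
 */
static int
queue_vectors_left(int msix_messages, int admin_count)
{
	int left = msix_messages - admin_count;

	return (left > 0 ? left : 0);
}

/* Print the split for a given number of reported MSI-X messages. */
static void
describe(int msix_messages)
{
	int queues;

	assert(msix_messages >= 0);
	queues = queue_vectors_left(msix_messages, 1);
	printf("%d MSI-X message(s): %d left for queues%s\n",
	    msix_messages, queues,
	    queues == 0 ? " (an assertion on tx_num_queues > 0 would fail)" : "");
}
```

With 3 messages and 1 admin vector the helper yields 2, which matches the "Using MSIX interrupts with 3 vectors" and "using 2 rx queues 2 tx queues" boot output quoted later in this report; with 1 message it yields 0.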
(In reply to gitdev from comment #7)
The panic you met is unrelated to the original report. Please try this patch.

Index: sys/dev/e1000/if_em.c
===================================================================
--- sys/dev/e1000/if_em.c (revision 322833)
+++ sys/dev/e1000/if_em.c (working copy)
@@ -797,6 +797,8 @@
 		scctx->isc_txrx = &em_txrx;
 		scctx->isc_capenable = EM_CAPS;
 		scctx->isc_tx_csum_flags = CSUM_TCP | CSUM_UDP | CSUM_IP_TSO;
+		if (adapter->hw.mac.type != e1000_82574)
+			scctx->isc_msix_bar = 0;
 	} else {
 		scctx->isc_txqsizes[0] = roundup2((scctx->isc_ntxd[0] + 1) * sizeof(struct e1000_tx_desc), EM_DBA_ALIGN);
 		scctx->isc_rxqsizes[0] = roundup2((scctx->isc_nrxd[0] + 1) * sizeof(struct e1000_rx_desc), EM_DBA_ALIGN);

Hi, I have a SuperMicro server, smbios.planar.product="X9DRW-3LN4F+/X9DRW-3TF+", running
FreeBSD st3.domain.tld 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r325556: Sun Nov 12 22:39:29 MSK 2017 root@st3.domain.tld:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64
with igb(4):
igb0@pci0:4:0:0: class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I350 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb1@pci0:4:0:1: class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I350 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb2@pci0:129:0:0: class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I350 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb3@pci0:129:0:3: class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I350 Gigabit Network Connection'
    class = network
    subclass = ethernet
# grep lagg /etc/rc.conf
cloned_interfaces="lagg0 vlan2"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 laggport igb2 laggport igb3 62.x.x.x netmask 255.255.255.224"
ifconfig_vlan2="vlan 2 vlandev lagg0 192.168.2.3/24"
After a reboot all works fine:
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    options=e505bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
    ether 0c:c4:7a:4c:11:d2
    inet 62.x.x.x netmask 0xffffffe0 broadcast 62.x.x.x
    nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
    media: Ethernet autoselect
    status: active
    groups: lagg
    laggproto lacp lagghash l2,l3,l4
    laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: igb2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: igb3 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
But after a while I see something like this in messages:
Nov 16 09:35:30 st3 kernel: igb1: Interface stopped DISTRIBUTING, possible flapping
(always igb1). Then, after a while, the server becomes unavailable over the network. If I open the console via IPMI I can see the following:
igb1: TX(3) desc avail = 1024, pidx = 0
igb1: TX(3) desc avail = 1024, pidx = 0
igb1: TX(3) desc avail = 1024, pidx = 0
igb1: TX(3) desc avail = 1024, pidx = 0
After a reboot all works fine again...

I'm also seeing similar errors. I have this hardware:
em0@pci0:0:31:6: class=0x020000 card=0x86721043 chip=0x15b88086 rev=0x31 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'Ethernet Connection (2) I219-V'
    class = network
    subclass = ethernet
Running "ifconfig em0 -tso4" makes the machine stable for me.

This started happening on a couple of my boxes as well.
(optiplex 7010, lenovo thinkcentre m90p)
em0: TX(0) desc avail = 41, pidx = 788
link state changed to down
em0: link state changed to DOWN
em0: TX(0) desc avail = 1024, pidx = 0
em0: TX(0) desc avail = 1024, pidx = 0
dell -
em0@pci0:0:25:0: class=0x020000 card=0x052c1028 chip=0x15028086 rev=0x04 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82579LM Gigabit Network Connection (Lewisville)'
    class = network
    subclass = ethernet
thinkcentre -
em0@pci0:0:25:0: class=0x020000 card=0x306017aa chip=0x10ef8086 rev=0x06 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82578DM Gigabit Network Connection'
    class = network
    subclass = ethernet
No panic, and no messages other than "kernel out of bufferspace". I've added -tso -lro -rxcsum -txcsum to ifconfig and we'll see if it stops puking. It usually takes a couple of days to trigger it. It didn't start happening on the dell until last week.

The only recent change was r335303. Can you try r335302 or prior? -M

(In reply to Matthew Macy from comment #12)
I can confirm putting a pci rl card in the lenovo box prior to this, as I suspected it was hardware related at the time.

A commit references this bug:
Author: marius
Date: Sun Jul 15 19:04:26 UTC 2018
New revision: 336313
URL: https://svnweb.freebsd.org/changeset/base/336313
Log:
Assorted TSO fixes for em(4)/iflib(9) and dead code removal:
- Ever since the workaround for the silicon bug of TSO4 causing MAC hangs was committed in r295133, CSUM_TSO always got disabled unconditionally by em(4) on the first invocation of em_init_locked(). However, even with that problem fixed, it turned out that for at least e. g. 82579 not all necessary TSO workarounds are in place, still causing MAC hangs even at Gigabit speed. Thus, for stable/11, TSO usage was deliberately disabled in r323292 (r323293 for stable/10) for the EM-class by default, allowing users to turn it on if it happens to work with their particular EM MAC in a Gigabit-only environment.
In head, the TSO workaround for speeds other than Gigabit was lost with the conversion to iflib(9) in r311849 (possibly along with another one or two TSO workarounds). Yet at the same time, for EM-class MACs TSO4 got enabled by default again, causing device hangs. Therefore, change the default for this hardware class back to have TSO4 off, allowing users to turn it on manually if it happens to work in their environment as we do in stable/{10,11}. An alternative would be to add a whitelist of EM-class devices where TSO4 actually is reliable with the workarounds in place, but given that the advantage of TSO at Gigabit speed is rather limited - especially with the overhead of these workarounds -, that's really not worth it. [1] This change includes the addition of an isc_capabilities to struct if_softc_ctx so iflib(9) can also handle interface capabilities that shouldn't be enabled by default which is used to handle the default-off capabilities of e1000 as suggested by shurd@ and moving their handling from em_setup_interface() to em_if_attach_pre() accordingly. - Although 82543 support TSO4 in theory, the former lem(4) didn't have support for TSO4, presumably because TSO4 is even more broken in the LEM-class of MACs than the later EM ones. Still, TSO4 for LEM-class devices was enabled as part of the conversion to iflib(9) in r311849, causing device hangs. So revert back to the pre-r311849 behavior of not supporting TSO4 for LEM-class at all, which includes not creating a TSO DMA tag in iflib(9) for devices not having IFCAP_TSO4 set. [2] - In fact, the FreeBSD TCP stack can handle a TSO size of IP_MAXPACKET (65535) rather than FREEBSD_TSO_SIZE_MAX (65518). However, the TSO DMA must have a maxsize of the maximum TSO size plus the size of a VLAN header for software VLAN tagging. 
The iflib(9) converted em(4), thus, first correctly sets scctx->isc_tx_tso_size_max to EM_TSO_SIZE in em_if_attach_pre(), but later on overrides it with IP_MAXPACKET in em_setup_interface() (apparently, left-over from pre-iflib(9) times). So remove the later and correct iflib(9) to correctly cap the maximum TSO size reported to the stack at IP_MAXPACKET. While at it, let iflib(9) use if_sethwtsomax*(). This change includes the addition of isc_tso_max{seg,}size DMA engine constraints for the TSO DMA tag to struct if_shared_ctx and letting iflib_txsd_alloc() automatically adjust the maxsize of that tag in case IFCAP_VLAN_MTU is supported as requested by shurd@. - Move the if_setifheaderlen(9) call for adjusting the maximum Ethernet header length from {ixgbe,ixl,ixlv,ixv,em}_setup_interface() to iflib(9) so adjustment is automatically done in case IFCAP_VLAN_MTU is supported. As a consequence, this adjustment now is also done in case of bnxt(4) which missed it previously. - Move the reduction of the maximum TSO segment count reported to the stack by the number of m_pullup(9) calls (which in the worst case, can add another mbuf and, thus, the requirement for another DMA segment each) in the transmit path for performance reasons from em_setup_interface() to iflib_txsd_alloc() as these pull-ups are now done in iflib_parse_header() rather than in the no longer existing em_xmit(). Moreover, this optimization applies to all drivers using iflib(9) and not just em(4); all in-tree iflib(9) consumers still have enough room to handle full size TSO packets. Also, reduce the adjustment to the maximum number of m_pullup(9)'s now performed in iflib_parse_header(). - Prior to the conversion of em(4)/igb(4)/lem(4) and ixl(4) to iflib(9) in r311849 and r335338 respectively, these drivers didn't enable IFCAP_VLAN_HWFILTER by default due to VLAN events not being passed through by lagg(4). 
With iflib(9), IFCAP_VLAN_HWFILTER was turned on by default but also lagg(4) was fixed in that regard in r203548. So just remove the now redundant and defunct IFCAP_VLAN_HWFILTER handling in {em,ixl,ixlv}_setup_interface(). - Nuke other redundant IFCAP_* setting in {em,ixl,ixlv}_setup_interface() which is (more completely) already done in {em,ixl,ixlv}_if_attach_pre() now. - Remove some redundant/dead setting of scctx->isc_tx_csum_flags in em_if_attach_pre(). - Remove some IFCAP_* duplicated either directly or indirectly (e. g. via IFCAP_HWCSUM) in {EM,IGB,IXL}_CAPS. - Don't bother to fiddle with IFCAP_HWSTATS in ixgbe(4)/ixgbev(4) as iflib(9) adds that capability unconditionally. - Remove some unused macros from em(4). - Bump __FreeBSD_version as some of the above changes require the modules of drivers using iflib(9) to be recompiled. Okayed by: sbruno@ at 201806 DevSummit Transport Working Group [1] Reviewed by: sbruno (earlier version), erj PR: 219428 (part of; comment #10) [1], 220997 (part of; comment #3) [2] Differential Revision: https://reviews.freebsd.org/D15720 Changes: head/sys/dev/bnxt/if_bnxt.c head/sys/dev/e1000/if_em.c head/sys/dev/e1000/if_em.h head/sys/dev/ixgbe/if_ix.c head/sys/dev/ixgbe/if_ixv.c head/sys/dev/ixgbe/ixgbe.h head/sys/dev/ixl/if_ixl.c head/sys/dev/ixl/if_ixlv.c head/sys/dev/ixl/ixl_pf_main.c head/sys/net/iflib.c head/sys/net/iflib.h head/sys/sys/param.h A commit references this bug: Author: marius Date: Sat Feb 9 11:58:41 UTC 2019 New revision: 343934 URL: https://svnweb.freebsd.org/changeset/base/343934 Log: - Remove the redundant device disabled hint handling; ever since r241119 that's performed globally by device_attach(9). - As for the EM-class of devices, em(4) supports multiple queues and MSI-X respectively only with 82574 devices. However, since the conversion to iflib(4), em(4) relies on the interrupt type fallback mechanism, i. e. 
MSI-X -> MSI -> INTx, of iflib(4) to figure out the interrupt type to use for the EM-class (as well as the IGB-class) of MACs. Moreover, despite the datasheet for 82583V not mentioning any support of MSI-X, there actually are 82583V devices out there that report a varying number of MSI-X messages as supported. The interrupt type fallback of iflib(4) is causing two failure modes depending on the actual number of MSI-X messages supported for such instances of 82583V: 1) With only one MSI-X message supported, none is left for the RX/TX queues as that one message gets assigned to the admin interrupt. Worse, later on - which will be addressed with a separate fix - iflib(4) interprets that one messages as MSI or INTx to be set up, but fails to actually do so as it has previously called pci_alloc_msix(9). [1, 2] 2) With more message supported, their distribution is okay but then em_if_msix_intr_assign() doesn't work for 82583V, with the interface being left in a non-working state, too. [3] Thus, let em_if_attach_pre() indicate to iflib(4) to try MSI-X with 82574 only, and at most MSI for the remainder of EM-class devices. While at it, remove "try_second_bar" as it's polarity inverted and not actually needed. - Remove code from em_if_timer() that effectively is a NOP since the conversion to iflib(4) ("trigger" is no longer read). While at it, let the comment for em_if_timer() reflect reality after said conversion. - Implement an ifdi_watchdog_reset method which only updates the em(4) "watchdog_events" counter but doesn't perform any reset, so that the em(4) "watchdog_timeouts" SYSCTL (iflib(4) doesn't provide a counterpart) reflects reality and these timeouts add to IFCOUNTER_OERRORS again after the iflib(4) conversion. - Remove the "mbuf_defrag_fail" and "tx_dma_fail" SYSCTLS; since the iflib(4) conversion, associated counters are disconnected, but iflib(4) provides "mbuf_defrag_failed" and "tx_map_failed" respectively as equivalents. 
- Move the description preceding lem_smartspeed() to the correct spot before em_reset() and bring back appropriate comments for {igb,em}_initialize_rss_mapping() and lem_smartspeed() lost in the iflib(4) conversion. - Adapt some other function descriptions and INIT_DEBUGOUT() use to match reality after the iflib(4) conversion. - Put the debugging message of em_enable_vectors_82574() (missed in r343578) under bootverbose, too. PR: 219428 [1], 235246 [2], 235147 [3] Reviewed by: erj (previous version) Differential Revision: https://reviews.freebsd.org/D19108 Changes: head/sys/dev/e1000/if_em.c head/sys/dev/e1000/if_em.h A commit references this bug: Author: marius Date: Wed Feb 13 14:39:17 UTC 2019 New revision: 344098 URL: https://svnweb.freebsd.org/changeset/base/344098 Log: MFC: r343934 - Remove the redundant device disabled hint handling; ever since r241119 that's performed globally by device_attach(9). - As for the EM-class of devices, em(4) supports multiple queues and MSI-X respectively only with 82574 devices. However, since the conversion to iflib(4), em(4) relies on the interrupt type fallback mechanism, i. e. MSI-X -> MSI -> INTx, of iflib(4) to figure out the interrupt type to use for the EM-class (as well as the IGB-class) of MACs. Moreover, despite the datasheet for 82583V not mentioning any support of MSI-X, there actually are 82583V devices out there that report a varying number of MSI-X messages as supported. The interrupt type fallback of iflib(4) is causing two failure modes depending on the actual number of MSI-X messages supported for such instances of 82583V: 1) With only one MSI-X message supported, none is left for the RX/TX queues as that one message gets assigned to the admin interrupt. Worse, later on - which will be addressed with a separate fix - iflib(4) interprets that one messages as MSI or INTx to be set up, but fails to actually do so as it has previously called pci_alloc_msix(9). 
[1, 2] 2) With more message supported, their distribution is okay but then em_if_msix_intr_assign() doesn't work for 82583V, with the interface being left in a non-working state, too. [3] Thus, let em_if_attach_pre() indicate to iflib(4) to try MSI-X with 82574 only, and at most MSI for the remainder of EM-class devices. While at it, remove "try_second_bar" as it's polarity inverted and not actually needed. - Remove code from em_if_timer() that effectively is a NOP since the conversion to iflib(4) ("trigger" is no longer read). While at it, let the comment for em_if_timer() reflect reality after said conversion. - Implement an ifdi_watchdog_reset method which only updates the em(4) "watchdog_events" counter but doesn't perform any reset, so that the em(4) "watchdog_timeouts" SYSCTL (iflib(4) doesn't provide a counterpart) reflects reality and these timeouts add to IFCOUNTER_OERRORS again after the iflib(4) conversion. - Remove the "mbuf_defrag_fail" and "tx_dma_fail" SYSCTLS; since the iflib(4) conversion, associated counters are disconnected, but iflib(4) provides "mbuf_defrag_failed" and "tx_map_failed" respectively as equivalents. - Move the description preceding lem_smartspeed() to the correct spot before em_reset() and bring back appropriate comments for {igb,em}_initialize_rss_mapping() and lem_smartspeed() lost in the iflib(4) conversion. - Adapt some other function descriptions and INIT_DEBUGOUT() use to match reality after the iflib(4) conversion. - Put the debugging message of em_enable_vectors_82574() (missed in r343578) under bootverbose, too. 
PR: 219428 [1], 235246 [2], 235147 [3]
Reviewed by: erj (previous version)
Differential Revision: https://reviews.freebsd.org/D19108
Changes:
_U stable/12/
stable/12/sys/dev/e1000/if_em.c
stable/12/sys/dev/e1000/if_em.h

Hi, I'm experiencing this bug on FreeBSD 12.0-RELEASE-p3 GENERIC amd64:
CPU: Intel(R) Core(TM) i3-3220T CPU @ 2.80GHz (2793.72-MHz K8-class CPU)
em0: <Intel(R) PRO/1000 Network Connection> port 0xf080-0xf09f mem 0xf7e00000-0xf7e1ffff,0xf7e39000-0xf7e39fff irq 20 at device 25.0 on pci0
em1: <Intel(R) PRO/1000 Network Connection> port 0xe000-0xe01f mem 0xf7c00000-0xf7c1ffff,0xf7c20000-0xf7c23fff irq 18 at device 0.0 on pci3
em1: attach_pre capping queues at 2
Current cap: 0x460b
em1: using 1024 tx descriptors and 1024 rx descriptors
em1: msix_init qsets capped at 2
em1: pxm cpus: 2 queue msgs: 4 admincnt: 1
em1: using 2 rx queues 2 tx queues
em1: Using MSIX interrupts with 3 vectors
em1: allocated for 2 tx_queues
em1: allocated for 2 rx_queues
em1: Ethernet address: 70:54:d2:45:71:41
em1: netmap queues/slots: TX 2/1024, RX 2/1024
em1: TX(1) desc avail = 41, pidx = 179
em1: link state changed to DOWN
em1: TX(1) desc avail = 1024, pidx = 0
em1: TX(1) desc avail = 1024, pidx = 0
em1: TX(1) desc avail = 1024, pidx = 0
...
It's only fixed by a reboot. How can I know whether the fixes mentioned here are present in 12.0-RELEASE-p3, or whether I'll have to wait for p4 or later? Thanks.

Can confirm this is still a problem on 12.0-RELEASE-p3. Of note, it is only happening on em0 on my system; I use em1 for a cross connect and NFS. Both are built into the motherboard.
em0@pci0:0:25:0: class=0x020000 card=0x20378086 chip=0x15038086 rev=0x04 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82579V Gigabit Network Connection'
    class = network
    subclass = ethernet
em1@pci0:3:0:0: class=0x020000 card=0x20378086 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82574L Gigabit Network Connection'
    class = network
    subclass = ethernet

Hi, I believe I am also affected by this bug in FreeBSD-12.0-RELEASE-p3. When uploading and downloading at the same time, em0 loses speed and will eventually restart with:
Mar 2 14:45:40 kernel: em0: TX(0) desc avail = 41, pidx = 583
Mar 2 14:45:41 kernel: em0: link state changed to DOWN
Mar 2 14:45:42 kernel: em0: link state changed to UP
Mar 2 14:45:43 kernel: em0: TX(0) desc avail = 1024, pidx = 0
Mar 2 14:45:44 kernel: em0: link state changed to DOWN
Mar 2 14:45:45 kernel: em0: link state changed to UP
em0@pci0:0:25:0: class=0x020000 card=0x20138086 chip=0x15038086 rev=0x05 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82579V Gigabit Network Connection'
    class = network
    subclass = ethernet
This does not happen when using Linux (i.e. Debian 9.8). Thank you.

Hi, I'm facing this problem on 1 of 2 installed em interfaces, too.
System: FreeBSD 12.0-STABLE r344823 amd64
NICs:
em0@pci0:0:25:0: class=0x020000 card=0x102e17aa chip=0x15028086 rev=0x04 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82579LM Gigabit Network Connection (Lewisville)'
    class = network
    subclass = ethernet
em1@pci0:2:10:0: class=0x020000 card=0x002e8086 chip=0x100e8086 rev=0x02 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82540EM Gigabit Ethernet Controller'
    class = network
    subclass = ethernet
Problem:
Mar 7 20:49:06 hydrogen kernel: em1: link state changed to DOWN
Mar 7 20:49:07 hydrogen kernel: em1: TX(0) desc avail = 1024, pidx = 0
Mar 7 20:49:12 hydrogen syslogd: last message repeated 3 times
Mar 7 20:49:14 hydrogen kernel: em1: link state changed to UP
Mar 7 20:49:15 hydrogen kernel: em1: TX(0) desc avail = 1020, pidx = 4
Mar 7 20:49:15 hydrogen kernel: em1: link state changed to DOWN
Mar 7 20:49:16 hydrogen kernel: em1: link state changed to UP
Mar 7 20:49:24 hydrogen kernel: em1: link state changed to DOWN
Mar 7 20:49:26 hydrogen kernel: em1: link state changed to UP
Mar 7 20:49:27 hydrogen kernel: em1: link state changed to DOWN
Mar 7 20:49:28 hydrogen kernel: em1: TX(0) desc avail = 1024, pidx = 0
Mar 7 20:49:30 hydrogen syslogd: last message repeated 1 times
Mar 7 20:49:32 hydrogen kernel: em1: link state changed to UP
Mar 7 20:49:34 hydrogen kernel: em1: link state changed to DOWN
Mar 7 20:49:36 hydrogen kernel: em1: TX(0) desc avail = 1024, pidx = 0
Mar 7 20:49:50 hydrogen syslogd: last message repeated 8 times
Mar 7 20:49:52 hydrogen kernel: em1: link state changed to UP
Mar 7 20:49:52 hydrogen kernel: em1: TX(0) desc avail = 1024, pidx = 0
Mar 7 20:49:52 hydrogen kernel: em1: link state changed to DOWN
Mar 7 20:49:54 hydrogen kernel: em1: link state changed to UP
em0 (82579LM) works without problems.

Hello! I'm affected as well.
kernel: igb0: TX(0) desc avail = 1024, pidx = 0
...
kernel: igb2: TX(5) desc avail = 1024, pidx = 0
igb0@pci0:129:0:0: class=0x020000 card=0x00001458 chip=0x15218086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = 'I350 Gigabit Network Connection'
    class = network
    subclass = ethernet
igb2@pci0:132:0:0: class=0x020000 card=0xa02c8086 chip=0x10e88086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82576 Gigabit Network Connection'
    class = network
    subclass = ethernet

Please reopen the bug; the problem still occurs on FreeBSD 12.0-RELEASE-p7.
Use case:
1. Everything was working.
2. I detached the Ethernet cable on the modem side.
3. I connected it again and saw "no carrier" on the igb0 interface.
4. The system logged the following:
Jul 5 22:31:35 hpv kernel: igb0: TX(0) desc avail = 1024, pidx = 0
Jul 5 22:31:38 hpv kernel: igb0: TX(0) desc avail = 1024, pidx = 0
Jul 5 22:31:40 hpv kernel: igb0: TX(0) desc avail = 1024, pidx = 0
5. I tried "server:# service netif restart igb0 && service routing restart" without success.
6. I restarted the OS to get back to normal.
Is there any workaround other than restarting the system? I also tried the following, without progress:
# ifconfig igb0 down
# ifconfig igb0 up
I think the problem starts when the network card loses the Ethernet link; the errors then occur and the interface stays non-working.
My sysctl.conf file:
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.icmp.log_redirect=1
net.inet.icmp.drop_redirect=1
net.inet.ip.random_id=1
net.link.tap.up_on_open=1
net.inet.tcp.mssdflt=1440
net.inet.tcp.nolocaltimewait=1
net.inet.ip.check_interface=1
net.inet.ip.redirect=0
net.inet.tcp.drop_synfin=1
net.inet.tcp.msl=15000
net.inet.tcp.icmp_may_rst=0
net.inet.tcp.path_mtu_discovery=0
net.inet6.icmp6.rediraccept=0
net.inet6.ip6.redirect=0
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendspace=1048576
net.inet.tcp.recvspace=1048576
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_inc=524288
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.cc.algorithm=cubic
net.inet.tcp.tso=0
net.inet.tcp.rexmit_slop=50
net.inet.tcp.msl=5000
net.inet.tcp.keepinit=5000
net.inet.tcp.finwait2_timeout=5000
net.inet.tcp.fast_finwait2_recycle=1
net.inet.tcp.always_keepalive=0
net.route.netisr_maxqlen=2048
net.inet.ip.process_options=0
net.inet.sctp.blackhole=2
net.inet.tcp.abc_l_var=44

I've been having intermittent network problems on a SuperMicro server. My computer is exhibiting a similar pathology to Mr. Sergey V. Dyatko's. I'm serving Minecraft to an internal network over a D-Link powerline Ethernet modem. Over the last few days the powerline network seems to have been going up and down, which eventually causes the "TX(5) desc avail = 1024" error. My system uses the igb driver.
russellh@sylvester:~> uname -a
FreeBSD sylvester 12.0-RELEASE-p3 FreeBSD 12.0-RELEASE-p3 r346787 GENERIC amd64
russellh@sylvester:~> pciconf -lv
#SNIP#
igb0@pci0:2:0:0: class=0x020000 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
    vendor = 'Intel Corporation'
    device = '82576 Gigabit Network Connection'
    class = network
    subclass = ethernet
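Most reports in this thread show the same signature once the interface is dead: a repeating "TX(n) desc avail = 1024, pidx = 0" line on a ring configured with 1024 TX descriptors. A minimal sketch for picking that signature out of saved kernel log lines follows. It is only an illustration: struct tx_report, parse_tx_report() and looks_stuck() are hypothetical names, the ring size is passed in by the caller, and treating "all descriptors available at pidx 0" as the stuck state is an assumption drawn from the logs above, not from the driver source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed "TX(q) desc avail = A, pidx = P" report. */
struct tx_report {
	char ifname[16];
	int queue;
	int avail;
	int pidx;
};

/*
 * Hypothetical parser: finds the driver message anywhere in a log
 * line (syslog prefixes vary) and fills in *r. Returns 1 on success,
 * 0 if the line does not contain a TX report.
 */
static int
parse_tx_report(const char *line, struct tx_report *r)
{
	const char *p, *start;
	size_t len;

	assert(line != NULL && r != NULL);
	p = strstr(line, ": TX(");
	if (p == NULL)
		return (0);
	/* Walk back to the start of the interface name. */
	start = p;
	while (start > line && start[-1] != ' ')
		start--;
	len = (size_t)(p - start);
	if (len == 0 || len >= sizeof(r->ifname))
		return (0);
	memcpy(r->ifname, start, len);
	r->ifname[len] = '\0';
	return (sscanf(p, ": TX(%d) desc avail = %d, pidx = %d",
	    &r->queue, &r->avail, &r->pidx) == 3);
}

/* Assumed stuck pattern from this thread: full ring reported at index 0. */
static int
looks_stuck(const struct tx_report *r, int ring_size)
{
	return (r->avail == ring_size && r->pidx == 0);
}
```

Feeding it the line "em0: TX(0) desc avail = 42, pidx = 985" from the earlier report parses successfully but does not match the stuck pattern, while the igb0 lines logged after the cable was reattached do.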