Bug 205706 - Watchdog timeout on em driver under heavy traffic on a bridge configuration
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net mailing list
Keywords: IntelNetworking
Reported: 2015-12-30 00:02 UTC by avilamarquezalvaro2015
Modified: 2019-06-17 07:31 UTC
Description avilamarquezalvaro2015 2015-12-30 00:02:13 UTC
The em driver hangs with the following error:

em1: Watchdog timeout Queue[0]-- resetting
Interface is RUNNING and ACTIVE
em1: TX Queue 0 ------
em1: hw tdh = 379, hw tdt = 315
em1: Tx Queue Status = -2147483648
em1: TX descriptors avail = 64
em1: Tx Descriptors avail failure = 816926
em1: RX Queue 0 ------
em1: hw rdh = 149, hw rdt = 145
em1: RX discarded packets = 0
em1: RX Next to Check = 146
em1: RX Next to Refresh = 145
em1: link state changed to DOWN
em1: link state changed to UP
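
As a reading aid for the dump above, here is a minimal sketch (not taken from the driver source) that treats the TX ring as a circular buffer. It assumes a ring of 1024 descriptors, which this report does not state; `RING_SIZE`, `pending_descriptors` and `as_unsigned32` are hypothetical names used only for illustration. Under that assumption, the head (tdh) and tail (tdt) values suggest that most of the ring is still waiting on the hardware, which is consistent with the reported `TX descriptors avail = 64`. The negative queue status is simply 0x80000000 printed as a signed 32-bit integer.

```python
# Reading aid for the watchdog dump; this is not driver code.
RING_SIZE = 1024  # assumption: the report does not state the TX ring size


def pending_descriptors(tdh: int, tdt: int, ring_size: int = RING_SIZE) -> int:
    """Descriptors between the hardware head (tdh) and the software tail (tdt)."""
    return (tdt - tdh) % ring_size


def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as an unsigned one."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    pending = pending_descriptors(tdh=379, tdt=315)
    print("pending descriptors:", pending)
    print("free descriptors:", RING_SIZE - pending)
    print("queue status:", hex(as_unsigned32(-2147483648)))
```

If the real ring size differs, only the `RING_SIZE` constant needs to change; the wrap-around arithmetic stays the same.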

This is on a machine with dual Intel Gigabit NICs set up as a bridge, running 11-CURRENT-amd64-20151217 (base r292413). I suspect this will also happen if only one of the NICs uses the em driver.

This happens under heavy traffic conditions (>= 10MiBytes/sec in both directions of the bridge) after 1-4 hours. Once the watchdog times out, the link no longer passes any packets, although it still detects cable disconnect/connect. ifconfig down/ifconfig up does not help at all; a reboot is necessary.

I tried what is suggested in bug #200221 (ifconfig -tso -vlanhwtso on both interfaces). The watchdog no longer times out, but after a similar period of time the link becomes unusable (a ping loses ~90% of packets).

This was also tested on 10.2-CURRENT, with identical results.

Thanks
Comment 1 avilamarquezalvaro2015 2016-01-06 00:08:21 UTC
Erratum for the description:
Where it reads 10.2-CURRENT, it should read 10.2-RELEASE.
Comment 2 joshruehlig 2017-05-05 17:53:00 UTC
I am also experiencing this, and yes, I am triggering it on a single interface of my motherboard, which has dual Intel NICs (82574L).

In my case, the OS is FreeBSD 10.3. One interface (em0) connects to my router and continues to work. The other interface (em1) connects to a dumb gigabit switch, which also connects to five 100Mbps IP cameras.

I must reboot the server for this interface to start working again; ifconfig up+down does not fix it. I also tried with "TSO4" disabled for this interface, but the issue still happens.
Comment 3 joshruehlig 2017-05-05 18:01:02 UTC
In my case, my server is streaming roughly 6MiBytes/sec of video from my five IP cameras, in a single direction.
Comment 4 Eugene Grosbein freebsd_committer 2017-05-05 19:56:39 UTC
(In reply to joshruehlig from comment #2)

Next time this happens, instead of rebooting, please try to force link re-negotiation using these commands:

ifconfig em0 media 10baseT/UTP
ifconfig em0 media autoselect
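
The two commands above name em0, while the interface that stops working in comment #2 is em1. The following is a minimal sketch of a helper that builds the same command pair for any interface name. The names `renegotiation_commands` and `renegotiate` are hypothetical, and the script defaults to a dry run that only prints the commands. Actually running them is assumed to require root on the affected machine.

```python
import subprocess


def renegotiation_commands(iface: str, forced_media: str = "10baseT/UTP"):
    """Return the two ifconfig invocations suggested above, for a given interface."""
    return [
        ["ifconfig", iface, "media", forced_media],   # force a low-speed media type
        ["ifconfig", iface, "media", "autoselect"],   # then return to autonegotiation
    ]


def renegotiate(iface: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order."""
    for argv in renegotiation_commands(iface):
        if dry_run:
            print(" ".join(argv))
        else:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    renegotiate("em1")  # dry run: only prints the commands
```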
Comment 5 joshruehlig 2017-06-02 00:45:48 UTC
(In reply to Eugene Grosbein from comment #4)

This finally happened to me again, shortly after a reboot for other reasons.

Forcing link re-negotiation on the affected interface did not work. In this instance only 1 of my cameras (10.0.1.11) was unreachable, while the others worked as expected.
Comment 6 joshruehlig 2017-06-02 00:46:40 UTC
(In reply to joshruehlig from comment #5)

I'm thinking there was actually a problem with the camera in this case, because rebooting the camera (and not the FreeBSD server) fixed it.
Comment 7 Andrei 2017-08-22 18:40:09 UTC
Same issue for me, but with the igb driver, and after upgrading from 11.0 to 11.1.
If I roll back to 11.0, all is fine.

igb0@pci0:8:0:0:        class=0x020000 card=0x060f15d9 chip=0x10c98086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82576 Gigabit Network Connection'
    class      = network
    subclass   = ethernet

The interface is connected to a router.
I'm not sure whether a separate bug report is needed because this is the igb driver.
If so, please ping me and I will create one.
Comment 8 Peter 2017-10-24 15:33:07 UTC
Same issue for me with the em driver (em0). Kernel version: 11.0-RELEASE-p12.
The setup consists of two OpnSense firewalls (COTS PCs) back-to-back with an IPsec tunnel between them. The error shows up immediately when traffic is generated on the LAN side(s) at the same time. I feed 64-byte Ethernet frames with an IXIA test generator. The packet rate is 1448000 pps.
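
To put that packet rate in context, here is a rough back-of-the-envelope calculation. It assumes a 1 Gbit/s link and that the 64-byte figure is the full Ethernet frame including the FCS, neither of which the comment states explicitly. The function names are hypothetical. Each frame also occupies 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire.

```python
# Offered load versus gigabit line rate for minimum-size frames.
PREAMBLE_SFD_BYTES = 8       # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12    # minimum inter-frame gap
LINK_BPS = 1_000_000_000     # assumption: 1 Gbit/s link


def wire_bits_per_frame(frame_bytes: int) -> int:
    """Bits a frame occupies on the wire, including preamble and gap."""
    return (frame_bytes + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8


def max_pps(frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Theoretical maximum frames per second at the given frame size."""
    return link_bps / wire_bits_per_frame(frame_bytes)


def utilization(pps: int, frame_bytes: int, link_bps: int = LINK_BPS) -> float:
    """Fraction of the link consumed by the given packet rate."""
    return pps * wire_bits_per_frame(frame_bytes) / link_bps


if __name__ == "__main__":
    print("line-rate pps for 64-byte frames:", int(max_pps(64)))
    print("utilization at 1448000 pps:", utilization(1_448_000, 64))
```

Under these assumptions, the generator is offering close to the full line rate of minimum-size frames, which is about the heaviest per-packet load a gigabit interface can be given.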

I believe it's always the WAN side (on both PCs). The em driver hangs with the following error:

em0: Watchdog timeout Queue[0]-- resetting
Interface is RUNNING and ACTIVE
em0: TX Queue 0 ------
em0: hw tdh = 818, hw tdt = 777
em0: Tx Queue Status = -2147483648
em0: TX descriptors avail = 40
em0: Tx Descriptors avail failure = 17012
em0: RX Queue 0 ------
em0: hw rdh = 139, hw rdt = 139
em0: RX discarded packets = 0
em0: RX Next to Check = 140
em0: RX Next to Refresh = 139

Nothing cures this problem except a reboot. I also tried tuning mbufs, but without success.

pciconf -lvbc
em0@pci0:1:0:0:	class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82574L Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xf7ec0000, size 131072, enabled
    bar   [14] = type Memory, range 32, base 0xf7e00000, size 524288, enabled
    bar   [18] = type I/O Port, range 32, base 0xe000, size 32, enabled
    bar   [1c] = type Memory, range 32, base 0xf7ee0000, size 16384, enabled
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit
    cap 10[e0] = PCI-Express 1 endpoint max data 256(256) NS
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[a0] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 6805caffff5fda26

Cheers, Peter