Bug 243463

Summary: ix0: Watchdog timeout
Product: Base System
Component: kern
Status: Open
Severity: Affects Only Me
Priority: ---
Version: 12.1-RELEASE
Hardware: amd64
OS: Any
Reporter: Jiri <silence>
Assignee: freebsd-net mailing list <net>
CC: denis, krzysztof.galazka, net
Keywords: IntelNetworking, needs-qa
Flags: koobs: mfc-stable12?, koobs: mfc-stable11?
See Also: https://reviews.freebsd.org/D21712

Description Jiri 2020-01-20 08:47:59 UTC
Hi all,
I am seeing strange behavior from my NIC, a dual-port Intel X520 with only one port connected.
Real traffic is about 400 Mbit/s RX / 100 Mbit/s TX. The machine is a Supermicro X11SCL-F with a Xeon E-2176G and 2x8GB RAM, latest BIOS R1.2, acting as a pf firewall/router. It is a new install, running for its first day.

ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 1455) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 1885) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 1062) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 1 desc avail: 34 pidx: 177) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 33 pidx: 1275) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 2014) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 707) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 653) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
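
For anyone comparing similar logs, the messages above can be tallied with a short script. This is only a minimal sketch: it assumes the number after "TX:" is the TX queue index, and the names parse_watchdog_lines, summarize and SAMPLE are made up for illustration. The sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches the watchdog line as it appears in the log above. Using search()
# lets it tolerate a syslog prefix (timestamp, hostname) in front of it.
WATCHDOG_RE = re.compile(
    r"(?P<ifname>\w+): Watchdog timeout "
    r"\(TX: (?P<queue>\d+) desc avail: (?P<avail>\d+) pidx: (?P<pidx>\d+)\)"
)


def parse_watchdog_lines(lines):
    """Return one dict per watchdog-timeout line; other lines are ignored."""
    events = []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            events.append({
                "ifname": m.group("ifname"),
                # Assumption: the value after "TX:" is the TX queue index.
                "queue": int(m.group("queue")),
                "avail": int(m.group("avail")),
                "pidx": int(m.group("pidx")),
            })
    return events


def summarize(events):
    """Count timeouts per (interface, TX queue) pair."""
    return Counter((e["ifname"], e["queue"]) for e in events)


# A few lines copied from the log in this report.
SAMPLE = """\
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 1455) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 1 desc avail: 34 pidx: 177) -- resetting
ix0: Watchdog timeout (TX: 0 desc avail: 33 pidx: 1275) -- resetting
""".splitlines()

if __name__ == "__main__":
    events = parse_watchdog_lines(SAMPLE)
    for (ifname, queue), count in sorted(summarize(events).items()):
        print(f"{ifname} TX queue {queue}: {count} timeout(s)")
```

Feeding it the full dmesg or /var/log/messages output instead of SAMPLE would show whether the timeouts concentrate on one queue, as they appear to here.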

ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8138b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
        ether a0:36:9f:26:fb:b8
        inet x.x.x.x netmask 0xfffffff8 broadcast y.y.y.y
        media: Ethernet autoselect (10Gbase-LR <full-duplex,rxpause,txpause>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        plugged: SFP/SFP+/SFP28 10G Base-LR (LC)
        vendor: Intel Corp PN: SFP-10G13-LR SN: IB81220374 DATE: 2018-12-20
        module temperature: 36.87 C Voltage: 3.28 Volts
        RX: 0.54 mW (-2.64 dBm) TX: 0.71 mW (-1.43 dBm)

ix0@pci0:1:0:0: class=0x020000 card=0x7b118086 chip=0x154d8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet 10G 2P X520 Adapter'
    class      = network
    subclass   = ethernet

sysctl.conf
kern.ipc.maxsockbuf=16777216
net.inet.tcp.mssdflt=1460
net.inet.tcp.minmss=536

loader.conf
cc_htcp_load="YES"
machdep.hyperthreading_allowed="0"
net.inet.tcp.soreceive_stream="1"
net.isr.maxthreads="-1"
net.isr.bindthreads="-1"
net.pf.source_nodes_hashsize="1048576"

Jiri
Comment 1 Krzysztof Galazka 2020-01-21 09:34:11 UTC
(In reply to Jiri from comment #0)

Could you please check whether applying this patch https://reviews.freebsd.org/D21712 has any effect? I would like to rule out the possibility that the watchdog timeouts are false positives.
Comment 2 Jiri 2020-01-21 09:58:07 UTC
(In reply to Krzysztof Galazka from comment #1)

Thank you.

OK, I'll do it tonight. For the past two days the router has been running under the same traffic conditions (no reboots, no config changes) and no ix0 timeouts have appeared.

Jiri
Comment 3 Jiri 2020-01-23 14:54:02 UTC
I have applied the patch. No timeouts or messages like "queue can't be marked as hung if interface is down" have appeared.
Next I'll try shutting down the switch port and stress-testing with traffic. I'll let you know if anything happens.
Jiri
Comment 4 Jiri 2020-01-27 15:42:57 UTC
I tried switching the optical link to my ix0 off and on by manually removing the fibers. The kernel did not detect any outage: there is no ix0 down/up message in the log, although the link was down for about 7 seconds according to the switch.
No errors have appeared in the log, and the system has been running for about 5 days with the recommended patch.
Strange, but it looks fully operational.
Comment 5 Jiri 2020-03-10 08:18:15 UTC
Two outages were observed. No kernel message, no log event. ix0 stopped communicating while still appearing to be up. Bringing the interface down and up with ifconfig resolved the issue.
Maybe it is a bad network card, if nobody else has this problem?
Comment 6 Denis Ahrens 2020-03-27 02:55:43 UTC
Looks like bug 235524 to me.

The igb interface does not survive iperf3 -t 300 for me.
Comment 7 Jiri 2020-03-27 09:19:00 UTC
Yes, I agree. I have added a second X520 card, which carries very low traffic (a few Mbit/s), and no problem has been observed there. This timeout behavior probably depends on traffic, i.e. high traffic = problem.