Bug 243463 - ix0: Watchdog timeout
Summary: ix0: Watchdog timeout
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net mailing list
URL:
Keywords: IntelNetworking, needs-qa
Depends on:
Blocks:
 
Reported: 2020-01-20 08:47 UTC by Jiri
Modified: 2020-01-27 15:42 UTC
CC List: 2 users

See Also:
koobs: mfc-stable12?
koobs: mfc-stable11?


Description Jiri 2020-01-20 08:47:59 UTC
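Hi all,
I am seeing strange behavior from my NIC, a dual-port Intel X520 with only one port connected. The kernel log shows repeated watchdog timeouts on ix0, each followed by a link reset.
Real traffic is about 400 Mbit/s RX / 100 Mbit/s TX. The machine is a Supermicro X11SCL-F with a Xeon E-2176G and 2x8 GB RAM, latest BIOS R1.2, acting as a pf firewall/router. It is a new install, running its first day.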

ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 1455) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 1885) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 1062) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 1 desc avail: 34 pidx: 177) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 33 pidx: 1275) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 2014) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 707) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 653) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
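
To see at a glance which interface and TX queue the timeouts hit, log lines like the ones above can be tallied with a short script. This is only a minimal sketch: the regular expression is modeled on the messages quoted in this report, the number after "TX:" is assumed to be the transmit queue index, and the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the watchdog lines quoted in this report.
# search() is used so that syslog-style prefixes (timestamp, host,
# "kernel:") in front of the message do not prevent a match.
WATCHDOG_RE = re.compile(
    r"(?P<ifname>\w+): Watchdog timeout "
    r"\(TX: (?P<queue>\d+) desc avail: (?P<avail>\d+) pidx: (?P<pidx>\d+)\)"
)


def parse_watchdog_events(lines):
    """Return one dict per watchdog timeout line found in `lines`."""
    events = []
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            events.append({
                "ifname": m.group("ifname"),
                # Assumption: the value after "TX:" is the TX queue index.
                "queue": int(m.group("queue")),
                "avail": int(m.group("avail")),
                "pidx": int(m.group("pidx")),
            })
    return events


def summarize(events):
    """Count timeouts per (interface, queue) pair."""
    return Counter((e["ifname"], e["queue"]) for e in events)
```

Feeding the saved dmesg output to parse_watchdog_events() and printing summarize() of the result gives one count per interface/queue pair. Lines that are not watchdog messages, such as the link state changes, are ignored.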

ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8138b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
        ether a0:36:9f:26:fb:b8
        inet x.x.x.x netmask 0xfffffff8 broadcast y.y.y.y
        media: Ethernet autoselect (10Gbase-LR <full-duplex,rxpause,txpause>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        plugged: SFP/SFP+/SFP28 10G Base-LR (LC)
        vendor: Intel Corp PN: SFP-10G13-LR SN: IB81220374 DATE: 2018-12-20
        module temperature: 36.87 C Voltage: 3.28 Volts
        RX: 0.54 mW (-2.64 dBm) TX: 0.71 mW (-1.43 dBm)

ix0@pci0:1:0:0: class=0x020000 card=0x7b118086 chip=0x154d8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet 10G 2P X520 Adapter'
    class      = network
    subclass   = ethernet

sysctl.conf:
kern.ipc.maxsockbuf=16777216
net.inet.tcp.mssdflt=1460
net.inet.tcp.minmss=536

loader.conf:
cc_htcp_load="YES"
machdep.hyperthreading_allowed="0"
net.inet.tcp.soreceive_stream="1"
net.isr.maxthreads="-1"
net.isr.bindthreads="-1"
net.pf.source_nodes_hashsize="1048576"

Jiri
Comment 1 Krzysztof Galazka 2020-01-21 09:34:11 UTC
(In reply to Jiri from comment #0)

Could you please check whether applying this patch, https://reviews.freebsd.org/D21712, makes any difference? I would like to rule out that the watchdog timeouts are false positives.
Comment 2 Jiri 2020-01-21 09:58:07 UTC
(In reply to Krzysztof Galazka from comment #1)

Thank you.

OK, I'll do it tonight. For the past two days the router has been running under the same traffic conditions (no reboots, no config changes) and no ix0 timeouts have appeared.

Jiri
Comment 3 Jiri 2020-01-23 14:54:02 UTC
I have applied the patch. No timeouts and no messages like "queue can't be marked as hung if interface is down" have appeared.
Next I'll try shutting down the switch port and stress-testing with traffic. I'll let you know if anything happens.
Jiri
Comment 4 Jiri 2020-01-27 15:42:57 UTC
I tried switching the optical link to ix0 off and on by manually removing the fibers. The kernel did not detect any outage: there is no ix0 down/up message in the log. (The outage lasted about 7 seconds according to the switch.)
No errors have appeared in the log; the system has been running for about 5 days since I applied the recommended patch.
Strange, but it appears to be fully operational.