Hi all, I am observing some strange behavior on my NIC. Dual-port Intel X520, only one port connected. Real traffic is about 400 Mbit/s RX / 100 Mbit/s TX. Supermicro X11SCL-F / Xeon E-2176G / 2x8GB RAM, latest BIOS R1.2, acting as a pf firewall/router. New install, running its first day.

ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 1455) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 1885) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 1062) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 1 desc avail: 34 pidx: 177) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 33 pidx: 1275) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 2014) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 707) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP
ix0: Watchdog timeout (TX: 0 desc avail: 34 pidx: 653) -- resetting
ix0: link state changed to DOWN
ix0: link state changed to UP

ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8138b8<VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER>
        ether a0:36:9f:26:fb:b8
        inet x.x.x.x netmask 0xfffffff8 broadcast y.y.y.y
        media: Ethernet autoselect (10Gbase-LR <full-duplex,rxpause,txpause>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        plugged: SFP/SFP+/SFP28 10G Base-LR (LC)
        vendor: Intel Corp PN: SFP-10G13-LR SN: IB81220374 DATE: 2018-12-20
        module temperature: 36.87 C Voltage: 3.28 Volts
        RX: 0.54 mW (-2.64 dBm) TX: 0.71 mW (-1.43 dBm)

ix0@pci0:1:0:0: class=0x020000 card=0x7b118086 chip=0x154d8086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet 10G 2P X520 Adapter'
    class      = network
    subclass   = ethernet

sysctl.conf
kern.ipc.maxsockbuf=16777216
net.inet.tcp.mssdflt=1460
net.inet.tcp.minmss=536

loader.conf
cc_htcp_load="YES"
machdep.hyperthreading_allowed="0"
net.inet.tcp.soreceive_stream="1"
net.isr.maxthreads="-1"
net.isr.bindthreads="-1"
net.pf.source_nodes_hashsize="1048576"

Jiri
(In reply to Jiri from comment #0) Could you please check whether applying this patch https://reviews.freebsd.org/D21712 has any influence? I would like to rule out that the watchdog timeouts are false positives.
(In reply to Krzysztof Galazka from comment #1) Thank you, OK, I'll do it tonight. The router has now been running for two days under the same traffic conditions (no reboots, no config changes) and no ix0 timeouts have appeared. Jiri
I have applied the patch. No timeouts or messages like "queue can't be marked as hung if interface is down" have appeared. Next I'll try shutting down the switch port and stress-testing with traffic. I'll let you know if anything happens. Jiri
I tried switching the optical link to ix0 off and on by manually removing the fibers. The kernel did not detect any outage: there was no ix0 down/up message in the log. (The outage lasted about 7 seconds, according to the switch.) No errors have appeared in the log, and the system has been running for about 5 days with the recommended patch. Strange, but it looks fully operational.
Two outages were observed. No kernel message, no log event. ix0 stopped communicating while still appearing to be up. An ifconfig down/up resolved the issue. Maybe it is a bad network card, if nobody else has this problem?
This looks like 235524 to me. The igb interface does not survive iperf3 -t 300 here.
Yes, I agree. I have added a second X520 card; it carries very low traffic (a few Mbit/s) and no problem has been observed there. The timeout behavior probably depends on traffic, i.e. high traffic = problem.
Some new information. Kernel patched to P3. The problem may not be in the Intel driver. I replaced the Intel NIC with a Mellanox ConnectX4 NIC, hoping to solve the traffic outages. A cron script is running that tests connectivity to the gateway and takes the interface down/up when the test fails. Its log:

Fri Apr 10 19:57:13 CEST 2020 interface ix0 restart
Fri Apr 10 20:38:13 CEST 2020 interface ix0 restart
Sat Apr 11 17:45:13 CEST 2020 interface ix0 restart
Sat Apr 11 20:00:13 CEST 2020 interface ix0 restart
Sat Apr 11 20:30:13 CEST 2020 interface ix0 restart
Sun Apr 12 19:16:13 CEST 2020 interface ix0 restart
Sun Apr 12 20:30:13 CEST 2020 interface ix0 restart
Sun Apr 26 00:27:13 CEST 2020 interface mce0 restart
Sun Apr 26 04:48:13 CEST 2020 interface mce0 restart
Sun Apr 26 11:12:13 CEST 2020 interface mce0 restart
Wed Apr 29 21:27:13 CEST 2020 interface mce0 restart
Wed Apr 29 21:33:13 CEST 2020 interface mce0 restart

After the network card was changed, no problems appeared from Apr 12 to Apr 26. But since Apr 26, on the Mellanox card, traffic does not stop completely; instead about 80% packet loss appears. An interface down/up solves the issue, as in the Intel case. Other servers in my network keep working connectivity to the gateway, so there is no connectivity or switch issue. Peak traffic on the NIC is about 1.3 Gbit/s.

On the Mellanox NIC this log event appears:
"arpresolve: can't allocate llinfo for xxx.xxx.xxx.xxx on mce0"

Jiri
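
For anyone who wants to reproduce the workaround, here is a minimal sketch of what such a cron watchdog could look like. This is not Jiri's actual script, which was not posted. The gateway address, log path, retry count and function names are all assumptions made up for illustration. The log line is written in the same shape as the entries above.

```shell
#!/bin/sh
# Sketch of a gateway watchdog for cron. All values below are hypothetical
# placeholders, not taken from the original report.
IFACE="${IFACE:-ix0}"                           # interface to bounce
GATEWAY="${GATEWAY:-192.0.2.1}"                 # placeholder gateway address
LOGFILE="${LOGFILE:-/var/log/if_watchdog.log}"  # placeholder log path

# Succeeds if at least one of three echo requests is answered.
gateway_reachable() {
    ping -c 3 "$1" >/dev/null 2>&1
}

# Take the interface down and bring it back up.
restart_iface() {
    ifconfig "$1" down
    sleep 2
    ifconfig "$1" up
}

# One check: do nothing while the gateway answers, otherwise log and bounce.
check_once() {
    if gateway_reachable "$GATEWAY"; then
        return 0
    fi
    echo "$(date) interface $IFACE restart" >> "$LOGFILE"
    restart_iface "$IFACE"
}

# Only act when called with the "run" argument, so the file can also be
# sourced or inspected without touching the interface.
if [ "${1:-}" = "run" ]; then
    check_once
fi
```

It would be started from root's crontab once per minute with the "run" argument. Note that a partial failure like the 80% packet loss seen on mce0 can still pass a "one reply out of three" test, so a stricter check (more probes, loss threshold) may be needed to catch that case.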
Please reopen if you think there is still an issue with ixgbe.