Summary: | tso seems broken on RELENG10 for version 7.4.2 of em driver | ||
---|---|---|---|
Product: | Base System | Reporter: | mike |
Component: | kern | Assignee: | freebsd-net (Nobody) <net> |
Status: | Closed FIXED | ||
Severity: | Affects Many People | CC: | Karli.Sjoberg, bsd, cstef, erj, fredrikb, jfv, meyer.sydney, sbruno |
Priority: | --- | Keywords: | IntelNetworking |
Version: | 10.0-STABLE | ||
Hardware: | amd64 | ||
OS: | Any |
Description
mike
2014-09-21 01:00:30 UTC
Same here. I run into problems sending TCP data bigger than ~4 KB when the TSO option is enabled (which is the default!). See https://forums.freebsd.org/threads/error-sending-tcp-data-4kb.50431 for the details. The problems occur with both an up-to-date i386 kernel and an up-to-date amd64 kernel (10.1-STABLE r278696).

dmesg reports:

em0: <Intel(R) PRO/1000 Network Connection 7.4.2> port 0xf080-0xf09f mem 0xf7f00000-0xf7f1ffff,0xf7f3c000-0xf7f3cfff irq 20 at device 25.0 on pci0
em0: Using an MSI interrupt

ifconfig -v em0 reports:

em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=4009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,VLAN_HWTSO>
ether ...
inet ... netmask ... broadcast ...
nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
media: Ethernet autoselect (100baseTX <full-duplex>)
status: active

This is an onboard LAN controller (I218-V according to the manual) on an MSI Z97S SLI Plus motherboard (Intel Z97 Express chipset).

The importance has already been set to "Affects Many People". This is probably true, and it is difficult for affected users to get help because they cannot send data. To make it even worse, the symptoms (error messages such as "The server may be unavailable or is refusing SMTP connections") do not reveal the real cause of the problem.

Same here, on a Supermicro X9SRL-F.
For us, iSCSI seems to be more affected by this than NFS:

Feb 23 12:18:20 <fileserver> kernel: WARNING: <ipaddress> (<iqn>): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb 23 12:18:20 <fileserver> kernel: em1: link state changed to UP
Feb 23 12:18:25 <fileserver> kernel:
Feb 23 12:18:25 <fileserver> ctld[9389]: <ipaddress>: read: connection lost
Feb 23 13:26:55 <fileserver> kernel: WARNING: <ipaddress> (<iqn>): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb 23 13:26:56 <fileserver> kernel: em0: link state changed to UP
Feb 23 13:26:56 <fileserver> kernel: WARNING: <ipaddress> (<iqn>): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb 23 13:26:59 <fileserver> kernel: WARNING: <ipaddress> (<iqn>): connection error; dropping connection
Feb 23 14:46:23 <fileserver> kernel: WARNING: <ipaddress> (<iqn>): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb 23 14:46:23 <fileserver> kernel: em1: link state changed to UP
Feb 23 14:46:24 <fileserver> kernel: WARNING: <ipaddress> (<iqn>): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb 23 14:46:30 <fileserver> kernel:
Feb 23 14:46:30 <fileserver> ctld[36377]: <ipaddress>: read: connection lost
Feb 23 15:13:40 <fileserver> kernel: em0: link state changed to UP
Feb 23 15:13:40 <fileserver> kernel: WARNING: <ipaddress> (<iqn>): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb 23 15:20:31 <fileserver> kernel: WARNING: <ipaddress> (<iqn>): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb 23 15:20:31 <fileserver> kernel: em1: link state changed to UP
Feb 23 15:20:31 <fileserver> kernel: WARNING: <ipaddress> (<iqn>): no ping reply (NOP-Out) after 5 seconds; dropping connection

Feb 22 03:01:36 <fileserver> kernel: em0: Watchdog timeout -- resetting
Feb 22 03:01:36 <fileserver> kernel: em0: Queue(0) tdh = 669, hw tdt = 623
Feb 22 03:01:36 <fileserver> kernel: em0: TX(0) desc avail = 32,Next TX to Clean = 655

pciconf -lvcb em0 reports:

em0@pci0:9:0:0: class=0x020000 card=0x000015d9 chip=0x10d38086 rev=0x00 hdr=0x00
vendor = 'Intel Corporation'
device = '82574L Gigabit Network Connection'
class = network
subclass = ethernet
bar [10] = type Memory, range 32, base 0xfbe00000, size 131072, enabled
bar [18] = type I/O Port, range 32, base 0xa000, size 32, enabled
bar [1c] = type Memory, range 32, base 0xfbe20000, size 16384, enabled
cap 01[c8] = powerspec 2 supports D0 D3 current D0
cap 05[d0] = MSI supports 1 message, 64 bit
cap 10[e0] = PCI-Express 1 endpoint max data 128(256) link x1(x1)
speed 2.5(2.5) ASPM disabled(L0s/L1)
cap 11[a0] = MSI-X supports 5 messages, enabled
Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
ecap 0003[140] = Serial 1 0cc47affff0cc034

FreeBSD 10.1-STABLE #0 r278568M

Disabling TSO seems to have stabilized it, for now.

/K

Moving to -net and CCing Jack and Eric from Intel.

https://lists.freebsd.org/pipermail/freebsd-stable/2014-September/080088.html has some insights and discussion on this from Rick M.

I've committed enhancements to the watchdog handler and to the significant error handlers for this specific chipset in em(4). In addition, the EM_MULTIQUEUE kernel configuration option is available to turn on the 2 queues in the card. If you feel like testing these, let me know.

https://reviews.freebsd.org/D3192

I think we have a good fix for this problem. If you have time to validate my findings, please do.

Please retest this issue with the 10.3 BETA version of the em(4) driver. Some DMA handling finally got sorted out, hopefully for good, and that should make this a solved problem.

Thanks,

I was working with marius@ to test on certain hardware and media configs, and it seems to have resolved this issue. See the discussion in the thread https://lists.freebsd.org/pipermail/freebsd-stable/2016-January/084028.html
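For anyone still on an affected driver, the workaround reported earlier in this thread (disabling TSO) might look roughly like the sketch below. It is only an illustration, not something taken from the driver commits: the helper names `has_tso` and `tso_off_cmd` are made up for this example, and `em0` is just the interface name used in the reports above. The `-tso` flag itself is a standard ifconfig(8) option.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): succeeds when the given
# ifconfig output lists TSO, TSO4 or TSO6 as an enabled interface option.
# VLAN_HWTSO is deliberately not matched, since it is a separate capability.
has_tso() {
    printf '%s\n' "$1" | grep -Eq '[<,]TSO[46]?[,>]'
}

# Hypothetical helper: prints the ifconfig(8) command that turns TSO off
# on the named interface, so it can be reviewed before being run as root.
tso_off_cmd() {
    printf 'ifconfig %s -tso\n' "$1"
}
```

On an affected host this could be used as `has_tso "$(ifconfig em0)" && tso_off_cmd em0`, then running the printed command as root. Setting the sysctl `net.inet.tcp.tso=0` is an alternative that disables TSO for the whole TCP stack rather than for one interface.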