Summary: | [ixgbe] 10gigabit networking problems | | |
---|---|---|---|
Product: | Base System | Reporter: | pataki.antal |
Component: | kern | Assignee: | jfv |
Status: | Closed Overcome By Events | | |
Severity: | Affects Only Me | CC: | adrian, kevin, sbruno |
Priority: | Normal | Keywords: | IntelNetworking |
Version: | Unspecified | | |
Hardware: | Any | | |
OS: | Any | | |

Description
pataki.antal
2013-10-28 11:10:00 UTC
Why is this non-critical? The other side drops the connection because of this. This is very critical, for example if the bogus system is a storage...

Responsible Changed From-To: freebsd-bugs->freebsd-net
Over to maintainer(s).

Responsible Changed From-To: freebsd-net->jfv

Hi, Jack,
Some FreeNAS users [1] have encountered a similar issue too. Can you take a look at this one? Thanks in advance!
[1] https://bugs.freenas.org/issues/4560

To keep you in the loop: I'm having a very similar problem in 10.0-RELEASE. We've made some headway. Disabling TSO (ifconfig ix0 -tso) seems to avoid the symptom, but of course that's just a temporary fix. Try it, and see if you have stability again. The discussion is on the freebsd-net mailing list, at http://lists.freebsd.org/pipermail/freebsd-net/2014-March/038061.html
It's a bit long, but follow along as it may help your situation. I hope to test changes to the TSO code tomorrow.

I am seeing this too on 10.0-RELEASE. Disabling TSO doesn't seem to help either. The server was under fairly heavy ZFS-related load at the time. The network was fairly quiet, since the NFS connections I did have ended up hanging.

System specs:

    FreeBSD 10.0-RELEASE-p1 #3 r264309: Wed Apr 9 17:01:09 PDT 2014
    2x Opteron 6128 (16 total cores)
    128GB RAM
    Intel X520 NIC
    ~22TB ZFS filesystem

I'm seeing this too on one of our busier boxes running 10.0-RELEASE.
The ix device worked okay at first, but under heavy load we'd see things like:

    # ping x.x.x.x
    PING x.x.x.x (x.x.x.x): 56 data bytes
    64 bytes from x.x.x.x: icmp_seq=0 ttl=61 time=55.950 ms
    ping: sendto: File too large
    64 bytes from x.x.x.x: icmp_seq=2 ttl=61 time=55.972 ms
    ping: sendto: File too large
    64 bytes from x.x.x.x: icmp_seq=4 ttl=61 time=55.944 ms
    ping: sendto: File too large

TCP traffic seemed unaffected, but things that used UDP like NFS or NTP got it too:

    ntpd[46659]: sendto(204.9.54.119) (fd=26): File too large
    lldpd_FreeBSD_amd64[1407]: unable to send packet on real device for ix0: File too large

I set -tso and -vlanhwtso, and that didn't immediately help. I then set the interface down/up and that seemed to immediately fix it. Not sure yet if it's a permanent fix or if it'll return after a while. Any debugging we can do to help with this if it returns?

    dev.ix.0.%desc: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15
    dev.ix.0.%driver: ix
    dev.ix.0.%location: slot=0 function=0
    dev.ix.0.%pnpinfo: vendor=0x8086 device=0x10fb subvendor=0x8086 subdevice=0x0006 class=0x020000

Hi,
This should've been fixed in -HEAD, -10 and -9 for at least the ixgbe NICs.

Log:
MFC: r264630

For NFS mounts using rsize,wsize=65536 over TSO enabled network interfaces limited to 32 transmit segments, there are two known issues. The more serious one is that for an I/O of slightly less than 64K, the net device driver prepends an ethernet header, resulting in a TSO segment slightly larger than 64K. Since m_defrag() copies this into 33 mbuf clusters, the transmit fails with EFBIG. A tester indicated observing a similar failure using iSCSI. The second less critical problem is that the network device driver must copy the mbuf chain via m_defrag() (m_collapse() is not sufficient), resulting in measurable overhead. This patch reduces the default size of if_hw_tsomax slightly, so that the first issue is avoided.
Fixing the second issue will require a way for the network device driver to inform tcp_output() that it is limited to 32 transmit segments.

HEAD: r264630
-10: r265414
-9: r265292

Additionally, there were some issues with the way mbufs were repacked, resulting in EFBIG being returned. I'm not sure where/when that was fixed; search for 'NFS client READ performance on -current' on the freebsd-net mailing list.

This looks to be fixed in 10.2r and head. If not, please reopen the ticket.
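
For anyone who wants to check the arithmetic behind the EFBIG failure described in the r264630 commit log, here is a minimal sketch. It is a simplified model, not driver code. It assumes 2048-byte mbuf clusters, a 14-byte Ethernet header, a 4-byte VLAN tag, and an old default if_hw_tsomax of 65535 (IP_MAXPACKET). The thread does not say by how much the patch reduced the limit, so the reduced value below is an assumption, and the function names are made up for illustration.

```python
# Simplified model of the "33 mbuf clusters" problem from the commit log.
# All constants are assumptions based on common FreeBSD header values.
MCLBYTES = 2048           # size of one standard mbuf cluster
IP_MAXPACKET = 65535      # assumed old default for if_hw_tsomax
ETHER_HDR_LEN = 14        # Ethernet header prepended by the driver
ETHER_VLAN_ENCAP_LEN = 4  # extra bytes when a VLAN tag is present
MAX_TX_SEGMENTS = 32      # transmit segment limit named in the commit log


def clusters_needed(tso_payload, link_header=ETHER_HDR_LEN):
    """Clusters used when the frame is copied into full-size clusters."""
    total = tso_payload + link_header
    return (total + MCLBYTES - 1) // MCLBYTES  # integer ceiling division


def fits_tx_ring(tso_payload, link_header=ETHER_HDR_LEN):
    """True if the copied chain stays within the 32-segment limit."""
    return clusters_needed(tso_payload, link_header) <= MAX_TX_SEGMENTS


# Hypothetical reduced limit: leave room for the Ethernet header and VLAN tag.
reduced_tsomax = min(
    IP_MAXPACKET,
    MAX_TX_SEGMENTS * MCLBYTES - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN),
)

if __name__ == "__main__":
    print(clusters_needed(IP_MAXPACKET))    # 33: one cluster too many
    print(clusters_needed(reduced_tsomax))  # 32: fits the limit
```

With the old limit, 65535 + 14 = 65549 bytes, which needs 33 clusters and exceeds the 32-segment limit. With the assumed reduced limit of 65518, even a VLAN-tagged frame is 65536 bytes, exactly 32 clusters.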