There appears to be a regression in e1000-based drivers (em(4), igb(4)) which intermittently causes data corruption with rxcsum enabled. I do not have an exact timeframe, but it appears to have been introduced within roughly the past two months.

When building arm snapshots, math/mpfr is a dependent port which, for the past several weeks, has caused build failures due to a distfile checksum mismatch. It is not 100% reproducible, but this port specifically has been the consistent cause of the build failures.

When looking into the root cause of the checksum mismatch, I noticed that manually fetching the distfile for the math/mpfr port produced an incorrect checksum only some of the time. Looking closer at several different machines running various versions of 12-CURRENT ranging from September through early November, some with different NICs (bge(4) being the most common non-e1000 NIC), it became apparent that the key common element for the intermittently corrupted distfile was the network driver. Looking further, I discovered that when rxcsum is turned off via ifconfig(8), the problem is no longer reproducible.

It should be noted that the failures have not been observed with other ports, and I am uncertain what is special about the master site for math/mpfr in this regard, but this testing has been repeated numerous times with similar results.
With rxcsum enabled:

# ifconfig igb0
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e505bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>

# for i in $(seq 1 9); do fetch -4 -o ${i}.tar.xz http://www.mpfr.org/mpfr-current/mpfr-3.1.6.tar.xz; sha256 -q ${i}.tar.xz; done
1.tar.xz 100% of 1107 kB 585 kBps 00m02s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
2.tar.xz 100% of 1107 kB 597 kBps 00m02s
3cf25a685c0dda614e320e7263299f2897425a55302a51bc69a2df449d7d34a6
3.tar.xz 100% of 1107 kB 704 kBps 00m02s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
4.tar.xz 100% of 1107 kB 565 kBps 00m02s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
5.tar.xz 100% of 1107 kB 627 kBps 00m02s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
6.tar.xz 100% of 1107 kB 633 kBps 00m02s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
7.tar.xz 100% of 1107 kB 637 kBps 00m02s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
8.tar.xz 100% of 1107 kB 290 kBps 00m04s
8f429167a0225c62063b62f3fe6eff70a0dafcdee15d3ead7d90bb533d1b64bd
9.tar.xz 100% of 1107 kB 492 kBps 00m02s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950

With rxcsum disabled:

# ifconfig igb0
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=c505ba<TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,LRO,VLAN_HWFILTER,VLAN_HWTSO,TXCSUM_IPV6>

# for i in $(seq 1 9); do fetch -4 -o ${i}.tar.xz http://www.mpfr.org/mpfr-current/mpfr-3.1.6.tar.xz; sha256 -q ${i}.tar.xz; done
1.tar.xz 100% of 1107 kB 202 kBps 00m06s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
2.tar.xz 100% of 1107 kB 269 kBps 00m04s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
3.tar.xz 100% of 1107 kB 445 kBps 00m03s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
4.tar.xz 100% of 1107 kB 127 kBps 00m09s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
5.tar.xz 100% of 1107 kB 384 kBps 00m03s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
6.tar.xz 100% of 1107 kB 182 kBps 00m06s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
7.tar.xz 100% of 1107 kB 451 kBps 00m02s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
8.tar.xz 100% of 1107 kB 310 kBps 00m03s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
9.tar.xz 100% of 1107 kB 223 kBps 00m05s
7a62ac1a04408614fccdc506e4844b10cf0ad2c2b1677097f8f35d3a1344a950
Gimme a pciconf -lvbc so I know what devices I'm dealing with here.
# pciconf -lvbc igb0
igb0@pci0:3:0:0: class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 32, base 0xc7120000, size 131072, enabled
    bar   [18] = type I/O Port, range 32, base 0x6020, size 32, enabled
    bar   [1c] = type Memory, range 32, base 0xc7144000, size 16384, enabled
    cap 01[40] = powerspec 3 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR NS
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 0cc47affffd8b808
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 0018[1c0] = LTR 1
    ecap 000d[1d0] = ACS 1
A commit references this bug:

Author: shurd
Date: Thu Dec 21 01:22:36 UTC 2017
New revision: 327052
URL: https://svnweb.freebsd.org/changeset/base/327052

Log:
  Don't call tcp_lro_rx() unless hardware verified TCP/UDP csum

  It seems that tcp_lro_rx() doesn't verify TCP checksums, so
  if there are bad checksums in the packets caused by invalid data,
  the invalid data will pass through without errors.

  This was noticed with the igb driver and a specific internet host:

  fetch http://www.mpfr.org/mpfr-current/mpfr-3.1.6.tar.xz -o test.bin && sha256 test.bin

  Would result in a different value sometimes.

  This ends up making LRO require RXCSUM to be enabled, and RXCSUM
  to support TCP and UDP checksums.

  PR:		224346
  Reported by:	gjb
  Reviewed by:	sbruno
  Sponsored by:	Limelight Networks
  Differential Revision:	https://reviews.freebsd.org/D13561

Changes:
  head/sys/net/iflib.c
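
For anyone reading along, the rule described in the commit log can be illustrated roughly as below. This is only a sketch of the idea, not the code from head/sys/net/iflib.c: the flag names (SK_CSUM_L4_CALC, SK_CSUM_L4_VALID), the struct sk_pkt type, and the sk_lro_eligible() helper are all made up for illustration. The point is that a received packet is offered to LRO only when the hardware reports that it both computed the TCP/UDP checksum and found it valid; everything else goes through the normal input path, where the stack can verify the checksum itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical flag names for this sketch only; they stand in for
 * whatever per-packet checksum status bits the driver reports.
 */
#define SK_CSUM_L4_CALC   0x1u  /* hardware computed the TCP/UDP checksum */
#define SK_CSUM_L4_VALID  0x2u  /* the computed checksum matched the packet */

/* Hypothetical, minimal stand-in for a received packet header. */
struct sk_pkt {
	uint32_t csum_flags;
};

/*
 * Decide whether a received packet may be handed to LRO.
 * Both bits must be set: a checksum that was never computed, or one
 * that was computed but did not match, must not be aggregated, since
 * the LRO path is assumed not to verify checksums on its own.
 */
static bool
sk_lro_eligible(const struct sk_pkt *p, bool lro_enabled)
{
	const uint32_t need = SK_CSUM_L4_CALC | SK_CSUM_L4_VALID;

	if (!lro_enabled)
		return (false);
	return ((p->csum_flags & need) == need);
}
```

Under this rule, disabling rxcsum means no packet ever carries both bits, so nothing is aggregated by LRO, which would be consistent with the corruption no longer being reproducible with rxcsum turned off.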