Bug 217287 - if_em: "Off by 8" error in network streams under -CURRENT as of roughly Feb 1
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs mailing list
URL: https://github.com/trueos/trueos-core...
Keywords: IntelNetworking, iflib, regression
Depends on:
Blocks:
 
Reported: 2017-02-22 05:43 UTC by Jeffrey Baitis
Modified: 2017-11-07 08:04 UTC

Description Jeffrey Baitis 2017-02-22 05:43:28 UTC
Originally discovered in TrueOS UNSTABLE, then tested and verified with a vanilla FreeBSD kernel built from source (FreeBSD 12.0-CURRENT #1 7a0e1ff53(master)). Originally recorded at https://github.com/trueos/trueos-core/issues/327

Summary:

Data received over a socket stream is corrupted: at seemingly random intervals, a byte in the stream arrives with 8 added to its value.

Hardware:

  CPU: Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz

  Selected data from `lspci -v`:
    00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v3 Processor DRAM Controller (rev 06)
    00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 04)
    00:1f.0 ISA bridge: Intel Corporation C226 Series Chipset Family Server Advanced SKU LPC Controller (rev 04)
    00:1f.3 SMBus: Intel Corporation 8 Series/C220 Series Chipset Family SMBus Controller (rev 04)
    01:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 730] (rev a1)

Steps to reproduce:

$ curl http://norvig.com/big.txt -o big.txt
$ curl http://norvig.com/big.txt -o big2.txt
$ sha256 big.txt 
SHA256 (big.txt) = a36fe438864ad8c7b76ca310c2d1176689bbf79536f84338ebd3dc253997efd5
$ sha256 big2.txt 
SHA256 (big2.txt) = 71f0fa4bf8585feae457dbfc0b48dd6dfef8dcd03a07220a5754e0876b9e4efc

`diff -u big.txt big2.txt` results in lines such as:

-There was a movement and an mxclamation from my right, and peering through the gloom, I saw Whitney, pale, haggard, and unkempt, staring out at me.
+There was a movement and an exclamation from my right, and peering through the gloom, I saw Whitney, pale, haggard, and unkempt, staring out at me.   


-" 'You may as well face the matter,' said I; 'you have been caught in the act, and no confession could make your guilt more heinous. If you but make such reparation as is in your power, by telling us where the beryls are, all shall be forgiven and forgotten.'
+" 'You may as well face the matter,' said I; 'you have been caught in the act, and no confession could make your guilt more heino}s. If you but make such reparation as is in your power, by telling us where the beryls are, all shall be forgiven and forgotten.'

Diagnostic:

>>> ord('m') - ord('e')
8
>>> ord('}') - ord('u')
8
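
The check above can be extended to whole files. What follows is only a minimal sketch, not part of the original report: `byte_deltas` is a hypothetical helper, it assumes both copies have the same length (the corruption shown replaces bytes rather than inserting them), and the sample data is just the two phrases from the diff, not the real downloads.

```python
def byte_deltas(a: bytes, b: bytes):
    """Yield (offset, byte_a, byte_b, xor) for each position where a and b differ."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            yield offset, x, y, x ^ y


# Illustrative data built from the diff lines above (assumption: not the real files).
good = b"an exclamation ... more heinous."
bad = b"an mxclamation ... more heino}s."

for offset, x, y, xor in byte_deltas(good, bad):
    # delta is the arithmetic difference; xor shows which bits changed
    print(f"offset {offset}: {chr(x)!r} -> {chr(y)!r}, delta {y - x:+d}, xor {xor:#04x}")
```

For both substitutions the XOR is 0x08, so the "+8" is equally consistent with a single bit (bit 3) changing from 0 to 1 in the affected byte. To run it against the downloads, read each file with `open(path, "rb").read()` and pass the two byte strings in.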

Last working state:

The last working version in -CURRENT is somewhere prior to commit '8f3781173d79d5b83e19f59b10b54263976dd66e', which was merged into the TrueOS "drm-next" branch on Jan 27.
Comment 1 Jeffrey Baitis 2017-02-22 05:53:48 UTC
Upon further inspection, the WORKING kernel was built on Wed Jan 18:

$ uname -a
FreeBSD raid.baitis.home 12.0-CURRENT FreeBSD 12.0-CURRENT #18 4f888bf(drm-next): Wed Jan 18 14:31:26 UTC 2017     root@gauntlet:/usr/obj/usr/src/sys/GENERIC  amd64

So changes made in -CURRENT after Jan 18 and before Feb 1 most likely introduced this problem.
Comment 2 Eric Joyner freebsd_committer 2017-03-09 18:14:40 UTC
Have you tried a non-i217 em device, or turning off TSO and/or TXCSUM? Getting random bits inserted sounds like it might be the hardware trying to do an offload, but doing some part of it incorrectly.
Comment 3 Eric Joyner freebsd_committer 2017-03-09 18:41:44 UTC
I don't really see anything that affects i217 devices in the commits between Jan 18 and Feb 1, except for this: https://svnweb.freebsd.org/base/head/sys/dev/e1000/e1000_ich8lan.c?r1=312426&r2=312427&

Want to try reverting the hunk starting at line 1720? I think lpt is i217/i218.
Comment 4 Jeffrey Baitis 2017-04-30 15:04:22 UTC
I've attempted the following:
* Disabling rxcsum and txcsum has no effect on the problem
* Disabling hwtso has no effect on the problem
* I am unable to disable vlanhwcsum; the command seems to have no effect
* Removing the VirtualBox network drivers has no effect

I will next attempt to cherry-pick
Comment 5 Jeffrey Baitis 2017-11-07 08:04:24 UTC
It turns out that this bug was either due to an interaction between my wireless router and my FreeBSD system, or was masked by TrueOS's default network configuration, which may have undone some of my own customization.

The router is the more likely cause: after upgrading the wireless router from OpenWrt to the latest version of LEDE, I was no longer able to reproduce this issue.

I'm therefore closing this and very much enjoying 12.0-CURRENT.