Bug 260260

Summary: igb(4) I35{0,4} parent <--> vlan jumbo frame mtu mismatch
Product: Base System Reporter: Marek Zarychta <zarychtam>
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: New ---    
Severity: Affects Only Me CC: kbowling, net, zlei
Priority: --- Keywords: regression
Version: 13.0-STABLE   
Hardware: amd64   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=268490

Description Marek Zarychta 2021-12-07 06:29:01 UTC
The vlan(4) children of an LACP lagg(4) consisting of two igb(4) I350 or I354 ports have to use a reduced MTU to work.

To reproduce:

ifconfig_igb0="mtu 9000 up"
ifconfig_igb1="mtu 9000 up"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 -lacp_strict"
vlans_lagg0="vlan0 vlan1 ..."
ifconfig_vlan0="inet x.x.x.x/y"

# iperf3 -R -c y.y.y.y
Connecting to host y.y.y.y, port 5201
Reverse mode, remote host y.y.y.y is sending
[  5] local x.x.x.x port 52750 connected to y.y.y.y port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.02   sec  0.00 Bytes  0.00 bits/sec
[  5]   1.02-2.02   sec  0.00 Bytes  0.00 bits/sec
[  5]   2.02-3.02   sec  0.00 Bytes  0.00 bits/sec
[  5]   3.02-3.55   sec  0.00 Bytes  0.00 bits/sec

# ifconfig vlan0 mtu 8996

# iperf3 -R -c y.y.y.y
Connecting to host y.y.y.y, port 5201
Reverse mode, remote host y.y.y.y is sending
[  5] local x.x.x.x port 49056 connected to y.y.y.y port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec   118 MBytes   989 Mbits/sec
[  5]   1.00-2.00   sec   118 MBytes   990 Mbits/sec
[  5]   2.00-3.00   sec   118 MBytes   990 Mbits/sec
[  5]   3.00-3.69   sec  81.8 MBytes   989 Mbits/sec

There is no problem with sending jumbo frames; only receiving them is broken. It is not a hardware limitation, since bumping the MTU on the parents also solves the issue, and the configuration below works fine:

ifconfig_igb0="mtu 9004 up"
ifconfig_igb1="mtu 9004 up"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 -lacp_strict"
vlans_lagg0="vlan0 vlan1 ..."
ifconfig_vlan0="inet x.x.x.x/y mtu 9000"
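
The 4-byte difference between the two configurations matches the size of an 802.1Q tag. A minimal sketch of the frame-size arithmetic in plain POSIX shell follows; the helper name frame_len is made up for illustration, and this is not the driver's actual receive-size computation, which is not shown in this report:

```shell
#!/bin/sh
# Hypothetical helper: on-wire Ethernet frame length for a given MTU.
# $1 = MTU (layer-3 payload bytes)
# $2 = 1 if the frame carries an 802.1Q tag, 0 otherwise
frame_len() {
    eth_hdr=14   # destination MAC + source MAC + ethertype
    vlan_tag=4   # 802.1Q tag
    fcs=4        # frame check sequence
    echo $(( $1 + eth_hdr + $2 * vlan_tag + fcs ))
}

frame_len 9000 0   # untagged frame at MTU 9000 -> 9018
frame_len 9000 1   # tagged frame with a 9000-byte payload -> 9022
frame_len 9004 0   # untagged frame at MTU 9004 -> 9022
```

A tagged frame carrying a 9000-byte payload is exactly as long as an untagged frame at MTU 9004, which is consistent with the parent needing 4 extra bytes. This is arithmetic only, not a confirmed root cause.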

The issue looks specific to either igb(4) or perhaps only to the I35{0,4}. I have more machines running similar setups, some with em(4), but only a few of them, those with igb(4) I35{0,4} NICs, seem to be affected. Moreover, they all worked fine while running either 11.4-STABLE or even 12.1-STABLE at the beginning of 2021.
Comment 1 Marek Zarychta 2021-12-07 07:16:38 UTC
Last tested on 13.0-STABLE stable/13-n248421-3b936a8c889, where the issue persists. It is also worth mentioning that turning off vlanmtu, vlanhwtag, vlanhwfilter, vlanhwtso and vlanhwcsum on the parents doesn't solve it.
Comment 2 Zhenlei Huang freebsd_committer freebsd_triage 2021-12-07 10:07:50 UTC
So it is weird.

If VLANMTU is disabled on the parent interface, the MTU of the VLAN interface is automatically reduced by 4. The MTU of the vlans should be 8996 in your case.
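
As a rough sketch of the rule described above (the helper name vlan_mtu is hypothetical; it only restates the rule from this comment and is not taken from the vlan(4) source):

```shell
#!/bin/sh
# Hypothetical helper restating the rule from this comment.
# $1 = parent MTU
# $2 = 1 if VLAN_MTU is enabled on the parent, 0 otherwise
vlan_mtu() {
    if [ "$2" -eq 1 ]; then
        echo "$1"            # parent can carry the extra tag: same MTU
    else
        echo $(( $1 - 4 ))   # leave room for the 4-byte 802.1Q tag
    fi
}

vlan_mtu 9000 1   # -> 9000
vlan_mtu 9000 0   # -> 8996
```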

Try the following steps:

1. Disable all VLAN hardware offloading features in rc.conf:

ifconfig_igb0="mtu 9000 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -vlanhwcsum up"
ifconfig_igb1="mtu 9000 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -vlanhwcsum up"

2. Reboot

Instead of rebooting in step 2, you could also destroy the cloned interfaces and restart the netif service:

# ifconfig vlan0 destroy
# ifconfig vlan1 destroy
# ifconfig ... destroy
# ifconfig lagg0 destroy
# service netif restart
Comment 3 Marek Zarychta 2021-12-07 12:24:13 UTC
(In reply to Zhenlei Huang from comment #2)
1. Indeed, bringing up the interfaces this way:
ifconfig_igb0="mtu 9000 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -vlanhwcsum up"
ifconfig_igb1="mtu 9000 -vlanmtu -vlanhwtag -vlanhwfilter -vlanhwtso -vlanhwcsum up"
makes the vlan(4) children come up with mtu 8996, but IMHO it's not the right solution to the problem.

2. I also tried applying D33154, as suggested by Özkan KIRIK on the net@ mailing list:
> Please see the https://reviews.freebsd.org/D33154,
> igb driver doesn't honor the interface capabilities like vlanhwtag,
> vlan* and etc now. If you want, you can apply the patches from D33154

The patch unfortunately doesn't solve this issue.

FYI, in another machine, I have the setup described below working flawlessly with the same 13.0-STABLE revision.

em0: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
	options=481049b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,VLAN_HWFILTER,NOMAP>

em1: flags=8963<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
	options=481049b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,VLAN_HWFILTER,NOMAP>


lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 9000
	options=481049b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,VLAN_HWFILTER,NOMAP>
	ether 00:26:55:e4:d3:fa
	laggproto lacp lagghash l2,l3,l4
	laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: em1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>


vlan6: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
	options=4000403<RXCSUM,TXCSUM,LRO,NOMAP>
	vlan: 7 vlanproto: 802.1q vlanpcp: 0 parent interface: lagg0
Comment 4 Marek Zarychta 2021-12-15 16:15:40 UTC
I was told to update the PR with some em vs igb details, so TL;DR:
1. em(4) works like before with the same MTU 9000 on parent em(4), lagg(4) and vlan(4) with the options: VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,VLAN_HWFILTER enabled.
2. igb(4) doesn't work this way: with the options VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWFILTER,VLAN_HWTSO enabled, the vlan(4) MTU has to be lowered by 4 bytes to unbreak receiving jumbo frames.
Comment 5 Zhenlei Huang freebsd_committer freebsd_triage 2021-12-17 10:10:14 UTC
Since vlan(4) works as expected when the VLAN hardware offloading features are disabled, it should be a bug in the igb(4) driver.
Comment 6 Kevin Bowling freebsd_committer freebsd_triage 2023-04-15 01:01:27 UTC
Since you have a fairly good idea of when this was introduced, do you have any suspect commits in e1000 or in the network stack?
Comment 7 Marek Zarychta 2023-04-15 08:47:59 UTC
No idea. Perhaps I am wrong. Now I recall that I transitioned the affected machines to stable/13 relatively late, and the bug was reported here even later. I wonder if lagg(4) has anything to do with it, since the report in bug 268490 also notes the use of lagg(4). I am using lagg on all affected setups; I think I can check whether it's lagg-dependent this weekend.
Comment 8 Marek Zarychta 2023-04-15 12:39:44 UTC
The bug does not seem to be related to lagg(4). Moreover, I found out that it can also be worked around by only temporarily oversizing the MTU by one byte.

So, to reproduce the problem, issue this sequence of commands:

ifconfig igb3 mtu 9000
ifconfig vlan1500 create vlandev igb3 vlan 1500
ifconfig vlan1500 192.168.40.5/24

Now the MTU of both igb3 and vlan1500 is set to 9000, but vlan1500 can only send large TCP segments. Receiving is broken.

But the real mystery is that after the sequence: 

ifconfig igb3 mtu 9001
ifconfig igb3 mtu 9000
ifconfig vlan1500 create vlandev igb3 vlan 1500
ifconfig vlan1500 192.168.40.5/24

Receiving large TCP segments for vlan1500 is working fine (MTU for both igb3 and vlan1500 is still set to the same value of 9000). 
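
For repeated testing, the MTU bounce above can be wrapped in a small helper. This is only a sketch: the function name mtu_bounce_cmds is hypothetical, and it merely prints the two ifconfig commands rather than running them, so the output can be reviewed first and then piped to sh as root:

```shell
#!/bin/sh
# Hypothetical helper: print the MTU "bounce" sequence from this comment.
# $1 = parent interface name, $2 = target MTU
# Nothing is executed here; the commands are only written to stdout.
mtu_bounce_cmds() {
    echo "ifconfig $1 mtu $(( $2 + 1 ))"   # temporarily oversize by one byte
    echo "ifconfig $1 mtu $2"              # return to the intended MTU
}

mtu_bounce_cmds igb3 9000
# prints:
#   ifconfig igb3 mtu 9001
#   ifconfig igb3 mtu 9000
```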


I tried to investigate how it is possible that, of two sibling setups, one works without oversizing the MTU of the parent while the other does not. Neither TSO nor LRO seems to be involved.

Here's how the segment looks on the receiving side after applying the temporary MTU oversizing fix from above:

 14:27:36.269125 ac:1f:6b:02:aa:bb > ac:1f:6b:02:aa:cc, ethertype IPv4 (0x0800), length 9014: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 9000)
    192.168.40.3.25218 > 192.168.40.5.5201: Flags [.], cksum 0x1a28 (correct), seq 835179525:835188473, ack 1, win 70, options [nop,nop,TS val 2309578650 ecr 1068877401], length 8948
Comment 9 Marek Zarychta 2023-04-15 13:10:57 UTC
It's also worth mentioning that large UDP datagrams are received fine without any workaround, though the frames are shorter, at least when testing with iperf3.

UDP:
15:00:45.609781 ac:1f:6b:02:aa:bb > ac:1f:6b:02:aa:cc, ethertype IPv4 (0x0800), length 9002: (tos 0x0, ttl 64, id 34971, offset 0, flags [none], proto UDP (17), length 8988)
    192.168.40.3.49459 > 192.168.40.5.5201: [udp sum ok] UDP, length 8960

TCP:
 14:27:36.269125 ac:1f:6b:02:aa:bb > ac:1f:6b:02:aa:cc, ethertype IPv4 (0x0800), length 9014: (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 9000)
    192.168.40.3.25218 > 192.168.40.5.5201: Flags [.], cksum 0x1a28 (correct), seq 835179525:835188473, ack 1, win 70, options [nop,nop,TS val 2309578650 ecr 1068877401], length 8948
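
The lengths in the two captures can be cross-checked with a little shell arithmetic. This is a sketch under stated assumptions: the IPv4 header is assumed to carry no options (20 bytes), and the 12 bytes of TCP options are inferred from the nop,nop,TS list in the capture.

```shell
#!/bin/sh
eth_hdr=14               # Ethernet header as captured (no VLAN tag, no FCS)
ip_hdr=20                # IPv4 header without options (assumption)
udp_hdr=8
tcp_hdr=$(( 20 + 12 ))   # base TCP header + nop,nop,timestamp options

tcp_ip_len=$(( ip_hdr + tcp_hdr + 8948 ))   # TCP payload length from the capture
udp_ip_len=$(( ip_hdr + udp_hdr + 8960 ))   # UDP payload length from the capture

echo "TCP: IP length $tcp_ip_len, frame length $(( tcp_ip_len + eth_hdr ))"
echo "UDP: IP length $udp_ip_len, frame length $(( udp_ip_len + eth_hdr ))"
# prints:
#   TCP: IP length 9000, frame length 9014
#   UDP: IP length 8988, frame length 9002
```

The TCP packets fill the 9000-byte MTU exactly, while the iperf3 UDP packets stay 12 bytes below it. That might explain why the UDP test does not hit the problem, but this has not been confirmed.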
Comment 10 Marek Zarychta 2023-04-16 08:07:51 UTC
The workaround mentioned in comment 8 can be applied only on the I354; the I350 requires permanently bumping the parent MTU by 4 bytes, so the issue still looks the same as it did 1.5 years ago.