Bug 241785

Summary: ix(4): Creating vlan over lagg causes flapping with Intel drivers and iflib
Product: Base System
Reporter: Konrad <konrad.kreciwilk>
Component: kern
Assignee: Eric Joyner <erj>
Status: Closed FIXED
Severity: Affects Some People
CC: afedorov, backdoor, emaste, net, nonesuch, rudolphfroger
Priority: ---
Flags: erj: mfc-stable12+, koobs: mfc-stable11-
Version: 12.1-RELEASE   
Hardware: amd64   
OS: Any   
URL: https://reviews.freebsd.org/D24659
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240818

Description Konrad 2019-11-07 17:40:50 UTC
Hello,

Starting with 12-STABLE (I do not know the exact revision number) and up to 12.1-RELEASE, I see a problem with VLANs on top of a lagg. I have an Intel 82599 (ix0, ix1) and create a LACP lagg:

root@:~ # ifconfig lagg0 create
root@:~ # ifconfig lagg0 laggproto lacp laggport ix0 laggport ix1

and then a VLAN on top of it:
root@:~ # ifconfig vlan3960 create
root@:~ # ifconfig vlan3960 vlan 3960 vlandev lagg0

ifconfig output:

ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether a0:36:9f:1d:db:4c
	media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
ix1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether a0:36:9f:1d:db:4c
	hwaddr a0:36:9f:1d:db:4e
	media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x7
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether a0:36:9f:1d:db:4c
	laggproto lacp lagghash l2,l3,l4
	laggport: ix0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: ix1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	groups: lagg
	media: Ethernet autoselect
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
vlan3960: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=600703<RXCSUM,TXCSUM,TSO4,TSO6,LRO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether a0:36:9f:1d:db:4c
	inet 212.127.xx.xx netmask 0xfffffffc broadcast 212.127.92.255
	groups: vlan
	vlan: 3960 vlanpcp: 0 parent interface: lagg0
	media: Ethernet autoselect
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

Next, creating each new VLAN causes the ports to flap (ix0, ix1, and the whole lagg0):

root@:~ # ifconfig vlan100 create
root@:~ # ifconfig vlan100 vlan 100 vlandev lagg0

root@:~ # ifconfig vlan200 create
root@:~ # ifconfig vlan200 vlan 200 vlandev lagg0

dmesg output:

vlan3960: link state changed to UP
ix0: link state changed to DOWN
ix1: link state changed to DOWN
lagg0: link state changed to DOWN
vlan3960: link state changed to DOWN
vlan100: link state changed to DOWN
ix0: link state changed to UP
ix1: link state changed to UP
lagg0: link state changed to UP
vlan3960: link state changed to UP
vlan100: link state changed to UP
ix0: link state changed to DOWN
ix1: link state changed to DOWN
lagg0: link state changed to DOWN
vlan3960: link state changed to DOWN
vlan100: link state changed to DOWN
vlan200: link state changed to DOWN
ix0: link state changed to UP
ix1: link state changed to UP
lagg0: link state changed to UP
vlan3960: link state changed to UP
vlan100: link state changed to UP
vlan200: link state changed to UP

This happens, for example, on 12.0-STABLE #0 r344658M and on 12.1-RELEASE. The machines are different hardware; only the network card (based on the Intel 82599 chipset) is the same.
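
To quantify the flapping, the DOWN transitions can be counted per interface from the kernel message buffer. The following is only a rough sketch: the function name count_link_down is hypothetical, and it assumes the messages have exactly the "ifname: link state changed to DOWN" format shown in the dmesg output above.

```sh
#!/bin/sh
# Count "link state changed to DOWN" events per interface.
# Reads kernel messages on stdin and prints "ifname count" lines, sorted by name.
# count_link_down is an illustrative name, not an existing tool.
count_link_down() {
    awk '/: link state changed to DOWN$/ {
             sub(/:$/, "", $1)    # strip the trailing colon from the interface name
             down[$1]++
         }
         END {
             for (ifname in down)
                 printf "%s %d\n", ifname, down[ifname]
         }' | sort
}
```

Typical use would be "dmesg | count_link_down", run before and after adding a VLAN. For the dmesg output quoted above it reports two DOWN events each for ix0, ix1, lagg0, vlan3960 and vlan100, and one for vlan200.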
Comment 1 Aleksandr Fedorov freebsd_committer freebsd_triage 2019-11-09 11:53:39 UTC
This is a known issue with iflib + Intel drivers:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240818
https://lists.freebsd.org/pipermail/freebsd-net/2018-November/052184.html

We also use vlan + lagg + ix, and we often need to add and remove VLANs, so as a temporary workaround we disable vlanhwfilter on the lagg interface.
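
The temporary workaround amounts to clearing the VLAN_HWFILTER capability on the lagg. A minimal sketch, assuming the lagg is named lagg0 as in the original report: the helper name disable_vlanhwfilter and the APPLY variable are made up for illustration, while -vlanhwfilter is the standard ifconfig(8) option. By default the helper only prints the command it would run.

```sh
#!/bin/sh
# Dry-run helper: print the ifconfig command that clears VLAN_HWFILTER.
# Set APPLY=1 in the environment to actually run it (needs root on FreeBSD).
# disable_vlanhwfilter and APPLY are hypothetical names used only for this sketch.
disable_vlanhwfilter() {
    ifname=$1
    if [ "${APPLY:-0}" = "1" ]; then
        ifconfig "$ifname" -vlanhwfilter
    else
        echo "ifconfig $ifname -vlanhwfilter"
    fi
}
```

Calling "disable_vlanhwfilter lagg0" prints the command, and "APPLY=1 disable_vlanhwfilter lagg0" executes it. Afterwards the options= line in the ifconfig lagg0 output should no longer list VLAN_HWFILTER.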
Comment 2 Aleksandr Fedorov freebsd_committer freebsd_triage 2020-05-06 15:12:57 UTC
Need help testing this patch: https://reviews.freebsd.org/D24659
Comment 3 commit-hook freebsd_committer freebsd_triage 2020-05-11 17:42:18 UTC
A commit references this bug:

Author: erj
Date: Mon May 11 17:42:06 UTC 2020
New revision: 360902
URL: https://svnweb.freebsd.org/changeset/base/360902

Log:
  em/ix/ixv/ixl/iavf: Implement ifdi_needs_restart iflib method

  Pursuant to r360398, implement driver-specific versions of the
  ifdi_needs_restart iflib device method.

  Some (if not most?) Intel network cards don't need reinitializing when a
  VLAN is added or removed from the device hardware, so these implement
  ifdi_needs_restart in a way that tell iflib not to bring the interface
  up or down when a VLAN is added or removed, regardless of whether the
  VLAN_HWFILTER interface capability flag is set or not.

  This could potentially solve several PRs relating to link flaps that
  occur when VLANs are added/removed to devices.

  Signed-off-by: Eric Joyner <erj@freebsd.org>

  PR:		240818, 241785
  Reviewed by:	gallatin@, olivier@
  MFC after:	3 days
  MFC with:	r360398
  Sponsored by:	Intel Corporation
  Differential Revision:	https://reviews.freebsd.org/D24659

Changes:
  head/sys/dev/e1000/if_em.c
  head/sys/dev/ixgbe/if_ix.c
  head/sys/dev/ixgbe/if_ixv.c
  head/sys/dev/ixl/if_iavf.c
  head/sys/dev/ixl/if_ixl.c
Comment 4 commit-hook freebsd_committer freebsd_triage 2020-05-14 19:57:46 UTC
A commit references this bug:

Author: erj
Date: Thu May 14 19:56:56 UTC 2020
New revision: 361053
URL: https://svnweb.freebsd.org/changeset/base/361053

Log:
  MFC r360398 and r360902

  These commits introduce a new iflib device-dependent method and
  implements that method in the Intel ethernet network drivers;
  this method tells iflib if the network interface needs to be
  restarted when certain events happen.

  This fixes several issues that occur when VLANs are registered
  or unregistered with the network interface.

  PR:		240818, 241785
  Sponsored by:	Intel Corporation

Changes:
_U  stable/12/
  stable/12/sys/dev/e1000/if_em.c
  stable/12/sys/dev/ixgbe/if_ix.c
  stable/12/sys/dev/ixgbe/if_ixv.c
  stable/12/sys/dev/ixl/if_iavf.c
  stable/12/sys/dev/ixl/if_ixl.c
  stable/12/sys/net/ifdi_if.m
  stable/12/sys/net/iflib.c
  stable/12/sys/net/iflib.h
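
To check whether a given stable/12 kernel already contains this MFC, its svn revision can be compared against r361053. This is a sketch under stated assumptions: the function name has_mfc_r361053 is hypothetical, the input is assumed to be a version string that contains the branch name and an rNNNNNN token (as in "12.0-STABLE #0 r344658M" from the description), and the comparison is only meaningful for kernels built from stable/12. For any other string the function returns 2 rather than guess.

```sh
#!/bin/sh
# Revision of the stable/12 MFC referenced in this PR.
FIX_REV=361053

# Return 0 if the version string names a 12.x-STABLE kernel at or past FIX_REV,
# 1 if it is a 12.x-STABLE kernel older than FIX_REV,
# 2 if the string is not a 12.x-STABLE version or has no rNNNNNN token.
# has_mfc_r361053 is an illustrative name, not an existing tool.
has_mfc_r361053() {
    case "$1" in
        *12.[0-9]*-STABLE*) ;;
        *) return 2 ;;
    esac
    # Pull out the digits of the last whitespace-preceded rNNNNNN token.
    rev=$(printf '%s\n' "$1" | sed -n 's/.*[[:space:]]r\([0-9][0-9]*\).*/\1/p')
    [ -n "$rev" ] || return 2
    [ "$rev" -ge "$FIX_REV" ]
}
```

On a live system the string would typically come from uname -v, for example: has_mfc_r361053 "$(uname -v)" && echo "MFC present".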
Comment 5 Kubilay Kocak freebsd_committer freebsd_triage 2020-05-19 02:35:32 UTC
^Triage: 

 - Assign to committer resolving
 - Assume stable/11 isn't getting an MFC given only stable/12 has iflib. If this is not the case, set mfc-stable11 to ? and set to + when merged
Comment 6 nonesuch 2020-05-20 18:13:41 UTC
I have a follow-up to this issue.

I am seeing this problem on non-Intel, non-iflib cards as well, on 12.1-STABLE.

In my case I have Solarflare cards set up as a LACP lagg with VLANs on top.

I am sporadically seeing the lagg members flap for reasons I cannot establish.

Could this be related?
Comment 7 Eric Joyner freebsd_committer freebsd_triage 2020-05-21 19:35:20 UTC
(In reply to nonesuch from comment #6)

I'd say that if it's non-Intel and/or non-iflib, then the problem isn't the same as this one. It's possible there's a similar problem in the Solarflare driver, though; I have never looked at that driver.
Comment 8 Backdoor 2020-11-28 21:05:01 UTC
I noticed this too on 12.1.
Is this related to this bug?

lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP
em3: Interface stopped DISTRIBUTING, possible flapping
em2: Interface stopped DISTRIBUTING, possible flapping
lagg0: link state changed to DOWN
lagg0: link state changed to UP

When I switch from LACP to failover, the "possible flapping" messages disappear.
Comment 9 Mark Linimon freebsd_committer freebsd_triage 2024-01-02 03:23:38 UTC
^Triage: the 12 branch is now out of support.

Apparently committed back in 2020.