Created attachment 207816
network configuration in rc.conf
After upgrading from 11.3-RELEASE-p3 to 12.0-RELEASE-p10, the vlan interfaces on lagg0 (em0, em1) stopped working. The vlan interfaces appear to be working, but they are not.
Network configuration: em0 and em1 are aggregated into lagg0, and the vlan interfaces are created on top of lagg0. On the other end is a Cisco switch (a port channel of two Ethernet ports in LACP mode).
This configuration worked on 9.x-releng, 10.x-releng, 11.2-releng and 11.3-releng. It does not work on 12.0-releng, 12-stable or 12.1-PRERELEASE.
The lagg0 interface itself works, but the vlans on lagg0 do not. It looks like untagged Ethernet frames pass to and from lagg0, while tagged Ethernet frames do not. After downgrading to 11.3-releng, the vlans on lagg0 work again.
The kernel is based on GENERIC, i.e. "device vlan" is in the kernel configuration.
See rc.conf and console.log in the attachments.
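For readers without access to the attachment, a setup like the one described above is typically expressed in /etc/rc.conf roughly as follows. This is only a sketch and not the content of attachment 207816: the VLAN ID 11 is taken from the console.log output quoted later in this report, and 192.0.2.82 is a placeholder for the masked address xx.xx.170.82.

```sh
# /etc/rc.conf fragment (illustrative only; not the reporter's actual file)

# Bring up the physical members without addresses.
ifconfig_em0="up"
ifconfig_em1="up"

# Create lagg0 at boot and aggregate em0 and em1 using LACP.
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 up"

# Create lagg0.11 (VLAN ID 11) on top of lagg0.
vlans_lagg0="11"
# Placeholder address; the report masks the real one as xx.xx.170.82.
ifconfig_lagg0_11="inet 192.0.2.82 netmask 0xfffffff0"
```

With this style of configuration, the vlan is created by the rc scripts during netstart, which is the boot-time path the report says fails on 12.x.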
Created attachment 207817
console.log
From console.log at boot time:
lagg0.11: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1496
        options=403<RXCSUM,TXCSUM,LRO>
        ether 00:e0:81:ba:ad:90
        inet xx.xx.170.82 netmask 0xfffffff0 broadcast xx.xx.170.95
        groups: vlan
        vlan: 11 vlanpcp: 0 parent interface: lagg0
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
mtu=1496 for lagg0.11 (while mtu=1500 for lagg0) means that the vlan was created while lagg0 had no registered members with hardware vlan support. This is very strange. It looks like a race condition at boot time between the internal lagg configuration and another ifconfig process creating the vlan over lagg.
Adding CC: for some people who worked with the lagg(4) code recently.
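The MTU reasoning above can be illustrated with a small sketch. It assumes the usual vlan(4) behaviour: if the parent does not advertise the VLAN_MTU capability at the moment the vlan is attached, the vlan's MTU is lowered by the 4 bytes of the 802.1Q tag. The function name vlan_mtu is hypothetical and only models the arithmetic; it does not query any interface.

```sh
# Model of the MTU a vlan interface gets when attached to a parent.
#   $1: parent MTU
#   $2: "yes" if the parent advertises VLAN_MTU at attach time, else "no"
vlan_mtu() {
    parent_mtu=$1
    parent_has_vlan_mtu=$2
    if [ "$parent_has_vlan_mtu" = "yes" ]; then
        # Parent can carry the extra tag bytes: MTU is inherited unchanged.
        echo "$parent_mtu"
    else
        # Parent cannot: reserve 4 bytes for the 802.1Q header.
        echo $((parent_mtu - 4))
    fi
}

vlan_mtu 1500 no    # the case seen in console.log
vlan_mtu 1500 yes   # the case expected once em0/em1 are lagg members
```

On an affected host, comparing the options= line of `ifconfig lagg0` before and after the lagg ports are added should show whether VLAN_MTU was present when lagg0.11 was created.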
However, if I destroy the broken vlan interface and the lagg interface after FreeBSD has booted and then create them again, the vlan interface works.
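The manual workaround described above might look like the following sketch. The exact commands the reporter ran are not given, so this sequence is an assumption built from the interface names and VLAN ID in the report; 192.0.2.82 is a placeholder for the masked address, and the helper name recreate_lagg_vlan and the IFCONFIG variable are hypothetical.

```sh
# Set IFCONFIG="echo ifconfig" for a dry run that only prints the commands.
IFCONFIG=${IFCONFIG:-ifconfig}

# Hypothetical helper: tear down and rebuild lagg0 and lagg0.11 after boot.
recreate_lagg_vlan() {
    # Remove the broken vlan first, then its parent.
    $IFCONFIG lagg0.11 destroy
    $IFCONFIG lagg0 destroy

    # Recreate the aggregate with both ports before creating the vlan,
    # so the vlan attaches to a parent that already has its members.
    $IFCONFIG lagg0 create
    $IFCONFIG lagg0 laggproto lacp laggport em0 laggport em1 up

    # Recreate the vlan and assign the (placeholder) address.
    $IFCONFIG lagg0.11 create vlan 11 vlandev lagg0
    $IFCONFIG lagg0.11 inet 192.0.2.82 netmask 0xfffffff0 up
}
```

Running `recreate_lagg_vlan` as root applies the changes. The ordering is the point of the sketch: the vlan is created only after the lagg ports are attached, which is the opposite of what the suspected boot-time race would produce.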
Does this also happen if you disable devd? When network interfaces are created, devd receives an event and does some configuration in the background. This typically races with netstart.
Feedback timeout over 6 weeks. Feel free to re-open the PR if you have additional information.
As you can see in the original console.log, devd was started:
> Sep 25 18:39:00 <console.info> gw1 kernel: Starting devd.
Vlans over lagg0 do not work either with devd or without it (devd_enable="NO" in /etc/rc.conf).
with devd:
lagg0.11: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1496
        options=403<RXCSUM,TXCSUM,LRO>
        ether 00:e0:81:ba:ad:90
        inet xx.xx.170.82 netmask 0xfffffff0 broadcast xx.xx.170.95
        groups: vlan
        vlan: 11 vlanpcp: 0 parent interface: lagg0
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
without devd (devd_enable="NO"):
lagg0.11: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1496
        options=403<RXCSUM,TXCSUM,LRO>
        ether 00:e0:81:ba:ad:90
        inet xx.xx.170.82 netmask 0xfffffff0 broadcast xx.xx.170.95
        groups: vlan
        vlan: 11 vlanpcp: 0 parent interface: lagg0
        media: Ethernet autoselect
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
I just updated the system to 12.1-RELEASE. The problem still exists. After a reboot, the interface lagg0.11 does not work. When it is created manually (ifconfig lagg0 create, etc...), the interface works.
A commit references this bug:
Author: eugen
Date: Thu Jan 9 11:58:26 UTC 2020
New revision: 356551
URL: https://svnweb.freebsd.org/changeset/base/356551
Log:
  arp(8): avoid segfaulting due to out-of-bounds memory access
  Fix obvious mistake that sometimes results in reading memory past end of an array.
  PR: 240825
  MFC after: 1 week
Changes:
  head/usr.sbin/arp/arp.c
^Triage: assign to committer who resolved back in 2020.
The attribution is wrong due to my mistake in the commit log for Subversion revision 356551 back in 2020, which had no connection to this PR. Undo the last change to it.
^Triage: turn off spurious mfc-stable12 flag. (I thought this had been previously obviated by a commit, but I was wrong.)