Bug 207701 - vlan interface over failover lagg has empty/00:00:00:00:00:00 mac/ether address
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-BETA2
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: Marcelo Araujo
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2016-03-04 13:04 UTC by Markus Wild
Modified: 2018-05-29 06:31 UTC

Attachments
comparison between 10.2-release and 10.3-prerelease dmesg (19.94 KB, application/gzip)
2016-03-04 13:04 UTC, Markus Wild

Description Markus Wild 2016-03-04 13:04:00 UTC
Created attachment 167713
comparison between 10.2-release and 10.3-prerelease dmesg

We configure some servers with two interfaces as follows:
- failover lagg0 on em0 and em1
- several vlans on lagg0

After upgrading our systems because of the OpenSSL bug, they came up without
networking. The reason was that the vlan interfaces on the lagg were configured
with empty (all-zero) MAC addresses:

vlan10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1496
        ether 00:00:00:00:00:00
        inet 10.0.1.11 netmask 0xffffff00 broadcast 10.0.1.255 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        vlan: 10 parent interface: lagg0

With 10.2-RELEASE, by contrast, this looked as follows:

vlan10: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1496
        ether 00:30:48:7f:29:74
        inet 10.0.1.11 netmask 0xffffff00 broadcast 10.0.1.255 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        vlan: 10 parent interface: lagg0

The lagg0 looks the same in both releases:

lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
        ether 00:30:48:7f:29:74
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto failover lagghash l2,l3,l4
        laggport: em0 flags=5<MASTER,ACTIVE>
        laggport: em1 flags=0<>
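
The symptom shown above can be checked for mechanically. The following is a minimal sketch, not part of the original report: `has_zero_mac` is a hypothetical helper name, and it only inspects `ifconfig` output passed on standard input.

```sh
#!/bin/sh
# Hypothetical helper (not from the report): succeeds if the ifconfig
# output read from stdin contains an all-zero "ether" address.
has_zero_mac() {
    grep -Eq '^[[:space:]]*ether[[:space:]]+00:00:00:00:00:00([[:space:]]|$)'
}

# Example use on a live system (interface name taken from the report):
#   ifconfig vlan10 | has_zero_mac && echo "vlan10 has no MAC address"
```

Reading from stdin rather than calling `ifconfig` inside the function keeps the check usable on saved output as well as on a running system.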


We normally use our own stripped-down kernel config, but for this report
I deliberately booted 10.2-RELEASE and built a 10.3-PRERELEASE@r296382
GENERIC kernel to have a verifiable configuration.

Our rc.conf snippet to configure the interfaces looks as follows (there
are more vlans configured than this):

cloned_interfaces="vlan10 lagg0"

ifconfig_em0="vlanhwtag up"
ifconfig_em1="vlanhwtag up"
ifconfig_lagg0="laggproto failover laggport em0 laggport em1 up"
ifconfig_vlan10="inet 10.0.1.11/24 vlan 10 vlandev lagg0 up"
Comment 1 Marcelo Araujo freebsd_committer 2016-03-10 01:02:55 UTC
I will take a look at it.
Comment 2 Pushkar Kothavade 2016-03-18 10:48:40 UTC
Hi Team, 

After going through the issue described here, I investigated it on FreeBSD 10-STABLE, FreeBSD 10.2-RELEASE and FreeBSD HOL platforms by creating lagg and vlan interfaces.

After investigation, the problem statement is as follows:

*************************
*** Problem Statement ***
*************************

After the machine boots, when we create a vlan on a lagg interface, the vlan interface takes the same MAC address as the lagg interface. This works as expected. However, when we add or delete members of the lagg bundle,
the MAC address of the lagg bundle changes as expected, but the vlan MAC address does not change with it, although it should.

This issue is present only in 10-STABLE. It is not present in HOL or 10.2-RELEASE.


*************************
******** Fix ************
*************************

https://reviews.freebsd.org/differential/diff/14418/


*************************
***** Test Plan *********
*************************

####################
#### Before Fix ####
####################

[Step-0] uname -a

** Output **

FreeBSD Host-XXX 10.3-PRERELEASE FreeBSD 10.3-PRERELEASE #2 r294978M: Thu Jan 28 16:46:33 IST 2016 amd64

[Step-1] ifconfig lagg1 create

[Step-2] ifconfig lagg1 laggproto failover laggport le1 laggport le2

[Step-3] ifconfig vlan1 create

[Step-4] ifconfig vlan1 vlan 1 vlandev lagg1

[Step-5] ifconfig

** Output **

lagg1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 00:0c:29:5b:6a:04
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
	laggproto failover lagghash l2,l3,l4
	laggport: le1 flags=1<MASTER>
	laggport: le2 flags=0<>

vlan1: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 00:0c:29:5b:6a:04
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
	vlan: 1 parent interface: lagg1

[Step-6] ifconfig lagg1 -laggport le1

[Step-7] ifconfig

** Output **

lagg1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 00:0c:29:5b:6a:0e
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
	laggproto failover lagghash l2,l3,l4
	laggport: le2 flags=1<MASTER>

vlan1: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 00:0c:29:5b:6a:04
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
	vlan: 1 parent interface: lagg1

** Conclusion  **

vlan1 MAC does not change when lagg1 MAC changes.

###################
#### After Fix ####
###################

[Step-0] uname -a

** Output **

FreeBSD Host-XXX 10.3-PRERELEASE FreeBSD 10.3-PRERELEASE #6 r296988: Fri Mar 18 08:30:25 IST 2016 amd64

[Step-1] ifconfig lagg1 create

[Step-2] ifconfig lagg1 laggproto failover laggport le1 laggport le2

[Step-3] ifconfig vlan1 create

[Step-4] ifconfig vlan1 vlan 1 vlandev lagg1

[Step-5] ifconfig

** Output **

lagg1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 00:0c:29:5b:6a:04
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
	laggproto failover lagghash l2,l3,l4
	laggport: le1 flags=1<MASTER>
	laggport: le2 flags=0<>

vlan1: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 00:0c:29:5b:6a:04
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
	vlan: 1 parent interface: lagg1

[Step-6] ifconfig lagg1 -laggport le1

[Step-7] ifconfig

** Output **

lagg1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=8<VLAN_MTU>
	ether 00:0c:29:5b:6a:0e
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
	laggproto failover lagghash l2,l3,l4
	laggport: le2 flags=1<MASTER>

vlan1: flags=8842<BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	ether 00:0c:29:5b:6a:0e
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: no carrier
	vlan: 1 parent interface: lagg1

** Conclusion  **

vlan1 MAC changes when lagg1 MAC changes.
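
The comparison made by eye in Step-7 could also be scripted. This is only a sketch under stated assumptions: `ether_of` and `same_mac` are hypothetical helper names that do not appear in the test plan, and they work on captured `ifconfig` output rather than on the patch itself.

```sh
#!/bin/sh
# Hypothetical helpers (not from the test plan above).

# Print the first "ether" address found in ifconfig output on stdin.
ether_of() {
    awk '$1 == "ether" { print $2; exit }'
}

# Succeed if two chunks of ifconfig output report the same, non-empty MAC.
same_mac() {
    a=$(printf '%s\n' "$1" | ether_of)
    b=$(printf '%s\n' "$2" | ether_of)
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example use after Step-6 (interface names as in the test plan):
#   same_mac "$(ifconfig lagg1)" "$(ifconfig vlan1)" || echo "vlan1 MAC is stale"
```

With the "Before Fix" output above, such a check would report a stale vlan1 address; with the "After Fix" output it would pass.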

Thanks,
Pushkar Kothavade
Comment 3 Pushkar Kothavade 2016-03-18 11:38:25 UTC
Please find the Fix:

https://reviews.freebsd.org/D5672
Comment 4 Markus Wild 2016-03-18 12:57:25 UTC
Thank you! This fixed our problem.

Kind regards,
Markus
Comment 5 Marcelo Araujo freebsd_committer 2016-03-22 07:21:00 UTC
I'm not sure the fix in D5672 is right.

This commit [0] removes the call to EVENTHANDLER_INVOKE because of several LORs (lock order reversals).

[0] https://svnweb.freebsd.org/base/stable/10/sys/net/if_lagg.c?revision=287723&view=markup&sortby=date

And this commit [1] properly fixes the lladdr usage.

[1] https://svnweb.freebsd.org/base/head/sys/net/if_lagg.c?revision=290239&view=markup


So I will check the impact of importing r290239 into 10-STABLE.
It will take me a couple of days.


Best,
Comment 6 Pushkar Kothavade 2016-03-28 10:35:51 UTC
Thanks Marcelo,

I noticed that the proposed fix causes the following regression:
the MAC address of a participating link in the lagg bundle is not restored
to its original value when the link is taken out of the lagg bundle.

https://reviews.freebsd.org/D5672 has been updated to fix the regression.

Thanks,
Pushkar Kothavade.
Comment 7 Marcelo Araujo freebsd_committer 2016-03-30 02:35:46 UTC
(In reply to Pushkar Kothavade from comment #6)

I don't see any update there!
But the right approach would be to import this patch: https://svnweb.freebsd.org/base/head/sys/net/if_lagg.c?revision=290239&view=markup


I'm too busy to import it in the next couple of weeks; I have some other patches in my queue that I need to fix first.



Best,
Comment 8 Matthew Seaman freebsd_committer 2016-06-15 15:01:27 UTC
Hi,

We're seeing exactly this on upgrading from 10.2-RELEASE to 10.3-RELEASE-p5 -- only one machine out of about 10 that have been upgraded.  

Workaround is to set the ether address via rc.conf by adding:

```
ifconfig_vlan110="ether 90:b1:1c:41:93:50 up"
```

Config settings:
```
ifconfig_bge0="up"
ifconfig_bge1="up"

ifconfig_lagg0="laggproto lacp laggport bge0 laggport bge1 up"

cloned_interfaces="lagg0 vlan110"
vlans_lagg0="vlan110"
create_args_vlan110="vlandev lagg0 vlan 110"
ifconfig_vlan110="ether 90:b1:1c:41:93:50 up"
ipv4_addrs_vlan110="10.2.1.9/24"
defaultrouter="10.2.1.1"
```

Machine is a Dell R420 -- we're seeing the same effect with both the on-board bge(4) NICs and with an add-on igb(4) card.

This setup has been working flawlessly for us on numerous machines since around 9.0-RELEASE.
Comment 9 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:43:14 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 10 Markus Wild 2018-05-29 06:31:29 UTC
For us, the problem was resolved by inverting the order of cloned_interfaces,
ensuring that lagg interfaces are listed before the vlans that make use of them.

So, this caused the problem:
cloned_interfaces="vlan10 lagg0"

This no longer did:
cloned_interfaces="lagg0 vlan10"

It would be nice if this could be taken care of by network.subr, but
I don't know whether that's possible with the dependency information
available in rc.conf.
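
As a rough illustration of the reordering described above, here is a minimal sketch. `laggs_first` is a hypothetical function name; it is not part of network.subr, and it only encodes the one rule from this comment (lagg before everything else), not real dependency tracking.

```sh
#!/bin/sh
# Hypothetical helper (not part of network.subr): print the given interface
# names with every lagg* name first, keeping the relative order otherwise.
laggs_first() {
    first=
    rest=
    for ifn in "$@"; do
        case $ifn in
            lagg[0-9]*) first="${first:+$first }$ifn" ;;
            *)          rest="${rest:+$rest }$ifn" ;;
        esac
    done
    # Join the two groups with a single space only when both are non-empty.
    echo "${first}${first:+${rest:+ }}${rest}"
}

# Example: laggs_first vlan10 lagg0    prints "lagg0 vlan10"
```

A vlan stacked on some other cloned parent would need a more general ordering than this name-based rule, which is presumably where the missing dependency information becomes the real obstacle.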