Bug 179926 - [lacp] [patch] active aggregator selection bug
Summary: [lacp] [patch] active aggregator selection bug
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 9.1-RELEASE
Hardware: Any Any
Importance: Normal, Affects Only Me
Assignee: freebsd-bugs mailing list
Reported: 2013-06-24 12:40 UTC by boris.astardzhiev
Modified: 2017-12-31 22:27 UTC
CC List: 0 users

Attachments
file.diff (1.25 KB, patch)
2013-06-24 12:40 UTC, boris.astardzhiev

Description boris.astardzhiev 2013-06-24 12:40:00 UTC
Hi,

I've been investigating the LACP implementation in FreeBSD and have
encountered a bug. Here's the setup:

                      ---------          ----------
                      | lagg1 |          | bond0  |
   ---------          |      xl0--------eth0      |   ---------
   | hosts |----b1----1 FBSD rl0--------eth1 Linux|---| hosts |
   ---------          |  9.1 rl1--------eth2      |   ---------
                      |       |          |        |
                      ---------          ----------

On a FreeBSD 9.1-RELEASE #0 r243826 system, a lagg interface (lagg1) is
created and three interfaces are added to it:
- xl0
- rl0
- rl1

On the Linux system, *ONLY ONE* interface is added to the bonding interface (bond0):
- eth0
Note: I think the Linux system could be replaced with any other LACP implementation.

Both systems use LACP as the aggregation protocol.

LACPDU transmission and reception take place only between xl0 and eth0.
Here's the result:

root@freebsd91:/root # ifconfig lagg1
lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=2008<VLAN_MTU,WOL_MAGIC>
	ether 00:10:b5:7f:97:fb
	inet6 fe80::210:b5ff:fe7f:97fb%lagg1 prefixlen 64 scopeid 0x9 
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: active
	laggproto lacp lagghash l2,l3,l4
	laggport: xl0 flags=18<COLLECTING,DISTRIBUTING>
	laggport: rl1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: rl0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

Since xl0 is the only link with an LACP partner, I believe the aggregation
must rely on it. However, the LACP implementation has chosen the other two
links (rl0 and rl1), which have not received a single LACPDU.

I think the problem lies in the selection of the best active aggregator in
lacp_select_active_aggregator(). Here is the debug output with the sysctl
net.lacp_debug enabled:

.. snippet ...
Jun 24 10:41:43 freebsd91 kernel: xl0: new pstate 3f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Jun 24 10:41:43 freebsd91 kernel: rl0: lacp_sm_mux: state 4
Jun 24 10:41:43 freebsd91 kernel: rl1: lacp_sm_mux: state 4
Jun 24 10:41:43 freebsd91 kernel: xl0: lacp_sm_mux: state 3
Jun 24 10:41:43 freebsd91 kernel: xl0: enable distributing on aggregator [(8000,00-10-B5-7F-97-FB,0126,0000,0000),(FFFF,E0-8F-EC-00-B5-2F,0009,0000,0000)], nports 0 -> 1
Jun 24 10:41:43 freebsd91 kernel: lacp_select_active_aggregator
Jun 24 10:41:43 freebsd91 kernel: [(8000,00-10-B5-7F-97-FB,0126,0000,0000),(FFFF,00-00-00-00-00-00,0000,0000,0000)], speed=200000000, nports=2
Jun 24 10:41:43 freebsd91 kernel: [(8000,00-10-B5-7F-97-FB,0126,0000,0000),(FFFF,E0-8F-EC-00-B5-2F,0009,0000,0000)], speed=100000000, nports=1
Jun 24 10:41:43 freebsd91 kernel: active aggregator not changed
Jun 24 10:41:43 freebsd91 kernel: new [(8000,00-10-B5-7F-97-FB,0126,0000,0000),(FFFF,00-00-00-00-00-00,0000,0000,0000)]
Jun 24 10:41:43 freebsd91 kernel: xl0: mux_state 3 -> 4
Jun 24 10:41:43 freebsd91 kernel: xl0: lacpdu transmit
.. snippet ...

Although there is an aggregator with an active partner, the implementation has chosen the other aggregator, whose partner system ID is all zeros:
Jun 24 10:41:43 freebsd91 kernel: new [(8000,00-10-B5-7F-97-FB,0126,0000,0000),(FFFF,00-00-00-00-00-00,0000,0000,0000)]

Do you think that aggregators without a partner should be skipped in favour of aggregators with active partners? I've applied a patch that fixes this issue: with it, xl0 remains the only active link. However, I'm not sure whether the patch is correct or whether it takes the right approach.

Any comments are appreciated.

Greetings,
Boris Astardzhiev,
Smartcom Bulgaria AD

Fix: A patch is attached.

How-To-Repeat: Follow the setup described above to reproduce the bug.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2013-06-24 14:47:10 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 2 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:57 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped