After upgrading from 9.2-RELEASE-p3, link aggregation with LACP no longer works. It does not seem to depend on the network hardware used.

The machine has the following configuration:

a quad-port Broadcom card:

bge0: <Broadcom NetXtreme Gigabit Ethernet, ASIC rev. 0x5720000> mem 0xd90a0000-0xd90affff,0xd90b0000-0xd90bffff,0xd90c0000-0xd90cffff irq 34 at device 0.0 on pci2

and a dual-port Intel card:

igb0: <Intel(R) PRO/1000 Network Connection version - 2.4.0> mem 0xd4d00000-0xd4dfffff,0xd4ff8000-0xd4ffbfff irq 82 at device 0.0 on pci68

On 9.2 the lagg0 configuration was as follows (from /etc/rc.conf):

ifconfig_igb0="UP"
ifconfig_igb1="UP"
ifconfig_bge2="UP"
ifconfig_bge3="UP"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport igb0 laggport igb1 laggport bge2 laggport bge3 10.50.1.154 netmask 255.255.0.0"
defaultrouter="10.50.0.1"
ifconfig_lagg0_alias0="inet 10.50.1.155 netmask 255.255.0.0"
ifconfig_lagg0_alias1="inet 10.50.1.165 netmask 255.255.0.0"

This all works fine and as expected under 9.2.

After upgrading to 10.0-RC5, you can no longer connect to the machine. The lagg0 interface comes up, but no network connection is possible. Changing to 10.0-RELEASE didn't solve the problem.

I tried several things under 10.0 (with LACP as the protocol type):

lagg interface with the bge2 and bge3 network interfaces: not working
lagg interface with the igb0 and igb1 network interfaces: not working

ifconfig shows the interface as up and active, and netstat -r shows the assigned IP addresses and the correct entry for the default route. But you can't, for example, ping the gateway (100% packet loss).

If I change the lagg protocol to failover, the lagg interface works. It seems to me that the problem is only with LACP in 10.0.
Fix: Don't use laggproto lacp; use laggproto failover instead (which is not really equivalent).

How-To-Repeat: With the help of Google I found this thread, where the problem has been discussed before: http://forums.freebsd.org/viewtopic.php?f=7&t=43665
Hi, I have been using LACP interfaces on 10.0 just fine using the em driver with a Cisco 3750X switch. What brand/model switch are you using on the remote side? Regards, Brad Davis
Hi, I'm using the igb driver and it doesn't work either. The switch is a Juniper EX3300-48T. I tried switching to failover, but that doesn't seem to work either. Best regards Ben
Hi,

On 23.01.14 17:30, Brad Davis wrote:
> I have been using LACP interfaces on 10.0 just fine using the em driver
> with a Cisco 3750X switch.
>
> What brand/model switch are you using on the remote side?

I'm using an HP ProCurve 6600ml-24G (J9263A).

Kind regards,
Michael
Scott Long helped to fix this issue for me:

"The difference between FreeBSD 9.x and 10 is that in 9.x, it ran in “optimistic” mode, meaning that it didn’t rely on getting receive messages from the switch, and only took a channel down if the link state went down. In strict mode, it looks for the receive messages and only transitions to a full operational state if it gets them. So while I know it’s easy to point at the problem being FreeBSD 10, seeing as FreeBSD 9 worked for you, please check to make sure that your switch is set up correctly."

After I set my Juniper switch to "active" LACP mode, it worked again:

"The LACP mode can be active or passive. If the actor and partner are both in passive mode, they do not exchange LACP packets, which results in the aggregated Ethernet links not coming up. If either the actor or partner is active, they do exchange LACP packets. By default, LACP is turned off on aggregated Ethernet interfaces. If LACP is configured, it is in passive mode by default. To initiate transmission of LACP packets and response to LACP packets, you must configure LACP in active mode."

If you have this issue, please check whether you have a similar setting.

Best regards
Ben
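To see from the FreeBSD side whether the switch is actually answering LACP, the per-port flags in the ifconfig output can be inspected. The following is only a minimal sketch: the function name lagg_port_report is made up for illustration, and it assumes the "laggport: <if> flags=..." line format that appears in the ifconfig output quoted later in this thread, where a working port shows ACTIVE, COLLECTING and DISTRIBUTING together.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeBSD): reads `ifconfig laggN` output on
# stdin and prints one line per lagg port, saying whether the port shows all
# three flags seen on a working LACP port in this thread.
lagg_port_report() {
    awk '
        $1 == "laggport:" {
            port = $2
            flags = $3
            if (flags ~ /ACTIVE/ && flags ~ /COLLECTING/ && flags ~ /DISTRIBUTING/)
                print port ": negotiated"
            else
                print port ": NOT negotiated (" flags ")"
        }
    '
}
```

It would be used as "ifconfig lagg0 | lagg_port_report". A port reported as not negotiated while its link is up would be consistent with the switch not sending LACP packets, for example because both sides are passive.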
Scott Long provided the following patch for the issue here (thanks to Ben Niessen for opening the thread there): http://www.opendevs.org/mhthu/kern-185967-link-aggregation-lagg-lacp-not-working-in-10.html

Index: ieee8023ad_lacp.c
===================================================================
--- ieee8023ad_lacp.c (revision 261432)
+++ ieee8023ad_lacp.c (working copy)
@@ -192,6 +192,11 @@
 SYSCTL_INT(_net_link_lagg_lacp, OID_AUTO, debug, CTLFLAG_RW | CTLFLAG_TUN,
     &lacp_debug, 0, "Enable LACP debug logging (1=debug, 2=trace)");
 TUNABLE_INT("net.link.lagg.lacp.debug", &lacp_debug);
+static int lacp_strict = 0;
+SYSCTL_INT(_net_link_lagg_lacp, OID_AUTO, lacp_strict_mode,
+    CTLFLAG_RW | CTLFLAG_TUN, &lacp_strict, 0,
+    "Enable LACP strict protocol compliance");
+TUNABLE_INT("net.link.lagg.lacp.lacp_strict_mode", &lacp_strict);
 #define LACP_DPRINTF(a) if (lacp_debug & 0x01) { lacp_dprintf a ; }
 #define LACP_TRACE(a) if (lacp_debug & 0x02) { lacp_dprintf(a,"%s\n",__func__); }
@@ -791,7 +796,7 @@
 lsc->lsc_hashkey = arc4random();
 lsc->lsc_active_aggregator = NULL;
- lsc->lsc_strict_mode = 1;
+ lsc->lsc_strict_mode = lacp_strict;
 LACP_LOCK_INIT(lsc);
 TAILQ_INIT(&lsc->lsc_aggregators);
 LIST_INIT(&lsc->lsc_ports);

I've applied this patch and can confirm that it restores the LACP behaviour of FreeBSD 10.0 to that of 9.2. With this patch, you can therefore upgrade your OS without changing your configuration, either on the switch or on the OS.

I've tested this across different types of interfaces (igb/bge) with up to four interfaces, and also in configurations with two interfaces of the same type (bge0/1 and igb0/1). You can run FreeBSD against the switch without having to configure LACP on the switch side.
This may not strictly conform to the 802.3ad standard, but it works (just for the record: bonding mode 4 under Linux, which also claims conformity with 802.3ad, works like this).

From my point of view, the patch should be applied and merged into the FreeBSD kernel ASAP and become the standard behaviour for the next release.

Kind regards
Michael
Responsible Changed From-To: freebsd-bugs->freebsd-net Over to maintainer(s).
The patch didn't solve the lagg regression between 9.2 and 10-stable (r267244).

My configuration only uses one lagg member (the second will be added later):

ifconfig_em0="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport em0 SYNCDHCP"
ifconfig_lagg0_ipv6="inet6 accept_rtadv"

And it works great on 9.2:

[root@R1]~# uname -a
FreeBSD R1 9.2-RELEASE FreeBSD 9.2-RELEASE #0 r255918M: Sat Oct 26 22:41:39 CEST 2013 root@orange.bsdrp.net:/usr/obj/BSDRP.amd64/usr/local/BSDRP/BSDRP/FreeBSD/src/sys/amd64 amd64
[root@R1]~# ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether aa:aa:00:01:01:01
        inet6 fe80::a8aa:ff:fe01:101%lagg0 prefixlen 64 scopeid 0x8
        inet 10.0.12.1 netmask 0xffffff00 broadcast 10.0.12.255
        inet6 2001:db8:12:0:a8aa:ff:fe01:101 prefixlen 64 autoconf
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
[root@R1]~# uname -a
FreeBSD R1 9.2-RELEASE FreeBSD 9.2-RELEASE #0 r255918M: Sat Oct 26 22:41:39 CEST 2013 root@orange.bsdrp.net:/usr/obj/BSDRP.amd64/usr/local/BSDRP/BSDRP/FreeBSD/src/sys/amd64 amd64

But once upgraded to 10-stable AND WITH strict mode DISABLED, it no longer works:

[root@R1]~# uname -a
FreeBSD R1 10.0-STABLE FreeBSD 10.0-STABLE #0 r267244M: Mon Jun 9 03:57:44 CEST 2014 root@orange.bsdrp.net:/usr/obj/BSDRP.amd64/usr/local/BSDRP/BSDRP/FreeBSD/src/sys/amd64 amd64
[root@R1]~# echo "net.link.lagg.0.lacp.lacp_strict_mode=0" >> /etc/sysctl.conf
[root@R1]~# ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=9b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM>
        ether aa:aa:00:01:01:01
        inet6 fe80::a8aa:ff:fe01:101%lagg0 prefixlen 64 scopeid 0x7
        inet 0.0.0.0 netmask 0xff000000 broadcast 255.255.255.255
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: em0 flags=0<>
[root@R1]~# sysctl net.link.lagg.0.
net.link.lagg.0.use_flowid: 1
net.link.lagg.0.flowid_shift: 16
net.link.lagg.0.count: 1
net.link.lagg.0.active: 0
net.link.lagg.0.flapping: 0
net.link.lagg.0.lacp.lacp_strict_mode: 0
net.link.lagg.0.lacp.debug.rx_test: 0
net.link.lagg.0.lacp.debug.tx_test: 0

In debug mode, I get this output:

lacp_select_tx_port: no active aggregator
This patch did solve my occasional problems with FreeBSD 10-STABLE and a Netgear GS110TP, so somebody should at least have a look at it (or scottl could just commit it).
Hi Oliver,

(In reply to olivier from comment #8)
> The patch didn't solve Lagg regression between 9.2 and 10-stable (r267244).
>
> My configuration only use one lagg member (the second will be added later):

_Maybe_ #179926 is your bug and you could try the patch over there... (Although that problem seems to have existed since at least 9.1.)
(In reply to kvedulv from comment #10)
> Hi Oliver,
>
> (In reply to olivier from comment #8)
> > The patch didn't solve Lagg regression between 9.2 and 10-stable (r267244).
> >
> > My configuration only use one lagg member (the second will be added later):
>
> _Maybe_ #179926 is your Bug and you could try the patch over there...
> (Although that problem seems to have existed at least in 9.1)

I believe it's not a regression in my case: on FreeBSD 9.2, the lagg LACP mode, by not implementing a 'strict' mode, was equivalent to a 'static + optional LACP' mode.

The behavior of lagg LACP on 9.2 seems to be:
- If no LACP is detected, allow a minimum of 1 lagg member to transmit.

The behavior on 10 seems to be:
- If no LACP is detected, don't allow any of the lagg members to transmit (even with strict_mode disabled).

And in my case, I didn't have an LACP device in front of my FreeBSD machine. So I simply need to use a 'static' mode (like loadbalance or roundrobin).
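For a setup like the one described here, with no LACP speaker on the other end, the /etc/rc.conf from comment #8 could be switched to a static protocol roughly as follows. This is a sketch derived from that earlier configuration, not a tested recommendation; loadbalance is used as the example, and with more than one member the switch ports would presumably need a matching static (non-LACP) aggregate.

```shell
# /etc/rc.conf fragment, adapted from the configuration in comment #8.
# Only the laggproto keyword changes: "lacp" becomes the static "loadbalance".
ifconfig_em0="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto loadbalance laggport em0 SYNCDHCP"
ifconfig_lagg0_ipv6="inet6 accept_rtadv"
```

Static protocols do not negotiate with the peer, so a miscabled or misconfigured switch port is not detected the way LACP would detect it.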
Created attachment 147795 [details] Output of ifconfig -a
We are facing the same problem, running FreeBSD 10.0-p9, connected with LACP to two Juniper EX4550 switches (virtual-chassis/stacked). In our case the Junipers are already in active mode, but we still have issues with the connection. We are experiencing lots of outages (every few minutes).

dmesg shows (on -multiple- FreeBSD servers):

ix1: Interface stopped DISTRIBUTING, possible flapping
ix1: Interface stopped DISTRIBUTING, possible flapping
ix1: Interface stopped DISTRIBUTING, possible flapping

Juniper configuration:

ae21 {
    description "serverX (xe-0/0/17 en xe-1/0/17)";
    aggregated-ether-options {
        link-speed 10g;
        lacp {
            active;
            periodic fast;
        }
    }
    unit 0 {
        family ethernet-switching {
            port-mode access;
            vlan {
                members S1_SERVERS;
            }
        }
    }
}

- attached output of "ifconfig -a"
- attached output of "/etc/rc.conf"
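To compare how often each interface flaps across several servers, the dmesg lines quoted above can be tallied per interface. This is a small sketch only: count_flaps is a made-up name, and it assumes the interface name is the first field of the message, as in the lines shown.

```shell
#!/bin/sh
# Hypothetical helper: reads kernel messages on stdin and counts, per
# interface, how often the "stopped DISTRIBUTING" warning was logged.
count_flaps() {
    awk '/Interface stopped DISTRIBUTING/ {
             sub(/:$/, "", $1)   # strip the trailing colon from the name
             n[$1]++
         }
         END { for (i in n) print i, n[i] }' | sort
}
```

It would be run as "dmesg | count_flaps". If only one member of the aggregate shows up, that would point at a single link or switch port rather than at LACP as a whole.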
Created attachment 147796 [details] LACP settings in /etc/rc.conf
Is the patch supposed to be included with 10.1-BETA3? There does not seem to be a sysctl entry called net.link.lagg.lacp.lacp_strict_mode
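Two different names for this knob appear in the thread: the patch defines net.link.lagg.lacp.lacp_strict_mode, while the 10-STABLE output in comment #8 lists net.link.lagg.0.lacp.lacp_strict_mode (per lagg interface). A quick way to see which, if any, a given system exposes is to filter the sysctl tree. The sketch below is illustrative only, and find_strict_mode is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: reads `sysctl net.link.lagg` output ("name: value"
# lines) on stdin and prints every strict-mode knob it finds, whatever the
# exact OID is on this system.
find_strict_mode() {
    awk -F': ' '
        $1 ~ /(^|\.)lacp_strict_mode$/ { print $1 "=" $2; found = 1 }
        END { if (!found) print "no lacp_strict_mode sysctl found" }
    '
}
```

It would be run as "sysctl net.link.lagg | find_strict_mode". Judging from the output in comment #8, a per-interface entry only appears once the lagg interface exists.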
I just upgraded our storage server from 9.3 to 10.1 and we're now seeing this problem as well. This is what we had in /etc/rc.conf:

cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 laggport em2 laggport em3 192.168.0.102/24"

I'll also attach a screenshot of ifconfig -a.
Screenshots: https://picasaweb.google.com/104346539473598376650/LaggProblems?authuser=0&authkey=Gv1sRgCKjvr5q29ouZLg&feat=directlink
In our case the problem was due to a bug in TSO (TCP segmentation offload). This has been resolved in 10.1. You can easily rule out TSO issues by temporarily turning it off for your interfaces (ifconfig ix# -tso).
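With several members in the aggregate, the per-port commands can be generated from the lagg's own port list instead of being typed by hand. This is a sketch under the same assumption about the "laggport:" line format as seen earlier in the thread; tso_off_commands is a made-up name, and it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: reads `ifconfig laggN` output on stdin and prints one
# "ifconfig <port> -tso" command per lagg member. Nothing is executed here.
tso_off_commands() {
    awk '$1 == "laggport:" { print "ifconfig " $2 " -tso" }'
}
```

It would be run as "ifconfig lagg0 | tso_off_commands", and the printed lines can then be executed as root. The change does not survive a reboot, and "ifconfig ix0 tso" turns the feature back on.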
batch change: For bugs that match the following - Status Is In progress AND - Untouched since 2018-01-01. AND - Affects Base System OR Documentation DO: Reset to open status. Note: I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Keyword: patch or patch-ready – in lieu of summary line prefix: [patch] * bulk change for the keyword * summary lines may be edited manually (not in bulk). Keyword descriptions and search interface: <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>
This bug will be hard to fix and reproduce if it still exists. It is likely that the transition to iflib was the cause of the regression or required the introduction of a configuration change. It would be reasonable to close this PR IMHO.
(In reply to Marek Zarychta from comment #21) iflib driver did not affect any of the NICs in this PR until 12.0 (December 2018).
I tested LACP mode while I was fixing bpf tapping [1] with an H3C switch and a dual-port Chelsio T520-CR card. IIRC, LACP mode works great. From the discussion above, I suspect the switch in front is not configured correctly.

1. https://cgit.freebsd.org/src/commit/?id=5f3d0399e903573e9648385ea6585e54af4d573f