Bug 203031 - LACP problem with FreeBSD 10.2-RELEASE
Summary: LACP problem with FreeBSD 10.2-RELEASE
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-net (Nobody)
Reported: 2015-09-10 23:13 UTC by gondim
Modified: 2016-05-05 13:09 UTC

Description gondim 2015-09-10 23:13:45 UTC
Hi All,

We have a router configured with two LACP aggregates (lagg0 and lagg1):

lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        ether 00:1b:21:7b:ee:98
        inet6 fe80::21b:21ff:fe7b:ee98%lagg0 prefixlen 64 scopeid 0x12
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: igb6 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb7 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

lagg1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        ether 00:1b:21:7b:ee:6c
        inet 189.xxx.xxx.34 netmask 0xfffffffc broadcast 189.113.78.35
        inet6 fe80::21b:21ff:fe7b:ee6c%lagg1 prefixlen 64 scopeid 0xf
        inet6 2804:xxxx:0:8::2 prefixlen 64
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: igb4 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb5 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

When my Internet traffic is high at night, the BGP session on lagg1 goes up and down every 4 minutes, and the system load rises to 40.x.
This problem started after we upgraded to FreeBSD 10.2-RELEASE-p2. Before that, we were running FreeBSD 10.1-STABLE r281235 without any problems.

The log contains the following messages:

/var/log/messages:Sep  9 19:21:43 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:21:44 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:27:01 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:27:01 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:29:13 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:29:14 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:46:10 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:46:11 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:01:02 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:01:03 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:02:08 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:02:09 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:03:54 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:03:57 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:07:05 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:07:06 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:20:49 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:20:50 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:25:39 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:25:40 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:28:55 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:28:56 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:31:39 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:31:39 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:33:29 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 20:33:30 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:03:38 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:03:38 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:09:39 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:09:39 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:20:51 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:20:52 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:25:24 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:25:25 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:36:22 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:36:23 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:39:26 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:39:27 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:47:40 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:47:40 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:52:19 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:52:19 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:53:01 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:53:01 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:58:53 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 21:58:53 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
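
To quantify how often the ports drop out, the messages above can be reduced to per-interface intervals. The following is only a minimal sketch in Python: the function names `parse_flaps` and `gaps_seconds` are made up for illustration, the sample is just the first six lines of the log above, and the year is an assumption because syslog lines do not carry one.

```python
from datetime import datetime

# First six lines of the log above, copied verbatim.
SAMPLE = """\
/var/log/messages:Sep  9 19:21:43 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:21:44 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:27:01 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:27:01 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:29:13 rt01 kernel: igb5: Interface stopped DISTRIBUTING, possible flapping
/var/log/messages:Sep  9 19:29:14 rt01 kernel: igb4: Interface stopped DISTRIBUTING, possible flapping
"""


def parse_flaps(text, year=2015):
    """Return (timestamp, interface) pairs for every flap message.

    The year is assumed; syslog timestamps do not include it.
    """
    events = []
    for line in text.splitlines():
        if "stopped DISTRIBUTING" not in line:
            continue
        # Drop the "filename:" prefix that grep adds, if present.
        _, _, rest = line.partition("/var/log/messages:")
        fields = (rest or line).split()
        month, day, clock, _host, _tag, iface = fields[:6]
        ts = datetime.strptime(f"{year} {month} {day} {clock}",
                               "%Y %b %d %H:%M:%S")
        events.append((ts, iface.rstrip(":")))
    return events


def gaps_seconds(events, iface):
    """Seconds between consecutive flap messages for one interface."""
    times = [ts for ts, name in events if name == iface]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    events = parse_flaps(SAMPLE)
    for iface in ("igb4", "igb5"):
        print(iface, gaps_seconds(events, iface))
```

Run against the full log, this would show whether the interval between flaps is regular or irregular, and that both ports of lagg1 stop distributing within about a second of each other.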

Has there been any change to LACP during this period that could be causing this problem?

Today I downgraded the system to 10.1-STABLE r281235 and the problem stopped happening.
Some change made between 10.1-STABLE r281235 and 10.2-RELEASE-p2 is causing this problem.
Comment 1 gondim 2015-09-17 22:36:08 UTC
Today I tried to run my router with the latest 10.2-STABLE and noticed the following:

On 10.1-STABLE r281235, when OpenBGP starts opening its sessions, the load stays at 4.x, and under high traffic it rises to 9.x.

On the latest 10.2-STABLE, when OpenBGP starts opening its sessions, the load rises to 14.x, and under high traffic it climbs to 40.x and even 53.x.

Something has caused a severe drop in system performance under heavy traffic, but I have no idea when it was introduced.
Comment 2 gondim 2015-09-19 13:10:45 UTC
Hi all,

More information about the problem. When I do:

sysctl net.link.lagg.lacp.debug=1

I see many messages in the log:

Sep 18 15:15:10 rt01 kernel: igb4: lacpdu transmit
Sep 18 15:15:10 rt01 kernel: actor=(8000,00-1B-21-7B-EE-6C,01EB,8000,0005)
Sep 18 15:15:10 rt01 kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Sep 18 15:15:10 rt01 kernel: partner=(007F,28-8A-1C-55-B3-C0,0004,007F,0005)
Sep 18 15:15:10 rt01 kernel: partner.state=3f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Sep 18 15:15:10 rt01 kernel: maxdelay=0
Sep 18 15:15:11 rt01 kernel: igb5: lacpdu transmit
Sep 18 15:15:11 rt01 kernel: actor=(8000,00-1B-21-7B-EE-6C,01EB,8000,0006)
Sep 18 15:15:11 rt01 kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Sep 18 15:15:11 rt01 kernel: partner=(007F,28-8A-1C-55-B3-C0,0004,007F,0004)
Sep 18 15:15:11 rt01 kernel: partner.state=3f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Sep 18 15:15:11 rt01 kernel: maxdelay=0
Sep 18 15:15:11 rt01 kernel: igb4: lacpdu transmit
Sep 18 15:15:11 rt01 kernel: actor=(8000,00-1B-21-7B-EE-6C,01EB,8000,0005)
Sep 18 15:15:11 rt01 kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Sep 18 15:15:11 rt01 kernel: partner=(007F,28-8A-1C-55-B3-C0,0004,007F,0005)
Sep 18 15:15:11 rt01 kernel: partner.state=3f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Sep 18 15:15:11 rt01 kernel: maxdelay=0
Sep 18 15:15:12 rt01 kernel: igb5: lacpdu transmit
Sep 18 15:15:12 rt01 kernel: actor=(8000,00-1B-21-7B-EE-6C,01EB,8000,0006)
Sep 18 15:15:12 rt01 kernel: actor.state=3d<ACTIVITY,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Sep 18 15:15:12 rt01 kernel: partner=(007F,28-8A-1C-55-B3-C0,0004,007F,0004)
Sep 18 15:15:12 rt01 kernel: partner.state=3f<ACTIVITY,TIMEOUT,AGGREGATION,SYNC,COLLECTING,DISTRIBUTING>
Sep 18 15:15:12 rt01 kernel: maxdelay=0
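
For anyone reading the state fields above, the hex values are bit masks of the standard LACP port state flags. The sketch below decodes them in Python; the bit positions follow the LACP state octet as defined in IEEE 802.1AX, the short names match the ones the kernel prints, and `decode_state` is a helper name invented for this example.

```python
# LACP port state bits (IEEE 802.1AX), lowest bit first.
# Names are abbreviated the same way as in the kernel debug output.
LACP_STATE_BITS = [
    (0x01, "ACTIVITY"),
    (0x02, "TIMEOUT"),
    (0x04, "AGGREGATION"),
    (0x08, "SYNC"),
    (0x10, "COLLECTING"),
    (0x20, "DISTRIBUTING"),
    (0x40, "DEFAULTED"),
    (0x80, "EXPIRED"),
]


def decode_state(value):
    """Return the names of all flags set in an LACP state octet."""
    return [name for bit, name in LACP_STATE_BITS if value & bit]


if __name__ == "__main__":
    for label, value in (("actor", 0x3D), ("partner", 0x3F)):
        print(f"{label}.state={value:02x}<{','.join(decode_state(value))}>")
```

The only difference between the two sides is the TIMEOUT bit in the partner state, which means the switch is asking for the short LACP timeout. That would be consistent with each port transmitting an LACPDU about once per second in the log above.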
Comment 3 gondim 2015-09-24 13:01:11 UTC
Hi All,

I have noticed another strange thing on all FreeBSD 10.2-RELEASE systems:

Look at my ix0:

ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO>
        ether 00:1b:21:89:25:28
        inet 192.168.255.1 netmask 0xffffff00 broadcast 192.168.255.255 
        inet6 fe80::21b:21ff:fe89:2528%ix0 prefixlen 64 scopeid 0x1 
        inet6 2804:1054:dead:faca::1 prefixlen 64 
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-SR <full-duplex,rxpause,txpause>)
        status: active


<full-duplex,rxpause,txpause> ==> are rxpause and txpause correct here?

All servers that have been updated show this.
Comment 4 Jeff Pieper 2015-09-24 14:14:54 UTC
(In reply to gondim from comment #3)

Yes that is correct. It should correspond to sysctl dev.ix.0.fc=3, which indicates that both tx and rx pause frames are enabled.
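
As a rough illustration of that mapping, here is a small Python sketch. Only the value 3 is confirmed above; the meanings of 0, 1 and 2 are an assumption based on the flow control modes documented for the ixgbe driver (0 off, 1 rx pause, 2 tx pause), and `media_flags` is a made-up helper name.

```python
# dev.ix.N.fc value -> pause flags expected in the ifconfig media line.
# Only 3 is confirmed in this thread; 0-2 are assumed from the
# ixgbe driver's flow control modes.
FC_MODES = {
    0: [],                        # flow control off
    1: ["rxpause"],               # receive pause frames only
    2: ["txpause"],               # transmit pause frames only
    3: ["rxpause", "txpause"],    # full flow control
}


def media_flags(fc):
    """Build the option list ifconfig would be expected to show."""
    return ",".join(["full-duplex"] + FC_MODES[fc])


if __name__ == "__main__":
    print(media_flags(3))
```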
Comment 5 Nick 2016-05-05 13:09:05 UTC
I have the same problem on two servers running FreeBSD 10.2 with LACP. At some point, messages like these appear in the log:

kernel: igb0: Interface stopped DISTRIBUTING, possible flapping
kernel: igb1: Interface stopped DISTRIBUTING, possible flapping

and the load average climbs to 30-40.

I tried updating to 10.3 but got the same problem.