Bug 213606 - [bxe] multicast (LACP/OSPF) not working with qlogic BCM57800
Summary: [bxe] multicast (LACP/OSPF) not working with qlogic BCM57800
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: David C Somayajulu
URL:
Keywords: regression
Duplicates: 227743
Depends on:
Blocks:
 
Reported: 2016-10-19 09:00 UTC by Kristjan
Modified: 2020-03-23 14:48 UTC
CC List: 15 users

See Also:


Description Kristjan 2016-10-19 09:00:17 UTC
Hello,

In FreeBSD 11.0, LACP does not work with the QLogic BCM57800.
Under FreeBSD 10.3 it works.

Tested against a Juniper EX4550 switch.


Tested with commands:

ifconfig bxe0 up
ifconfig bxe1 up
ifconfig lagg0 create
ifconfig lagg0 laggproto lacp laggport bxe0 laggport bxe1 xxx.xxx.xxx.xxx/xx
Comment 1 Steven Hartland freebsd_committer freebsd_triage 2016-10-19 15:07:22 UTC
What does ifconfig lagg0 report?
Comment 2 Kristjan 2016-10-19 15:26:39 UTC
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether b0:83:fe:e5:fa:82
        inet 172.21.50.2 netmask 0xffffff00 broadcast 172.21.50.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        groups: lagg
        laggproto lacp lagghash l2,l3,l4
        laggport: bxe0 flags=0<>
        laggport: bxe1 flags=0<>
Comment 3 Steven Hartland freebsd_committer freebsd_triage 2016-10-19 15:35:54 UTC
And for bxe0 / bxe1?
Comment 4 Kristjan 2016-10-19 15:45:51 UTC
bxe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether b0:83:fe:e5:fa:82
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-SR <full-duplex>)
        status: active


bxe1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether b0:83:fe:e5:fa:82
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-SR <full-duplex>)
        status: active
Comment 5 Steven Hartland freebsd_committer freebsd_triage 2016-10-19 16:20:24 UTC
That looks odd: there is no options= line for the NICs.

Can you try something silly: assign a private IP to bxe0 and see if that changes things?
Comment 6 Steven Hartland freebsd_committer freebsd_triage 2016-10-19 16:26:01 UTC
Looks like it may be an old issue; see:
https://forums.freenas.org/index.php?threads/lacp-lagg-issues-with-9-3.26227/
Comment 7 Kristjan 2016-10-19 17:19:56 UTC
Nothing changes, and yes, it seems to be the same problem. laggproto loadbalance works fine.
LACP also worked in 10.3, so something that changed between 10.3 and 11.0 broke it.
Comment 8 Borja Marcos 2016-10-20 08:20:24 UTC
Deja vu with https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=150249

The main symptom was lagg refusing to work in LACP mode. 

In this case, the reason was that the driver didn't detect media properly, and the "paperwork" with the kernel failed: the interface wasn't marked as full duplex. As a result, LACP (which checks the full-duplex flag for the interface) refused to use it. Remember that full-duplex is a prerequisite for LACP.

This seems to be a case of incomplete paperwork as well, although the necessary bits seem to be in place.

In my case this was the problem with LACP (ieee8023ad_lacp.c):

---------
        /*
         * If the port is not an active full duplex Ethernet link then it can
         * not be aggregated.
         */
        if (IFM_TYPE(media) != IFM_ETHER || (media & IFM_FDX) == 0 ||
            ifp->if_link_state != LINK_STATE_UP) {
                lacp_port_disable(lp);
        } else {
                lacp_port_enable(lp);
        }
---------

But according to ifconfig, the interface is marked as full duplex and the media appears to be Ethernet. I would add some printf()s here to check whether this is really the case or whether some other check is failing.

What does ifconfig -m say about the interfaces? That lack of options looks like a driver bug, and it would help to see the capabilities as reported by ifconfig.

This is an example with an "em" interface.

---------
% ifconfig -m -v -v em0
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO>
	capabilities=15399b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,NETMAP>
	ether 68:05:ca:XX:YY:ZZ
	inet 192.168.1.202 netmask 0xffffff00 broadcast 192.168.1.255 
	inet 192.168.1.203 netmask 0xffffffff broadcast 192.168.1.203 
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	supported media:
		media autoselect
		media 1000baseT
		media 1000baseT mediaopt full-duplex
		media 100baseTX mediaopt full-duplex
		media 100baseTX
		media 10baseT/UTP mediaopt full-duplex
		media 10baseT/UTP
---------
Comment 9 Kristjan 2016-10-20 13:20:37 UTC
ifconfig -m bxe0

bxe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        capabilities=527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO>
        ether b0:83:fe:e5:fa:82
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-SR <full-duplex>)
        status: active
        supported media:
                media autoselect
                media 10Gbase-SR mediaopt full-duplex
Comment 10 Borja Marcos 2016-10-20 13:30:37 UTC
And what if you try to mess with the options? 

Try this:

ifconfig bxe0 -rxcsum
ifconfig bxe1 -rxcsum

I'm just wondering, maybe by doing that you can coerce the driver to complete the paperwork properly.
Comment 11 Kristjan 2016-10-25 06:12:41 UTC
Sorry, it does not make things better.
Comment 12 Kristjan 2016-11-24 09:57:36 UTC
Upgraded the NIC firmware to the latest version and tried FreeBSD-12.0-CURRENT-amd64-20161117-r308737-disc1.iso. LACP still does not work.

Is there anything more I can give to help debug this problem?
Comment 13 Simon Lindgren 2016-12-08 10:39:33 UTC
Hi!

I'm having the same issue in recent versions of FreeBSD, also with the bxe driver; it worked before the upgrade.

When I put the interfaces in promiscuous mode, they start working again for some unknown reason, but when I take them out of promiscuous mode they stop working after about a minute. If I start a tcpdump (and therefore put the interface in promiscuous mode), I get around 10 LACPv1 packets, then I start seeing other traffic coming in and out as it should, and the port flags become ACTIVE,COLLECTING,DISTRIBUTING.

Using laggproto loadbalance or failover works fine.

NIC: QLogic NetXtreme II BCM57810 10GbE (B0) BXE v:1.78.81

LACP works on:
10.3-RELEASE-p7 (and at the very least, some earlier versions as well)

LACP does NOT work on:
FreeBSD 11.0-RELEASE-p1
FreeBSD 11.0-RELEASE-p2
Comment 14 Ingeborg Hellemo 2017-02-23 08:00:06 UTC
The issue is still present in 11.0-RELEASE-p7.

bxe and lagg worked perfectly before upgrading from 10.3 to 11.0

Hope this problem can get some priority. Happy to help with debugging.
Comment 15 rainer 2017-10-05 11:41:08 UTC
Still seems to be a problem with 11.1
Comment 16 Eugene Grosbein freebsd_committer freebsd_triage 2017-10-05 12:16:49 UTC
Everyone having this problem should run "tcpdump -enp -i bxe0" to see if it shows incoming and/or outgoing LACP ethernet frames while lagg negotiates protocol with partner switch, and report back.
Comment 17 rainer 2017-10-05 14:30:37 UTC
Also related, probably:

https://community.emc.com/thread/222482?start=0&tstart=0
Comment 18 Eugene Grosbein freebsd_committer freebsd_triage 2017-10-06 09:34:03 UTC
LACP relies on receiving ethernet multicasts. It seems that bxe(4) hardware fails to deliver incoming multicasts to its host unless switched to promiscuous mode.
Comment 19 Eugene Grosbein freebsd_committer freebsd_triage 2018-04-25 11:21:45 UTC
Here is a suspicious commit that might have broken bxe(4) multicast processing:

https://svnweb.freebsd.org/base/head/sys/dev/bxe/bxe.c?revision=266979&view=markup

See also the older PR https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=174850 describing the same problem on FreeBSD 8.x.

Current bxe(4) code has some processing for IFF_ALLMULTI, but it does not seem to be functional:

https://svnweb.freebsd.org/base/head/sys/dev/bxe/bxe.c?revision=266979&view=markup#l12664

Adding edavis, who may have some thoughts.
Comment 20 Eugene Grosbein freebsd_committer freebsd_triage 2018-04-25 11:26:14 UTC
*** Bug 227743 has been marked as a duplicate of this bug. ***