Bug 269133 - bnxt(4): BCM57416 - HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords:
Duplicates: 272865
Depends on:
Blocks:
 
Reported: 2023-01-24 19:43 UTC by Santiago Martinez
Modified: 2024-11-17 15:57 UTC (History)
CC List: 25 users

See Also:


Attachments
bnxt-patch.diff (2.10 KB, patch)
2023-05-04 16:27 UTC, Santiago Martinez
debug_patch_01 (502 bytes, patch)
2023-08-07 11:51 UTC, Chandrakanth Patil

Description Santiago Martinez 2023-01-24 19:43:38 UTC

    
Comment 1 Graham Perrin freebsd_committer freebsd_triage 2023-01-24 22:02:09 UTC
Was this intended to be a separate bug report? 

I see you as a commenter under bug 245981, 

bnxt(4): BCM57414 / BCM57416 not initializing: bnxt0: Unable to allocate device TX queue / queue memory
Comment 2 Santiago Martinez 2023-01-25 09:46:13 UTC
Dear Graham, I'm sorry, I didn't realize I had submitted it.

Yes, the intention was to create a separate report, as this seems to be a different issue.

I'm having issues with the BCM57416 on 13-STABLE from yesterday (24-JAN-2023).


[49] bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
[49] bnxt0: set_multi: rx_mask set failed
[49] bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
[49] bnxt0: set_multi: rx_mask set failed
[49] bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
[49] bnxt0: set_multi: rx_mask set failed

It was working fine with stable/13-n252199-9644bc4a1126.

I have tried disabling all offloading features, but after each ifconfig bnxt0 -capability it returns the same ALLOC error. I have also tried enabling promisc, with the same result. The interface and link are up, but no traffic passes.

The interface bnxt0 has three 802.1q sub-interfaces.

sysctl:

dev.bnxt.0.%pnpinfo: vendor=0x14e4 device=0x16d8 subvendor=0x15d9 subdevice=0x16d8 class=0x020000
dev.bnxt.0.%location: slot=0 function=0 dbsf=pci0:199:0:0 handle=\_SB_.S0D0.D0A6.D017
dev.bnxt.0.%driver: bnxt
dev.bnxt.0.%desc: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet
dev.bnxt.0.ver.hwrm_min_ver: 1.2.2
dev.bnxt.0.ver.package_ver: 214.0.286.18
dev.bnxt.0.ver.chip_type: ASIC
dev.bnxt.0.ver.chip_bond_id: 0
dev.bnxt.0.ver.chip_metal: 1
dev.bnxt.0.ver.chip_rev: 1
dev.bnxt.0.ver.chip_num: 5848
dev.bnxt.0.ver.phy_partnumber: 
dev.bnxt.0.ver.phy_vendor: 
dev.bnxt.0.ver.roce_fw_name: BONO_FW
dev.bnxt.0.ver.netctrl_fw_name: KONG_FW
dev.bnxt.0.ver.mgmt_fw_name: AFW_214.0.253.2
dev.bnxt.0.ver.hwrm_fw_name: CHIMP_FW
dev.bnxt.0.ver.phy: 13.1.11
dev.bnxt.0.ver.roce_fw: 214.0.187
dev.bnxt.0.ver.netctrl_fw: 214.0.241
dev.bnxt.0.ver.mgmt_fw: 214.0.253
dev.bnxt.0.ver.hwrm_fw: 214.4.9
dev.bnxt.0.ver.driver_hwrm_if: 1.8.1.7
dev.bnxt.0.ver.hwrm_if: 1.10.0
Comment 3 Santiago Martinez 2023-01-30 01:05:41 UTC
Seems that the following commit is the one causing the issue.

if_bnxt: Add support for VLAN on Thor:
2db35273502b3c35aa653effc5c97618567367ab

I went back to 13.1 and applied each of the stable/13 commits for bnxt in turn. After cherry-picking "if_bnxt: Add support for VLAN on Thor", the NIC stopped working. Reverting it makes the card work again.
Comment 4 Santiago Martinez 2023-02-09 22:20:29 UTC
The issue is related to the following code in bnxt_hwrm.c, line 1480.
This condition always evaluates to true, so the function returns early. I have commented it out and the NICs are working with stable/13 (as of today).

I am not sure what this "if" should actually check. Any hints?

    if (*filter_id != -1) {
        device_printf(softc->dev, "Attempt to re-allocate l2 ctx "
            "filter (fid: 0x%jx)\n", (uintmax_t)*filter_id);

        return EDOOFUS;
    }

[408] bnxt0: vlan tag : 0x3fc, filter-id: 0x106000000000204)
[408] bnxt0: Attempt to re-allocate l2 ctx filter (fid: 0x106000000000204)
[408] bnxt0: vlan tag : 0x3f3, filter-id: 0x107000000000404)
[408] bnxt0: Attempt to re-allocate l2 ctx filter (fid: 0x107000000000404)
[408] bnxt0: vlan tag : 0x3f2, filter-id: 0x108000000000604)
[408] bnxt0: Attempt to re-allocate l2 ctx filter (fid: 0x108000000000604)
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2023-02-10 14:45:36 UTC
(In reply to Santiago Martinez from comment #4)
I think the problem is this: when a new vlan tag is registered, bnxt_vlan_register() adds a new tag structure to the vlan_tags list.

After adding a vlan tag, iflib will reinitialize the interface, see iflib_vlan_register()->iflib_init_locked()->IFDI_INIT.

Then bnxt_init() will call bnxt_hwrm_set_filter(), which initializes all the tags on the list.

Suppose all of this happens twice.  bnxt_hwrm_set_filter() will encounter an already-initialized tag and trigger the EDOOFUS error.

I suspect that bnxt_init() should unregister all of its filters during reinitialization.  That is, bnxt_init() should call bnxt_hwrm_free_filter() before calling bnxt_clear_ids().

(I'm not very familiar with this driver though, so this might not work.)
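
The sequence described in this comment can be modelled outside the kernel. The following is only a sketch under stated assumptions: `struct vlan_tag`, `filter_alloc()`, `filter_free()` and `reinit()` are hypothetical stand-ins, not the driver's real functions, and the fake handle counter merely imitates the firmware returning a filter ID. It shows why a second initialization pass over the same tag list hits the EDOOFUS guard, and why releasing the filters first avoids it.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NO_FILTER ((uint64_t)-1)   /* sentinel: no filter allocated */
#ifndef EDOOFUS
#define EDOOFUS 88                 /* FreeBSD's "programming error" errno */
#endif

/* Hypothetical stand-in for the driver's per-VLAN tag record. */
struct vlan_tag {
    uint16_t tag;
    uint64_t filter_id;
    struct vlan_tag *next;
};

static uint64_t next_fw_id = 0x204;   /* fake firmware handle counter */

/* Models the guard quoted in comment 4: refuse to allocate twice. */
static int filter_alloc(struct vlan_tag *t)
{
    if (t->filter_id != NO_FILTER)
        return EDOOFUS;
    t->filter_id = next_fw_id++;
    return 0;
}

/* Models releasing a filter so the tag can be programmed again. */
static void filter_free(struct vlan_tag *t)
{
    t->filter_id = NO_FILTER;
}

/*
 * Models one (re)initialization pass over the tag list.
 * With free_first set, existing filters are released before allocating.
 */
static int reinit(struct vlan_tag *head, int free_first)
{
    struct vlan_tag *t;
    int rc;

    if (free_first)
        for (t = head; t != NULL; t = t->next)
            filter_free(t);
    for (t = head; t != NULL; t = t->next)
        if ((rc = filter_alloc(t)) != 0)
            return rc;
    return 0;
}
```

Calling `reinit(list, 0)` twice on the same list fails the second time with EDOOFUS, while `reinit(list, 1)` succeeds on every pass. Whether the real driver can safely free its filters at that point is exactly the open question raised above.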
Comment 6 Santiago Martinez 2023-02-23 20:16:42 UTC
Hi Mark, thanks a lot for the reply. I will try to do some tests.
I will make the init to call the filter free and see what happens.
Will keep you posted.
Comment 7 Santiago Martinez 2023-02-23 23:51:45 UTC
I have been doing more tests and there are two issues:

First: the "if" at line 1480.
Second: the VLAN filter enables on lines 1503-1505.

When I did the initial workaround, I forgot that I had also commented out this code:

	if (vlan_tag != 0xffff) {
		enables |=
			HWRM_CFA_L2_FILTER_ALLOC_INPUT_ENABLES_L2_IVLAN |
			HWRM_CFA_L2_FILTER_ALLOC_INPUT_ENABLES_L2_IVLAN_MASK |
			HWRM_CFA_L2_FILTER_ALLOC_INPUT_ENABLES_NUM_VLANS;
		req.l2_ivlan_mask = 0xffff;
		req.l2_ivlan = vlan_tag;
		req.num_vlans = 1;
	}

I will do some more tests tomorrow and compare it with the Linux driver.
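
For readers unfamiliar with the request layout, the VLAN branch quoted in this comment can be illustrated in isolation. This is a minimal sketch, not the driver's code: `struct l2_filter_req`, `fill_vlan_match()` and the `EN_*` bit values are placeholders invented for the example (the real bit definitions live in the driver's HWRM interface header), and byte-order conversion is omitted.

```c
#include <stdint.h>

/* Placeholder bit values; NOT the real HWRM definitions. */
#define EN_L2_IVLAN      (1u << 0)
#define EN_L2_IVLAN_MASK (1u << 1)
#define EN_NUM_VLANS     (1u << 2)

#define VLAN_TAG_NONE 0xffff   /* "no VLAN" sentinel, as in the quoted check */

/* Hypothetical, reduced version of the filter-alloc request. */
struct l2_filter_req {
    uint32_t enables;        /* which optional fields the firmware should honor */
    uint16_t l2_ivlan;       /* inner VLAN ID to match */
    uint16_t l2_ivlan_mask;  /* mask applied to the VLAN ID */
    uint8_t  num_vlans;      /* number of VLAN tags to match */
};

/* Add a VLAN match to the request unless the tag is the "none" sentinel. */
static void fill_vlan_match(struct l2_filter_req *req, uint16_t vlan_tag)
{
    if (vlan_tag == VLAN_TAG_NONE)
        return;
    req->enables |= EN_L2_IVLAN | EN_L2_IVLAN_MASK | EN_NUM_VLANS;
    req->l2_ivlan_mask = 0xffff;   /* match all 16 bits of the tag field */
    req->l2_ivlan = vlan_tag;
    req->num_vlans = 1;
}
```

With the sentinel 0xffff the request is left untouched, so commenting out this branch in the driver means no VLAN match fields are ever sent to the firmware.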
Comment 8 geoffroy desvernay 2023-05-04 12:20:13 UTC
Since upgrading from 12.3p1x to 13.2-RELEASE, we see the same error message here with bnxt (not tested with 13.1):

dmesg:
bnxt0: <Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet> mem 0xb9a10000-0xb9a1ffff,0xb9100000-0xb91fffff,0xb9aa2000-0xb9aa3fff irq 48 at device 0.0 numa-domain 0 on pci9
bnxt0: Using 256 TX descriptors and 256 RX descriptors
bnxt0: Using 12 RX queues 12 TX queues
bnxt0: Using MSI-X interrupts with 13 vectors
bnxt0: Ethernet address: d0:94:66:81:60:e3
bnxt0: netmap queues/slots: TX 12/256, RX 12/256
bnxt1: <Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet> mem 0xb9a00000-0xb9a0ffff,0xb8800000-0xb88fffff,0xb9aa0000-0xb9aa1fff irq 52 at device 0.1 numa-domain 0 on pci9
bnxt1: Using 256 TX descriptors and 256 RX descriptors
bnxt1: Using 12 RX queues 12 TX queues
bnxt1: Using MSI-X interrupts with 13 vectors
bnxt1: Ethernet address: d0:94:66:81:60:e4
bnxt1: netmap queues/slots: TX 12/256, RX 12/256
bnxt0: Link is UP full duplex, FC - none - 10000 Mbps 
bnxt0: link state changed to UP
bnxt1: Link is UP full duplex, FC - none - 10000 Mbps 
bnxt1: link state changed to UP
bnxt0: Attempt to re-allocate l2 ctx filter (fid: 0x117000000000204)
bnxt1: Attempt to re-allocate l2 ctx filter (fid: 0x11c00000003f004)
bnxt0: Attempt to re-allocate l2 ctx filter (fid: 0x125000000000204)
bnxt1: Attempt to re-allocate l2 ctx filter (fid: 0x12800000003f004)
bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
[same messages x 100's]


sysctl:

dev.bnxt.0.%domain: 0
dev.bnxt.0.%parent: pci9
dev.bnxt.0.%pnpinfo: vendor=0x14e4 device=0x16d8 subvendor=0x1028 subdevice=0x1feb class=0x020000
dev.bnxt.0.%location: slot=0 function=0 dbsf=pci0:94:0:0
dev.bnxt.0.%driver: bnxt
dev.bnxt.0.%desc: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet
dev.bnxt.0.ver.hwrm_min_ver: 1.10.2
dev.bnxt.0.ver.package_ver: <unknown>
dev.bnxt.0.ver.chip_type: ASIC
dev.bnxt.0.ver.chip_bond_id: 0
dev.bnxt.0.ver.chip_metal: 1
dev.bnxt.0.ver.chip_rev: 1
dev.bnxt.0.ver.chip_num: 5848
dev.bnxt.0.ver.phy_partnumber: 616740003
dev.bnxt.0.ver.phy_vendor: Amphenol
dev.bnxt.0.ver.roce_fw_name: BONO_FW
dev.bnxt.0.ver.netctrl_fw_name: KONG_FW
dev.bnxt.0.ver.mgmt_fw_name: AFW_223.0.205.0
dev.bnxt.0.ver.hwrm_fw_name: CHIMP_FW
dev.bnxt.0.ver.phy: 13.1.11
dev.bnxt.0.ver.fw_ver: 223.0.205.0/pkg 22.31.13.70
dev.bnxt.0.ver.roce_fw: 223.0.205
dev.bnxt.0.ver.netctrl_fw: 223.0.205
dev.bnxt.0.ver.mgmt_fw: 223.0.205
dev.bnxt.0.ver.hwrm_fw: 223.0.205
dev.bnxt.0.ver.driver_hwrm_if: 1.10.2.34
dev.bnxt.0.ver.hwrm_if: 1.10.2
Comment 9 Mark Johnston freebsd_committer freebsd_triage 2023-05-04 14:08:53 UTC
(In reply to Santiago Martinez from comment #7)
Have you had any success?
Comment 10 Santiago Martinez 2023-05-04 14:46:02 UTC
(In reply to Mark Johnston from comment #9)


Hi Mark, 

With the patch it works and the server has been stable, as it was with 13.1
The card seems to work fine, with no errors with or without 802.1q encap.

I have tried to find documentation from Broadcom regarding the meaning of each bit on the mask, but I couldn't find it.

If required I can provide access to the servers as they are for lab purposes.

Best regards.
Santi
Comment 11 Mark Johnston freebsd_committer freebsd_triage 2023-05-04 14:47:05 UTC
(In reply to Santiago Martinez from comment #10)
Thanks for following up so quickly.  Could you please share the patch you are using?  Is it just based on comment 5, or is there more to it?
Comment 12 Santiago Martinez 2023-05-04 16:12:03 UTC
Yes, it's based on comments 4 and 7.
Give me a few minutes to sync up with 13.2 and check that it is still valid.
Comment 13 Santiago Martinez 2023-05-04 16:27:49 UTC
Created attachment 241972 [details]
bnxt-patch.diff
Comment 14 Santiago Martinez 2023-05-04 16:27:58 UTC
Here it goes. Sorry for the delay.
Comment 15 geoffroy desvernay 2023-05-05 09:17:02 UTC
tested here and approved
Comment 16 geoffroy desvernay 2023-05-05 09:24:33 UTC
tested and approved here !!! (releng/13.2 + patch from #13)

We are using lagg (lacp) with two bnxt, vlans on top of lagg and bridge to connect vnet jails (some vlans with mtu 9000, other with mtu 1500) 

I'll try now with latest firmware available
Comment 17 geoffroy desvernay 2023-05-05 09:36:56 UTC
Running well with dell's firmware 22.31.13.70

dev.bnxt.0.ver.fw_ver: 223.0.205.0/pkg 22.31.13.70

Thank you Santiago !!!

Could it be part of next errata release ?
Comment 18 Santiago Martinez 2023-05-05 11:25:21 UTC
That is good news. It would be great if it could be fixed in the next service release, as it is a pain at the moment.

My only frustration is that we didn't manage to sort it out before releasing 13.2. On the other hand, I understand we are short on resources and we all do what we can.
Comment 19 geoffroy desvernay 2023-07-19 13:36:21 UTC
Hi, any news for integration in next errata release ? (-p2 ?)
Comment 20 Alan Somers freebsd_committer freebsd_triage 2023-08-01 14:56:00 UTC
*** Bug 272865 has been marked as a duplicate of this bug. ***
Comment 21 Lutz Donnerhacke freebsd_committer freebsd_triage 2023-08-03 07:00:31 UTC
I have this problem, too.

Your patch seems to be only a "local quick fix", not a mainstream solution.
I'll try to find out how to detect the necessary cases.
Comment 22 geoffroy desvernay 2023-08-03 07:55:29 UTC
I may test code if needed, for now I have a spare server affected
Comment 23 Santiago Martinez 2023-08-03 10:57:39 UTC
(In reply to Lutz Donnerhacke from comment #21)

Hi Lutz, thanks for taking a look.

Indeed, I have just rolled back a few of the earlier commits that broke the driver. We need someone who knows how these cards work to make sure that we are doing the correct things.

I have tried digging for documents that explain how to program the filters, but without success.

What frustrates me is that these changes (the ones that broke the driver) went into 13.2 even though the issue had been raised before, and now people are being hit.

Clearly, we need better testing of network drivers, as this is not a complex bug triggered by specific traffic patterns or values, but basic driver functionality.

I do have access to some boxes that can be used for testing, so I'm happy to be included as part of testing on network drivers, assuming that I have those cards in the laboratory.

Thanks again.
Santi
Comment 24 tmoehle 2023-08-03 13:03:29 UTC
I ran into the same bug as well, when I upgraded my pfSense firewall software to its recently published 2.7 mainstream release. 

I wrote about it there in the forums and already found a few people who are affected by the same bug within a day. (https://forum.netgate.com/topic/181948/bug-in-broadcom-bnxt-driver-in-combination-with-vlans)

I fear, with popular software solutions relying on this freebsd release, a lot more people will be affected soon.
Comment 25 Warner Losh freebsd_committer freebsd_triage 2023-08-03 16:31:25 UTC
The last updates to this driver were driven by the vendor of the card.
But it's sounding like they caused too many problems and it's better to back them out, correct? Is that what the patch does?
Comment 26 Santiago Martinez 2023-08-03 16:38:11 UTC
Hi Warner, yes, that is pretty much correct. At some point Mark J mentioned that he thinks there are other issues. I did try to contact the vendor committer (based on the commit) but haven't received any reply.
Comment 27 Chandrakanth Patil 2023-08-04 11:59:53 UTC
(In reply to Santiago Martinez from comment #26)
Hi Warner and Santiago,

I am working towards the resolution and will update you ASAP.

-Chandrakanth patil
Comment 28 Santiago Martinez 2023-08-04 15:28:45 UTC
(In reply to Chandrakanth Patil from comment #27)
Thanks a lot! Will be available for testing.
Santi
Comment 29 Chandrakanth Patil 2023-08-07 11:51:52 UTC
Created attachment 243922 [details]
debug_patch_01

- VLAN tags are getting used after freeing which may lead to this issue.
- Assigning VLAN filter tags to -1 to avoid using VLAN tags after freeing them.
- Please check whether this fixes the issue.
Comment 30 Santiago Martinez 2023-08-07 12:19:51 UTC
Thanks, will give it a try today and let you know.
Santi
Comment 31 Kristof Provost freebsd_committer freebsd_triage 2023-08-07 12:34:56 UTC
(In reply to Chandrakanth Patil from comment #29)
I'm still seeing 'bnxt0: HWRM_CFA_L2_FILTER_ALLOC command returned INVALID_PARAMS error.' with debug_patch_01, when I try to `ifconfig vlan0 vlan 201 vlandev bnxt0 up`.
Comment 32 Santiago Martinez 2023-08-07 12:37:39 UTC
Same here: I applied the patch against the latest 13-stable and it fails.
Comment 33 Santiago Martinez 2023-08-09 13:03:33 UTC
(In reply to Chandrakanth Patil from comment #29)
Hi Chandrakanth, hope you are doing well.
Just wondering if you have had the chance to review the last patch, as the driver still fails.
BR.
Comment 34 Warner Losh freebsd_committer freebsd_triage 2023-08-23 00:32:07 UTC
https://reviews.freebsd.org/D41558

Kevin Bowling did this... does it help?
Comment 35 Steinar Haug 2023-08-24 06:31:20 UTC
The problem also applies to BCM57412:

Aug 13 10:06:44 y kernel: bnxt1: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.

bnxt1@pci0:24:0:1:      class=0x020000 rev=0x01 hdr=0x00 vendor=0x14e4 device=0x16d6 subvendor=0x14e4 subdevice=0x4120
    vendor     = 'Broadcom Inc. and subsidiaries'
    device     = 'BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller'
    class      = network
    subclass   = ethernet

This was working in 13.1-STABLE n253353
Comment 36 Franco Fichtner 2023-08-24 07:09:08 UTC
(In reply to Warner Losh from comment #34)

It appears so. Got another confirmation via https://forum.opnsense.org/index.php?topic=35139.msg172730#msg172730 but they will do more testing to be sure.
Comment 37 Werner Fischer 2023-08-24 08:34:51 UTC
(In reply to Warner Losh from comment #34 and to Franco from comment #36)

As Franco mentioned, the patch https://reviews.freebsd.org/D41558 fixes the problem for us with OPNsense 23.7 (base FreeBSD 13.2). Thank you Franco for providing the updated Kernel including this patch.

We will monitor it for another few days to ensure that no other side effects appear. From the current point of view, this patch seems to fix the problem.

(In reply to  Steinar Haug from comment #35)

All NICs with BCM574xx chips (codename Whitley+) are affected by this bug. With the FreeBSD 13.2 release, support for the newer BCM575xx chips (codename Thor) was added. As mentioned in comment #3, adding VLAN support for Thor somehow broke VLAN support for Whitley+.
I have written a wiki article with an overview of the different Broadcom NICs and the chips they use: https://www.thomas-krenn.com/de/wiki/Broadcom_Netzwerkkarten
Comment 38 Santiago Martinez 2023-08-24 15:28:24 UTC
(In reply to Warner Losh from comment #34)
Hi Warner, sorry for the late reply. Indeed, that patch did solve the issue for bnxt. I will continue monitoring that box.
Comment 39 commit-hook freebsd_committer freebsd_triage 2023-08-24 20:52:21 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=bce864d1c274faeb6678028aad1e07e91fe430ac

commit bce864d1c274faeb6678028aad1e07e91fe430ac
Author:     Kevin Bowling <kbowling@FreeBSD.org>
AuthorDate: 2023-08-24 20:16:24 +0000
Commit:     Kevin Bowling <kbowling@FreeBSD.org>
CommitDate: 2023-08-24 20:46:56 +0000

    bnxt: Don't restart on VLAN changes

    In rS360398, a new iflib device method was added with default of opt out
    for VLAN events needing an interface reset.

    This is unintentional for bnxt(4) and is causing another bug in its VLAN
    initialization code to affect the common case of adding and removing
    VLANs on an existing interface.

    PR:             269133
    Tested by:      kp
    MFC after:      2 weeks
    Sponsored by:   BBOX.io
    Differential Revision:  https://reviews.freebsd.org/D41558

 sys/dev/bnxt/if_bnxt.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
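
The mechanism the commit message refers to can be pictured as a per-driver hook that the framework consults before reinitializing the interface. The sketch below is a simplified model under stated assumptions, not the iflib API: `enum restart_event`, `drv_needs_restart()` and `framework_should_restart()` are hypothetical names, and both the "restart when no hook is provided" default and the driver's answer for non-VLAN events are choices made for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of a framework restart-event enumeration. */
enum restart_event {
    RESTART_VLAN_CONFIG,   /* a VLAN was added to or removed from the interface */
    RESTART_OTHER          /* placeholder for any other event */
};

typedef bool (*needs_restart_fn)(enum restart_event);

/* Driver-side hook: say whether an event requires a full reinit. */
static bool drv_needs_restart(enum restart_event ev)
{
    switch (ev) {
    case RESTART_VLAN_CONFIG:
        return false;   /* VLAN changes are handled without a reset */
    default:
        return true;    /* conservative choice in this sketch only */
    }
}

/* Framework side: assume a restart is needed when the driver has no hook. */
static bool framework_should_restart(needs_restart_fn hook,
    enum restart_event ev)
{
    return hook != NULL ? hook(ev) : true;
}
```

In this model a driver without a hook is restarted on every VLAN change, which is the path that exposed the filter re-allocation bug. Supplying the hook skips that restart but leaves the underlying initialization logic unchanged.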
Comment 40 geoffroy desvernay 2023-08-30 19:52:09 UTC
I confirm last patch working on 13.2p2 here https://reviews.freebsd.org/D41558
Comment 41 Kevin Bowling freebsd_committer freebsd_triage 2023-08-30 20:28:24 UTC
(In reply to geoffroy desvernay from comment #40)
It does avoid the problem but note this doesn't fix the actual logic bugs in the driver.  Hopefully Broadcom will continue their corrections.
Comment 42 Philipp Wuensche 2023-09-07 10:29:16 UTC
Will this find its way to 13.2-RELEASE at some point? It is a regression from one release to another after all and 13.3-RELEASE might be far away still..
Comment 43 punkt.de Hosting Team 2023-09-08 07:55:38 UTC
(In reply to Philipp Wuensche from comment #42)
Seconded - please include this in 13.2-p3. While not a security fix this bug is a complete show stopper for at least two production systems for us. Not being able to perform binary upgrades in a highly automated data centre is "not good" [tm].

Kind regards,
Patrick
Comment 44 Kurt Jaeger freebsd_committer freebsd_triage 2023-09-08 10:14:57 UTC
(In reply to punkt.de Hosting Team from comment #43)
Well, 13.2p3 is out and as far as I can see, this fix is not in it.
Comment 45 punkt.de Hosting Team 2023-09-08 10:19:30 UTC
Sorry, my bad. -p4, then. Let's agree on releng/13.2 ... :-)
Comment 46 Kurt Jaeger freebsd_committer freebsd_triage 2023-09-08 10:25:21 UTC
(In reply to punkt.de Hosting Team from comment #45)
The problem is who to agree with. I'm really not sure who to talk to in order to get the fix in...
Comment 47 commit-hook freebsd_committer freebsd_triage 2023-09-11 22:36:36 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ca3fc7aabe3998822c6e1357df922618afb18648

commit ca3fc7aabe3998822c6e1357df922618afb18648
Author:     Kevin Bowling <kbowling@FreeBSD.org>
AuthorDate: 2023-08-24 20:16:24 +0000
Commit:     Kevin Bowling <kbowling@FreeBSD.org>
CommitDate: 2023-09-11 22:34:20 +0000

    bnxt: Don't restart on VLAN changes

    In rS360398, a new iflib device method was added with default of opt out
    for VLAN events needing an interface reset.

    This is unintentional for bnxt(4) and is causing another bug in its VLAN
    initialization code to affect the common case of adding and removing
    VLANs on an existing interface.

    PR:             269133
    Tested by:      kp
    Sponsored by:   BBOX.io
    Differential Revision:  https://reviews.freebsd.org/D41558

    (cherry picked from commit bce864d1c274faeb6678028aad1e07e91fe430ac)

 sys/dev/bnxt/if_bnxt.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
Comment 48 Mark Linimon freebsd_committer freebsd_triage 2023-09-12 15:00:07 UTC
^Triage: assign to committer who resolved this.
Comment 49 Kevin Bowling freebsd_committer freebsd_triage 2023-09-12 16:00:43 UTC
Broadcom still has a critical bug in their driver initialization and seem to be asleep at the wheel.  That issue is not fixed, my commit simply reduces the damage for common use.
Comment 50 Chandrakanth Patil 2023-11-01 18:08:31 UTC
(In reply to Kevin Bowling from comment #49)

Hi Kevin, Warner, 

The issue is with the code path below, which leads to the VLAN failure.
The problem is that the driver attempts to allocate an already allocated
VLAN tag in the bnxt_hwrm_l2_filter_alloc function. Specifically, the code snippet
below checks for a previously allocated filter ID:

        if (*filter_id != -1) {
                device_printf(softc->dev, "Attempt to re-allocate l2 ctx "
                    "filter (fid: 0x%jx)\n", (uintmax_t)*filter_id);
                return EDOOFUS;
        }    
		
Here's the sequence of events:

1. When the first VLAN is created (vlan1), a valid filter ID (other than -1) is fetched from the firmware.
2. During the creation of the second VLAN (vlan2), the driver attempts to allocate vlan1 again. The filter_id of vlan1 already holds a valid value, so the above if condition is true. Consequently, the function returns with the error "Attempt to re-allocate l2 ctx" without allocating vlan2.

To resolve this issue, I suggest the following fix:

Assign -1 to the filter_id of all VLAN tags in the list in the bnxt_init function. Here's the relevant code snippet:

        if (!BNXT_CHIP_P5(softc)) {
                rc = bnxt_hwrm_func_reset(softc);
                if (rc) 
                        return;
                SLIST_FOREACH (tag, &vnic->vlan_tags, next)
                {    
                        tag->filter_id = -1;
                }    
        } else if (softc->is_dev_init) {
                bnxt_stop(ctx);
        }    

I have applied this fix on a BCM57416, and it has resolved the issue for me. I provided this patch earlier (debug_patch_01), and I am
surprised it did not fix the problem then.

Could you please confirm if this patch resolves the issue?
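
The effect of the proposed reset can be checked with a small user-space model built on the same `SLIST` macros from `<sys/queue.h>`. This is a sketch only: `struct tag_entry`, `reset_filter_ids()` and `count_stale()` are hypothetical names, and it assumes that the firmware-side filters are already gone once the function reset has completed, which the thread does not confirm.

```c
#include <stdint.h>
#include <sys/queue.h>

#define NO_FILTER ((uint64_t)-1)   /* sentinel: no filter allocated */

/* Hypothetical stand-in for the driver's VLAN tag list entry. */
struct tag_entry {
    uint16_t tag;
    uint64_t filter_id;
    SLIST_ENTRY(tag_entry) next;
};
SLIST_HEAD(tag_list, tag_entry);

/* Mark every tag as unallocated, mirroring the loop in the proposed fix. */
static void reset_filter_ids(struct tag_list *list)
{
    struct tag_entry *t;

    SLIST_FOREACH(t, list, next)
        t->filter_id = NO_FILTER;
}

/* Count tags that the "filter_id != -1" guard would reject. */
static int count_stale(struct tag_list *list)
{
    struct tag_entry *t;
    int n = 0;

    SLIST_FOREACH(t, list, next)
        if (t->filter_id != NO_FILTER)
            n++;
    return n;
}
```

Before the reset, any tag left over from an earlier pass counts as stale and would trip the guard. After `reset_filter_ids()` the count is zero, so every tag on the list, including a newly registered one, can be allocated again.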
Comment 51 Werner Fischer 2023-11-03 07:29:08 UTC
(In reply to Chandrakanth Patil from comment #50)
Hi Chandrakanth,

thank you for your comment.
Unfortunately, I don't have a test setup currently here, and I haven't yet compiled FreeBSD drivers by myself.

Regarding your patch and the mentioned code snippet, Kristof Provost (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269133#c31) and Santiago Martinez (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269133#c32) reported above that this patch did not work for them:
"I'm still seeing 'bnxt0: HWRM_CFA_L2_FILTER_ALLOC command returned INVALID_PARAMS error.' with debug_patch_01, when I try to `ifconfig vlan0 vlan 201 vlandev bnxt0 up`."

So it seems to me that this patch (debug_patch_01) alone does not fix the issue, at least for Kristof and Santiago.
Does anybody else in this thread have any thoughts on this, or could someone help to test the patch again?

Best regards,
Werner
Comment 52 Chandrakanth Patil 2023-11-03 07:43:14 UTC
I have tried the 223 firmware (the same driver/firmware combination as Santiago's) with the 13.2 inbox driver, and I was able to create 240 VLANs without any issues.
Comment 53 Hauke Fath 2023-11-03 09:34:55 UTC
(In reply to Werner Fischer from comment #51)
Confirmed:

With the respective patch on top of 13.2 sources and

bnxt0: <Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet> mem 0x38080f10000-0x38080f1ffff,0x38080e00000-0x38080efffff,0x38080f
bnxt0: Using 256 TX descriptors and 256 RX descriptors
bnxt0: Using 8 RX queues 8 TX queues
bnxt0: Using MSI-X interrupts with 9 vectors
bnxt0: Ethernet address: 14:23:f2:a5:bd:50
bnxt0: netmap queues/slots: TX 8/256, RX 8/256
bnxt1: <Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet> mem 0x38080f00000-0x38080f0ffff,0x38080d00000-0x38080dfffff,0x38080f
bnxt1: Using 256 TX descriptors and 256 RX descriptors
bnxt1: Using 8 RX queues 8 TX queues
bnxt1: Using MSI-X interrupts with 9 vectors
bnxt1: Ethernet address: 14:23:f2:a5:bd:51
bnxt1: Unknown phy type
bnxt1: netmap queues/slots: TX 8/256, RX 8/256

I still get

Mounting local filesystems:.
ELF ldconfig pbnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
ath: /lib /usr/lib /usr/lib/compbnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
carp_alloc_if: ifpromisc(bnxt0.2) failed: 12
at
32-bit compabnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
tibility ldconfibnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
g path: /usr/libbnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
carp: demoted by 240 to 240 (interface down)
32

-- the intermingling of rc and kernel output doesn't help, but you get the drift. I haven't been able to verify if the vlans are functional, yet.
Comment 54 Hauke Fath 2023-11-03 10:14:04 UTC
(In reply to Hauke Fath from comment #53)
For clarity, the patch I applied is https://bugs.freebsd.org/bugzilla/attachment.cgi?id=243922

and the card's firmware state is

dev.bnxt.0.ver.hwrm_min_ver: 1.10.2
dev.bnxt.0.ver.package_ver: 226.1.107.1
dev.bnxt.0.ver.chip_type: ASIC
dev.bnxt.0.ver.chip_bond_id: 0
dev.bnxt.0.ver.chip_metal: 1
dev.bnxt.0.ver.chip_rev: 1
dev.bnxt.0.ver.chip_num: 5847
dev.bnxt.0.ver.phy_partnumber: S28-PC015
dev.bnxt.0.ver.phy_vendor: FS
dev.bnxt.0.ver.roce_fw_name: BONO_FW
dev.bnxt.0.ver.netctrl_fw_name: KONG_FW
dev.bnxt.0.ver.mgmt_fw_name: AFW_226.0.145.0
dev.bnxt.0.ver.hwrm_fw_name: CHIMP_FW
dev.bnxt.0.ver.phy: 13.1.11
dev.bnxt.0.ver.fw_ver: 226.0.145.0/pkg 226.1.107.1
dev.bnxt.0.ver.roce_fw: 226.0.145
dev.bnxt.0.ver.netctrl_fw: 226.0.145
dev.bnxt.0.ver.mgmt_fw: 226.0.145
dev.bnxt.0.ver.hwrm_fw: 226.0.145
dev.bnxt.0.ver.driver_hwrm_if: 1.10.2.34
dev.bnxt.0.ver.hwrm_if: 1.10.2
Comment 55 Chandrakanth Patil 2023-11-03 11:40:54 UTC
(In reply to Hauke Fath from comment #54)
The firmware configuration appears to be distinct in my setup. I will replicate the same firmware settings locally and attempt to reproduce the issue. By the way, I need the precise steps for reproduction. Could you please help with that?
Comment 56 Kristof Provost freebsd_committer freebsd_triage 2023-11-03 11:50:34 UTC
(In reply to Chandrakanth Patil from comment #55)
At least in my case a simple `ifconfig vlan create ; ifconfig vlan0 vlan 42 vlandev bnxt0` was sufficient to trigger the error and loss of connectivity.

(That's from memory, I've not reverted Kevin's patch to test again.)
Comment 57 Chandrakanth Patil 2023-11-07 09:45:48 UTC
(In reply to Kristof Provost from comment #56)

- Have many mcast MAC addresses been added? Please let me know.
- Please get the resource allocation strategy from the firmware with the commands below.
  On lcdiag:
  # nvm cfg 1101-
  On FreeBSD:
  # bnxtnvm -dev=<dev-name> getoption=? | grep -i strategy  (the bnxtnvm utility is needed)
- Please provide the ifconfig output of the interface (ifconfig <intf>).
Comment 58 Kristof Provost freebsd_committer freebsd_triage 2023-11-07 12:41:32 UTC
(In reply to Chandrakanth Patil from comment #57)
Okay, so with 725e4008efef32dfbe57b3e21635fa80dde8ee38 and ca3fc7aabe3998822c6e1357df922618afb18648 reverted I see `bnxt0: HWRM_CFA_L2_FILTER_ALLOC command returned INVALID_PARAMS error.` on `ifconfig vlan0 vlan 42 vlandev bnxt0`.

There are no extra MAC addresses added or anything like that.

bnxt0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500
	options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG>
	ether f4:02:70:ae:72:8c
	inet 10.0.2.211 netmask 0xffffff00 broadcast 10.0.2.255
	media: Ethernet autoselect (1000baseT <full-duplex,rxpause,txpause>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

I can't seem to find the bnxtnvm tool in the ports tree. Where can I find that?
Comment 59 Werner Fischer 2024-01-08 08:36:29 UTC
(In reply to Kristof Provost from comment #58)
Hi Kristof, hi Chandrakanth 

@Kristof: you can find the bnxtnvm utility here:
https://docs.broadcom.com/docs/NXE_FreeBSD_niccli_228.0.132.0-

This is currently the latest version. To find the most current version, search the Broadcom website (the download is in the "Firmware" area):
https://www.broadcom.com/support/download-search?pg=&pf=Ethernet+Network+Adapters&pn=P225P++-+2+x+25/10G+PCIe+NIC&pa=&po=Broadcom&dk=&pl=&l=false

@Chandrakanth: do you have any other updates / new findings on the issues? Or should we wait for Kristof's "# bnxtnvm -dev=<dev-name> getoption=? | grep -i strategy" output?

Best regards,
Werner
Comment 60 Kristof Provost freebsd_committer freebsd_triage 2024-01-08 09:42:20 UTC
(In reply to Werner Fischer from comment #59)
The tool seems to be called niccli.freebsd, and doesn't quite take the options you listed, but it does have a getoption command in its CLI, and produces this:

./niccli.freebsd -dev 1 getoption -name afm_rm_resc_strategy     
                                                            
-------------------------------------------------------------------------------
Scrutiny NIC CLI v228.0.132.0 - Broadcom Inc. (c) 2023 (Bld-61.52.25.90.16.0)
-------------------------------------------------------------------------------

ERROR: Getting option 'afm_rm_resc_strategy' value does not support on this hardware.
ERROR: Get option failed for option 'afm_rm_resc_strategy'.

EXIT CODE   : C0000001
DESCRIPTION : Command failed with generic failure status.
              Command getoption failed.

It also makes WITNESS unhappy:

...
uma_zalloc_debug: zone "malloc-256" with the following non-sleepable locks held:
exclusive sleep mutex BNXT MGMT Lock (BNXT MGMT Lock) r = 0 (0xffffffff82039590) locked @ /usr/src/sys/dev/bnxt/bnxt_mgmt.c:347
stack backtrace:
#0 0xffffffff80bc6c35 at witness_debugger+0x65
#1 0xffffffff80bc7d79 at witness_warn+0x3e9
#2 0xffffffff80ee4994 at uma_zalloc_debug+0x34
#3 0xffffffff80ee44a7 at uma_zalloc_arg+0x27
#4 0xffffffff80b25a5e at malloc+0x7e
#5 0xffffffff8202f6d9 at bnxt_mgmt_ioctl+0x869
#6 0xffffffff809dbde2 at devfs_ioctl+0xd2
#7 0xffffffff80c60fe2 at vn_ioctl+0xc2
#8 0xffffffff809dc4be at devfs_ioctl_f+0x1e
#9 0xffffffff80bcc526 at kern_ioctl+0x286
#10 0xffffffff80bcc233 at sys_ioctl+0x143
#11 0xffffffff81057453 at amd64_syscall+0x153
#12 0xffffffff81028deb at fast_syscall_common+0xf8
Comment 61 Chandrakanth Patil 2024-01-08 10:04:35 UTC
Hi Kristof,

Thanks for the data. The niccli crash issue seems to be a different issue.

Werner, Kristof,
As of today, I am unable to reproduce the issue that led to the delay.
I will paste my complete setup details here; please let me know what is missing from them.
Comment 62 Kristof Provost freebsd_committer freebsd_triage 2024-01-08 10:11:50 UTC
(In reply to Chandrakanth Patil from comment #61)
I believe that's expected on a recent FreeBSD main. Kevin worked around the problem with 725e4008efef32dfbe57b3e21635fa80dde8ee38 and ca3fc7aabe3998822c6e1357df922618afb18648.

You may need to revert those two to see the problem again.
Comment 63 Werner Fischer 2024-02-01 08:58:14 UTC
(In reply to Kristof Provost from comment #62 and Chandrakanth Patil from comment #61)

Chandrakanth, do you need further information?

Kristof mentioned that it is expected that you do not see the issue on a recent FreeBSD main. Kevin worked around the problem with 725e4008efef32dfbe57b3e21635fa80dde8ee38 and ca3fc7aabe3998822c6e1357df922618afb18648, and you may need to revert those two to see the problem again.

Kevin Bowling mentioned in comment #49 that there still seems to be a critical bug in the driver initialization. His commit did not fix the issue; as he put it, it "simply reduces the damage for common use."

In case I can help in any way, please let me know.
Comment 64 Kenneth D. Merry freebsd_committer freebsd_triage 2024-04-16 17:52:09 UTC
I have a machine with a BCM57414 on the motherboard:

bnxt0: <Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet> mem 0xd1d10000-0xd1d1ffff,0xd1c00000-0xd1cfffff,0xd1d22000-0xd1d23fff irq 36 at device 0.0 numa-domain 0 on pci3
bnxt0: Using 256 TX descriptors and 256 RX descriptors
bnxt0: Using 16 RX queues 16 TX queues
bnxt0: Using MSI-X interrupts with 17 vectors
bnxt0: Ethernet address: 9c:6b:00:46:a2:0c
bnxt1: <Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet> mem 0xd1d00000-0xd1d0ffff,0xd1b00000-0xd1bfffff,0xd1d20000-0xd1d21fff irq 37 at device 0.1 numa-domain 0 on pci3
bnxt1: Using 256 TX descriptors and 256 RX descriptors
bnxt1: Using 16 RX queues 16 TX queues
bnxt1: Using MSI-X interrupts with 17 vectors
bnxt1: Ethernet address: 9c:6b:00:46:a2:0d

Here is the firmware information:

dev.bnxt.0.nvram.available_size: 4173824
dev.bnxt.0.nvram.reserved_size: 16384
dev.bnxt.0.nvram.size: 8388608
dev.bnxt.0.nvram.sector_size: 4096
dev.bnxt.0.nvram.device_id: 16407
dev.bnxt.0.nvram.mfg_id: 239
dev.bnxt.0.ver.hwrm_min_ver: 1.10.2
dev.bnxt.0.ver.package_ver: <unknown>
dev.bnxt.0.ver.chip_type: ASIC
dev.bnxt.0.ver.chip_bond_id: 0
dev.bnxt.0.ver.chip_metal: 1
dev.bnxt.0.ver.chip_rev: 1
dev.bnxt.0.ver.chip_num: 5847
dev.bnxt.0.ver.phy_partnumber: MCP7F00-A002
dev.bnxt.0.ver.phy_vendor: Mellanox
dev.bnxt.0.ver.roce_fw_name: BONO_FW
dev.bnxt.0.ver.netctrl_fw_name: KONG_FW
dev.bnxt.0.ver.mgmt_fw_name: AFW_226.0.145.0
dev.bnxt.0.ver.hwrm_fw_name: CHIMP_FW
dev.bnxt.0.ver.phy: 13.1.11
dev.bnxt.0.ver.fw_ver: 226.0.145.0/pkg N/A
dev.bnxt.0.ver.roce_fw: 226.0.145
dev.bnxt.0.ver.netctrl_fw: 226.0.145
dev.bnxt.0.ver.mgmt_fw: 226.0.145
dev.bnxt.0.ver.hwrm_fw: 226.0.145
dev.bnxt.0.ver.driver_hwrm_if: 1.10.2.34
dev.bnxt.0.ver.hwrm_if: 1.10.2
dev.bnxt.0.%domain: 0
dev.bnxt.0.%parent: pci3
dev.bnxt.0.%pnpinfo: vendor=0x14e4 device=0x16d7 subvendor=0x1849 subdevice=0x1402 class=0x020000
dev.bnxt.0.%location: slot=0 function=0 dbsf=pci0:195:0:0
dev.bnxt.0.%driver: bnxt
dev.bnxt.0.%desc: Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet


I have no VLANs configured.  I'm running stable/13 from mid-2023, but I've tried the driver from the latest FreeBSD/head and FreeBSD stable/13 with no success.  I still get:

bnxt0: HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.
bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
bnxt0: Link is UP full duplex, FC - none - 25000 Mbps 
bnxt0: link state changed to UP
bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
bnxt1: Link is UP full duplex, FC - none - 25000 Mbps 
bnxt1: link state changed to UP
bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed
bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
bnxt0: set_multi: rx_mask set failed

Strangely enough, though, the driver works fine if I first PXE boot the machine off of the chip.  If I do that, it works normally.  But if I boot off disk, I get the RESOURCE_ALLOC_ERROR messages above.

This suggests that there is some kind of initialization issue that the PXE boot environment takes care of, but the driver does not.

Also, bnxtnvm and niccli don't work with the driver in my kernel.  Apparently that isn't because of the state of the driver, but because of the ioctl definition in the driver.  The ioctl call doesn't even get to bnxt_mgmt_ioctl.  I've verified that with dtrace, but here is the ioctl call from ktrace/kdump:

  7818 bnxtnvm  CALL  openat(AT_FDCWD,0x46199d,0x2<O_RDWR>)
  7818 bnxtnvm  NAMI  "/dev/bnxt_mgmt"
  7818 bnxtnvm  RET   openat 4
  7818 bnxtnvm  CALL  ioctl(0x4,0x80000000,0x821199a70)
  7818 bnxtnvm  RET   ioctl -1 errno 25 Inappropriate ioctl for device
  7818 bnxtnvm  CALL  close(0x4)

Using this DTrace script:
#pragma D option cleanrate=5000hz
#pragma D option dynvarsize=8192000

fbt::bnxt_mgmt_ioctl:entry
{
        printf("cmd = %#x\n", args[1]);
}

fbt::bnxt_mgmt_open:entry
{
        printf("opened bnxt mgmt device\n");
}

fbt::sys_ioctl:entry
/args[1]->com == 0x80000000/
{
        printf("got ioctl command 0x80000000\n");
}

I verified that it isn't getting down to bnxt_mgmt_ioctl:

# dtrace -s bnxt.d 
dtrace: script 'bnxt.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
 31   2108             bnxt_mgmt_open:entry opened bnxt mgmt device

 31  22882                  sys_ioctl:entry got ioctl command 0x80000000

 31   2108             bnxt_mgmt_open:entry opened bnxt mgmt device

 31  22882                  sys_ioctl:entry got ioctl command 0x80000000

^C

When I boot the machine via PXE, though, bnxtnvm listdev shows the device:

# ./bnxtnvm listdev

N/A #1
Device Interface Name       : bnxt0
MACAddress                  : 9c:6b:00:46:a2:0c
PCI Device Name             : 0000:c3:00.0

And strangely the ioctl works, although from my reading of sys_ioctl() it shouldn't, but I think I've discovered why it does:

  1904 bnxtnvm  CALL  openat(AT_FDCWD,0x46199d,0x2<O_RDWR>)
  1904 bnxtnvm  NAMI  "/dev/bnxt_mgmt"
  1904 bnxtnvm  RET   openat 3
  1904 bnxtnvm  CALL  ioctl(0x3,0x80000000,0x8212303d0)
  1904 bnxtnvm  RET   ioctl 0
  1904 bnxtnvm  CALL  close(0x3)

This code is from sys_ioctl():


        /*
         * Interpret high order word to find amount of data to be
         * copied to/from the user's address space.
         */
        size = IOCPARM_LEN(com);
        if ((size > IOCPARM_MAX) ||
            ((com & (IOC_VOID  | IOC_IN | IOC_OUT)) == 0) ||
#if defined(COMPAT_FREEBSD5) || defined(COMPAT_FREEBSD4) || defined(COMPAT_43)
            ((com & IOC_OUT) && size == 0) ||
#else
            ((com & (IOC_IN | IOC_OUT)) && size == 0) ||
#endif
            ((com & IOC_VOID) && size > 0 && size != sizeof(int)))
                return (ENOTTY);

My regular kernel config file doesn't have COMPAT_FREEBSD4/5, but the PXE kernel config file does.
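
For illustration, here is a minimal user-space sketch of that check, not the kernel code itself. The helper name ioctl_cmd_rejected and the DEMO_ prefixes are made up for this example; the bit values are the ones from sys/ioccom.h, and the compat flag stands in for the COMPAT_FREEBSD4/5/COMPAT_43 kernel options.

```c
#include <stdbool.h>

/* Values mirrored from sys/ioccom.h; DEMO_ prefix avoids clashing with system headers. */
#define DEMO_IOCPARM_SHIFT  13
#define DEMO_IOCPARM_MASK   ((1UL << DEMO_IOCPARM_SHIFT) - 1)
#define DEMO_IOCPARM_LEN(x) (((x) >> 16) & DEMO_IOCPARM_MASK)
#define DEMO_IOCPARM_MAX    (1UL << DEMO_IOCPARM_SHIFT)
#define DEMO_IOC_VOID       0x20000000UL
#define DEMO_IOC_OUT        0x40000000UL
#define DEMO_IOC_IN         0x80000000UL

/*
 * Hypothetical helper: returns true when the sys_ioctl() sanity check
 * quoted above would return ENOTTY for "com".  "compat" selects the
 * branch that is compiled in when the old COMPAT_* options are set.
 */
bool
ioctl_cmd_rejected(unsigned long com, bool compat)
{
	unsigned long size = DEMO_IOCPARM_LEN(com);
	/* With the compat options only IOC_OUT requires a non-zero size. */
	unsigned long needs_size = compat ? DEMO_IOC_OUT :
	    (DEMO_IOC_IN | DEMO_IOC_OUT);

	return (size > DEMO_IOCPARM_MAX ||
	    (com & (DEMO_IOC_VOID | DEMO_IOC_IN | DEMO_IOC_OUT)) == 0 ||
	    ((com & needs_size) != 0 && size == 0) ||
	    ((com & DEMO_IOC_VOID) != 0 && size > 0 && size != sizeof(int)));
}
```

Calling ioctl_cmd_rejected(0x80000000UL, false) models my regular kernel and reports a rejection, while ioctl_cmd_rejected(0x80000000UL, true) models the PXE kernel and lets the command through, which matches the two ktrace outputs above.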

Here are the bit definitions from sys/ioccom.h:
#ifndef _SYS_IOCCOM_H_
#define _SYS_IOCCOM_H_

/*
 * Ioctl's have the command encoded in the lower word, and the size of
 * any in or out parameters in the upper word.  The high 3 bits of the
 * upper word are used to encode the in/out status of the parameter.
 *
 *       31 29 28                     16 15            8 7             0
 *      +---------------------------------------------------------------+
 *      | I/O | Parameter Length        | Command Group | Command       |
 *      +---------------------------------------------------------------+
 */
#define IOCPARM_SHIFT   13              /* number of bits for ioctl size */
#define IOCPARM_MASK    ((1 << IOCPARM_SHIFT) - 1) /* parameter length mask */
#define IOCPARM_LEN(x)  (((x) >> 16) & IOCPARM_MASK)
#define IOCBASECMD(x)   ((x) & ~(IOCPARM_MASK << 16))
#define IOCGROUP(x)     (((x) >> 8) & 0xff)

#define IOCPARM_MAX     (1 << IOCPARM_SHIFT) /* max size of ioctl */

#define IOC_VOID        0x20000000UL    /* no parameters */
#define IOC_OUT         0x40000000UL    /* copy out parameters */
#define IOC_IN          0x80000000UL    /* copy in parameters */
#define IOC_INOUT       (IOC_IN|IOC_OUT)/* copy parameters in and out */
#define IOC_DIRMASK     (IOC_VOID|IOC_OUT|IOC_IN)/* mask for IN/OUT/VOID */

Because the BNXT_MGMT_OPCODE_GET_DEV_INFO ioctl has the same value as the IOC_IN bit definition (0x80000000, i.e. IOC_IN with a parameter length of zero), the ioctl breaks if the old compat stuff isn't built into the kernel.

The ioctls for the bnxt(4) driver need to be changed to use the usual _IOW/_IOWR macros from sys/ioccom.h.  I realize that will break the management tools.  Perhaps they can have a version check and a fallback to the old ioctls if need be.
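
As a rough sketch of what such a definition encodes (assumptions: the request structure, the group letter 'B' and the command number 1 are all hypothetical and not taken from the bnxt(4) sources; the DEMO_ macros only mirror the layout of _IOC()/_IOWR() in sys/ioccom.h so the example builds outside the kernel tree):

```c
#include <stdint.h>

/* Field layout mirrored from sys/ioccom.h. */
#define DEMO_IOCPARM_SHIFT  13
#define DEMO_IOCPARM_MASK   ((1UL << DEMO_IOCPARM_SHIFT) - 1)
#define DEMO_IOCPARM_LEN(x) (((x) >> 16) & DEMO_IOCPARM_MASK)
#define DEMO_IOCGROUP(x)    (((x) >> 8) & 0xff)
#define DEMO_IOC_OUT        0x40000000UL
#define DEMO_IOC_IN         0x80000000UL
#define DEMO_IOC_INOUT      (DEMO_IOC_IN | DEMO_IOC_OUT)

/* Same shape as _IOC() and _IOWR() in sys/ioccom.h. */
#define DEMO_IOC(inout, group, num, len) ((unsigned long) \
	((inout) | (((len) & DEMO_IOCPARM_MASK) << 16) | ((group) << 8) | (num)))
#define DEMO_IOWR(g, n, t)  DEMO_IOC(DEMO_IOC_INOUT, (g), (n), sizeof(t))

/* Hypothetical request layout; the real bnxt_mgmt structures differ. */
struct demo_dev_info_req {
	uint32_t index;
	char     name[16];
};

/* Hypothetical group letter and command number. */
#define DEMO_BNXT_MGMT_GET_DEV_INFO \
	DEMO_IOWR('B', 1, struct demo_dev_info_req)
```

A command built this way carries both direction bits and a non-zero parameter length, so the size check in sys_ioctl() should not depend on the COMPAT_* options. In the driver itself one would include sys/ioccom.h and use _IOWR directly rather than these DEMO_ copies.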
Comment 65 Kenneth D. Merry freebsd_committer freebsd_triage 2024-06-25 13:15:00 UTC
(In reply to Kenneth D. Merry from comment #64)

Upgrading the firmware on the NIC from version 226 to 229 fixed the RESOURCE_ALLOC_ERROR errors in my case.  Since this is an onboard NIC (BCM57414 on an ASRock motherboard) I needed a different firmware load than the standard one in order to be able to upgrade.

Broadcom kindly gave me the firmware load, and ASRock is upgrading their motherboards to ship with the new version.

The ioctl issue is still there, though.
Comment 66 Hauke Fath 2024-11-06 16:41:36 UTC
(In reply to Hauke Fath from comment #54)
With a freebsd-13 build of last week, and the firmware updated to vAFW_231.0.153.0, VLANs are functional here.

OTOH, a gif(4) tunnel attached to the NIC turned into a black hole. The pf instance at the other end (same freebsd build) logged a few UDP packets (DNS) as dropped for checksum errors, so I suspect problems with the checksum offload.

Going back to the mellanox X3 card got the router pair working again; I guess at this point I'll have to write off the broadcom cards.
Comment 67 Hauke Fath 2024-11-07 14:00:29 UTC
(In reply to Hauke Fath from comment #66)

As an aside, in a plain (non-vlan) setup, the broadcom NIC "negotiated" 1 GBit with an Nvidia 2010 switch on a 1/10/25 GBit port set to 'auto'. The port had to be set to 25 GBit before the NIC accepted the speed.
Comment 68 Augusta L 2024-11-17 15:57:04 UTC
(In reply to Kenneth D. Merry from comment #65)

> Broadcom kindly gave me the firmware load, and ASRock is upgrading their motherboards to ship with the new version.

Hi Kenneth. Would you be able to share how or where you got firmware 229 for your embedded BCM57414 controller?

Did you have to raise a support request with Broadcom or was there a firmware download link available on the internet that worked for the ethernet controller embedded in your board?

All of the links I'm finding are for the BCM57414 controller on a dedicated PCIe card, not the embedded controller on the motherboard.