Was this intended to be a separate bug report? I see you as a commenter under bug 245981, bnxt(4): BCM57414 / BCM57416 not initializing: bnxt0: Unable to allocate device TX queue / queue memory
Dear Graham, I'm sorry, I didn't realize I had submitted it. Yes, the intention was to create a separate report, as this seems to be a different problem. I'm having issues with the BCM57416 on 13-STABLE from yesterday (24-JAN-2023).

[49] bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
[49] bnxt0: set_multi: rx_mask set failed
[49] bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
[49] bnxt0: set_multi: rx_mask set failed
[49] bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error.
[49] bnxt0: set_multi: rx_mask set failed

It was working well with stable/13-n252199-9644bc4a1126. I have tried removing all offloading features, but after each ifconfig bnxt0 -capability it returns the same ALLOC error. I have also tried adding promisc, with the same results. The interface and the link are both up, but no traffic passes. The interface bnxt0 has three 802.1q sub-interfaces.

sysctl:
dev.bnxt.0.%pnpinfo: vendor=0x14e4 device=0x16d8 subvendor=0x15d9 subdevice=0x16d8 class=0x020000
dev.bnxt.0.%location: slot=0 function=0 dbsf=pci0:199:0:0 handle=\_SB_.S0D0.D0A6.D017
dev.bnxt.0.%driver: bnxt
dev.bnxt.0.%desc: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet
dev.bnxt.0.ver.hwrm_min_ver: 1.2.2
dev.bnxt.0.ver.package_ver: 214.0.286.18
dev.bnxt.0.ver.chip_type: ASIC
dev.bnxt.0.ver.chip_bond_id: 0
dev.bnxt.0.ver.chip_metal: 1
dev.bnxt.0.ver.chip_rev: 1
dev.bnxt.0.ver.chip_num: 5848
dev.bnxt.0.ver.phy_partnumber:
dev.bnxt.0.ver.phy_vendor:
dev.bnxt.0.ver.roce_fw_name: BONO_FW
dev.bnxt.0.ver.netctrl_fw_name: KONG_FW
dev.bnxt.0.ver.mgmt_fw_name: AFW_214.0.253.2
dev.bnxt.0.ver.hwrm_fw_name: CHIMP_FW
dev.bnxt.0.ver.phy: 13.1.11
dev.bnxt.0.ver.roce_fw: 214.0.187
dev.bnxt.0.ver.netctrl_fw: 214.0.241
dev.bnxt.0.ver.mgmt_fw: 214.0.253
dev.bnxt.0.ver.hwrm_fw: 214.4.9
dev.bnxt.0.ver.driver_hwrm_if: 1.8.1.7
dev.bnxt.0.ver.hwrm_if: 1.10.0
The following commit appears to be the one causing the issue: if_bnxt: Add support for VLAN on Thor: 2db35273502b3c35aa653effc5c97618567367ab. I went back to 13.1 and applied each of the bnxt commits from stable/13 in turn. After cherry-picking "if_bnxt: Add support for VLAN on Thor", the NIC stopped working. Reverting it makes the card work again.
The issue is related to the following code in bnxt_hwrm.c, line 1480. The condition is always true, so the function returns early. I have commented this check out and the NICs are working with stable/13 (today). I am not sure what the correct check for this "if" should be. Any hints?

if (*filter_id != -1) {
	device_printf(softc->dev, "Attempt to re-allocate l2 ctx "
	    "filter (fid: 0x%jx)\n", (uintmax_t)*filter_id);
	return EDOOFUS;
}

[408] bnxt0: vlan tag : 0x3fc, filter-id: 0x106000000000204)
[408] bnxt0: Attempt to re-allocate l2 ctx filter (fid: 0x106000000000204)
[408] bnxt0: vlan tag : 0x3f3, filter-id: 0x107000000000404)
[408] bnxt0: Attempt to re-allocate l2 ctx filter (fid: 0x107000000000404)
[408] bnxt0: vlan tag : 0x3f2, filter-id: 0x108000000000604)
[408] bnxt0: Attempt to re-allocate l2 ctx filter (fid: 0x108000000000604)
(In reply to Santiago Martinez from comment #4) I think the problem is this: when a new vlan tag is registered, bnxt_vlan_register() adds a new tag structure to the vlan_tags list. After adding a vlan tag, iflib will reinitialize the interface, see iflib_vlan_register()->iflib_init_locked()->IFDI_INIT. Then bnxt_init() will call bnxt_hwrm_set_filter(), which initializes all the tags on the list. Suppose all of this happens twice. bnxt_hwrm_set_filter() will encounter an already-initialized tag and trigger the EDOOFUS error. I suspect that bnxt_init() should unregister all of its filters during reinitialization. That is, bnxt_init() should call bnxt_hwrm_free_filter() before calling bnxt_clear_ids(). (I'm not very familiar with this driver though, so this might not work.)
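The sequence described in this comment can be sketched as a small user-space model. This is not driver code: every model_* name is hypothetical, the "firmware" is just a counter, and the only parts taken from the thread are the guard quoted in comment 4 and the register-then-reinitialize order. The free_first flag stands in for the suggestion that bnxt_init() release its filters before reinitializing.

```c
#include <stdint.h>

#define MODEL_EDOOFUS 88	/* FreeBSD's EDOOFUS value, defined locally for portability */
#define MODEL_MAX_TAGS 8

struct model_tag {
	uint16_t vlan;
	uint64_t filter_id;	/* (uint64_t)-1 means "no filter allocated" */
};

struct model_softc {
	struct model_tag tags[MODEL_MAX_TAGS];
	int ntags;
	uint64_t next_fid;	/* stand-in for the ID the firmware would hand out */
};

/* Mirrors the guard from comment 4: refuse to allocate over a live ID. */
static int
model_filter_alloc(struct model_softc *sc, struct model_tag *t)
{
	if (t->filter_id != (uint64_t)-1)
		return (MODEL_EDOOFUS);
	t->filter_id = sc->next_fid++;
	return (0);
}

/* Hypothetical counterpart of a filter free: forget the ID. */
static void
model_filter_free(struct model_tag *t)
{
	t->filter_id = (uint64_t)-1;
}

/* Stand-in for the init path: (re)allocate a filter for every listed tag. */
static int
model_init(struct model_softc *sc, int free_first)
{
	for (int i = 0; i < sc->ntags; i++) {
		if (free_first)
			model_filter_free(&sc->tags[i]);
		int rc = model_filter_alloc(sc, &sc->tags[i]);
		if (rc != 0)
			return (rc);
	}
	return (0);
}

/* Stand-in for VLAN registration: append the tag, then reinitialize. */
static int
model_vlan_register(struct model_softc *sc, uint16_t vlan, int free_first)
{
	if (sc->ntags == MODEL_MAX_TAGS)
		return (-1);
	sc->tags[sc->ntags].vlan = vlan;
	sc->tags[sc->ntags].filter_id = (uint64_t)-1;
	sc->ntags++;
	return (model_init(sc, free_first));
}
```

In this model the first registration succeeds and the second one fails with the EDOOFUS value when free_first is 0, because the first tag still carries its ID; with free_first set to 1 both succeed. Whether the real firmware accepts a free followed by a re-allocation is exactly the open question in this thread, so the model only illustrates the bookkeeping.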
Hi Mark, thanks a lot for the reply. I will try to do some tests. I will make the init to call the filter free and see what happens. Will keep you posted.
I have been doing more tests and there are two issues. The first is related to the if at line 1480. The second is related to the filter enables set on lines 1503-1505. When I did the initial workaround, I forgot that I had also commented out this code:

if (vlan_tag != 0xffff) {
	enables |= HWRM_CFA_L2_FILTER_ALLOC_INPUT_ENABLES_L2_IVLAN |
	    HWRM_CFA_L2_FILTER_ALLOC_INPUT_ENABLES_L2_IVLAN_MASK |
	    HWRM_CFA_L2_FILTER_ALLOC_INPUT_ENABLES_NUM_VLANS;
	req.l2_ivlan_mask = 0xffff;
	req.l2_ivlan = vlan_tag;
	req.num_vlans = 1;
}

I will do some more tests tomorrow and compare it with the Linux driver.
Since upgrade from 12.3p1x to 13.2-RELEASE, we have the same error message here with bnxt (not tested with 13.1): dmesg: bnxt0: <Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet> mem 0xb9a10000-0xb9a1ffff,0xb9100000-0xb91fffff,0xb9aa2000-0xb9aa3fff irq 48 at device 0.0 numa-domain 0 on pci9 bnxt0: Using 256 TX descriptors and 256 RX descriptors bnxt0: Using 12 RX queues 12 TX queues bnxt0: Using MSI-X interrupts with 13 vectors bnxt0: Ethernet address: d0:94:66:81:60:e3 bnxt0: netmap queues/slots: TX 12/256, RX 12/256 bnxt1: <Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet> mem 0xb9a00000-0xb9a0ffff,0xb8800000-0xb88fffff,0xb9aa0000-0xb9aa1fff irq 52 at device 0.1 numa-domain 0 on pci9 bnxt1: Using 256 TX descriptors and 256 RX descriptors bnxt1: Using 12 RX queues 12 TX queues bnxt1: Using MSI-X interrupts with 13 vectors bnxt1: Ethernet address: d0:94:66:81:60:e4 bnxt1: netmap queues/slots: TX 12/256, RX 12/256 bnxt0: Link is UP full duplex, FC - none - 10000 Mbps bnxt0: link state changed to UP bnxt1: Link is UP full duplex, FC - none - 10000 Mbps bnxt1: link state changed to UP bnxt0: Attempt to re-allocate l2 ctx filter (fid: 0x117000000000204) bnxt1: Attempt to re-allocate l2 ctx filter (fid: 0x11c00000003f004) bnxt0: Attempt to re-allocate l2 ctx filter (fid: 0x125000000000204) bnxt1: Attempt to re-allocate l2 ctx filter (fid: 0x12800000003f004) bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. 
bnxt0: set_multi: rx_mask set failed [same messages x 100's] sysctl: dev.bnxt.0.%domain: 0 dev.bnxt.0.%parent: pci9 dev.bnxt.0.%pnpinfo: vendor=0x14e4 device=0x16d8 subvendor=0x1028 subdevice=0x1feb class=0x020000 dev.bnxt.0.%location: slot=0 function=0 dbsf=pci0:94:0:0 dev.bnxt.0.%driver: bnxt dev.bnxt.0.%desc: Broadcom BCM57416 NetXtreme-E 10GBase-T Ethernet dev.bnxt.0.ver.hwrm_min_ver: 1.10.2 dev.bnxt.0.ver.package_ver: <unknown> dev.bnxt.0.ver.chip_type: ASIC dev.bnxt.0.ver.chip_bond_id: 0 dev.bnxt.0.ver.chip_metal: 1 dev.bnxt.0.ver.chip_rev: 1 dev.bnxt.0.ver.chip_num: 5848 dev.bnxt.0.ver.phy_partnumber: 616740003 dev.bnxt.0.ver.phy_vendor: Amphenol dev.bnxt.0.ver.roce_fw_name: BONO_FW dev.bnxt.0.ver.netctrl_fw_name: KONG_FW dev.bnxt.0.ver.mgmt_fw_name: AFW_223.0.205.0 dev.bnxt.0.ver.hwrm_fw_name: CHIMP_FW dev.bnxt.0.ver.phy: 13.1.11 dev.bnxt.0.ver.fw_ver: 223.0.205.0/pkg 22.31.13.70 dev.bnxt.0.ver.roce_fw: 223.0.205 dev.bnxt.0.ver.netctrl_fw: 223.0.205 dev.bnxt.0.ver.mgmt_fw: 223.0.205 dev.bnxt.0.ver.hwrm_fw: 223.0.205 dev.bnxt.0.ver.driver_hwrm_if: 1.10.2.34 dev.bnxt.0.ver.hwrm_if: 1.10.2
(In reply to Santiago Martinez from comment #7) Have you had any success?
(In reply to Mark Johnston from comment #9) Hi Mark, With the patch it works and the server has been stable, as it was with 13.1 The card seems to work fine, with no errors with or without 802.1q encap. I have tried to find documentation from Broadcom regarding the meaning of each bit on the mask, but I couldn't find it. If required I can provide access to the servers as they are for lab purposes. Best regards. Santi
(In reply to Santiago Martinez from comment #10) Thanks for following up so quickly. Could you please share the patch you are using? Is it just based on comment 5, or is there more to it?
Yes, it's based on comments 4 + 7. Give me a few minutes to sync up with 13.2 and I will check that it is still valid.
Created attachment 241972 [details] bnxt-patch.diff
here it goes. sorry for the delay.
tested here and approved
tested and approved here !!! (releng/13.2 + patch from #13) We are using lagg (lacp) with two bnxt, vlans on top of lagg and bridge to connect vnet jails (some vlans with mtu 9000, other with mtu 1500) I'll try now with latest firmware available
Running well with dell's firmware 22.31.13.70 dev.bnxt.0.ver.fw_ver: 223.0.205.0/pkg 22.31.13.70 Thank you Santiago !!! Could it be part of next errata release ?
That is good news. It would be great if it could be fixed in the next service release, as it is a pain at the moment. My only frustration is that we didn't manage to sort it out before releasing 13.2. On the other hand, I understand we are short on resources and we all do what we can.
Hi, any news for integration in next errata release ? (-p2 ?)
*** Bug 272865 has been marked as a duplicate of this bug. ***
I have this problem, too. Your patch seems to be only a "local quick fix", not a mainstream solution. I'll try to find out, how to detect the necessary cases.
I may test code if needed, for now I have a spare server affected
(In reply to Lutz Donnerhacke from comment #21) Hi Lutz, thanks for taking a look. Indeed, I have just rolled back a few of the earlier changes that broke the driver. We need someone who knows how these cards work to make sure that we are doing the correct things. I have tried digging for documents that explain how to program the filters, but failed to find any. What frustrates me is the fact that these changes (the ones that broke the driver) went into 13.2 even though the issue had been raised before, and now people are being hit. Clearly, we need better testing of network drivers, as this is not a complex bug triggered by specific traffic patterns or values, but just basic driver functionality. I do have access to some boxes that can be used for testing, so I'm happy to be included in network driver testing, assuming that I have those cards in the laboratory. Thanks again. Santi
I ran into the same bug as well, when I upgraded my pfSense firewall software to its recently published 2.7 mainstream release. I wrote about it there in the forums and already found a few people who are affected by the same bug within a day. (https://forum.netgate.com/topic/181948/bug-in-broadcom-bnxt-driver-in-combination-with-vlans) I fear, with popular software solutions relying on this freebsd release, a lot more people will be affected soon.
The last updates to this driver were driven by the vendor of the card. But it's sounding like they caused too many problems and it's better to back them out, correct? Is that what the patch does?
Hi Warner, yes that pretty much correct. At some point Mark J mentioned that he thinks there are other issues. I did try to contact the vendor committer (based on the commit) and haven't received any reply.
(In reply to Santiago Martinez from comment #26) Hi Warner and Santiago, I am working towards the resolution and will update you ASAP. -Chandrakanth patil
(In reply to Chandrakanth Patil from comment #27) Thanks a lot! Will be available for testing. Santi
Created attachment 243922 [details] debug_patch_01 - VLAN tags are getting used after freeing which may lead to this issue. - Assigning VLAN filter tags to -1 to avoid using VLAN tags after freeing them. - Please check whether this fixes the issue.
Thanks, will give it a try today and let you know. Santi
(In reply to Chandrakanth Patil from comment #29) I'm still seeing 'bnxt0: HWRM_CFA_L2_FILTER_ALLOC command returned INVALID_PARAMS error.' with debug_patch_01, when I try to `ifconfig vlan0 vlan 201 vlandev bnxt0 up`.
Same here, applied the patch against the latest 13-stable and it fails.
(In reply to Chandrakanth Patil from comment #29) Hi Chandrakanth, hope you are doing well. Just wondering if you have the chance to review the last patch as the driver still fails. BR.
https://reviews.freebsd.org/D41558 Kevin Bowling did this... does it help?
The problem also applies to BCM57412: Aug 13 10:06:44 y kernel: bnxt1: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt1@pci0:24:0:1: class=0x020000 rev=0x01 hdr=0x00 vendor=0x14e4 device=0x16d6 subvendor=0x14e4 subdevice=0x4120 vendor = 'Broadcom Inc. and subsidiaries' device = 'BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller' class = network subclass = ethernet This was working in 13.1-STABLE n253353
(In reply to Warner Losh from comment #34) It appears so. Got another confirmation via https://forum.opnsense.org/index.php?topic=35139.msg172730#msg172730 but they will do more testing to be sure.
(In reply to Warner Losh from comment #34 and to Franco from comment #36) As Franco mentioned, the patch https://reviews.freebsd.org/D41558 fixes the problem for us with OPNsense 23.7 (base FreeBSD 13.2). Thank you Franco for providing the updated kernel including this patch. We will monitor it for another few days to ensure that no other side effects appear. But from the current point of view, it seems that this patch fixes the problem. (In reply to Steinar Haug from comment #35) All NICs with BCM574xx chips (codename Whitley+) are affected by this bug. With the FreeBSD 13.2 release, support for the newer BCM575xx chips (codename Thor) has been added. As mentioned in comment #3, adding VLAN support for Thor somehow broke VLAN support for Whitley+. I have written a wiki article with an overview of the different Broadcom NICs and the chips they are using: https://www.thomas-krenn.com/de/wiki/Broadcom_Netzwerkkarten
(In reply to Warner Losh from comment #34) Hi Warner, sorry for the late reply. Indeed, that patch did solve the issue for bnxt. I will continue monitoring that box.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=bce864d1c274faeb6678028aad1e07e91fe430ac commit bce864d1c274faeb6678028aad1e07e91fe430ac Author: Kevin Bowling <kbowling@FreeBSD.org> AuthorDate: 2023-08-24 20:16:24 +0000 Commit: Kevin Bowling <kbowling@FreeBSD.org> CommitDate: 2023-08-24 20:46:56 +0000 bnxt: Don't restart on VLAN changes In rS360398, a new iflib device method was added with default of opt out for VLAN events needing an interface reset. This is unintentional for bnxt(4) and is causing another bug in its VLAN initialization code to affect the common case of adding and removing VLANs on an existing interface. PR: 269133 Tested by: kp MFC after: 2 weeks Sponsored by: BBOX.io Differential Revision: https://reviews.freebsd.org/D41558 sys/dev/bnxt/if_bnxt.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
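The opt-out mechanism the commit message describes can be sketched in user space as follows. The real change lives in sys/dev/bnxt/if_bnxt.c and uses iflib's own types; everything below (the model_* names, the enum, the NULL-callback convention) is a hypothetical stand-in written only to illustrate the idea that the framework restarts the interface on a VLAN change unless the driver's callback says otherwise.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the framework's list of restart-worthy events. */
enum model_restart_event {
	MODEL_RESTART_VLAN_CONFIG,
	MODEL_RESTART_OTHER	/* placeholder for any other event */
};

/* Driver-side answer in the style of the workaround: never ask for a restart. */
static bool
model_needs_restart(enum model_restart_event event)
{
	switch (event) {
	case MODEL_RESTART_VLAN_CONFIG:
	default:
		return (false);
	}
}

struct model_ctx {
	bool (*needs_restart)(enum model_restart_event);
	int reinit_count;	/* how often the interface was reinitialized */
};

/* Framework side: a VLAN registration restarts unless the driver opts out. */
static void
model_framework_vlan_register(struct model_ctx *ctx)
{
	/* A missing callback models a driver that never opted out. */
	bool restart = (ctx->needs_restart != NULL) ?
	    ctx->needs_restart(MODEL_RESTART_VLAN_CONFIG) : true;

	if (restart)
		ctx->reinit_count++;
}
```

With the callback installed, adding a VLAN no longer goes through the reinitialization path where the driver's filter bookkeeping goes wrong. The underlying bug is still present, just not reached in the common case.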
I confirm last patch working on 13.2p2 here https://reviews.freebsd.org/D41558
(In reply to geoffroy desvernay from comment #40) It does avoid the problem but note this doesn't fix the actual logic bugs in the driver. Hopefully Broadcom will continue their corrections.
Will this find its way to 13.2-RELEASE at some point? It is a regression from one release to another after all and 13.3-RELEASE might be far away still..
(In reply to Philipp Wuensche from comment #42) Seconded - please include this in 13.2-p3. While not a security fix this bug is a complete show stopper for at least two production systems for us. Not being able to perform binary upgrades in a highly automated data centre is "not good" [tm]. Kind regards, Patrick
(In reply to punkt.de Hosting Team from comment #43) Well, 13.2p3 is out and as far as I can see, this fix is not in it.
Sorry, my bad. -p4, then. Let's agree on releng/13.2 ... :-)
(In reply to punkt.de Hosting Team from comment #45) The problem is who to agree with. I'm really not sure who to talk to get the fix in...
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=ca3fc7aabe3998822c6e1357df922618afb18648 commit ca3fc7aabe3998822c6e1357df922618afb18648 Author: Kevin Bowling <kbowling@FreeBSD.org> AuthorDate: 2023-08-24 20:16:24 +0000 Commit: Kevin Bowling <kbowling@FreeBSD.org> CommitDate: 2023-09-11 22:34:20 +0000 bnxt: Don't restart on VLAN changes In rS360398, a new iflib device method was added with default of opt out for VLAN events needing an interface reset. This is unintentional for bnxt(4) and is causing another bug in its VLAN initialization code to affect the common case of adding and removing VLANs on an existing interface. PR: 269133 Tested by: kp Sponsored by: BBOX.io Differential Revision: https://reviews.freebsd.org/D41558 (cherry picked from commit bce864d1c274faeb6678028aad1e07e91fe430ac) sys/dev/bnxt/if_bnxt.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
^Triage: assign to committer who resolved this.
Broadcom still has a critical bug in their driver initialization and seem to be asleep at the wheel. That issue is not fixed, my commit simply reduces the damage for common use.
(In reply to Kevin Bowling from comment #49) Hi Kevin, Warner, The VLAN failure comes from the code path below. The driver attempts to allocate an already allocated VLAN tag in the bnxt_hwrm_l2_filter_alloc function. Specifically, the code snippet below checks for a previously allocated filter ID:

if (*filter_id != -1) {
	device_printf(softc->dev, "Attempt to re-allocate l2 ctx "
	    "filter (fid: 0x%jx)\n", (uintmax_t)*filter_id);
	return EDOOFUS;
}

Here's the sequence of events:
1. When the first VLAN is created (vlan1), the correct filter ID (other than -1) is fetched from the firmware.
2. During the creation of the second VLAN (vlan2), the driver attempts to allocate vlan1. The target_id of vlan1 is a valid value, causing the above if condition to be true. Consequently, it returns after throwing the error "Attempt to re-allocate l2 ctx" without allocating vlan2.

To resolve this issue, I suggest the following fix: assign -1 to the target_id of all VLAN tags in the list in the bnxt_init function. Here's the relevant code snippet:

if (!BNXT_CHIP_P5(softc)) {
	rc = bnxt_hwrm_func_reset(softc);
	if (rc)
		return;
	SLIST_FOREACH (tag, &vnic->vlan_tags, next) {
		tag->filter_id = -1;
	}
} else if (softc->is_dev_init) {
	bnxt_stop(ctx);
}

I have applied this fix on BCM57416, and it has resolved the issue for me. I had provided this patch earlier (debug_patch_01), and I am surprised it did not fix the problem then. Could you please confirm if this patch resolves the issue?
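The reset loop proposed in this comment can be tried in isolation with the SLIST macros from sys/queue.h. The struct layout and the model_* names below are assumptions made for illustration (the driver's real tag structure is not shown in this thread); only the loop body and the -1 sentinel are taken from the snippet above.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical stand-in for the driver's per-VLAN tag entry. */
struct model_vlan_tag {
	SLIST_ENTRY(model_vlan_tag) next;
	uint16_t tag;
	uint64_t filter_id;	/* (uint64_t)-1 means "no filter allocated" */
};

SLIST_HEAD(model_tag_list, model_vlan_tag);

/* Mark every tag as unallocated, as the proposed loop does. */
static void
model_reset_filter_ids(struct model_tag_list *tags)
{
	struct model_vlan_tag *t;

	SLIST_FOREACH(t, tags, next)
		t->filter_id = (uint64_t)-1;
}

/* Count the tags that would trip the "*filter_id != -1" guard. */
static int
model_count_live(struct model_tag_list *tags)
{
	struct model_vlan_tag *t;
	int n = 0;

	SLIST_FOREACH(t, tags, next) {
		if (t->filter_id != (uint64_t)-1)
			n++;
	}
	return (n);
}
```

After the reset no tag trips the guard, so the "Attempt to re-allocate l2 ctx" message should no longer appear. The sketch covers only the driver-side bookkeeping; it says nothing about the firmware's view, which matters because testers earlier in this thread still saw HWRM_CFA_L2_FILTER_ALLOC return INVALID_PARAMS with debug_patch_01 applied.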
(In reply to Chandrakanth Patil from comment #50) Hi Chandrakanth, thank you for your comment. Unfortunately, I don't currently have a test setup here, and I haven't yet compiled FreeBSD drivers myself. Regarding your patch and the code snippet you mentioned, Kristof Provost (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269133#c31) and Santiago Martinez (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269133#c32) reported above that this patch did not work for them: "I'm still seeing 'bnxt0: HWRM_CFA_L2_FILTER_ALLOC command returned INVALID_PARAMS error.' with debug_patch_01, when I try to `ifconfig vlan0 vlan 201 vlandev bnxt0 up`." So it seems to me that this patch (debug_patch_01) alone does not fix the issue, at least for Kristof and Santiago. Does anybody else in this thread have any thoughts on this, or could someone help to test the patch again? Best regards, Werner
I have tried the 223 firmware (the same driver/firmware combination as Santiago's) with the 13.2 inbox driver, and I was able to create 240 VLANs without any issues.
(In reply to Werner Fischer from comment #51) Confirmed: With the respective patch on top of 13.2 sources and bnxt0: <Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet> mem 0x38080f10000-0x38080f1ffff,0x38080e00000-0x38080efffff,0x38080f bnxt0: Using 256 TX descriptors and 256 RX descriptors bnxt0: Using 8 RX queues 8 TX queues bnxt0: Using MSI-X interrupts with 9 vectors bnxt0: Ethernet address: 14:23:f2:a5:bd:50 bnxt0: netmap queues/slots: TX 8/256, RX 8/256 bnxt1: <Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet> mem 0x38080f00000-0x38080f0ffff,0x38080d00000-0x38080dfffff,0x38080f bnxt1: Using 256 TX descriptors and 256 RX descriptors bnxt1: Using 8 RX queues 8 TX queues bnxt1: Using MSI-X interrupts with 9 vectors bnxt1: Ethernet address: 14:23:f2:a5:bd:51 bnxt1: Unknown phy type bnxt1: netmap queues/slots: TX 8/256, RX 8/256 I still get Mounting local filesystems:. ELF ldconfig pbnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed ath: /lib /usr/lib /usr/lib/compbnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. carp_alloc_if: ifpromisc(bnxt0.2) failed: 12 at 32-bit compabnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed tibility ldconfibnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed g path: /usr/libbnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed carp: demoted by 240 to 240 (interface down) 32 -- the intermingling of rc and kernel output doesn't help, but you get the drift. I haven't been able to verify if the vlans are functional, yet.
(In reply to Hauke Fath from comment #53) For clarity, the patch I applied is https://bugs.freebsd.org/bugzilla/attachment.cgi?id=243922 and the card's firmware state is dev.bnxt.0.ver.hwrm_min_ver: 1.10.2 dev.bnxt.0.ver.package_ver: 226.1.107.1 dev.bnxt.0.ver.chip_type: ASIC dev.bnxt.0.ver.chip_bond_id: 0 dev.bnxt.0.ver.chip_metal: 1 dev.bnxt.0.ver.chip_rev: 1 dev.bnxt.0.ver.chip_num: 5847 dev.bnxt.0.ver.phy_partnumber: S28-PC015 dev.bnxt.0.ver.phy_vendor: FS dev.bnxt.0.ver.roce_fw_name: BONO_FW dev.bnxt.0.ver.netctrl_fw_name: KONG_FW dev.bnxt.0.ver.mgmt_fw_name: AFW_226.0.145.0 dev.bnxt.0.ver.hwrm_fw_name: CHIMP_FW dev.bnxt.0.ver.phy: 13.1.11 dev.bnxt.0.ver.fw_ver: 226.0.145.0/pkg 226.1.107.1 dev.bnxt.0.ver.roce_fw: 226.0.145 dev.bnxt.0.ver.netctrl_fw: 226.0.145 dev.bnxt.0.ver.mgmt_fw: 226.0.145 dev.bnxt.0.ver.hwrm_fw: 226.0.145 dev.bnxt.0.ver.driver_hwrm_if: 1.10.2.34 dev.bnxt.0.ver.hwrm_if: 1.10.2
(In reply to Hauke Fath from comment #54) The firmware configuration appears to be distinct in my setup. I will replicate the same firmware settings locally and attempt to reproduce the issue. By the way, I need the precise steps for reproduction. Could you please help with that?
(In reply to Chandrakanth Patil from comment #55) At least in my case a simple `ifconfig vlan create ; ifconfig vlan0 vlan 42 vlandev bnxt0` was sufficient to trigger the error and loss of connectivity. (That's from memory, I've not reverted Kevin's patch to test again.)
(In reply to Kristof Provost from comment #56) - Are there many mcast MAC addresses that have been added? Please let me know. - Please get the resource allocation strategy from the firmware through below command on lcdiag: # nvm cfg 1101- on FreeBSD os: # bnxtnvm -dev=<dev-name> getoption=? | grep -i strategy (bnxtnvm utility is needed) - please provide the ifconfig output of the interface (ifconfig <intf>)
(In reply to Chandrakanth Patil from comment #57) Okay, so with 725e4008efef32dfbe57b3e21635fa80dde8ee38 and ca3fc7aabe3998822c6e1357df922618afb18648 reverted I see `bnxt0: HWRM_CFA_L2_FILTER_ALLOC command returned INVALID_PARAMS error.` on `ifconfig vlan0 vlan 42 vlandev bnxt0`. There are no extra MAC addresses added or anything like that. bnxt0: flags=1008843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST,LOWER_UP> metric 0 mtu 1500 options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,HWSTATS,MEXTPG> ether f4:02:70:ae:72:8c inet 10.0.2.211 netmask 0xffffff00 broadcast 10.0.2.255 media: Ethernet autoselect (1000baseT <full-duplex,rxpause,txpause>) status: active nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL> I can't seem to find the bnxtnvm tool in the ports tree. Where can I find that?
(In reply to Kristof Provost from comment #58) Hi Kristof, hi Chandrakanth @Kristof: you can find the bnxtnvm utility here: https://docs.broadcom.com/docs/NXE_FreeBSD_niccli_228.0.132.0- This is the currently latest version. To search for the most current version you can search for it at the Broadcom website (you can find the download in the "Firmware" area): https://www.broadcom.com/support/download-search?pg=&pf=Ethernet+Network+Adapters&pn=P225P++-+2+x+25/10G+PCIe+NIC&pa=&po=Broadcom&dk=&pl=&l=false @Chandrakanth: do you have any other updates / new findings on the issues? Or should we wait for Kristof's "# bnxtnvm -dev=<dev-name> getoption=? | grep -i strategy" output? Best regards, Werner
(In reply to Werner Fischer from comment #59) The tool seems to be called niccli.freebsd, and doesn't quite take the options you listed, but it does have a getoption command in its CLI, and produces this:

./niccli.freebsd -dev 1 getoption -name afm_rm_resc_strategy
-------------------------------------------------------------------------------
Scrutiny NIC CLI v228.0.132.0 - Broadcom Inc. (c) 2023 (Bld-61.52.25.90.16.0)
-------------------------------------------------------------------------------
ERROR: Getting option 'afm_rm_resc_strategy' value does not support on this hardware.
ERROR: Get option failed for option 'afm_rm_resc_strategy'.
EXIT CODE : C0000001
DESCRIPTION : Command failed with generic failure status.
Command getoption failed.

It also makes WITNESS unhappy:

...
uma_zalloc_debug: zone "malloc-256" with the following non-sleepable locks held:
exclusive sleep mutex BNXT MGMT Lock (BNXT MGMT Lock) r = 0 (0xffffffff82039590) locked @ /usr/src/sys/dev/bnxt/bnxt_mgmt.c:347
stack backtrace:
#0 0xffffffff80bc6c35 at witness_debugger+0x65
#1 0xffffffff80bc7d79 at witness_warn+0x3e9
#2 0xffffffff80ee4994 at uma_zalloc_debug+0x34
#3 0xffffffff80ee44a7 at uma_zalloc_arg+0x27
#4 0xffffffff80b25a5e at malloc+0x7e
#5 0xffffffff8202f6d9 at bnxt_mgmt_ioctl+0x869
#6 0xffffffff809dbde2 at devfs_ioctl+0xd2
#7 0xffffffff80c60fe2 at vn_ioctl+0xc2
#8 0xffffffff809dc4be at devfs_ioctl_f+0x1e
#9 0xffffffff80bcc526 at kern_ioctl+0x286
#10 0xffffffff80bcc233 at sys_ioctl+0x143
#11 0xffffffff81057453 at amd64_syscall+0x153
#12 0xffffffff81028deb at fast_syscall_common+0xf8
Hi Kristof, Thanks for the data. The niccli crash seems to be a separate issue. Werner, Kristof, As of today I am unable to reproduce the issue, which is what led to the delay. I will paste my complete setup details here; please let me know what is missing.
(In reply to Chandrakanth Patil from comment #61) I believe that's expected on a recent FreeBSD main. Kevin worked around the problem with 725e4008efef32dfbe57b3e21635fa80dde8ee38 and ca3fc7aabe3998822c6e1357df922618afb18648. You may need to revert those two to see the problem again.
(In reply to Kristof Provost from comment #62 and Chandrakanth Patil from comment #61) Chandrakanth, do you need further information? Kristof mentioned that's expected on a recent FreeBSD main, that you do not see the issue. Kevin worked around the problem with 725e4008efef32dfbe57b3e21635fa80dde8ee38 and ca3fc7aabe3998822c6e1357df922618afb18648 and you may need to revert those two to see the problem again. Kevin Bowling mentioned in comment #49 that there seems to still be a critical bug in the driver initialization. His commit did not fix the issue, it "simply reduces the damage for common use." he mentioned. In case I can help in any way, please let me know.
I have a machine with a BCM57414 on the motherboard: bnxt0: <Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet> mem 0xd1d10000-0xd1d1ffff,0xd1c00000-0xd1cfffff,0xd1d22000-0xd1d23fff irq 36 at device 0.0 numa-domain 0 on pci3 bnxt0: Using 256 TX descriptors and 256 RX descriptors bnxt0: Using 16 RX queues 16 TX queues bnxt0: Using MSI-X interrupts with 17 vectors bnxt0: Ethernet address: 9c:6b:00:46:a2:0c bnxt1: <Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet> mem 0xd1d00000-0xd1d0ffff,0xd1b00000-0xd1bfffff,0xd1d20000-0xd1d21fff irq 37 at device 0.1 numa-domain 0 on pci3 bnxt1: Using 256 TX descriptors and 256 RX descriptors bnxt1: Using 16 RX queues 16 TX queues bnxt1: Using MSI-X interrupts with 17 vectors bnxt1: Ethernet address: 9c:6b:00:46:a2:0d Here is the firmware information: dev.bnxt.0.nvram.available_size: 4173824 dev.bnxt.0.nvram.reserved_size: 16384 dev.bnxt.0.nvram.size: 8388608 dev.bnxt.0.nvram.sector_size: 4096 dev.bnxt.0.nvram.device_id: 16407 dev.bnxt.0.nvram.mfg_id: 239 dev.bnxt.0.ver.hwrm_min_ver: 1.10.2 dev.bnxt.0.ver.package_ver: <unknown> dev.bnxt.0.ver.chip_type: ASIC dev.bnxt.0.ver.chip_bond_id: 0 dev.bnxt.0.ver.chip_metal: 1 dev.bnxt.0.ver.chip_rev: 1 dev.bnxt.0.ver.chip_num: 5847 dev.bnxt.0.ver.phy_partnumber: MCP7F00-A002 dev.bnxt.0.ver.phy_vendor: Mellanox dev.bnxt.0.ver.roce_fw_name: BONO_FW dev.bnxt.0.ver.netctrl_fw_name: KONG_FW dev.bnxt.0.ver.mgmt_fw_name: AFW_226.0.145.0 dev.bnxt.0.ver.hwrm_fw_name: CHIMP_FW dev.bnxt.0.ver.phy: 13.1.11 dev.bnxt.0.ver.fw_ver: 226.0.145.0/pkg N/A dev.bnxt.0.ver.roce_fw: 226.0.145 dev.bnxt.0.ver.netctrl_fw: 226.0.145 dev.bnxt.0.ver.mgmt_fw: 226.0.145 dev.bnxt.0.ver.hwrm_fw: 226.0.145 dev.bnxt.0.ver.driver_hwrm_if: 1.10.2.34 dev.bnxt.0.ver.hwrm_if: 1.10.2 dev.bnxt.0.%domain: 0 dev.bnxt.0.%parent: pci3 dev.bnxt.0.%pnpinfo: vendor=0x14e4 device=0x16d7 subvendor=0x1849 subdevice=0x1402 class=0x020000 dev.bnxt.0.%location: slot=0 function=0 dbsf=pci0:195:0:0 dev.bnxt.0.%driver: bnxt dev.bnxt.0.%desc: 
Broadcom BCM57414 NetXtreme-E 10Gb/25Gb Ethernet I have no VLANs configured. I'm running stable/13 from mid-2023, but I've tried the driver from the latest FreeBSD/head and FreeBSD stable/13 with no success. I still get: bnxt0: HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error. bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed bnxt0: Link is UP full duplex, FC - none - 25000 Mbps bnxt0: link state changed to UP bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed bnxt1: Link is UP full duplex, FC - none - 25000 Mbps bnxt1: link state changed to UP bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed bnxt0: HWRM_CFA_L2_SET_RX_MASK command returned RESOURCE_ALLOC_ERROR error. bnxt0: set_multi: rx_mask set failed Strangely enough, though, the driver works fine if I first PXE boot the machine off of the chip. If I do that, it works normally. But if I boot off disk, I get the RESOURCE_ALLOC_ERROR messages above. This suggests that there is some kind of initialization issue that the PXE boot environment takes care of, but the driver does not. Also, bnxtnvm and niccli don't work with the driver in my kernel. But it isn't, apparently, because of the state of the driver, it is because of the ioctl definition in the driver. 
The ioctl call doesn't even get to bnxt_mgmt_ioctl. I've verified that with dtrace, but here is the ioctl call from ktrace/kdump:

  7818 bnxtnvm  CALL  openat(AT_FDCWD,0x46199d,0x2<O_RDWR>)
  7818 bnxtnvm  NAMI  "/dev/bnxt_mgmt"
  7818 bnxtnvm  RET   openat 4
  7818 bnxtnvm  CALL  ioctl(0x4,0x80000000,0x821199a70)
  7818 bnxtnvm  RET   ioctl -1 errno 25 Inappropriate ioctl for device
  7818 bnxtnvm  CALL  close(0x4)

Using this DTrace script:

#pragma D option cleanrate=5000hz
#pragma D option dynvarsize=8192000

fbt::bnxt_mgmt_ioctl:entry
{
	printf("cmd = %#x\n", args[1]);
}

fbt::bnxt_mgmt_open:entry
{
	printf("opened bnxt mgmt device\n");
}

fbt::sys_ioctl:entry
/args[1]->com == 0x80000000/
{
	printf("got ioctl command 0x80000000\n");
}

I verified that it isn't getting down to bnxt_mgmt_ioctl:

# dtrace -s bnxt.d
dtrace: script 'bnxt.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
 31   2108             bnxt_mgmt_open:entry opened bnxt mgmt device
 31  22882                  sys_ioctl:entry got ioctl command 0x80000000
 31   2108             bnxt_mgmt_open:entry opened bnxt mgmt device
 31  22882                  sys_ioctl:entry got ioctl command 0x80000000
^C

When I boot the machine via PXE, though, bnxtnvm listdev shows the device:

# ./bnxtnvm listdev
N/A #1
Device Interface Name : bnxt0
MACAddress : 9c:6b:00:46:a2:0c
PCI Device Name : 0000:c3:00.0

And strangely the ioctl works, although from my reading of sys_ioctl() it shouldn't. I think I've discovered why it does:

  1904 bnxtnvm  CALL  openat(AT_FDCWD,0x46199d,0x2<O_RDWR>)
  1904 bnxtnvm  NAMI  "/dev/bnxt_mgmt"
  1904 bnxtnvm  RET   openat 3
  1904 bnxtnvm  CALL  ioctl(0x3,0x80000000,0x8212303d0)
  1904 bnxtnvm  RET   ioctl 0
  1904 bnxtnvm  CALL  close(0x3)

This code is from sys_ioctl():

	/*
	 * Interpret high order word to find amount of data to be
	 * copied to/from the user's address space.
	 */
	size = IOCPARM_LEN(com);
	if ((size > IOCPARM_MAX) ||
	    ((com & (IOC_VOID | IOC_IN | IOC_OUT)) == 0) ||
#if defined(COMPAT_FREEBSD5) || defined(COMPAT_FREEBSD4) || defined(COMPAT_43)
	    ((com & IOC_OUT) && size == 0) ||
#else
	    ((com & (IOC_IN | IOC_OUT)) && size == 0) ||
#endif
	    ((com & IOC_VOID) && size > 0 && size != sizeof(int)))
		return (ENOTTY);

My regular kernel config file doesn't have COMPAT_FREEBSD4/5, but the PXE kernel config file does. Here are the bit definitions from sys/ioccom.h:

#ifndef _SYS_IOCCOM_H_
#define _SYS_IOCCOM_H_

/*
 * Ioctl's have the command encoded in the lower word, and the size of
 * any in or out parameters in the upper word.  The high 3 bits of the
 * upper word are used to encode the in/out status of the parameter.
 *
 *	 31 29 28                     16 15            8 7             0
 *	+---------------------------------------------------------------+
 *	| I/O | Parameter Length        | Command Group | Command       |
 *	+---------------------------------------------------------------+
 */
#define	IOCPARM_SHIFT	13		/* number of bits for ioctl size */
#define	IOCPARM_MASK	((1 << IOCPARM_SHIFT) - 1) /* parameter length mask */
#define	IOCPARM_LEN(x)	(((x) >> 16) & IOCPARM_MASK)
#define	IOCBASECMD(x)	((x) & ~(IOCPARM_MASK << 16))
#define	IOCGROUP(x)	(((x) >> 8) & 0xff)

#define	IOCPARM_MAX	(1 << IOCPARM_SHIFT) /* max size of ioctl */
#define	IOC_VOID	0x20000000UL	/* no parameters */
#define	IOC_OUT		0x40000000UL	/* copy out parameters */
#define	IOC_IN		0x80000000UL	/* copy in parameters */
#define	IOC_INOUT	(IOC_IN|IOC_OUT)/* copy parameters in and out */
#define	IOC_DIRMASK	(IOC_VOID|IOC_OUT|IOC_IN)/* mask for IN/OUT/VOID */

Because the BNXT_MGMT_OPCODE_GET_DEV_INFO ioctl (0x80000000) has the same value as the IOC_IN bit definition, with a parameter length of zero, the ioctl breaks if the old compat stuff isn't built into the kernel: the non-compat branch rejects any IOC_IN command whose size is 0.

The ioctls for the bnxt(4) driver need to be changed to use the usual _IOW/_IOWR macros from sys/ioccom.h. I realize that will break the management tools.
Perhaps they can have a version check and a fallback to the old ioctls if need be.
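To make the failure mode concrete, here is a minimal user-space sketch of the check quoted from sys_ioctl(). It is not the kernel code itself: the SK_/sk_ names are local stand-ins for the sys/ioccom.h macros, the compat flag stands in for the COMPAT_FREEBSD4/5/COMPAT_43 build options, and the group letter 'b', command number 1, and 16-byte argument length used to illustrate an _IOW-style encoding are hypothetical, not values the bnxt(4) driver defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the sys/ioccom.h bit layout (the SK_ prefix is ours). */
#define SK_IOCPARM_SHIFT  13
#define SK_IOCPARM_MASK   ((1u << SK_IOCPARM_SHIFT) - 1)
#define SK_IOCPARM_LEN(x) (((x) >> 16) & SK_IOCPARM_MASK)
#define SK_IOCPARM_MAX    (1u << SK_IOCPARM_SHIFT)
#define SK_IOC_VOID       0x20000000u
#define SK_IOC_OUT        0x40000000u
#define SK_IOC_IN         0x80000000u

/*
 * Encode a command the way _IOW(group, num, type) does:
 * IOC_IN | (len << 16) | (group << 8) | num.
 */
static uint32_t
sk_iow(uint8_t group, uint8_t num, uint32_t len)
{
	return (SK_IOC_IN | ((len & SK_IOCPARM_MASK) << 16) |
	    ((uint32_t)group << 8) | num);
}

/*
 * Mirror of the sanity check in sys_ioctl(). Returns true when the
 * kernel would return ENOTTY. "compat" selects the #if branch that is
 * compiled in when the old COMPAT_* options are present.
 */
static bool
sk_ioctl_rejected(uint32_t com, bool compat)
{
	uint32_t size = SK_IOCPARM_LEN(com);
	/* Directions that are not allowed to carry a zero length. */
	uint32_t zero_len_dirs = compat ? SK_IOC_OUT : (SK_IOC_IN | SK_IOC_OUT);

	return (size > SK_IOCPARM_MAX ||
	    (com & (SK_IOC_VOID | SK_IOC_IN | SK_IOC_OUT)) == 0 ||
	    ((com & zero_len_dirs) != 0 && size == 0) ||
	    ((com & SK_IOC_VOID) != 0 && size > 0 && size != sizeof(int)));
}
```

With these helpers, sk_ioctl_rejected(0x80000000u, false) is true and sk_ioctl_rejected(0x80000000u, true) is false, which matches the ENOTTY seen on the disk-booted kernel and the success on the PXE kernel. A command built with sk_iow('b', 1, 16) carries a non-zero length, so it passes the check under either configuration.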
(In reply to Kenneth D. Merry from comment #64) Upgrading the firmware on the NIC from version 226 to 229 fixed the RESOURCE_ALLOC_ERROR errors in my case. Since this is an onboard NIC (BCM57414 on an ASRock motherboard) I needed a different firmware load than the standard one in order to be able to upgrade. Broadcom kindly gave me the firmware load, and ASRock is upgrading their motherboards to ship with the new version. The ioctl issue is still there, though.
(In reply to Hauke Fath from comment #54) With a FreeBSD 13 build from last week, and the firmware updated to vAFW_231.0.153.0, VLANs are functional here. OTOH, a gif(4) tunnel attached to the NIC turned into a black hole. The pf instance on the other end (same FreeBSD build) logged a few UDP packets (DNS) as dropped for checksum errors, so I suspect problems with the checksum offload. Going back to the Mellanox X3 card got the router pair working again; I guess at this point I'll have to write off the Broadcom cards.
(In reply to Hauke Fath from comment #66) As an aside, in a plain (non-VLAN) setup, the Broadcom NIC "negotiated" 1 GBit with an Nvidia 2010 switch on a 1/10/25 GBit port set to 'auto'. The port had to be set to 25 GBit before the NIC accepted the speed.
(In reply to Kenneth D. Merry from comment #65)
> Broadcom kindly gave me the firmware load, and ASRock is upgrading their motherboards to ship with the new version.

Hi Kenneth. Would you be able to share how or where you got firmware 229 for your embedded BCM57414 controller? Did you have to raise a support request with Broadcom, or was there a firmware download link available on the internet that worked for the Ethernet controller embedded in your board? All of the links I'm finding are for the BCM57414 controller on a dedicated PCIe card, NOT the embedded controller on the motherboard.