Created attachment 256708 [details]
kernel panic image

Hi,

Since upgrading from FreeBSD 14.1 to FreeBSD 14.2 we have been getting random kernel panics on multiple servers. The panic message refers to bnxt: bnxt_dcb_list_app. See the attached screenshot.

Servers: Dell PowerEdge R6615
NIC: 540-BCOD: Broadcom 57416 Dual Port 10GbE BASE-T Adapter, OCP NIC 3.0

Firmware:
dev.bnxt.0.ver.roce_fw_name: BONO_FW
dev.bnxt.0.ver.netctrl_fw_name: KONG_FW
dev.bnxt.0.ver.mgmt_fw_name: AFW_231.0.153.0
dev.bnxt.0.ver.hwrm_fw_name: CHIMP_FW
dev.bnxt.0.ver.fw_ver: 231.0.153.0/pkg 23.11.16.22
dev.bnxt.0.ver.roce_fw: 231.0.153
dev.bnxt.0.ver.netctrl_fw: 231.0.153
dev.bnxt.0.ver.mgmt_fw: 231.0.153
dev.bnxt.0.ver.hwrm_fw: 231.0.153

Has anyone seen something similar, or have any ideas how to solve it? We tried turning off LRO/TSO, but that didn't help. Downgrading to 14.1 fixes the crashes.
The faulting code was added in ac940a8b92ac79df7bab71f50ae3b9aa7cff145d:

bnxt_en: Add PFC, ETS & App TLVs protocols support

Created new directory "bnxt_en" in /dev/bnxt and /modules/bnxt and moved source files and Makefile into respective directory.

ETS support:
- Added new files bnxt_dcb.c & bnxt_dcb.h
- Added sysctl node 'dcb' and created handlers 'ets' and 'dcbx_cap'
- Add logic to validate user input and configure ETS in the firmware
- Updated makefile to include bnxt_dcb.c & bnxt_dcb.h

PFC support:
- Created sysctl handlers 'pfc' under node 'dcb'
- Added logic to validate user input and configure PFC in the firmware.

App TLV support:
- Created 3 new sysctl handlers under node 'dcb'
  - set_apptlv (write only): Sets a specified TLV
  - del_apptlv (write only): Deletes a specified TLV
  - list_apptlv (read only): Lists all APP TLVs configured
- Added logic to validate user input and configure APP TLVs in the firmware.

Added Below DCB ops for management interface:
- Set PFC, Get PFC, Set ETS, Get ETS, Add App_TLV, Del App_TLV, Lst App_TLV

Reviewed by: imp
Approved by: imp
Differential revision: https://reviews.freebsd.org/D45005

(cherry picked from commit 35b53f8c989f62286aad075ef2e97bba358144f8)
addr2line shows that the line of source code is https://cgit.freebsd.org/src/tree/sys/dev/bnxt/bnxt_en/bnxt_sysctl.c?h=releng/14.2#n1959 , but I have no clue how that line can panic. Maybe the kernel actually panics within `sysctl_handle_string()`?

Hi Daniel,

Can you please build the kernel with INVARIANTS enabled, or directly with the kernel conf `GENERIC-DEBUG`, and test with the new kernel / driver?

```
% readelf -s if_bnxt.ko.debug | grep bnxt_dcb_list_app
   163: 000000000001a5c0   368 FUNC    LOCAL  DEFAULT    1 bnxt_dcb_list_app
% echo "obase=16; ibase=16; 1A5C0 + 144" | bc
1A704
% addr2line -fip -e if_bnxt.ko.debug -j .text 0x1A704
bnxt_dcb_list_app at /usr/src/sys/dev/bnxt/bnxt_en/bnxt_sysctl.c:1959
```
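For anyone repeating the lookup, the address arithmetic from the commands above can be sketched in C. This is only an illustration: the helper names are hypothetical, 0x1a5c0 and 368 are the symbol value and size printed by readelf, and 0x144 is the offset used in the bc command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if "symbol+offset" still points inside the symbol's own code. */
static bool
offset_in_symbol(uint64_t offset, uint64_t symbol_size)
{
	return (offset < symbol_size);
}

/*
 * Hypothetical helper: module-relative address to hand to addr2line for
 * a backtrace entry of the form "symbol+offset".
 */
static uint64_t
symbol_plus_offset(uint64_t symbol_value, uint64_t symbol_size,
    uint64_t offset)
{
	/* An offset beyond the symbol size would belong to another symbol. */
	assert(offset_in_symbol(offset, symbol_size));
	return (symbol_value + offset);
}
```

Calling `symbol_plus_offset(0x1a5c0, 368, 0x144)` gives 0x1a704, the value passed to addr2line. The size check is an extra sanity step that the commands above do not perform.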
(In reply to Zhenlei Huang from comment #2) I don't have debug symbols handy to check myself, but maybe the %rip value 0xffffffff80b4dee7 gives a further hint?
Created attachment 256835 [details] kernel crash with GENERIC-DEBUG
(In reply to Zhenlei Huang from comment #2)
I have built and installed the GENERIC-DEBUG kernel now, but it crashes directly on boot. I'm not sure if I did something wrong when I built the kernel or if this is related to the bug. See the new screenshot.
(In reply to Daniel Porsch from comment #5) That last crash is because the garp timer callback doesn't enter epoch. I've got a fix in progress for that problem.
(In reply to Daniel Porsch from comment #5) That is apparently another genuine bug :) Did you enable `net.link.ether.inet.garp_rexmit_count` ? You may restore it back to 0 to prevent that panic, if I read the code right.
(In reply to Zhenlei Huang from comment #7)
That helped, and now it boots with the debug kernel. I will send the error when it crashes again. It might take a day or so for it to crash.
(In reply to Mark Johnston from comment #3)
> I don't have debug symbols handy to check myself, but maybe the %rip value
> 0xffffffff80b4dee7 gives a further hint?

```
% addr2line -fip -e kernel.debug 0xffffffff80b4dee7
sysctl_handle_string at /usr/src/sys/kern/kern_sysctl.c:1787
```

See https://cgit.freebsd.org/src/tree/sys/kern/kern_sysctl.c?h=releng/14.2#n1787

That is interesting. The parameter `req` is actually on the kernel stack (allocated on the stack in userland_sysctl()). The call stack is

```
sys___sysctl()
userland_sysctl()
sysctl_root()
sysctl_root_handler_locked()
bnxt_dcb_list_app()
sysctl_handle_string()
```

but the fault virtual address `0x500000015` appears to be a userland one.
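A rough way to see why `0x500000015` looks like a userland address is to classify it by canonical half. This is a minimal sketch, assuming amd64 with 48-bit virtual addresses (4-level paging); the names are hypothetical, and the exact user/kernel boundary the kernel uses (VM_MAXUSER_ADDRESS) is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Canonical address halves for 48-bit virtual addresses (assumption). */
#define LOWER_HALF_END		UINT64_C(0x00007fffffffffff)
#define UPPER_HALF_START	UINT64_C(0xffff800000000000)

static_assert(LOWER_HALF_END < UPPER_HALF_START, "halves must not overlap");

enum addr_class {
	ADDR_LOWER_HALF,	/* where userland mappings live */
	ADDR_NON_CANONICAL,	/* not a valid virtual address */
	ADDR_UPPER_HALF		/* where kernel mappings live */
};

/* Hypothetical helper: classify a virtual address by canonical half. */
static enum addr_class
classify_addr(uint64_t va)
{
	if (va <= LOWER_HALF_END)
		return (ADDR_LOWER_HALF);
	if (va >= UPPER_HALF_START)
		return (ADDR_UPPER_HALF);
	return (ADDR_NON_CANONICAL);
}
```

Under that assumption, `classify_addr(0x500000015)` falls in the lower half, while the %rip value `0xffffffff80b4dee7` falls in the upper half, which is why a pointer to an on-stack kernel object should never have the former value.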
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=38fdcca05d09b4d5426a253d3c484f9481a73ac2

commit 38fdcca05d09b4d5426a253d3c484f9481a73ac2
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2025-01-20 13:24:48 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2025-01-20 13:28:39 +0000

    netinet: enter epoch in garp_rexmit()

    garp_rexmit() is a callback, so is not in net_epoch, which
    arprequest_internal() expects. Enter and exit the net_epoch.

    PR:             284073
    MFC after:      1 week
    Sponsored by:   Rubicon Communications, LLC ("Netgate")

 sys/netinet/if_ether.c | 3 +++
 1 file changed, 3 insertions(+)
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b4bd97ec168e97360cf9511b975a20f677864661

commit b4bd97ec168e97360cf9511b975a20f677864661
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2025-01-20 13:27:05 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2025-01-20 13:28:39 +0000

    netinet tests: basic garp test

    Excercise the garp code. This doesn't actively verify anything, but is
    sufficient to trigger the panic reported in PR 284073, so it's a useful
    test case to keep.

    PR:             284073
    Sponsored by:   Rubicon Communications, LLC ("Netgate")

 tests/sys/netinet/arp.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
(In reply to Daniel Porsch from comment #5)
> That helped, and now it boots with the debug kernel, I will send the error
> when it crashes again. It might a day or so for it to crash.

To speed things up, after you get and report the crash (again), you can apply https://reviews.freebsd.org/D48495 and https://reviews.freebsd.org/D48496 to the releng/14.2 branch locally and test whether that helps.
Created attachment 256867 [details] kernel panic with debug Here is the crash with debug. I will try the patches.
(In reply to Zhenlei Huang from comment #9) Emm, I was wrong, RIP is next instruction. Stupid ... ``` % objdump --disassemble-symbols=sysctl_handle_string /boot/kernel/kernel /boot/kernel/kernel: file format elf64-x86-64 Disassembly of section .text: ffffffff80b4de20 <sysctl_handle_string>: ffffffff80b4de20: 55 pushq %rbp ffffffff80b4de21: 48 89 e5 movq %rsp, %rbp ffffffff80b4de24: 41 57 pushq %r15 ffffffff80b4de26: 41 56 pushq %r14 ffffffff80b4de28: 41 55 pushq %r13 ffffffff80b4de2a: 41 54 pushq %r12 ffffffff80b4de2c: 53 pushq %rbx ffffffff80b4de2d: 50 pushq %rax ffffffff80b4de2e: 48 89 cb movq %rcx, %rbx ffffffff80b4de31: 49 89 f6 movq %rsi, %r14 ffffffff80b4de34: 48 85 d2 testq %rdx, %rdx ffffffff80b4de37: 0f 84 9c 00 00 00 je 0xffffffff80b4ded9 <sysctl_handle_string+0xb9> ffffffff80b4de3d: b8 00 00 08 40 movl $0x40080000, %eax # imm = 0x40080000 ffffffff80b4de42: 23 47 2c andl 0x2c(%rdi), %eax ffffffff80b4de45: 0f 84 8e 00 00 00 je 0xffffffff80b4ded9 <sysctl_handle_string+0xb9> ffffffff80b4de4b: 80 3d 5f 22 cb 00 00 cmpb $0x0, 0xcb225f(%rip) # 0xffffffff818000b1 <kdb_active> ffffffff80b4de52: 0f 85 81 00 00 00 jne 0xffffffff80b4ded9 <sysctl_handle_string+0xb9> ffffffff80b4de58: 49 89 d7 movq %rdx, %r15 ffffffff80b4de5b: 48 83 7b 10 00 cmpq $0x0, 0x10(%rbx) ffffffff80b4de60: 0f 84 a0 00 00 00 je 0xffffffff80b4df06 <sysctl_handle_string+0xe6> ffffffff80b4de66: 4c 89 ff movq %r15, %rdi ffffffff80b4de69: 48 c7 c6 c0 5c 8d 81 movq $-0x7e72a340, %rsi # imm = 0x818D5CC0 ffffffff80b4de70: ba 02 00 00 00 movl $0x2, %edx ffffffff80b4de75: e8 e6 51 fc ff callq 0xffffffff80b13060 <malloc> ffffffff80b4de7a: 49 89 c4 movq %rax, %r12 ffffffff80b4de7d: 48 c7 c7 70 21 bb 81 movq $-0x7e44de90, %rdi # imm = 0x81BB2170 ffffffff80b4de84: 31 f6 xorl %esi, %esi ffffffff80b4de86: e8 f5 bb ff ff callq 0xffffffff80b49a80 <_sx_slock_int> ffffffff80b4de8b: 4c 89 e7 movq %r12, %rdi ffffffff80b4de8e: 4c 89 f6 movq %r14, %rsi ffffffff80b4de91: 4c 89 fa movq %r15, %rdx ffffffff80b4de94: e8 00 
00 00 00 callq 0xffffffff80b4de99 <sysctl_handle_string+0x79> ffffffff80b4de99: 48 c7 c7 70 21 bb 81 movq $-0x7e44de90, %rdi # imm = 0x81BB2170 ffffffff80b4dea0: e8 db c2 ff ff callq 0xffffffff80b4a180 <_sx_sunlock_int> ffffffff80b4dea5: 4c 89 e7 movq %r12, %rdi ffffffff80b4dea8: e8 d3 3c 4d 00 callq 0xffffffff81021b80 <strlen> ffffffff80b4dead: 48 8d 50 01 leaq 0x1(%rax), %rdx ffffffff80b4deb1: 48 89 df movq %rbx, %rdi ffffffff80b4deb4: 4c 89 e6 movq %r12, %rsi ffffffff80b4deb7: ff 53 28 callq *0x28(%rbx) ffffffff80b4deba: 41 89 c5 movl %eax, %r13d ffffffff80b4debd: 4c 89 e7 movq %r12, %rdi ffffffff80b4dec0: 48 c7 c6 c0 5c 8d 81 movq $-0x7e72a340, %rsi # imm = 0x818D5CC0 ffffffff80b4dec7: e8 34 50 fc ff callq 0xffffffff80b12f00 <free> ffffffff80b4decc: 44 89 e8 movl %r13d, %eax ffffffff80b4decf: 85 c0 testl %eax, %eax ffffffff80b4ded1: 0f 85 de 00 00 00 jne 0xffffffff80b4dfb5 <sysctl_handle_string+0x195> ffffffff80b4ded7: eb 64 jmp 0xffffffff80b4df3d <sysctl_handle_string+0x11d> ffffffff80b4ded9: 4c 89 f7 movq %r14, %rdi ffffffff80b4dedc: e8 9f 3c 4d 00 callq 0xffffffff81021b80 <strlen> ffffffff80b4dee1: 49 89 c7 movq %rax, %r15 ffffffff80b4dee4: 49 ff c7 incq %r15 ffffffff80b4dee7: 4c 8b 63 10 movq 0x10(%rbx), %r12 ffffffff80b4deeb: 4c 89 f7 movq %r14, %rdi ffffffff80b4deee: e8 8d 3c 4d 00 callq 0xffffffff81021b80 <strlen> ffffffff80b4def3: 48 89 c2 movq %rax, %rdx ffffffff80b4def6: 4d 85 e4 testq %r12, %r12 ffffffff80b4def9: 74 33 je 0xffffffff80b4df2e <sysctl_handle_string+0x10e> ffffffff80b4defb: 48 ff c2 incq %rdx ffffffff80b4defe: 48 89 df movq %rbx, %rdi ... ``` The current instruction should be `0xffffffff80b4dee4`. ``` % addr2line -fip -e kernel.debug 0xffffffff80b4dee4 sysctl_handle_string at /usr/src/sys/kern/kern_sysctl.c:1783 ``` https://cgit.freebsd.org/src/tree/sys/kern/kern_sysctl.c?h=releng/14.2#n1783 Then that makes sense.
Update: After carefully reading the disassembled code, I can confirm the faulting instruction is the one at RIP (0xffffffff80b4dee7).

For `sysctl_handle_string()`, `req` is the last argument, which is passed via register %rcx.

```
ffffffff80b4de2e: 48 89 cb                      movq    %rcx, %rbx
```

It was saved to the callee-saved register %rbx, and the following flow does not touch it. It was `0000000500000005` when passed in. Then the indirect memory access

```
ffffffff80b4dee7: 4c 8b 63 10                   movq    0x10(%rbx), %r12
```

will panic.

Part of the disassembled code of if_bnxt.ko:

```
$ objdump --disassemble-symbols=bnxt_dcb_list_app -r /boot/kernel/if_bnxt.ko
...
   1a5cc: 53                            pushq   %rbx
   1a5cd: 48 81 ec 28 02 00 00          subq    $0x228, %rsp            # imm = 0x228, reserve app[128] and other local vars.
   1a5d4: 48 89 cb                      movq    %rcx, %rbx              # save req
...
   1a622: 48 89 5d c8                   movq    %rbx, -0x38(%rbp)
...
   1a6f0: ba 00 10 00 00                movl    $0x1000, %edx           # imm = 0x1000
   1a6f5: 4c 89 f7                      movq    %r14, %rdi
   1a6f8: 4c 89 fe                      movq    %r15, %rsi
   1a6fb: 48 8b 4d c8                   movq    -0x38(%rbp), %rcx       # previously saved req
   1a6ff: e8 00 00 00 00                callq   0x1a704 <bnxt_dcb_list_app+0x144>
                000000000001a700:  R_X86_64_PLT32       sysctl_handle_string-0x4
```

If `bnxt_dcb_ieee_listapp()` writes out of bounds past the on-stack variable app[128], then it makes sense that we get `%rbx == 0000000500000005`. We can add an assertion for that.
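The link between the corrupted register and the reported fault address can be checked with a little arithmetic. The sketch below uses hypothetical helper names. It also assumes, purely for illustration, that the saved `req` slot was overwritten by two adjacent 32-bit values of 5; the thread only shows the resulting 64-bit value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
    "two 32-bit words fill one 64-bit slot");

/* Effective address of a memory operand of the form disp(%reg). */
static uint64_t
effective_addr(uint64_t base, int64_t disp)
{
	return (base + (uint64_t)disp);
}

/*
 * Reinterpret two adjacent 32-bit words as the 64-bit slot they occupy
 * in memory, as a stray write over a saved pointer would (assumption).
 */
static uint64_t
slot_from_words(uint32_t first, uint32_t second)
{
	uint32_t words[2] = { first, second };
	uint64_t slot;

	memcpy(&slot, words, sizeof(slot));
	return (slot);
}
```

`slot_from_words(5, 5)` yields `0x0000000500000005`, and `effective_addr()` of that value with displacement `0x10` yields `0x500000015`, which matches the fault virtual address reported earlier for `movq 0x10(%rbx), %r12`.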
Created attachment 256871 [details] Patch to assert OOB write on-stack allocated variable
(In reply to Zhenlei Huang from comment #16)
I applied this patch now, hopefully it doesn't crash again.
(In reply to Daniel Porsch from comment #17)
No, the last patch only helps with debugging the OOB write to the on-stack variable. I expect one more kernel panic :) , and I believe this time I have found the root cause. Once that is confirmed, I'll prepare the final fix.
Created attachment 256886 [details]
new panic

New panic with the latest patch
(In reply to Daniel Porsch from comment #19)
So my previous assumption
> If `bnxt_dcb_ieee_listapp()` OOB write the on stack variable app[128], then
> it make sense that we get `%rbx == 0000000500000005`
is right.
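For readers unfamiliar with this class of bug, here is a generic sketch of how a list routine can avoid writing past a fixed-size caller buffer. It is not the content of any patch referenced in this report: the structure layout, the names, and the clamping approach are all assumptions for illustration. Only the capacity of 128 comes from the `app[128]` array discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define APP_TABLE_CAP	128	/* capacity of the app[128] array in the thread */

/* Hypothetical entry layout; the real driver structure may differ. */
struct app_entry {
	uint16_t protocol;
	uint8_t  selector;
	uint8_t  priority;
};

/*
 * Copy at most out_cap entries from src into out and return the number
 * copied, so an oversized src_count can never write past the end of out.
 */
static size_t
copy_app_entries(struct app_entry *out, size_t out_cap,
    const struct app_entry *src, size_t src_count)
{
	size_t n = src_count < out_cap ? src_count : out_cap;

	assert(out != NULL && src != NULL);
	for (size_t i = 0; i < n; i++)
		out[i] = src[i];
	return (n);
}
```

With this shape, a source that reports 200 entries still fills only the first 128 slots of the destination, and whatever sits after the array on the stack (such as a saved `req` pointer) is left intact.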
Hi Daniel,

Now you can apply https://reviews.freebsd.org/D48495 , https://reviews.freebsd.org/D48496 and https://reviews.freebsd.org/D48589 and test again.

Actually, applying only the last one (D48589) should be enough. D48496 prevents a potential OOB write to heap-allocated memory, but currently there is no sign that this happens. I do not have that hardware to test, so I'll give you credit for D48495 and D48496.
Hi Daniel,

BTW, I guess you enabled DCBx on the switch port. To work around this, you can disable DCBx on either the interface [1] or the switch port, at the cost of breaking your current setup (traffic flow priority etc.).

1. https://techdocs.broadcom.com/us/en/storage-and-ethernet-connectivity/ethernet-nic-controllers/bcm957xxx/adapters/Configuration-adapter/RoCE/manually-reconfiguring-network-parameters/enable-rdma-and-disable-dcbx.html
Hi Daniel, any good news?
(In reply to Zhenlei Huang from comment #23)
Hi,

No crashes since applying D48495.diff, D48496.diff and D48589.diff, so it seems to have worked.
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1c465e52920848dec6a76f0672fa209db7d5e5b5

commit 1c465e52920848dec6a76f0672fa209db7d5e5b5
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2025-01-20 13:24:48 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2025-01-27 09:04:31 +0000

    netinet: enter epoch in garp_rexmit()

    garp_rexmit() is a callback, so is not in net_epoch, which
    arprequest_internal() expects. Enter and exit the net_epoch.

    PR:             284073
    MFC after:      1 week
    Sponsored by:   Rubicon Communications, LLC ("Netgate")

    (cherry picked from commit 38fdcca05d09b4d5426a253d3c484f9481a73ac2)

 sys/netinet/if_ether.c | 3 +++
 1 file changed, 3 insertions(+)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e69309223199e56397df3b6f750f012eb729d904

commit e69309223199e56397df3b6f750f012eb729d904
Author:     Kristof Provost <kp@FreeBSD.org>
AuthorDate: 2025-01-20 13:24:48 +0000
Commit:     Kristof Provost <kp@FreeBSD.org>
CommitDate: 2025-01-27 09:04:34 +0000

    netinet: enter epoch in garp_rexmit()

    garp_rexmit() is a callback, so is not in net_epoch, which
    arprequest_internal() expects. Enter and exit the net_epoch.

    PR:             284073
    MFC after:      1 week
    Sponsored by:   Rubicon Communications, LLC ("Netgate")

    (cherry picked from commit 38fdcca05d09b4d5426a253d3c484f9481a73ac2)

 sys/netinet/if_ether.c | 3 +++
 1 file changed, 3 insertions(+)