272117 – bnxt: kernel crash with sysctl and jumbo frames

Status:	Open

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	13.1-RELEASE
Hardware:	Any Any

Importance:	--- Affects Only Me
Assignee:	freebsd-net (Nobody)

URL:
Keywords:	crash

Depends on:
Blocks:

Reported:	2023-06-21 01:21 UTC by Alan Somers
Modified:	2023-07-03 22:35 UTC
CC List:	5 users

See Also:

Attachments
dmesg on TrueNAS 13.1 (4.52 KB, text/plain) 2023-06-21 16:01 UTC, Nilson Lopes

Description Alan Somers freebsd_committer

2023-06-21 01:21:57 UTC

I can reliably crash the kernel just by doing "sysctl dev.bnxt.0" if the interface has been configured with jumbo frames.  It seems that the trigger is whether the interface has ever been configured with jumbo frames, not whether it currently uses them.  If I boot with jumbo frames, then do "ifconfig lagg0 mtu 1500", I can still trigger the panic.

This happens on a custom kernel build based on 13.1-RELEASE.

/etc/rc.conf:
ifconfig_bnxt0="up"
ifconfig_bnxt3="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp -lacp_fast_timeout 10.2.172.79/23 laggport bnxt0 laggport bnxt3"
vlans_lagg0="173"
ifconfig_lagg0_173="10.2.174.79/23"
defaultrouter="10.2.172.1"

Steps to Reproduce:
==================

$ sysctl dev.bnxt.0
...
dev.bnxt.0.iflib.txq00.cpu: 0
<PANIC>

Stack trace:
============
Fatal trap 12: page fault while in kernel mode
cpuid = 21; apic id = 8a
fault virtual address   = 0xc00000148
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80d6dffb
stack pointer           = 0x28:0xfffffe0d24c4ea90
frame pointer           = 0x28:0xfffffe0d24c4ebd0
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 3220 (sysctl)
trap number             = 12
panic: page fault
cpuid = 21
time = 1687302737
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0d24c4e850
vpanic() at vpanic+0x17f/frame 0xfffffe0d24c4e8a0
panic() at panic+0x43/frame 0xfffffe0d24c4e900
trap_fatal() at trap_fatal+0x385/frame 0xfffffe0d24c4e960
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0d24c4e9c0
calltrap() at calltrap+0x8/frame 0xfffffe0d24c4e9c0
--- trap 0xc, rip = 0xffffffff80d6dffb, rsp = 0xfffffe0d24c4ea90, rbp = 0xfffffe0d24c4ebd0 ---
mp_ndesc_handler() at mp_ndesc_handler+0x7b/frame 0xfffffe0d24c4ebd0
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x90/frame 0xfffffe0d24c4ec20
sysctl_root() at sysctl_root+0x271/frame 0xfffffe0d24c4eca0
userland_sysctl() at userland_sysctl+0x173/frame 0xfffffe0d24c4ed50
sys___sysctl() at sys___sysctl+0x5c/frame 0xfffffe0d24c4ee00
amd64_syscall() at amd64_syscall+0x775/frame 0xfffffe0d24c4ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0d24c4ef30
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x8011a11ca, rsp = 0x7fffffffc5a8, rbp = 0x7fffffffc5e0 ---
KDB: enter: panic

From GDB, it seems that the sysctl that triggers the panic is dev.bnxt.0.iflib.override_nrxds.  In mp_ndesc_handler, the value of ctx->ifc_sctx is 0xc00000000, which doesn't look right: it ought to be a kernel pointer.
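
As a sanity check on the numbers above, here is a minimal C sketch (not code from iflib or bnxt) of the arithmetic. The helper names and the KERNEL_CANONICAL_BASE constant are made up for illustration, and the constant assumes amd64 with 48-bit virtual addresses. The sketch shows that 0xc00000000 falls outside the kernel half of the address space, and that the fault address is exactly 0x148 bytes past it, which would be consistent with the handler reading a field at that offset through the bogus pointer. Which field that is has not been determined here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: amd64 with 48-bit virtual addresses, where the kernel
 * (upper canonical) half starts at this address.
 */
#define KERNEL_CANONICAL_BASE 0xffff800000000000ULL

/* True if addr lies in the upper canonical half, i.e. could be a kernel pointer. */
static inline bool
looks_like_kernel_pointer(uint64_t addr)
{
	return (addr >= KERNEL_CANONICAL_BASE);
}

/* Distance of the faulting access from the suspect base pointer. */
static inline uint64_t
offset_from_base(uint64_t fault_addr, uint64_t base)
{
	assert(fault_addr >= base);
	return (fault_addr - base);
}
```

With the values from the trap report, looks_like_kernel_pointer(0xc00000000) is false, whereas the stack pointer 0xfffffe0d24c4ea90 passes the same check, and offset_from_base(0xc00000148, 0xc00000000) yields 0x148.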

Comment 1 Nilson Lopes 2023-06-21 16:01:03 UTC

Created attachment 242925
dmesg on TrueNAS 13.1

Comment 2 Nilson Lopes 2023-06-21 16:18:15 UTC

I am experiencing a similar issue with a Broadcom BCM57414 on TrueNAS 13. Although jumbo frames are not configured, the NIC with the latest firmware (225.1.95.0) encounters problems. Interestingly, another card with older firmware (214.4.42) works properly.
I am now testing different firmware versions to find the newest one that works.

Comment 3 Graham Perrin freebsd_committer

2023-06-22 04:10:21 UTC

I see bnxt(4)-related work by Sumit Saxena <https://freshbsd.org/freebsd/src?q=bnxt&author%5B%5D=Sumit+Saxena>, and Bugzilla suggests a FreeBSD email address for this author, however: 

a) at least for recent commits, the author's address is @broadcom.com; and 

b) <https://docs.freebsd.org/en/articles/contributors/> does not list the name.

@bsdimp please, should Sumit be informed of this bug 272117 and nearby bug 272119?

Comment 4 Warner Losh freebsd_committer

2023-06-22 04:24:53 UTC

Sumit should be added.

This also points out we're not updating contributors very well... but that's a separate issue...

Comment 5 Graham Perrin freebsd_committer

2023-06-22 19:12:56 UTC

(In reply to Warner Losh from comment #4)

Thanks. Bugzilla does not recognise the address @broadcom.com, so I sent an email drawing attention to bugs 272117 and 272119.

Comment 6 Alan Somers freebsd_committer

2023-07-03 22:35:44 UTC

I can't reproduce the crash on 13.2-RELEASE.  However, I also can't exercise the NICs very well due to bug #269133.