Kernel modules may reference SDT probes defined in another module (or in the kernel itself). A specific example is the set of mbuf probes in <sys/mbuf.h> for functions like m_get(). Kernel modules that use these inline functions include a tracepoint that gets registered during kldload in sdt_kld_load_probes(). However, sdt_kldunload_try() doesn't clean up any of the state initialized in sdt_kld_load_probes(), only the state initialized in sdt_kld_load_providers(). As a result, unloading a module can leave dangling pointers behind (e.g. in tp->probe->tracepoint_list).

The panic I've seen occurs when re-loading a previously unloaded module: it crashes in sdt_kld_load_probes() when it follows an invalid pointer in the STAILQ_INSERT_TAIL of the tracepoint_list. However, that panic is a bit finicky and not easy to reproduce. A simpler reproducer is below:

kldload sdt
kldload nvmft
kldunload nvmft
dtrace -n m-get

Panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 6; apic id = 06
fault virtual address = 0xffffffff8283d008
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff82816b96
stack pointer = 0x28:0xfffffe00dc1e9730
frame pointer = 0x28:0xfffffe00dc1e9740
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 1115 (dtrace)
rdi: 0000000000000001 rsi: ffffffff80f3a4fc rdx: 000000000000000f
rcx: 0000000080040033 r8: 0000000000000016 r9: 00000000000f4240
rax: 0000000080050033 rbx: fffffe00dc1e98e8 rbp: fffffe00dc1e9740
r10: 0000000000000000 r11: 0000000000000000 r12: 0000000000000000
r13: ffffffff82816b20 r14: ffffffff8283d000 r15: 0000000000000000
trap number = 12
panic: page fault
cpuid = 6
time = 1727901518
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00dc1e9400
vpanic() at vpanic+0x13f/frame 0xfffffe00dc1e9530
panic() at panic+0x43/frame 0xfffffe00dc1e9590
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00dc1e95f0
trap_pfault() at trap_pfault+0xa0/frame 0xfffffe00dc1e9660
calltrap() at calltrap+0x8/frame 0xfffffe00dc1e9660
--- trap 0xc, rip = 0xffffffff82816b96, rsp = 0xfffffe00dc1e9730, rbp = 0xfffffe00dc1e9740 ---
sdt_probe_update_cb() at sdt_probe_update_cb+0x76/frame 0xfffffe00dc1e9740
smp_rendezvous_action() at smp_rendezvous_action+0x9d/frame 0xfffffe00dc1e9780
smp_rendezvous_cpus() at smp_rendezvous_cpus+0x145/frame 0xfffffe00dc1e9840
smp_rendezvous() at smp_rendezvous+0x34/frame 0xfffffe00dc1e98d0
sdt_enable() at sdt_enable+0xae/frame 0xfffffe00dc1e9910
dtrace_ecb_create_enable() at dtrace_ecb_create_enable+0xee8/frame 0xfffffe00dc1e99a0
dtrace_match() at dtrace_match+0x444/frame 0xfffffe00dc1e9a80
dtrace_enabling_match() at dtrace_enabling_match+0xc8/frame 0xfffffe00dc1e9b10
dtrace_ioctl() at dtrace_ioctl+0x178b/frame 0xfffffe00dc1e9c00
devfs_ioctl() at devfs_ioctl+0xd1/frame 0xfffffe00dc1e9c50
vn_ioctl() at vn_ioctl+0xbc/frame 0xfffffe00dc1e9cc0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe00dc1e9ce0
kern_ioctl() at kern_ioctl+0x286/frame 0xfffffe00dc1e9d40
sys_ioctl() at sys_ioctl+0x12d/frame 0xfffffe00dc1e9e00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe00dc1e9f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00dc1e9f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0xc9cf811a9fa, rsp = 0xc9ced0c5c28, rbp = 0xc9ced0c5c70 ---
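To illustrate why the missing unload cleanup matters, here is a minimal userspace sketch in C. It is not the sdt(4) code: struct probe, struct tracepoint, probe_add_tracepoint() and probe_remove_module() are hypothetical stand-ins that model a probe's tracepoint list as a singly linked tail queue with a tail pointer, as STAILQ does (the list is hand-rolled rather than taken from <sys/queue.h> so the sketch stays self-contained). Appending on load writes through the tail pointer, so if an unload leaves that pointer, or any link, aimed at the departed module's memory, the next load writes through a dangling pointer. The unload routine below unlinks every entry owned by the module and recomputes the tail.

```c
#include <stddef.h>
#include <string.h>

/*
 * Simplified, hypothetical model of a probe's tracepoint list.
 * These are NOT the kernel's struct sdt_probe / struct sdt_tracepoint.
 */
struct tracepoint {
	const char *module;        /* module whose memory holds this entry */
	struct tracepoint *next;   /* link in the probe's tracepoint list */
};

struct probe {
	struct tracepoint *first;  /* head of the list */
	struct tracepoint **last;  /* address of the final link, as in STAILQ */
};

static void
probe_init(struct probe *p)
{
	p->first = NULL;
	p->last = &p->first;
}

/* Load path: append a tracepoint (cf. STAILQ_INSERT_TAIL). */
static void
probe_add_tracepoint(struct probe *p, struct tracepoint *tp)
{
	tp->next = NULL;
	*p->last = tp;          /* writes through the tail pointer */
	p->last = &tp->next;
}

/*
 * Unload path: unlink every tracepoint owned by `module` and return how
 * many were removed. Skipping this step would leave links (and possibly
 * p->last) pointing into memory that belonged to the unloaded module.
 */
static size_t
probe_remove_module(struct probe *p, const char *module)
{
	struct tracepoint **link = &p->first;
	size_t removed = 0;

	while (*link != NULL) {
		struct tracepoint *tp = *link;

		if (strcmp(tp->module, module) == 0) {
			*link = tp->next;   /* bypass the departing entry */
			removed++;
		} else {
			link = &tp->next;
		}
	}
	/* `link` now addresses the final NULL link, i.e. the new tail. */
	p->last = link;
	return (removed);
}

/* Count the entries currently on the list. */
static size_t
probe_count(const struct probe *p)
{
	size_t n = 0;

	for (const struct tracepoint *tp = p->first; tp != NULL; tp = tp->next)
		n++;
	return (n);
}
```

For example, appending one kernel-owned tracepoint and two nvmft-owned ones, then calling probe_remove_module(&p, "nvmft"), leaves only the kernel entry with p.last pointing at its next link, so a later reload of the module appends safely instead of writing through a stale tail pointer.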
This seems to reproduce the original panic reported to me:

kldload nvmft
kldload dtraceall
kldunload nvmft
kldload ctl
kldload nvmft

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 0b
fault virtual address = 0xffffffff8281d078
fault code = supervisor write data, protection violation
instruction pointer = 0x20:0xffffffff828f4761
stack pointer = 0x28:0xfffffe00dc30b8f0
frame pointer = 0x28:0xfffffe00dc30ba80
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 1084 (kldload)
rdi: ffffffff8285e70e rsi: ffffffff8285ed86 rdx: 0000000000000000
rcx: ffffffff8281d078 r8: 0000000000000004 r9: 0000000000000000
rax: ffffffff82865018 rbx: ffffffff82865000 rbp: fffffe00dc30ba80
r10: 0000000000010000 r11: 0000000000000001 r12: fffff80008085c00
r13: fffff8013dd2a220 r14: fffff8003cee6628 r15: fffff80003f37000
trap number = 12
panic: page fault
cpuid = 11
time = 1727978657
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00dc30b5c0
vpanic() at vpanic+0x13f/frame 0xfffffe00dc30b6f0
panic() at panic+0x43/frame 0xfffffe00dc30b750
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe00dc30b7b0
trap_pfault() at trap_pfault+0xa0/frame 0xfffffe00dc30b820
calltrap() at calltrap+0x8/frame 0xfffffe00dc30b820
--- trap 0xc, rip = 0xffffffff828f4761, rsp = 0xfffffe00dc30b8f0, rbp = 0xfffffe00dc30ba80 ---
sdt_kld_load_probes() at sdt_kld_load_probes+0x3c1/frame 0xfffffe00dc30ba80
linker_load_module() at linker_load_module+0xe90/frame 0xfffffe00dc30bd80
kern_kldload() at kern_kldload+0x16e/frame 0xfffffe00dc30bdd0
sys_kldload() at sys_kldload+0x5c/frame 0xfffffe00dc30be00
amd64_syscall() at amd64_syscall+0x158/frame 0xfffffe00dc30bf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00dc30bf30
--- syscall (304, FreeBSD ELF64, kldload), rip = 0x1085b13898da, rsp = 0x1085aeefe008, rbp = 0x1085aeefe580 ---
The patch under review fixes both panics for me.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=47f49dd4bbb4a72e53d31046964ce3c111ee0d12

commit 47f49dd4bbb4a72e53d31046964ce3c111ee0d12
Author:     John Baldwin <jhb@FreeBSD.org>
AuthorDate: 2024-10-16 17:50:37 +0000
Commit:     John Baldwin <jhb@FreeBSD.org>
CommitDate: 2024-10-16 17:50:37 +0000

    sdt: Tear down probes in kernel modules during kldunload

    Previously only providers in kernel modules were removed leaving
    dangling pointers to tracepoints, etc. in unloaded kernel modules.

    PR:             281825
    Reported by:    Sony Arpita Das <sonyarpitad@chelsio.com>
    Reviewed by:    markj
    Fixes:          ddf0ed09bd8f sdt: Implement SDT probes using hot-patching
    Sponsored by:   Chelsio Communications
    Differential Revision:  https://reviews.freebsd.org/D46890

 sys/cddl/dev/sdt/sdt.c | 111 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 102 insertions(+), 9 deletions(-)
I believe this can be closed, as the commit which introduced the bug was not MFCed.