Created attachment 244458 [details] core.txt file for crash
I have an -unmatched RISC-V board. It has public v4 and v6 addresses and answers ssh and web requests, but was "largely idle" when this happened. Here's the stack trace. I will attach the core.txt.
FreeBSD 14.0-CURRENT riscv64 1400093 #0 main-n264297-b03012d0b600-dirty: Wed Jul 26 04:04:24 EDT 2023
cpuid = 2
time = 1693192346
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x36
kdb_backtrace() at kdb_backtrace+0x2c
vpanic() at vpanic+0x116
panic() at panic+0x26
trash_ctor() at trash_ctor+0x4a
item_ctor() at item_ctor+0xb8
uma_zalloc_arg() at uma_zalloc_arg+0xbc
.Lpcrel_hi1350() at .Lpcrel_hi1350+0xe
dbuf_create_bonus() at dbuf_create_bonus+0x44
.Lpcrel_hi44() at .Lpcrel_hi44+0x26
.Lpcrel_hi51() at .Lpcrel_hi51+0x26
.Lpcrel_hi284() at .Lpcrel_hi284+0x28
.Lpcrel_hi2() at .Lpcrel_hi2+0x76
.Lpcrel_hi10() at .Lpcrel_hi10+0x56
.Lpcrel_hi783() at .Lpcrel_hi783+0x3e
.Lpcrel_hi383() at .Lpcrel_hi383+0x4a
VOP_CACHEDLOOKUP_APV() at VOP_CACHEDLOOKUP_APV+0x32
vfs_cache_lookup() at vfs_cache_lookup+0xa4
VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x32
cache_fplookup_noentry() at cache_fplookup_noentry+0x1d4
cache_fplookup() at cache_fplookup+0x4e2
namei() at namei+0x144
kern_statat() at kern_statat+0xd6
sys_fstatat() at sys_fstatat+0x1c
do_trap_user() at do_trap_user+0x236
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- syscall (552, FreeBSD ELF64, fstatat)
KDB: enter: panic
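For comparing this trace against later ones, a minimal sketch of a frame parser may be handy. It assumes each frame has the form `name() at name+0xOFF` as in the trace above; `parse_backtrace` and `unresolved` are hypothetical helper names, and the sample is an abbreviated excerpt of the trace, not the full one. The `.Lpcrel_hi*` entries look like compiler-generated local labels rather than function names, so the sketch flags them as frames whose symbols were probably not resolved.

```python
import re

# One frame per line: "name() at name+0xOFF" (the offset may be absent).
FRAME_RE = re.compile(
    r"^(?P<func>\S+)\(\) at (?P<sym>[^\s+]+)(?:\+(?P<off>0x[0-9a-fA-F]+))?$"
)


def parse_backtrace(text):
    """Return a list of (function, offset) tuples from a KDB backtrace."""
    frames = []
    in_trace = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("KDB: stack backtrace:"):
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.match(line)
        if match is None:
            break  # e.g. the "--- syscall (...)" line ends the frame list
        offset = int(match.group("off"), 16) if match.group("off") else 0
        frames.append((match.group("func"), offset))
    return frames


def unresolved(frames):
    """Frames named after local labels (".L...") instead of functions."""
    return [name for name, _ in frames if name.startswith(".L")]


# Abbreviated excerpt of the trace above (frames in between are omitted).
SAMPLE = """\
KDB: stack backtrace:
db_trace_self() at db_trace_self
vpanic() at vpanic+0x116
trash_ctor() at trash_ctor+0x4a
uma_zalloc_arg() at uma_zalloc_arg+0xbc
.Lpcrel_hi1350() at .Lpcrel_hi1350+0xe
dbuf_create_bonus() at dbuf_create_bonus+0x44
--- syscall (552, FreeBSD ELF64, fstatat)
"""

frames = parse_backtrace(SAMPLE)
```

Running the same parser over two core.txt files and comparing the resulting lists is one way to check how many frames two crashes share.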
Created attachment 244459 [details] info file for crash I have the vmcore and debug kernel files available on request.
^Triage: assign properly.
Created attachment 244713 [details] Another crash on another day: the core.txt file. I have another largely idle crash. It shares almost all the stack frames with the first one submitted. As before, the dump and a debug kernel are available on request.
Is it ZFS related? Did you try disabling it?
(In reply to Ruslan Bukin from comment #4) This is ZFS-root, so I can't easily disable ZFS. Another crash with an almost identical signature just occurred.
Created attachment 248018 [details] another kernel core with a different signature. Here is another (fairly deep stack) core from the same machine.
I looked at the backtraces provided, and the allocation appears to belong to the 'dbuf_kmem_cache' UMA zone, in dbuf_create(). Unfortunately, the core.txt does not provide the context of the thread responsible for the store-after-free, so there is not enough here to reconstruct what might have happened, or to tell whether this is an OpenZFS bug, an OpenZFS/riscv bug, or a FreeBSD/riscv bug. Commit a03c23931eec (Nov. 2023) adds additional information to the panic message, which would help confirm some details of the allocation in question, including the offset of the store-after-free. If you update past this point, it would aid further diagnosis. Also, including the output of the 'alltrace' ddb command after the panic _might_ help. https://cgit.freebsd.org/src/commit/?id=a03c23931eec567b0957c2a0b1102dba8d538d98
(In reply to Mitchell Horne from comment #7) Okay, I spoke too soon re: the offset; it is trivially calculated as 0x908 - 0x7f8 = 0x188 = 392. According to gdb, the struct dmu_buf_impl member at offset 392, for your revision, is db_user, an 8-byte pointer. The expected contents of uninitialized memory are 0xdeadc0dedeadc0de, but your reports consistently show the value at the affected address as 0x00000000de00c0de. So it is only partially overwritten, and therefore not an abuse of the db_user field in a dmu_buf_impl_t object. So, I'm thinking this allocation missed the zone's cache (which was empty), and the memory could have belonged to anything before that, meaning the use-after-free could exist anywhere... I'll have to see what other tips I can learn to help identify this. On other platforms we could use KASAN, but for riscv it is not implemented yet.
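To make "only partially overwritten" concrete, here is a small sketch that compares the reported value with the junk pattern byte by byte. It assumes the 64-bit pattern is 0xdeadc0de repeated twice (0xdeadc0dedeadc0de) and that byte 0 is the least significant byte, as on little-endian riscv64; `differing_bytes` is a hypothetical helper name.

```python
# Assumption: the 64-bit junk pattern is 0xdeadc0de repeated twice.
EXPECTED_JUNK = 0xDEADC0DEDEADC0DE
# Value consistently reported at the affected address.
OBSERVED = 0x00000000DE00C0DE
# Offset of the affected word inside the item, as given in the comment.
OFFSET = 0x188


def differing_bytes(expected, observed, width=8):
    """Return the byte indexes (0 = least significant) where values differ."""
    diff = expected ^ observed
    return [i for i in range(width) if (diff >> (8 * i)) & 0xFF]


changed = differing_bytes(EXPECTED_JUNK, OBSERVED)
intact = [i for i in range(8) if i not in changed]
```

Under these assumptions, bytes 0, 1 and 3 still hold the junk pattern while bytes 2 and 4 through 7 read as zero, which is not what a store of a full 8-byte pointer into db_user would leave behind.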
Certainly this isn't an unusual configuration for an AMD64 in my world: 16G RAM, 2T NVMe, Ethernet. My home router isn't dissimilar, but it doesn't exhibit this crash. So I'm proceeding with the theory that it's RISC-V related. I've started the few-day-long compile to bring it up to this week. (I like my machines to be self-hosted, which, BTW, is one bug of RISC-V: it needs a patch to be self-hosted.) It would be faster, but I also zapped the ccache out of an abundance of caution. I should be booted on the new load before the weekend. On average, we should know in two weeks or so... maybe four. I might be able to bring it on with a poudriere build.
Created attachment 248101 [details] Another core.txt file from the problem. This file does _not_ reflect an upgrade to post-November sources. It happened while making world.
Created attachment 248266 [details] Core.txt from recent kernel.
Uploading a core.txt from a recent kernel, as requested. As always, I have the core dump and the kernel binaries should someone need to dive in.
[1:5:305]root@ump:/var/crash> uname -a
FreeBSD ump.daveg.ca 15.0-CURRENT FreeBSD 15.0-CURRENT #0 main-n267962-d56a6f0516a7-dirty: Mon Feb 5 20:54:27 EST 2024 root@ump.daveg.ca:/usr/obj/usr/src/riscv.riscv64/sys/GENERIC riscv
...
I caused this dump rather deliberately by getting the machine busy with a poudriere run. Other core.txt's are from a relatively idle machine. Should the machine make an idle dump, I will post it here.