repeatable panic on 1st reboot after install

cpuid = 1
time = 3
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x38
vpanic() at vpanic+0x1ac
spl_panic() at spl_panic+0x44
metaslab_fini() at metaslab_fini+0x474
vdev_metaslab_init() at vdev_metaslab_init+0x208
vdev_load() at vdev_load+0x78c
vdev_load_child() at vdev_load_child+0x14
taskq_run() at taskq_run+0x24
taskqueue_run_locked() at taskqueue_run_locked+0x17c
taskqueue_thread_loop() at taskqueue_thread_loop+0xc0
fork_exit() at fork_exit+0x78
fork_trampoline() at fork_trampoline+0x18
KDB: enter: panic
[ thread pid 5 tid 100150 ]
Stopped at kdb_enter+0x48: str xzr, [x19, #2048]
db>

image is a built-from-sources (make release ...) arm64 using makefs
- FreeBSD-15.0-CURRENT-arm64-aarch64-20240910-0871d4d-zfs

will try to reproduce on "vanilla" CURRENT shortly.
I cannot reproduce this in qemu arm64, but made some progress on the OCI Ampere Altra VMs:

- this issue has been around for a while, last 2 zfs merges do not appear to prevent the post-reboot panic

## working around the panic

- it is not sufficient just to wait 500 seconds
- nor is it enough to do some zfs & zpool transactions like bectl create/activate/destroy
- but unpacking e.g. base.txz into a temporary dataset *is* enough

I will pull down a borked zpool for reference
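For anyone who wants to try the workaround above, it could be scripted roughly like this. This is a minimal sketch, not the exact commands used on the OCI VMs: the pool name zroot, the dataset name zroot/unpack-tmp, the location /tmp/base.txz, and destroying the dataset afterwards are all assumptions. It only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the workaround: unpack base.txz into a throwaway dataset
# before the first reboot. POOL, DATASET and DIST are assumptions, not
# values taken from this report.
set -e
POOL="${POOL:-zroot}"
DATASET="${POOL}/unpack-tmp"
DIST="${DIST:-/tmp/base.txz}"

# Print each command; only execute it when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

workaround() {
    # Create the temporary dataset with an explicit mountpoint.
    run zfs create -o mountpoint="/${DATASET}" "${DATASET}"
    # Generate some real write load by extracting the base set into it.
    run tar -xf "${DIST}" -C "/${DATASET}"
    # Clean up again (assumed; the report only says "temporary").
    run zfs destroy -r "${DATASET}"
}

workaround
```

Run it once as a dry run to check the paths, then again as root with RUN=1 before rebooting.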
Created attachment 253610 dmesg + early boot console
I added the post-corruption zpool here, reminder it's an arm64 boot image that I see this assert on.

https://skunkwerks.at/~dch/OpenZFS/borked-PR281520.zpool.qcow2.xz

#openzfs irc commented:

> i would suggest doing, is setting compatibility= and doing zpool upgrade with different featuresets to narrow down what state might be going
> or just backing up the cloud disk images before first boot so you can compare on disk state when it worked and didn't
> it doesn't seem to easily reproduce, but something to keep in mind, the VERIFY that's tripping is an ASSERT, so it won't trigger on non-debug builds
> but i can't immediately obviously reproduce it on my pi 4
> something that would be useful, is if you can try only enabling spacemap_v2 or log_spacemap, rather than just zpool upgrade -a
> and then seeing if it still breaks.
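The last IRC suggestion (enable one feature at a time instead of zpool upgrade -a) might look like the sketch below. The pool name zroot and the FEATURE variable are assumptions for illustration; the feature@...=enabled syntax is the one described in zpool-features(7). It only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch of the bisection suggested on IRC: enable a single pool feature
# rather than running "zpool upgrade -a". POOL is an assumption.
set -e
POOL="${POOL:-zroot}"

# Print each command; only execute it when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Show the current state of the two features under suspicion.
show_features() {
    run zpool get feature@spacemap_v2,feature@log_spacemap "${POOL}"
}

# Enable exactly one named feature on the pool.
enable_feature() {
    run zpool set "feature@$1=enabled" "${POOL}"
}

show_features
enable_feature "${FEATURE:-spacemap_v2}"
```

Starting from a fresh copy of the image each time, run it with FEATURE=spacemap_v2, reboot, and check whether the panic appears; then repeat with FEATURE=log_spacemap. Per zpool-features(7), log_spacemap depends on spacemap_v2, so the second run cannot test log_spacemap entirely in isolation.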
how is this built?

- from existing 15.0-CURRENT arm64 box (should work from 14.1-RELEASE too but I am on current here)
- the steps below should also be usable on amd64 FreeBSD if that helps, it will still produce the correct arm64 image
- clone https://git.sr.ht/~dch/src main branch, into /usr/src,
- switch to commit #f7639cff05f63cfe38532bd70e33a890e1fe6b53
- run as root

# export SRCCONF=/dev/null
# export SRC_ENV_CONF=/dev/null
# make -j2C buildworld TARGET_ARCH=aarch64 TARGET=arm64 -s
# make -j2C buildkernel TARGET_ARCH=aarch64 TARGET=arm64 KERNCONF=GENERIC -s
# cd ./release
# make -j2C clean
# make -DNOPORTS -DNOSRC \
    WITHOUT_DEBUG_FILES=YES WITHOUT_KERNEL_SYMBOLS=YES \
    WITHOUT_LIB32=YES WITHOUT_TESTS=YES \
    KERNCONF=GENERIC \
    TARGET_ARCH=aarch64 TARGET=arm64 \
    WITH_CLOUDWARE=yes \
    CLOUDWARE=OCI -s cloudware-release

there is now a /usr/obj/projects/oci/14.1-RELEASE/arm64.aarch64/release/oci.zfs.raw

this file is converted to qemu for compression, before cloud upload

qemu-img convert -S 512b -p -O qcow2 -c -o compression_type=zstd \
    /usr/obj/projects/oci/14.1-RELEASE/arm64.aarch64/release/oci.zfs.raw \
    /tmp/oci.zfs.qcow2

I tested this (without seeing the same problem) via qemu on a fast amd64:

$ qemu-system-aarch64 \
    -m 4096M -cpu cortex-a57 -smp cores=4 -M virt -nodefaults \
    -bios edk2-aarch64-code.fd \
    -serial telnet::4444,server \
    -nographic -monitor none -vga none \
    -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0 \
    -rtc base=utc \
    -drive if=none,file=/tmp/FreeBSD-15.0-CURRENT-arm64-aarch64-20240916-f7639cf-zfs.qcow2,id=hd0 \
    -device virtio-blk-device,drive=hd0 \
    -snapshot
NB the original qcow2 image (before corruption occurs) is here: https://skunkwerks.at/~dch/OpenZFS/FreeBSD-15.0-CURRENT-arm64-aarch64-20240916-f7639cf-zfs.qcow2
^Triage: I'm not seeing a proposed fix yet, so "In Progress" may be premature.