When using GELI with autodetach enabled (which it is by default when using geli_devices in rc.conf), ZFS can trigger a kernel panic.
Steps to reproduce:
- Create a VM with FreeBSD 12.0-RELEASE rootfs and two attached drives, which I'll be calling ada0 and ada1 for simplicity
- Format the devices with geli init with no passphrase:
# dd if=/dev/random of=/root/k bs=64 count=1
# geli init -PK /root/k ada0
# geli init -PK /root/k ada1
- Attach the devices and set up a mirrored zpool (I don't know if the mirroring is needed; this is just the setup I had when I discovered the bug):
# geli attach -pk /root/k ada0
# geli attach -pk /root/k ada1
# zpool create pool mirror ada0.eli ada1.eli
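For completeness, the state at this point can be confirmed before rebooting (standard commands, nothing bug-specific):

```shell
# Verify the mirror is ONLINE on top of the .eli providers
zpool status pool
# Verify both GELI providers are attached
geli status
```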
- Ensure zfs and geli load at boot:
# cat >> /boot/loader.conf <<END
geom_eli_load="YES"
zfs_load="YES"
END
# cat >> /etc/rc.conf <<END
zfs_enable="YES"
geli_devices="ada0 ada1"
geli_ada0_flags="-p -k /root/k"
geli_ada1_flags="-p -k /root/k"
END
- Reboot the VM and run `zpool status`
- GELI and ZFS initially work, and it shows the status of `pool`
- The kernel then panics in vdev_dtl_reassess
I don't seem to have a way to gather the crash log from this VM (I'm using VMware Player on Linux to reproduce at the moment); otherwise I'd attach it.
- Running stock amd64 FreeBSD 12.0-RELEASE on a fresh installation.
Created attachment 203438 [details]
Core dump log
Found a core.txt from the live system where I first ran into this. Also tried reproducing it with only one geli device/vdev, which crashes similarly.
Just ran into a similar panic, but with ggate and `ggatec destroy -f`, i.e. force-destroying the ggate device that a zfs/zpool command wants to write to.
There's a ggate-specific wrinkle here: because of a bug in ggatec, it's possible for a read or write request to end up dangling forever. So, for example, a `zpool create` will never finish.
To get out of that sticky situation you might be inclined to run `ggatec destroy -f -u 0`. That triggers the kernel panic.
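The ggate-based reproduction described above sketches out roughly like this (the remote host name and device path are illustrative; the remote side is assumed to be exporting the device via ggated):

```shell
# Import a remote device as /dev/ggate0
ggatec create -u 0 remotehost /dev/da1
# This may hang forever if a request ends up dangling (the ggatec bug)
zpool create gpool ggate0
# Force-destroying the device out from under ZFS panics in vdev_dtl_reassess
ggatec destroy -f -u 0
```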
Unread portion of the kernel message buffer:
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 6 (solthread 0xfffffff)
trap number = 12
panic: page fault
cpuid = 0
time = 1627668025
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00511a8720
vpanic() at vpanic+0x17b/frame 0xfffffe00511a8770
panic() at panic+0x43/frame 0xfffffe00511a87d0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe00511a8830
trap_pfault() at trap_pfault+0x66/frame 0xfffffe00511a8880
trap() at trap+0x4f7/frame 0xfffffe00511a8990
calltrap() at calltrap+0x8/frame 0xfffffe00511a8990
--- trap 0xc, rip = 0xffffffff803e838c, rsp = 0xfffffe00511a8a60, rbp = 0xfffffe00511a8ad0 ---
vdev_dtl_reassess() at vdev_dtl_reassess+0x11c/frame 0xfffffe00511a8ad0
vdev_dtl_reassess() at vdev_dtl_reassess+0x89/frame 0xfffffe00511a8b50
spa_vdev_state_exit() at spa_vdev_state_exit+0x127/frame 0xfffffe00511a8b80
spa_async_thread_vd() at spa_async_thread_vd+0xe0/frame 0xfffffe00511a8bb0
fork_exit() at fork_exit+0x85/frame 0xfffffe00511a8bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00511a8bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
This is on stable-12, commit 7fa95d69f10827d0b02607682a2c4a1513d658e5, with a custom stripped-down kernel, on amd64, built with DIAGNOSTIC, INVARIANTS, and related debug options.
I have kgdb on this box/vm.
It's very reproducible.
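Since kgdb is available here, the faulting frame can be inspected directly from the dump; a sketch, assuming the dump landed in the default /var/crash and is the first core:

```shell
# Open the kernel image against the crash dump
kgdb /boot/kernel/kernel /var/crash/vmcore.0
# Then, at the (kgdb) prompt:
#   bt              # should show the vdev_dtl_reassess frames as above
#   frame <n>       # select the vdev_dtl_reassess frame (number varies)
#   info locals     # inspect the vdev being reassessed
```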