Bug 237067 - ZFS: Crash in vdev_dtl_reassess when using GELI with autodetach
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-fs (Nobody)
URL:
Keywords: needs-qa, panic
Depends on:
Blocks:
 
Reported: 2019-04-07 04:48 UTC by vi
Modified: 2021-07-30 18:14 UTC

See Also:
koobs: mfc-stable12?
koobs: mfc-stable11?


Attachments
Core dump log (96.20 KB, text/plain)
2019-04-07 05:15 UTC, vi

Description vi 2019-04-07 04:48:15 UTC
When using GELI with autodetach enabled (which is the default when using geli_devices in rc.conf), ZFS can trigger a panic.
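
Since the panic is reported with autodetach enabled, one thing worth trying is to turn autodetach off and see whether the panic still occurs. The fragment below is a minimal sketch using the `geli_autodetach` knob from rc.conf(5). It has not been tested against this bug, so treat it as a diagnostic step rather than a confirmed workaround.

```sh
# /etc/rc.conf fragment (sketch; untested against this bug)
geli_autodetach="NO"         # global: do not mark GELI providers for detach on last close
# geli_ada0_autodetach="NO"  # per-device alternative; ada0 is the device name used in the steps below
```

If the panic goes away with this setting, that would point at the detach-on-last-close path rather than at GELI in general.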

Steps to reproduce:

- Create a VM with FreeBSD 12.0-RELEASE rootfs and two attached drives, which I'll be calling ada0 and ada1 for simplicity
- Format the devices with geli init with no passphrase:
# dd if=/dev/random of=/root/k bs=64 count=1
# geli init -PK /root/k ada0
# geli init -PK /root/k ada1
- Attach the devices and set up a mirrored zpool (I don't know if the mirroring is needed; this is just what my setup was when I discovered it):
# geli attach -pk /root/k ada0
# geli attach -pk /root/k ada1
# zpool create pool mirror ada0.eli ada1.eli
- Ensure zfs and geli load at boot:
# cat >> /boot/loader.conf <<END
zfs_load="YES"
geom_eli_load="YES"
END
# cat >> /etc/rc.conf <<END
geli_devices="ada0 ada1"
geli_ada0_flags="-p -k /root/k"
geli_ada1_flags="-p -k /root/k"
END
- Reboot the VM and run `zpool status`

Expected results:
- GELI and ZFS work and it shows the status of `pool`

Actual results:
- Kernel panic in vdev_dtl_reassess

I don't seem to have a way to gather the crashlog from this VM (I'm using VMware Player on Linux to reproduce at the moment); otherwise I'd attach it.
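
For anyone reproducing this in a VM, kernel crash dumps can usually be captured inside the guest itself. The following is a minimal sketch of the relevant rc.conf settings. It assumes the VM has a swap device large enough to hold the dump, which may not be true of every test image.

```sh
# /etc/rc.conf fragment (sketch; assumes a swap device large enough for a kernel dump)
dumpdev="AUTO"        # let the system pick a swap device as the dump device
dumpdir="/var/crash"  # directory where savecore(8) stores vmcore.N on the next boot
```

After a panic and reboot, savecore(8) should extract the dump into that directory, and with the default crashinfo settings a core.txt.N summary is written next to it.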

Configuration:
- Running stock amd64 FreeBSD 12.0-RELEASE on a fresh installation.
Comment 1 vi 2019-04-07 05:15:02 UTC
Created attachment 203438
Core dump log

Found a core.txt from the live system where I first ran into this. Also tried reproducing it with only one geli device/vdev, which also crashes similarly.
Comment 2 jo 2021-07-30 18:14:24 UTC
Just ran into a similar panic, but with ggate and `ggatec destroy -f`, i.e. force-destroying the ggate device that a zfs/zpool command wants to write to.

There's a ggate-specific thing here: because of a bug in ggatec, it's possible for a read or write request to end up dangling forever. So, for example, a `zpool create` will never finish.
To get out of that sticky situation you might be inclined to run `ggatec destroy -f -u 0`. That will trigger the kernel panic.

Unread portion of the kernel message buffer:
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 6 (solthread 0xfffffff)
trap number             = 12
panic: page fault
cpuid = 0
time = 1627668025
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00511a8720
vpanic() at vpanic+0x17b/frame 0xfffffe00511a8770
panic() at panic+0x43/frame 0xfffffe00511a87d0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe00511a8830
trap_pfault() at trap_pfault+0x66/frame 0xfffffe00511a8880
trap() at trap+0x4f7/frame 0xfffffe00511a8990
calltrap() at calltrap+0x8/frame 0xfffffe00511a8990
--- trap 0xc, rip = 0xffffffff803e838c, rsp = 0xfffffe00511a8a60, rbp = 0xfffffe00511a8ad0 ---
vdev_dtl_reassess() at vdev_dtl_reassess+0x11c/frame 0xfffffe00511a8ad0
vdev_dtl_reassess() at vdev_dtl_reassess+0x89/frame 0xfffffe00511a8b50
spa_vdev_state_exit() at spa_vdev_state_exit+0x127/frame 0xfffffe00511a8b80
spa_async_thread_vd() at spa_async_thread_vd+0xe0/frame 0xfffffe00511a8bb0
fork_exit() at fork_exit+0x85/frame 0xfffffe00511a8bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00511a8bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---


This is on stable-12, commit 7fa95d69f10827d0b02607682a2c4a1513d658e5, with a custom stripped down kernel, on amd64, built with DIAGNOSTIC and INVARIANTS and stuff.

I have kgdb on this box/vm.

It's very reproducible.