237067 – ZFS: Crash in vdev_dtl_reassess when using GELI with autodetach

Summary: ZFS: Crash in vdev_dtl_reassess when using GELI with autodetach

Status:	Open

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	12.0-RELEASE
Hardware:	Any Any

Importance:	--- Affects Some People
Assignee:	freebsd-fs (Nobody)

URL:
Keywords:	crash, needs-qa

Depends on:
Blocks:

Reported:	2019-04-07 04:48 UTC by vi
Modified:	2024-10-03 04:10 UTC
CC List:	2 users

See Also:

Attachments
Core dump log (96.20 KB, text/plain) 2019-04-07 05:15 UTC, vi	no flags

Description vi 2019-04-07 04:48:15 UTC

When GELI providers have autodetach enabled (the default for devices listed in geli_devices in rc.conf), ZFS can trigger a kernel panic.
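One way to check whether autodetach is really the trigger, offered only as an untested sketch: FreeBSD's /etc/defaults/rc.conf sets geli_autodetach="YES", which marks the providers to detach on last close once all file systems are mounted. Overriding it in /etc/rc.conf should keep the providers attached; whether that avoids the panic has not been verified here.

```sh
# /etc/rc.conf fragment (untested sketch): override the default
# geli_autodetach="YES" so GELI providers are not marked to detach on last close.
geli_autodetach="NO"
```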

Steps to reproduce:

- Create a VM with a FreeBSD 12.0-RELEASE root file system and two additional drives, which I'll call ada0 and ada1 for simplicity.
- Initialize both devices with `geli init`, using a key file and no passphrase:
# dd if=/dev/random of=/root/k bs=64 count=1
# geli init -PK /root/k ada0
# geli init -PK /root/k ada1
- Attach the devices and create a mirrored pool (I don't know whether the mirror is required; it's just the setup I had when I discovered the problem):
# geli attach -pk /root/k ada0
# geli attach -pk /root/k ada1
# zpool create pool mirror ada0.eli ada1.eli
- Make sure ZFS and GELI are loaded at boot and the GELI devices are attached:
# cat >> /boot/loader.conf <<END
zfs_load="YES"
geom_eli_load="YES"
END
# cat >> /etc/rc.conf <<END
geli_devices="ada0 ada1"
geli_ada0_flags="-p -k /root/k"
geli_ada1_flags="-p -k /root/k"
END
- Reboot the VM and run `zpool status`

Expected results:
- GELI and ZFS work, and `zpool status` shows the status of `pool`.

Actual results:
- Kernel panic in vdev_dtl_reassess

I don't seem to have a way to gather the crash log from this VM (I'm currently reproducing it with VMware Player on Linux); otherwise I'd attach it.

Configuration:
- Running stock amd64 FreeBSD 12.0-RELEASE on a fresh installation.

Comment 1 vi 2019-04-07 05:15:02 UTC

Created attachment 203438
Core dump log

I found a core.txt from the live system where I first ran into this. I also tried reproducing it with a single GELI device/vdev, which crashes in the same way.

Comment 2 jo 2021-07-30 18:14:24 UTC

I just ran into a similar panic with ggate, triggered by `ggatec destroy -f`, i.e. force-destroying the ggate device that a zfs/zpool command wants to write to.

There's a ggate-specific aspect here: because of a bug in ggatec, a read or write request can end up dangling forever, so that, for example, a `zpool create` never finishes.
To get out of that situation, you might be inclined to run `ggatec destroy -f -u 0`. That triggers the kernel panic.

Unread portion of the kernel message buffer:
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 6 (solthread 0xfffffff)
trap number             = 12
panic: page fault
cpuid = 0
time = 1627668025
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00511a8720
vpanic() at vpanic+0x17b/frame 0xfffffe00511a8770
panic() at panic+0x43/frame 0xfffffe00511a87d0
trap_fatal() at trap_fatal+0x391/frame 0xfffffe00511a8830
trap_pfault() at trap_pfault+0x66/frame 0xfffffe00511a8880
trap() at trap+0x4f7/frame 0xfffffe00511a8990
calltrap() at calltrap+0x8/frame 0xfffffe00511a8990
--- trap 0xc, rip = 0xffffffff803e838c, rsp = 0xfffffe00511a8a60, rbp = 0xfffffe00511a8ad0 ---
vdev_dtl_reassess() at vdev_dtl_reassess+0x11c/frame 0xfffffe00511a8ad0
vdev_dtl_reassess() at vdev_dtl_reassess+0x89/frame 0xfffffe00511a8b50
spa_vdev_state_exit() at spa_vdev_state_exit+0x127/frame 0xfffffe00511a8b80
spa_async_thread_vd() at spa_async_thread_vd+0xe0/frame 0xfffffe00511a8bb0
fork_exit() at fork_exit+0x85/frame 0xfffffe00511a8bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00511a8bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---


This is on stable/12, commit 7fa95d69f10827d0b02607682a2c4a1513d658e5, with a custom stripped-down amd64 kernel built with DIAGNOSTIC, INVARIANTS, and similar debugging options.

I have kgdb on this box/VM.

It's very reproducible.

Comment 3 Graham Perrin freebsd_committer 2023-04-08 18:48:32 UTC

This is amongst bug reports that need special attention; see <https://lists.freebsd.org/archives/freebsd-fs/2023-April/002047.html>.

Please: is either of the panics reproducible with a currently supported release or branch of the OS?

Comment 4 Mark Linimon freebsd_committer 2024-10-03 04:10:36 UTC

^Triage: clear stale flags.