Overview:

Following the removal of a slog device, the pool is in a state where 'bectl list' panics the kernel. Discovered after removing the slog device from a machine because the SSD it was on had been acting up.

Steps to Reproduce:

This is on a fresh VM with a minimal reproduction.

1. Create a VM with two disks or suitable partitions for testing. Assuming ada0 was used as the zfs installation disk during bsdinstall, `gpart create -s gpt ada1; gpart add -t freebsd-zfs -a 4096 ada1` is sufficient for the second disk.
2. Baseline -- runs without issues:
   bectl list
3. Add the slog:
   zpool add zroot log ada1p1
4. Baseline 2 -- runs without issues:
   bectl list
5. Remove the slog:
   zpool remove zroot ada1p1
6. Create a panic:
   bectl list

Result:

vpanic() at vpanic+0x181/frame 0xfffffe00a80af3b0
panic() at panic+0x43/frame 0xfffffe00a80af410
trap_fatal() at trap_fatal+0x387/frame 0xfffffe00a80af470
trap_pfault() at trap_pfault+0x97/frame 0xfffffe00a80af4d0
trap() at trap+0x2ab/frame 0xfffffe00a80af5e0
--- trap 0xc, rip = 0, rsp = 0xfffffe00a80af56b8, rbp = 0xfffffe00a80af6d0 ---
??() at 0/frame 0xfffffe00a80af6d0
spa_vdev_state_enter() at spa_vdev_state_enter+0x43/frame 0xfffffe00a80af700
zfs_ioc_get_bootenv() at zfs_ioc_get_bootenv+0x32/frame 0xfffffe00a80af730
zfsdev_ioctl_common() at zfsdev_ioctl_common+0x4e0/frame 0xfffffe00a80af7d0
zfsdev_ioctl() at zfsdev_ioctl+0x146/frame 0xfffffe00a80af800
devfs_ioctl() at devfs_ioctl+0xcc/frame 0xfffffe00a80af850
vn_ioctl() at vn_ioctl+0x131/frame 0xfffffe00a80af960
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe00a80af980
kern_ioctl() at kern_ioctl+0x289/frame 0xfffffe00a80afbf0
sys_ioctl() at sys_ioctl+0x12a/frame 0xfffffe00a80afbf0
amd64_syscall() at amd64_syscall+0x12a/frame 0xfffffe00a80afbf0
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00a80afbf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8003db45a, rsp = 0x7fffffffc668, rbp = 0x7fffffffc6d0 ---
KDB: enter: panic
[ thread pid 796 tid 100470 ]
Stopped at kdb_enter+0x37: movq $0,0x1099396(%rip)

Build Date & Hardware:

This example uses the 31 December snapshot ISOs:

FreeBSD test 13.0-CURRENT FreeBSD 13.0-CURRENT #0 main-c255460-g282381aa53a: Thu Dec 31 06:08:47 UTC 2020 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64

Also observed on hardware with a GENERIC-NODEBUG amd64 kernel running:

FreeBSD bsd 13.0-CURRENT FreeBSD 13.0-CURRENT #0 main-c255337-g8d405efd73d: Sat Dec 26 17:07:08 UTC 2020

Additional Information:

Re-adding the slog device still results in a panic on 'bectl list'. The system also panics on boot now and will no longer fully boot to the OS. As a workaround, the bectl create, mount, unmount, and activate commands all seem to work based on minimal checks. It looks feasible to use 'zfs list' to verify naming and to use those other commands to do minimal work on boot environments while 'bectl list' is causing a panic.
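The "??() at 0" frame together with "rip = 0" in the trace suggests the kernel jumped through a NULL function pointer on the path from spa_vdev_state_enter(). As a rough illustration of that failure mode only (this is a hypothetical ops table, not ZFS code):

#include <stddef.h>
#include <stdio.h>

struct ops {
	void (*op_hold)(void *arg);	/* left NULL here, like the suspect handler */
};

int
main(void)
{
	struct ops hole_ops = { .op_hold = NULL };

	if (hole_ops.op_hold == NULL)
		printf("op_hold is NULL; calling it would jump to address 0\n");
	else
		hole_ops.op_hold(NULL);

	return (0);
}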
Expanding on the earlier comment "Re-adding the slog device still results in a panic on 'bectl list'. The system also panics on boot now and will no longer fully boot to the OS.": the report in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=245649 looks similar to what I observe after re-adding the slog device.
Additional commands that panic on ZFS after step 5 (removal of the slog device):

zpool scrub zroot
zfsbootcfg
zfsbootcfg -p
bectl create test; zfsbootcfg "zfs:zroot/ROOT/test"

NB: manually setting the bootfs property does not panic. This appears consistent with the observation that `bectl create test`/`bectl activate test` still allow changing the bootfs.

bectl create test; zpool set bootfs=zroot/ROOT/test zroot

Similarly, the current version of sysutils/beadm from ports does NOT panic like bectl(8) does:

beadm list
Created attachment 222658: fix panic after slog removal
Looks like this was patched by Patrick Mooney in https://www.illumos.org/issues/12981. I've submitted a pull request upstream to OpenZFS: https://github.com/openzfs/zfs/pull/11623. In the meantime, I've included the patch here.
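For reference, here is a minimal sketch of what the change does as described in the upstream commit message: skip the vdev_op_hold/vdev_op_rele call when the handler is NULL, as it is for a hole vdev. The types below are simplified stand-ins so the sketch compiles on its own; this is not the exact diff applied to vdev.c.

/* Simplified stand-in types; the real definitions live in the OpenZFS headers. */
typedef struct vdev vdev_t;

typedef struct vdev_ops {
	void (*vdev_op_hold)(vdev_t *vd);	/* NULL in vdev_hole_ops */
	void (*vdev_op_rele)(vdev_t *vd);	/* NULL in vdev_hole_ops */
} vdev_ops_t;

struct vdev {
	vdev_ops_t	*vdev_ops;
	int		vdev_children;
	vdev_t		**vdev_child;
};

static void
vdev_hold_sketch(vdev_t *vd)
{
	for (int c = 0; c < vd->vdev_children; c++)
		vdev_hold_sketch(vd->vdev_child[c]);

	/* Only invoke the handler when it is non-NULL, so a hole vdev
	 * no longer causes a jump through a NULL pointer. */
	if (vd->vdev_ops->vdev_op_hold != NULL)
		vd->vdev_ops->vdev_op_hold(vd);
}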
"Tested by:" looks good... I can confirm on a VM and physical hardware running Glen's 13.0-BETA3 commit at 150b4388d3b5 plus the addition of this patch that there are no longer issues running the previous panic inducing commands. I've checked these specific commands below. They are are all running fine on the VM test and aside from not checking the very last one on active physical hardware the first few are no longer inducing a panic. bectl list zpool scrub zroot zfsbootcfg zfsbootcfg -p bectl create test; zfsbootcfg “zfs:zroot/ROOT/test:” Robert thank you for the effort tracking this down. Will this make it in an upcoming 13.0-RC?
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=64649f0285424435634c2dfd39f49536fc2b50dd

commit 64649f0285424435634c2dfd39f49536fc2b50dd
Author:     Martin Matuska <mm@FreeBSD.org>
AuthorDate: 2021-02-22 17:05:07 +0000
Commit:     Martin Matuska <mm@FreeBSD.org>
CommitDate: 2021-02-22 17:26:12 +0000

    zfs: fix panic if scrubbing after removing a slog device

    From openzfs-master 11f2e9a4 commit message:

    vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL

    This prevents a panic after a SLOG add/removal on the root pool followed
    by a zpool scrub. When a SLOG is removed, a hole takes its place - the
    vdev_ops for a hole is vdev_hole_ops, which defines the handler functions
    of vdev_op_hold and vdev_op_rele as NULL.

    Patch Author: Patrick Mooney <pmooney@pfmooney.com>

    Obtained from:  openzfs/zfs@11f2e9a491baa2ae3fc00f6b8b892fa91a852ca1
    PR:             252396
    MFS after:      3 days (direct commit)

 sys/contrib/openzfs/module/zfs/vdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ee0b7e05e72c8820441faff29a9df99b47aed6a0

commit ee0b7e05e72c8820441faff29a9df99b47aed6a0
Author:     Martin Matuska <mm@FreeBSD.org>
AuthorDate: 2021-02-22 17:05:07 +0000
Commit:     Martin Matuska <mm@FreeBSD.org>
CommitDate: 2021-02-25 16:19:49 +0000

    zfs: fix panic if scrubbing after removing a slog device

    From openzfs-master 11f2e9a4 commit message:

    vdev_ops: don't try to call vdev_op_hold or vdev_op_rele when NULL

    This prevents a panic after a SLOG add/removal on the root pool followed
    by a zpool scrub. When a SLOG is removed, a hole takes its place - the
    vdev_ops for a hole is vdev_hole_ops, which defines the handler functions
    of vdev_op_hold and vdev_op_rele as NULL.

    Patch Author: Patrick Mooney <pmooney@pfmooney.com>

    Obtained from:  openzfs/zfs@11f2e9a491baa2ae3fc00f6b8b892fa91a852ca1
    PR:             252396
    Approved by:    re (gjb)

    (cherry picked from commit 64649f0285424435634c2dfd39f49536fc2b50dd)

 sys/contrib/openzfs/module/zfs/vdev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
The pull request was accepted and Martin got it merged in, so I'm closing this. Thanks for the report, Jason. And thanks to Martin for getting the patch in.