264127 – "zfs destroy" panic when destroying some snapshots but not all

Summary: "zfs destroy" panic when destroying some snapshots but not all

Status:	Open

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	misc
Version:	13.1-RELEASE
Hardware:	amd64 Any

Importance:	--- Affects Only Me
Assignee:	freebsd-fs (Nobody)

Keywords:	crash

Reported:	2022-05-21 16:12 UTC by Ariel Millennium Thornton
Modified:	2022-10-17 02:30 UTC
CC List:	3 users

Attachments
kgdb /boot/kernel/kernel vmcore.7 | tee kgdb.txt (53.95 KB, text/plain) 2022-05-21 16:12 UTC, Ariel Millennium Thornton

Description Ariel Millennium Thornton 2022-05-21 16:12:45 UTC

Created attachment 234083
kgdb /boot/kernel/kernel vmcore.7 | tee kgdb.txt

I'm getting a kernel panic when trying to destroy snapshots in one specific zfs dataset in my pool.  For example:

zfs destroy -d zroot/var/mail@zfs-auto-snap_015m-2022-04-17-16h45

The system is an HP 22-df0023w all-in-one PC that I use as a home desktop.  I installed 13.0-RELEASE amd64 on it, upgraded it to 13.0-RELEASE-p11 when the problem began manifesting in April 2022, and upgraded it again to 13.1-RELEASE on May 16, 2022.  It runs the stock kernel with very few customizations beyond those documented in the FreeBSD Handbook, plus zfstools (until I commented out the zfs-auto-snapshot cron jobs).

When I ran the example zfs command normally as root, I got a panic and recovered the first part of the attachment from the core dump.

I then commented out the following lines in /etc/rc.conf:

kld_list="amdgpu"
xdm_enable="YES"

After rebooting, I tried the same zfs command again, got another panic, and recovered the second part of the attachment from that core dump.

I suspect I would be able to boot into the installer image and use it as a rescue system to destroy the troublesome snapshots, but I'm willing to keep them for a while if it allows the cause of this bug to be found and fixed.

Comment 1 Graham Perrin (freebsd_committer) 2022-05-21 18:12:04 UTC

When did you last upgrade packages?

Also, please share the output of these commands:

pkg -vv | grep -e url -e enabled

pkg info -x drm gpu-firmware

Comment 2 Ariel Millennium Thornton 2022-05-21 19:36:39 UTC

(In reply to Graham Perrin from comment #1)

I last upgraded packages on May 15, the day before the OS upgrade.  I didn't think to check for and upgrade packages immediately afterward because freebsd-update(8) didn't suggest I should.

[arielmt@swiftpaw] arielmt $ pkg -vv | grep -e url -e enabled
    url             : "pkg+http://pkg.FreeBSD.org/FreeBSD:13:amd64/quarterly",
    enabled         : yes,
[arielmt@swiftpaw] arielmt $ pkg info -x drm gpu-firmware
drm-fbsd13-kmod-5.4.144.g20220223
drm-kmod-g20190710_1
libdrm-2.4.110,1
gpu-firmware-kmod-g20210330
[arielmt@swiftpaw] arielmt $

Comment 3 Ariel Millennium Thornton 2022-08-27 16:46:59 UTC

New information:

I booted into the installer image's "Live CD" option as a rescue system and tried to destroy one of the troublesome zfs snapshots today.  I imported the pool with:

# mkdir /tmp/swiftpaw
# zpool import -R /tmp/swiftpaw zroot

After getting the name of a snapshot, I tried:

# zfs destroy -d zroot/var/mail@zfs-auto-snap_015m-2022-04-16-08h45

However, this caused a kernel panic as well.  I don't know how to capture a core dump when booted into the installer image.

The installer image I booted into is the same one used to install FreeBSD initially.  It was obtained from https://download.freebsd.org/ftp/releases/ISO-IMAGES/13.0/FreeBSD-13.0-RELEASE-amd64-memstick.img

Comment 4 Kurt Jaeger (freebsd_committer) 2022-08-27 17:04:22 UTC

Does a zpool scrub work?

Comment 5 Ariel Millennium Thornton 2022-08-27 20:38:03 UTC

(In reply to Kurt Jaeger from comment #4)

Yes, with nothing bad found.

# zpool status
  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 02:02:05 with 0 errors on Sat Aug 27 13:18:45 2022
config:

        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          ada0p4.eli  ONLINE       0     0     0

errors: No known data errors

Comment 6 Ariel Millennium Thornton 2022-10-16 20:12:07 UTC

This may have been corruption in my pool rather than a bug in zfs.

I found that the only snapshot in my zroot/var/mail dataset that didn't cause a kernel panic when destroyed was the most recently created.  Destroying every snapshot in reverse order, from newest to oldest, was successful.  After that, creating snapshots then destroying the oldest snapshot worked as expected.
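The newest-to-oldest cleanup described above can be sketched as a small sh function. This is a hedged illustration, not the exact commands used in this report: the function name `destroy_snapshots_newest_first` is hypothetical, it relies on `zfs list -S creation` to return the dataset's snapshots in descending creation order, and it uses plain `zfs destroy` rather than the `zfs destroy -d` used by the cron jobs. It irreversibly destroys every snapshot of the given dataset.

```shell
#!/bin/sh
# Sketch: destroy every snapshot of one dataset, newest first.
# WARNING: permanently destroys ALL snapshots of the dataset.
# destroy_snapshots_newest_first is a hypothetical helper name.
# Usage: destroy_snapshots_newest_first <dataset>
destroy_snapshots_newest_first() {
    dataset="$1"

    # -H: no header, one name per line; -t snapshot: snapshots only;
    # -d 1: snapshots of this dataset only, not of its children;
    # -S creation: sort by creation time, descending (newest first).
    snaps=$(zfs list -H -t snapshot -o name -d 1 -S creation "$dataset") || return 1

    # Read from a here-document rather than a pipe so the loop runs in
    # the current shell and "return" leaves the function.
    while IFS= read -r snap; do
        [ -n "$snap" ] || continue    # no snapshots: skip the single empty line
        printf 'destroying %s\n' "$snap"
        # Stop at the first error, leaving the older snapshots untouched.
        zfs destroy "$snap" || return 1
    done <<EOF
$snaps
EOF
}
```

For example, `destroy_snapshots_newest_first zroot/var/mail`. If a `zfs destroy` returns an error, the loop stops instead of moving on to older snapshots, so the remaining ones are still there to examine.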

I re-enabled my zfs-auto-snap cron jobs, and their `zfs destroy -d` actions have so far all completed quietly and successfully.

Comment 7 Graham Perrin (freebsd_committer) 2022-10-16 20:28:35 UTC

(In reply to Ariel Millennium Thornton from comment #6)

> … the only snapshot in my zroot/var/mail dataset that didn't cause a 
> kernel panic when destroyed was the most recently created. …

What was the date of creation of that most recent snapshot?

Now I'm reminded of a ZFS snapshot-related bug that (if I recall correctly) was fixed in recent months; however, the fix could not be made retroactive … something like that.

Annoyingly, I can't recall enough details to identify the issue/bug, to tell whether it relates. Anyone? (Is my memory playing tricks on me?)

Comment 8 Ariel Millennium Thornton 2022-10-17 02:30:45 UTC

(In reply to Graham Perrin from comment #7)

I don't remember the exact date and time, but it was one or two snapshots after the one I had been trying to destroy, and it was created by a separate zfs-auto-snap hourly job.