Bug 249985 - kernel panic at shutdown in zfs_acl_free() and list_remove(), related crash at snapshot removal
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-29 13:48 UTC by Guido Falsi
Modified: 2022-02-01 21:10 UTC

See Also:


Attachments
dump info file (493 bytes, text/plain)
2020-09-29 13:48 UTC, Guido Falsi
core dump details (73.02 KB, text/plain)
2020-09-29 13:48 UTC, Guido Falsi
Crash info for crash during pkg upgrade (156.13 KB, text/plain)
2020-10-29 13:27 UTC, Guido Falsi

Description Guido Falsi freebsd_committer freebsd_triage 2020-09-29 13:48:00 UTC
Created attachment 218405
dump info file

Hi,

On my laptop I'm experiencing some kernel panics, mainly at startup or shutdown. I've got a good dump from a panic that happened during shutdown, and I'm attaching the core.txt and info files.

The panic happens every 3-4 shutdowns, and sometimes at startup. I'll also add a dump from the startup crash as soon as I can get one.

The machine is running r366077, with a GENERIC-NODEBUG kernel. I did test the RAM and it looks fine.

This machine also already had a ZFS-related issue caused by a shutdown panic some time ago: it was refusing to boot with "zfs: allocating allocated segment". I reinstalled the OS from scratch at the time.

I am unable to tell much more, apart from the clear connection to ZFS.


If any further information is needed, please ask. I do have the core file, so if further investigation is needed I can try that, if told what to look at.


Thanks in advance.
Comment 1 Guido Falsi freebsd_committer freebsd_triage 2020-09-29 13:48:25 UTC
Created attachment 218406
core dump details
Comment 2 Guido Falsi freebsd_committer freebsd_triage 2020-10-29 08:47:00 UTC
I wanted to follow up on this.

While this is not definitive, I discovered that when I updated the machine from the old in-kernel ZFS to the in-kernel OpenZFS, I never actually ran "zpool upgrade".

I did that later (a few days ago), and I have not seen a crash since. This is not conclusive, because too few days have passed and I still can't rule out a crash in the next few days, but a pattern is showing.

The system was also crashing with the old in-kernel ZFS code.

Maybe the same problem was present there and is still lurking in OpenZFS, but is mitigated by the presence of a new feature flag?
Comment 3 Guido Falsi freebsd_committer freebsd_triage 2020-10-29 13:27:28 UTC
Created attachment 219200
Crash info for crash during pkg upgrade

I spoke too soon. Just after sending the previous comment, the machine crashed during "pkg upgrade" while it was extracting the contents of a package.

I'm attaching info about this last crash.

I have updated to a newer head since; it was r366077, as in the previous dump.
Comment 4 Guido Falsi freebsd_committer freebsd_triage 2021-03-04 20:45:21 UTC
I have updated my laptop to main-n245104-dfff1de729b and have not seen these crashes for some time.

I'll keep an eye on this, but it is possible that the recent OpenZFS imports have fixed this issue.

I'm leaving this one open for now, but will close it if the issue does not show up again for a while.
Comment 5 Guido Falsi freebsd_committer freebsd_triage 2021-04-03 21:04:58 UTC
Although they are less frequent, I am still seeing these panics sporadically, so I'm leaving this one open.
Comment 6 Guido Falsi freebsd_committer freebsd_triage 2021-04-17 12:39:32 UTC
By chance I discovered something interesting.

Now the machine is regularly doing this:

# zfs list -t snapshot
internal error: cannot iterate filesystems: Invalid argument
Abort (core dumped)

(backtrace at the end of this comment, but I don't think this one is interesting)

I tracked this down to a single snapshot that looks corrupted. If I try to analyze it with zfs, zfs crashes.
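One way a failing snapshot listing could be narrowed down to a single dataset is to list snapshots one dataset at a time and note which invocation fails. The following is only a sketch under assumptions, not necessarily how it was done here: the helper names and the injectable `run` parameter are hypothetical, and it relies on `zfs list -H -o name -t snapshot -d 1 <dataset>` exiting non-zero (or aborting) when iteration fails.

```python
import subprocess
from typing import Callable, Iterable, List


def snapshot_listing_fails(dataset: str, run: Callable = subprocess.run) -> bool:
    """Return True if listing the snapshots of a single dataset fails."""
    proc = run(
        # -H: no header, -o name: names only, -d 1: only this dataset's snapshots
        ["zfs", "list", "-H", "-o", "name", "-t", "snapshot", "-d", "1", dataset],
        capture_output=True,
        text=True,
    )
    # A process killed by abort() is reported with a negative return code,
    # so any non-zero value counts as a failure here.
    return proc.returncode != 0


def datasets_with_broken_snapshots(
    datasets: Iterable[str], run: Callable = subprocess.run
) -> List[str]:
    """Return the datasets whose snapshot listing fails, in input order."""
    return [d for d in datasets if snapshot_listing_fails(d, run)]
```

Calling `datasets_with_broken_snapshots(["zroot", "zroot/var/mail"])` on the affected machine would run the real `zfs` command; passing a stub as `run` allows the logic to be exercised without a pool.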

If I try to destroy that snapshot with:

zfs destroy zroot/var/mail@2021-03-14_18.00.00--1w

I cause a kernel panic, backtrace also at end of message.

What I gather from this panic is that the OpenZFS code is returning EINVAL: the failed VERIFY3 in the kernel panic backtrace shows dsl_dataset_hold_obj() returning 22. Unfortunately, I don't know enough about ZFS to understand more than this.

Some more information:

> uname -a
FreeBSD ubik.madpilot.net 14.0-CURRENT FreeBSD 14.0-CURRENT main-n246069-112f007e128 MPNET  amd64


The machine is an Acer laptop, the disk is an nvd(4) device, and I'm running it geli-encrypted. The disk layout was created by the installer when 13 was still current.

I'm actually curious if there is a way to recover from this condition. I'll try experimenting with zdb to see if I can gather some details about why this snapshot causes a crash.

-----

zfs.core backtrace:

#0  0x00000008015dd4ba in thr_kill () from /lib/libc.so.7
#1  0x0000000801552de4 in raise () from /lib/libc.so.7
#2  0x0000000801606dc9 in abort () from /lib/libc.so.7
#3  0x000000080112e75e in zfs_standard_error_fmt () from /lib/libzfs.so.4
#4  0x000000080112e2b5 in zfs_standard_error () from /lib/libzfs.so.4
#5  0x00000008011175a3 in zfs_iter_snapshots () from /lib/libzfs.so.4
#6  0x0000000001031182 in ?? ()
#7  0x00000008011172c2 in zfs_iter_filesystems () from /lib/libzfs.so.4
#8  0x000000000103114d in ?? ()
#9  0x00000008011172c2 in zfs_iter_filesystems () from /lib/libzfs.so.4
#10 0x000000000103114d in ?? ()
#11 0x00000008011092f9 in zfs_iter_root () from /lib/libzfs.so.4
#12 0x0000000001030968 in ?? ()
#13 0x000000000103454c in ?? ()
#14 0x000000000103145e in ?? ()
#15 0x00000000010303df in ?? ()
#16 0x0000000001030300 in ?? ()
#17 0x0000000000000000 in ?? ()


-----

kernel panic backtrace

panic: VERIFY3(0 == dsl_dataset_hold_obj(dp, dsl_dataset_phys(ds_next)->ds_next_snap_obj, FTAG, &ds_nextnext)) failed (0 == 22)

cpuid = 7
time = 1618658184
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00b55e22b0
vpanic() at vpanic+0x181/frame 0xfffffe00b55e2300
spl_panic() at spl_panic+0x3a/frame 0xfffffe00b55e2360
dsl_destroy_snapshot_sync_impl() at dsl_destroy_snapshot_sync_impl+0xbf6/frame 0xfffffe00b55e2440
dsl_destroy_snapshot_sync() at dsl_destroy_snapshot_sync+0x4e/frame 0xfffffe00b55e2480
zcp_synctask_destroy() at zcp_synctask_destroy+0xb0/frame 0xfffffe00b55e24c0
zcp_synctask_wrapper() at zcp_synctask_wrapper+0xee/frame 0xfffffe00b55e2510
luaD_precall() at luaD_precall+0x25f/frame 0xfffffe00b55e25e0
luaV_execute() at luaV_execute+0xf88/frame 0xfffffe00b55e2660
luaD_call() at luaD_call+0x1b3/frame 0xfffffe00b55e26a0
luaD_rawrunprotected() at luaD_rawrunprotected+0x53/frame 0xfffffe00b55e2740
luaD_pcall() at luaD_pcall+0x37/frame 0xfffffe00b55e2790
lua_pcallk() at lua_pcallk+0xa6/frame 0xfffffe00b55e27d0
zcp_eval_impl() at zcp_eval_impl+0xbc/frame 0xfffffe00b55e2800
dsl_sync_task_sync() at dsl_sync_task_sync+0xb4/frame 0xfffffe00b55e2830
dsl_pool_sync() at dsl_pool_sync+0x43b/frame 0xfffffe00b55e28b0
spa_sync() at spa_sync+0xafe/frame 0xfffffe00b55e2ae0
txg_sync_thread() at txg_sync_thread+0x3b3/frame 0xfffffe00b55e2bb0
fork_exit() at fork_exit+0x7d/frame 0xfffffe00b55e2bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00b55e2bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
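
The two numeric fields in the panic message above can be decoded mechanically. The sketch below is illustrative only: `decode_panic` is a hypothetical helper, and it assumes that the second number in "failed (0 == 22)" is the value returned by the checked call and that "time" is in seconds since the Unix epoch.

```python
import errno
import re
from datetime import datetime, timezone

# Excerpt of the panic message quoted in this comment.
PANIC_TEXT = """\
panic: VERIFY3(0 == dsl_dataset_hold_obj(dp, dsl_dataset_phys(ds_next)->ds_next_snap_obj, FTAG, &ds_nextnext)) failed (0 == 22)
cpuid = 7
time = 1618658184
"""


def decode_panic(text):
    """Extract the failing return code and the panic time from a panic message."""
    # The message prints both compared values; the right-hand one is taken
    # to be the return code of the checked call (assumption).
    failed = re.search(r"failed \((-?\d+) == (-?\d+)\)", text)
    stamp = re.search(r"^time = (\d+)$", text, re.MULTILINE)
    code = int(failed.group(2))
    when = datetime.fromtimestamp(int(stamp.group(1)), tz=timezone.utc)
    return {
        "code": code,
        "errno_name": errno.errorcode.get(code, "unknown"),
        "utc": when.isoformat(),
    }


info = decode_panic(PANIC_TEXT)
print(info)
```

Here the code 22 maps to EINVAL, which matches the "Invalid argument" error printed by "zfs list -t snapshot", and the timestamp falls shortly before this comment was posted.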
Comment 7 Guido Falsi freebsd_committer freebsd_triage 2022-02-01 21:10:36 UTC
I've since reinstalled the machine from scratch and have not seen this bug anymore.

Because of this I'm closing this bug report, since I'm unable to reproduce it. Maybe it really was caused by some data corruption, possibly due to a then-existing bug or unlucky circumstances.