Bug 252821

Summary: UFS filesystem panics after deadlock during mksnap_ffs
Product: Base System Reporter: ml
Component: kern Assignee: freebsd-fs (Nobody) <fs>
Status: Closed FIXED    
Severity: Affects Only Me CC: crest, lwhsu, mckusick
Priority: --- Keywords: crash
Version: 12.2-RELEASE   
Hardware: Any   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244048

Description ml 2021-01-19 09:22:55 UTC
Backstory of this is in #244048.
Briefly, a test VM with all debugging options turned on (INVARIANTS, WITNESS, etc.) deadlocked while taking a snapshot of a non-root filesystem and had to be reset.

Now I've got a snapshot where:
- at reboot, fsck is run on /, then the system starts; after 1 minute a background fsck is run, and this causes a panic;
- I can boot in single-user mode, run fsck -y on the filesystem, and boot works again; however, I get the same panic if I try taking a new snapshot.

This makes me think the filesystem is somehow damaged in a way that fsck -y does not fix.
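
For context: background fsck on UFS works from a filesystem snapshot, which is consistent with it hitting the same panic as a manual snapshot. The stock rc.conf default delay for background fsck is 60 seconds, matching the "after 1 minute" timing above. A minimal sketch of an /etc/rc.conf fragment that might keep such a VM bootable while debugging, assuming the standard rc.conf knobs; this workaround is not proposed anywhere in this report and was not tested against it:

```sh
# /etc/rc.conf fragment (illustrative assumption, not part of the original report)
# Run fsck in the foreground at boot instead of in the background,
# so that no snapshot is taken automatically after the system comes up.
background_fsck="NO"
```

This only avoids the automatic trigger; taking a snapshot by hand would presumably still reach the same code path.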

Panic trace is:
panic: ffs_copyonwrite: bad copy block
cpuid = 0
time = 1581243816
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001beef0e0
vpanic() at vpanic+0x19d/frame 0xfffffe001beef130
panic() at panic+0x43/frame 0xfffffe001beef190
ffs_copyonwrite() at ffs_copyonwrite+0x74c/frame 0xfffffe001beef230
ffs_geom_strategy() at ffs_geom_strategy+0x8c/frame 0xfffffe001beef260
ufs_strategy() at ufs_strategy+0x83/frame 0xfffffe001beef290
VOP_STRATEGY_APV() at VOP_STRATEGY_APV+0xc9/frame 0xfffffe001beef2c0
bufstrategy() at bufstrategy+0x44/frame 0xfffffe001beef2f0
bufwrite() at bufwrite+0x230/frame 0xfffffe001beef330
ffs_snapshot() at ffs_snapshot+0x8e0/frame 0xfffffe001beef630
ffs_mount() at ffs_mount+0xb3a/frame 0xfffffe001beef7d0
vfs_domount() at vfs_domount+0x8b6/frame 0xfffffe001beef9f0
vfs_donmount() at vfs_donmount+0x7e7/frame 0xfffffe001beefa90
sys_nmount() at sys_nmount+0xf2/frame 0xfffffe001beefac0
amd64_syscall() at amd64_syscall+0x281/frame 0xfffffe001beefbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe001beefbf0
--- syscall (378, FreeBSD ELF64, sys_nmount), rip = 0x8002d88ba, rsp = 0x7fffffffd288, rbp = 0x7fffffffeae0 ---
KDB: enter: panic

I can connect to this VM with remote GDB, but my knowledge of kernel internals is not enough to debug this all by myself.
Comment 1 Kirk McKusick freebsd_committer freebsd_triage 2021-02-21 23:34:59 UTC
This bug was reported and fixed in
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253158
It has not yet been MFC'ed to stable/12, but it will be.
Comment 2 ml 2021-02-22 11:18:40 UTC
(In reply to Kirk McKusick from comment #1)

Thanks!

Would this patch affect #244048 too?
Comment 3 Kirk McKusick freebsd_committer freebsd_triage 2021-02-23 05:38:30 UTC
(In reply to ml from comment #2)
Almost certainly this fix will solve the panic reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244048 too.
Comment 4 ml 2021-02-23 08:23:58 UTC
(In reply to Kirk McKusick from comment #3)

Sorry, I wasn't clear.
Of course this will fix the panic in #244048, as this bug is just a spin-off to better describe it.
I was asking if the mentioned patch could help with the main subject of that bug, i.e. the deadlock (which was the root cause of the later panic).
Comment 5 Kirk McKusick freebsd_committer freebsd_triage 2021-02-23 21:32:25 UTC
(In reply to ml from comment #4)
I do not think that the main problem reported in #244048 (apparently running out of buffers in the buffer cache) will be fixed by this patch. It is not clear to me what changed between the 11-stable release and the 12-stable release to cause the buffer problem. At this point I would consider trying out the 13 release to see if the problem is still present in that distribution.
Comment 6 Kirk McKusick freebsd_committer freebsd_triage 2021-02-23 21:51:56 UTC
See also suggestion to try patch in https://reviews.freebsd.org/D28901.
Comment 7 Kirk McKusick freebsd_committer freebsd_triage 2021-05-22 23:52:26 UTC
The panic described in this report has been fixed in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253158. The buffer cache hang was addressed in https://reviews.freebsd.org/D28901.