Bug 231211 - [zfs] possible deadlock triggered by zfs test suite
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
Reported: 2018-09-07 04:08 UTC by Li-Wen Hsu
Modified: 2019-11-26 19:18 UTC
Description Li-Wen Hsu freebsd_committer freebsd_triage 2018-09-07 04:08:22 UTC
When running the ZFS test suite in bhyve, the system usually panics with the following message:

panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffff8006b642000, blocked for 180024 ticks
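For scale, the tick count in the panic message can be converted to wall-clock time. The sketch below assumes a tick rate of 1000 Hz; the value of kern.hz on the affected VM is not stated in this report, so ASSUMED_HZ is an assumption, and blocked_seconds() is a hypothetical helper name. Under that assumption, 180024 ticks is roughly 180 seconds.

```python
import re

# Assumption: kern.hz is 1000 on the affected system; the report does not say.
ASSUMED_HZ = 1000

# Pattern inferred from the panic line quoted in this report.
PANIC_RE = re.compile(
    r"possible deadlock detected for (0x[0-9a-f]+), blocked for (\d+) ticks"
)


def blocked_seconds(panic_line, hz=ASSUMED_HZ):
    """Return (thread pointer, seconds blocked) parsed from a panic line."""
    m = PANIC_RE.search(panic_line)
    if m is None:
        raise ValueError("not a deadlock-detection panic line")
    return m.group(1), int(m.group(2)) / hz


line = ("panic: deadlres_td_sleep_q: possible deadlock detected for "
        "0xfffff8006b642000, blocked for 180024 ticks")
thread_ptr, seconds = blocked_seconds(line)
print(thread_ptr, seconds)
```

If the real tick rate differs, pass it as the hz argument instead of relying on the default.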

VM image and core files (vmcore.1 is used in this bug report; vmcore.0 is from the previous run, which panicked at the same point):

https://people.freebsd.org/~lwhsu/zfs-deadlock/

Some ddb output:

https://gist.github.com/lwhsu/88bce6ffaa2ccc5e8da4fe186dbeb54f

Also note that there might be another issue:

chain 96:                                                                     
 thread 100230 (pid 0, zio_null_intr) blocked on lockmgr (null)EXCL                     
thread -559038242 (pid 268435455, pppppppppppppppppppppppppppppppppppppppppppppsecondarycache) ??? (0xdeadc0de)
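One way to read the last line of that chain: the thread ID -559038242 is the bit pattern 0xdeadc0de interpreted as a signed 32-bit integer (the same value ddb prints in parentheses), and the pid 268435455 is 0x0fffffff. That would be consistent with ddb printing fill-pattern garbage rather than a live thread, although this report does not confirm the cause. A short check of the arithmetic, where as_signed32() is a hypothetical helper:

```python
import struct


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


tid = as_signed32(0xDEADC0DE)
pid_hex = hex(268435455)
print(tid)      # thread ID as ddb would print it for this bit pattern
print(pid_hex)  # the pid from the chain output, in hexadecimal
```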
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2018-09-14 18:01:58 UTC
Is this still reproducible? The job seems to be running to completion: https://ci.freebsd.org/job/FreeBSD-head-amd64-test_zfs/
Comment 2 Stefan Rink 2019-06-15 10:52:29 UTC
I hit this bug in a bhyve guest with a UFS filesystem on 13-CURRENT!

 thread 100377 (pid 36752, sh) blocked on lockmgr ufsEXCL
 thread 100078 (pid 22, syncer) blocked on lockmgr bufwaitEXCL
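Lock-chain lines like the two above have a regular shape, so they can be pulled into structured form when comparing several dumps. A minimal sketch: the regular expression is inferred only from the lines quoted in this bug, it assumes the lock name is immediately followed by EXCL or SHARED, and parse_chain_line() is a hypothetical helper, not part of any FreeBSD tool.

```python
import re

# Inferred from the ddb lines in this report; other ddb formats may differ.
CHAIN_RE = re.compile(
    r"thread (?P<tid>-?\d+) \(pid (?P<pid>\d+), (?P<name>[^)]*)\) "
    r"blocked on lockmgr (?P<lock>.+?)(?P<mode>EXCL|SHARED)(?:\s.*)?$"
)


def parse_chain_line(line):
    """Parse one lock-chain line into a dict, or return None if it does not match."""
    m = CHAIN_RE.search(line.strip())
    if m is None:
        return None
    return {
        "tid": int(m.group("tid")),
        "pid": int(m.group("pid")),
        "name": m.group("name"),
        "lock": m.group("lock"),
        "mode": m.group("mode"),
    }


lines = [
    " thread 100377 (pid 36752, sh) blocked on lockmgr ufsEXCL",
    " thread 100078 (pid 22, syncer) blocked on lockmgr bufwaitEXCL",
]
parsed = [parse_chain_line(l) for l in lines]
for entry in parsed:
    print(entry)
```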

It's still in KDB, but I can only access the console via VNC, so I can't copy/paste text, dump it, or take screenshots.

Trace of the sh process that started this:
sched_switch()
mi_switch()
sleepq_switch()
sleepq_wait()
sleeplk()
lockmgr_slock_hard()
__lockmgr_args()
ffs_lock()
VOP_LOCK1_APV()
_vn_lock()
vget()
cache_lookup()
vfs_cache_lookup()
VOP_LOOKUP_APV()
lookup()
namei()
vn_open_cred()
kern_openat()
amd64_syscall() - 




Need any more info?
Comment 3 jesper 2019-11-26 19:18:25 UTC
Reproducible crash on my RockPro64 with 13-CURRENT/aarch64. There aren't any ZFS drives attached at this point. I thought it was the mmc/sd controller freaking out (it's only been bootable at all for a few weeks now and the clocks aren't much better on this board) but... who knows.

root@generic:~ # zpool import
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
mmcsd0: Error indicated: 4 Failed
rockchip_dwmmc0: Failed to update clk

# At this point, a very long time passes with all IO on the system seemingly dead

panic: deadlres_td_sleep_q: possible deadlock detected for 0xfffffd000280e560, blocked for 1802665 ticks

cpuid = 3
time = 1574727302
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
         pc = 0xffff00000072a50c  lr = 0xffff0000001066c8
         sp = 0xffff00005f73c580  fp = 0xffff00005f73c790

db_trace_self_wrapper() at vpanic+0x18c
         pc = 0xffff0000001066c8  lr = 0xffff000000400eb8
         sp = 0xffff00005f73c7a0  fp = 0xffff00005f73c850

vpanic() at panic+0x44
         pc = 0xffff000000400eb8  lr = 0xffff000000400c68
         sp = 0xffff00005f73c860  fp = 0xffff00005f73c8e0

panic() at deadlkres+0x314
         pc = 0xffff000000400c68  lr = 0xffff00000039ce80
         sp = 0xffff00005f73c8f0  fp = 0xffff00005f73c940

deadlkres() at fork_exit+0x7c
         pc = 0xffff00000039ce80  lr = 0xffff0000003c11bc
         sp = 0xffff00005f73c950  fp = 0xffff00005f73c980

fork_exit() at fork_trampoline+0x10
         pc = 0xffff0000003c11bc  lr = 0xffff0000007471a4
         sp = 0xffff00005f73c990  fp = 0x0000000000000000

KDB: enter: panic
[ thread pid 0 tid 100073 ]
Stopped at      0
db>