Bug 150390 - [zfs] zfs deadlock when arcmsr reports drive faulted
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-fs
Reported: 2010-09-08 17:10 UTC by Rich Ercolani
Modified: 2018-01-10 09:50 UTC
Description Rich Ercolani 2010-09-08 17:10:02 UTC
The system deadlocks 100% reliably when the arcmsr card reports a disk as FAULTED.

dmesg looks like:
arcmsr0:block 'read/write' command with gone raid volume Cmd= 8, TargetId=1, Lun=5 
arcmsr0:block 'read/write' command with gone raid volume Cmd= 8, TargetId=1, Lun=5 
arcmsr0:block 'read/write' command with gone raid volume Cmd= 8, TargetId=1, Lun=5 

zpool and zfs-related commands, and all IO to the affected pool, hang forever in state D.

ps and procstat report:
[root@manticore ~]# ps aux | grep zpool
stump  3287  0.0  0.0 15700  1540   0  D+   12:03PM   0:00.00 zpool status
root   3286  0.0  0.0 15700  1528   1  T+   12:03PM   0:00.00 zpool status
root   3316  0.0  0.0  9120  1164   3  S+   12:07PM   0:00.00 grep zpool
[root@manticore ~]# procstat -k 3286
  PID    TID COMM             TDNAME           KSTACK                       
 3286 100484 zpool            -                mi_switch sleepq_wait _cv_wait spa_config_enter spa_config_generate spa_open_common spa_get_stats zfs_ioc_pool_stats zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscall Xfast_syscall 
[root@manticore ~]# procstat -k 3287
  PID    TID COMM             TDNAME           KSTACK                       
 3287 100532 zpool            -                mi_switch sleepq_wait _cv_wait spa_config_enter spa_config_generate spa_open_common spa_get_stats zfs_ioc_pool_stats zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscall Xfast_syscall
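
Both stacks above stop in spa_config_enter, so each zpool process is waiting on the pool configuration lock. With many hung processes it can help to scan the procstat output mechanically. The following is only a rough sketch: the function name blocked_in is hypothetical, and it assumes the column layout shown above (PID, TID, COMM, TDNAME, then the stack frames) with no whitespace inside COMM or TDNAME.

```python
def blocked_in(procstat_text, frame):
    """Return (pid, tid, comm) for every thread whose kernel stack contains `frame`.

    Assumes `procstat -k` style rows: PID TID COMM TDNAME frame frame ...
    Offsets such as `func+0x1a` (as printed by `procstat -kk`) are ignored.
    """
    hits = []
    for line in procstat_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and shell prompt lines.
        if len(fields) < 5 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        pid, tid, comm, _tdname, *stack = fields
        if any(f.split("+", 1)[0] == frame for f in stack):
            hits.append((int(pid), int(tid), comm))
    return hits


# Sample rows copied from the report above.
SAMPLE = """\
  PID    TID COMM             TDNAME           KSTACK
 3286 100484 zpool            -                mi_switch sleepq_wait _cv_wait spa_config_enter spa_config_generate spa_open_common spa_get_stats zfs_ioc_pool_stats zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscall Xfast_syscall
  PID    TID COMM             TDNAME           KSTACK
 3287 100532 zpool            -                mi_switch sleepq_wait _cv_wait spa_config_enter spa_config_generate spa_open_common spa_get_stats zfs_ioc_pool_stats zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscall Xfast_syscall
"""

print(blocked_in(SAMPLE, "spa_config_enter"))
# prints [(3286, 100484, 'zpool'), (3287, 100532, 'zpool')]
```

This only shows which threads are waiting on the lock. It says nothing about which thread holds it, which is the part still missing from this report.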

How-To-Repeat:
1) Have a disk fault on an arcmsr card.
2) Hang!
Comment 1 Rich Ercolani 2010-09-08 18:00:26 UTC
A neat update:
It's demonstrably the case that this only occurs when a disk is marked
FAULTED - if you physically remove a disk while the system is booted,
the disk is correctly removed from the list of disks in areca-cli, and
ZFS reports write errors but behaves correctly.

- Rich
Comment 2 Bruce Cran freebsd_committer 2010-09-11 15:57:59 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 3 Fabian Keil 2015-11-09 12:52:31 UTC
This could be the same issue as #203906.

If you can still reproduce the problem, the output of "procstat -kk -a" and "zpool status" (or a description of the pool layout) might help to confirm this.
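
Since "zpool status" itself hangs in this state, collecting the requested output needs some care. A minimal sketch of one way to do it, assuming Python is available on the affected host: run each command with its output redirected to a file and stop waiting after a deadline. The helper name capture, the output file names, and the timeout values are all hypothetical. A process stuck in state D cannot be killed, so the helper deliberately leaves it behind rather than waiting for it.

```python
import subprocess
import time


def capture(cmd, out_path, timeout_s=10.0):
    """Run `cmd`, writing stdout and stderr to `out_path`.

    Returns the exit status, or None if the command has not finished
    within `timeout_s` seconds. A command that is still running is
    left alone on purpose: one blocked in uninterruptible sleep would
    not react to a kill, and waiting for it would hang this script too.
    """
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(cmd, stdout=out, stderr=subprocess.STDOUT)
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            status = proc.poll()
            if status is not None:
                return status
            time.sleep(0.1)
    return None


# Example use on the affected machine (as root), not executed here:
#   capture(["procstat", "-kk", "-a"], "procstat-kk-a.txt", timeout_s=30.0)
#   capture(["zpool", "status"], "zpool-status.txt", timeout_s=10.0)
```

Running procstat first matters: it should still complete while the pool is wedged, whereas the zpool call is expected to time out and add one more blocked process.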