System deadlocks 100% reliably when a disk is reported FAULTED in the arcmsr card.

dmesg looks like:

arcmsr0:block 'read/write' command with gone raid volume Cmd= 8, TargetId=1, Lun=5
arcmsr0:block 'read/write' command with gone raid volume Cmd= 8, TargetId=1, Lun=5
arcmsr0:block 'read/write' command with gone raid volume Cmd= 8, TargetId=1, Lun=5

zpool and zfs-related commands, and all IO to the affected pool, hang forever in state D. procstat reports:

[root@manticore ~]# ps aux | grep zpool
stump 3287 0.0 0.0 15700 1540 0 D+ 12:03PM 0:00.00 zpool status
root 3286 0.0 0.0 15700 1528 1 T+ 12:03PM 0:00.00 zpool status
root 3316 0.0 0.0 9120 1164 3 S+ 12:07PM 0:00.00 grep zpool
[root@manticore ~]# procstat -k 3286
PID TID COMM TDNAME KSTACK
3286 100484 zpool - mi_switch sleepq_wait _cv_wait spa_config_enter spa_config_generate spa_open_common spa_get_stats zfs_ioc_pool_stats zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscall Xfast_syscall
[root@manticore ~]# procstat -k 3287
PID TID COMM TDNAME KSTACK
3287 100532 zpool - mi_switch sleepq_wait _cv_wait spa_config_enter spa_config_generate spa_open_common spa_get_stats zfs_ioc_pool_stats zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscall Xfast_syscall

How-To-Repeat:
1) Have a disk fault on an arcmsr card.
2) Hang!
A neat update: this demonstrably occurs only when a disk is marked FAULTED. If you physically remove a disk while the system is booted, the disk is correctly removed from the list of disks in areca-cli, and ZFS reports write errors but behaves correctly.

- Rich
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
This could be the same issue as #203906. If you can still reproduce the problem, the output of "procstat -kk -a" and "zpool status" (or a description of the pool layout) might help to confirm this.
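Since "procstat -kk -a" prints every thread on the system, a small filter can help pick out the ones stuck in the same place as the zpool processes in the original report. The following is only a rough sketch: the helper name blocked_threads is hypothetical, it assumes whitespace-separated columns in the order PID TID COMM TDNAME KSTACK (as in the "procstat -k" output above), it assumes COMM and TDNAME contain no spaces, and it assumes that -kk appends a "+0x..." offset to each frame name.

```python
import re


# Hypothetical helper for triage; not part of procstat or any FreeBSD tool.
def blocked_threads(procstat_text, frame="spa_config_enter"):
    """Return (pid, tid, comm) for each thread whose kernel stack contains frame."""
    hits = []
    for line in procstat_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not a thread row.
        if len(fields) < 5 or not fields[0].isdigit() or not fields[1].isdigit():
            continue
        pid, tid, comm = int(fields[0]), int(fields[1]), fields[2]
        # Assumption: COMM and TDNAME have no spaces, so frames start at field 4.
        # Strip a trailing "+0x..." offset (assumed -kk style) from each frame.
        frames = [re.sub(r"\+0x[0-9a-fA-F]+$", "", f) for f in fields[4:]]
        if frame in frames:
            hits.append((pid, tid, comm))
    return hits


# Usage with a line taken from the original report:
sample = (
    "PID TID COMM TDNAME KSTACK\n"
    "3286 100484 zpool - mi_switch sleepq_wait _cv_wait spa_config_enter "
    "spa_config_generate spa_open_common spa_get_stats zfs_ioc_pool_stats "
    "zfsdev_ioctl devfs_ioctl_f kern_ioctl ioctl syscall Xfast_syscall\n"
)
print(blocked_threads(sample))  # [(3286, 100484, 'zpool')]
```

Matching on whole frame names rather than substrings avoids false hits on functions that merely share a prefix. Thread names containing spaces would shift the columns, so treat the result as a starting point for reading the full output, not a replacement for it.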
^Triage: feedback timeout (>1 year).