Bug 203906 - ZFS lockup, spa_namespace_lock
Summary: ZFS lockup, spa_namespace_lock
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-fs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-20 23:15 UTC by Adrian Palmer
Modified: 2015-10-28 17:34 UTC
CC List: 1 user

See Also:


Attachments

Description Adrian Palmer 2015-10-20 23:15:55 UTC
zpool/zfs utilities hang when referencing a now non-existent volume that has not been cleaned up in the kernel. This makes ZFS administration unusable, and the failed volume cannot be recovered until a reboot.

$ uname -a
FreeBSD Vortex 10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

$ bzcat messages.*.bz2 | grep -rn ATA  
... //trimmed
1065:Sep 25 21:12:09 Vortex kernel: ada4: <ST31000333AS CC3H> ATA-8 SATA 2.x device
1067:Sep 25 21:12:09 Vortex kernel: ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
1073:Sep 25 21:12:30 Vortex kernel: (aprobe0:siisch0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
1086:Sep 25 21:13:00 Vortex kernel: (aprobe0:siisch0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
1091:Sep 25 21:13:30 Vortex kernel: (aprobe0:siisch0:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
1095:Sep 25 21:13:45 Vortex kernel: (aprobe0:siisch0:0:0:0): CAM status: ATA Status Error
1096:Sep 25 21:13:45 Vortex kernel: (aprobe0:siisch0:0:0:0): ATA status: 00 ()
1100:Sep 25 21:14:01 Vortex kernel: (aprobe0:siisch0:0:0:0): CAM status: ATA Status Error
1101:Sep 25 21:14:01 Vortex kernel: (aprobe0:siisch0:0:0:0): ATA status: 00 ()
...


$ ps -ax|grep D+
38986  8- D+        0:00.00 zpool iostat 1 10
42143  0- D+        0:00.01 zpool list
42210  1- D+        0:00.01 zpool iostat 1 10
42242  2- D+        0:00.01 zfs list
51117  4- D+        0:00.01 ls
65547  5- D+        0:00.01 zfs unmount -f zStorage1
45501 11  S+        0:00.01 grep D+
65624  9- D+        0:00.01 zfs list
65496  6- D+        0:00.01 ls zStorage1


$ procstat -kk -a 
  PID    TID COMM             TDNAME           KSTACK                       
 1252 100078 csh              -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_sigsuspend+0xf4 sys_sigsuspend+0x31 amd64_syscall+0x351 Xfast_syscall+0xfb 
38986 100168 zpool            -                mi_switch+0xe1 sleepq_wait+0x3a _sx_xlock_hard+0x48a _sx_xlock+0x5d spa_all_configs+0x6e zfs_ioc_pool_configs+0x19 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb 
41360 101772 csh              -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_sigsuspend+0xf4 sys_sigsuspend+0x31 amd64_syscall+0x351 Xfast_syscall+0xfb 
41661 100159 gdb              -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_wait6+0x5f4 sys_wait4+0x72 amd64_syscall+0x351 Xfast_syscall+0xfb 
41662 101773 zpool            -                mi_switch+0xe1 sleepq_wait+0x3a _sx_xlock_hard+0x48a _sx_xlock+0x5d spa_all_configs+0x6e zfs_ioc_pool_configs+0x19 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb 
42143 100174 zpool            -                mi_switch+0xe1 sleepq_wait+0x3a _sx_xlock_hard+0x48a _sx_xlock+0x5d spa_all_configs+0x6e zfs_ioc_pool_configs+0x19 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb 
42242 101757 zfs              -                mi_switch+0xe1 sleepq_wait+0x3a _sx_xlock_hard+0x48a _sx_xlock+0x5d spa_all_configs+0x6e zfs_ioc_pool_configs+0x19 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb 
45485 101761 csh              -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_sigsuspend+0xf4 sys_sigsuspend+0x31 amd64_syscall+0x351 Xfast_syscall+0xfb 
45553 100079 procstat         -                <running>                    
51117 101760 ls               -                mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d dbuf_read+0x45b dmu_buf_hold+0x3d zap_lockdir+0x46 zap_cursor_retrieve+0x131 zfs_freebsd_readdir+0x3e1 VOP_READDIR_APV+0xa7 kern_getdirentries+0x21c sys_getdirentries+0x28 amd64_syscall+0x351 Xfast_syscall+0xfb 
65149 101762 csh              -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x16a tty_wait+0x1c ttydisc_read+0x2d4 ttydev_read+0x86 devfs_read_f+0xeb dofileread+0x95 kern_readv+0x68 sys_read+0x63 amd64_syscall+0x351 Xfast_syscall+0xfb 
65496 101767 ls               -                mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d dbuf_read+0x45b dmu_buf_hold+0x3d zap_lockdir+0x46 zap_cursor_retrieve+0x131 zfs_freebsd_readdir+0x3e1 VOP_READDIR_APV+0xa7 kern_getdirentries+0x21c sys_getdirentries+0x28 amd64_syscall+0x351 Xfast_syscall+0xfb 
65624 101755 zfs              -                mi_switch+0xe1 sleepq_wait+0x3a _sx_xlock_hard+0x48a _sx_xlock+0x5d spa_all_configs+0x6e zfs_ioc_pool_configs+0x19 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb 


$ gdb zpool
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
(gdb) r list
Starting program: /sbin/zpool list
(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...(no debugging symbols found)...[New LWP 101773]
(no debugging symbols found)...(no debugging symbols found)...
load: 0.14  cmd: zpool 41662 [spa_namespace_lock] 9.17r 0.05u 0.03s 0% 3220k
load: 0.11  cmd: zpool 41662 [spa_namespace_lock] 18011.15r 0.05u 0.03s 0% 3220k
load: 0.64  cmd: zpool 41662 [spa_namespace_lock] 82405.76r 0.05u 0.03s 0% 3220k

Any and all commands that reference the zpool zStorage1 (on ada4, mounted on /vol/zStorage1) enter the D+ state, and Ctrl-T (before I exit the session) indicates the process is stuck on spa_namespace_lock.
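(The same wait channel can be confirmed per process with procstat -- PID 41662 is the zpool instance started under gdb above.)

$ procstat -kk 41662   # kernel stack of the stuck zpool; shows the same _sx_xlock/spa_all_configs frames as the full listing above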


The umount [brutally] succeeded (via a normal umount, a camcontrol reset, and hot-plugging the SATA drive), but ls still hangs on that directory. The drive responds normally via camcontrol.
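(For illustration only -- by "responds normally" I mean that simple camcontrol queries against ada4 still complete without hanging, along these lines; the exact invocations may have differed:)

$ camcontrol devlist        # drive still enumerates after the hot plug
$ camcontrol identify ada4  # ATA IDENTIFY completes instead of timing out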


Other zpools/filesystems are operational -- I've resisted rebooting because only one volume is affected (and I would like it back). This recurs across reboots: the system lasts about two weeks before I start seeing ZFS-related errors.

I read https://wiki.freebsd.org/AvgZfsDeadlockDebug, but it was of little help.

Point me in a direction, and I can certainly help research the codebase.
Comment 1 Fabian Keil 2015-10-28 13:46:54 UTC
Can you please attach the complete "procstat -kk -a" output?

I suspect that this is the same issue as:
https://lists.freebsd.org/pipermail/freebsd-current/2014-September/052018.html
Analysis continues at:
https://lists.freebsd.org/pipermail/freebsd-current/2015-October/057982.html

If that's the case and if your applications can handle it,
changing the pool's failmode from "wait" to "continue" may be
worth a try.
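For example (untested here; the zpool set may itself block while spa_namespace_lock is held, in which case it can only be applied after a reboot):

$ zpool get failmode zStorage1           # presumably still the default "wait"
$ zpool set failmode=continue zStorage1  # "continue" returns EIO to new writes instead of blocking

Note that with failmode=continue, applications using the affected pool will see I/O errors instead of hanging.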
Comment 2 Adrian Palmer 2015-10-28 17:34:03 UTC
$ procstat -kk -a 
  PID    TID COMM             TDNAME           KSTACK                       
 1252 100078 csh              -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_sigsuspend+0xf4 sys_sigsuspend+0x31 amd64_syscall+0x351 Xfast_syscall+0xfb 
38986 100168 zpool            -                mi_switch+0xe1 sleepq_wait+0x3a _sx_xlock_hard+0x48a _sx_xlock+0x5d spa_all_configs+0x6e zfs_ioc_pool_configs+0x19 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb 
41360 101772 csh              -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_sigsuspend+0xf4 sys_sigsuspend+0x31 amd64_syscall+0x351 Xfast_syscall+0xfb 
41661 100159 gdb              -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_wait6+0x5f4 sys_wait4+0x72 amd64_syscall+0x351 Xfast_syscall+0xfb 
41662 101773 zpool            -                mi_switch+0xe1 sleepq_wait+0x3a _sx_xlock_hard+0x48a _sx_xlock+0x5d spa_all_configs+0x6e zfs_ioc_pool_configs+0x19 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb 
42143 100174 zpool            -                mi_switch+0xe1 sleepq_wait+0x3a _sx_xlock_hard+0x48a _sx_xlock+0x5d spa_all_configs+0x6e zfs_ioc_pool_configs+0x19 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb 
42242 101757 zfs              -                mi_switch+0xe1 sleepq_wait+0x3a _sx_xlock_hard+0x48a _sx_xlock+0x5d spa_all_configs+0x6e zfs_ioc_pool_configs+0x19 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb 
45485 101761 csh              -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _sleep+0x27d kern_sigsuspend+0xf4 sys_sigsuspend+0x31 amd64_syscall+0x351 Xfast_syscall+0xfb 
45553 100079 procstat         -                <running>                    
51117 101760 ls               -                mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d dbuf_read+0x45b dmu_buf_hold+0x3d zap_lockdir+0x46 zap_cursor_retrieve+0x131 zfs_freebsd_readdir+0x3e1 VOP_READDIR_APV+0xa7 kern_getdirentries+0x21c sys_getdirentries+0x28 amd64_syscall+0x351 Xfast_syscall+0xfb 
65149 101762 csh              -                mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x16a tty_wait+0x1c ttydisc_read+0x2d4 ttydev_read+0x86 devfs_read_f+0xeb dofileread+0x95 kern_readv+0x68 sys_read+0x63 amd64_syscall+0x351 Xfast_syscall+0xfb 
65496 101767 ls               -                mi_switch+0xe1 sleepq_wait+0x3a _cv_wait+0x16d dbuf_read+0x45b dmu_buf_hold+0x3d zap_lockdir+0x46 zap_cursor_retrieve+0x131 zfs_freebsd_readdir+0x3e1 VOP_READDIR_APV+0xa7 kern_getdirentries+0x21c sys_getdirentries+0x28 amd64_syscall+0x351 Xfast_syscall+0xfb 
65624 101755 zfs              -                mi_switch+0xe1 sleepq_wait+0x3a _sx_xlock_hard+0x48a _sx_xlock+0x5d spa_all_configs+0x6e zfs_ioc_pool_configs+0x19 zfsdev_ioctl+0x6f0 devfs_ioctl_f+0x114 kern_ioctl+0x255 sys_ioctl+0x13c amd64_syscall+0x351 Xfast_syscall+0xfb 


There isn't much running on the machine at this point.  I'm waiting for ZFS to be a bit more stable.

I'll reconfigure the zpool failmode when I reboot.  It may take a while for the problem to reproduce.