Bug 244656 - ZFS: resilver doesn't provide enough information (progress, activity type, disks involved) in some circumstances
Summary: ZFS: resilver doesn't provide enough information (progress, activity type, disks involved) in some circumstances
Status: Closed Not Accepted
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Graham Perrin
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-03-07 15:24 UTC by Peter Eriksson
Modified: 2023-04-08 18:58 UTC
CC List: 4 users

See Also:
koobs: mfc-stable12?
koobs: mfc-stable11?


Attachments

Description Peter Eriksson 2020-03-07 15:24:19 UTC
FreeBSD 12.1-RELEASE-p2

One of our servers seems to have developed some kind of problem in one of its ZFS pools. It claims to be resilvering data, but I can't see any progress at all in the output from "zpool status". "zpool iostat" and "iostat -x" do indicate a lot of reads being done, though.
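
For reference, a minimal sketch for re-polling just the resilver progress lines, using nothing beyond the pool name DATA2 and base-system tools:

# while :; do date; zpool status DATA2 | grep -A 2 'scan:'; sleep 60; done
(prints the "scan:" line plus the two progress lines below it once a minute)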


# zpool status -v DATA2
  pool: DATA2
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Mar  7 13:12:06 2020
	10.4M scanned at 0/s, 56.1M issued at 0/s, 390T total
	0 resilvered, 0.00% done, no estimated completion time
config:
	NAME                      STATE     READ WRITE CKSUM
	DATA2                     ONLINE       0     0     0
	  raidz2-0                ONLINE       0     0     0
	    diskid/DISK-7PH8LBKG  ONLINE       0     0     0
	    diskid/DISK-7PH8N1EG  ONLINE       0     0     0
	    diskid/DISK-7PH3EZDG  ONLINE       0     0     0
	    diskid/DISK-7PGN0VJG  ONLINE       0     0     0
	    diskid/DISK-7PH8LVMG  ONLINE       0     0     0
	    diskid/DISK-7PH8MZVG  ONLINE       0     0     0
	    diskid/DISK-7PH5RASG  ONLINE       0     0     0
	    diskid/DISK-7PH3EVJG  ONLINE       0     0     0
	  raidz2-1                ONLINE       0     0     0
	    diskid/DISK-7PH3EXRG  ONLINE       0     0     0
	    diskid/DISK-7PH8LV5G  ONLINE       0     0     0
	    diskid/DISK-7PH3EX2G  ONLINE       0     0     0
	    diskid/DISK-7PH77Z6G  ONLINE       0     0     0
	    diskid/DISK-7PH8LWXG  ONLINE       0     0     0
	    diskid/DISK-7PH81L1G  ONLINE       0     0     0
	    diskid/DISK-7PH8N0BG  ONLINE       0     0     0
	    diskid/DISK-7PH8MNMG  ONLINE       0     0     0
	  raidz2-2                ONLINE       0     0     0
	    diskid/DISK-7PH3EWHG  ONLINE       0     0     0
	    diskid/DISK-7PH8LTWG  ONLINE       0     0     0
	    diskid/DISK-7PH5R02G  ONLINE       0     0     0
	    diskid/DISK-7PH8L88G  ONLINE       0     0     0
	    diskid/DISK-7PH81MUG  ONLINE       0     0     0
	    diskid/DISK-7PH8LXMG  ONLINE       0     0     0
	    diskid/DISK-7PH8N26G  ONLINE       0     0     0
	    diskid/DISK-7PH93ZWG  ONLINE       0     0     0
	  raidz2-3                ONLINE       0     0     0
	    diskid/DISK-7PH8LEVG  ONLINE       0     0     0
	    diskid/DISK-7PH8MYKG  ONLINE       0     0     0
	    diskid/DISK-7PH8MXRG  ONLINE       0     0     0
	    diskid/DISK-7PH4TB1G  ONLINE       0     0     0
	    diskid/DISK-7PH8G8XG  ONLINE       0     0     0
	    diskid/DISK-7PH9SP9G  ONLINE       0     0     0
	    diskid/DISK-7PH8LARG  ONLINE       0     0     0
	    diskid/DISK-7PH8KRYG  ONLINE       0     0     0
	  raidz2-4                ONLINE       0     0     0
	    diskid/DISK-7PH8LWMG  ONLINE       0     0     0
	    diskid/DISK-7PH8LVXG  ONLINE       0     0     0
	    diskid/DISK-7PH8LU1G  ONLINE       0     0     0
	    diskid/DISK-7PH9451G  ONLINE       0     0     0
	    diskid/DISK-7PH8ESMG  ONLINE       0     0     0
	    diskid/DISK-7PH3EWVG  ONLINE       0     0     0
	    diskid/DISK-7PH8EMKG  ONLINE       0     0     0
	    diskid/DISK-7PH8LG9G  ONLINE       0     0     0
	  raidz2-5                ONLINE       0     0     0
	    diskid/DISK-7PH93X6G  ONLINE       0     0     0
	    diskid/DISK-7PH8MXJG  ONLINE       0     0     0
	    diskid/DISK-7PH3EY8G  ONLINE       0     0     0
	    diskid/DISK-7PHAHG3G  ONLINE       0     0     0
	    diskid/DISK-7PH8LVYG  ONLINE       0     0     0
	    diskid/DISK-7PH8L8KG  ONLINE       0     0     0
	    diskid/DISK-7PH8LVNG  ONLINE       0     0     0
	    diskid/DISK-7PH8L9SG  ONLINE       0     0     0
	spares
	  diskid/DISK-7PH8G10G    AVAIL   
	  diskid/DISK-7PH8GJ2G    AVAIL   
	  diskid/DISK-7PH8MZ8G    AVAIL   
	  diskid/DISK-7PGRM7TG    AVAIL   
	  diskid/DISK-7PH7UUAG    AVAIL   
	  diskid/DISK-7PH8GJTG    AVAIL   

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x0>
        <metadata>:<0xc49c12>
        <metadata>:<0x19>
        <metadata>:<0x1d>
        <metadata>:<0x20>
        <metadata>:<0x65714b>
        <metadata>:<0x657157>
        <metadata>:<0x65717b>
        <metadata>:<0x657193>
        <metadata>:<0xc530fe>
        DATA2/filur06.it.liu.se/DATA/old/isy/nobackup-server@auto-2020-01-21.23:00:00:/cvl/GARNICS_DATA/20120123sorted/plant__69683/pos1/cam9181310_TS1327623708156387.tif




# procstat -kka|egrep zfs
    0 100212 kernel              zfsvfs              mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 taskqueue_thread_loop+0xf1 fork_exit+0x83 fork_trampoline+0xe 
    0 100519 kernel              zfs_vn_rele_taskq   mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 taskqueue_thread_loop+0xf1 fork_exit+0x83 fork_trampoline+0xe 
    0 100891 kernel              zfs_vn_rele_taskq   mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 taskqueue_thread_loop+0xf1 fork_exit+0x83 fork_trampoline+0xe 
    0 102051 kernel              zfs_vn_rele_taskq   mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 taskqueue_thread_loop+0xf1 fork_exit+0x83 fork_trampoline+0xe 
   32 100190 zfskern             solthread 0xfffffff mi_switch+0xe2 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a zthr_procedure+0xff fork_exit+0x83 fork_trampoline+0xe 
   32 100191 zfskern             solthread 0xfffffff mi_switch+0xe2 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a zthr_procedure+0xff fork_exit+0x83 fork_trampoline+0xe 
   32 100192 zfskern             arc_dnlc_evicts_thr mi_switch+0xe2 sleepq_wait+0x2c _cv_wait+0x152 arc_dnlc_evicts_thread+0x14f fork_exit+0x83 fork_trampoline+0xe 
   32 100194 zfskern             dbuf_evict_thread   mi_switch+0xe2 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a dbuf_evict_thread+0x1c8 fork_exit+0x83 fork_trampoline+0xe 
   32 100211 zfskern             l2arc_feed_thread   mi_switch+0xe2 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a l2arc_feed_thread+0x239 fork_exit+0x83 fork_trampoline+0xe 
   32 100482 zfskern             trim zroot          mi_switch+0xe2 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a trim_thread+0x120 fork_exit+0x83 fork_trampoline+0xe 
   32 100552 zfskern             txg_thread_enter    mi_switch+0xe2 sleepq_wait+0x2c _cv_wait+0x152 txg_quiesce_thread+0xbb fork_exit+0x83 fork_trampoline+0xe 
   32 100553 zfskern             txg_thread_enter    mi_switch+0xe2 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a txg_sync_thread+0x47f fork_exit+0x83 fork_trampoline+0xe 
   32 100554 zfskern             solthread 0xfffffff mi_switch+0xe2 sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0x117 fork_exit+0x83 fork_trampoline+0xe 
   32 100555 zfskern             solthread 0xfffffff mi_switch+0xe2 sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0x117 fork_exit+0x83 fork_trampoline+0xe 
   32 100808 zfskern             trim DATA2          mi_switch+0xe2 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a trim_thread+0x120 fork_exit+0x83 fork_trampoline+0xe 
   32 101517 zfskern             txg_thread_enter    mi_switch+0xe2 sleepq_wait+0x2c _cv_wait+0x152 txg_quiesce_thread+0xbb fork_exit+0x83 fork_trampoline+0xe 
   32 101518 zfskern             txg_thread_enter    mi_switch+0xe2 sleepq_wait+0x2c _cv_wait+0x152 zio_wait+0x9b dbuf_read+0x669 dnode_hold_impl+0x1af dmu_bonus_hold+0x1d dsl_deadlist_open+0x49 dsl_dataset_hold_obj+0x3d5 dsl_scan_sync+0xf65 spa_sync+0xb57 txg_sync_thread+0x238 fork_exit+0x83 fork_trampoline+0xe 
   32 101519 zfskern             solthread 0xfffffff mi_switch+0xe2 sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0x117 fork_exit+0x83 fork_trampoline+0xe 
   32 101520 zfskern             solthread 0xfffffff mi_switch+0xe2 sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0x117 fork_exit+0x83 fork_trampoline+0xe 
   32 102005 zfskern             trim DATA3          mi_switch+0xe2 sleepq_timedwait+0x2f _cv_timedwait_sbt+0x17a trim_thread+0x120 fork_exit+0x83 fork_trampoline+0xe 
   32 102261 zfskern             txg_thread_enter    mi_switch+0xe2 sleepq_wait+0x2c _cv_wait+0x152 txg_quiesce_thread+0xbb fork_exit+0x83 fork_trampoline+0xe 
   32 102262 zfskern             txg_thread_enter    mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 taskqueue_quiesce+0x114 dsl_scan_sync+0xd2a spa_sync+0xb57 txg_sync_thread+0x238 fork_exit+0x83 fork_trampoline+0xe 
   32 102263 zfskern             solthread 0xfffffff mi_switch+0xe2 sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0x117 fork_exit+0x83 fork_trampoline+0xe 
   32 102264 zfskern             solthread 0xfffffff mi_switch+0xe2 sleepq_wait+0x2c _cv_wait+0x152 zthr_procedure+0x117 fork_exit+0x83 fork_trampoline+0xe 
   32 102265 zfskern             zvol DATA3/sekur-is mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 zvol_geom_worker+0x16d fork_exit+0x83 fork_trampoline+0xe 
   32 102266 zfskern             zvol DATA3/sekur-is mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 zvol_geom_worker+0x16d fork_exit+0x83 fork_trampoline+0xe 
   32 102267 zfskern             zvol DATA3/sekur-is mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 zvol_geom_worker+0x16d fork_exit+0x83 fork_trampoline+0xe 
...
   32 102415 zfskern             zvol DATA3/sekur-is mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 zvol_geom_worker+0x16d fork_exit+0x83 fork_trampoline+0xe 
   32 102416 zfskern             zvol DATA3/sekur-is mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 zvol_geom_worker+0x16d fork_exit+0x83 fork_trampoline+0xe 
 2640 100568 zfsd                -                   mi_switch+0xe2 sleepq_catch_signals+0x425 sleepq_wait_sig+0xf _cv_wait_sig+0x154 seltdwait+0xbf kern_poll+0x43d sys_poll+0x50 amd64_syscall+0x364 fast_syscall_common+0x101 
20980 100567 zfs.orig            -                   zap_cursor_retrieve+0x70 zap_value_search+0x8f dsl_dataset_get_snapname+0x90 dsl_dataset_name+0x34 dsl_dataset_stats+0xf4 dmu_objset_stats+0x1d zfs_ioc_objset_stats_impl+0x50 zfs_ioc_dataset_list_next+0x143 zfsdev_ioctl+0x72e devfs_ioctl+0xad VOP_IOCTL_APV+0x7c vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x2be sys_ioctl+0x15d amd64_syscall+0x364 fast_syscall_common+0x101 



# zpool iostat -v DATA2 10
                             capacity     operations    bandwidth
pool                      alloc   free   read  write   read  write
------------------------  -----  -----  -----  -----  -----  -----
DATA2                      390T  44.8T    871      0  3.68M  5.56K
  raidz2                  66.4T  6.10T    167      0   726K    883
    diskid/DISK-7PH8LBKG      -      -     22      0  92.6K    287
    diskid/DISK-7PH8N1EG      -      -     22      0  92.4K    296
    diskid/DISK-7PH3EZDG      -      -     22      0  93.6K    284
    diskid/DISK-7PGN0VJG      -      -     22      0  92.6K    248
    diskid/DISK-7PH8LVMG      -      -     22      0  92.9K    251
    diskid/DISK-7PH8MZVG      -      -     22      0  93.2K    269
    diskid/DISK-7PH5RASG      -      -     22      0  92.5K    284
    diskid/DISK-7PH3EVJG      -      -     22      0  92.3K    287
  raidz2                  66.3T  6.16T    163      0   714K    883
    diskid/DISK-7PH3EXRG      -      -     21      0  91.7K    248
    diskid/DISK-7PH8LV5G      -      -     21      0  91.9K    251
    diskid/DISK-7PH3EX2G      -      -     21      0  91.0K    275
    diskid/DISK-7PH77Z6G      -      -     21      0  92.1K    287
    diskid/DISK-7PH8LWXG      -      -     21      0  90.8K    320
    diskid/DISK-7PH81L1G      -      -     21      0  91.0K    284
    diskid/DISK-7PH8N0BG      -      -     21      0  91.7K    263
    diskid/DISK-7PH8MNMG      -      -     21      0  91.5K    266
  raidz2                  66.3T  6.22T    166      0   719K   1013
    diskid/DISK-7PH3EWHG      -      -     22      0  92.3K    246
    diskid/DISK-7PH8LTWG      -      -     21      0  91.0K    231
    diskid/DISK-7PH5R02G      -      -     21      0  91.8K    231
    diskid/DISK-7PH8L88G      -      -     21      0  93.1K    219
    diskid/DISK-7PH81MUG      -      -     21      0  92.1K    225
    diskid/DISK-7PH8LXMG      -      -     21      0  92.1K    237
    diskid/DISK-7PH8N26G      -      -     21      0  92.5K    237
    diskid/DISK-7PH93ZWG      -      -     21      0  91.0K    240
  raidz2                  66.4T  6.14T    132      0   581K   1013
    diskid/DISK-7PH8LEVG      -      -     17      0  73.0K    237
    diskid/DISK-7PH8MYKG      -      -     17      0  73.3K    219
    diskid/DISK-7PH8MXRG      -      -     17      0  73.4K    192
    diskid/DISK-7PH4TB1G      -      -     17      0  74.5K    222
    diskid/DISK-7PH8G8XG      -      -     17      0  73.9K    231
    diskid/DISK-7PH9SP9G      -      -     17      0  73.4K    219
    diskid/DISK-7PH8LARG      -      -     17      0  73.8K    201
    diskid/DISK-7PH8KRYG      -      -     17      0  73.6K    231
  raidz2                  62.4T  10.1T     69      0   298K   1013
    diskid/DISK-7PH8LWMG      -      -      9      0  38.4K    210
    diskid/DISK-7PH8LVXG      -      -      9      0  38.3K    225
    diskid/DISK-7PH8LU1G      -      -      9      0  37.7K    213
    diskid/DISK-7PH9451G      -      -      9      0  38.2K    210
    diskid/DISK-7PH8ESMG      -      -      9      0  38.2K    237
    diskid/DISK-7PH3EWVG      -      -      9      0  37.8K    240
    diskid/DISK-7PH8EMKG      -      -      9      0  38.1K    228
    diskid/DISK-7PH8LG9G      -      -      9      0  38.2K    213
  raidz2                  62.7T  10.1T    172      0   733K    883
    diskid/DISK-7PH93X6G      -      -     22      0  94.1K    246
    diskid/DISK-7PH8MXJG      -      -     22      0  93.7K    266
    diskid/DISK-7PH3EY8G      -      -     22      0  93.2K    257
    diskid/DISK-7PHAHG3G      -      -     22      0  93.9K    281
    diskid/DISK-7PH8LVYG      -      -     22      0  93.9K    281
    diskid/DISK-7PH8L8KG      -      -     22      0  94.4K    237
    diskid/DISK-7PH8LVNG      -      -     22      0  93.1K    243
    diskid/DISK-7PH8L9SG      -      -     22      0  93.4K    266
------------------------  -----  -----  -----  -----  -----  -----
Comment 1 Peter Eriksson 2020-03-07 18:29:15 UTC
Oh, after many hours it seems to have finished:

# zpool status DATA2
  pool: DATA2
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
	still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(7) for details.
  scan: resilvered 152K in 0 days 06:07:55 with 0 errors on Sat Mar  7 19:20:01 2020
config:
..
	  diskid/DISK-7PH8GJTG    AVAIL   

errors: No known data errors

(The reason it didn't have all features enabled is that I upgraded this server from 11.3-p6 to 12.1-p2 as one way of trying to fix the problem...


# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH ALTROOT
DATA2   435T   390T  44.8T        -         -    23%    89%  1.00x  ONLINE  -
DATA3   145T  75.5T  69.5T        -         -    11%    52%  1.00x  ONLINE  -
zroot   113G  35.5G  77.5G        -         -    56%    31%  1.00x  ONLINE  -


and now zfs commands don't hang any more...)

Now, a wild guess is that it was busy destroying deferred-destroyed snapshots.

Before I restarted this server we had many thousands of snapshots "deleted" with "zfs destroy -d FS@SNAPSHOT". Hmm... I wonder if it would be possible to make this more visible somehow - perhaps a "destroyed-snapshots" counter via a sysctl or something? And/or have some way to pause the snapshot-deletion thread (similar to "zpool scrub -p").
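
For what it's worth, a small part of this is already visible today: the space still waiting to be reclaimed from deferred/async destroys is reported by the pool's "freeing" property, and newer OpenZFS releases (2.0 and later, so not this 12.1 system) can also block on it with "zpool wait". A rough sketch, assuming the DATA2 pool:

# zpool get freeing DATA2
(reports the bytes still to be reclaimed by the async-destroy machinery)

# zpool wait -t free DATA2
(OpenZFS 2.0+ only: blocks until that background freeing has completed)

Neither gives a per-snapshot counter or a way to pause the work, though.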
Comment 2 Alan Somers freebsd_committer freebsd_triage 2023-01-22 21:34:36 UTC
This isn't FreeBSD specific.  I think you should open a feature request upstream at https://github.com/openzfs/zfs and close this issue.
Comment 3 Graham Perrin freebsd_committer freebsd_triage 2023-04-08 18:58:44 UTC
(In reply to Alan Somers from comment #2)

Agreed, especially given the transition to OpenZFS: 

* an upstream report may be appropriate. 

Thank you