Bug 169319 - [zfs] zfs resilver can't complete
Summary: [zfs] zfs resilver can't complete
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 9.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-06-22 08:50 UTC by dirk.meyer
Modified: 2022-09-15 03:12 UTC

See Also:


Description dirk.meyer 2012-06-22 08:50:02 UTC
The pool "tank" was created with 3 disks. One disk was replaced and the resilver completed.
The pool was then extended with 3 new disks. Another disk was replaced, but this resilver does not complete: it keeps restarting roughly every 3 minutes.
There are no entries in dmesg or syslog, and smartctl -a reports no problems on any of the disks.
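
A simple way to watch the loop from the shell (a sketch; pool name and polling interval chosen for this report):

  while true; do zpool status tank | grep -A1 'resilver in progress'; sleep 60; done

If the resilver were progressing, the scanned total would grow monotonically; here it keeps collapsing back toward zero after each restart, which is why the pool never gets past 0% done.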

This problem has been reported multiple times:
http://forums.freebsd.org/showthread.php?t=27878
http://lists.freebsd.org/pipermail/freebsd-fs/2008-December/005532.html
http://kerneltrap.org/mailarchive/freebsd-fs/2008/12/27/4498564
http://forums.freenas.org/showthread.php?5008-ZFS-raidz1-resilver-restarts-after-1-min
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/42740
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/49144



 zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Sun Jun 17 22:17:19 2012
    52,6M scanned out of 3,38T at 1,64M/s, 599h21m to go
    7,27M resilvered, 0,00% done
config:

        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          raidz1-0                  ONLINE       0     0     0
            ada5p1                  ONLINE       0     0     0
            ada4p1                  ONLINE       0     0     0
            ada3p1                  ONLINE       0     0     0
          raidz1-1                  DEGRADED     0     0     0
            ada1p1                  ONLINE       0     0     0
            ada0p1                  ONLINE       0     0     0
            replacing-2             UNAVAIL      0     0     0
              16718347834481227056  UNAVAIL      0     0     0  was /dev/ada6p1/old
              ada6p1                ONLINE       0     0     0  (resilvering)

errors: No known data errors

Fix: 

No fix or workaround is known. The following attempts did not resolve the restart loop:

1) Detaching the faulty disk did not help:

[...]
2012-06-19.20:58:11 [internal pool scrub done txg:3910984] complete=0
2012-06-19.20:58:11 [internal pool scrub txg:3910984] func=2 mintxg=3 maxtxg=3885254
2012-06-19.21:11:03 [internal pool scrub done txg:3911106] complete=0
2012-06-19.21:11:03 [internal pool scrub txg:3911106] func=2 mintxg=3 maxtxg=3885254
2012-06-19.21:13:26 [internal pool scrub done txg:3911128] complete=0
2012-06-19.21:13:26 [internal pool scrub txg:3911128] func=2 mintxg=3 maxtxg=3885254
2012-06-19.21:13:48 zpool offline tank 16718347834481227056
2012-06-19.21:16:15 [internal vdev detach txg:3911155] vdev=/dev/ada6p1/old
2012-06-19.21:16:15 zpool detach tank 16718347834481227056
2012-06-19.21:19:17 [internal pool scrub done txg:3911183] complete=0
2012-06-19.21:19:17 [internal pool scrub txg:3911183] func=2 mintxg=3 maxtxg=3885254
2012-06-19.21:20:18 [internal pool scrub done txg:3911192] complete=0
2012-06-19.21:20:18 [internal pool scrub txg:3911192] func=2 mintxg=3 maxtxg=3885254
2012-06-19.21:30:33 [internal pool scrub done txg:3911291] complete=0
2012-06-19.21:30:33 [internal pool scrub txg:3911291] func=2 mintxg=3 maxtxg=3885254
2012-06-19.21:34:32 [internal pool scrub done txg:3911328] complete=0
2012-06-19.21:34:32 [internal pool scrub txg:3911328] func=2 mintxg=3 maxtxg=3885254
[...]
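
Each aborted pass leaves an "internal pool scrub done ... complete=0" entry in the history, so counting those entries (a sketch) shows how many times the scan has already restarted:

  zpool history -i tank | grep -c 'pool scrub done.*complete=0'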

2) Removing the disk and running a scrub did not help.

The scrub itself did complete without error (see "complete=1" in the excerpt below):

[...]
2012-06-20.06:46:38 [internal pool scrub done txg:3916369] complete=0
2012-06-20.06:46:38 [internal pool scrub txg:3916369] func=2 mintxg=3 maxtxg=3885254
2012-06-20.06:52:29 [internal pool scrub done txg:3916425] complete=0
2012-06-20.06:52:29 [internal pool scrub txg:3916425] func=2 mintxg=3 maxtxg=3885254
2012-06-20.06:57:04 zpool offline tank ada6p1
2012-06-20.07:11:35 zpool online tank 4320346087595370573
2012-06-20.09:03:19 [internal pool scrub done txg:3917680] complete=1
2012-06-20.17:36:49 zpool online tank ada6p1
2012-06-20.18:57:32 [internal pool scrub txg:3924817] func=2 mintxg=3 maxtxg=3924817
2012-06-20.18:57:43 [internal vdev attach txg:3924819] replace vdev=/dev/ada6p1 for vdev=/dev/ada6p1/old
2012-06-20.18:57:43 zpool replace tank ada6p1 ada6p1
2012-06-20.18:58:50 [internal pool scrub done txg:3924829] complete=0
2012-06-20.18:58:50 [internal pool scrub txg:3924829] func=2 mintxg=3 maxtxg=3924817
2012-06-20.19:00:12 [internal pool scrub done txg:3924843] complete=0
2012-06-20.19:00:13 [internal pool scrub txg:3924843] func=2 mintxg=3 maxtxg=3924817
2012-06-20.19:01:38 zpool offline tank ada6p1
2012-06-20.21:13:30 [internal pool scrub done txg:3926107] complete=1
2012-06-20.21:20:44 [internal pool scrub txg:3926198] func=2 mintxg=1616210 maxtxg=3926195
2012-06-20.21:20:57 zpool online tank ada6p1
2012-06-20.21:24:49 [internal pool scrub done txg:3926239] complete=0
2012-06-20.21:24:50 [internal pool scrub txg:3926239] func=2 mintxg=1616210 maxtxg=3926195
2012-06-20.21:27:49 [internal pool scrub done txg:3926266] complete=0
2012-06-20.21:27:49 [internal pool scrub txg:3926266] func=2 mintxg=1616210 maxtxg=3926195
2012-06-20.21:29:59 [internal pool scrub done txg:3926285] complete=0
2012-06-20.21:29:59 [internal pool scrub txg:3926285] func=2 mintxg=1616210 maxtxg=3926195
[...]
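
In these history entries, "func=2" identifies the scan as a resilver (POOL_SCAN_RESILVER), and mintxg/maxtxg give the transaction-group range it is trying to cover. Listing only those entries (a sketch) makes the roughly 3-minute restart cadence easy to read off the timestamps:

  zpool history -i tank | grep 'func=2' | tail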

3) Wiping the drive and replacing it again did not help; the pool history shows the same "complete=0" restart loop as in 2).


4) Debugging with a DTrace-enabled kernel:

dtrace -q -n '::zfs_dbgmsg:entry{printf("%s;%d\n", stringof(arg0), arg1)}'

[...]
spa=%s async request task=%u;-2198978129920
doing scan sync txg %llu; bm=%llu/%llu/%llu/%llu;3931639
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;368
visited %llu blocks in %llums;9349
restarting resilver txg=%llu;3931640
internal %s pool:%s txg:%llu %s;-2123208961
spa=%s async request task=%u;-2198978129920
restarting scan func=%u txg=%llu;2
internal %s pool:%s txg:%llu %s;-2123209268
doing scan sync txg %llu; ddt bm=%llu/%llu/%llu/%llx;3931640
scanned %llu ddt entries with class_max = %u; pausing=%u;0
visited %llu blocks in %llums;2386
doing scan sync txg %llu; bm=%llu/%llu/%llu/%llu;3931641
doing scan sync txg %llu; bm=%llu/%llu/%llu/%llu;3931642
visited %llu blocks in %llums;1548
visited %llu blocks in %llums;1223
doing scan sync txg %llu; bm=%llu/%llu/%llu/%llu;3931643
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;18
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;21
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;234
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;44
visited %llu blocks in %llums;558593
doing scan sync txg %llu; bm=%llu/%llu/%llu/%llu;3931644
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;44
visited %llu blocks in %llums;355513
doing scan sync txg %llu; bm=%llu/%llu/%llu/%llu;3931645
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;44
visited %llu blocks in %llums;233292
doing scan sync txg %llu; bm=%llu/%llu/%llu/%llu;3931646
doing scan sync txg %llu; bm=%llu/%llu/%llu/%llu;3931647
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;44
visited %llu blocks in %llums;183353
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;44
visited %llu blocks in %llums;208309
doing scan sync txg %llu; bm=%llu/%llu/%llu/%llu;3931648
spa=%s async request task=%u;-2198978129920
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;44
visited %llu blocks in %llums;47458
spa=%s async request task=%u;-2198978129920
doing scan sync txg %llu; bm=%llu/%llu/%llu/%llu;3931649
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;44
scanned dataset %llu (%s) with min=%llu max=%llu; pausing=%u;368
visited %llu blocks in %llums;21488
restarting resilver txg=%llu;3931650
internal %s pool:%s txg:%llu %s;-2123208961
spa=%s async request task=%u;-2198978129920
restarting scan func=%u txg=%llu;2
[...]
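
The %s/%llu placeholders are unexpanded because this probe prints the raw format string (arg0) together with only the first variadic argument (arg1); the repeated "restarting resilver txg=%llu" lines are what confirm the loop. A timestamped variant of the same one-liner (a sketch) ties each restart to wall-clock time:

dtrace -q -n '::zfs_dbgmsg:entry{printf("%Y %s;%d\n", walltimestamp, stringof(arg0), arg1)}'
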
How-To-Repeat: 
$ zpool history -i
2011-10-30.06:33:44 zpool create tank raidz1 ada1p1 ada2p1 ada3p1
(disk failure)
2011-11-29.09:38:28 zpool replace tank ada2p1 ada2p1
(reordering of devices here)
2012-01-27.11:21:50 zpool add tank raidz1 ada0p1 ada1p1 ada3p1
(disk failure)
2012-06-16.11:44:50 zpool scrub tank
2012-06-17.22:17:32 zpool replace tank ada6p1 ada6p1
2012-06-17.22:55:26 [internal pool scrub done txg:3885584] complete=0
2012-06-17.22:55:26 [internal pool scrub txg:3885584] func=2 mintxg=3 maxtxg=3885254
2012-06-17.23:07:09 [internal pool scrub done txg:3885689] complete=0
2012-06-17.23:07:09 [internal pool scrub txg:3885689] func=2 mintxg=3 maxtxg=3885254
2012-06-17.23:11:29 [internal pool scrub done txg:3885725] complete=0
2012-06-17.23:11:29 [internal pool scrub txg:3885725] func=2 mintxg=3 maxtxg=3885254
2012-06-17.23:14:16 [internal pool scrub done txg:3885748] complete=0
2012-06-17.23:14:16 [internal pool scrub txg:3885748] func=2 mintxg=3 maxtxg=3885254
2012-06-17.23:56:03 [internal pool scrub done txg:3886098] complete=0
[...]
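
Condensed, the reproduction from this history is (device names as in the original report):

  zpool create tank raidz1 ada1p1 ada2p1 ada3p1
  zpool replace tank ada2p1 ada2p1             # first replace: resilver completes
  zpool add tank raidz1 ada0p1 ada1p1 ada3p1   # extend the pool with a second raidz1
  zpool scrub tank
  zpool replace tank ada6p1 ada6p1             # second replace: resilver never completes
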
Comment 1 Mark Linimon 2012-06-22 08:52:21 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 Eitan Adler 2017-12-31 08:00:56 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped