Bug 223085 - ZFS Resilver not completing - stuck at 99%
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Bugmeister
Reported: 2017-10-18 10:48 UTC by Paul Houselander
Modified: 2025-01-19 07:07 UTC

Description Paul Houselander 2017-10-18 10:48:07 UTC
I have a number of FreeBSD systems with large (30TB) ZFS pools.

Several disks have failed over time, and I have seen resilvers either not complete at all or reach 99% within a week and then take a further month to finish.

I have been seeking advice in the forums:

https://forums.freebsd.org/threads/61643/#post-355088

A system that had a disk replaced some time ago is currently in this state:

  pool: s11d34
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Sep 14 15:08:15 2017
        49.4T scanned out of 49.8T at 17.7M/s, 6h13m to go
        4.93T resilvered, 99.24% done
config:

        NAME                             STATE     READ WRITE CKSUM
        s11d34                           DEGRADED     0     0     0
          raidz2-0                       ONLINE       0     0     0
            multipath/J11F18-1EJB8KUJ    ONLINE       0     0     0
            multipath/J11R01-1EJ2XT4F    ONLINE       0     0     0
            multipath/J11R02-1EHZE2GF    ONLINE       0     0     0
            multipath/J11R03-1EJ2XTMF    ONLINE       0     0     0
            multipath/J11R04-1EJ3NK4J    ONLINE       0     0     0
          raidz2-1                       DEGRADED     0     0     0
            multipath/J11R05-1EJ2Z8AF    ONLINE       0     0     0
            multipath/J11R06-1EJ2Z8NF    ONLINE       0     0     0
            replacing-2                  OFFLINE      0     0     0
              7444569586532474759        OFFLINE      0     0     0  was /dev/multipath/J11R07-1EJ03GXJ
              multipath/J11F23-1EJ3AJBJ  ONLINE       0     0     0  (resilvering)
            multipath/J11R08-1EJ3A0HJ    ONLINE       0     0     0
            multipath/J11R09-1EJ32UPJ    ONLINE       0     0     0

It got to 99.24% within a week but has been stuck there since.

I have stopped ALL access to the pool and run zpool iostat; there is still activity (although low, e.g. 1.2M read, 1.78M write), so it does appear to be doing something.
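
For reference, this is roughly how I watch that activity (the 5-second interval is arbitrary; -v breaks the numbers down per vdev, so you can see whether the I/O is actually hitting the replacing-2 member):

root@freebsd04:~ # zpool iostat -v s11d34 5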

The disks (6TB or 8TB HGST SAS) are attached via an LSI 9207-8e HBA, which is connected to an LSI 6160 SAS switch, which in turn connects to a Supermicro JBOD.

The HBAs have two connectors, each connected to a different SAS switch.

The system sees each disk twice, as expected. I use gmultipath to label the disks and set them to Active/Passive mode, and I then use the multipath name during zpool create, e.g.:

root@freebsd04:~ # gmultipath status
                       Name   Status  Components
  multipath/J11R00-1EJ2XR5F  OPTIMAL  da0 (ACTIVE)
                                      da11 (PASSIVE)
  multipath/J11R01-1EJ2XT4F  OPTIMAL  da1 (ACTIVE)
                                      da12 (PASSIVE)
  multipath/J11R02-1EHZE2GF  OPTIMAL  da2 (ACTIVE)
                                      da13 (PASSIVE)

zpool create -f store43 raidz2 multipath/J11R00-1EJ2XR5F multipath/J11R01-1EJ2XT4F etc.
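
For completeness, the labeling step beforehand looks roughly like this (a sketch from memory; gmultipath label defaults to Active/Passive mode, and the two da devices are the two paths to the same disk):

root@freebsd04:~ # gmultipath label J11R00-1EJ2XR5F da0 da11
root@freebsd04:~ # gmultipath label J11R01-1EJ2XT4F da1 da12
root@freebsd04:~ # gmultipath label J11R02-1EHZE2GF da2 da13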

Any advice on whether this is a bug or something wrong with my setup?

Thanks

Paul
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2025-01-19 07:07:03 UTC
^Triage: I'm sorry that this PR did not get addressed in a timely fashion.

By now, the version that it was created against is long out of support.
In addition, many newer versions of ZFS have been imported.

Please re-open if it is still a problem on a supported version.