Bug 219537 - OpenZFS 8166 - zpool scrub thinks it repaired offline device
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Glen Barber
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-25 16:36 UTC by g_amanakis
Modified: 2017-06-06 14:47 UTC
Description g_amanakis 2017-05-25 16:36:39 UTC
The bug was first described in ZfsOnLinux, see here:
https://github.com/zfsonlinux/zfs/commit/335b251ac1a1f8ba8434450dc0f24986bc44f688

The way to reproduce it is found here:
https://github.com/zfsonlinux/zfs/issues/5806

I can reproduce it on FreeBSD 11.0-RELEASE.

It was resolved by the code in the first link. As far as I can tell, it has not landed in CURRENT yet. Could somebody look into this?

Thank you,
George
Comment 1 Allan Jude freebsd_committer freebsd_triage 2017-05-27 02:08:44 UTC
This only landed upstream 3 days ago, but it is a small fix so we should be able to grab it easily.
Comment 2 Allan Jude freebsd_committer freebsd_triage 2017-05-27 02:10:45 UTC
This should get merged in time for 11.1 since it is low risk and high impact.
Comment 3 g_amanakis 2017-06-06 00:27:01 UTC
Is it possible to MFC this one (MFV r318942) to 11.1-RELEASE?

Thank you,
George
Comment 4 commit-hook freebsd_committer freebsd_triage 2017-06-06 14:47:15 UTC
A commit references this bug:

Author: gjb
Date: Tue Jun  6 14:46:23 UTC 2017
New revision: 319624
URL: https://svnweb.freebsd.org/changeset/base/319624

Log:
  MFC r318943 (avg):

   MFV r318942: 8166 zpool scrub thinks it repaired offline device

   https://www.illumos.org/issues/8166
    If we do a scrub while a leaf device is offline (via "zpool offline"),
    we will inadvertently clear the DTL (dirty time log) of the offline
    device, even though it is still damaged. When the device comes back
    online, we will incompletely resilver it, thinking that the scrub
    repaired blocks written before the scrub was started. The incomplete
    resilver can lead to data loss if there is a subsequent failure of a
    different leaf device.
    The fix is to never clear the DTL of offline devices. Note that if a
    device is onlined while a scrub is in progress, the scrub will be
    restarted.
    The problem can be worked around by running "zpool scrub" after
    "zpool online".
    See also https://github.com/zfsonlinux/zfs/issues/5806

  PR:		219537
  Approved by:	re (kib)
  Sponsored by:	The FreeBSD Foundation

Changes:
_U  stable/11/
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
Comment 5 commit-hook freebsd_committer freebsd_triage 2017-06-06 14:47:18 UTC
A commit references this bug:

Author: gjb
Date: Tue Jun  6 14:46:46 UTC 2017
New revision: 319625
URL: https://svnweb.freebsd.org/changeset/base/319625

Log:
  MFC r318943 (avg):

   MFV r318942: 8166 zpool scrub thinks it repaired offline device

   https://www.illumos.org/issues/8166
    If we do a scrub while a leaf device is offline (via "zpool offline"),
    we will inadvertently clear the DTL (dirty time log) of the offline
    device, even though it is still damaged. When the device comes back
    online, we will incompletely resilver it, thinking that the scrub
    repaired blocks written before the scrub was started. The incomplete
    resilver can lead to data loss if there is a subsequent failure of a
    different leaf device.
    The fix is to never clear the DTL of offline devices. Note that if a
    device is onlined while a scrub is in progress, the scrub will be
    restarted.
    The problem can be worked around by running "zpool scrub" after
    "zpool online".
    See also https://github.com/zfsonlinux/zfs/issues/5806

  PR:		219537
  Sponsored by:	The FreeBSD Foundation

Changes:
_U  stable/10/
  stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
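
The failure mode described in the commit log can be illustrated with a deliberately simplified model. This is only a sketch, not the code in vdev.c: the Leaf class, the set-of-txgs DTL, and the function names are all hypothetical, and the real DTL is a more elaborate structure. It shows why clearing the DTL of an offline leaf at the end of a scrub leaves nothing for the later resilver to repair.

```python
from dataclasses import dataclass, field


@dataclass
class Leaf:
    """Toy stand-in for a leaf vdev (hypothetical, not the real vdev_t)."""
    name: str
    offline: bool = False
    dtl: set = field(default_factory=set)  # txgs this leaf missed


def write_txg(leaves, txg):
    """Record a write: offline leaves miss it, so it enters their DTL."""
    for leaf in leaves:
        if leaf.offline:
            leaf.dtl.add(txg)


def scrub_done(leaves, skip_offline):
    """Finish a scrub.

    skip_offline=False models the bug: every DTL is cleared.
    skip_offline=True models the fix: offline leaves keep their DTL.
    """
    for leaf in leaves:
        if skip_offline and leaf.offline:
            continue
        leaf.dtl.clear()


def simulate(skip_offline):
    a, b = Leaf("a"), Leaf("b")
    b.offline = True                  # "zpool offline"
    write_txg([a, b], 100)            # b misses txg 100
    scrub_done([a, b], skip_offline)  # scrub runs while b is offline
    b.offline = False                 # "zpool online"
    return sorted(b.dtl)              # what a resilver still knows to repair


if __name__ == "__main__":
    print("buggy:", simulate(skip_offline=False))
    print("fixed:", simulate(skip_offline=True))
```

In the buggy case the returned list is empty, so the resilver after "zpool online" has no record of the missed write. With the fix, the missed txg is still listed and would be resilvered.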