Bug 178349 - [zfs] zfs scrub on deduped data could be much less seeky
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 9.1-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
Reported: 2013-05-05 19:00 UTC by Nathaniel Filardo
Modified: 2024-12-17 06:38 UTC
Description Nathaniel Filardo 2013-05-05 19:00:00 UTC
ZFS tries to save time in scrubbing by visiting data in
most-referenced-to-least-referenced order (so that it need not visit a block
once for each reference to it): in short, it scans the DDT (dedup table) for
all blocks with refcount > 1 and then walks the on-disk tree to visit
refcount == 1 blocks.  Unfortunately, the first phase is apparently prone to
being very seeky, resulting in agonizingly slow scrubs and resilvers (my
disks each get 18-25 ops/sec during this phase, for a grand total of
~1.5MB/sec from my raidz2; later traversals are much more respectable at
35MB/sec or so).  It would be better, I think, if the scrub logic traversed
the DDT with a measure of on-disk locality (though this will, naturally,
take several passes over the DDT to visit all blocks).

A straightforward way to do this, though by no means necessarily the best,
would be to allocate in RAM a fixed-size sorted queue of visited block
pointers and ignore block pointers that fall outside the min and max of this
queue (rather like the HAMMER2 lazy deduplication logic, amusingly enough).
Upon visiting a block pointer, it would be inserted into the queue and, if
the queue is full, displace the highest address (which will be unnecessarily
revisited later, but that's OK); this lowers the queue's max and thereby
restricts the pass to a narrower region of the disk, reducing the number of
long-distance seeks.  When a pass over the DDT has finished, if the queue's
max is still infinity, no additional passes are needed; otherwise, the max
of the queue should be made the min, the max should be reset to infinity,
and another pass over the DDT should be made.
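
The multi-pass idea above can be sketched roughly as follows. This is only an
illustration in Python, not ZFS code: DDT entries are modeled as plain
integer disk addresses (real block pointers carry more than one offset), and
the names scrub_ddt_in_passes, ddt_addresses, queue_size and visit are all
hypothetical.

```python
import heapq
import math

def scrub_ddt_in_passes(ddt_addresses, queue_size, visit):
    """Visit every address at least once, narrowing each pass to one region.

    ddt_addresses: iterable of integer addresses, in DDT (not disk) order.
    queue_size:    capacity of the sorted queue (hypothetical tunable).
    visit:         callback that would scrub the block at that address.
    Returns the number of passes made over the DDT.
    """
    if queue_size < 1:
        raise ValueError("queue_size must be at least 1")
    lo = -math.inf              # everything at or below lo is already done
    passes = 0
    while True:
        passes += 1
        hi = math.inf           # stays infinite until the queue overflows
        heap = []               # max-heap via negation, the "sorted queue"
        for addr in ddt_addresses:
            if not (lo < addr < hi):
                continue        # outside this pass's window: ignore
            visit(addr)
            heapq.heappush(heap, -addr)
            if len(heap) > queue_size:
                heapq.heappop(heap)   # displace the highest address
                hi = -heap[0]         # the window shrinks to the new max
        if hi == math.inf:
            return passes       # queue never overflowed: nothing left
        lo = hi                 # next pass starts above this pass's max
```

Every address at or below the final max of a pass was inside the window when
it was encountered, so it has been visited; only displaced addresses above
that max are seen again in a later pass.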

The current bookmarking scheme is sufficient to resume this game, as well, I
think, with the understanding that all blocks in the DDT whose on-disk
location is greater than the bookmark are still due for scan (i.e. when
resuming, use the bookmark as the min of the queue and initialize the max to
infinity).
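
A minimal sketch of that resume rule, under the assumption that the bookmark
can be reduced to a single integer address (the real scan bookmark is a
richer structure) and that an interrupted pass is simply restarted from its
lower bound. The names one_pass and resume_scrub are hypothetical.

```python
import heapq
import math

def one_pass(ddt_addresses, queue_size, lo, visit):
    """One windowed pass over addresses above lo.

    Returns the new bookmark (this pass's final max), or None when the
    queue never overflowed and the scan is therefore complete.
    """
    hi = math.inf
    heap = []                           # max-heap via negation
    for addr in ddt_addresses:
        if not (lo < addr < hi):
            continue
        visit(addr)
        heapq.heappush(heap, -addr)
        if len(heap) > queue_size:
            heapq.heappop(heap)         # displace the highest address
            hi = -heap[0]
    return None if hi == math.inf else hi

def resume_scrub(ddt_addresses, queue_size, bookmark, visit):
    """Resume with the saved bookmark as min and the max reset to infinity."""
    lo = bookmark
    while lo is not None:
        lo = one_pass(ddt_addresses, queue_size, lo, visit)
```

Persisting the value returned by one_pass after each pass would be enough to
restart later without revisiting anything at or below it.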

It may make sense, rather than tracking exact block pointers in the queue,
to mask off some number of bits from the bottom of their addresses and track
those values instead.
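
As a rough illustration of the masking idea, assuming addresses are plain
integers and picking 20 low bits (1 MiB regions) purely as an example value;
region_key is a hypothetical helper name.

```python
def region_key(addr, low_bits):
    """Clear the low bits so that nearby addresses share one queue key."""
    return addr & ~((1 << low_bits) - 1)

# Two blocks in the same 1 MiB region collapse to one key; a block in the
# next region gets a different key.
keys = {region_key(a, 20) for a in (0x100000, 0x1FFFFF, 0x200000)}
```

With keys like these, a queue of the same fixed size would cover a wider
span of the disk per pass, at the cost of coarser window boundaries.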
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2013-05-05 19:21:16 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:01:33 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 3 Michael Dexter freebsd_triage 2024-12-17 06:38:54 UTC
"Fast Dedupe" addresses many ZFS deduplication issues: https://github.com/openzfs/zfs/discussions/15896

Please re-open if issue is not addressed by fast dedupe.