Bug 249579 - [ZFS] Can't resume a zfs receive stream to a dataset with a mounted clone
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 12.1-STABLE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Alan Somers
Reported: 2020-09-24 16:14 UTC by Alan Somers
Modified: 2021-06-17 21:21 UTC

Attachments
In iter_dependents_cb, don't recurse into clones of the destination (1.42 KB, patch)
2020-09-24 17:06 UTC, Alan Somers
Fix unmount/remount when resuming a receive stream (797 bytes, patch)
2020-09-26 02:29 UTC, Alan Somers

Description Alan Somers freebsd_committer freebsd_triage 2020-09-24 16:14:42 UTC
On FreeBSD stable/12, you cannot resume a ZFS receive stream into a dataset that has a mounted clone that is in use.  The bug was likely introduced by r364412.

The problem is that, when receiving, libzfs tries to unmount any dataset whose mountpoint might be changed.  Such datasets include all children of the destination, as well as all clones of those children.  Clones of the destination itself SHOULD NOT be included, but libzfs includes them anyway.  Datasets whose mountpoint property is locally set also SHOULD NOT be included, but libzfs seems to include them anyway, too.

The problem is not reproducible on head (which has switched to OpenZFS), because OpenZFS's libzfs does not try to unmount datasets when receiving a stream.  I don't know why not.
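
The selection rule described above can be sketched as a small Python model. This is only an illustration of the expected behaviour, not libzfs code: the Dataset class, the function names, and the datasets tank/dst/child, tank/childclone and tank/dst/pinned are all hypothetical, and the model ignores nested cases such as children of clones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dataset:
    name: str
    origin: Optional[str] = None    # snapshot this dataset was cloned from, if any
    local_mountpoint: bool = False  # True if mountpoint is set locally, not inherited


def is_child_of(name: str, parent: str) -> bool:
    """True if `name` lies strictly below `parent` in the dataset hierarchy."""
    return name.startswith(parent + "/")


def should_unmount(ds: Dataset, dest: str) -> bool:
    """Expected rule from the report (a simplified assumption, not libzfs)."""
    if ds.local_mountpoint:
        return False  # a locally set mountpoint cannot be changed by the receive
    if ds.name == dest or is_child_of(ds.name, dest):
        return True   # the destination and all of its children
    if ds.origin is None:
        return False
    origin_fs = ds.origin.split("@", 1)[0]  # strip the snapshot part
    # Clones of children count; clones of the destination itself do not.
    return is_child_of(origin_fs, dest)


def datasets_to_unmount(dest: str, datasets: List[Dataset]) -> List[str]:
    return sorted(ds.name for ds in datasets if should_unmount(ds, dest))


example = [
    Dataset("tank/dst"),
    Dataset("tank/clone", origin="tank/dst@1"),             # clone of the destination
    Dataset("tank/dst/child"),                              # hypothetical child
    Dataset("tank/childclone", origin="tank/dst/child@1"),  # hypothetical clone of a child
    Dataset("tank/dst/pinned", local_mountpoint=True),      # hypothetical, locally set
]
print(datasets_to_unmount("tank/dst", example))
```

Under this model tank/clone is left alone, so a shell sitting in /tank/clone would not block the receive.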

Steps to reproduce:
> sudo zpool create tank vtbd1
> sudo zfs create tank/src
> sudo dd if=/dev/zero bs=1m count=1024 of=/tank/src/zerofile
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 8.282515 secs (129639593 bytes/sec)
> sudo zfs snapshot tank/src@1
> sudo zfs send -R tank/src@1 | sudo zfs recv -vs tank/dst
receiving full stream of tank/src@1 into tank/dst@1
received 1.00GB stream in 4 seconds (257MB/sec)
> sudo zfs clone tank/dst@1 tank/clone
> # In another shell, cd to /tank/clone
> sudo dd if=/dev/zero bs=1m count=1024 of=/tank/src/zerofile2
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 5.961812 secs (180103269 bytes/sec)
> sudo zfs snapshot tank/src@2
> sudo zfs send -i tank/src@1 tank/src@2 | head -c 536870912 | sudo zfs receive  -vs tank/dst
receiving incremental stream of tank/src@2 into tank/dst@2
warning: cannot send 'tank/src@2': signal received
cannot receive incremental stream: checksum mismatch or incomplete stream.
Partially received snapshot is saved.
A resuming stream can be generated on the sending system by running:
    zfs send -t 1-XXXXXX
> sudo zfs send -t 1-XXXXXX | sudo zfs receive -vs tank/dst
cannot unmount '/tank/clone': Device busy
Comment 1 Alan Somers freebsd_committer freebsd_triage 2020-09-24 17:06:43 UTC
Created attachment 218246
In iter_dependents_cb, don't recurse into clones of the destination
Comment 2 Alan Somers freebsd_committer freebsd_triage 2020-09-24 17:39:29 UTC
I'm guessing that https://github.com/openzfs/zfs/commit/0c6d09361d is the reason why I can't reproduce this problem on head.  But I still don't know why OpenZFS isn't vulnerable to bug 248606.
Comment 3 Alan Somers freebsd_committer freebsd_triage 2020-09-24 22:22:05 UTC
That patch works, but unfortunately breaks the ability to do "zfs destroy -R" of a snapshot with clones.
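
One plausible reading of this regression, assuming that both the receive path and "zfs destroy -R" rely on the same walk over dependents (the comment suggests this but does not state it), is sketched below in Python. The CHILDREN and CLONES tables and the dependents() helper are hypothetical stand-ins that mirror the reproduction steps; they are not libzfs's data structures.

```python
from typing import Dict, List

# Hypothetical layout matching the reproduction steps: tank/dst has snapshot
# tank/dst@1, and tank/clone was cloned from that snapshot.
CHILDREN: Dict[str, List[str]] = {"tank/dst": ["tank/dst@1"]}
CLONES: Dict[str, List[str]] = {"tank/dst@1": ["tank/clone"]}


def dependents(name: str, follow_clones: bool = True) -> List[str]:
    """Depth-first list of everything that depends on `name`."""
    found: List[str] = []
    for child in CHILDREN.get(name, []):
        found.append(child)
        found.extend(dependents(child, follow_clones))
    if follow_clones:
        for clone in CLONES.get(name, []):
            found.append(clone)
            found.extend(dependents(clone, follow_clones))
    return found


# Receive into tank/dst: skipping clones keeps tank/clone out of the unmount set.
print(dependents("tank/dst", follow_clones=False))
# destroy -R of tank/dst@1: skipping clones also hides the clone that must be destroyed.
print(dependents("tank/dst@1", follow_clones=False))
```

If the walk is shared, a change that suits the receive path also hides tank/clone from the destroy path, which would match the breakage described here.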
Comment 4 Alan Somers freebsd_committer freebsd_triage 2020-09-26 02:29:45 UTC
Created attachment 218303
Fix unmount/remount when resuming a receive stream

zfs: Fix resuming receive stream to dataset with mounted clone

My fix for bug 248606 (zfs receive: Input/output error accessing dataset
after resuming interrupted receive), r364412, introduced a regression:
attempting to resume a receive into a dataset with a mounted clone would
fail if that clone were in-use.  This change reverts r364412 and fixes it in
a better way.

Background:
When ZFS receives a stream, it may decide to unmount and remount the
destination and all of its children.  However, ever since resumable
send/receive was implemented, ZFS has skipped the unmount/remount step when
resuming a stream.  I don't know why.

That led to bug 248606.  When resuming the stream, ZFS didn't unmount and
remount the destination, leaving a destroyed dataset mounted.

My original fix was to always unmount and remount when resuming a receive,
but that caused other problems, like bug 249579.  A better solution is to
unmount and remount when resuming a receive of a stream that would've
unmounted and remounted when it was new.

Direct commit to stable/12 because head has moved to OpenZFS.  The bug
exists there, too, but a change to the OpenZFS code can't be merged to the
old ZFS code.

PR: 249579

Test Plan: ZFS test suite
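
The three behaviours described in the background above can be compared with a small Python truth table. This is a sketch only: the policy labels and the fresh_would_remount flag are hypothetical names for illustration, and how libzfs actually decides whether a new stream needs the unmount/remount is not covered here.

```python
def should_remount(resuming: bool, fresh_would_remount: bool, policy: str) -> bool:
    """Decide whether to unmount and remount around a receive.

    `fresh_would_remount` stands for "this stream would have triggered the
    unmount/remount had it been received from the start" (an assumed input).
    """
    if policy == "before-r364412":
        # Resumed receives always skipped the step (led to bug 248606).
        return fresh_would_remount and not resuming
    if policy == "r364412":
        # Resumed receives always did the step (led to bug 249579).
        return resuming or fresh_would_remount
    if policy == "r366180":
        # Resumed receives behave exactly like the original receive would have.
        return fresh_would_remount
    raise ValueError(f"unknown policy: {policy}")


for policy in ("before-r364412", "r364412", "r366180"):
    for resuming in (False, True):
        for fresh in (False, True):
            print(policy, resuming, fresh, should_remount(resuming, fresh, policy))
```

The three policies differ only in the resuming rows: the first never remounts there, the second always does, and the third follows what a new receive would have done.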
Comment 5 Matt Macy freebsd_committer freebsd_triage 2020-09-26 02:41:08 UTC
(In reply to Alan Somers from comment #4)
Based on my understanding of the code and your description of the problem this change looks fine to me.
Comment 6 commit-hook freebsd_committer freebsd_triage 2020-09-26 02:51:25 UTC
A commit references this bug:

Author: asomers
Date: Sat Sep 26 02:50:29 UTC 2020
New revision: 366180
URL: https://svnweb.freebsd.org/changeset/base/366180

Log:
  zfs: Fix resuming receive stream to dataset with mounted clone

  My fix for bug 248606 (zfs receive: Input/output error accessing dataset
  after resuming interrupted receive), r364412, introduced a regression:
  attempting to resume a receive into a dataset with a mounted clone would
  fail if that clone were in-use.  This change reverts r364412 and fixes it in
  a better way.

  Background:
  When ZFS receives a stream, it may decide to unmount and remount the
  destination and all of its children.  However, ever since resumable
  send/receive was implemented, ZFS has skipped the unmount/remount step when
  resuming a stream.  I don't know why.

  That led to bug 248606.  When resuming the stream, ZFS didn't unmount and
  remount the destination, leaving a destroyed dataset mounted.

  My original fix was to always unmount and remount when resuming a receive,
  but that caused other problems, like bug 249579.  A better solution is to
  unmount and remount when resuming a receive of a stream that would've
  unmounted and remounted when it was new.

  Direct commit to stable/12 because head has moved to OpenZFS.  The bug
  exists there, too, but a change to the OpenZFS code can't be merged to the
  old ZFS code.

  PR:		249579
  Reviewed by:	mmacy
  MFC after:	1 week
  Sponsored by:	Axcient

Changes:
  stable/12/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_sendrecv.c