Bug 171415 - [zfs] zfs recv fails with "cannot receive incremental stream: invalid backup stream"
Summary: [zfs] zfs recv fails with "cannot receive incremental stream: invalid backup stream"
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Bugmeister
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-07 17:00 UTC by Martin Birgmeier
Modified: 2025-01-19 07:31 UTC
CC List: 1 user

See Also:


Attachments

Description Martin Birgmeier 2012-09-07 17:00:10 UTC
I have a machine "hal", running release/8.2.0, which is equipped with six 2 TB disks, labeled disk31..disk36. These are more or less split in half using gpart(8), yielding partitions disk31p3..disk36p3 and disk31p4..disk36p4, each slightly less than 1 TB in size.

The first set of these halves, disk31p3..disk36p3, is combined into a raidz2 zpool "hal.1", yielding approximately 4 TB. This pool is used for production. It was created on 2010-10-22, with zpool version 14 and zfs version 3 ("zpool get version hal.1", "zfs get version hal.1").

The second set of these halves, disk31p4..disk36p4, is exported as iSCSI targets using net/istgt. These are used for experimental purposes.
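
Roughly, this corresponds to something like the following sketch. The device paths, the GPT label scheme, and the istgt stanza are only illustrative, not the exact values used on hal:

# Production pool over the first set of halves (assuming the partitions
# show up as GPT labels under /dev/gpt; the real paths on hal may differ):
zpool create hal.1 raidz2 gpt/disk31p3 gpt/disk32p3 gpt/disk33p3 \
    gpt/disk34p3 gpt/disk35p3 gpt/disk36p3

# One of the second halves exported via net/istgt; abbreviated istgt.conf stanza:
[LogicalUnit1]
  TargetName disk31p4
  Mapping PortalGroup1 InitiatorGroup1
  UnitType Disk
  LUN0 Storage /dev/gpt/disk31p4 Auto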

On hal, using emulators/virtualbox-ose, I run a virtual machine v903, which I usually keep quite close to head. In v903, using iscontrol(8), I import the iSCSI targets, yielding (in order) da0..da5.

I want to see what happens when I create a zpool under head and try to duplicate the file systems from hal.1 onto it. To test this with the file system "hal.1/backup/databases", I start v903 and run the following on it:

[0]# kldload iscsi_initiator.ko
iscsi: version 2.3.1
[0]# for i in disk3{1..6}p4
do
echo "*** $i ***"
iscontrol -n ${i} || break 
sleep 1
done
*** disk31p4 ***
iscontrol[1257]: running
iscontrol[1257]: (pass2:iscsi0:0:0:0):  tagged openings now 0
*** disk32p4 ***
iscontrol[1262]: running
iscontrol[1262]: (pass3:iscsi1:0:0:0):  tagged openings now 0
*** disk33p4 ***
iscontrol[1267]: running
iscontrol[1267]: (pass4:iscsi2:0:0:0):  tagged openings now 0
*** disk34p4 ***
iscontrol[1272]: running
iscontrol[1272]: (pass5:iscsi3:0:0:0):  tagged openings now 0
*** disk35p4 ***
iscontrol[1277]: running
iscontrol[1277]: (pass6:iscsi4:0:0:0):  tagged openings now 0
*** disk36p4 ***
iscontrol[1282]: running
iscontrol[1282]: (pass7:iscsi5:0:0:0):  tagged openings now 0
[0]# zpool create v903.2 raidz2 da0 da1 da2 da3 da4 da5
ZFS NOTICE: Prefetch is disabled by default if less than 4GB of RAM is present;
            to enable, add "vfs.zfs.prefetch_disable=0" to /boot/loader.conf.
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
[0]# zfs create v903.2/backup
[0]# rsh -n hal "zfs send -vR hal.1/backup/databases@2012-07-21.14:23:36" | zfs receive -v v903.2/backup/databases
sending from @ to hal.1/backup/databases@2011-05-01.08:34:45
receiving full stream of hal.1/backup/databases@2011-05-01.08:34:45 into v903.2/backup/databases@2011-05-01.08:34:45
sending from @2011-05-01.08:34:45 to hal.1/backup/databases@2011-06-02.20:42:45
received 48.2MB stream in 13 seconds (3.71MB/sec)
receiving incremental stream of hal.1/backup/databases@2011-06-02.20:42:45 into v903.2/backup/databases@2011-06-02.20:42:45
sending from @2011-06-02.20:42:45 to hal.1/backup/databases@2011-10-15.13:56:24
received 254MB stream in 70 seconds (3.63MB/sec)
receiving incremental stream of hal.1/backup/databases@2011-10-15.13:56:24 into v903.2/backup/databases@2011-10-15.13:56:24
sending from @2011-10-15.13:56:24 to hal.1/backup/databases@2011-10-27.19:21:00
received 268MB stream in 75 seconds (3.57MB/sec)
receiving incremental stream of hal.1/backup/databases@2011-10-27.19:21:00 into v903.2/backup/databases@2011-10-27.19:21:00
sending from @2011-10-27.19:21:00 to hal.1/backup/databases@2012-03-19.21:39:06
received 305MB stream in 82 seconds (3.72MB/sec)
receiving incremental stream of hal.1/backup/databases@2012-03-19.21:39:06 into v903.2/backup/databases@2012-03-19.21:39:06
sending from @2012-03-19.21:39:06 to hal.1/backup/databases@2012-07-21.14:23:36
received 345MB stream in 101 seconds (3.42MB/sec)
receiving incremental stream of hal.1/backup/databases@2012-07-21.14:23:36 into v903.2/backup/databases@2012-07-21.14:23:36
cannot receive incremental stream: invalid backup stream
rsh -n hal "zfs send -vR hal.1/backup/databases@2012-07-21.14:23:36"  0.22s user 21.94s system 5% cpu 6:33.67 total
zfs receive -v v903.2/backup/databases  0.02s user 5.46s system 1% cpu 6:37.24 total
[1]# 

As can be seen, receiving the incremental stream for the snapshot created on 2012-07-21 fails.

On hal, I have the following:

[0]# zpool history | grep backup/databases
2010-12-04.10:37:37 zfs create hal.1/backup/databases
2011-05-01.08:34:45 zfs snapshot hal.1/backup/databases@2011-05-01.08:34:45
2011-06-02.20:42:46 zfs snapshot hal.1/backup/databases@2011-06-02.20:42:45
2011-10-15.13:56:24 zfs snapshot hal.1/backup/databases@2011-10-15.13:56:24
2011-10-27.19:21:00 zfs snapshot hal.1/backup/databases@2011-10-27.19:21:00
2012-03-19.21:39:07 zfs snapshot hal.1/backup/databases@2012-03-19.21:39:06
2012-07-21.14:23:37 zfs snapshot hal.1/backup/databases@2012-07-21.14:23:36
[0]# 

Googling reveals very little information about this problem. One possibility might be http://wesunsolve.net/bugid/id/7002362, but as can be seen above, this file system has never been renamed.

I have also tried a similar command with another file system in hal.1; that one fails as early as the first incremental stream.
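
If it helps with narrowing this down, the failing increment could also be sent on its own and, where zstreamdump(8) is available, inspected without receiving it; a rough sketch (same snapshot names as above):

# Send only the last increment, the one that fails as part of -R:
rsh -n hal "zfs send -v -i @2012-03-19.21:39:06 hal.1/backup/databases@2012-07-21.14:23:36" \
    | zfs receive -v v903.2/backup/databases

# Or just dump the stream headers to see whether the stream itself looks sane:
rsh -n hal "zfs send -i @2012-03-19.21:39:06 hal.1/backup/databases@2012-07-21.14:23:36" \
    | zstreamdump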
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2012-09-07 20:52:59 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 Martin Birgmeier 2012-09-08 07:50:34 UTC
There is one more interesting hit with Google: 
https://wikis.oracle.com/display/FishWorks/ak-2011.04.24.2.0+Release+Notes

In it, one finds:

---

    Issues Addressed

The following CRs have been fixed in this release:

[...]

7056496 cannot receive new filesystem stream: invalid backup stream

---

So it seems Sun/Oracle have already resolved this.

Regards,

Martin
Comment 3 Martin Birgmeier 2012-10-17 20:17:26 UTC
Good news...

I upgraded the FreeBSD head running in the virtual machine v903 to
r241507, with the goal of seeing whether the problem described here has
been fixed in the meantime. (The server, hal, is still at 8.2 release.)

In addition, instead of using iSCSI to make disk{31..36}p4 from hal
appear in v903, I attached them as devices directly using VirtualBox
(all to one and the same virtual SATA controller). Now the test worked,
and I could successfully receive the backup stream. I even made an md5
comparison of all the files in the received file system (> 53000), which
went through without a single difference. Very good!
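
Such a comparison can be done roughly as follows (the mountpoints and the use of rsh are only illustrative):

# Checksum all files on the source (hal) and on the destination (v903),
# then compare the two sorted lists:
rsh -n hal 'cd /hal.1/backup/databases && find . -type f -exec md5 -r {} +' | sort > /tmp/src.md5
( cd /v903.2/backup/databases && find . -type f -exec md5 -r {} + ) | sort > /tmp/dst.md5
diff /tmp/src.md5 /tmp/dst.md5 && echo "no differences"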

What remained was to test with the original setup again, i.e., using
iSCSI. Unfortunately, this still fails in the same way, so it seems that
something in iscontrol or istgt is broken.

On a hunch, I'd say it is iscontrol, simply because I assume that on the
server side (ports/net/istgt) there is not that much that can break, it
being a userland application, whereas iscontrol is a kernel module
hooking into the SCSI subsystem.

Therefore, this defect should possibly be reclassified, with the [zfs]
tag replaced by a more appropriate one.

Regards,

Martin

p.s. I am also using istgt with disk images of MS Windows XP
installations, exporting them to remotely running VirtualBox instances
without any issues - to me a hint that istgt is quite o.k. Of course,
the zfs scenario uses 6 iSCSI targets at the same time, whereas the XP
emulation uses only one, so maybe either istgt or iscontrol has
problems with multiple connections/instances.
Comment 4 Martin Birgmeier 2012-10-17 22:50:53 UTC
One more test with the goal of determining whether iscontrol or istgt is
the culprit...

The test setup is altered as follows: in addition to the 6 partitions
used for the zpool on v903, v903's OS partition (GPT, holding /, /usr,
and swap) is also exported from hal via istgt (i.e., as an iSCSI
target). Then v903 is started as a VirtualBox guest on a Windows 7
host, where the (now 7) partitions are attached via VirtualBox's iSCSI
facility (http://www.virtualbox.org/manual/ch05.html#idp17868816) as
ada0 and ada1..ada6.

This setup results in a flawless zfs receive of the sent stream, again
checked via md5 of all files. In addition, the number (but not the md5)
of files in all snapshot directories (.zfs/snapshot/*) was checked.
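
The per-snapshot file counts can be obtained roughly like this (the mountpoint is only an assumption):

# Count the files visible in each received snapshot:
for s in /v903.2/backup/databases/.zfs/snapshot/*; do
    printf '%s: %s files\n' "$s" "$(find "$s" -type f | wc -l)"
done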

This is a strong indication that indeed iscontrol is problematic.

Regards,

Martin

p.s. Some observations using this setup: it is a lot faster than running
v903 in a VirtualBox guest on hal. I had to do a disk check of v903
because it had crashed, and checking the UFS file systems / and /usr was
in fact blazingly fast, much faster than when the containing partition
was attached directly on hal as the VirtualBox host. Also, the subsequent
zfs receive test ran at a sustained rate of approximately 23 MB/s,
whereas with v903 hosted on hal it ran at only 15 MB/s (see my previous
test), a speed increase of about 50%. Note that in this test setup there
is 1 Gbps Ethernet between hal as the server and the Windows host,
whereas otherwise v903 runs directly on hal, which at least in theory
should result in much quicker I/O.

So it seems that not only iscontrol needs fixing, but VirtualBox on
FreeBSD hosts could use some optimization.

But then, FreeBSD is the work of volunteers, and this point in my tests
is certainly the right time to thank them for all the great
possibilities this system offers!
Comment 5 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:52 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 6 Mark Linimon freebsd_committer freebsd_triage 2025-01-19 06:30:54 UTC
^Triage: I'm sorry that this PR did not get addressed in a timely fashion.

By now, the version that it was created against is long out of support.
As well, many newer versions of ZFS have been imported.

Please re-open if it is still a problem on a supported version.
Comment 7 Martin Birgmeier 2025-01-19 07:31:55 UTC
It is hopefully fixed by now; however, since that time I have not dared to use "-v" with "zfs send" anymore (and I am using zfs send/receive heavily nowadays, for backup purposes). Therefore, I cannot even confirm whether this is really fixed. :-)

-- Martin