Bug 250816 - ZFS cannot import its own export on AWS EC2 12.1 & 12.2-RELEASE
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
Reported: 2020-11-02 19:01 UTC by Gunther Schadow
Modified: 2021-04-18 16:27 UTC

Description Gunther Schadow 2020-11-02 19:01:02 UTC
This is a fresh deployment of the most recent official FreeBSD 12.2 EC2 AMI on Amazon, with no complicated configuration. The only line added to rc.conf is

    zfs_enable="YES"

without which ZFS would not work at all. In summary:

 1. zpool create ... works and creates the pool, as shown by zpool list
 2. zpool export ... completes without error
 3. zpool import ... reports that one or more devices are corrupt

Here is a (ba)sh script that you can run yourself:

<script>

mkdir zfstc
truncate -s 100M zfstc/0
truncate -s 100M zfstc/1
mkdir zfstd
for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done

zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
zpool list
zpool export testpool
zpool import -d zfstd

for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
rm zfstc/*
truncate -s 100M zfstc/0
truncate -s 100M zfstc/1
for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done

zpool create testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
zpool list
zpool export testpool
zpool import -d zfstd

for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
rm zfstc/*
truncate -s 100M zfstc/0
truncate -s 100M zfstc/1
for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done

zpool create testpool mirror $(for i in zfstd/* ; do readlink $i ; done)
zpool list
zpool export testpool
zpool import -d zfstd

for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
rm -r zfstc zfstd

</script>

In it you can see repeated attempts with different options and vdev types (raidz and mirror), none of which makes any difference.

Here is the log on another system where it all worked:

<log>

# mkdir zfstc
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# mkdir zfstd
# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
#
# zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   186K   176M        -         -     1%     0%  1.00x  ONLINE  -
# zpool export testpool
# zpool import -d zfstd
   pool: testpool
     id: 14400958070908437474
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:
        testpool    ONLINE
          raidz1-0  ONLINE
            md10    ONLINE
            md11    ONLINE
#
# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
# rm zfstc/*
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
#
# zpool create testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   156K   176M        -         -     1%     0%  1.00x  ONLINE  -
# zpool export testpool
# zpool import -d zfstd
   pool: testpool
     id: 7399105644867648490
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:
        testpool    ONLINE
          raidz1-0  ONLINE
            md10    ONLINE
            md11    ONLINE
#
# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
# rm zfstc/*
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
#
# zpool create testpool mirror $(for i in zfstd/* ; do readlink $i ; done)
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool    80M  67.5K  79.9M        -         -     1%     0%  1.00x  ONLINE  -
# zpool export testpool
# zpool import -d zfstd
   pool: testpool
     id: 18245765184438368558
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:
        testpool    ONLINE
          mirror-0  ONLINE
            md10    ONLINE
            md11    ONLINE
#
# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
# rm -r zfstc zfstd

</log>

And here is the log on the new system, where it fails:

<log>

[root@geli ~]# mkdir zfstc
[root@geli ~]# truncate -s 100M zfstc/0
[root@geli ~]# truncate -s 100M zfstc/1
[root@geli ~]# mkdir zfstd
[root@geli ~]# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
[root@geli ~]#
[root@geli ~]# zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
[root@geli ~]# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   182K   176M        -         -     1%     0%  1.00x  ONLINE  -
[root@geli ~]# zpool export testpool
[root@geli ~]# zpool import -d zfstd
   pool: testpool
     id: 3796165815934978103
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:
        testpool                 UNAVAIL  insufficient replicas
          raidz1-0               UNAVAIL  insufficient replicas
            7895035226656775877  UNAVAIL  corrupted data
            5600170865066624323  UNAVAIL  corrupted data
[root@geli ~]#
[root@geli ~]# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
[root@geli ~]# rm zfstc/*
[root@geli ~]# truncate -s 100M zfstc/0
[root@geli ~]# truncate -s 100M zfstc/1
[root@geli ~]# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
[root@geli ~]#
[root@geli ~]# zpool create testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
[root@geli ~]# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   146K   176M        -         -     1%     0%  1.00x  ONLINE  -
[root@geli ~]# zpool export testpool
[root@geli ~]# zpool import -d zfstd
   pool: testpool
     id: 17325954959132513026
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:
        testpool                 UNAVAIL  insufficient replicas
          raidz1-0               UNAVAIL  insufficient replicas
            7580076550357571857  UNAVAIL  corrupted data
            9867268050600021997  UNAVAIL  corrupted data
[root@geli ~]#
[root@geli ~]#
[root@geli ~]# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
[root@geli ~]# rm zfstc/*
[root@geli ~]# truncate -s 100M zfstc/0
[root@geli ~]# truncate -s 100M zfstc/1
[root@geli ~]# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
[root@geli ~]#
[root@geli ~]# zpool create testpool mirror $(for i in zfstd/* ; do readlink $i ; done)
[root@geli ~]# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool    80M    73K  79.9M        -         -     3%     0%  1.00x  ONLINE  -
[root@geli ~]# zpool export testpool
[root@geli ~]# zpool import -d zfstd
   pool: testpool
     id: 7703888355221758527
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:
        testpool                  UNAVAIL  insufficient replicas
          mirror-0                UNAVAIL  insufficient replicas
            23134336724506526     UNAVAIL  corrupted data
            16413307577104054419  UNAVAIL  corrupted data
[root@geli ~]#
[root@geli ~]# for i in zfstd/* ; do mdconfig -d -u $(readlink $i) && rm $i ; done
[root@geli ~]# rm -r zfstc zfstd

</log>

If you are wondering whether something is wrong with the md vnode devices, I can assure you there is not: I computed MD5 hashes of the underlying chunk files and of the /dev/md?? devices, and the results match.
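
The comparison described above can be sketched as a small helper. This is a minimal illustration, not the commands actually run for this report: the function name same_bytes is hypothetical, and it uses the POSIX cksum utility in place of FreeBSD's md5 so that it runs on any system.

```shell
# Hypothetical helper (not from the report): succeed only if two paths
# yield identical bytes when read, e.g. a backing file and its md device.
same_bytes() {
    # cksum reading from stdin prints "checksum size" with no file name,
    # so the two results can be compared as plain strings.
    a=$(cksum < "$1") || return 2
    b=$(cksum < "$2") || return 2
    [ "$a" = "$b" ]
}

# Example use on the test setup above (run as root on FreeBSD):
#   same_bytes zfstc/0 /dev/md0 && echo "md0 matches its backing file"
```

A zero exit status means both paths read back the same data, which is the property the MD5 comparison was meant to establish.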

If you are wondering whether it is the create, the export, or the import that is faulty, I have proof that it is the import. I discovered this problem when I moved such files from the other FreeBSD system to the new one and the import failed in exactly this way. The first thing I did was run MD5 over the files to see whether they had been corrupted, but they had not, and the same files with the same checksums could be imported again on the old system.
Comment 1 Gunther Schadow 2020-11-02 19:03:33 UTC
Just FYI, with no real news: this is the forum post where I first reported the problem and showed proof with the MD5 hashes. https://forums.freebsd.org/threads/zpool-import-unavail-corrupted-data-after-moving-from-an-11-2-to-a-12-2-how-is-this-possible.77560/
Comment 2 Gunther Schadow 2020-11-03 00:11:31 UTC
Testing on other FreeBSD EC2 AMIs

User data to facilitate the test with less manual work:
---------------------------------------------
#!/bin/sh
echo >>/etc/rc.conf
echo 'zfs_enable="YES"' >>/etc/rc.conf
---------------------------------------------

Here is my test protocol: log in as ec2-user, then cut and paste:
---------------------------------------------
su
sh
uname -a
mkdir zfstc
truncate -s 100M zfstc/0
truncate -s 100M zfstc/1
mkdir zfstd
for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done

zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
zpool list
zpool export testpool
zpool import -d zfstd

shutdown -p now
---------------------------------------------

Now the test result:

---------------------------------------------
# uname -a
FreeBSD freebsd 11.4-RELEASE-p3 FreeBSD 11.4-RELEASE-p3 #0: Tue Sep  1 08:22:33 UTC 2020     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
# mkdir zfstc
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# mkdir zfstd
# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
#
# zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   186K   176M        -         -     1%     0%  1.00x  ONLINE  -
# zpool export testpool
# zpool import -d zfstd
   pool: testpool
     id: 488462546239790676
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        testpool    ONLINE
          raidz1-0  ONLINE
            md0     ONLINE
            md1     ONLINE
#
# shutdown -p now
Shutdown NOW!
---------------------------------------------

Next, is it an issue with FreeBSD 12 in particular? First I will reproduce the problem on a brand-new 12.2 again, this time following the exact same protocol that succeeded with 11.4:

---------------------------------------------
ec2-user@freebsd:~ $ su
root@freebsd:/home/ec2-user # sh
root@freebsd:/home/ec2-user # uname -a
FreeBSD freebsd 12.2-RELEASE FreeBSD 12.2-RELEASE r366954 GENERIC  amd64
root@freebsd:/home/ec2-user # mkdir zfstc
root@freebsd:/home/ec2-user # truncate -s 100M zfstc/0
root@freebsd:/home/ec2-user # truncate -s 100M zfstc/1
root@freebsd:/home/ec2-user # mkdir zfstd
root@freebsd:/home/ec2-user # for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
root@freebsd:/home/ec2-user #
root@freebsd:/home/ec2-user # zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
root@freebsd:/home/ec2-user # zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   167K   176M        -         -     1%     0%  1.00x  ONLINE  -
root@freebsd:/home/ec2-user # zpool export testpool
root@freebsd:/home/ec2-user # zpool import -d zfstd
   pool: testpool
     id: 7726589044207947012
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:

        testpool                  UNAVAIL  insufficient replicas
          raidz1-0                UNAVAIL  insufficient replicas
            17541269597952794361  UNAVAIL  corrupted data
            14116473632156932352  UNAVAIL  corrupted data
root@freebsd:/home/ec2-user #
root@freebsd:/home/ec2-user # shutdown -p now
Shutdown NOW!
---------------------------------------------

Let's try 12.2 on ARM; perhaps something was broken only in the 12.2 amd64 build:

---------------------------------------------
ec2-user@freebsd:~ $ su
root@freebsd:/home/ec2-user # sh
root@freebsd:/home/ec2-user # uname -a
FreeBSD freebsd 12.2-RELEASE FreeBSD 12.2-RELEASE r366954 GENERIC  arm64
root@freebsd:/home/ec2-user # mkdir zfstc
root@freebsd:/home/ec2-user # truncate -s 100M zfstc/0
root@freebsd:/home/ec2-user # truncate -s 100M zfstc/1
root@freebsd:/home/ec2-user # mkdir zfstd
root@freebsd:/home/ec2-user # for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
root@freebsd:/home/ec2-user #
root@freebsd:/home/ec2-user # zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
root@freebsd:/home/ec2-user # zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   167K   176M        -         -     1%     0%  1.00x  ONLINE  -
root@freebsd:/home/ec2-user # zpool export testpool
root@freebsd:/home/ec2-user # zpool import -d zfstd
   pool: testpool
     id: 4979253895326493489
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:

        testpool                  UNAVAIL  insufficient replicas
          raidz1-0                UNAVAIL  insufficient replicas
            3861186839400824362   UNAVAIL  corrupted data
            14304100812262636401  UNAVAIL  corrupted data
root@freebsd:/home/ec2-user #
root@freebsd:/home/ec2-user # shutdown -p now
Shutdown NOW!
---------------------------------------------

So it's evidently 12.2 specifically, regardless of architecture. Unfortunately there isn't an official 12.1 AMI that I could try, but there is a community 12.1-RELEASE AMI:

---------------------------------------------
$ su
root@freebsd:/usr/home/ec2-user # sh
# uname -a
FreeBSD freebsd 12.1-RELEASE FreeBSD 12.1-RELEASE r354233 GENERIC  arm64
# mkdir zfstc
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# mkdir zfstd
# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
#
# zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   165K   176M        -         -     1%     0%  1.00x  ONLINE  -
# zpool export testpool
# zpool import -d zfstd
   pool: testpool
     id: 13451961690108720630
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:

        testpool                  UNAVAIL  insufficient replicas
          raidz1-0                UNAVAIL  insufficient replicas
            7947115053877644123   UNAVAIL  corrupted data
            12584889417144834990  UNAVAIL  corrupted data
#
# shutdown -p now
Shutdown NOW!
---------------------------------------------

Now this is getting interesting! Shall we try 12.0-RELEASE now? Yes! Here is one that says "12.0-RELEASE amd64 ZFS"; in this case I won't even add the user data that sets zfs_enable=YES in rc.conf.

---------------------------------------------
$ su
root@freebsd:/usr/home/ec2-user # sh
# uname -a
FreeBSD freebsd 12.0-RELEASE-p13 FreeBSD 12.0-RELEASE-p13 GENERIC  amd64
# mkdir zfstc
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# mkdir zfstd
# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
#
# zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   624K   175M        -         -     1%     0%  1.00x  ONLINE  -
zroot     9.50G  2.01G  7.49G        -         -     2%    21%  1.00x  ONLINE  -
# zpool export testpool
# zpool import -d zfstd
   pool: testpool
     id: 15013624344781576480
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        testpool    ONLINE
          raidz1-0  ONLINE
            md0     ONLINE
            md1     ONLINE
#
# shutdown -p now
Shutdown NOW!
---------------------------------------------

Aaaaand... it's a winner! Maybe I should look for the word ZFS in the title? What is the significance of whether or not ZFS is mentioned?
Comment 3 Gunther Schadow 2020-11-03 00:41:13 UTC
12.0-RC3 amd64 works!
-------------------------------------------------
$ su
root@freebsd:/usr/home/ec2-user # sh
# uname -a
FreeBSD freebsd 12.0-RC3 FreeBSD 12.0-RC3 r341271 GENERIC  amd64
# mkdir zfstc
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# mkdir zfstd
# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
#
# zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz $(for i in zfstd/* ; do readlink $i ; done)
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   174K   176M        -         -     1%     0%  1.00x  ONLINE  -
# zpool export testpool
# zpool import -d zfstd
   pool: testpool
     id: 3192103039539057096
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        testpool    ONLINE
          raidz1-0  ONLINE
            md0     ONLINE
            md1     ONLINE
#
# shutdown -p now
Shutdown NOW!
-----------------------------------------

So I think I have shown that this may be an issue introduced in 12.1 and still present in 12.2, and that it possibly has to do with how these EC2 AMIs are put together.
Comment 4 Andriy Gapon freebsd_committer 2020-11-03 08:22:12 UTC
Try to see if zdb -G -e -p ... reports anything interesting.
Comment 5 Gunther Schadow 2020-11-03 13:32:53 UTC
root@geli:/home/ec2-user # zdb -G -e -p zfstd -u testpool

Uberblock:
        magic = 0000000000bab10c
        version = 5000
        txg = 13
        guid_sum = 13184861157982554310
        timestamp = 1604409782 UTC = Tue Nov  3 13:23:02 2020
        mmp_magic = 00000000a11cea11
        mmp_delay = 0
        mmp_valid = 0
        checkpoint_txg = 0


ZFS_DBGMSG(zdb):
spa_import: importing testpool
spa_load(testpool, config trusted): LOADING
disk vdev '/usr/home/ec2-user/zfstd/0': best uberblock found for spa testpool. txg 13
spa_load(testpool, config untrusted): using uberblock with txg=13
vdev_copy_path: vdev 16509719173445145761: path changed from '/dev/md0' to '/usr/home/ec2-user/zfstd/0'
vdev_copy_path: vdev 12908006057264574797: path changed from '/dev/md1' to '/usr/home/ec2-user/zfstd/1'
spa_load(testpool, config trusted): LOADED
spa=testpool async request task=32
Comment 6 Colin Percival freebsd_committer 2020-11-03 18:13:47 UTC
Is there any reason to think this is specific to EC2?  I'm not seeing anything which gives me that impression but maybe I'm missing something?
Comment 7 Andriy Gapon freebsd_committer 2020-11-03 21:46:55 UTC
(In reply to Andriy Gapon from comment #4)
So, it appears that zdb can work with the pool but the kernel driver cannot.
Interesting...

Could you please run this dtrace oneliner:
   dtrace -qn 'zfs-dbgmsg{printf("%s\n", stringof(arg0))}'
and try to import the pool at the same time?
Comment 8 Gunther Schadow 2020-11-05 02:55:48 UTC
Thank you for your initial interest. 

Colin, I don't know whether other platforms are affected. It is certainly not hardware dependent (same result on amd64 and arm64). I hope some people will simply cut and paste my little test script; it couldn't be easier. Please, anyone looking at this: try it and post your results. 12.2 and 12.1 are suspect, but it could be the specific kernel build used for the EC2 AMIs. Unfortunately the kernel sources are no longer part of the normal system setup these days, and I admit I have forgotten some of the grunt work of running config and the corners of the sys/ tree. (I used to know it; after all, I wrote a device driver for FreeBSD 2.x.)

Andriy, here is the dtrace result:

root@geli:/home/ec2-user # dtrace -qn 'zfs-dbgmsg{printf("%s\n", stringof(arg0))}' -c 'zpool import -d zfstd'
dtrace: buffer size lowered to 1m
   pool: testpool
     id: 4731456272891350032
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:

        testpool                  UNAVAIL  insufficient replicas
          raidz1-0                UNAVAIL  insufficient replicas
            16509719173445145761  UNAVAIL  corrupted data
            12908006057264574797  UNAVAIL  corrupted data
spa_tryimport: importing testpool
spa_load($import, config trusted): LOADING
spa_load($import, config untrusted): vdev tree has 1 missing top-level vdevs.
spa_load($import, config untrusted): current settings allow for maximum 0 missing top-level vdevs at this stage.
spa_load($import, config untrusted): FAILED: unable to open vdev tree [error=22]
  vdev 0: root, guid: 4731456272891350032, path: N/A, can't open
    vdev 0: raidz, guid: 15929167801800586952, path: N/A, can't open
      vdev 0: disk, guid: 16509719173445145761, path: /usr/home/ec2-user/zfstd/0, can't open
      vdev 1: disk, guid: 12908006057264574797, path: /usr/home/ec2-user/zfstd/1, can't open
spa_load($import, config untrusted): UNLOADING

Hope that helps.
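
The lines of interest in the dtrace output above are the leaf vdevs marked "can't open". A minimal sketch of a filter that pulls out just those, assuming the message layout shown in this comment; the function name unopened_disks is hypothetical:

```shell
# Hypothetical helper: read zfs-dbgmsg text on stdin and print "guid path"
# for every leaf (disk) vdev reported as "can't open".
unopened_disks() {
    # "can.t" matches the apostrophe without having to quote it inside
    # the single-quoted awk program.
    awk '/vdev [0-9]+: disk,/ && /can.t open/ {
        guid = ""; path = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "guid:") { guid = $(i + 1); sub(/,$/, "", guid) }
            if ($i == "path:") { path = $(i + 1); sub(/,$/, "", path) }
        }
        print guid, path
    }'
}
```

For example, save the dtrace output to a file and run `unopened_disks < dbgmsg.txt` (the file name is illustrative). For the output above this lists the two paths under zfstd, not /dev/md0 and /dev/md1.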
Comment 9 Gunther Schadow 2020-11-05 02:58:07 UTC
... oh, and before you say "can't open" means it's not there, here it is:

root@geli:/home/ec2-user # ls -l zfstd
total 0
lrwxr-xr-x  1 root  ec2-user  8 Nov  3 13:22 0 -> /dev/md0
lrwxr-xr-x  1 root  ec2-user  8 Nov  3 13:22 1 -> /dev/md1
root@geli:/home/ec2-user # mdconfig -l
md0 md1
root@geli:/home/ec2-user # md5 zfstd/*
MD5 (zfstd/0) = eb86f41b44e70726a1b108bb59140e3a
MD5 (zfstd/1) = fd2de07f97a01968856e8613104937ca

no problem reading these devices.
Comment 10 Andriy Gapon freebsd_committer 2020-11-05 09:05:23 UTC
(In reply to Gunther Schadow from comment #9)
Could you please use zdb to dump the pool configuration?
Something like zdb -e -p zfstd -CC testpool.
Comment 11 Andriy Gapon freebsd_committer 2020-11-05 09:16:14 UTC
I think that the problem could be with base r348901.
You are using an unusual configuration where your vdevs are actually disks (md devices are handled by vdev_geom) but are configured via symbolic links outside of /dev.

zdb is not affected because it accesses everything from userland as files.

CC mav.
Comment 12 Alexander Motin freebsd_committer 2020-11-05 14:51:40 UTC
Andriy's guess seems probable to me.  The mentioned code needs another look from this perspective.

What I can't understand, though, is how the symlink paths are getting into the pool metadata if your pool creation command uses `readlink $i`.  As I understand it, that is supposed to pass "/dev/mdX" to `zpool create`.  And Pawel's patch would block anything not starting with "/dev/".  Is there a chance you created some pool without readlink before 11.2?
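
To make the suspected behaviour concrete, here is a minimal userland sketch, assuming (as this comment suggests) that any vdev path not starting with "/dev/" is rejected. It is an illustration only, not the actual vdev_geom code, and the function name path_under_dev is hypothetical.

```shell
# Hypothetical illustration of a "/dev/ prefix" filter; not the kernel code.
path_under_dev() {
    case "$1" in
        /dev/*) return 0 ;;   # e.g. /dev/md0: would pass the filter
        *)      return 1 ;;   # e.g. a symlink path outside /dev: would not
    esac
}

# The two kinds of path that appear in this report:
for p in /dev/md0 /usr/home/ec2-user/zfstd/0; do
    if path_under_dev "$p"; then
        echo "$p: accepted"
    else
        echo "$p: rejected"
    fi
done
```

Under that assumption, the paths recorded by `zpool import -d zfstd` would be rejected even though the symlinks resolve to valid md devices.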
Comment 13 Gunther Schadow 2020-11-05 20:34:20 UTC
I will give you the dump in a while.

Please note that it is super easy for you to reproduce. I show the entire test in one script. It doesn't matter how I created the pool. This test pool is created in the same session of the test. 

I don't think the options have any influence.

There is an asymmetry with import -d <dir> vs. create <enumerated /dev/ nodes> ...

Your hunch seems very reasonable. Please try it yourself as well.
Comment 14 Gunther Schadow 2020-11-05 22:51:31 UTC
Here is the zdb output, running the testcase from the start:

------------------------------------------------------------
# mkdir zfstc
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# mkdir zfstd
# for i in zfstc/* ; do ln -s /dev/$(mdconfig -a -t vnode -f $i) zfstd/$(basename $i) ; done
#
# zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 te
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   180K   176M        -         -     1%     0%  1.00x  ONLINE  -
------------------------------------------------------------

Next I ran zdb on the pool just created, before exporting it:

------------------------------------------------------------
# zdb -e -p zfstd -CC testpool

Configuration for import:
        vdev_children: 1
        version: 5000
        pool_guid: 1836577300510068694
        name: 'testpool'
        state: 0
        hostid: 2817290760
        hostname: 'geli'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 1836577300510068694
            children[0]:
                type: 'raidz'
                id: 0
                guid: 13558473444627327763
                nparity: 1
                metaslab_array: 68
                metaslab_shift: 24
                ashift: 9
                asize: 200278016
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 10226269325811407084
                    whole_disk: 1
                    create_txg: 4
                    path: '/usr/home/schadow/zfstd/0'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 4864983578256370556
                    whole_disk: 1
                    create_txg: 4
                    path: '/usr/home/schadow/zfstd/1'
        load-policy:
            load-request-txg: 18446744073709551615
            load-rewind-policy: 2
zdb: can't open 'testpool': File exists
------------------------------------------------------------------

Now to export and try again

------------------------------------------------------------------
# zpool export testpool
# zdb -e -p zfstd -CC testpool

Configuration for import:
        vdev_children: 1
        version: 5000
        pool_guid: 1836577300510068694
        name: 'testpool'
        state: 1
        hostid: 2817290760
        hostname: 'geli'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 1836577300510068694
            children[0]:
                type: 'raidz'
                id: 0
                guid: 13558473444627327763
                nparity: 1
                metaslab_array: 68
                metaslab_shift: 24
                ashift: 9
                asize: 200278016
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 10226269325811407084
                    whole_disk: 1
                    create_txg: 4
                    path: '/usr/home/schadow/zfstd/0'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 4864983578256370556
                    whole_disk: 1
                    create_txg: 4
                    path: '/usr/home/schadow/zfstd/1'
        load-policy:
            load-request-txg: 18446744073709551615
            load-rewind-policy: 2

MOS Configuration:
        version: 5000
        name: 'testpool'
        state: 1
        txg: 44
        pool_guid: 1836577300510068694
        hostid: 2817290760
        hostname: 'geli'
        com.delphix:has_per_vdev_zaps
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 1836577300510068694
            create_txg: 4
            children[0]:
                type: 'raidz'
                id: 0
                guid: 13558473444627327763
                nparity: 1
                metaslab_array: 68
                metaslab_shift: 24
                ashift: 9
                asize: 200278016
                is_log: 0
                create_txg: 4
                com.delphix:vdev_zap_top: 65
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 10226269325811407084
                    path: '/dev/md0'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 66
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 4864983578256370556
                    path: '/dev/md1'
                    whole_disk: 1
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 67
        features_for_read:
            com.delphix:embedded_data
            com.delphix:hole_birth
------------------------------------------------------------------------
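
In the zdb output above, the leaf vdev paths differ between the two sections: "Configuration for import" lists the symlink paths under zfstd, while "MOS Configuration" lists /dev/md0 and /dev/md1. A minimal sketch for extracting just those lines, assuming the `zdb -CC` layout shown here; the function name vdev_paths_by_section is hypothetical:

```shell
# Hypothetical helper: read `zdb -CC` text on stdin and print each vdev path
# prefixed by the section it appeared in.
vdev_paths_by_section() {
    # q holds a single quote so it can be stripped from the path value.
    awk -v q="'" '
        /^Configuration for import:/ { section = "import" }
        /^MOS Configuration:/        { section = "MOS" }
        $1 == "path:" {
            p = $2
            gsub(q, "", p)
            print section ": " p
        }'
}
```

Fed the second zdb dump above, this prints two "import:" lines with the zfstd paths followed by two "MOS:" lines with the /dev/md paths, which makes the mismatch easy to spot.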

Finally, a test to confirm that the import problem still exists:

------------------------------------------------------------------------
# zpool import -d zfstd
   pool: testpool
     id: 1836577300510068694
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:

        testpool                  UNAVAIL  insufficient replicas
          raidz1-0                UNAVAIL  insufficient replicas
            10226269325811407084  UNAVAIL  corrupted data
            4864983578256370556   UNAVAIL  corrupted data
# zpool import -d zfstd testpool
cannot import 'testpool': invalid vdev configuration
------------------------------------------------------------------------

And now to test your hypothesis that we have to have /dev/md* nodes, not symlinks.

But I cannot even find an import option that lets me name individual device nodes. Is the dir option all we have?

     zpool import [-d dir | -c cachefile] [-D]
     zpool import [-o mntopts] [-o property=value] ...
           [--rewind-to-checkpoint] [-d dir | -c cachefile] [-D] [-f] [-m]
           [-N] [-R root] [-F [-n]] -a
     zpool import [-o mntopts] [-o property=value] ...
           [--rewind-to-checkpoint] [-d dir | -c cachefile] [-D] [-f] [-m]
           [-N] [-R root] [-t] [-F [-n]] pool | id [newpool]

But OK, I get it now, the -d option is to point to an alternative /dev/ directory, and it is not required:

------------------------------------------------------------------------
# zpool import
   pool: testpool
     id: 1836577300510068694
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        testpool    ONLINE
          raidz1-0  ONLINE
            md0     ONLINE
            md1     ONLINE
# zpool import testpool
# zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   176M   231K   176M        -         -     2%     0%  1.00x  ONLINE  -
-------------------------------------------------------------------------

It actually worked!

And now, instead of symlinks, let me build a zfstd2 directory with real device nodes to test:

-------------------------------------------------------------------------
# zpool export testpool
# mkdir zfstd2
# ls -l zfstd
total 0
lrwxr-xr-x  1 root  schadow  8 Nov  5 22:02 0 -> /dev/md0
lrwxr-xr-x  1 root  schadow  8 Nov  5 22:02 1 -> /dev/md1
root@geli:/home/schadow/zfstd2 # (cd /dev ; tar cf - md[01]) |(cd zfstd2 ; tar xvf -)
x md0
x md1
# ls -l zfstd2
total 0
crw-r-----  1 root  operator  0x6b Nov  5 22:02 md0
crw-r-----  1 root  operator  0x6c Nov  5 22:02 md1
# ls -l /dev/md[01]
crw-r-----  1 root  operator  0x6b Nov  5 22:02 /dev/md0
crw-r-----  1 root  operator  0x6c Nov  5 22:02 /dev/md1
# zpool import -d zfstd2
# zpool list
no pools available
# md5 zfstd*/*
MD5 (zfstd/0) = 0d48de20f5717fe54be0bdef93eb8358
MD5 (zfstd/1) = 2c4e7de0b3359bd75f17b49d3dcab394
md5: zfstd2/md0: Operation not supported
md5: zfstd2/md1: Operation not supported
----------------------------------------------------------------------------

So I don't know what the purpose of -d is if symlinks don't work. With the new devfs way of doing things, device nodes can no longer be created with mknod or copied with tar, so I cannot confine these nodes to a directory.
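
For anyone repeating this, here is a minimal sketch for checking what a directory handed to -d actually contains. The helper name classify_entries is made up for illustration and is not part of zpool. It only reports file types and says nothing about how zpool import treats each type.

```shell
#!/bin/sh
# classify_entries DIR
# Hypothetical helper: print "name: kind" for every entry in DIR.
# It inspects file types only; it does not call zpool.
classify_entries() {
    dir=$1
    for p in "$dir"/*; do
        # Skip the literal pattern when the glob matched nothing.
        [ -e "$p" ] || [ -L "$p" ] || continue
        if [ -L "$p" ]; then
            # Check symlinks first, because -c/-b/-f follow them.
            kind="symlink -> $(readlink "$p")"
        elif [ -c "$p" ]; then
            kind="character device"
        elif [ -b "$p" ]; then
            kind="block device"
        elif [ -f "$p" ]; then
            kind="regular file"
        else
            kind="other"
        fi
        printf '%s: %s\n' "${p##*/}" "$kind"
    done
}
```

Running classify_entries zfstd and classify_entries zfstd2 should show the symlinks and the tar-copied character devices respectively. Note that a node reported as a character device here may still be unusable outside devfs, as the md5 errors above suggest.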

Are you telling me I don't even need to make them vnode devices? That I could just use files?

----------------------------------------------------------------------------
# zpool list
no pools available
# mdconfig -d -u md0
# mdconfig -d -u md1
# mdconfig -l
# zpool import -d zfstc
   pool: testpool
     id: 1836577300510068694
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-5E
 config:

        testpool                  UNAVAIL  insufficient replicas
          raidz1-0                UNAVAIL  insufficient replicas
            10226269325811407084  UNAVAIL  corrupted data
            4864983578256370556   UNAVAIL  corrupted data
# ls -l zfstc
total 204864
-rw-r--r--  1 root  schadow  104857600 Nov  5 22:15 0
-rw-r--r--  1 root  schadow  104857600 Nov  5 22:15 1
----------------------------------------------------------------------------

So you are telling me it can import directly from files, but that doesn't work. OK, OK, I get it now, you want me to also create the pool without these md vnodes ...

----------------------------------------------------------------------------
# rm -rf zfst*
# mkdir zfstc
# truncate -s 100M zfstc/0
# truncate -s 100M zfstc/1
# zpool create -o feature@embedded_data=enabled -o feature@lz4_compress=enabled -O dedup=on -O compression=lz4 testpool raidz zfstc/*
cannot open 'zfstc/0': no such GEOM provider
must be a full path or shorthand device name
----------------------------------------------------------

See, that's what I thought: I had to use these vnode md devices because zpool create does not operate on files directly.
Comment 15 Andriy Gapon freebsd_committer 2020-11-06 13:37:40 UTC
Have you tried passing full file paths to zpool create?
Comment 16 Gunther Schadow 2020-11-06 17:47:19 UTC
Why whaddoyouknow, this way it works:

---------------------------------------------------------------------------------------------
root@geli:/home/schadow # zpool create testpool /home/schadow/zfstc/0 /home/schadow/zfstc/1
root@geli:/home/schadow # zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   160M  83.5K   160M        -         -     1%     0%  1.00x  ONLINE  -
root@geli:/home/schadow # zpool export testpool
root@geli:/home/schadow # zpool import -d zfstc
   pool: testpool
     id: 16913270329707857467
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        testpool                     ONLINE
          /usr/home/schadow/zfstc/0  ONLINE
          /usr/home/schadow/zfstc/1  ONLINE
root@geli:/home/schadow # zpool list
no pools available
root@geli:/home/schadow # zpool import -d zfstc testpool
root@geli:/home/schadow # zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
testpool   160M   120K   160M        -         -     3%     0%  1.00x  ONLINE  -
-------------------------------------------------------------------------------------------

So, while this works as a workaround, I still think it is a bug. There is too much reliance on file names and paths, and on whether or not something is a symlink, none of which should matter. There is also an asymmetry: I have to specify full paths for create, but only a directory for import.

It is also a bug because it used to work and then suddenly stopped working (for no real benefit AFAICS).
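
To paper over the create/import asymmetry in a script, one option is to derive the full paths from the directory itself. This is only a sketch: abs_file_vdevs is a made-up helper name, and it assumes file names without whitespace.

```shell
#!/bin/sh
# abs_file_vdevs DIR
# Hypothetical helper: print the absolute path of every regular file in
# DIR, one per line. The directory part is resolved with pwd -P, so a
# symlinked parent such as /home -> /usr/home is printed in physical form.
abs_file_vdevs() {
    dir=$(cd "$1" && pwd -P) || return 1
    for f in "$dir"/*; do
        # -f skips subdirectories and the unmatched-glob pattern.
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Intended use, as root (same commands as in the transcript above):
#   zpool create testpool $(abs_file_vdevs zfstc)
#   zpool export testpool
#   zpool import -d zfstc testpool
```

With this, both create and import can be driven from the same directory name, which is what I expected to be able to do in the first place.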
Comment 17 Andriy Gapon freebsd_committer 2020-11-06 22:16:44 UTC
(In reply to Gunther Schadow from comment #16)
It's also a bug that affects a tiny minority of users who do very unusual things with ZFS.  That's why it hasn't been hit for so long since the original commit.
So, please set your expectations accordingly.  I mean in terms of changing your workflow vs FreeBSD changing its code (not earlier than future releases anyway).