Bug 219972 - Unable to zpool export following some zfs recv
Summary: Unable to zpool export following some zfs recv
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-06-13 23:56 UTC by pfribeiro
Modified: 2023-09-25 08:25 UTC (History)
5 users (show)

See Also:


Description pfribeiro 2017-06-13 23:56:07 UTC
root@sunflower:~ # uname -a
FreeBSD sunflower 11.0-RELEASE-p9 FreeBSD 11.0-RELEASE-p9 #0: Tue Apr 11 08:48:40 UTC 2017     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

I must admit this is quite odd behaviour I'm seeing with ZFS pools.

I'm trying to synchronize snapshots from another FreeBSD system (FreeNAS, in fact) to a pure FreeBSD 11.0-RELEASE-p9 system with a USB 3.0-attached Seagate 8TB drive. Quite often, I'm unable to do 'zpool export master' after a 'zpool import master'. This seems to happen more consistently after having done a 'zfs recv'. Something along the lines of:

root@sunflower:~ # zpool import master
root@sunflower:~ # zpool export master
root@sunflower:~ # zpool import master
root@sunflower:~ # zpool export master
root@sunflower:~ # zpool import master
root@sunflower:~ # zpool export master
cannot export 'master': pool is busy
root@sunflower:~ # zpool export master
cannot export 'master': pool is busy

I note that there are no processes using the pool. I have tried 'zpool import -N master', but the result is often the same, and 'zpool export -f master' makes no difference either.
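
For what it is worth, by "no processes using the pool" I mean checks roughly along these lines (the mountpoint is simply my setup):

  fstat -f /master        # nothing has files open under the mountpoint
  zfs list -r -o name,mountpoint,mounted master
  zpool status master     # pool is ONLINE, no scrub or resilver in progress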

I don't see any error messages in dmesg, nor in /var/log/messages. I was initially using geli, creating the ZFS pool on an AES-encrypted device, but I have since tested without crypto and the problem persists.

Once this happens, I am unable to export the pool until I reboot the system.

I have opened a GitHub issue against the original guide I followed at https://github.com/hughobrien/zfs-remote-mirror/issues/7, and subsequently posted on the FreeBSD forum at https://forums.freebsd.org/threads/61291/. But quite frankly, I am close to giving up on this storage idea, because I cannot get to the bottom of the issue and there are no obvious logged errors to poke at. I would appreciate some guidance to investigate this erroneous behaviour further.

I see that there are some vaguely related bugs like #201735, but in my case I am not accessing any particular folder or file at all.

Thanks
Comment 1 pfribeiro 2017-06-14 14:23:52 UTC
I've done some further testing on this by isolating the issue to a 'zfs send' and 'zfs recv' on the same machine between the external USB drive (on /dev/da0) and a USB stick (on /dev/da1).

1. I made sure there were no existing pools on either device by destroying the pools and running 'zpool labelclear -f /dev/da0' and 'zpool labelclear -f /dev/da1'.

root@sunflower:~ # zpool labelclear -f /dev/da0
root@sunflower:~ # zpool labelclear -f /dev/da1

2. Created two pools:

root@sunflower:~ # zpool create -O atime=off -O compression=lz4 master /dev/da0
root@sunflower:~ # zpool create -O atime=off -O compression=lz4 slave /dev/da1
root@sunflower:~ # zpool list
NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
master  7.25T   360K  7.25T         -     0%     0%  1.00x  ONLINE  -
slave   7.25G  78.5K  7.25G         -     0%     0%  1.00x  ONLINE  -
zroot   51.5G  1.11G  50.4G         -     0%     2%  1.00x  ONLINE  -

3. Created a test file on the master pool:

root@sunflower:~ # touch /master/test
root@sunflower:~ # echo "Hello" > /master/test

4. Created a snapshot on the master pool:

root@sunflower:~ # zfs snapshot -r "master@1"

5. Exported the 'slave' pool and re-imported it without mounting (not sure this really matters):

root@sunflower:~ # zpool export slave
root@sunflower:~ # zpool import -N slave

6. Sent the snapshot across to the slave pool:

root@sunflower:~ # zfs send -Rev "master@1" | zfs recv -Fu slave
full send of master@1 estimated size is 10K
total estimated size is 10K
TIME        SENT   SNAPSHOT
root@sunflower:~ # zpool list
NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
master  7.25T   588K  7.25T         -     0%     0%  1.00x  ONLINE  -
slave   7.25G   376K  7.25G         -     0%     0%  1.00x  ONLINE  -
zroot   51.5G  1.11G  50.4G         -     0%     2%  1.00x  ONLINE  -

7. Tried to export and import the slave a few times:

root@sunflower:~ # zpool export slave
root@sunflower:~ # zpool import slave
root@sunflower:~ # zpool export slave
cannot export 'slave': pool is busy

...and bang the problem is back. This seems a little crazy.

I would appreciate it if someone could try to reproduce this problem on their system, and perhaps point me to some further debugging steps. I'm out of ideas as to what else I could or should test at this point.

Thanks
Comment 2 pfribeiro 2017-06-18 21:59:29 UTC
I would just like to add that this behaviour is also reproducible using file-backed memory disks rather than USB devices, e.g. 'mdconfig -a -t vnode -f blob00', and then sending a snapshot from md0 to md1 using the same 'zfs send' and 'zfs recv' flags as in comment #1.
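
A rough sketch of the file-backed variant (file names and sizes are only illustrative):

  truncate -s 4g blob00 blob01
  mdconfig -a -t vnode -f blob00    # attaches as md0
  mdconfig -a -t vnode -f blob01    # attaches as md1
  zpool create -O atime=off -O compression=lz4 master /dev/md0
  zpool create -O atime=off -O compression=lz4 slave /dev/md1
  # ...then repeat the snapshot, send/recv and import/export steps from comment #1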

An accompanying thread that I opened earlier about this behaviour is available at: https://forums.freebsd.org/threads/61291/.
Comment 3 pfribeiro 2017-06-22 00:33:40 UTC
It seems that reproducing this bug relies on there being more than one relatively idle CPU core on the system. On my original system, if I run 'dd if=/dev/zero of=/dev/null' while I try the steps mentioned in comment #1, the bug is not reproducible. However, as soon as that command is killed, the bug is reproducible again.

On the forum thread linked to in comment #2, another user tried to reproduce the problem, but without success. I then configured two FreeBSD VMs on ESXi (the host is a Xeon D-1518, 4 cores, with Hyper-Threading enabled): one running the VMDK image provided on the official FreeBSD download page, and another installed from the ISO image (with ZFS as the root filesystem) to better mirror the installation on my original system, an Intel NUC5CPYH. Initially, I could not reproduce the bug in the VMs either, no matter how many times I tried.

However, having observed the behaviour on the NUC installation, I then changed the CPU affinity on ESXi so that the VM is allocated logical cores 0 and 2 and has a minimum of 1000 MHz reserved. By running the import/export of 'slave' multiple times (after the respective zfs send/recv), I was eventually able to trigger the bug on the 56th import/export run.

Regarding the NUC installation, I can see that killing 'dd if=/dev/zero of=/dev/null' (i.e. making the other core available again) only matters for the import/export of 'slave' after the 'zfs send | zfs recv' has taken place, which suggests there is a race condition of sorts in the zpool export ioctl code (one which somehow depends on a previous recv).

I would be happy to provide more diagnostics, but I would need further guidance, as I am not very familiar with the synchronization primitives of FreeBSD. I believe this bug will be hard to track down.

I would also like to add that I have successfully (with the above caveats) reproduced this bug on more than one platform, with the following versions, all on amd64:
 
11.0-RELEASE-p1 #0 r306420
11.0-RELEASE-p9 #0: Tue Apr 11 08:48:40 UTC 2017
11.1-BETA2 #0 r320072: Sun Jun 18 18:45:14 BST 2017 (I compiled my own kernel with debugging on)

Finally, for completeness, my rudimentary test bash script is available at https://pastebin.com/YcKSU1LA; a minimal sketch of what it does is included below. You're welcome to check the related forum post from comment #2 as well.
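
The gist of the script is simply to repeat the import/export cycle after the send/recv until it fails; a minimal sketch (not the exact pastebin contents) looks like this:

  # assumes 'slave' has just received the snapshot and is currently imported
  i=0
  while zpool export slave && zpool import slave; do
      i=$((i + 1))
      echo "import/export cycle $i OK"
  done
  echo "failed after $i successful cycles"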
Comment 4 Andriy Gapon freebsd_committer freebsd_triage 2017-07-17 06:31:11 UTC
You could try the following DTrace one-liner to track down where the error originates.

# dtrace -n 'sdt:::set-error /arg0 == EBUSY/ { printf("func %s", probefunc); stack(); }'
Comment 5 Matthias Pfaller 2017-09-11 17:10:41 UTC
The same just happened to me. dtrace:
[root@zork ~]# dtrace -n 'sdt:::set-error /arg0 == EBUSY/ { printf("func %s", probefunc); stack(); }'
dtrace: description 'sdt:::set-error ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  63932                   none:set-error func none
              zfs.ko`zfs_ioc_pool_export+0x30
              zfs.ko`zfsdev_ioctl+0x6d6
              kernel`devfs_ioctl_f+0x128
              kernel`kern_ioctl+0x255
              kernel`sys_ioctl+0x16f
              kernel`amd64_syscall+0x6c4
              kernel`0xffffffff80ec392b

[root@zork ~]# zpool list
NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
backup  1.81T   638G  1.19T         -    15%    34%  1.04x  ONLINE  /backup
data    2.72T  2.29T   438G         -    28%    84%  1.00x  ONLINE  -
root     111G  51.0G  60.0G         -    48%    45%  1.00x  ONLINE  -
[root@zork ~]# zdb backup
zdb: can't open 'backup': No such file or directory

ZFS_DBGMSG(zdb):

Is there any additional information I could provide? This will happen again next Monday...
Comment 6 Matthias Pfaller 2017-09-11 17:12:19 UTC
I forgot:
[root@zork ~]# zpool list -v backup
NAME               SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
backup            1.81T   638G  1.19T         -    15%    34%  1.04x  ONLINE  /backup
  gpt/backup.eli  1.81T   638G  1.19T         -    15%    34%
Comment 7 Matthias Pfaller 2017-09-23 12:35:14 UTC
Additional information: I'm sending zvols. @pfribeiro: Do you have zvols in your recv as well?
Comment 8 pfribeiro 2017-11-01 15:46:41 UTC
(In reply to Matthias Pfaller from comment #7)

Hi Matthias,

Sorry for having abandoned this thread for a while. I haven't had much time to come back to examining this problem in more detail. I did not have any zvols on the disk, just a regular dataset. Apologies if I'm getting the ZFS terminology wrong.

I also haven't had time to retest on recent releases, but I suppose the behaviour is still the same. Unless someone else is willing to dig into this, I don't think I will be able to offer much help, even having examined the source code and the call trace at the time. Concurrency issues are tricky.
Comment 9 dr_sweety_1337 2018-03-07 06:04:10 UTC
I ran into this bug as well with my backup script (which uses snapshots and import/export). My workaround: delete /boot/zfs/zpool.cache (make a backup first) and restart the system. Afterwards, all pools (except zroot) have to be re-imported, but at least the stuck pool is gone.
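
Spelled out, the workaround is roughly the following (the pool name at the end is just a placeholder for whatever you use besides zroot):

  cp /boot/zfs/zpool.cache /boot/zfs/zpool.cache.bak   # keep a backup of the cache first
  rm /boot/zfs/zpool.cache
  shutdown -r now
  # after the reboot, re-import each pool (other than zroot) that is no longer cached
  zpool import yourpool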
Comment 10 Matthias Pfaller 2018-03-07 07:07:31 UTC
(In reply to dr_sweety_1337 from comment #9)
Hmm. For me, a "geli detach -f ..." followed by a reboot did the trick, with no messing around with /boot/zfs/zpool.cache. Without the "geli detach -f ..." the reboot would hang; with it I can at least skip the power cycle.
Comment 11 pfribeiro 2018-03-30 21:31:23 UTC
(In reply to Matthias Pfaller from comment #10)

I just installed a fresh 11.1-RELEASE (r321309) and can see that the issue I reported is still there. I don't recall having had a problem with reboot getting stuck; it's just that rebooting is necessary to get the pool unstuck. Following the next boot the pool is in some kind of dormant state (UNAVAIL?), but it can be exported successfully.

I would like to add that in my `dtrace` output the call trace is exactly like that of Matthias.

I've tried playing with cpuset to change the cores available to the scheduler, as well as on a per-process basis, but couldn't work around the problem with zpool export. For lack of a better workaround, I've added 'dd if=/dev/zero of=/dev/null' around my usage of zpool export as follows:

dd if=/dev/zero of=/dev/null bs=1024k  & PIDDD=$!
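# the dd above keeps a second CPU core busy for the duration of the export (see comment #3)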
zpool export -f backup
kill -9 $PIDDD

First I spawn a concurrent dd, and once the export terminates I kill it. This way I don't seem to get a busy pool, even when doing zpool import/export many, many times after the zfs recv. Perhaps this can give someone a hint of where the problem may lie. It would be nice to get a proper bug fix, though :)
Comment 12 Patrik Hildingsson 2018-04-03 18:18:18 UTC
I'm running FreeBSD 11.1-RELEASE-p8 and I am also experiencing the very same issue. I too zfs send|recv zvols on the same machine between two pools.
Comment 13 Patrik Hildingsson 2019-08-13 17:41:12 UTC
(In reply to Patrik Hildingsson from comment #12)
Issue still persists in 12.0-RELEASE-p8
Comment 14 Allan Jude freebsd_committer freebsd_triage 2019-08-13 17:46:02 UTC
(In reply to Patrik Hildingsson from comment #13)
What other subsystems are involved here?

I see mention of USB and GELI. If we can get a lowest-common-denominator to recreate the issue, that will help.

Most often the reason for EBUSY on export is a mounted filesystem, or a zvol that is opened by an application.

What may be causing the issue here: if a zvol has a partition table, it may be held open by GEOM.

Setting 'volmode' to 'dev' instead of the default (geom) may solve the issue in that case.
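
As a rough sketch (the dataset name here is only a placeholder):

  zfs set volmode=dev backup/vm/disk0
  # the volmode change may only take effect once the zvol device nodes are
  # recreated, e.g. after re-importing the pool or renaming the volume
  zpool export backup && zpool import backup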
Comment 15 Matthias Pfaller 2019-08-14 09:06:47 UTC
(In reply to Allan Jude from comment #14)
In my case it's my backup disk. There is a zvol on the disk, but volmode is already set to dev.
Import is done as follows:

  geli attach -p -k /tmp/backup.key /dev/gpt/backup
  zpool import -R /backup backup || exit 1

Then some "zfs send / zfs receive" stuff is done. 
Export is done as follows:

  zpool export backup
  geli detach /dev/gpt/backup.eli

When the machine is loaded, the export will fail.
There are two ways to get rid of the pool:
- reboot (which will hang most of the time and require a hard reset), or
- geli detach -f /dev/gpt/backup.eli
After detaching I could usually export the pool :-/ (IIRC).
Comment 16 Patrik Hildingsson 2019-08-16 16:05:15 UTC
(In reply to Allan Jude from comment #14)
After our discussion about the error message

g_dev_taste: make_dev_p() failed (gp->name=zvol/backup5/zroot/BHYVE/VM/zapp/disk0@zfs-auto-snap_daily-2019-08-12-00h01p1, error=63)

I followed your lead and
1. set the following in /boot/loader.conf:
   vfs.zfs.vol.mode=2
2. ran:
   zfs set volmode=geom zroot/swap
and rebooted.

I have not had an issue with exporting the pool so far, after 6 import/export cycles of the USB-connected single-disk vdev pool. I never used geli to encrypt the data on the USB-connected disk.

I will keep monitoring though and update this thread.
Comment 17 Patrik Hildingsson 2019-08-20 16:19:34 UTC
(In reply to Patrik Hildingsson from comment #16)
Having monitored the affected systems, I haven't seen a single zpool export failure since making the modifications mentioned in my previous post.