Bug 262189 - ZFS volume not showing up in /dev/zvol when 1 CPU
Summary: ZFS volume not showing up in /dev/zvol when 1 CPU
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-02-25 10:02 UTC by Janis
Modified: 2023-04-29 08:01 UTC
CC List: 3 users

See Also:


Attachments
list of created ZVOLS for seq 1 1000 (4.92 KB, text/plain)
2022-02-26 12:23 UTC, Janis
Output of zfs list command (20.42 KB, text/plain)
2022-02-26 12:24 UTC, Janis
devd.pipe events (323.12 KB, text/plain)
2022-02-26 12:25 UTC, Janis

Description Janis 2022-02-25 10:02:48 UTC
I have found a 100% repeatable problem, on 4+ different setups, where a ZFS zvol does not show up in /dev/zvol until the system is rebooted. In a way this is a continuation of the investigation of the problem reported at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261059, where I already suspected some ZFS concurrency issues.

Requirements to reproduce (as I have tested): the latest FreeBSD as of now (13.0-RELEASE-p7 or 13.0-RELEASE), and the machine must have 1 CPU (or 1 vCPU, important!). RAM does not seem to matter; I have tested 16 GB, 2 GB and 1 GB setups. The example assumes a basic ZFS install with the default options from the DVD installer and automatic partitioning (just a next-next install). Before running, ensure that there are no ZVOLs and that the /dev/zvol directory does not exist (not mandatory; the bug exists even if there are ZVOLs, it is just easier to detect when there aren't any).
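
To confirm the single-CPU requirement before running the test, the active CPU count can be checked with the standard sysctl (expected output on a single-CPU machine shown below):

# sysctl hw.ncpu
hw.ncpu: 1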


To trigger the bug, run the shell script below (adjust the preamble for zpool/zfs dataset creation/destruction and the names as necessary):

#!/bin/sh
name_pool=zroot/stress
# zpool create -f $name_pool /dev/ada1
# or
# zfs create $name_pool
zfs set mountpoint=none $name_pool

# cleanup between runs:
# zfs destroy -r $name_pool

seq 1 100 | while read i; do
    zfs create -o volmode=dev -V 1G $name_pool/data$i
    dd if=/dev/zero of=/dev/zvol/$name_pool/data$i bs=1M
done


At some point in the loop you will see error output like (or similar to) this:
dd: /dev/zvol/zroot/stress/data1: No such file or directory


Now, to validate, run:
ls /dev/zvol
ls: /dev/zvol: No such file or directory

zfs list
# .. output containing all created ZVOLS ..
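
A quick way to compare what "zfs list" reports against what devfs actually exposes (a minimal sketch, assuming the zroot/stress layout from the script above):

# volumes ZFS knows about vs. device nodes that exist
zfs list -H -o name -t volume | grep -c '^zroot/stress/'
ls /dev/zvol/zroot/stress 2>/dev/null | wc -l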


After a reboot, the zvols show up in /dev/zvol as expected (as should have happened right after create).


More details and observations on different environments:

1.
100% reproducible inside a VirtualBox 6.0 VM (default FreeBSD settings, 13.0-RELEASE, default ZFS install), 1 vCPU, 1 GB RAM. Zvols are created on the same zpool where FreeBSD is installed.

2.
100% reproducible inside a XEN 4.15 DomU, 1 vCPU, 2 GB RAM. FreeBSD installed on ada0 with UFS, zpool created on the whole /dev/ada1 disk, zvols created directly (name_pool=zroot) without a hierarchy.

3.
100% reproducible inside a XEN 4.15 Dom0, 1 vCPU, 16 GB RAM, 13.0-RELEASE-p7. FreeBSD installed on a separate /dev/gpt/label1 (ada0) disk, zpool on a GPT-partitioned ada1.

4.
100% reproducible on physical hardware with all CPU cores except one disabled in the BIOS. 16 GB RAM, Xeon CPU.


Observations:

This bug seems to be related to the CPU count. If 2 CPUs are available, around 30% of the /dev/zvol devices are not created; with 4 CPUs it is around 15% or less (the percentages are rough, but they show the role of the CPU count, i.e. concurrency). I do not have more CPUs in my test hardware, but it seems that the more there are, the less likely this bug is to manifest itself.

The "seq 1 100" in the script is far more than needed on a single CPU; "seq 1 15" is enough to see the problem. With more CPUs a higher count is better, since sometimes all ZVOLs do get created.

After a restart the ZVOLs always show up in /dev/zvol. This holds for a reboot and for a manual import as well.

If the zpool is on a separate disk from the one where FreeBSD is installed, a zpool export followed by a zpool import also makes the ZVOLs show up in /dev/zvol.
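
For reference, the export/import sequence I mean is just the following (a minimal sketch; "tank" is a hypothetical non-root pool name, since the pool the running system boots from cannot be exported):

zpool export tank
zpool import tank
ls /dev/zvol/tank    # the ZVOL device nodes reappear after the import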

It makes no difference if the ZVOL is a sparse volume created with the -s flag.


On a 4 CPU setup I have sometimes noticed errors like this on the serial console:
g_dev_taste: g_dev_taste(zvol/test/data22) failed to g_attach, error=6

This seems suspicious to me, since with volmode=dev g_dev_taste should not trigger g_attach on such devices, am I right? What does this error code mean; is it from errno.h, "#define ENXIO 6 /* Device not configured */"? Maybe these errors are related to this bug, maybe not.
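
For reference, the error code can be checked against the system headers (assuming the standard include location):

# grep -w ENXIO /usr/include/sys/errno.h
#define ENXIO           6               /* Device not configured */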

If I dd /dev/zero onto a block device and then detach/attach it, that does not trigger the g_dev_taste error.

I have seen 6 similar reports of ZVOLs not showing up, but those were related to different commands (clone, send and recv) and seemed outdated. I will investigate them as well and link them in my next comments. They may have a similar cause, but they did not look like duplicates.

At the moment this is as far as I am able to dig into this.
Comment 1 Janis 2022-02-25 13:20:44 UTC
The bug about problems with the volmode property at ZFS dataset creation (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=251828) gave me some more ideas to test, since it seemed somewhat related/similar.

So I changed volmode in the test script from dev to geom, like so:

..
zfs create -o volmode=geom -V 1G $name_pool/data$i
..

All ZVOLs on 13.0-RELEASE were created and accessible as expected.

It seems that the value of the ZFS "mountpoint" property does not change anything.

I tested the same script on 12.2-RELEASE (VirtualBox, 1 CPU core, volmode=dev) and this bug (#262189) does not manifest itself there, which makes me believe that a new problem was introduced/uncovered while fixing bug #251828.
Comment 2 Aleksandr Fedorov freebsd_committer freebsd_triage 2022-02-25 14:49:58 UTC
There are two things.

First, I think /dev/zvol/<pool>/device is created asynchronously. Therefore, after the completion of the "zfs create ..." command, the device may not have been created yet and the dd command will fail.

I think that if you add "sleep 5" to the script, the error will not be reproduced:

seq 1 100 | while read i; do
    zfs create -o volmode=dev -V 1G $name_pool/data$i

    sleep 5

    dd if=/dev/zero of=/dev/zvol/$name_pool/data$i bs=1M
done
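
As an alternative to a fixed sleep, a small polling loop can wait for the device node to appear (a sketch only, reusing $name_pool from the original script and with a hypothetical 30-second timeout; it reports, but cannot fix, nodes that never show up):

wait_for_zvol() {
    # poll for the device node, give up after ~30 seconds
    dev=/dev/zvol/$1
    n=0
    while [ ! -e "$dev" ] && [ "$n" -lt 30 ]; do
        sleep 1
        n=$((n + 1))
    done
    [ -e "$dev" ]
}

seq 1 100 | while read i; do
    zfs create -o volmode=dev -V 1G $name_pool/data$i
    wait_for_zvol $name_pool/data$i || echo "missing: $name_pool/data$i"
done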

Second, the OpenZFS code creates a ZVOL device in a very strange way: https://github.com/openzfs/zfs/blob/master/module/zfs/zvol.c#L1394

/*
* It's unfortunate we need to remove minors before we create new ones:
* this is necessary because our backing gendisk (zvol_state->zv_disk)
* could be different when we set, for instance, volmode from "geom"
* to "dev" (or vice versa).
*/

First, a ZVOL device is created with the default volmode, then it is removed and re-created with the requested one. In FreeBSD, the default value of vfs.zfs.vol.mode is 1 (GEOM). Therefore, there is a race between the ZFS and GEOM threads. That's why you see this error: "g_dev_taste: g_dev_taste(zvol/test/data22) failed to g_attach, error=6".

For example, here is the output of "cat /var/run/devd.pipe" when I create a ZVOL (zfs create -V 1G -o volmode=dev datapool/test):
!system=GEOM subsystem=DEV type=CREATE cdev=zvol/datapool/test
!system=DEVFS subsystem=CDEV type=DESTROY cdev=zvol/datapool/test
!system=GEOM subsystem=DEV type=DESTROY cdev=zvol/datapool/test
!system=DEVFS subsystem=CDEV type=CREATE cdev=zvol/datapool/test
Comment 3 Janis 2022-02-26 12:23:56 UTC
Created attachment 232119 [details]
list of created ZVOLS for seq 1 1000
Comment 4 Janis 2022-02-26 12:24:48 UTC
Created attachment 232120 [details]
Output of zfs list command
Comment 5 Janis 2022-02-26 12:25:15 UTC
Created attachment 232121 [details]
devd.pipe events
Comment 6 Janis 2022-02-26 12:29:12 UTC
Thanks for the useful information; I did not know the internals in this regard. Now I understand what the bug #250297 report is talking about. That bug seemed like it might be relevant to my case. I tried to reproduce it with a zfs create/destroy shell script loop and could not hit the kernel panic mentioned in its comments. I had not understood that it talks about how volmode=dev is created internally, so its use case seemed a bit bizarre to me.

About the asynchronous nature of the "zfs create" command: at first I thought this was the case, but it does not seem to be. There are two problems as I see it:
1) if this were just "zfs create" returning too early, it would be a minor bug, since a command is expected to return once its work is done. I probably would not even have reported it.
2) a sleep between "zfs create" and "dd" on multi-core systems "solves" some of the dd problems, but not all of the missing-ZVOL cases; there are still ZVOLs that never show up in /dev/zvol but can be seen in "zfs list". In the single-CPU case it solves nothing at all: the ZVOL never appears, only after a reboot or export/import.


To illustrate the 1 CPU case, I ran this script:
#!/bin/sh
name_pool=zroot/stress
echo `date`
ls /dev/zvol
seq 1 10 | while read i; do
    zfs create -o volmode=dev -V 1G $name_pool/data$i
done
sleep 300
echo `date`
ls /dev/zvol

Output is:
Sat Feb 26 12:21:08 EET 2022
ls: /dev/zvol: No such file or directory
Sat Feb 26 12:26:11 EET 2022
ls: /dev/zvol: No such file or directory

Even now, after a while:
# date
Sat Feb 26 12:35:03 EET 2022
# ls /dev/zvol
ls: /dev/zvol: No such file or directory


I do not know how long is considered asynchronous, but this seems far too long, so I assume the ZVOL will never show up.

On the 1 CPU machine, "cat /var/run/devd.pipe" shows lines like these for each create command:
!system=DEVFS subsystem=CDEV type=CREATE cdev=zvol/zroot/stress/data20
!system=GEOM subsystem=DEV type=CREATE cdev=zvol/zroot/stress/data20
!system=DEVFS subsystem=CDEV type=DESTROY cdev=zvol/zroot/stress/data20
!system=GEOM subsystem=DEV type=DESTROY cdev=zvol/zroot/stress/data20

This seems wrong, since both of the last events are DESTROY.
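
A rough way to pull out the devices whose last devd event was a DESTROY (a sketch only; devd_events.log is a hypothetical file holding the captured events in the format shown above):

awk -F'cdev=' '/cdev=zvol\// { last[$2] = $0 }
    END { for (d in last) if (last[d] ~ /type=DESTROY/) print d }' devd_events.log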


With 4 CPUs it is harder to reproduce, so I ran with 2 CPUs enabled in the BIOS. Physical hardware, 16 GB RAM.

So I ran the following script:
#!/bin/sh
name_pool=zroot/stress
zfs create -o mountpoint=none $name_pool
seq 1 1000 | while read i; do
    zfs create -o volmode=dev -V 1G $name_pool/data$i
done

Testing result:
# zfs list | grep stress | wc -l
    1001
# ls /dev/zvol/zroot/stress/ | wc -l
     638

The output clearly shows that ZVOLs are missing (even ignoring the output header and the ZVOL parent, the difference is far too big).

I created the following files and will attach them (though maybe my pointers above are enough):
zfs list -H -o name > /service/log/zfs_list_001.log
ls /dev/zvol/zroot/stress/ > ls_dev_zvol_stress__001.log
cat /var/run/devd.pipe | grep -v "!system=ZFS" > /service/log/grepped_devd.pipe_no_dd_seq_1000__001.log
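
One way to script the comparison (the actual check below was done with diff on sorted lists; an equivalent sketch using comm, where the intermediate file names are just examples):

zfs list -H -o name -t volume | grep '^zroot/stress/' | sed 's|^zroot/stress/||' | sort > zfs_names.txt
ls /dev/zvol/zroot/stress/ | sort > dev_names.txt
# names present in "zfs list" but missing from /dev/zvol
comm -23 zfs_names.txt dev_names.txt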


With diff and sorting we see that, for example, there is no ZVOL for:
-data526
-data527
-data528
-data529

# ls /dev/zvol/zroot/stress/data526
ls: /dev/zvol/zroot/stress/data526: No such file or directory

# zfs get -H volmode zroot/stress/data526
zroot/stress/data526 volmode dev local


For a non-existing ZVOL we see this in the devd.pipe file:
# cat /service/log/grepped_devd.pipe_no_dd_seq_1000__001.log | grep data526
!system=DEVFS subsystem=CDEV type=CREATE cdev=zvol/zroot/stress/data526
!system=GEOM subsystem=DEV type=CREATE cdev=zvol/zroot/stress/data526
!system=DEVFS subsystem=CDEV type=DESTROY cdev=zvol/zroot/stress/data526
!system=GEOM subsystem=DEV type=DESTROY cdev=zvol/zroot/stress/data526

It's the same fingerprint as in the 1 CPU case: the two DESTROY events are the last ones.

Whereas for existing ZVOLs there is:
# cat /service/log/grepped_devd.pipe_no_dd_seq_1000__001.log | grep data525
!system=DEVFS subsystem=CDEV type=CREATE cdev=zvol/zroot/stress/data525
!system=GEOM subsystem=DEV type=CREATE cdev=zvol/zroot/stress/data525
!system=DEVFS subsystem=CDEV type=DESTROY cdev=zvol/zroot/stress/data525
!system=GEOM subsystem=DEV type=DESTROY cdev=zvol/zroot/stress/data525
!system=DEVFS subsystem=CDEV type=CREATE cdev=zvol/zroot/stress/data525
Comment 7 Janis 2022-02-26 12:38:36 UTC
".. In FreeBSD, the default value for vfs.zfs.vol.mode is 1 (GEOM). .."
Would it be possible for me to manually change vfs.zfs.vol.mode to the value that means DEV? I don't know where the descriptions of this sysctl's values are documented in the manpages.

I could then test how the system behaves, in the hope that it could help debug the problem.
Comment 8 Aleksandr Fedorov freebsd_committer freebsd_triage 2022-02-26 13:08:22 UTC
> Would it be possible for me to manually change vfs.zfs.vol.mode to value that it is DEV?

Yes. To get a short description of the sysctl:
# sysctl -d vfs.zfs.vol.mode
vfs.zfs.vol.mode: Expose as GEOM providers (1), device files (2) or neither

# sysctl vfs.zfs.vol.mode=2
vfs.zfs.vol.mode: 1 -> 2

You can change it at runtime.

If you set this sysctl to 2, ZFS will not attempt to create a GEOM provider.
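
If the new value should survive a reboot, it can also be put into /etc/sysctl.conf, which is applied at boot (mentioned only as a reminder; it works around the symptom, it does not fix the underlying bug):

echo 'vfs.zfs.vol.mode=2' >> /etc/sysctl.conf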
Comment 9 Janis 2022-02-28 09:12:29 UTC
Now I tried the same script with sysctl vfs.zfs.vol.mode=2, single CPU, single core. I added the sparse option, since it does not reserve space and does not seem to influence the bug.


sysctl vfs.zfs.vol.mode=2
.. loop ..
zfs create -o volmode=dev -s -V 1G $name_pool/data$i
..

All ZVOLs show up as would be expected from "zfs create" with vfs.zfs.vol.mode=1 as well.


The interesting thing is that with
sysctl vfs.zfs.vol.mode=2
.. loop ..
zfs create -o volmode=geom -s -V 1G $name_pool/dataG$i
..

All ZVOLs showed up. I somewhat expected this to fail, since the default mode was set to DEV and I thought the create-destroy behaviour mentioned in comment #2 would mess things up, but no, it did not.


But when I switch the sysctl value back:
sysctl vfs.zfs.vol.mode=1
.. loop ..
zfs create -o volmode=dev -s -V 1G $name_pool/dataD$i
..

No ZVOL shows up in /dev/zvol


Based on this, it seems it would actually be better if the default value were DEV, since then ZVOLs show up for both "zfs create" calls (the bug does not manifest itself). So, to dodge this bug, one could set vfs.zfs.vol.mode=2 (DEV) as the default and have the "zfs create" command auto-fill volmode=geom when volmode is not specified. I would not like that to be the solution, since it does not fix the bug, it just masks it, and the bug might still manifest in other cases anyway.

It seems that changing vfs.zfs.vol.mode to 2 might help on some systems to minimize problems with ZVOLs whose volmode is DEV.


So it seems that the second part of comment #2 is closer to where the bug comes from. What I did not understand, though, is why "zfs create ... volmode=geom" with vfs.zfs.vol.mode=2 did not fail. Does it take a different path, not the create-destroy-create one?
Comment 10 pprocacci 2023-04-29 08:01:51 UTC
Obligatory, "me too".

Changing to vfs.zfs.vol.mode=2 fixes it for me as well.