Bug 208882 - zfs root filesystem mount failure on startup in FreeBSD 10.3-RELEASE if USB hdd with zpool is attached to another port
Summary: zfs root filesystem mount failure on startup in FreeBSD 10.3-RELEASE if USB ...
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-fs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-18 04:48 UTC by Masachika ISHIZUKA
Modified: 2017-07-24 21:07 UTC
CC List: 8 users

See Also:


Attachments
messages (33.34 KB, text/plain)
2016-04-18 04:48 UTC, Masachika ISHIZUKA
no flags Details
parse_mount(): Use vfs.mountroot.timeout for ZFS root (and NFS) as well (4.09 KB, patch)
2016-04-23 11:52 UTC, Fabian Keil
no flags Details | Diff

Description Masachika ISHIZUKA 2016-04-18 04:48:33 UTC
Created attachment 169419 [details]
messages

I have a USB HDD with a zpool on it.
In 10.3-RELEASE it boots via UEFI and works fine as long as the USB HDD is attached to the same USB port. But when I moved the USB HDD to a different USB port, the kernel could not mount the root filesystem, printing the following messages.

Solaris: NOTICE: Cannot find the pool label for 'zroot'
Mounting from zfs:zroot/ROOT/default failed with error 5.

Then, at the 'mountroot>' prompt, entering 'zfs:zroot/ROOT/default' mounts the root filesystem normally.

The USB HDD is GPT-formatted as follows:
% gpart show da0
=>        34  3907029097  da0  GPT  (1.8T)
          34        1606       - free -  (803K)
        1640          88    2  freebsd-boot  (44K)
        1728   134217728    3  freebsd-ufs  (64G)
   134219456    16777216    4  freebsd-swap  (8.0G)
   150996672    52428800       - free -  (25G)
   203425472   209715200   27  freebsd-zfs  (100G)
   413140672  3493888459       - free -  (1.6T)

A zpool was created on /dev/da0p27, with the following datasets.

% zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
zroot               26.8G  69.6G    96K  none
zroot/ROOT          1.24G  69.6G    96K  none
zroot/ROOT/default  1.24G  69.6G  1.24G  /
zroot/usr           22.2G  69.6G  6.60G  /usr
zroot/usr/local     15.6G  69.6G  15.6G  /usr/local
zroot/var           3.34G  69.6G  3.34G  /var

My configuration is as follows.
% grep -v '^#' /boot/loader.conf
kernels="kernel kernel.i915 kernel.old"
i915_load="YES"
fuse_load="YES"
if_axge_load="YES"
zfs_load="YES"
vfs.root.mountfrom="zfs:zroot/ROOT/default"
% grep -v '^#' /etc/rc.conf
hostname="carrot.ish.org"
sshd_enable="YES"
moused_enable="YES"
ntpd_enable="YES"
dumpdev="NO"
keymap="us"
defaultrouter="192.168.1.1"
ifconfig_ue0="inet 192.168.1.8 netmask 255.255.255.0"
ifconfig_ue0_ipv6="inet6 accept_rtadv"
moused_flags="-m 4=5 -m 5=4"
apm_enable="YES"
dbus_enable="YES"
hald_enable="YES"
linux_enable="YES"
fusefs_enable="YES"
nfs_client_enable="YES"
powerd_enable="YES"
performance_cx_lowest="C3"
economy_cx_lowest="C3"
nfsuserd_enable="YES"
nfsuserd_flags="-domain ish.org"
nfscbd_enable="YES"
local_unbound_enable="YES"
rtsold_enable="YES"
firewall_enable="YES"
firewall_script="/etc/ipfw.conf"
firewall_logging="YES"
autofs_enable="YES"
zfs_enable="YES"

/var/log/messages is attached.
Comment 1 Fabian Keil 2016-04-20 10:02:44 UTC
This could be the result of vfs.mountroot.timeout being ignored
when booting from ZFS. See also:
https://lists.freebsd.org/pipermail/freebsd-fs/2015-March/020997.html
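
For reference, the tunable is set from /boot/loader.conf. With retry support in place, raising it gives slow USB devices more time to attach; the value below is just an example:

vfs.mountroot.timeout="30"

On a stock 10.3 kernel this is only honoured for root filesystems with a backing device the mountroot code knows to wait for, which is what the patch from the list posting above changes for ZFS.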
Comment 2 Masachika ISHIZUKA 2016-04-21 00:29:17 UTC
(In reply to Fabian Keil from comment #1)
Thank you very much.
The patch applies to 10.3-RELEASE and fixes this bug.
After applying it, I can boot from ZFS normally.
The log shows the following:

> Apr 21 09:15:32 carrot kernel: Solaris: NOTICE: Cannot find the pool label for 'zroot'
> Apr 21 09:15:32 carrot kernel: Mounting from zfs:zroot/ROOT/default failed with error 5. 2 seconds left. Retrying.

After that, ZFS mounted automatically.
Comment 3 Fabian Keil 2016-04-23 11:52:24 UTC
Created attachment 169589 [details]
parse_mount(): Use vfs.mountroot.timeout for ZFS root (and NFS) as well

You're welcome, thanks for testing.

I'm attaching an updated version of the patch; it applies
against 11-CURRENT after r290196 and has been included in
ElectroBSD since last October.

The patch set also contains a patch to use the timeout for
NFS mounts as well, but so far I have not had the opportunity
to test this, and I suspect nobody else has tested it either.
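
For readers who don't want to open the diff, here is a simplified sketch of the approach, not the patch itself; names like root_mount_timeout, fs and dev follow my reading of sys/kern/vfs_mountroot.c, and the retry interval is illustrative:

	struct mntarg *ma;
	int delay, error, timeout;

	/*
	 * Keep retrying the root mount until it succeeds or
	 * vfs.mountroot.timeout expires.  kernel_mount() consumes
	 * its argument list, so it is rebuilt for every attempt.
	 */
	timeout = root_mount_timeout * hz;	/* vfs.mountroot.timeout */
	delay = hz / 10;			/* retry every 100 ms */
	for (;;) {
		ma = NULL;
		ma = mount_arg(ma, "fstype", fs, -1);
		ma = mount_arg(ma, "fspath", "/", -1);
		ma = mount_arg(ma, "from", dev, -1);
		ma = mount_arg(ma, "ro", NULL, 0);
		error = kernel_mount(ma, MNT_ROOTFS);
		if (error == 0 || timeout <= 0)
			break;
		printf("Mounting from %s:%s failed with error %d. "
		    "%d seconds left. Retrying.\n", fs, dev, error,
		    timeout / hz);
		pause("rmretry", delay);
		timeout -= delay;
	}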
Comment 4 Masachika ISHIZUKA 2016-04-23 12:27:25 UTC
(In reply to Fabian Keil from comment #3)

Thank you for the new patch.
I hope this patch will be merged into the base/head branch, but I don't know how to request that.

# I have not tested the NFS mount retry because I don't use any NFS filesystems that are necessary at boot time; I use autofs for /home, /usr/altlocal and so on to avoid mount errors when the NFS server is down for maintenance.
Comment 5 Fabian Keil 2016-04-25 16:46:11 UTC
No worries, I did not expect you to test the NFS mount retries.

The freebsd-fs@ list is already CC'd, so an additional request
to get the patch reviewed and committed should not be necessary.
Comment 6 Masachika ISHIZUKA 2016-04-26 03:29:31 UTC
(In reply to Fabian Keil from comment #5)
Thank you for the reply.

> committed should not be necessary.
It is very disappointing.

By the way, although the retry messages report a number of seconds, it does not match real time. Perhaps the message should report the retry count instead, i.e.:
printf("Mounting from %s:%s failed with error %d. "
    "%d time(s) left. Retrying.\n", fs, dev, error,
    (timeout + delay - 1)/ delay);
Comment 7 pete 2017-03-13 15:00:17 UTC
Just a comment that this is still an issue, and it affects running machines in Azure with ZFS root mounts. The Microsoft-supplied config waits 300 seconds for the UFS partitions, and I hadn't realised ZFS didn't respect this until a number of my machines became unresponsive after an upgrade to 11.

Booting off USB sticks is not that critical, but an inability to boot cloud machines impacts us severely, and I would expect it impacts other people too.
Comment 8 Edward Tomasz Napierala freebsd_committer 2017-03-13 18:55:44 UTC
To be honest, I think it should be fixed in ZFS and not in the mountroot code.  If you look at the failing dmesg, you'll see that mountroot already waits for USB enumeration, just as expected:

Apr 18 12:57:09 carrot kernel: Root mount waiting for: usbus0
Apr 18 12:57:09 carrot kernel: Root mount waiting for: usbus0
Apr 18 12:57:09 carrot kernel: ugen0.8: <ADATA> at usbus0
Apr 18 12:57:09 carrot kernel: umass0: <ADATA HV620, class 0/0, rev 3.00/65.03, addr 8> on usbus0
Apr 18 12:57:09 carrot kernel: umass0:  SCSI over Bulk-Only; quirks = 0x0100
Apr 18 12:57:09 carrot kernel: umass0:1:0:-1: Attached to scbus1
Apr 18 12:57:09 carrot kernel: Trying to mount root from zfs:zroot/ROOT/default []...

It's just that ZFS still fails for some weird reason.
Comment 9 Julian Elischer freebsd_committer 2017-03-19 10:15:51 UTC
I confirm that the retry strategy for ZFS (I did not try NFS) does work effectively, and you don't need to wait the full 300 seconds (mine had to wait 3 seconds).
I changed the code to only print the message every 50 retries, as any message becomes obnoxious when printed 10 times per second.

> 	if (err_stride <= 0) {
> 		printf("Mounting from %s:%s failed with error %d. "
> 		    "%d seconds left. Retrying.\n", fs, dev, error,
> 		    timeout / hz);
> 	}
> 	err_stride += 1;
> 	err_stride %= 50;
> 	pause("rmzfs", delay);
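
An untested alternative to the hand-rolled counter: the kernel's ppsratecheck(9) would cap the output at one line per second regardless of the retry interval:

> 	/* Untested sketch: at most one message per second. */
> 	static struct timeval lastfail;
> 	static int curfail;
> 
> 	if (ppsratecheck(&lastfail, &curfail, 1)) {
> 		printf("Mounting from %s:%s failed with error %d. "
> 		    "%d seconds left. Retrying.\n", fs, dev, error,
> 		    timeout / hz);
> 	}
> 	pause("rmzfs", delay);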
Comment 10 Julian Elischer freebsd_committer 2017-03-19 10:19:37 UTC
This should probably make it to 'errata' and get added, but it's not obvious how many people are running 10.3 on Azure yet (obviously some are...).

I'm told that it's fixed in 10-stable, so maybe the best answer is just to upgrade a bit.
Comment 11 Julian Elischer freebsd_committer 2017-03-19 10:22:49 UTC
To Edward:
The problem is that the mountroot code only knows how to wait for DEVICES.
A filesystem-based approach would require the filesystem to report back which devices it needs. It might be worthwhile adding a VFS method for 'report availability' or something, but the approach used here works fine, and I'm told it's fixed in 10-stable anyhow.
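
Purely as a thought experiment; nothing like this exists in the tree, and all names below are made up:

> 	/* Hypothetical vfsops addition: let a filesystem report
> 	 * whether the storage backing a root spec is present yet. */
> 	typedef int vfs_rootready_t(const char *fspec);
> 
> 	/* parse_mount() could then poll instead of retrying blindly: */
> 	while (vfsops->vfs_rootready(dev) != 0 && timeout > 0) {
> 		pause("rootwait", hz / 10);
> 		timeout -= hz / 10;
> 	}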
Comment 12 Edward Tomasz Napierala freebsd_committer 2017-04-08 11:20:09 UTC
Julian, I don't think that's the problem.  The code in 10 doesn't wait for devices; it just waits for all the root mount holds to be released.  The code in 11 waits for devices, but for filesystems that don't have a specific device, like ZFS or NFS, it falls back to the 10 behaviour, i.e. waiting for the root mount holds.

To be honest, I don't think this is a problem with the root mount mechanism at all.  It looks more like something internal to ZFS.  Perhaps we should just put the loop inside zfs_mountroot()?
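
A rough, untested sketch of that idea; zfs_domount() is the dataset mount routine in zfs_vfsops.c, while the wrapper name and loop bounds below are made up:

	static int
	zfs_mountroot_wait(vfs_t *vfsp, char *osname)
	{
		int error, tries;

		/*
		 * Retry while the pool label is still missing
		 * (error 5, EIO), giving slow USB devices time
		 * to attach.
		 */
		for (tries = 0; tries < 30; tries++) {
			error = zfs_domount(vfsp, osname);
			if (error != EIO)
				break;
			pause("zfsroot", hz);	/* one second per retry */
		}
		return (error);
	}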
Comment 13 pete 2017-05-25 09:01:05 UTC
Is there any chance of this patch making it into 11.1, which is now in prerelease? I am running perfectly happily with it in Azure and it has fixed my mount issues. I understand the arguments that this is not the correct place for the fix, but in practical terms it lets us reliably boot FreeBSD somewhere it does not reliably boot without it, so there's surely at least an argument for committing it for now, until a better fix can be applied? As written it doesn't have any negative effects, after all...

Would like to not have to run a custom patched kernel after 11.1 comes out, if possible...
Comment 14 stb 2017-07-23 21:21:08 UTC
I'm encountering this behaviour as well, trying to boot off a USB stick.  It would be great if the patch would make it into stable, at least as an option I can enable through a tunable.
Comment 15 Edward Tomasz Napierala freebsd_committer 2017-07-24 07:36:49 UTC
FWIW, it should be possible to fix this without any source changes using mount.conf (man mount.conf).
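
Untested, but based on the mount.conf man page, something along these lines in /.mount.conf on the root filesystem might do it; .timeout and .onfail are the relevant directives, and the values are illustrative:

.timeout 30
.onfail retry
zfs:zroot/ROOT/default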
Comment 16 stb 2017-07-24 07:52:31 UTC
(In reply to Edward Tomasz Napierala from comment #15)

Really? Would you mind confirming which specific contents of /.mount.conf would work around this issue? And where exactly would that file live?
Comment 17 stb 2017-07-24 21:07:57 UTC
I can report that the patch works for me.  I would really appreciate seeing this in stable and in a patch release for 11.1, so I can run with a stock kernel again.