Bug 208882

Summary: zfs root filesystem mount failure on startup in FreeBSD 10.3-RELEASE if USB hdd with zpool is attached to another port
Product: Base System
Reporter: Masachika ISHIZUKA <ish>
Component: kern
Assignee: Ian Lepore <ian>
Status: Closed FIXED    
Severity: Affects Some People
CC: avg, decui, elij+freebsd, fk, grahamperrin, ian, imp, ish, julian, pete, pi, stb, trasz
Priority: ---    
Version: 10.3-RELEASE   
Hardware: amd64   
OS: Any   
Attachments:
  messages (flags: none)
  parse_mount(): Use vfs.mountroot.timeout for ZFS root (and NFS) as well (flags: none)

Description Masachika ISHIZUKA 2016-04-18 04:48:33 UTC
Created attachment 169419 [details]
messages

I have a USB hdd with a zpool on it.
In 10.3-RELEASE it boots with UEFI and works fine as long as the USB hdd is attached to the same USB port. But when I attach the USB hdd to one of the other USB ports, the kernel cannot mount the root filesystem and prints the following messages.

Solaris: NOTICE: Cannot find the pool label for 'zroot'
Mounting from zfs:zroot/ROOT/default failed with error 5.

Then, at the 'mountroot>' prompt, I entered 'zfs:zroot/ROOT/default' and the filesystem mounted normally.

The USB hdd is GPT formatted as follows.
% gpart show da0
=>        34  3907029097  da0  GPT  (1.8T)
          34        1606       - free -  (803K)
        1640          88    2  freebsd-boot  (44K)
        1728   134217728    3  freebsd-ufs  (64G)
   134219456    16777216    4  freebsd-swap  (8.0G)
   150996672    52428800       - free -  (25G)
   203425472   209715200   27  freebsd-zfs  (100G)
   413140672  3493888459       - free -  (1.6T)

A zpool was created on /dev/da0p27; its datasets are as follows.

% zfs list
NAME                USED  AVAIL  REFER  MOUNTPOINT
zroot               26.8G  69.6G    96K  none
zroot/ROOT          1.24G  69.6G    96K  none
zroot/ROOT/default  1.24G  69.6G  1.24G  /
zroot/usr           22.2G  69.6G  6.60G  /usr
zroot/usr/local     15.6G  69.6G  15.6G  /usr/local
zroot/var           3.34G  69.6G  3.34G  /var

And my configuration is as follows.
% grep -v '^#' /boot/loader.conf
kernels="kernel kernel.i915 kernel.old"
i915_load="YES"
fuse_load="YES"
if_axge_load="YES"
zfs_load="YES"
vfs.root.mountfrom="zfs:zroot/ROOT/default"
% grep -v '^#' /etc/rc.conf
hostname="carrot.ish.org"
sshd_enable="YES"
moused_enable="YES"
ntpd_enable="YES"
dumpdev="NO"
keymap="us"
defaultrouter="192.168.1.1"
ifconfig_ue0="inet 192.168.1.8 netmask 255.255.255.0"
ifconfig_ue0_ipv6="inet6 accept_rtadv"
moused_flags="-m 4=5 -m 5=4"
apm_enable="YES"
dbus_enable="YES"
hald_enable="YES"
linux_enable="YES"
fusefs_enable="YES"
nfs_client_enable="YES"
powerd_enable="YES"
performance_cx_lowest="C3"
economy_cx_lowest="C3"
nfsuserd_enable="YES"
nfsuserd_flags="-domain ish.org"
nfscbd_enable="YES"
local_unbound_enable="YES"
rtsold_enable="YES"
firewall_enable="YES"
firewall_script="/etc/ipfw.conf"
firewall_logging="YES"
autofs_enable="YES"
zfs_enable="YES"

/var/log/messages is attached.
Comment 1 Fabian Keil 2016-04-20 10:02:44 UTC
This could be the result of vfs.mountroot.timeout being ignored
when booting from ZFS. See also:
https://lists.freebsd.org/pipermail/freebsd-fs/2015-March/020997.html
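
For reference, the tunable would normally be set from /boot/loader.conf
(the value is in seconds; "30" below is only an example):

  vfs.mountroot.timeout="30"

Without a patch this apparently has no effect when the root filesystem is
on ZFS, which is what the thread above discusses.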
Comment 2 Masachika ISHIZUKA 2016-04-21 00:29:17 UTC
(In reply to Fabian Keil from comment #1)
Thank you very much.
This patch applies to 10.3-RELEASE and fixes this bug.
After applying this patch, I can boot from zfs normally.
The log shows the following.

> Apr 21 09:15:32 carrot kernel: Solaris: NOTICE: Cannot find the pool label for 'zroot'
> Apr 21 09:15:32 carrot kernel: Mounting from zfs:zroot/ROOT/default failed with error 5. 2 seconds left. Retrying.

After that, zfs mounted automatically.
Comment 3 Fabian Keil 2016-04-23 11:52:24 UTC
Created attachment 169589 [details]
parse_mount(): Use vfs.mountroot.timeout for ZFS root (and NFS) as well

You're welcome, thanks for testing.

I'm attaching an updated version of the patch that
applies against 11-CURRENT after r290196 and has been
included in ElectroBSD since last October.

The patch set also contains a patch to use the timeout
for NFS mounts as well, but so far I have not had the
opportunity to test this and I suspect that nobody else
has tested it either.
Comment 4 Masachika ISHIZUKA 2016-04-23 12:27:25 UTC
(In reply to Fabian Keil from comment #3)

Thank you for new patch.
Although I hope this patch gets merged into the base/head branch, I don't know how to request that.

# I did not test the 'nfs mount retry' part because I don't use nfs filesystems that are necessary at bootup; I use 'autofs' for /home, /usr/altlocal and so on to avoid mount errors when the nfs server is down for maintenance.
Comment 5 Fabian Keil 2016-04-25 16:46:11 UTC
No worries, I did not expect you to test the NFS mount retries.

The freebsd-fs@ list is already CC'd, so an additional request
to get the patch reviewed and committed should not be necessary.
Comment 6 Masachika ISHIZUKA 2016-04-26 03:29:31 UTC
(In reply to Fabian Keil from comment #5)
Thank you for reply.

> committed should not be necessary.
It is very disappointing.

By the way, although the retry messages are reported in seconds, they do not match real time. Perhaps they should be reported as a retry count.

i.e.
printf("Mounting from %s:%s failed with error %d. "
    "%d time(s) left. Retrying.\n", fs, dev, error,
    (timeout + delay - 1) / delay);
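
(For what it's worth, (timeout + delay - 1) / delay is a ceiling division:
assuming the retry delay is a tenth of a second, two seconds of remaining
timeout would be reported as "20 time(s) left" rather than "2 seconds left".)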
Comment 7 Pete French 2017-03-13 15:00:17 UTC
Just a comment that this is still an issue, and it affects running machines in Azure with ZFS root mounts. The Microsoft-supplied config waits 300 seconds for the UFS partitions, and I hadn't realised ZFS didn't respect this until a number of my machines became unresponsive after an upgrade to 11.

Booting off USB sticks is not that critical, but an inability to boot cloud machines impacts us severely, and I would expect it impacts other people too.
Comment 8 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-03-13 18:55:44 UTC
To be honest, I think it should be fixed in ZFS and not in the mountroot code.  If you look at the failing dmesg, you'll see that the mountroot already does wait for USB enumeration, just as expected:

Apr 18 12:57:09 carrot kernel: Root mount waiting for: usbus0
Apr 18 12:57:09 carrot kernel: Root mount waiting for: usbus0
Apr 18 12:57:09 carrot kernel: ugen0.8: <ADATA> at usbus0
Apr 18 12:57:09 carrot kernel: umass0: <ADATA HV620, class 0/0, rev 3.00/65.03, addr 8> on usbus0
Apr 18 12:57:09 carrot kernel: umass0:  SCSI over Bulk-Only; quirks = 0x0100
Apr 18 12:57:09 carrot kernel: umass0:1:0:-1: Attached to scbus1
Apr 18 12:57:09 carrot kernel: Trying to mount root from zfs:zroot/ROOT/default []...

It's just that ZFS still fails for some weird reason.
Comment 9 Julian Elischer freebsd_committer freebsd_triage 2017-03-19 10:15:51 UTC
I confirm that the retry strategy for ZFS (I did not try nfs) does work effectively, and you don't need to wait the full 300 seconds. (mine had to wait 3 seconds)
I changed the code to only print the message every 50 retries, as any message becomes obnoxious when printed 10 times per second.

>         if (err_stride <= 0) {
>                 printf("Mounting from %s:%s failed with error %d. "
>                     "%d seconds left. Retrying.\n", fs, dev, error,
>                     timeout / hz);
>         }
>         err_stride += 1;
>         err_stride %= 50;
>         pause("rmzfs", delay);
Comment 10 Julian Elischer freebsd_committer freebsd_triage 2017-03-19 10:19:37 UTC
This should probably make it into an 'errata' notice and get added, but it's not obvious how many people are running 10.3 on Azure yet.. (obviously some are...)

I'm told that it's fixed in 10-stable, so maybe the best answer is just to upgrade a bit.
Comment 11 Julian Elischer freebsd_committer freebsd_triage 2017-03-19 10:22:49 UTC
To Edward,
the problem is that the current code only knows how to wait for DEVICES.
A filesystem-based approach would require the filesystem to report back what devices it would need.. it might be worthwhile adding a VFS method for 'report availability' or something, but the approach used here works fine and I'm told it's fixed in 10-stable anyhow.
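
A purely hypothetical sketch of that 'report availability' idea (nothing
like this exists in the tree; the member name is made up):

  struct vfsops {
          /* ... existing methods ... */
          /* Return 0 once the devices backing "from" are usable. */
          int     (*vfs_rootready)(const char *from);
  };

vfs_mountroot() could then poll the method instead of relying on root mount
holds, at the cost of teaching every filesystem how to answer the question.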
Comment 12 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-04-08 11:20:09 UTC
Julian, I don't think that's the problem.  The code in 10 doesn't wait for devices, it just waits for all the root mount holds to be released.  The code in 11 waits for devices, but for filesystems that don't have a specific device, like ZFS or NFS, it falls back to the 10 behaviour, i.e. waiting for the root mount holds.

To be honest I don't think this is a problem with the root mount mechanism at all.  It looks more like something internal to ZFS.  Perhaps we should just put the loop inside zfs_mountroot()?
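
A rough sketch of what that might look like (entirely hypothetical;
zfs_find_rootpool_once() is a made-up stand-in for the ZFS code that
currently fails with "Cannot find the pool label"):

  static int
  zfs_mountroot_retry(const char *poolname, int timeout, int delay)
  {
          int error;

          /* Keep looking for the pool label until found or out of time. */
          while ((error = zfs_find_rootpool_once(poolname)) != 0) {
                  if (timeout <= 0)
                          break;
                  pause("zfsroot", delay);
                  timeout -= delay;
          }
          return (error);
  }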
Comment 13 Pete French 2017-05-25 09:01:05 UTC
Is there any chance of this patch making it into 11.1, which is now in prerelease? I am running perfectly happily with this in Azure and it has fixed my mount issues. I understand the arguments that this is not the correct place for the fix, but in practical terms it lets us reliably boot FreeBSD somewhere it does not reliably boot otherwise, so there's surely at least an argument for committing it for now until a better fix can be applied? As written it doesn't have any negative effects, after all...

Would like to not have to run a custom patched kernel after 11.1 comes out, if possible...
Comment 14 Stefan Bethke 2017-07-23 21:21:08 UTC
I'm encountering this behaviour as well, trying to boot off a USB stick.  It would be great if the patch would make it into stable, at least as an option I can enable through a tunable.
Comment 15 Edward Tomasz Napierala freebsd_committer freebsd_triage 2017-07-24 07:36:49 UTC
FWIW, it should be possible to fix this without any source changes using mount.conf (man mount.conf).
Comment 16 Stefan Bethke 2017-07-24 07:52:31 UTC
(In reply to Edward Tomasz Napierala from comment #15)

Really? Would you mind pointing out which specific contents of /.mount.conf would work around this issue? And where would that file live?
Comment 17 Stefan Bethke 2017-07-24 21:07:57 UTC
I can report that the patch works for me.  I would really appreciate seeing this in stable and in a patch release for 11.1, so I can run with a stock kernel again.
Comment 18 Kurt Jaeger freebsd_committer freebsd_triage 2018-03-03 19:01:35 UTC
See 

https://lists.freebsd.org/pipermail/freebsd-stable/2018-March/088484.html

and discussion. Seems to be a common problem.
Comment 19 Warner Losh freebsd_committer freebsd_triage 2018-03-04 18:31:16 UTC
There needs to be a 'zpool wait', or there needs to be something that tries again when devices arrive in the background, depending on how critical the need for the zpool is. devd could be useful here, imho. devd is also useful for mounting /usr or some other critical fs when the device arrives, and we could put waits in rc for them if necessary. That may need some more thought.

IMHO, the whole root hold thing is shite and should be flushed. It should be replaced by a loop similar to what we have today: try to mount /, sleep until the next device arrives and try again; time out to a prompt after N seconds (or, in an ideal world, when kbhit is true :).
Comment 20 Ian Lepore freebsd_committer freebsd_triage 2018-03-10 19:46:25 UTC
To expand a little bit on Warner's comment, based on a discussion we had on irc...  The current root mount hold system is incomplete, and it may be inadequate overall to ever help with the zfs situations.

One way the current system is incomplete is illustrated by the usb subsystem.  A root hold is placed for each usb bus while it is scanned, and released when the scan is complete.  That allows any umass devices on the bus to be created before releasing the hold.  The problem is that a umass device is not the thing that hosts the root filesystem: it needs to attach to CAM and eventually have a da device created, then geom needs to taste that device and create all geoms on it, and so on.  All this work ends up happening asynchronously after the usb bus root hold is released.

To fix this properly, the umass device would have to place its own root hold which didn't get released until CAM had created and attached the da device(s).  CAM would have to place its own root holds that would have to remain in place until geom had finished tasting the new devices.  All this same extra plumbing of holds would have to be replicated in the mmc/sd subsystem as well.

Because of all the existing races you can't actually rely on the releasing of all mount holds to indicate that devices are ready.  So the scheme only works for devices that are mounted by specifying a /dev/something name, because that allows the code in vfs_mountroot.c to keep looping until a name lookup succeeds.

Which brings us to zfs, a way to host root filesystems that doesn't mount them based on names in /dev, so there's no way the existing root mount hold system and detection of availability can even be extended to zfs.

A much simpler alternative is the idea contained in the patch with this PR: detect the availability of the root filesystem by attempting to mount it, and if that fails, pause briefly and try again, until the timeout expires.  With this new retry loop in place, there is little reason for the whole root mount hold system to continue to exist as well, but removing it is more complicated and should be handled separately after this simple fix is committed.
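
In outline, the approach looks like this (a sketch of the idea, not the
committed diff; try_mountroot_once() is a stand-in for the kernel_mount()
attempt in parse_mount(), and timeout/delay are in ticks as elsewhere in
this PR):

  /*
   * Sketch only: retry the actual mount until it succeeds or the
   * vfs.mountroot.timeout budget is exhausted.
   */
  error = try_mountroot_once(fs, dev);
  while (error != 0 && timeout > 0) {
          printf("Mounting from %s:%s failed with error %d. "
              "%d seconds left. Retrying.\n", fs, dev, error,
              timeout / hz);
          pause("rmretry", delay);
          timeout -= delay;
          error = try_mountroot_once(fs, dev);
  }

With the loop in place, the existing vfs.mountroot.timeout tunable (or a
.timeout line in a mount.conf(5) file) bounds how long it keeps retrying,
for ZFS and NFS roots as well as device-backed ones.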
Comment 21 commit-hook freebsd_committer freebsd_triage 2018-03-10 22:08:22 UTC
A commit references this bug:

Author: ian
Date: Sat Mar 10 22:07:57 UTC 2018
New revision: 330745
URL: https://svnweb.freebsd.org/changeset/base/330745

Log:
  Make root mount timeout logic work for filesystems other than ufs.

  The vfs.mountroot.timeout tunable and .timeout directive in a mount.conf(5)
  file allow specifying a wait timeout for the device(s) hosting the root
  filesystem to become usable.  The current mechanism for waiting for devices
  and detecting their availability can't be used for zfs-hosted filesystems.
  See the comment #20 in the PR for some expanded detail on these points.

  This change adds retry logic to the actual root filesystem mount.  That is,
  instead of relying on device availability using device name lookups, it uses
  the kernel_mount() call itself to detect whether the filesystem can be
  mounted, and loops until it succeeds or the configured timeout is exceeded.

  These changes are based on the patch attached to the PR, but it's rewritten
  enough that all mistakes belong to me.

  PR:		208882
  X-MFC after:	sufficient testing, and hopefully in time for 11.1

Changes:
  head/sys/kern/vfs_mountroot.c
Comment 22 commit-hook freebsd_committer freebsd_triage 2018-03-20 21:03:41 UTC
A commit references this bug:

Author: ian
Date: Tue Mar 20 21:02:43 UTC 2018
New revision: 331262
URL: https://svnweb.freebsd.org/changeset/base/331262

Log:
  MFC r330745:

  Make root mount timeout logic work for filesystems other than ufs.

  The vfs.mountroot.timeout tunable and .timeout directive in a mount.conf(5)
  file allow specifying a wait timeout for the device(s) hosting the root
  filesystem to become usable.  The current mechanism for waiting for devices
  and detecting their availability can't be used for zfs-hosted filesystems.
  See the comment #20 in the PR for some expanded detail on these points.

  This change adds retry logic to the actual root filesystem mount.  That is,
  instead of relying on device availability using device name lookups, it uses
  the kernel_mount() call itself to detect whether the filesystem can be
  mounted, and loops until it succeeds or the configured timeout is exceeded.

  These changes are based on the patch attached to the PR, but it's rewritten
  enough that all mistakes belong to me.

  PR:		208882

Changes:
_U  stable/11/
  stable/11/sys/kern/vfs_mountroot.c
Comment 23 commit-hook freebsd_committer freebsd_triage 2018-03-20 22:57:38 UTC
A commit references this bug:

Author: ian
Date: Tue Mar 20 22:57:14 UTC 2018
New revision: 331276
URL: https://svnweb.freebsd.org/changeset/base/331276

Log:
  MFC r330745:

  Make root mount timeout logic work for filesystems other than ufs.

  The vfs.mountroot.timeout tunable and .timeout directive in a mount.conf(5)
  file allow specifying a wait timeout for the device(s) hosting the root
  filesystem to become usable.  The current mechanism for waiting for devices
  and detecting their availability can't be used for zfs-hosted filesystems.
  See the comment #20 in the PR for some expanded detail on these points.

  This change adds retry logic to the actual root filesystem mount.  That is,
  instead of relying on device availability using device name lookups, it uses
  the kernel_mount() call itself to detect whether the filesystem can be
  mounted, and loops until it succeeds or the configured timeout is exceeded.

  These changes are based on the patch attached to the PR, but it's rewritten
  enough that all mistakes belong to me.

  PR:		208882

Changes:
  stable/10/sys/kern/vfs_mountroot.c