Bug 268157 - /etc/rc.d/zpool runs too early, before usb disks available
Summary: /etc/rc.d/zpool runs too early, before usb disks available
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: usb
Version: 13.1-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-usb (Nobody)
URL:
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2022-12-05 00:14 UTC by Barney Wolff
Modified: 2022-12-10 20:08 UTC
Description Barney Wolff 2022-12-05 00:14:52 UTC

    
Comment 1 Barney Wolff 2022-12-05 00:18:06 UTC
Adding "sleep 5" to /etc/rc.d/zpool gets around the problem but is surely not the right fix.

I have this problem on my rpi4. dmesg.boot shows zpool aborting just before the usb disk is detected, so the pool does not stay imported/mounted across reboots.
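
A fixed sleep can be replaced by a bounded poll, which returns as soon as the device node appears. The following is a minimal sketch of that idea, not the actual /etc/rc.d/zpool code; the function name `wait_for_path` and the device path used in the note below are hypothetical, chosen only for illustration.

```sh
#!/bin/sh
# wait_for_path PATH [TIMEOUT_SECONDS]
# Poll once per second until PATH exists or the timeout expires.
# Returns 0 if PATH exists, 1 on timeout.
# Hypothetical helper for illustration; not part of /etc/rc.subr.
wait_for_path()
{
	_path=$1
	_remaining=${2:-5}

	while [ ! -e "$_path" ]; do
		if [ "$_remaining" -le 0 ]; then
			return 1
		fi
		sleep 1
		_remaining=$((_remaining - 1))
	done
	return 0
}
```

Called as `wait_for_path /dev/da0 10` ahead of the import (with `/dev/da0` standing in for whatever the usb disk is), this waits at most about 10 seconds and costs nothing when the disk is already present. It still has to be told which device to wait for, so it remains a workaround rather than a fix.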
Comment 2 Graham Perrin freebsd_committer freebsd_triage 2022-12-05 02:35:21 UTC
Triage: freebsd-arm@ (assignment) and usb (component), both tentative.
Comment 3 Barney Wolff 2022-12-05 02:53:36 UTC
I don't think this is arm-specific, although it may show up more often on systems like rpi which have usb disks. The fix has to be in zfs, but I don't know if zfs can know whether a pool import that's cached is on a usb disk or not. In any case zpool should not abort if a pool disk has disappeared over a reboot.

I can imagine a nightmare scenario where a vital pool that's on a permanently-attached internal disk fails to get imported because an unimportant pool on a usb disk has vanished.
Comment 4 Graham Perrin freebsd_committer freebsd_triage 2022-12-05 05:32:16 UTC
Thanks, now I might have a clearer idea of what you're describing. 

(In reply to Barney Wolff from comment #3)

> … a vital pool that's on a permanently-attached internal disk 
> fails to get imported because an unimportant pool on a usb disk has 
> vanished.

For me, occasionally, boot (from OpenZFS on ada0) fails in the presence of some combination of external USB devices. IIRC multi-user mode either (a) is not reached, or (b) does not progress. 

The 'offending' device is not necessarily one that contains a pool. Sometimes it might be a cache device; other times IIRC it might not be storage-related at all. Sometimes there is randomness: a failure to boot with a particular set of USB devices might be followed by a successful boot with the same set, with no change to connections. When the bug bites, I remove the device and boot continues. There might be an existing bug report for this type of thing.
Comment 5 Daniel Engberg freebsd_committer freebsd_triage 2022-12-05 10:16:58 UTC
Related, https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=261808
Comment 6 Barney Wolff 2022-12-05 16:52:19 UTC
Does appear to be the same problem.

As I read the zpool_main code, import -a processes cache file entries sequentially, so if one import fails badly, others will not get done. One brute-force fix for that would be to fork for each one. Presumably there is already code to serialize updates to the cache file, as potentially multiple root processes could do imports simultaneously.
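
Without changing zpool itself, one way to keep a single failed import from blocking the rest is to import pools one at a time from a script. This is only a sketch of the idea, not a description of how `zpool import -a` behaves; the function name `import_each`, the `ZPOOL_IMPORT_CMD` variable, and the pool names in the note below are hypothetical.

```sh
#!/bin/sh
# import_each POOL...
# Try to import each named pool separately, so that a failure on one
# pool does not prevent attempts on the others.
# ZPOOL_IMPORT_CMD is a hypothetical override, used here so the loop can
# be exercised without real pools; it defaults to "zpool import -N".
import_each()
{
	_failed=0
	for _pool in "$@"; do
		if ! ${ZPOOL_IMPORT_CMD:-zpool import -N} "$_pool"; then
			echo "import of pool ${_pool} failed, continuing" >&2
			_failed=$((_failed + 1))
		fi
	done
	# Exit status: 0 if every import succeeded, 1 otherwise.
	[ "$_failed" -eq 0 ]
}
```

For example, `import_each vital scratch` would still attempt `scratch` after `vital` fails, and the reverse. The imports run sequentially here rather than in forked children, so a hung import would still stall the loop.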

I think it's an open question what import should do if a vdev doesn't exist (yet). It might wait forever. I don't know how usb signals that it's done, or whether zpool import can know if there are any usb vdevs, so always waiting for usb to finish doesn't seem right.

In my own case waiting for root mount would not help, as root is on the sd card while I want usr, var, home on a usb disk pool.
Comment 7 Hans Petter Selasky freebsd_committer freebsd_triage 2022-12-05 16:54:20 UTC
There are some tunables and sysctls you can set, to wait for USB disks before kicking init.

--HPS
Comment 8 Daniel Engberg freebsd_committer freebsd_triage 2022-12-07 09:49:09 UTC
(In reply to Hans Petter Selasky from comment #7)
Examples? I didn't see anything obvious when looking.
Comment 9 Warner Losh freebsd_committer freebsd_triage 2022-12-07 23:50:57 UTC
It would be cool if there were a command to 'walk the usb tree and finish all exploration and return'. We could put that in a script and enable it on boot before zpool import :)
Comment 10 Tomoaki AOKI 2022-12-08 00:24:13 UTC
(In reply to Warner Losh from comment #9)

As an rc.d script with (if possible) a configurable timeout?
Comment 11 Mark Millard 2022-12-08 01:06:47 UTC
(In reply to Daniel Engberg from comment #8)

Possibly:

# sysctl -d kern.cam.boot_delay
kern.cam.boot_delay: Bus registration wait time

I generally have /boot/loader.conf contain:

kern.cam.boot_delay=10000

because of some history of running into an issue that it avoided.

Too bad no units for the time are indicated. ms if I infer correctly.

I also use:

vfs.mountroot.timeout=10
vfs.root_mount_always_wait=1

but those would only be indirectly useful for other media
-- and possibly only if the root mount was the slowest.

# sysctl -d vfs.root_mount_always_wait
vfs.root_mount_always_wait: Wait for root mount holds even if the root
device already exists

vfs.mountroot.timeout is not in sysctl. Again no time-unit indication.
Seconds if I infer correctly.
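
To see at a glance which of the three settings from comment 11 are present in a loader configuration file, a small filter is enough. This is a sketch; the function name `show_boot_waits` is hypothetical, and it only reads the file, it does not query the running kernel.

```sh
#!/bin/sh
# show_boot_waits [FILE]
# Print the lines in FILE (default /boot/loader.conf) that set any of
# the three tunables discussed in comment 11.
# Exit status is that of grep: non-zero when none of them is set.
# Hypothetical helper for illustration only.
show_boot_waits()
{
	grep -E '^(kern\.cam\.boot_delay|vfs\.mountroot\.timeout|vfs\.root_mount_always_wait)=' \
	    "${1:-/boot/loader.conf}"
}
```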