Bug 251610

Summary: rc.d/zpool runs before ada(4) attaches
Product: Base System Reporter: Harald Schmalzbauer <bugzilla.freebsd>
Component: bin Assignee: freebsd-rc (Nobody) <rc>
Status: Open ---    
Severity: Affects Some People CC: fs, hf
Priority: --- Keywords: needs-qa
Version: 13.0-STABLE   
Hardware: Any   
OS: Any   
URL: https://github.com/freebsd/freebsd-src/blob/main/libexec/rc/rc.d/dumpon
Attachments:
Description                                               Flags
extend rc.d/dumpon to check and delay for cam(4) probing  none
extend rc.d/dumpon to check and delay for cam(4) probing  none
Boot log excerpt                                          none

Description Harald Schmalzbauer 2020-12-05 17:41:18 UTC
Created attachment 220284 [details]
extend rc.d/dumpon to check and delay for cam(4) probing

On systems with root from memory filesystem, and probably also true for nvd(4) backed root systems, rc.d/zpool doesn't work, because at the time it runs, cam(4) is still in state (aprobe) for attached SSDs (850pro in my test case).
I can imagine disks that attach much more slowly causing similar problems in the not-so-rare cases where root is not mounted from md(4).

This affects dumpon and swapon too, which might have even higher impact than zpool!

The attached diff checks the output of camcontrol(8), which is in /sbin on rootfs, without utilizing anything from /usr filesystem.

The overhead should be negligible on unaffected systems, which run the check unconditionally; it is inexpensive, since it runs only once at startup.
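
For illustration only (this is not the attached diff): a minimal sketch of the kind of check described above. It assumes that `camcontrol devlist` lists a device that is still being probed with a peripheral name such as (aprobe0) or (probe0), as observed in this report. The names cam_probing, wait_for_cam and CAM_LIST_CMD are hypothetical and exist only so the sketch can be exercised without real hardware.

```shell
#!/bin/sh
# Hypothetical sketch, not the attached patch.
# CAM_LIST_CMD: command that prints the CAM device list.
# Assumption: "camcontrol devlist" shows devices still being probed
# with a peripheral name like (aprobe0) or (probe0).
: "${CAM_LIST_CMD:=camcontrol devlist}"

# Return 0 while at least one listed device is still being probed.
# If the list command fails or prints nothing, report "not probing".
cam_probing()
{
	${CAM_LIST_CMD} 2>/dev/null | grep -Eq '[(,]a?probe[0-9]'
}

# Poll once per second, at most $1 times (default 30).
# Return 0 once no device is probing, 1 if the limit was reached.
wait_for_cam()
{
	_tries=${1:-30}
	while [ "${_tries}" -gt 0 ] && cam_probing; do
		sleep 1
		_tries=$((_tries - 1))
	done
	! cam_probing
}
```

An rc.d script could call `wait_for_cam 30` before touching disks and continue regardless of the result, so that an unaffected system pays for only one `camcontrol` invocation.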
Comment 1 Harald Schmalzbauer 2021-02-15 21:21:22 UTC
Created attachment 222477 [details]
extend rc.d/dumpon to check and delay for cam(4) probing

Accidentally found this unresolved PR and noticed that the patch is outdated.
Here's what enables me to use OpenZFS in production.
Comment 2 Hauke Fath 2021-12-22 10:03:17 UTC
Created attachment 230308 [details]
Boot log excerpt
Comment 3 Hauke Fath 2021-12-22 10:18:32 UTC
Came here to say this.

In the process of setting up a fileserver, I noticed that a 10 TB pool replicated from the old omniosce machine wouldn't mount during boot, while it had no problem when imported manually.

My observations and comments:

The (rather misleading) error message from /etc/rc.d/zpool is frequently interspersed with kernel autoconfiguration messages (see attachment). This made it harder than necessary to figure out what was going wrong, together with my disbelief that something so string-and-ducttape-y should be shipped in a release.

Why does the kernel boot multi-user before it's done with autoconfiguration? And if parallelizing operations is the idea, why isn't there a barrier in place for things as vital as disk operations?

And that's not all: downstream, mountd is utterly confused about a list of mounts in /etc/zfs/exports that don't exist ("mountd[7977]: bad exports list line 'redacted': symbolic link in export path or statfs failed"). Why is this information persisted, instead of being created under /var/run during the zpool import - or not created at all, if the pool isn't found?

In the end, I inserted a 20-second sleep into /etc/rc.d/zpool and moved on, rather unimpressed.
Comment 4 Hauke Fath 2021-12-22 15:21:04 UTC
In the same vein, there is this: <https://www.reddit.com/r/freebsd/comments/n0rxud/zfs_loads_before_disks_are_ready_during_boot_then/>
Comment 5 Graham Perrin freebsd_committer freebsd_triage 2022-10-17 12:40:23 UTC
Keyword: 

    patch
or  patch-ready

– in lieu of summary line prefix: 

    [patch]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk). 

Keyword descriptions and search interface: 

    <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>
Comment 6 Graham Perrin 2023-10-01 16:47:29 UTC
(In reply to Harald Schmalzbauer from comment #0)

> … probably also true for nvd(4) backed root systems, …

Is the situation significantly different with 15.0-CURRENT? 

In <https://cgit.freebsd.org/src/log/?qt=grep&q=nvd>, a few things catch my eye.


(In reply to Harald Schmalzbauer from comment #1)

% git -C /usr/src pull --ff-only freebsd main
From https://git.freebsd.org/src
 * branch                      main       -> FETCH_HEAD
Already up to date.
% git -C /usr/src apply --check --verbose /tmp/222477.patch
Checking patch libexec/rc/rc.d/dumpon...
Hunk #1 succeeded at 47 (offset 12 lines).
% 


(In reply to Hauke Fath from comment #4)

Re: <https://old.reddit.com/r/freebsd/comments/n0rxud/-/gwbx5up/>, for reference only:

% sysctl -d kern.cam.boot_delay kern.cam.scsi_delay
kern.cam.boot_delay: Bus registration wait time
kern.cam.scsi_delay: Delay to allow devices to settle after a SCSI bus reset (ms)
%