Created attachment 220284: extend rc.d/dumpon to check and delay for cam(4) probing

On systems with root on a memory filesystem, and probably also on systems with an nvd(4)-backed root, rc.d/zpool doesn't work: at the time it runs, cam(4) is still in the (aprobe) state for the attached SSDs (850 Pro in my test case). Much slower disks could plausibly cause the same problem even in the not-so-rare cases where root is not mounted from md(4). This affects dumpon and swapon too, which might have an even bigger impact than zpool!

The attached diff checks the output of camcontrol(8), which lives in /sbin on the root filesystem, so nothing from the /usr filesystem is needed. Unaffected systems also run the check unconditionally, but the overhead should be negligible, since it only runs once at startup.
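For illustration, a minimal sketch of the kind of check described above. It is not the attached patch: the helper names cam_probing and cam_wait_settle and the 30-second default are hypothetical, and it assumes, as the report describes, that a device still being probed shows up in `camcontrol devlist` output with an "aprobe" (ATA) or "probe" (SCSI) peripheral name. It deliberately avoids grep (which lives in /usr/bin on FreeBSD) and relies only on sh builtins, /bin/sleep and /sbin/camcontrol.

```sh
#!/bin/sh
# Sketch only, not the attached patch. Hypothetical helpers for an rc.d
# script that waits (bounded) for cam(4) probing to finish before
# zpool/dumpon/swapon run. Uses only sh builtins, /bin/sleep and
# /sbin/camcontrol, so nothing from /usr is required.
#
# Assumption: a device still being probed is listed by
# `camcontrol devlist` with an "aprobe" (ATA) or "probe" (SCSI) periph.

# cam_probing: read `camcontrol devlist` output on stdin and succeed
# (return 0) if any line still lists a probe peripheral.
cam_probing()
{
	while read -r _line; do
		case "$_line" in
		*"(aprobe"*|*",aprobe"*|*"(probe"*|*",probe"*)
			return 0
			;;
		esac
	done
	return 1
}

# cam_wait_settle [MAX_SECONDS]: poll once per second until no probe
# peripheral is listed, or give up after MAX_SECONDS (default 30, an
# arbitrary choice). Returns 1 on timeout so the caller can log it.
cam_wait_settle()
{
	_max=${1:-30}
	_n=0
	while camcontrol devlist 2>/dev/null | cam_probing; do
		if [ "$_n" -ge "$_max" ]; then
			echo "cam(4) still probing after ${_max}s, continuing" >&2
			return 1
		fi
		sleep 1
		_n=$((_n + 1))
	done
	return 0
}
```

An rc.d script such as zpool, dumpon or swapon would call `cam_wait_settle 30` in its start routine before touching any disks. The timeout is bounded so that a device that never finishes probing cannot hang the boot, and an unaffected system pays for a single camcontrol invocation.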
Created attachment 222477: extend rc.d/dumpon to check and delay for cam(4) probing

I came across this unresolved PR by accident and noticed that the patch is outdated. Here is the updated version, which is what lets me use OpenZFS in production.
Created attachment 230308: Boot log excerpt
Came here to say this. While setting up a file server, I noticed that a 10 TB pool replicated from the old OmniOSce machine wouldn't mount during boot, although it imported manually without any problem.

My observations and comments:

The (rather misleading) error message from /etc/rc.d/zpool is frequently interspersed with kernel autoconfiguration messages (see attachment). Together with my disbelief that something this string-and-duct-tape should ship in a release, that made it harder than necessary to figure out what was going wrong.

Why does the system boot multi-user before the kernel is done with autoconfiguration? And if parallelizing operations is the idea, why isn't there a barrier in place for something as vital as disk operations?

And that's not all: downstream, mountd is utterly confused by entries in /etc/zfs/exports for mounts that don't exist ("mountd[7977]: bad exports list line 'redacted': symbolic link in export path or statfs failed"). Why is this information persisted, instead of being created under /var/run during the zpool import (or not at all, if the pool isn't found)?

In the end, I inserted a 20-second sleep into /etc/rc.d/zpool and moved on, rather unimpressed.
Along the same lines, there is this thread: <https://www.reddit.com/r/freebsd/comments/n0rxud/zfs_loads_before_disks_are_ready_during_boot_then/>
Keyword: patch or patch-ready, in lieu of the summary line prefix [patch]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk)

Keyword descriptions and search interface: <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>
(In reply to Harald Schmalzbauer from comment #0)
> … probably also true for nvd(4) backed root systems, …

Is the situation significantly different with 15.0-CURRENT? In <https://cgit.freebsd.org/src/log/?qt=grep&q=nvd>, a few things catch my eye.

(In reply to Harald Schmalzbauer from comment #1)

% git -C /usr/src pull --ff-only freebsd main
From https://git.freebsd.org/src
 * branch main -> FETCH_HEAD
Already up to date.
% git -C /usr/src apply --check --verbose /tmp/222477.patch
Checking patch libexec/rc/rc.d/dumpon...
Hunk #1 succeeded at 47 (offset 12 lines).
%

(In reply to Hauke Fath from comment #4)

Re: <https://old.reddit.com/r/freebsd/comments/n0rxud/-/gwbx5up/>, for reference only:

% sysctl -d kern.cam.boot_delay kern.cam.scsi_delay
kern.cam.boot_delay: Bus registration wait time
kern.cam.scsi_delay: Delay to allow devices to settle after a SCSI bus reset (ms)
%
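For reference, kern.cam.boot_delay is a boot-time loader tunable (read-only once the system is up), expressed in milliseconds, so it would be set in /boot/loader.conf rather than with sysctl(8) at runtime. A config fragment, with 10000 ms as an arbitrary example value; whether a longer bus registration wait actually helps when root is on md(4) rather than on a CAM disk is an assumption, not something verified in this PR.

```sh
# /boot/loader.conf (example value, assumption: 10 s is enough for the
# attached disks to finish probing before root is mounted)
kern.cam.boot_delay="10000"
```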
This is probably a duplicate of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242189
This PR is obsolete; see d878a66a9a (PR: 242189). Closing.