Bug 263473 - ZFS drives fail to mount datasets when rebooting - 13.1-RC4
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-STABLE
Hardware: Any
OS: Any
Importance: --- Affects Many People
Assignee: freebsd-fs (Nobody)
URL:
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2022-04-22 18:27 UTC by Rick Summerhill
Modified: 2023-02-28 13:53 UTC
CC: 12 users

See Also:


Attachments
Patch to fix the issue (960 bytes, patch)
2022-08-16 07:18 UTC, Maxim Sobolev

Description Rick Summerhill 2022-04-22 18:27:31 UTC
I'm running a 13.1-RC4 server that has a ZFS problem that didn't exist under 13.0-RELEASE.

First, here is the configuration of the server.  The operating system is on an nvd (NVMe) drive with all partitions UFS.  There are 8 UFS-formatted drives in a SAS configuration; all of these show up when rebooting.  I also have 2 drives in a ZFS mirror that holds the home directories and the data for a MySQL database.  None of the ZFS datasets mount when rebooting.  After rebooting, if I do a "zpool import", all of the ZFS datasets mount.
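
For concreteness, the post-boot recovery presumably looks something like this (the pool name "tank" here is hypothetical):

zpool import          # with no arguments, lists pools available for import
zpool import tank     # imports the mirror; its datasets then mount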

Looking at dmesg after rebooting, it shows the following lines after the nvd0 drive shows up:

Trying to mount root from ufs:/dev/nvd0p2 [rw]...
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
pid 48 (zpool), jid 0, uid 0: exited on signal 6
pid 49 (zpool), jid 0, uid 0: exited on signal 6

Further on in dmesg, the other drives show up: the 8 SAS drives and the 2 ZFS drives.  It appears ZFS is trying to configure itself before it can know about its drives.

It worked flawlessly under 13.0 for almost a year.  Note also that each of the SAS drives and each of the SATA drives used for ZFS has a gpart label, and fstab uses those labels.  However, since nvd0 is the only such "drive" in the box, it does not have a label.
Comment 1 Marek Zarychta 2022-04-22 18:50:05 UTC
This one looks similar to bug 261808.
Unfortunately, I am neither a member of the triage team nor a committer, so I cannot change "See Also".
Comment 2 tech-lists 2022-04-22 22:30:15 UTC
Hi,

I'm seeing a similar NVMe problem. My context was different though:

Attempting to install from:
FreeBSD-14.0-CURRENT-amd64-20220421-b91a48693a5-254961-memstick.img

to brand-new hardware, the installer failed at the stage after one selects the
partitioning scheme (GPT in this case) and the UFS filesystem, with an error
like the following (apologies for paraphrasing from memory, as the hardware
is no longer available):

"autopart failed -5"

The hardware in question was a Crucial CT1000P5PSSD8 1TB.
Comment 3 Alfonso S. Siciliano freebsd_committer freebsd_triage 2022-04-22 23:25:48 UTC
(In reply to tech-lists from comment #2)

tech-lists@zyxst.net, thank you for the report.

Could you please attach a screenshot so the error can be reproduced?

Regarding "autopart failed -5": did you choose `Auto UFS Guided Disk Setup`?
Comment 4 Rick Summerhill 2022-04-22 23:47:53 UTC
FYI,

I looked at the referenced similar bug report, and per a recommendation therein, added the following to /boot/loader.conf:

vfs.root_mount_always_wait=1		# Wait to mount root

The ZFS datasets then mounted on reboot without further intervention.

Hope that helps.
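
In case it helps others applying the same workaround, a minimal sketch (assuming the tunable is also exported as a read-only sysctl, which I believe it is on stock kernels):

echo 'vfs.root_mount_always_wait=1' >> /boot/loader.conf
shutdown -r now
# after reboot, verify the tunable took effect
sysctl vfs.root_mount_always_wait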
Comment 5 Graham Perrin freebsd_committer freebsd_triage 2022-04-23 09:17:48 UTC
Cross-reference: <https://forums.freebsd.org/threads/84900/>

Also (no response yet): <https://lists.freebsd.org/archives/freebsd-stable/2022-April/000719.html>
Comment 6 tech-lists 2022-04-23 11:57:38 UTC
(In reply to Alfonso S. Siciliano from comment #3)

Hi,

Unfortunately I no longer have the hardware around to test, sorry.
Comment 7 Christos Chatzaras 2022-05-07 19:29:50 UTC
After upgrading to 13.1-RC6, I noticed the "pid 12218 (zpool), jid 0, uid 0: exited on signal 6" message in my logs too, but the ZFS datasets mount correctly.
Comment 8 Andriy Gapon freebsd_committer freebsd_triage 2022-05-09 09:33:54 UTC
In my experience this may happen if zpool.cache contains old / obsolete pool records.
Normally the zpool command would not crash in such a case, but it seems to when zpool.cache is on a read-only filesystem, which is the case during boot because rc.d/zpool runs before rc.d/root.
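
A quick way to confirm that ordering (a sketch; rcorder prints scripts in the order rc runs them):

rcorder /etc/rc.d/* | grep -E '/(zpool|zfs|root)$'

/etc/rc.d/zpool should print before /etc/rc.d/root, i.e. the zpool command runs while / is still read-only.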
Comment 9 Christos Chatzaras 2022-05-09 12:15:48 UTC
(In reply to Andriy Gapon from comment #8)

zpool get cachefile
NAME   PROPERTY   VALUE      SOURCE
zroot  cachefile  -          default

zpool set cachefile=/boot/zfs/zpool.cache zroot

But after a reboot:

zpool get cachefile
NAME   PROPERTY   VALUE      SOURCE
zroot  cachefile  -          default

Any idea why it doesn't keep the value?
Comment 10 Christos Chatzaras 2022-05-09 12:31:55 UTC
After adding vfs.root_mount_always_wait=1, the message "pid 12218 (zpool), jid 0, uid 0: exited on signal 6" doesn't show up again after a reboot.
Comment 11 Christos Chatzaras 2022-05-09 13:01:15 UTC
What finally solved this was:

zpool set cachefile=/boot/zfs/zpool.cache zroot

which rebuilt zpool.cache (the file already existed).

---------

Looks like the "issue" was related to:

zpool set cachefile=/mnt/boot/zfs/zpool.cache zroot

which was used during the initial remote installation with mfsBSD.
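
For reference, one way to double-check what the cache file actually records (a sketch; zdb -C dumps pool configurations, and -U points it at a specific cache file):

zdb -C -U /boot/zfs/zpool.cache

If the expected pools are missing from the dump, the cache is stale, and re-running `zpool set cachefile=...` as above rebuilds it.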
Comment 12 Rick Summerhill 2022-05-22 00:52:55 UTC
Could someone fill me in on the status of this bug?  I have updated to 13.1-RELEASE (uname -mrsv:  FreeBSD 13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64) and the problem persists, except when the boot loader configuration contains vfs.root_mount_always_wait=1.  I do not have a ZFS root drive, but I tried the suggestions made by Christos (with /etc/zfs/zpool.cache) to no avail.  Is a fix just relegated to a future release?
Comment 13 Maxim Sobolev freebsd_committer freebsd_triage 2022-08-15 17:20:46 UTC
+1. Upgraded my dev box from 12.2 to 13.1 and it no longer auto-imports its ZFS pool. The box boots off UFS as its root FS and has a single raidz1-0 pool. Adding vfs.root_mount_always_wait=1 works around the problem.
Comment 14 Maxim Sobolev freebsd_committer freebsd_triage 2022-08-16 07:17:06 UTC
OK, a little more info about this issue. It also happens on a purely ZFS system, running on an m5 AWS instance, likewise after upgrading from 12.2. Only the boot pool is imported on boot, not the 3 other pools on disks connected to the instance. Interestingly enough, vfs.root_mount_always_wait=1 did not help in that case.

I've done a bit of investigation: this turns out to be a result of the OpenZFS import. The new default in 13.1 simply disables autoimport of ZFS pools. From a quick grep through the sources, it seems that Linux has some mechanism to do the import in userland (i.e. "rc.d" magic), while FreeBSD doesn't.

^3b0ce0e28db module/zfs/spa_config.c                     (Matt Macy      2020-08-24 22:48:19 +0000  71) int zfs_autoimport_disable = 1;

As such, I suggest reverting the default to 0 (i.e. re-enabling autoimport) until we have some other way to provide this feature.

Unfortunately, that code path seems 100% untested: setting autoimport_disable to 0 just causes the kernel to panic when ZFS is initialized.
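
For reference, the corresponding loader tunable would be set like this (an assumption based on the usual module-parameter naming; do not try this on stock 13.1, as it triggers the panic shown below):

vfs.zfs.autoimport_disable=0	# in /boot/loader.conf; panics 13.1 at ZFS init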

Timecounters tick every 10.000 msec
calling spa_init()...
calling spa_config_load()...


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x1d0
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff806aa7d9
stack pointer           = 0x28:0xffffffff8167be70
frame pointer           = 0x28:0xffffffff8167be70
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (swapper)
trap number             = 12
panic: page fault
cpuid = 3
time = 1
KDB: stack backtrace:
#0 0xffffffff8061f685 at kdb_backtrace+0x65
#1 0xffffffff805d5f6f at vpanic+0x17f
#2 0xffffffff805d5de3 at panic+0x43
#3 0xffffffff808fac35 at trap_fatal+0x385
#4 0xffffffff808fac8f at trap_pfault+0x4f
#5 0xffffffff808d1f68 at calltrap+0x8
#6 0xffffffff80580972 at pwd_ensure_dirs+0x1f2
#7 0xffffffff8118a508 at zfs_file_open+0x28
#8 0xffffffff810e0b9d at spa_config_load+0x5d
#9 0xffffffff810e9aae at spa_init+0x11e
#10 0xffffffff81190bb2 at zfs_kmod_init+0x32
#11 0xffffffff81007db0 at zfs_modevent+0x30
#12 0xffffffff805b3d54 at module_register_init+0xa4
#13 0xffffffff8056674f at mi_startup+0xdf
#14 0xffffffff802ca022 at btext+0x22

Comparing the implementation of zfs_file_open() with the kobj_file_open() it replaced in 12.x, the former is not designed to run before root is mounted. So I also had to disable the call to spa_config_load() from spa_init() and leave just the one in spa_boot_init(), which seems to DTRT in all cases (zfs loaded from the loader, zfs loaded in multi-user). Patch is attached; feedback is welcome.
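
For anyone who wants to try it, a sketch of fetching and applying the attachment to a source tree (the attachment URL follows the standard Bugzilla scheme; the patch level is an assumption and may need adjusting):

cd /usr/src
fetch -o /tmp/zfs-autoimport.patch 'https://bugs.freebsd.org/bugzilla/attachment.cgi?id=235935'
patch -p1 < /tmp/zfs-autoimport.patch    # patch level may differ
make -j"$(sysctl -n hw.ncpu)" buildkernel && make installkernel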
Comment 15 Maxim Sobolev freebsd_committer freebsd_triage 2022-08-16 07:18:04 UTC
Created attachment 235935 [details]
Patch to fix the issue
Comment 16 Andriy Gapon freebsd_committer freebsd_triage 2022-08-18 11:56:26 UTC
(In reply to Maxim Sobolev from comment #14)
Apologies if I am asking something too obvious, but do you have rc.d/zpool after the upgrade?
Comment 17 virushuo 2023-01-02 03:53:29 UTC
I can confirm vfs.root_mount_always_wait=1 works for me.

I moved my disk array to a new server running 13.1-RELEASE without ZFS on root, and the zpool auto-import stopped working; the old server with zroot had no problem.  Checking the dmesg output, I found zpool trying to import pools before the disk array was ready.  Adding vfs.root_mount_always_wait=1 fixed the issue.
Comment 18 Xin LI freebsd_committer freebsd_triage 2023-01-03 05:14:02 UTC
(In reply to virushuo from comment #17)

Some additional details (I've talked with the reporter over Telegram):

Both the old and new systems have an on-board RAID controller; the old system's was flashed to IT mode, the new system's is a Dell H730, and the owner chose not to flash it to avoid bricking it.

On the old system, / was ZFS (using two disks in a mirrored zpool); the new system was using a mirrored UFS for /.

The disk array showed up as NETAPP DS424IOM6; it was connected to the same HBA moved from the old system to the new system.

We observed that the ses(4) device for the NetApp disk array appeared quite *late* at boot time, after /etc/rc.d/fsck had run, and the disks only showed up after that.  In the current rc order, /etc/rc.d/zfs runs much earlier, so it died with:

cannot import '<pool0>': no such pool or dataset
        Destroy and re-create the pool from
        a backup source.
cannot import '<pool1>': no such pool or dataset
        Destroy and re-create the pool from
        a backup source.
cachefile import failed, retrying
nvpair_value_nvlist(nvp, &rv) == 0 (0x16 == 0)
ASSERT at /usr/src/sys/contrib/openzfs/module/nvpair/fnvpair.c:586:fnvpair_value_nvlist()
pid 48 (zpool), jid 0, uid 0: exited on signal 6
Abort trap
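
Until the ordering is addressed, one stopgap (a sketch, untested on this setup) is to retry the import late in boot from /etc/rc.local, which runs near the end of the rc sequence:

#!/bin/sh
# /etc/rc.local -- retry importing pools that the early rc.d zfs/zpool run missed
zpool import -a -c /etc/zfs/zpool.cache 2>/dev/null || zpool import -a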