Summary: | 11.2-RELEASE kernel won't boot with zfs mirror root
---|---
Product: | Base System
Reporter: | Patrick Mackinlay <freebsd.68fba>
Component: | kern
Assignee: | freebsd-bugs (Nobody) <bugs>
Status: | Closed Overcome By Events
Severity: | Affects Some People
CC: | astralblue, bro.development, dimka, harrison
Priority: | ---
Keywords: | regression
Version: | 11.2-RELEASE
Hardware: | amd64
OS: | Any
Attachments: | attachment 195385: verbose boot log
Description
Patrick Mackinlay
2018-07-22 21:30:22 UTC
I noticed the same error yesterday. I am currently trying to get a full kernel log over a serial console. Will update shortly.

Created attachment 195385 [details]
Verbose boot log (includes "?" output from the mountroot prompt)

Uploaded a boot -v log. The important part is at the end:
--- BEGIN dragon ---
Trying to mount root from zfs:hydrogen []...
GEOM: new disk ada1
GEOM: new disk ada2
GEOM: new disk ada3
GEOM: new disk ada4
GEOM: new disk ada5
random: unblocking device.
Mounting from zfs:hydrogen failed with error 2; retrying for 3 more seconds
Mounting from zfs:hydrogen failed with error 2; retrying for 2 more seconds
Mounting from zfs:hydrogen failed with error 2; retrying for 1 more second
Mounting from zfs:hydrogen failed with error 2.
Loader variables:
vfs.root.mountfrom=zfs:hydrogen
Manual root filesystem specification:
<fstype>:<device> [options]
Mount <device> using filesystem <fstype>
and with the specified (optional) option list.
eg. ufs:/dev/da0s1a
zfs:tank
cd9660:/dev/cd0 ro
(which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)
? List valid disk boot devices
. Yield 1 second (for background tasks)
<empty line> Abort manual input
mountroot> ?
List of GEOM managed disk devices:
diskid/DISK-Z500CAZ4p2 diskid/DISK-Z500CAZ4p1 gptid/9c8516d4-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-6-root gptid/9c689369-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-6-boot diskid/DISK-Z500CAKLp2 diskid/DISK-Z500CAKLp1 gptid/9b9a22dc-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-5-root gptid/9b7d9f31-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-5-boot diskid/DISK-Z500C9H1p2 diskid/DISK-Z500C9H1p1 gptid/98c41a39-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-2-root gptid/98a79a2f-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-2-boot diskid/DISK-Z500CB0Ap2 diskid/DISK-Z500CB0Ap1 gptid/97bc7621-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-1-root gptid/97763b2e-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-1-boot diskid/DISK-Z500CAZ4 ada5p2 ada5p1 diskid/DISK-Z500CAKL ada4p2 ada4p1 diskid/DISK-Z500C9H1 ada1p2 ada1p1 diskid/DISK-Z500CB0A ada0p2 ada0p1 diskid/DISK-Z304Z8ZGp2 diskid/DISK-Z304Z8ZGp1 gptid/962dbc0a-08a9-11e6-ae59-74d02b1366fc gpt/hydrogen-4-root gptid/700a5eeb-08a9-11e6-ae59-74d02b1366fc gpt/hydrogen-4-boot diskid/DISK-Z30508VNp2 diskid/DISK-Z30508VNp1 gptid/691970f2-0853-11e6-ae59-74d02b1366fc gpt/hydrogen-3-root gptid/3c36c0de-0853-11e6-ae59-74d02b1366fc gpt/hydrogen-3-boot diskid/DISK-Z304Z8ZG ada3p2 ada3p1 diskid/DISK-Z30508VN ada2p2 ada2p1 ada5 ada4 ada3 ada2 ada1 ada0
mountroot>
--- END dragon ---
A few notes:
* This kernel is a vanilla VIMAGE kernel (GENERIC + options VIMAGE).
* Loaded modules are shown in the boot log.
* The root pool (hydrogen) is a raidz pool with 6 GPT partitions (ada[0-5]p2).
* The root pool accesses the partitions using the GPT labels (/dev/gpt/hydrogen-[1-6]-root).
* The old kernel (a vanilla VIMAGE kernel from 11.1-RELEASE-p10) can still boot from the pool.

Yesterday I had some time to investigate this further and believe I have found the problem (at least for me). I created a bhyve vm and installed a simple vanilla FreeBSD 11.1 instance with a single root ZFS pool (nothing special: single partition, no raid or mirror). I then used freebsd-update to bring it up to the latest 11.1 patch level; this booted fine. After that I used freebsd-update to go to 11.2. No problems.

My main desktop (the one that failed the upgrade) has two ZFS pools: a mirror for the base OS and a raidz2 pool (on geli partitions) for my data. I copied the two disks I use for my image partition onto two old spare disks. The zfs partitions I copied using zfs send/receive. The boot partitions I created from scratch, using the boot code (and partcode) from my 11.2 vm install. This is when I noticed that the gptzfsboot code from 11.2 is different from the 11.1 gptzfsboot code. After a few changes to the vm copies (rc.conf had to be modified for the different network, vfs.root.mountfrom in loader.conf had to be changed, ...), I booted the copy in my vm. I followed the freebsd-update process, but note that my install has a custom kernel, so after the final "freebsd-update install" I used the old 11.1 kernel. I then built my kernel from source and rebooted the vm. All went well, no issues.

So there are two things I did differently between the real upgrade and the vm upgrade:
1. I used the latest gptzfsboot code in the vm upgrade.
2. I built the custom kernel after the 11.2 base upgrade in the vm.
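The disk-copy procedure described above can be sketched roughly as follows. This is only an illustration of the approach, not the reporter's exact commands; the device name (da0), label names, snapshot name, and destination pool name (copypool) are all assumptions.

```shell
# Sketch only: da0, the labels, and copypool are illustrative names.
# 1. Partition the spare disk with a boot partition and a ZFS partition.
gpart create -s gpt da0
gpart add -t freebsd-boot -s 512k -l copy-boot da0
gpart add -t freebsd-zfs -l copy-root da0

# 2. Install the 11.2 boot code (protective MBR + gptzfsboot) on the spare.
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

# 3. Replicate the pool contents with zfs send/receive.
zpool create copypool gpt/copy-root
zfs snapshot -r hydrogen@clone
zfs send -R hydrogen@clone | zfs receive -F copypool
```

Note that step 2 is what picks up the new 11.2 gptzfsboot, which is the difference the reporter suspects mattered.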
For the non-vm upgrade I built the new kernel before the base upgrade and then installed it after the base upgrade. One of those two steps fixed the problem. I assume it was using the latest gptzfsboot code that fixed the issue (I always build the new kernel with the old code base (new src), and I have never had problems in the past). So as far as I am concerned this issue is fixed, although it would be nicer if FreeBSD were a bit more forgiving when you get it wrong. Also, I did not see any note about the gptzfsboot code changing in the UPDATING file.

I have a way to fix the problem which has worked for 4 systems after upgrading to 11.2. I believe it's a race condition on boot with the zfs partitions. Basically the system will try to mount the partitions in no specific order, which is a problem because the ROOT partition needs to be mounted first; then the rest can be mounted. I fixed this by booting from a USB drive, mounting the zfs zroot/default/ROOT partition first, and then running zfs mount -a. After a reboot the system came back without issues. Steps:
1. Boot off a USB 11.2 disk.
2. zpool import -R /mnt <zroot>
3. zfs mount <zroot>/default/ROOT
4. zfs mount -a
5. Reboot back into the upgraded OS.

Is this the bug listed on https://www.freebsd.org/releases/11.2R/errata.html ?
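The USB recovery steps above amount to the following commands, run from the live 11.2 USB environment. The pool and dataset names (zroot, zroot/default/ROOT) follow the report; substitute your own pool's names.

```shell
# Run as root from a FreeBSD 11.2 USB live environment.
# Import the root pool under an alternate root directory.
zpool import -R /mnt zroot

# Mount the root dataset first, then all remaining datasets.
zfs mount zroot/default/ROOT
zfs mount -a

# Reboot into the upgraded system on the real disks.
reboot
```

These commands touch real pools and disks, so they are shown as a procedure sketch rather than a script to run blindly.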
""" [2017-07-25] A late issue was discovered with FreeBSD/arm64 and "root on ZFS" installations where the root ZFS pool would fail to be located. There currently is no workaround. """ Any hints on how to debug this issue? |