I tried to upgrade from FreeBSD 11.1 to 11.2. After booting with the 11.2 kernel the boot failed and the machine automatically rebooted. I can't see the error message because it reboots too quickly, but it seems to happen when the kernel tries to mount the root file system. My root file system is a ZFS mirror. Nothing is written to my log files, so the root file system is definitely never mounted.
I noticed the same error yesterday. I am currently trying to get a full kernel log over a serial console. Will update shortly.
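For anyone else who needs to capture this, here is a minimal /boot/loader.conf sketch for logging a verbose boot over serial. It assumes the default COM1 port at 115200 baud; adjust comconsole_speed for your hardware.

```sh
# /boot/loader.conf -- send the console to both serial and video,
# and boot verbosely so the mountroot failure is logged in full.
boot_multicons="YES"
boot_serial="YES"
comconsole_speed="115200"
console="comconsole,vidconsole"
boot_verbose="YES"
```

On the receiving machine, something like "cu -l /dev/cuaU0 -s 115200 | tee boot.log" captures the output.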
Created attachment 195385 [details]
Verbose boot log (includes "?" output from the mountroot prompt)
Uploaded a boot -v log. The important part is at the end:
--- BEGIN dragon ---
Trying to mount root from zfs:hydrogen ...
GEOM: new disk ada1
GEOM: new disk ada2
GEOM: new disk ada3
GEOM: new disk ada4
GEOM: new disk ada5
random: unblocking device.
Mounting from zfs:hydrogen failed with error 2; retrying for 3 more seconds
Mounting from zfs:hydrogen failed with error 2; retrying for 2 more seconds
Mounting from zfs:hydrogen failed with error 2; retrying for 1 more second
Mounting from zfs:hydrogen failed with error 2.
Manual root filesystem specification:
Mount <device> using filesystem <fstype>
and with the specified (optional) option list.
(which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)
? List valid disk boot devices
. Yield 1 second (for background tasks)
<empty line> Abort manual input
List of GEOM managed disk devices:
diskid/DISK-Z500CAZ4p2 diskid/DISK-Z500CAZ4p1 gptid/9c8516d4-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-6-root gptid/9c689369-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-6-boot diskid/DISK-Z500CAKLp2 diskid/DISK-Z500CAKLp1 gptid/9b9a22dc-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-5-root gptid/9b7d9f31-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-5-boot diskid/DISK-Z500C9H1p2 diskid/DISK-Z500C9H1p1 gptid/98c41a39-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-2-root gptid/98a79a2f-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-2-boot diskid/DISK-Z500CB0Ap2 diskid/DISK-Z500CB0Ap1 gptid/97bc7621-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-1-root gptid/97763b2e-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-1-boot diskid/DISK-Z500CAZ4 ada5p2 ada5p1 diskid/DISK-Z500CAKL ada4p2 ada4p1 diskid/DISK-Z500C9H1 ada1p2 ada1p1 diskid/DISK-Z500CB0A ada0p2 ada0p1 diskid/DISK-Z304Z8ZGp2 diskid/DISK-Z304Z8ZGp1 gptid/962dbc0a-08a9-11e6-ae59-74d02b1366fc gpt/hydrogen-4-root gptid/700a5eeb-08a9-11e6-ae59-74d02b1366fc gpt/hydrogen-4-boot diskid/DISK-Z30508VNp2 diskid/DISK-Z30508VNp1 gptid/691970f2-0853-11e6-ae59-74d02b1366fc gpt/hydrogen-3-root gptid/3c36c0de-0853-11e6-ae59-74d02b1366fc gpt/hydrogen-3-boot diskid/DISK-Z304Z8ZG ada3p2 ada3p1 diskid/DISK-Z30508VN ada2p2 ada2p1 ada5 ada4 ada3 ada2 ada1 ada0
--- END dragon ---
A few notes:
* This kernel is a vanilla VIMAGE kernel (GENERIC + options VIMAGE).
* Loaded modules are shown in the boot log.
* The root pool (hydrogen) is a raidz pool with 6 GPT partitions (ada[0-5]p2).
* The root pool accesses the partitions using the GPT labels (/dev/gpt/hydrogen-[1-6]-root).
* The old kernel (vanilla VIMAGE kernel from 11.1-RELEASE-p10) can still boot from the pool.
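As a sanity check that the pool really resolves through the GPT labels, the following can be run from the working 11.1 kernel (ada0 here stands in for any one of the member disks):

```sh
# Show the partition layout with GPT labels for one pool member;
# the second partition should carry a hydrogen-N-root label.
gpart show -l ada0

# Confirm which vdev paths the pool was actually imported with.
zpool status hydrogen
```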
Yesterday I had some time to investigate this further and believe I have found the problem (at least for me).
I created a bhyve VM and installed a simple vanilla FreeBSD 11.1 instance with a single root ZFS pool (nothing special: a single partition, no raid or mirror). I then used freebsd-update to bring it up to the latest 11.1 patch level; it booted fine. After that I used freebsd-update to go to 11.2. No problems.
My main desktop (the one that failed the upgrade) has two ZFS pools: a mirror for the base OS and a raidz2 pool (on geli partitions) for my data. I copied the two disks I use for my image partition onto two old spare disks. The ZFS partitions I copied using zfs send/receive; the boot partitions I created from scratch, using the boot code (and partcode) from my 11.2 VM install. This is when I noticed that the gptzfsboot code from 11.2 differs from the 11.1 gptzfsboot code. After a few changes to the VM copies (rc.conf had to be modified for the different network, vfs.root.mountfrom in loader.conf had to be changed ...), I booted the copy in my VM. I followed the freebsd-update process, but note that my install has a custom kernel, so after the final "freebsd-update install" the system was still running the old 11.1 kernel. I then built my kernel from source and rebooted the VM. All went well, no issues.
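For reference, the disk-copy step described above uses the standard snapshot plus send/receive pattern. A sketch, with hydrogen as the source pool and "copy" as a placeholder name for the pool on the spare disks:

```sh
# Take a recursive snapshot of the source pool, then replicate the
# whole dataset tree (properties included) to the spare pool.
zfs snapshot -r hydrogen@migrate
zfs send -R hydrogen@migrate | zfs receive -F copy
```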
So there are two things I did differently between the real upgrade and the VM upgrade:
1. In the VM upgrade I used the latest gptzfsboot code.
2. In the VM I built the custom kernel after the 11.2 base upgrade. For the real upgrade I built the new kernel before the base upgrade and then installed it after the base upgrade.
One of those two steps fixed the problem. I assume it was using the latest gptzfsboot code, since I have always built the new kernel (new src) on the old code base and have never had problems with that in the past.
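If stale gptzfsboot code was indeed the cause, refreshing the boot blocks on every disk after the base upgrade avoids it. A sketch, assuming the freebsd-boot partition is index 1 on each disk, which matches the adaNp1 / hydrogen-N-boot layout shown in the GEOM list above (always confirm with gpart show first):

```sh
# Reinstall the protective MBR and the ZFS-aware GPT boot code
# from the freshly upgraded /boot onto every member disk.
for disk in ada0 ada1 ada2 ada3 ada4 ada5; do
    gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 "$disk"
done
```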
So as far as I am concerned this issue is fixed, although it would be nicer if FreeBSD were a bit more forgiving when you get it wrong. Also, I did not see any note in the UPDATING file about the gptzfsboot code changing.
I have a fix for the problem which has worked on 4 systems after upgrading to 11.2.
I believe it's a race condition with the ZFS datasets at boot. Basically the system tries to mount the datasets in no particular order, which is a problem because the ROOT dataset needs to be mounted before the rest can be. I fixed this by booting from a USB drive, mounting the zfs zroot/default/ROOT dataset first, and then running zfs mount -a. After rebooting, the system came back without issues.
1. boot off USB 11.2 disk
2. zpool import -R /mnt <zroot>
3. zfs mount <zroot>/default/ROOT
4. zfs mount -a
5. reboot back into the upgraded OS.
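The steps above, spelled out as commands from the installer's live shell (the pool and dataset names are the ones from my systems; check zfs list for yours):

```sh
# Import the pool under an alternate root so nothing collides
# with the live USB environment.
zpool import -R /mnt zroot

# Mount the boot environment first, then everything else.
zfs mount zroot/default/ROOT
zfs mount -a

# Reboot back into the upgraded OS.
reboot
```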
Is this the bug listed on https://www.freebsd.org/releases/11.2R/errata.html ?
[2017-07-25] A late issue was discovered with FreeBSD/arm64 and "root on ZFS" installations where the root ZFS pool would fail to be located.
There currently is no workaround.
Any hints on how to debug this issue?