I tried to upgrade from FreeBSD 11.1 to 11.2. After booting with the 11.2 kernel the boot failed and the machine automatically rebooted. I can't see the error message because it reboots too quickly, but it seems to happen when the kernel tries to mount the root file system. My root file system is a ZFS mirror. Nothing is written to my log files, so the root file system is definitely never mounted.
I noticed the same error yesterday. I am currently trying to get a full kernel log over a serial console. Will update shortly.
Created attachment 195385 [details]
Verbose boot log (includes "?" output from the mountroot prompt)

Uploaded a boot -v log. The important part is at the end:

--- BEGIN dragon ---
Trying to mount root from zfs:hydrogen []...
GEOM: new disk ada1
GEOM: new disk ada2
GEOM: new disk ada3
GEOM: new disk ada4
GEOM: new disk ada5
random: unblocking device.
Mounting from zfs:hydrogen failed with error 2; retrying for 3 more seconds
Mounting from zfs:hydrogen failed with error 2; retrying for 2 more seconds
Mounting from zfs:hydrogen failed with error 2; retrying for 1 more second
Mounting from zfs:hydrogen failed with error 2.

Loader variables:
  vfs.root.mountfrom=zfs:hydrogen

Manual root filesystem specification:
  <fstype>:<device> [options]
      Mount <device> using filesystem <fstype>
      and with the specified (optional) option list.

    eg. ufs:/dev/da0s1a
        zfs:tank
        cd9660:/dev/cd0 ro
          (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)

  ?               List valid disk boot devices
  .               Yield 1 second (for background tasks)
  <empty line>    Abort manual input

mountroot> ?
List of GEOM managed disk devices:
  diskid/DISK-Z500CAZ4p2 diskid/DISK-Z500CAZ4p1 gptid/9c8516d4-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-6-root gptid/9c689369-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-6-boot
  diskid/DISK-Z500CAKLp2 diskid/DISK-Z500CAKLp1 gptid/9b9a22dc-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-5-root gptid/9b7d9f31-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-5-boot
  diskid/DISK-Z500C9H1p2 diskid/DISK-Z500C9H1p1 gptid/98c41a39-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-2-root gptid/98a79a2f-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-2-boot
  diskid/DISK-Z500CB0Ap2 diskid/DISK-Z500CB0Ap1 gptid/97bc7621-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-1-root gptid/97763b2e-59b9-11e4-88bf-74d02b1366fc gpt/hydrogen-1-boot
  diskid/DISK-Z500CAZ4 ada5p2 ada5p1
  diskid/DISK-Z500CAKL ada4p2 ada4p1
  diskid/DISK-Z500C9H1 ada1p2 ada1p1
  diskid/DISK-Z500CB0A ada0p2 ada0p1
  diskid/DISK-Z304Z8ZGp2 diskid/DISK-Z304Z8ZGp1 gptid/962dbc0a-08a9-11e6-ae59-74d02b1366fc gpt/hydrogen-4-root gptid/700a5eeb-08a9-11e6-ae59-74d02b1366fc gpt/hydrogen-4-boot
  diskid/DISK-Z30508VNp2 diskid/DISK-Z30508VNp1 gptid/691970f2-0853-11e6-ae59-74d02b1366fc gpt/hydrogen-3-root gptid/3c36c0de-0853-11e6-ae59-74d02b1366fc gpt/hydrogen-3-boot
  diskid/DISK-Z304Z8ZG ada3p2 ada3p1
  diskid/DISK-Z30508VN ada2p2 ada2p1
  ada5 ada4 ada3 ada2 ada1 ada0

mountroot>
--- END dragon ---
A few notes:
* This kernel is a vanilla VIMAGE kernel (GENERIC + options VIMAGE).
* Loaded modules are shown in the boot log.
* The root pool (hydrogen) is a raidz pool with 6 GPT partitions (ada[0-5]p2).
* The root pool accesses the partitions using the GPT labels (/dev/gpt/hydrogen-[1-6]-root).
* The old kernel (vanilla VIMAGE kernel from 11.1-RELEASE-p10) can still boot from the pool.
Yesterday I had some time to investigate this further, and I believe I have found the problem (at least for me).

I created a bhyve VM and installed a plain vanilla FreeBSD 11.1 instance with a single root ZFS pool (nothing special: single partition, no raid or mirror). I then used freebsd-update to bring it up to the latest 11.1 patch level; this booted fine. After that I used freebsd-update to go to 11.2. No problems.

My main desktop (the one that failed the upgrade) has two ZFS pools: a mirror for the base OS and a raidz2 pool (on geli partitions) for my data. I copied the two disks of the mirror onto two old spare disks. The ZFS partitions I copied using zfs send/receive. The boot partitions I created from scratch, using the boot code (and partcode) from my 11.2 VM install. This is when I noticed that the gptzfsboot code from 11.2 is different from the 11.1 gptzfsboot code. After a few changes to the VM copies (rc.conf had to be modified for the different network, vfs.root.mountfrom in loader.conf had to be changed, ...), I booted the copy in my VM. I followed the freebsd-update process, but note that my install has a custom kernel, so after the final "freebsd-update install" it still used the old 11.1 kernel. I then built my kernel from source and rebooted the VM. All went well, no issues.

So there are two things I did differently between the real upgrade and the VM upgrade:
1. I used the latest gptzfsboot code in the VM upgrade.
2. In the VM I built the custom kernel after the 11.2 base upgrade. For the non-VM upgrade I built the new kernel before the base upgrade and then installed it after the base upgrade.

One of those two steps fixed the problem. I assume it was using the latest gptzfsboot code (I have always built the new kernel with the old code base (new src) and have never had problems in the past).
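For anyone hitting the same thing: refreshing the boot code on a GPT/ZFS system is normally done with gpart bootcode after the base upgrade. This is a sketch, not my exact commands; the partition index and disk names are assumptions for a typical layout and must be checked against your own system first.

```shell
# ASSUMPTION: the freebsd-boot partition is index 1 on each disk of the
# root pool. Verify the layout before writing anything:
gpart show ada0

# Write the protective MBR and the new gptzfsboot (from the already
# upgraded 11.2 /boot) to the freebsd-boot partition of every disk
# that the BIOS might boot from:
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
```

On a mirrored or raidz root pool the bootcode line has to be repeated for each member disk, otherwise the machine only boots if the BIOS happens to pick an updated disk.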
So as far as I am concerned this issue is fixed, although it would be nicer if FreeBSD were a bit more forgiving when you get it wrong. Also, I did not see any note about the gptzfsboot code changing in the UPDATING file.
I have a way to fix the problem which has worked for 4 systems after upgrading to 11.2. I believe it's a race condition on boot with the ZFS datasets: the system tries to mount them in no particular order, which is a problem because the root dataset needs to be mounted first before the rest can be mounted. I fixed this by booting from a USB drive, mounting the zroot/default/ROOT dataset first, and then running zfs mount -a. After a reboot the system came back without issues.

Steps:
1. Boot off a USB 11.2 disk.
2. zpool import -R /mnt <zroot>
3. zfs mount <zroot>/default/ROOT
4. zfs mount -a
5. Reboot back into the upgraded OS.
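The steps above as a shell session, for copy-paste. The pool name "zroot" and the dataset path "zroot/default/ROOT" are taken from the comment above and are assumptions about your layout; substitute your own names (check with "zfs list" after the import).

```shell
# From the shell of a FreeBSD 11.2 USB installer:

# Import the root pool with an alternate root of /mnt so nothing
# lands on the installer's own filesystem.
zpool import -R /mnt zroot

# Mount the root dataset first, so the remaining datasets mount
# underneath it in the right order.
zfs mount zroot/default/ROOT

# Mount everything else.
zfs mount -a

# Reboot back into the upgraded OS.
reboot
```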
Is this the bug listed on https://www.freebsd.org/releases/11.2R/errata.html ?

"""
[2017-07-25] A late issue was discovered with FreeBSD/arm64 and "root on ZFS" installations where the root ZFS pool would fail to be located. There currently is no workaround.
"""

Any hints on how to debug this issue?
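Not from the errata page, but two loader tunables that may help capture what is going wrong (a debugging sketch; the values are assumptions to adjust for your machine):

```shell
# /boot/loader.conf additions for debugging a failing ZFS mountroot.

# Give the kernel more time to probe the disks before it gives up on
# the root pool (the log above shows the default 3-second retry window).
vfs.mountroot.timeout="30"

# Verbose boot, so the GEOM/ZFS probe messages appear on the console.
boot_verbose="YES"

# Hold at the panic message instead of auto-rebooting, so the error
# can actually be read (-1 means wait forever).
kern.panic_reboot_wait_time="-1"
```

A serial console in addition to this makes it much easier to capture the full log, as was done for the attachment above.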