The base r345759 boot failes because it can not find the encrypted ZFS pool any more. The console looks like: BTX loader 1.00 BTX version 1.02 Consoles: internal video/keyboard BIOS drive C: is disk0 BIOS drive D: is disk1 BIOS drive E: is disk2 BIOS 630kb/30... available memory FreeBSD/x86 bootstrap loader, Revision 1.1 ZFS: can't find pool by guid ZFS: can't find pool by guid ZFS: can't find pool by guid Startup error in /boot/lua/loader.lua: LUA ERROR: cannot open /boot/lua/loader.lua: invalid argument. can't load 'kernel' Type '?' for a list of commands... I managed to get the machine back by - booting from FreeBSD-13.0-CURRENT-amd64-20190321-r345355-memstick.img - attaching the partitions containing the ZFS boot pool with "geli attach" - using "zpool import -f" to import the pool and - replacing the base r345759 loader (linked with loader_lua) with the base r345243 loader_lua. So the machine is currently bootstraping with base r345243 loader_lua and running the base r345759 kernel and world. Searching the commit logs between base r345243 and base r345759 leads me to suspect base r345330 "loader: fix loading of kernels with . in path", but I did not had the time to just reverse base r345330 and give the resulting loader_lua a try. Ralf
CC committer of base r345330
> ZFS: can't find pool by guid suggests to me that something's going wrong before we get to the point where the r345330 change takes effect; it would be great if you can try testing with just base r345330 reverted. I've CC'd the author of the base r345330 patch.
(In reply to Ed Maste from comment #2) Ed is right, I was wrong. Just reverting base r345330 results in the same messages when trying to boot. After buildung the new loader_lua (without base r345330) I noticed that the resulting file has excatly the same size as the one with r345330 but I gave it a try. The working base r345243 loader_lua is around 32K bigger than the failing ones. $ ls -li loader* 306844 -r-xr-xr-x 3 root wheel 430080 Apr 3 22:31 loader 65536 -r-xr-xr-x 2 root wheel 397312 Apr 1 13:32 loader.20190401 492114 -r-xr-xr-x 2 root wheel 397312 Apr 3 22:30 loader.20190403 306762 -r--r--r-- 1 root wheel 7418 Apr 1 13:32 loader.4th 4987 -rw-r--r-- 1 root wheel 888 Oct 15 2017 loader.conf 306855 -r-xr-xr-x 2 root wheel 475648 Apr 1 13:32 loader.efi 611509 -r--r--r-- 1 root wheel 15084 Aug 12 2018 loader.help 49530 -r-xr-xr-x 1 root wheel 434176 Aug 11 2018 loader.old 306779 -r--r--r-- 1 root wheel 461 Apr 1 13:32 loader.rc 306843 -r-xr-xr-x 1 root wheel 348160 Apr 1 13:32 loader_4th 306854 -r-xr-xr-x 1 root wheel 410112 Apr 1 13:32 loader_4th.efi 352728 -r-xr-xr-x 1 root wheel 380928 Mar 17 16:28 loader_4th.old 306844 -r-xr-xr-x 3 root wheel 430080 Apr 3 22:31 loader_lua 65580 -r-xr-xr-x 1 root wheel 430080 Mar 17 16:28 loader_lua.20190317 65536 -r-xr-xr-x 2 root wheel 397312 Apr 1 13:32 loader_lua.20190401 492114 -r-xr-xr-x 2 root wheel 397312 Apr 3 22:30 loader_lua.20190403 306855 -r-xr-xr-x 2 root wheel 475648 Apr 1 13:32 loader_lua.efi 352729 -r-xr-xr-x 1 root wheel 430080 Mar 17 16:28 loader_lua.old 306845 -r-xr-xr-x 1 root wheel 299008 Apr 1 13:32 loader_simp 306856 -r-xr-xr-x 1 root wheel 350720 Apr 1 13:32 loader_simp.efi 352730 -r-xr-xr-x 1 root wheel 319488 Mar 17 16:28 loader_simp.old $ The .20190317 file is from r345243, .20190401 is at r345759 and .20190403 is r345759 build without r345330.
(In reply to iz-rpi03 from comment #3) The size of loader_lua does not seem to be related to the ability to find the ZFS pool. All newly build loader_lua files have the smaller size and some of them are working as expected. I am currently trying to find the release when the failure happend by building release versions of loader_lua. For this I bisect the (head) releases in between and trying out the resulting loader_lua. From the running r345759 world I try to achive this with the following commands, e.g.: rel=345475 svnlite up -r ${rel} . make -j8 buildworld cp /usr/obj/usr/src/amd64.amd64/stand/i386/loader_lua/loader_lua \ /boot/loader_lua.r${rel} cp /boot/loader_lua.r${rel} /boot/loader_lua I hope this is not a silly idea.
(In reply to iz-rpi03 from comment #4) Finally I found the change which causes the issue. Verified by switching between with the working and non working relase several times. loader_lua from base r345476 is working as expected. loader_lua from base r345477 does not find the pool. The SHA256 changes from loader_lua.r345476 = deb308b2f112101fd6f6fd882d05e70540b0413d9cf3049762e1c12f8ea6e106 to loader_lua.r345477 = fbd759e2e00e39d4176717c88fd9da830783477662fe0975ea417fd0943ce695 and remains constant till base r345763 at least.
Review for identified commit: https://reviews.freebsd.org/D19262
A commit references this bug: Author: ian Date: Thu Apr 25 15:09:21 UTC 2019 New revision: 346675 URL: https://svnweb.freebsd.org/changeset/base/346675 Log: Restore the ability to open a raw disk or partition in loader(8). The disk_open() function searches for "the best partition" when slice and partition information is not provided as part of the device name. As of r345477 the slice and partition fields of a disk_devdesc are initialized to D_SLICEWILD and D_PARTWILD; in the past they were initialized to -1, which was sometimes interpreted as meaning 'wildcard' and sometimes as 'open the raw partition' depending on the context. So as an unintended side effect of r345477 it became basically impossible to ever open a disk or partition without doing the 'best partition' search. One visible effect of that was the inability to open the raw disk to read the partition table correctly in zfs_probe_dev(), leading to failures to find the zfs pool unless it was on the first partition. Now instead of always initializing slice and partition to wildcards, the disk_parsedev() function initializes them based on the presence of a path/file name following the device. If there is any path or filename following the ':' that ends the device name, then slice and partition are initialized to D_SLICEWILD and D_PARTWILD. If there is nothing after the ':' then it is considered to be a request to open the raw device or partition itself (not a file stored within it), and the fields are initialized to D_SLICENONE and D_PARTNONE. With this change in place, all the tests in src/tools/boot are succesful again, including the recently-added cases of booting from a zfs pool on a partition other than slice 1 of the device. PR: 236981 Changes: head/stand/common/disk.c
I can confirm that with r346675, actual r346736, loader_lua is booting the system from a encrypted ZFS pool again. So I close this bug report. Thank you.
A commit references this bug: Author: kevans Date: Thu Oct 24 03:52:35 UTC 2019 New revision: 354006 URL: https://svnweb.freebsd.org/changeset/base/354006 Log: MFC r345477, r346675, r346984, r348748 r345477: Distinguish between "no partition" and "choose best partition" with a constant. The values of the d_slice and d_partition fields of a disk_devdesc have a few values with special meanings in the disk_open() routine. Through various evolutions of the loader code over time, a d_partition value of -1 has meant both "use the first ufs partition found in the bsd label" and "don't open a bsd partition at all, open the raw slice." This defines a new special value of -2 to mean open the raw slice, and it gives symbolic names to all the special values used in d_slice and d_partition, and adjusts all existing uses of those fields to use the new constants. The phab review for this timed out without being accepted, but I'm still citing it below because there is useful commentary there. r346675: Restore the ability to open a raw disk or partition in loader(8). The disk_open() function searches for "the best partition" when slice and partition information is not provided as part of the device name. As of r345477 the slice and partition fields of a disk_devdesc are initialized to D_SLICEWILD and D_PARTWILD; in the past they were initialized to -1, which was sometimes interpreted as meaning 'wildcard' and sometimes as 'open the raw partition' depending on the context. So as an unintended side effect of r345477 it became basically impossible to ever open a disk or partition without doing the 'best partition' search. One visible effect of that was the inability to open the raw disk to read the partition table correctly in zfs_probe_dev(), leading to failures to find the zfs pool unless it was on the first partition. Now instead of always initializing slice and partition to wildcards, the disk_parsedev() function initializes them based on the presence of a path/file name following the device. If there is any path or filename following the ':' that ends the device name, then slice and partition are initialized to D_SLICEWILD and D_PARTWILD. If there is nothing after the ':' then it is considered to be a request to open the raw device or partition itself (not a file stored within it), and the fields are initialized to D_SLICENONE and D_PARTNONE. With this change in place, all the tests in src/tools/boot are succesful again, including the recently-added cases of booting from a zfs pool on a partition other than slice 1 of the device. r346984: Use D_PARTISGPT rather than bare 255 These three cases dovetail with other places in the code where we use or set D_PARTISGPT when we mean that the partitioning scheme is GPT. Use this #define to make the code easier to undertand. r348748: loader: disk_open() should honor D_PARTNONE The D_PARTNONE is documented to make it possible to open raw MBR partition, but the current disk_open() does not really implement this statement. The current code is checking partition against -1 (D_PARTNONE) but does attempt to open partition table in case we do have FreeBSD MBR partition type. Instead, we should check -2 (D_PARTWILD). In case we do have MBR + BSD label, this code is only working because by default, the first BSD partiton is created starting with relative sector 0, and we can still access the BSD table from that MBR slice. PR: 236981 Changes: _U stable/12/ stable/12/stand/common/disk.c stable/12/stand/common/disk.h stable/12/stand/efi/libefi/efipart.c stable/12/stand/efi/loader/main.c stable/12/stand/i386/libi386/biosdisk.c stable/12/stand/libsa/zfs/zfs.c stable/12/stand/mips/beri/loader/beri_disk_cfi.c stable/12/stand/mips/beri/loader/beri_disk_sdcard.c stable/12/stand/uboot/common/main.c stable/12/stand/uboot/lib/disk.c stable/12/stand/usb/storage/umass_loader.c stable/12/stand/userboot/userboot/main.c stable/12/stand/userboot/userboot/userboot_disk.c