Bug 238258

Summary: loader can't find pool by guid
Product: Base System
Reporter: cs
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Open
Severity: Affects Only Me
CC: bhughes, emaste, kevans, koobs, sigsys, tsoome
Priority: ---
Keywords: needs-qa, patch
Version: 11.2-STABLE
Hardware: Any
OS: Any

Description cs 2019-05-31 08:54:49 UTC
New loader (unified with zfsloader) doesn
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2019-05-31 09:04:01 UTC
@cs Can you provide more system information, including the exact FreeBSD version and the version used prior to any upgrades (if applicable)?
Comment 2 cs 2019-05-31 09:14:54 UTC
Sorry once more. Old zfsloader, of course.
Comment 3 cs 2019-05-31 09:41:31 UTC
Sorry for the incomplete description.

The new loader (unified with zfsloader) can't find a ZFS pool on an unpartitioned disk.

How to repeat:
1. Connect a fresh, zeroed-out disk to a FreeBSD machine.
2. Create a ZFS pool directly on the whole disk: zpool create testpool da0
3. zpool set bootfs=testpool testpool
4. mkdir -p /testpool/usr/src
5. cd /testpool/usr/src
6. svnlite co svn://svn.freebsd.org/base/stable/11 .
7. export MAKEOBJDIRPREFIX=/testpool/usr/obj
8. make buildworld
9. make buildkernel
10. make DESTDIR=/testpool installworld
11. make DESTDIR=/testpool distribution
12. make DESTDIR=/testpool installkernel
13. echo 'zfs_load="YES"' >> /testpool/boot/loader.conf
14. echo 'devfs /dev devfs rw,multilabel 0 0' >> /testpool/etc/fstab
15. cp -p /testpool/boot/zfsboot /tmp
16. zpool export testpool
17. dd if=/tmp/zfsboot of=/dev/da0 bs=512 count=1
18. dd if=/tmp/zfsboot of=/dev/da0 bs=512 skip=1 seek=1k
19. Reboot from the newly created pool and observe the console:

FreeBSD/x86 bootstrap loader, Revision 1.1
(Thu May 30 18:55:21 MSK 2019 root@white)
ZFS: can't find pool by guid
ZFS: can't find pool by guid
ZFS: can't find pool by guid
.....

Workaround: copy an old zfsloader (e.g. from 10.2-RELEASE) over /boot/loader (or wherever the boot block expects to find it). The system then boots correctly.
Comment 4 Kubilay Kocak freebsd_committer freebsd_triage 2019-05-31 09:45:24 UTC
What version (uname -a) of FreeBSD?
Is this a regression after an upgrade?
If so, what exact version was the upgrade from?
Comment 5 cs 2019-06-25 11:54:43 UTC
Working uname: 
FreeBSD test 11.2-RELEASE FreeBSD 11.2-RELEASE #0 r335510: Fri Jun 22 04:32:14 UTC 2018     root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

Broken uname:
FreeBSD test 11.3-PRERELEASE FreeBSD 11.3-PRERELEASE #0 r349132: Mon Jun 17 14:54:06 UTC 2019     root@test:/usr/obj/usr/src/sys/GENERIC  amd64

The pool in both cases is almost exactly the same, including the boot blocks. The only difference is the bootfs property.

The mitigation is to replace the new unified /boot/loader binary with the old /boot/zfsloader.
Comment 6 Bradley T. Hughes freebsd_committer freebsd_triage 2019-07-02 18:57:28 UTC
I have also run into this problem. I have a system running 11.2-RELEASE. I used freebsd-update to upgrade to 11.3-RC3, wrote new boot blocks (/boot/zfsboot) to all disks in the pool. After rebooting, I get the same "can't find pool by guid" message (also 3 times) from the loader. I haven't tried the workaround mentioned; I ended up rolling back to 11.2-RELEASE.
Comment 7 Kyle Evans freebsd_committer freebsd_triage 2019-07-02 19:00:28 UTC
(In reply to Bradley T. Hughes from comment #6)

Hi,

Can you try loader from a head snapshot?
Comment 8 cs 2019-07-04 05:58:17 UTC
I've tried FreeBSD test 13.0-CURRENT FreeBSD 13.0-CURRENT r349638 GENERIC  amd64:

FreeBSD/x86 bootstrap loader, Revision 1.1
ZFS: can't find pool by guid
ZFS: can't find pool by guid
ZFS: can't find pool by guid
Startup error in /boot/lua/loader.lua
LUA ERROR: cannot open /boot/lua/loader.lua: invalid argument.

can't load  'kernel'
Type '?' etc.

And, of course, the good old zfsloader still works.
Comment 9 Kyle Evans freebsd_committer freebsd_triage 2019-07-11 17:29:30 UTC
CC'ing tsoome@; partitionless disk setups were somewhat intentionally broken in r342151 [0] because they pose a number of problems (outlined in the linked commit message). A loader/libsa built with the patch at [1] should work around this, but more consideration is needed for how loader handles these setups.

[0] https://svnweb.freebsd.org/base?view=revision&revision=342151
[1]

Index: stand/libsa/zfs/zfs.c
===================================================================
--- stand/libsa/zfs/zfs.c       (revision 349913)
+++ stand/libsa/zfs/zfs.c       (working copy)
@@ -580,11 +580,10 @@
        pa.fd = open(devname, O_RDONLY);
        if (pa.fd == -1)
                return (ENXIO);
-       /*
-        * We will not probe the whole disk, we can not boot from such
-        * disks and some systems will misreport the disk sizes and will
-        * hang while accessing the disk.
-        */
+       /* Probe the whole disk */
+       ret = zfs_probe(pa.fd, pool_guid);
+       if (ret == 0)
+               return (0);
        if (archsw.arch_getdev((void **)&dev, devname, NULL) == 0) {
                int partition = dev->d_partition;
                int slice = dev->d_slice;
Comment 10 cs 2019-07-13 17:16:11 UTC
Thanks for the patch and the clarification. Perhaps the time is coming to say goodbye to system disks that can be moved between different machines. On top of that, UEFI encourages users to keep a separate, dedicated /boot disk for each motherboard.

By the way, UEFI itself readily hangs upon encountering a specially crafted partition scheme on any of the attached disks, leaving the user no option other than physically disconnecting the offending disk(s).

P.S. Is it worth adding a knob like WITH_PARTITIONLESS_LOADER to the build system?