Bug 234741

Summary: Loader fails to load from ZFS with stripe sets using raw disks
Product: Base System Reporter: David Chisnall <theraven>
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: New ---    
Severity: Affects Only Me CC: dexuan, tsoome
Priority: --- Keywords: loader
Version: 12.0-RELEASE   
Hardware: Any   
OS: Any   

Description David Chisnall freebsd_committer 2019-01-08 11:27:43 UTC
After an upgrade of a Hyper-V VM to 12.0, I can no longer boot.  The root filesystem is on ZFS, with two virtual disks in a JBOD configuration (my preferred configuration for VMs, because it gives a nice easy way of adding storage: just add more virtual disks as needed and expand the zpool).

The loader fails with the error "all block copies unavailable.".  I could still load the old kernel from the same zpool and boot, and I can boot from the CD image with currdev and vfs.root.mountfrom set to the ZFS filesystem (so I can, at least, use the VM - for now, though rebooting involves typing a lot into the loader prompt).

There's also a suspicious loader error message that loader.lua contains an invalid character '*' on line 1: loader.lua doesn't contain the character '*' anywhere.

When the machine was originally installed, it had a single virtual disk in the zpool, so presumably all of /boot (on zpool/ROOT/default) was on a single disk.  The second virtual disk is larger than the initial one, so it's likely that blocks in /boot are split over both disks.  My guess is that loader is failing to find blocks that are on the second disk.
Comment 1 David Chisnall freebsd_committer 2019-01-08 11:52:27 UTC
(In reply to David Chisnall from comment #0)

My guess as to the cause is somewhat supported by the fact that setting 'copies=2' on the ZFS filesystem and reinstalling the kernel seems to mostly work (though I need to manually load opensolaris.ko and zfs.ko, because I haven't installed two copies of whatever tells loader to load them: without this, I get to the mountroot prompt and can't mount my root fs).
Comment 2 Andriy Gapon freebsd_committer 2019-01-08 15:54:11 UTC
Please check how lsdev -v reports the disks and the pool.
Comment 3 David Chisnall freebsd_committer 2019-01-09 11:06:56 UTC
(In reply to Andriy Gapon from comment #2)

OK lsdev -v
disk devices:
    disk0:   BIOS drive A (2880 X 512):
read 1 from 0 to 0xf691eac0, error: 0x80
    disk1:   BIOS drive C (266338304 X 512):
      disk1p1: EFI                200MB
      disk1p2: FreeBSD boot       512KB
      disk1p3: FreeBSD swap       2048MB
      disk1p4: FreeBSD ZFS        124GB
    disk2:   BIOS drive D (16514064 X 512):
zfs devices:
  pool: zroot
bootfs: zroot/ROOT/default

        NAME STATE
        zroot ONLINE
          da0p4 ONLINE
          da1 ONLINE

The ZFS pool is disk1p4 / da0p4 (single partition) and disk2 / da1 (full disk).  It was originally created as just the partition on the first disk; the second disk was added later.  The loader doesn't seem to have any problems with files that were written to the pool before it was expanded, but is unhappy with newer ones (though fine with things that were written after I set copies=2 on zroot/ROOT/default).
Comment 4 Andriy Gapon freebsd_committer 2019-01-09 16:21:26 UTC
(In reply to David Chisnall from comment #3)
I recall a recent change to ignore unpartitioned disks for the purpose of looking for ZFS.
Comment 5 Toomas Soome freebsd_committer 2019-01-09 16:24:35 UTC
(In reply to Andriy Gapon from comment #4)
Yes. It would be nice if you could verify with 12-stable, to see if the issue is already fixed.
Comment 6 Andriy Gapon freebsd_committer 2019-01-09 16:27:08 UTC
Is disk2 / da1 just 8GB in size?
Comment 7 David Chisnall freebsd_committer 2019-01-10 10:57:21 UTC
(In reply to Andriy Gapon from comment #6)

No, and the size reported in loader appears to be wrong.  In dmesg, it shows as:

da1: 524288MB (1073741824 512 byte sectors)

I wonder if this is part of the issue?
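
To put the two numbers side by side, here is a minimal Python sketch that converts the sector counts quoted in this report into sizes. The variable names are only illustrative. The final check is an observation about the number itself, not something established in this thread: the loader's count equals 16383 x 16 x 63, the traditional CHS geometry ceiling, which may hint that the BIOS returned a geometry-derived size rather than a true LBA count.

```python
SECTOR_SIZE = 512  # bytes per sector, as printed by both lsdev and dmesg


def sectors_to_mib(sectors: int, sector_size: int = SECTOR_SIZE) -> float:
    """Convert a sector count into MiB."""
    return sectors * sector_size / (1024 * 1024)


loader_sectors = 16514064    # disk2 as reported by `lsdev -v` in comment 3
kernel_sectors = 1073741824  # da1 as reported by the kernel in dmesg

loader_mib = sectors_to_mib(loader_sectors)
kernel_mib = sectors_to_mib(kernel_sectors)

print(f"loader sees {loader_mib:.1f} MiB")
print(f"kernel sees {kernel_mib:.1f} MiB")

# Observation only: the loader's figure matches the classic CHS maximum
# of 16383 cylinders x 16 heads x 63 sectors per track.
is_chs_ceiling = loader_sectors == 16383 * 16 * 63
print("loader count equals the CHS ceiling:", is_chs_ceiling)
```

The kernel figure works out to exactly the 524288MB shown in dmesg, while the loader figure is a little under 8 GiB.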

I can test with a different branch if you tell me what I need to build / install.  I've reinstalled the 12.0-RELEASE kernel from source, but not the bootloader (and I don't actually know how to install the bootloader outside of the normal installer).
Comment 8 Toomas Soome freebsd_committer 2019-01-10 12:08:11 UTC
(In reply to David Chisnall from comment #7)

Ok, so your second disk is not partitioned and the size is misreported, that is bad.

The update in stable and current will ignore unpartitioned disks, but for a mirror it should not matter as long as the first disk is readable. The problem comes if your first disk fails.

Note you can fix the partitioning by zpool detach, create partitions, zpool attach.

The partition gives us a chance to detect the correct size for the disk/partition, so we would be able to read the pool labels. However, there is another problem -- if the BIOS is buggy and your pool extends past the *reported* disk size, then the BIOS is most likely unable to read the pool past that point anyhow. If so, there are only 2 options: either make sure the boot pool is within the limits set by the BIOS, or use UEFI.

To test, you can copy /boot/loader from 12-stable or current into /boot/loader.test, then on boot press space at the first spinner and enter /boot/loader.test. Alternatively, start the boot loader from an ISO/USB image.
Comment 9 Andriy Gapon freebsd_committer 2019-01-10 12:53:29 UTC
(In reply to David Chisnall from comment #7)
I think that the incorrect size is the problem.
We use a size reported by BIOS or in a partition table as the size.
In this case, most of the second disk would be inaccessible because of that.
I think that that explains what you see.

Also, it seems that we currently ignore the "asize" property in the vdev label.
But even if we didn't, we could only warn that the "physical" size is smaller than the allocated size.
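
A minimal Python sketch of the kind of check described above, under stated assumptions: `size_warning` and `readable` are hypothetical helpers, not loader code, and the kernel-reported size of da1 is used as a stand-in for the vdev's real "asize" (which would be somewhat smaller, since labels are excluded).

```python
from typing import Optional

SECTOR_SIZE = 512  # bytes per sector


def size_warning(reported_sectors: int, asize: int) -> Optional[str]:
    """Return a warning if the firmware-reported size is below asize, else None."""
    reported_bytes = reported_sectors * SECTOR_SIZE
    if reported_bytes >= asize:
        return None
    pct = 100.0 * reported_bytes / asize
    return (f"device reports {reported_bytes} bytes but asize is {asize}: "
            f"only {pct:.2f}% of the vdev is reachable")


def readable(offset: int, length: int, reported_sectors: int) -> bool:
    """True if a read of `length` bytes at `offset` stays inside the reported size."""
    return offset + length <= reported_sectors * SECTOR_SIZE


# Stand-in for asize (assumption): the 512 GiB the kernel reports for da1.
assumed_asize = 1073741824 * SECTOR_SIZE

print(size_warning(16514064, assumed_asize))          # BIOS-reported size
print(readable(100 * 2**30, 4096, 16514064))          # a block 100 GiB into da1
```

With the numbers from this report, any block stored beyond roughly the first 8 GiB of da1 fails the `readable` test, which is consistent with older files loading and newer ones not.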

(In reply to Toomas Soome from comment #8)
Toomas, this is not a mirror, there are two top level vdevs.
Comment 10 Toomas Soome freebsd_committer 2019-01-10 15:02:38 UTC
(In reply to Andriy Gapon from comment #9)

Whoops, but in that case we are indeed in trouble, because we do not probe partitionless disks.

So in this case, we have a few options:

1. Can we install some nice partition table, like a BSD label, and define the ZFS partition to start from sector 0, so that the table fits in the label's reserved area and we won't disturb the data?

2. Build an estimate of the pool size.

3. We can provide a local workaround.

but in any case this setup is not good and should be fixed.
Comment 11 David Chisnall freebsd_committer 2019-01-10 15:43:39 UTC
(In reply to Andriy Gapon from comment #9)

I'm somewhat confused as to why the kernel is able to see the correct size, but loader sees the wrong size.  Is it likely to be an EFI-related issue, or something specific to Hyper-V?  Adding dexuan@ to the CC in case it's a Hyper-V issue.  It seems redundant to require a partition table for a disk that is 100% managed by ZFS, but if there's a mechanism for installing one after creation then I could potentially do that.
Comment 12 Toomas Soome freebsd_committer 2019-01-10 16:16:28 UTC
(In reply to David Chisnall from comment #11)

The problem with disk sizes, the BIOS, and the kernel is that the kernel has its own drivers to read the disk size - whatever type of disk is there, we have specific drivers and resources in the kernel.

In the loader, we *only* use firmware facilities - for UEFI we have the UEFI API, for BIOS we have the INT13 interface, and unfortunately there is an unbelievable number of bugs, especially if the BIOS is actually emulated on top of UEFI. It is especially nasty when the system hangs if you access the disk past its end.

With ZFS, the pool config and uberblock pointer are stored in the pool labels. There are 4 labels: 2 at the front of the pool device and 2 at the back. To read the data, we should actually read all 4, find the most recent copy and use it. To read the last 2 labels, we need to know their location, which depends on the device size.
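
As an illustration of why the size matters for the last two labels, here is a minimal Python sketch that computes the four label offsets for a given device size. It assumes the standard ZFS on-disk layout (each label is 256 KiB and the usable size is rounded down to a multiple of the label size); `label_offsets` is an illustrative helper, not the loader's actual function.

```python
from typing import List

LABEL_SIZE = 256 * 1024  # one vdev label, standard ZFS on-disk layout
LABEL_COUNT = 4          # two at the front, two at the back


def label_offsets(device_bytes: int) -> List[int]:
    """Return the byte offsets of the 4 vdev labels for a device of this size."""
    # The usable size is aligned down to a whole number of labels.
    psize = device_bytes - device_bytes % LABEL_SIZE
    front = [i * LABEL_SIZE for i in range(LABEL_COUNT // 2)]
    back = [psize - (LABEL_COUNT - i) * LABEL_SIZE
            for i in range(LABEL_COUNT // 2, LABEL_COUNT)]
    return front + back


true_offsets = label_offsets(1073741824 * 512)  # size the kernel reports for da1
bios_offsets = label_offsets(16514064 * 512)    # size the loader got from the BIOS

print("labels with the real size:", true_offsets)
print("labels with the BIOS size:", bios_offsets)
```

The two front labels land in the same place either way, but with the BIOS-reported size the loader would look for the back labels near 8 GiB instead of near 512 GiB.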

In a situation where we cannot trust the sector count (if it is reported at all), it appears that the only reliable source of information about the size is the partition table, because it is created by the OS.
Comment 13 David Chisnall freebsd_committer 2019-01-16 12:51:38 UTC
(In reply to Toomas Soome from comment #12)
I've upgraded the machine to 12-stable, and now it won't boot even with copies=2.  So, not great.  I am currently booting using the install CD's kernel, but that isn't a good long-term solution.

Is there any way to add a partition table (or some other kind of metadata that the loader can read) retroactively?  If not, we *really* should teach zpool not to allow adding an unpartitioned disk to the root pool (it already doesn't let you remove devices from the root pool claiming that GRUB can't handle them, but I'm not using GRUB so I don't care and there's no way to override this).

I tried adding a new disk, installing a partition table, and adding the partition to the pool, but when I tried to remove da1 I got the aforementioned error about GRUB, so I reverted the pool to a checkpoint.
Comment 14 Toomas Soome freebsd_committer 2019-01-16 13:17:23 UTC
(In reply to David Chisnall from comment #13)

Yes, you can install a BSD label and define a partition starting from sector 0; see comment #10.

For device removal, stay tuned, I do have the patch. And yes, it would be a good idea to reject partitionless disks from being added to the boot pool.
Comment 15 David Chisnall freebsd_committer 2019-01-17 09:53:59 UTC
(In reply to Toomas Soome from comment #14)
How do I install the partition table?  Is there a way of checking that there isn't any ZFS data in the place where the partition table would live?
Comment 16 Toomas Soome freebsd_committer 2019-01-17 10:35:46 UTC
(In reply to David Chisnall from comment #15)

First of all: make sure you have a backup/snapshot of the disk.

gpart create -s BSD da1
gpart add -t freebsd-zfs -b 0 da1

That should create a BSD label and a partition starting from sector 0 (the label is 512B, one sector). Since ZFS reserves 8KB at the very start, this will not disturb ZFS (and it would be good to test this first!).

Note that you *cannot* use GPT, because GPT stores a backup copy at the end of the disk and that will clash with ZFS. You must also use a partition scheme that allows creating a partition from absolute sector 0.
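
The reasoning above can be checked with some simple arithmetic. This is a minimal Python sketch under stated assumptions: ZFS owns the whole raw disk (as with da1), the BSD label occupies one 512-byte sector at the start (as stated above), each ZFS label is 256 KiB, and the GPT backup uses the default 33 sectors (1 header plus a 32-sector entry array). None of this is loader or gpart code.

```python
SECTOR = 512                  # bytes per sector
ZFS_FRONT_RESERVE = 8 * 1024  # blank area at the start of the disk, per the comment above
ZFS_LABEL_SIZE = 256 * 1024   # one vdev label
GPT_BACKUP_SECTORS = 33       # backup header (1) + default entry array (128 x 128 bytes)


def overlaps(a, b):
    """True if the half-open byte ranges a=(start, end) and b=(start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


disk_bytes = 1073741824 * SECTOR  # da1 as seen by the kernel
psize = disk_bytes - disk_bytes % ZFS_LABEL_SIZE

bsd_label = (0, 1 * SECTOR)  # assumption: one sector at the very start
gpt_backup = (disk_bytes - GPT_BACKUP_SECTORS * SECTOR, disk_bytes)
last_zfs_label = (psize - ZFS_LABEL_SIZE, psize)

bsd_label_fits = bsd_label[1] <= ZFS_FRONT_RESERVE
gpt_backup_clobbers_label = overlaps(gpt_backup, last_zfs_label)

print("BSD label inside the reserved area:", bsd_label_fits)
print("GPT backup overlaps last ZFS label:", gpt_backup_clobbers_label)
```

Both checks come out True for this disk: the one-sector BSD label stays inside the reserved 8KB, whereas a GPT backup would land inside the last ZFS label.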

However, there is also an alternative and much better approach. Since you have a VM, if you can provision a new, slightly larger disk, you can create a proper partition table on it with a partition large enough to fit da1, then *attach* it to da1:

zpool attach zpool da1 <newdisk>pX, wait for resilver, then zpool detach zpool da1.

Just do not mix up zpool attach and zpool add; those are 2 very different commands.
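
To make the difference concrete, here is a toy Python model of the pool topology. It is only a sketch: the functions mimic the effect of the zpool subcommands on the vdev tree and are not a real API, and "da2p1" is a hypothetical name for the partition on the new disk.

```python
from typing import List

# Each inner list is one top-level vdev; more than one device in it means a mirror.
Pool = List[List[str]]


def zpool_add(pool: Pool, device: str) -> Pool:
    """Model of `zpool add`: the device becomes a NEW top-level vdev (more striping)."""
    return pool + [[device]]


def zpool_attach(pool: Pool, existing: str, new: str) -> Pool:
    """Model of `zpool attach`: the new device mirrors the vdev holding `existing`."""
    return [vdev + [new] if existing in vdev else vdev for vdev in pool]


def zpool_detach(pool: Pool, device: str) -> Pool:
    """Model of `zpool detach`: drop one side of a mirror, never the last device."""
    for vdev in pool:
        if device in vdev and len(vdev) == 1:
            raise ValueError(f"{device} is the only device in its vdev")
    return [[d for d in vdev if d != device] for vdev in pool]


pool = [["da0p4"], ["da1"]]  # the layout shown by lsdev in comment 3

# Intended migration: mirror da1 onto the new partition, then drop da1.
migrated = zpool_detach(zpool_attach(pool, "da1", "da2p1"), "da1")
print(migrated)

# The mistake to avoid: `add` creates a third stripe member and da1 is still required.
mistake = zpool_add(pool, "da2p1")
print(mistake)
```

In the model, attach followed by detach replaces da1 with the new partition, while add leaves da1 in place as a required top-level vdev.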

If you do not have enough space to make a full copy but can extend the existing da1, then you can move its contents to make space for a partition table, but that's also a complicated operation...
Comment 17 David Chisnall freebsd_committer 2019-01-26 12:31:18 UTC
(In reply to Toomas Soome from comment #16)

This work-around worked for me - I added a second disk with a GPT partition table, added it as a mirror, waited for resilver, and then removed the original.  I can now boot again.  Having the `zfs` tool refuse to create situations like this in the first place would definitely be preferable!
Comment 18 Andriy Gapon freebsd_committer 2019-01-27 16:19:43 UTC
Well, the problem is really with the bogus BIOS of your VM.