Bug 271262 - bsdinstaller in AutoZFS + MBR mode always wipes disklabel - rendering system non-bootable
Summary: bsdinstaller in AutoZFS + MBR mode always wipes disklabel - rendering system ...
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: 13.2-RELEASE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: freebsd-sysinstall (Nobody)
URL:
Keywords: install
Depends on:
Blocks:
 
Reported: 2023-05-05 16:29 UTC by Henryk Paluch
Modified: 2023-08-02 01:25 UTC (History)
5 users (show)

See Also:


Attachments
Installation log for AutoZFS on MBR layout that will destroy disklabel (78.90 KB, text/plain)
2023-05-05 16:29 UTC, Henryk Paluch
no flags Details
Trace of zfs create command that will wipe disklabel s1a (28.72 KB, text/plain)
2023-05-05 16:29 UTC, Henryk Paluch
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Henryk Paluch 2023-05-05 16:29:05 UTC
Created attachment 241996 [details]
Installation log for AutoZFS on MBR layout that will destroy disklabel

While trying several time this installation:
- FreeBSD-13.2-RELEASE-amd64-disc1.iso

When I select this layout:
- AutoZFS
- MBR

It always ends up in broken - Non-bootable system (famous error "Missing operating system"

Analysis of such system (booting CD in Shell mode) has shown that somehow whole disklabel is missing.
Only primary FreeBSD partition exists.

I traced down the problem that this command (from bsdinstall_log):

# this works
disklabel da0s1
# this command wipes 1st 4KB of da0s1a thus destroying disklabel!
zpool create -o altroot=/mnt  -m "/bootpool" -f "bootpool"   da0s1a
# the command below will now end with error!
disklabel da0s1

Tracing above "zpool create" command I have found that it really wipes
1st 4KB of "da0s1a" thus destroying disklabel that occupies offset from 512 to 1024 bytes.

Here is relevant excerpt from truss tracing:

  702: fstatat(AT_FDCWD,"/dev/da0s1a",{ mode=crw-r----- ,inode=104,size=0,blksize=4096 },0x0) = 0 (0x0)
  702: openat(AT_FDCWD,"/dev/da0s1a",O_WRONLY|O_EXCL,00) = 6 (0x6)
  702: write(6,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
  702: fdatasync(6)                              = 0 (0x0)
  702: close(6)                                  = 0 (0x0)
  702: open("/mnt/bootpool",O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC,015) ERR#2 'No such file or directory'

You can clearly see that device da0s1a is opened and immediately wiped with 4096 bytes of zeroes thus liquidating disklabel (!).
Comment 1 Henryk Paluch 2023-05-05 16:29:59 UTC
Created attachment 241997 [details]
Trace of zfs create command that will wipe disklabel s1a
Comment 2 Henryk Paluch 2023-05-06 06:05:12 UTC
Additional details:

The original installation disklabel looks that way:
(the da0s1a was create by install using: gpart add  -i 1 -t freebsd-zfs -s 2147483648b "da0s1")

# /dev/da0s1:
8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a:    4194304          0       ZFS                    
  b:    4194304    4194304      swap                    
  c:   18874304          0    unused        0     0     # "raw" part, don't edit
  d:    4194304    8388608       ZFS                    

Please notice that da0s1 starts at 0 - so extreme care must be taken
to not overwrite disklabel on 2nd sector (byte offest from 512 to 1023).

===========================================================================

Using development machine with 2nd disk and GDB I have found following
stacktrace (by putting breakpoint just at __sys_write):

reakpoint 1.2, _write () at _write.S:4
4       _write.S: No such file or directory.
(gdb) bt
#0  _write () at _write.S:4
#1  0x00000008011af1f6 in __thr_write (fd=6, buf=0x7fffffffb040, nbytes=4096)
    at /usr/src/lib/libthr/thread/thr_syscalls.c:618
#2  0x000000000104c900 in zero_label (path=path@entry=0x7fffffffc190 "/dev/da1s1a")
    at /usr/src/sys/contrib/openzfs/cmd/zpool/zpool_vdev.c:905
#3  0x000000000104b5f0 in make_disks (zhp=zhp@entry=0x0, nv=0x801c10420)
    at /usr/src/sys/contrib/openzfs/cmd/zpool/zpool_vdev.c:1048
#4  0x000000000104b333 in make_disks (zhp=zhp@entry=0x0, nv=nv@entry=0x801c1d080)
    at /usr/src/sys/contrib/openzfs/cmd/zpool/zpool_vdev.c:1070
#5  0x000000000104b8e1 in make_root_vdev (zhp=0x6, zhp@entry=0x0, props=<optimized out>, force=force@entry=1, 
    check_rep=<optimized out>, replacing=replacing@entry=B_FALSE, dryrun=dryrun@entry=B_FALSE, argc=1, 
    argv=0x801c36040) at /usr/src/sys/contrib/openzfs/cmd/zpool/zpool_vdev.c:1864
#6  0x0000000001036ef9 in zpool_do_create (argc=<optimized out>, argv=<optimized out>)
    at /usr/src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:1587
#7  0x00000000010364b6 in main (argc=9, argv=0x7fffffffeb38)
    at /usr/src/sys/contrib/openzfs/cmd/zpool/zpool_main.c:10755

The problematic code is on 

#3  0x000000000104b5f0 in make_disks (zhp=zhp@entry=0x0, nv=0x801c10420)
    at /usr/src/sys/contrib/openzfs/cmd/zpool/zpool_vdev.c:1048

Which looks like:

                       ret = zero_label(udevpath);
                        if (ret)
                                return (ret);


Here is source pointer but the line offset is a bit different from what I see in local sources of 13.2. release:
- https://cgit.freebsd.org/src/blame/sys/contrib/openzfs/cmd/zpool/zpool_vdev.c?h=releng/13.2#n1080


And this is problem - overwriting disklabel..
Comment 3 Henryk Paluch 2023-05-06 06:06:40 UTC
(In reply to Henryk Paluch from comment #2)

Sentence:
> Please notice that da0s1 starts at 0 - so extreme care must be taken
Should be:
> Please notice that da0s1a starts at 0 - so extreme care must be taken
Comment 4 Henryk Paluch 2023-05-06 07:22:27 UTC
I suspect that this commit that is supposed to fix bootloader problem:
- https://cgit.freebsd.org/src/commit/?id=a94af9543df400cc2b8f41a4b6866b9684064728
causes this "wipe disklabel" problem, because it does take into account that "zfs create" wipes 1st 4KB of that slice "da0s1a" thus killing disklabel.
Comment 5 Henryk Paluch 2023-05-06 07:24:47 UTC
(In reply to Henryk Paluch from comment #4)
Correction
> causes this "wipe disklabel" problem, because it does take into account that "zfs create" wipes 1st 4KB of that slice "da0s1a" thus killing disklabel.
Should be:
> causes this "wipe disklabel" problem, because it does NOT take into account that "zfs create" wipes 1st 4KB of that slice "da0s1a" thus killing disklabel.
Comment 6 R. Christian McDonald 2023-06-30 12:54:48 UTC
Might we just remove the MBR option in zfsboot? The usefulness of Auto ZFS + ZFS in 2023 is questionable and it is currently broken as-is. We either need to fix it or remove the option to eliminate footgunning
Comment 7 R. Christian McDonald 2023-06-30 14:02:00 UTC
Also worth noting that this is broken on every release since OpenZFS (13 and beyond).
Comment 8 Allan Jude freebsd_committer freebsd_triage 2023-06-30 14:20:44 UTC
I would concur that removing MBR support is better.

There are significant issues with updating `zfsboot` (the next stage bootstrap) in an MBR pool, since it lives within the partition which is opened exclusively when the system is booted. So you have to use GEOM debug flags to allow scribbling over random offsets of the disk in order to update your bootcode.

So even if it was not broken, it is a bad idea in general.
It only really existed because of some BIOSes that couldn't boot from GPT, but that was fixed with the "GPT + Active" and "Lenovo Fix" options.
Comment 9 R. Christian McDonald 2023-06-30 17:46:41 UTC
(In reply to Allan Jude from comment #8)
Allan,

Thanks for the reply. I'm pretty novice when it comes to low-level storage, GEOM, CAM, etc...but I wanted to use this opportunity to learn something even if MBR is indeed on the path to being removed :)

A few notes:

1. I tweaked bsdinstall/scripts/zfsboot to offset the first freebsd-zfs label (the boot pool partition) by `-b 8` (4096 bytes). This at least preserves the labels that were getting clobbered (noted earlier in this report).
2. After making the above tweak the /boot/mbr just sits at a blinking cursor, and /boot/boot0 just prints a `#`...but it no longer says "Missing operating system"
3. Booting back into the installer and breaking out into a shell, I confirm bsdlabel is able to read /dev/ada0s1 without any issues. Without the tweak in #1, this failed (again, as noted earlier in this report).
4. Also I reinstalled zfsboot using `dd if=/boot/zfsboot of=/dev/ada0s1 count=1`and `dd if=/boot/zfsboot of=/dev/ada0s1 skip=1 seek=1024`

After doing this I am able to boot.
Comment 10 R. Christian McDonald 2023-06-30 18:06:07 UTC
Actually I'm realizing now that I probably just got lucky... It is probably safer to offset the boot pool partition farther than `-b 8` , something like `-b 2048` is safer. I believe my `dd if=/boot/zfsboot of=/dev/ada0s1 skip=1 seek=1024` is likely stomping on something it shouldn't be.
Comment 11 R. Christian McDonald 2023-07-01 01:00:11 UTC
https://reviews.freebsd.org/D40816


diff --git a/usr.sbin/bsdinstall/scripts/zfsboot b/usr.sbin/bsdinstall/scripts/zfsboot
--- a/usr.sbin/bsdinstall/scripts/zfsboot
+++ b/usr.sbin/bsdinstall/scripts/zfsboot
@@ -106,6 +106,11 @@
 #
 : ${ZFSBOOT_BOOT_POOL_SIZE:=2g}
 
+#
+# Default offset for the boot pool when enabled (e.g., geli(8) or MBR)
+#
+: ${ZFSBOOT_BOOT_POOL_OFFSET:=2048}
+
 #
 # Default disks to use (always empty unless being scripted)
 #
@@ -1008,12 +1013,14 @@
 
 		#
 		# Always prepare a boot pool on MBR
-		# Do not align this partition, there must not be a gap
+		#
+		# Offset the boot pool in order to protect the disk label
+		# and zfsboot from being clobbered by ZFS
 		#
 		ZFSBOOT_BOOT_POOL=1
 		f_eval_catch $funcname gpart \
 		             "$GPART_ADD_ALIGN_INDEX_WITH_SIZE" \
-		             "" 1 freebsd-zfs ${bootsize}b ${disk}s1 ||
+		             "-b $ZFSBOOT_BOOT_POOL_OFFSET" 1 freebsd-zfs ${bootsize}b ${disk}s1 ||
 		             return $FAILURE
 		# Pedantically nuke any old labels
 		f_eval_catch -d $funcname zpool "$ZPOOL_LABELCLEAR_F" \
@@ -1424,11 +1431,11 @@
 		fi
 
 		f_dprintf "$funcname: Updating MBR boot loader on disks..."
-		# Stick the ZFS boot loader in the "convenient hole" after
-		# the ZFS internal metadata
+		# The boot pool is offset from the beginning of the slice in order
+		# to protect the disk label and zfsboot from being clobbered by ZFS
 		for disk in $disks; do
 			f_eval_catch $funcname dd "$DD_WITH_OPTIONS" \
-			             /boot/zfsboot /dev/$disk$bootpart \
+			             /boot/zfsboot /dev/${disk}s1 \
 			             "skip=1 seek=1024" || return $FAILURE
 		done
Comment 12 void 2023-08-01 07:17:30 UTC
(In reply to Allan Jude from comment #8)

If MBR is removed, then I (and expect others) wouldn't be able to install from the dvd isos and memstick images available at the moment on download.freebsd.org, for any version 13.1 and upward.

https://lists.freebsd.org/archives/freebsd-stable/2023-July/001346.html

https://lists.freebsd.org/archives/freebsd-stable/2023-July/001347.html

motherboard is an Asus Z87-PRO both UEFI and legacy are supported, and GPT booting has worked before.

In the end, I had to install auto-ufs with MBR as I couldn't get a zfs install to boot with any version of the installation media.
Comment 13 void 2023-08-02 01:25:13 UTC
(In reply to void from comment #12)

I should have mentioned it more clearly before, my issue I concluded in the end was with GPT specifically (auto-ufs2 & GPT). I tried auto-zfs first (I understood this to default to GPT, I can't remember now), failed, tried auto-ufs, selected GPT (defaults to MBR), failed, then noticed the install media which booted fine was MBR, so selected auto-ufs with MBR and that worked.

maybe my issue needs a different PR?