Bug 226714 - zfsboot(8) erroneously suggests creating a BSD label
Summary: zfsboot(8) erroneously suggests creating a BSD label
Status: Closed FIXED
Alias: None
Product: Documentation
Classification: Unclassified
Component: Manual Pages
Version: Latest
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-19 06:13 UTC by Victor Sudakov
Modified: 2019-03-01 16:26 UTC
CC List: 4 users

See Also:


Attachments

Description Victor Sudakov 2018-03-19 06:13:24 UTC
The zfsboot(8) manual page, in the example "zfsboot can also be installed in an MBR slice", suggests that a BSD label must be created within the slice:

gpart create -s BSD ada0s1

In fact, a BSD label in the slice prevents zfsboot from detecting the bootable zfs pool in the slice. 

More details and a working example of creating a bootable zfs pool within an MBR slice can be found here: https://docs.freebsd.org/cgi/getmsg.cgi?fetch=22620+0+current/freebsd-questions

The wiki page https://wiki.freebsd.org/RootOnZFS/ZFSBootSlice says "does not boot", but in fact a correct configuration is quite possible.

Another wiki page, https://wiki.freebsd.org/RootOnZFS/ZFSBootPartition, contains the same error: the erroneous BSD label.
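
For illustration, here is a minimal sketch of such a BSD-label-free layout (the device name ada0 and pool name zroot are assumptions, not taken from the linked post; the bootfs setting, pool properties and datasets are omitted):

    gpart create -s MBR ada0
    gpart add -t freebsd ada0            # creates ada0s1; no BSD label inside it
    gpart set -a active -i 1 ada0
    gpart bootcode -b /boot/mbr ada0
    dd if=/boot/zfsboot of=/dev/ada0s1 count=1
    dd if=/boot/zfsboot of=/dev/ada0s1 iseek=1 oseek=1024
    zpool create zroot ada0s1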
Comment 1 Eugene Grosbein freebsd_committer freebsd_triage 2018-03-19 09:44:12 UTC
This is what happens inside: zfsboot normally reads zfsloader from the active slice containing the ZFS boot pool and starts it.

zfsloader tastes all slices using the zfs_probe_partition() function from sys/boot/zfs/zfs.c, which calls open(devname, O_RDONLY).

open(devname, O_RDONLY) returns -1/ENOENT for a slice that has traces of a BSD label, so zfsloader fails to find the bootable ZFS pool. This is libstand's open() call in stable/11. It returns -1 if it detects "DISKMAGIC ((u_int32_t)0x82564557) /* The disk magic number */" in the second sector of a slice.

One can use tools/tools/bootparttest to see if that is the case, for example:

disk0s2: FreeBSD                2047MB
read 512 bytes from the block 0 [+16777720]
read 512 bytes from the block 1 [+16777720]
ptable_bsdread: BSD detected
disk_close: disk0: closed => 0x801621000 [1]

In fact, disk0s2 contains a ZFS boot pool and DISKMAGIC in the second sector.

The easiest fix would be to replace "dd if=/boot/zfsboot of=/dev/ada0s1 count=1" with "dd if=/boot/zfsboot of=/dev/ada0s1 count=2" in the docs to make sure that the second sector has no traces of a BSD label.
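
A sketch of how one might check for and erase such traces before (re)installing zfsboot (the slice name ada0s1 is an assumption; DISKMAGIC 0x82564557 shows up little-endian as "57 45 56 82" at the start of the second sector):

    # inspect sector 1 of the slice for the BSD label magic
    dd if=/dev/ada0s1 bs=512 iseek=1 count=1 2>/dev/null | hexdump -C | head -2
    # if it is there, wipe the first two sectors and reinstall the boot code
    dd if=/dev/zero of=/dev/ada0s1 count=2
    dd if=/boot/zfsboot of=/dev/ada0s1 count=1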
Comment 2 Victor Sudakov 2018-03-19 16:35:04 UTC
(In reply to Eugene Grosbein from comment #1)
> Easiest fix would be to replace "dd if=/boot/zfsboot of=/dev/ada0s1 count=1" with "dd if=/boot/zfsboot of=/dev/ada0s1 count=2" in the docs to make sure that second sector has no traces of BSD label.

This fix may be easy but it will look obscure to the uninitiated. We need to state firmly that a bootable ZFS pool is not supported if a BSD label is present.
Comment 3 Victor Sudakov 2018-03-25 04:32:49 UTC
It also seems that the requirement to set "sysctl kern.geom.debugflags=0x10" is superfluous; dd'ing the bootcode onto the VBR works fine without it.
Comment 4 Andriy Gapon freebsd_committer freebsd_triage 2018-03-27 13:06:57 UTC
(In reply to vas from comment #0)
I am not sure why the first wiki link is marked as "does not boot".  The instructions look sane, but maybe there is a mistake in them.  Someone needs to follow them exactly and to check what happens.

The second link also contains perfectly valid instructions.  It is not necessary to create a bsdlabel and a partition within it for ZFS, but it can be done if so desired.

zfsboot(8) is the worst, of course.  It instructs you to create a bsdlabel within the slice, but the subsequent instructions are for installing ZFS to the slice.  So, the label is useless at best and harmful at worst.

I think that "gpart create -s BSD ada0s1" should be removed from the manual page.
Everything else looks okay.
Comment 5 commit-hook freebsd_committer freebsd_triage 2018-03-27 17:37:53 UTC
A commit references this bug:

Author: eugen
Date: Tue Mar 27 17:37:08 UTC 2018
New revision: 331630
URL: https://svnweb.freebsd.org/changeset/base/331630

Log:
  Fix instructions in the zfsboot manual page.

  zfsloader(8) fails to probe a slice containing ZFS pool if its second sector
  contains traces of BSD label (DISKMAGIC == 0x82564557).
  Fix manual page to show working example erasing such traces.

  PR:		226714
  Approved by:	avg (mentor)
  MFC after:	3 days

Changes:
  head/stand/i386/zfsboot/zfsboot.8
Comment 6 Victor Sudakov 2018-03-28 01:56:54 UTC
(In reply to commit-hook from comment #5)

>  dd if=/dev/zero of=/dev/ada0s1 count=2

I find it inappropriate. You never explain the meaning of this hack; it will look enigmatic to the uninitiated.
Comment 7 Victor Sudakov 2018-03-28 02:06:13 UTC
(In reply to Andriy Gapon from comment #4)
> Someone needs to follow them exactly and to check what happens

Both the wiki links have multiple issues. If I were given the power, I would merge them into one article, and rewrite it thoroughly in the spirit of https://victor-sudakov.dreamwidth.org/437590.html 

> It is not necessary to create a bsdlabel and a partition within it for ZFS, 
> but it can be done if so desired.

Eugene says that the presence of a bsdlabel prevents zfsboot from detecting the bootable zpool; I have no reason to disbelieve him.

"gpart create -s BSD" should be eliminated everywhere.
Comment 8 Mark Linimon freebsd_committer freebsd_triage 2018-03-28 02:22:28 UTC
(In reply to vas from comment #7)

I am willing to help get the information into the wiki.  Please email linimon@ if you are interested.
Comment 9 Eugene Grosbein freebsd_committer freebsd_triage 2018-03-28 02:43:34 UTC
(In reply to vas from comment #6)

This "installation procedure" is inappropriate as a whole and should be not explained but thrown away completely and replaced with something like new command "zpool bootcfg -b /boot/zfsboot poolname" but that will take much more time.

I see not reason to explain manual workarounds for misfeatures/bugs in the libsa/libstand and I'm not intended to do so in the manual page.
Comment 10 Eugene Grosbein freebsd_committer freebsd_triage 2018-03-28 02:45:16 UTC
(In reply to vas from comment #7)

The problem is not in zfsboot (it works just fine despite the presence of a BSD label); it is in the libstand/libsa code linked into zfsloader(8).
Comment 11 Warner Losh freebsd_committer freebsd_triage 2018-03-28 02:47:35 UTC
Explain the bug in libsa, and I'll fix the bug in libsa.

I agree, we shouldn't work around a bug that's easily fixed, or make snarky comments about parts of the system that clearly have an active maintainer.
Comment 12 Eugene Grosbein freebsd_committer freebsd_triage 2018-03-28 02:52:42 UTC
(In reply to Warner Losh from comment #11)

Please look at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226714#c1 for details.
Comment 13 Victor Sudakov 2018-03-28 03:57:37 UTC
(In reply to Eugene Grosbein from comment #9)
> zpool bootcfg 

Is that a good idea? I would think it would be more appropriate for "gpart bootcode".
Comment 14 Warner Losh freebsd_committer freebsd_triage 2018-03-28 04:51:59 UTC
(In reply to Eugene Grosbein from comment #12)

So I'm not so sure that's the bug.

The bug appears to be here:
        pa.fd = open(devname, O_RDONLY);
        if (pa.fd == -1)
                return (0);
        ret = zfs_probe(pa.fd, ppa->pool_guid);
        if (ret == 0)
                return (0);
        /* Do we have BSD label here? */
        if (part->type == PART_FREEBSD) {
                pa.devname = devname;
                pa.pool_guid = ppa->pool_guid;
                pa.secsz = ppa->secsz;
                table = ptable_open(&pa, part->end - part->start + 1,
                    ppa->secsz, zfs_diskread);
                if (table != NULL) {
                        ptable_iterate(table, &pa, zfs_probe_partition);
                        ptable_close(table);
                }
        }

So if we can't open the s1 device (which we prohibit currently), then we return. If we can't open the device, we should just not probe zfs on the slice, but we should probe it for the BSD partitions. That bug is easy enough to fix. The bug is here, not in libsa. Its behavior is working as designed. ptable_open should recurse properly though.

It's kind of a moot distinction, though, because while we support non-zero offsets for the zpool partition, we can't really boot off such partitions because zfsboot assumes that the bootable zpool is in a partition that is at offset 0 (it need not be the first one, it appears). The current instructions assume that zfsboot is installed in the ZFS boot zone, which also assumes that the uberblock is where it would be if the slice started at location zero.

I think the following fix is the right one, but I'm not sure yet:
diff --git a/stand/zfs/zfs.c b/stand/zfs/zfs.c
index 0050bbe0056..f51c5c1efaa 100644
--- a/stand/zfs/zfs.c
+++ b/stand/zfs/zfs.c
@@ -488,15 +488,19 @@ zfs_probe_partition(void *arg, const char *partname,
                return (0);

        ppa = (struct zfs_probe_args *)arg;
-       strncpy(devname, ppa->devname, strlen(ppa->devname) - 1);
-       devname[strlen(ppa->devname) - 1] = '\0';
-       sprintf(devname, "%s%s:", devname, partname);
+       sprintf(devname, "%s%s:", ppa->devname, partname);
        pa.fd = open(devname, O_RDONLY);
-       if (pa.fd == -1)
-               return (0);
-       ret = zfs_probe(pa.fd, ppa->pool_guid);
-       if (ret == 0)
-               return (0);
+       if (pa.fd != -1) {
+               /*
+                * Only probe the slice if there's no BSD label
+                * on it. When we have a nested scheme, we'll not
+                * be able to open the slice, but we will be able
+                * to open the ptable.
+                */
+               ret = zfs_probe(pa.fd, ppa->pool_guid);
+               if (ret == 0)
+                       return (0);
+       }
        /* Do we have BSD label here? */
        if (part->type == PART_FREEBSD) {
                pa.devname = devname;

Oh, and I agree we should stop telling people to create a BSD partition, but we should support it because sometimes it's necessary for a swap or a dump partition.
Comment 15 Eugene Grosbein freebsd_committer freebsd_triage 2018-03-28 05:11:21 UTC
(In reply to Warner Losh from comment #14)

> So I'm not so sure that's the bug.

I do not insist that it's a bug; it may be a mis-feature :-)

> So if we can't open the s1 device (which we prohibit currently), then we return. If we can't open the device, we should just not probe zfs on the slice, but we should probe it for the BSD partitions. That bug is easy enough to fix. The bug is here, not in libsa. Its behavior is working as designed. ptable_open should recurse properly though.

Not agreed. Perhaps you missed the point: the problem manifests when a slice has no real BSD label (no nested partition table) inside, but contains a ZFS pool and carries the BSD label magic number in its second block.

Anyone who:

- tries to migrate from a traditional UFS-only system having MBR+BSD label to a ZFS-only bootable pool,
- does "gpart destroy -F ada0" and then re-creates the MBR and a slice with the same offsets,
- installs the boot code and creates the ZFS pool without creating a BSD label, following the instructions from zfsboot(8) before my last change

will end up with the mentioned ZFS pool without a real BSD label inside but with its magic number in the second block. Been there, seen that.
Comment 16 Eugene Grosbein freebsd_committer freebsd_triage 2018-03-28 05:16:20 UTC
(In reply to Warner Losh from comment #14)

> ptable_open should recurse properly though

There is nothing to recurse in the scenario described above.

Either the zfsboot installation procedure must clear the second block, to ensure it does not hit libsa's "design feature" that prohibits opening such a slice, or libsa should not prohibit that; in the latter case it must know about ZFS pools on slices and be able to tell a slice that merely carries BSD label magic in its second block from one with a real BSD label.
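
A minimal sketch of the first alternative, clearing only the second sector so an already written VBR stays intact (the slice name is an assumption, and this assumes nothing else currently holds the slice open for writing):

    dd if=/dev/zero of=/dev/ada0s1 oseek=1 count=1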
Comment 17 Eugene Grosbein freebsd_committer freebsd_triage 2018-03-28 05:20:46 UTC
(In reply to vas from comment #13)

ZFS won't allow GEOM to open its pool for writing unless the pool is exported, and export is not possible for a mounted boot/root pool. Only ZFS itself may update zfsboot for a running system.
Comment 18 Eugene Grosbein freebsd_committer freebsd_triage 2018-03-28 05:26:24 UTC
(In reply to Warner Losh from comment #14)

> Oh, and I agree we should stop telling people to create a BSD partition, but we should support it because sometimes it's necessary for a swap or a dump partition.

MBR still has space for three more slices for swap/dump/whatever, and since ZFS has its own volume manager, I do not see a large demand for a ZFS-inside-BSDlabel-inside-MBR-slice configuration; the simpler ZFS-inside-MBR-slice should be just fine.
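
One possible arrangement along these lines, with no BSD label anywhere (the device name and size are illustrative assumptions; the second slice is used directly as raw swap):

    gpart add -t freebsd -s 100g ada0    # ada0s1, holds the ZFS pool
    gpart add -t freebsd ada0            # ada0s2, used as swap
    swapon /dev/ada0s2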
Comment 19 Victor Sudakov 2018-03-28 05:34:54 UTC
(In reply to Eugene Grosbein from comment #17)
I cannot imagine a scenario in which you have to install/update zfsboot on a running system, over a mounted root pool. If it is a new installation, you should install the bootcode first, and then create the pool.

Or, if it's a fixit situation, you boot from a LiveCD, mfsBSD etc. and fix the bootcode without importing the actual pool of the system you are fixing.
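
For the fixit case, a sketch of reinstalling just the boot blocks from such a live environment (the slice name is an assumption; since the pool is never imported, nothing holds the slice open for writing):

    dd if=/boot/zfsboot of=/dev/ada0s1 count=1
    dd if=/boot/zfsboot of=/dev/ada0s1 iseek=1 oseek=1024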
Comment 20 Victor Sudakov 2018-03-28 05:36:55 UTC
(In reply to Eugene Grosbein from comment #18)
> I do not see large demand for ZFS-inside-BSDlabel-inside-MBRslice 
> configuration but simplier ZFS-inside-MBRslice should be just fine.

Hear, hear! The sooner bsdlabel is forgotten the better.
Comment 21 Eugene Grosbein freebsd_committer freebsd_triage 2018-03-28 06:04:54 UTC
(In reply to vas from comment #19)

ZFS boot blocks get updated periodically to support new zpool-features(7). For example, booting from pools utilizing the "skein" hash algorithm was not supported until the boot blocks were updated recently.
Comment 22 Eugene Grosbein freebsd_committer freebsd_triage 2018-03-28 06:07:27 UTC
(In reply to vas from comment #20)

> Hear, hear! The sooner bsdlabel is forgotten the better.

No need to be extreme: a modern BSD label, capable of holding up to 20 partitions, is still just fine for MBR/BSD-label/UFS-based FreeBSD installations.
Comment 23 Victor Sudakov 2018-03-28 07:28:05 UTC
(In reply to Eugene Grosbein from comment #21)
Sounds scary.
Comment 24 Victor Sudakov 2018-03-28 08:55:25 UTC
(In reply to Eugene Grosbein from comment #21)
All this scary stuff (boot sectors, boot records) will probably be gone for good when GPT+UEFI becomes more ubiquitous.
Comment 25 Warner Losh freebsd_committer freebsd_triage 2018-03-28 15:58:42 UTC
Leaving vestiges of a bsdlabel around is asking for trouble. What I'm worried about is that we don't actually handle the nesting case in general. Also, if the bsdlabel looks valid, the boot blocks have to honor it. Otherwise it's a layering violation. ZFS isn't exempt from properly labeling disks, so we won't be adding a weird exception for that. We could add additional checks to make sure that the bsdlabel is sane and ignore it if it isn't. The current checks are pretty minimal.

I'm surprised that gpart destroy of the MBR doesn't recursively destroy the nested things. The man page is silent, though it has a -F to force destroying a non-empty one. This sounds more like pilot error, however. If the BSD label is indeed stale, there's little that we can do. We have similar issues with things converted from MBR to GPT sometimes if the old labels aren't properly cleared out.

I spent an hour on the plane staring at the code, and we should be able to handle it properly. I know it's a design point to disallow opens for nested containers, but I couldn't find where in the code that actually happens. It's spring break this week, so I'll take a look at it when things are over.

I also agree, btw, that gpart installboot should just work rather than the crazy dd stuff...
Comment 26 commit-hook freebsd_committer freebsd_triage 2018-04-03 14:06:30 UTC
A commit references this bug:

Author: eugen
Date: Tue Apr  3 14:05:40 UTC 2018
New revision: 331927
URL: https://svnweb.freebsd.org/changeset/base/331927

Log:
  MFC r331630: Fix instructions in the zfsboot manual page.

  zfsloader(8) fails to probe a slice containing ZFS pool if its second sector
  contains traces of BSD label (DISKMAGIC == 0x82564557).
  Fix manual page to show working example erasing such traces.

  PR:		226714
  Approved by:	avg (mentor)

  _M   .
  M    stand/i386/zfsboot/zfsboot.8

Changes:
_U  stable/11/
  stable/11/stand/i386/zfsboot/zfsboot.8
Comment 27 Victor Sudakov 2018-04-04 08:58:31 UTC
I think those two Wiki pages mentioned in the problem description should be merged and rewritten. If anybody can review my attempt at rewriting them, please look at https://bitbucket.org/victor_sudakov/faq/src/4e181ae2de0132967a699abbcd234e257b8d47b5/FreeBSD/Installing_FreeBSD_Root_on_ZFS_using_FreeBSD-ZFS_partition_in_a_FreeBSD_MBR_Slice.txt?at=default and say what you think.
Comment 28 Eugene Grosbein freebsd_committer freebsd_triage 2018-04-04 09:28:17 UTC
(In reply to vas from comment #27)

"Setting the active slice is not strictly necessary" is wrong because there are BIOS'es that won't even try to load Boot Manager from a disk having no active slice saying "No operating system found".
Comment 29 Victor Sudakov 2018-04-04 09:45:27 UTC
(In reply to Eugene Grosbein from comment #28)
A valid point, thank you; I've corrected the document.
Comment 31 Eugene Grosbein freebsd_committer freebsd_triage 2018-04-05 04:24:21 UTC
(In reply to vas from comment #30)

> Setting the active slice is not strictly necessary, the Boot Manager
> will do it for you anyway.
>
> However there are BIOSes that won't even try to load the Boot Manager
> from a disk having no active slice saying "No operating system found."
> To be on the safe side, we should always set the active slice.

You are not creating an encyclopedia for IBM PC compatibles but a ZFS-centric article :-)

The shorter, the easier to read. Brevity is the soul of wit. I suggest removing both paragraphs completely as they discuss something unrelated to ZFS and FreeBSD.
Comment 32 Victor Sudakov 2018-04-05 05:33:53 UTC
(In reply to Eugene Grosbein from comment #31)
> I suggest removing both paragraphs completely
> as they discuss something unrelated to ZFS and FreeBSD.

Noted but declined for the present. I prefer explaining every command whose meaning is not immediately obvious from the context.
Comment 33 Eugene Grosbein freebsd_committer freebsd_triage 2018-04-05 05:39:44 UTC
(In reply to vas from comment #32)

But marking the active slice is pretty obvious... And BootEasy is not the sole boot manager for FreeBSD (/boot/mbr is preferable for single-boot). In fact, there is no reason NOT to mark the active slice as such, no matter what kind of boot manager is used.
Comment 34 Victor Sudakov 2018-04-06 17:50:19 UTC
The question is who has the power to edit the FreeBSD Wiki, and what chance my article has of getting there.
Comment 35 Eugene Grosbein freebsd_committer freebsd_triage 2018-04-14 22:46:24 UTC
The zfsboot(8) manual page is fixed and the fix has been MFC'd.

Discussion of the FreeBSD Wiki should not take place in Bugzilla; please use our mailing lists or web forums.