man zfsboot, in the example "zfsboot can also be installed in an MBR slice", suggests that a BSD label must be created within the slice:

    gpart create -s BSD ada0s1

In fact, a BSD label in the slice prevents zfsboot from detecting the bootable ZFS pool in the slice. More details and a working example of creating a bootable ZFS pool within an MBR slice can be found here:
https://docs.freebsd.org/cgi/getmsg.cgi?fetch=22620+0+current/freebsd-questions

The wiki page https://wiki.freebsd.org/RootOnZFS/ZFSBootSlice says "does not boot", but in fact a correct configuration is quite possible. Another wiki page, https://wiki.freebsd.org/RootOnZFS/ZFSBootPartition, contains the same error: the erroneous BSD label.
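For reference, a minimal sketch of the slice layout the report has in mind, without the offending BSD label (the device name ada0 and the pool name zroot are assumptions here; the full working procedure, including installing zfsboot itself, is in the freebsd-questions post linked above and in zfsboot(8)):

    gpart create -s MBR ada0
    gpart add -t freebsd ada0            # creates ada0s1; the pool goes directly here
    gpart set -a active -i 1 ada0
    gpart bootcode -b /boot/mbr ada0     # or /boot/boot0 for the interactive boot manager
    zpool create zroot /dev/ada0s1       # note: no "gpart create -s BSD ada0s1" step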
This is what happens inside: zfsboot normally reads zfsloader from the active slice containing the ZFS boot pool and starts it. zfsloader tastes all slices using the zfs_probe_partition() function from sys/boot/zfs/zfs.c, which calls open(devname, O_RDONLY). open(devname, O_RDONLY) returns -1/ENOENT for a slice that has traces of a BSD label, so zfsloader fails to find the bootable ZFS pool. This is libstand's open() call in stable/11. It returns -1 if it detects "DISKMAGIC ((u_int32_t)0x82564557) /* The disk magic number */" in the second sector of a slice.

One can use tools/tools/bootparttest to see if that is the case, for example:

    disk0s2: FreeBSD 2047MB
    read 512 bytes from the block 0 [+16777720]
    read 512 bytes from the block 1 [+16777720]
    ptable_bsdread: BSD detected
    disk_close: disk0: closed => 0x801621000 [1]

In fact, disk0s2 contains a ZFS boot pool and DISKMAGIC in the second sector.

The easiest fix would be to replace "dd if=/boot/zfsboot of=/dev/ada0s1 count=1" with "dd if=/boot/zfsboot of=/dev/ada0s1 count=2" in the docs to make sure that the second sector has no traces of a BSD label.
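For illustration, a hedged sketch of how one might check a slice for the stale magic and apply the suggested fix (the device name ada0s1 is an assumption; DISKMAGIC 0x82564557 shows up little-endian as the bytes 57 45 56 82 at the start of the second sector):

    # dump the second 512-byte sector of the slice and look for the label magic
    dd if=/dev/ada0s1 bs=512 skip=1 count=1 2>/dev/null | hexdump -C | head -4
    # overwrite both of the first two sectors with zfsboot, wiping any stale label
    dd if=/boot/zfsboot of=/dev/ada0s1 count=2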
(In reply to Eugene Grosbein from comment #1)

> The easiest fix would be to replace "dd if=/boot/zfsboot of=/dev/ada0s1 count=1"
> with "dd if=/boot/zfsboot of=/dev/ada0s1 count=2" in the docs to make sure
> that the second sector has no traces of a BSD label.

This fix may be easy, but it will look obscure to the uninitiated. We need to state firmly that a bootable ZFS pool is not supported if a BSD label is present.
It also seems that the requirement to set "sysctl kern.geom.debugflags=0x10" is superfluous; dd'ing the bootcode onto the VBR works fine without it.
(In reply to vas from comment #0)

I am not sure why the first wiki link is marked as "does not boot". The instructions look sane, but maybe there is a mistake in them. Someone needs to follow them exactly and check what happens.

The second link also contains perfectly valid instructions. It is not necessary to create a bsdlabel and a partition within it for ZFS, but it can be done if so desired.

zfsboot(8) is the worst, of course. It instructs the reader to create a bsdlabel within the slice, but the subsequent instructions are for installing ZFS to the slice. So, the label is useless at best and harmful at worst. I think that "gpart create -s BSD ada0s1" should be removed from the manual page. Everything else looks okay.
A commit references this bug:

Author: eugen
Date: Tue Mar 27 17:37:08 UTC 2018
New revision: 331630
URL: https://svnweb.freebsd.org/changeset/base/331630

Log:
  Fix instructions in the zfsboot manual page.

  zfsloader(8) fails to probe a slice containing ZFS pool
  if its second sector contains traces of BSD label
  (DISKMAGIC == 0x82564557). Fix manual page to show
  working example erasing such traces.

  PR:           226714
  Approved by:  avg (mentor)
  MFC after:    3 days

Changes:
  head/stand/i386/zfsboot/zfsboot.8
(In reply to commit-hook from comment #5)

> dd if=/dev/zero of=/dev/ada0s1 count=2

I find this inappropriate. You never explain the meaning of this hack; it will look enigmatic to the uninitiated.
(In reply to Andriy Gapon from comment #4)

> Someone needs to follow them exactly and check what happens

Both of the wiki links have multiple issues. If I were given the power, I would merge them into one article and rewrite it thoroughly in the spirit of https://victor-sudakov.dreamwidth.org/437590.html

> It is not necessary to create a bsdlabel and a partition within it for ZFS,
> but it can be done if so desired.

Eugene says that the presence of a bsdlabel prevents zfsboot from detecting the bootable zpool, and I have no reason to disbelieve him. "gpart create -s BSD" should be eliminated from everywhere.
(In reply to vas from comment #7) I am willing to help get the information into the wiki. Please email linimon@ if you are interested.
(In reply to vas from comment #6)

This "installation procedure" is inappropriate as a whole and should not be explained but thrown away completely and replaced with something like a new command, "zpool bootcfg -b /boot/zfsboot poolname", but that will take much more time. I see no reason to explain manual workarounds for misfeatures/bugs in libsa/libstand, and I do not intend to do so in the manual page.
(In reply to vas from comment #7)

The problem is not in zfsboot (it works just fine despite the presence of the BSD label); it is in the libstand/libsa linked into zfsloader(8).
Explain the bug in libsa, and I'll fix the bug in libsa. I agree, we shouldn't work around a bug that's easily fixed, or make snarky comments about parts of the system that clearly have an active maintainer.
(In reply to Warner Losh from comment #11) Please look at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226714#c1 for details.
(In reply to Eugene Grosbein from comment #9)

> zpool bootcfg

Is that a good idea? I would think it would be more appropriate for "gpart bootcode".
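For context, a sketch of how gpart bootcode is used for an MBR disk today (device name is an assumption); writing zfsboot into the slice itself is currently still done with dd, which is the part the proposal would fold into a single tool:

    # install the MBR boot manager to the disk's MBR
    gpart bootcode -b /boot/boot0 ada0
    # the slice-level zfsboot still goes in by hand (per zfsboot(8))
    dd if=/boot/zfsboot of=/dev/ada0s1 count=1
    dd if=/boot/zfsboot of=/dev/ada0s1 skip=1 seek=1024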
(In reply to Eugene Grosbein from comment #12)

So I'm not so sure that's the bug. The bug appears to be here:

	pa.fd = open(devname, O_RDONLY);
	if (pa.fd == -1)
		return (0);
	ret = zfs_probe(pa.fd, ppa->pool_guid);
	if (ret == 0)
		return (0);
	/* Do we have BSD label here? */
	if (part->type == PART_FREEBSD) {
		pa.devname = devname;
		pa.pool_guid = ppa->pool_guid;
		pa.secsz = ppa->secsz;
		table = ptable_open(&pa, part->end - part->start + 1,
		    ppa->secsz, zfs_diskread);
		if (table != NULL) {
			ptable_iterate(table, &pa, zfs_probe_partition);
			ptable_close(table);
		}
	}

So if we can't open the s1 device (which we prohibit currently), then we return. If we can't open the device, we should just not probe zfs on the slice, but we should probe it for the BSD partitions. That bug is easy enough to fix. The bug is here, not in libsa. Its behavior is working as designed. ptable_open should recurse properly though.

It's kind of a moot distinction, though, because while we support non-zero offsets for the zpool partition, we can't really boot off such partitions because zfsboot assumes that the bootable zpool is in a partition that is at offset 0 (need not be the first one, it appears). The current instructions assume that zfsboot is installed in the ZFS boot zone, which also assumes that the uber block is where it would be if the slice started at location zero.

I think the following fix is the right one, but I'm not sure yet:

diff --git a/stand/zfs/zfs.c b/stand/zfs/zfs.c
index 0050bbe0056..f51c5c1efaa 100644
--- a/stand/zfs/zfs.c
+++ b/stand/zfs/zfs.c
@@ -488,15 +488,19 @@ zfs_probe_partition(void *arg, const char *partname,
 		return (0);
 
 	ppa = (struct zfs_probe_args *)arg;
-	strncpy(devname, ppa->devname, strlen(ppa->devname) - 1);
-	devname[strlen(ppa->devname) - 1] = '\0';
-	sprintf(devname, "%s%s:", devname, partname);
+	sprintf(devname, "%s%s:", ppa->devname, partname);
 	pa.fd = open(devname, O_RDONLY);
-	if (pa.fd == -1)
-		return (0);
-	ret = zfs_probe(pa.fd, ppa->pool_guid);
-	if (ret == 0)
-		return (0);
+	if (pa.fd != -1) {
+		/*
+		 * Only probe the slice if there's no BSD label
+		 * on it. When we have a nested scheme, we'll not
+		 * be able to open the slice, but we will be able
+		 * to open the ptable.
+		 */
+		ret = zfs_probe(pa.fd, ppa->pool_guid);
+		if (ret == 0)
+			return (0);
+	}
 	/* Do we have BSD label here? */
 	if (part->type == PART_FREEBSD) {
 		pa.devname = devname;

Oh, and I agree we should stop telling people to create a BSD partition, but we should support it because sometimes it's necessary for a swap or a dump partition.
(In reply to Warner Losh from comment #14)

> So I'm not so sure that's the bug.

I do not insist that it's a bug; it may be a mis-feature :-)

> So if we can't open the s1 device (which we prohibit currently), then we
> return. If we can't open the device, we should just not probe zfs on the
> slice, but we should probe it for the BSD partitions. That bug is easy
> enough to fix. The bug is here, not in libsa. Its behavior is working as
> designed. ptable_open should recurse properly though.

Not agreed. Perhaps you missed the point: the problem manifests when there is a slice that has no real BSD label (with its own partition table) inside, but does contain a ZFS pool and the BSD label magic number in its second block.

Everyone who:

- tries to migrate from a traditional UFS-only system with MBR+BSD label to a ZFS-only bootable pool,
- does "gpart destroy -F ada0" and then re-creates the MBR and slice with the same offsets,
- installs the boot code and creates the ZFS pool without creating a BSD label, using the instructions from zfsboot(8) before my last change,

will end up with the mentioned ZFS pool: no real BSD label inside, but its magic number in the second block. Been there, seen that.
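A hedged sketch of that migration scenario (device name ada0 and pool name zroot are assumptions); the slice ends up with stale label magic because only the first sector gets overwritten:

    gpart destroy -F ada0                        # drops MBR + bsdlabel metadata, not the data sectors
    gpart create -s MBR ada0
    gpart add -t freebsd ada0                    # ada0s1 recreated at the same offset
    dd if=/boot/zfsboot of=/dev/ada0s1 count=1   # old zfsboot(8) advice: sector 0 only
    zpool create zroot /dev/ada0s1
    # sector 1 of ada0s1 still carries DISKMAGIC from the previous bsdlabel,
    # so libsa refuses to open the slice and zfsloader cannot find the pool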
(In reply to Warner Losh from comment #14)

> ptable_open should recurse properly though

There is nothing to recurse into in the scenario described above. Either the zfsboot installation procedure must clear the second block, so that it does not hit libsa's "design feature" that prohibits opening such a slice, or libsa should not prohibit that; in the latter case it must have knowledge of ZFS pools placed directly in slices and be able to distinguish a slice that merely carries the BSD label magic in its second block from one with a real BSD label.
(In reply to vas from comment #13)

ZFS won't allow GEOM to open its pool devices for writing unless the pool is exported, and export is not possible for a mounted boot/root pool. Only ZFS itself may update zfsboot on a running system.
(In reply to Warner Losh from comment #14)

> Oh, and I agree we should stop telling people to create a BSD partition,
> but we should support it because sometimes it's necessary for a swap or a
> dump partition.

The MBR still has space for three more slices for swap/dump/whatever, and since ZFS has its own volume manager, I do not see a large demand for the ZFS-inside-BSDlabel-inside-MBRslice configuration, but the simpler ZFS-inside-MBRslice should be just fine.
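For instance, a sketch of covering swap with a second MBR slice instead of a nested bsdlabel (device name and size are assumptions):

    gpart add -t freebsd -s 4g ada0   # creates ada0s2 alongside the ZFS slice
    swapon /dev/ada0s2                # or dumpon /dev/ada0s2 for crash dumps
    # /etc/fstab entry:  /dev/ada0s2  none  swap  sw  0  0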
(In reply to Eugene Grosbein from comment #17)

I cannot imagine a scenario in which you have to install/update zfsboot on the running system, over a mounted root pool. If it is a new installation, you should install the bootcode first and then create the pool. Or, if it's a fixit situation, you boot from a LiveCD, mfsBSD, etc. and fix the bootcode without importing the actual pool of the system you are fixing.
(In reply to Eugene Grosbein from comment #18)

> I do not see a large demand for the ZFS-inside-BSDlabel-inside-MBRslice
> configuration, but the simpler ZFS-inside-MBRslice should be just fine.

Hear, hear! The sooner bsdlabel is forgotten the better.
(In reply to vas from comment #19)

ZFS boot blocks get updated periodically to support new zpool-features(7). For example, booting from pools utilizing the "skein" hash algorithm was not supported until the boot blocks were updated recently.
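For illustration, a sketch of such a boot-block refresh on a live MBR-slice system, per zfsboot(8) (the device name is an assumption; whether the kern.geom.debugflags step is actually needed is debated in comment #3 above):

    # sysctl kern.geom.debugflags=0x10   # only if GEOM refuses to write to the live slice
    dd if=/boot/zfsboot of=/dev/ada0s1 count=1
    dd if=/boot/zfsboot of=/dev/ada0s1 skip=1 seek=1024
    # sysctl kern.geom.debugflags=0      # restore the default afterwards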
(In reply to vas from comment #20)

> Hear, hear! The sooner bsdlabel is forgotten the better.

No need to be extreme: the modern BSD label, capable of holding up to 20 partitions, is still just fine for MBR/BSDlabel/UFS-based FreeBSD installations.
(In reply to Eugene Grosbein from comment #21) Sounds scary.
(In reply to Eugene Grosbein from comment #21)

All this scary stuff (boot sectors, boot records) will probably be gone for good once GPT+UEFI becomes more ubiquitous.
Leaving vestiges of a bsdlabel around is asking for trouble. What I'm worried about is that we don't actually handle the nesting case in general. Also, if the bsdlabel looks valid, the boot blocks have to honor it. Otherwise it's a layering violation. zfs isn't immune from properly labeling disks, so we won't be adding a weird exception for that. We could add additional checks to make sure that the bsdlabel is sane and ignore it if it isn't. The current checks are pretty minimal.

I'm surprised that gpart destroy of the MBR doesn't recursively destroy the nested things. The man page is silent, though it has a -F to force destroying a non-empty one. This sounds more like a pilot error, however. If the BSD label is indeed stale, there's little that we can do. We have similar issues with things converted from MBR to GPT sometimes if the old labels aren't properly cleared out.

I spent an hour on the plane staring at the code, and we should handle it properly. I know it's the design point to disallow opens for nested containers, but I couldn't find where in the code that actually happened. It's spring break this week, so I'll take a look at it when things are over.

I also agree, btw, that gpart installboot should just work rather than the crazy dd stuff...
A commit references this bug:

Author: eugen
Date: Tue Apr 3 14:05:40 UTC 2018
New revision: 331927
URL: https://svnweb.freebsd.org/changeset/base/331927

Log:
  MFC r331630: Fix instructions in the zfsboot manual page.

  zfsloader(8) fails to probe a slice containing ZFS pool
  if its second sector contains traces of BSD label
  (DISKMAGIC == 0x82564557). Fix manual page to show
  working example erasing such traces.

  PR:           226714
  Approved by:  avg (mentor)

  _M  .
   M  stand/i386/zfsboot/zfsboot.8

Changes:
  _U  stable/11/
  stable/11/stand/i386/zfsboot/zfsboot.8
I think the two Wiki pages mentioned in the problem description should be merged and rewritten. If anybody is willing to review my attempt at rewriting them, please look at https://bitbucket.org/victor_sudakov/faq/src/4e181ae2de0132967a699abbcd234e257b8d47b5/FreeBSD/Installing_FreeBSD_Root_on_ZFS_using_FreeBSD-ZFS_partition_in_a_FreeBSD_MBR_Slice.txt?at=default and say what you think.
(In reply to vas from comment #27)

"Setting the active slice is not strictly necessary" is wrong, because there are BIOSes that won't even try to load the Boot Manager from a disk that has no active slice, saying "No operating system found" instead.
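For completeness, the one-liner that marks the slice active (slice index and device name are assumptions):

    gpart set -a active -i 1 ada0    # flag ada0s1 as the active MBR slice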
(In reply to Eugene Grosbein from comment #28) A valid point, thank you, I've corrected the document.
Still updating the article at https://bitbucket.org/victor_sudakov/faq/src/tip/FreeBSD/Installing_FreeBSD_Root_on_ZFS_using_FreeBSD-ZFS_partition_in_a_FreeBSD_MBR_Slice.txt
(In reply to vas from comment #30)

> Setting the active slice is not strictly necessary, the Boot Manager
> will do it for you anyway.
>
> However there are BIOSes that won't even try to load the Boot Manager
> from a disk having no active slice saying "No operating system found."
> To be on the safe side, we should always set the active slice.

You are not writing an encyclopedia for IBM PC compatibles but a ZFS-centric article :-) The shorter, the easier to read. Brevity is the soul of wit. I suggest removing both paragraphs completely, as they discuss something unrelated to ZFS and FreeBSD.
(In reply to Eugene Grosbein from comment #31)

> I suggest removing both paragraphs completely,
> as they discuss something unrelated to ZFS and FreeBSD.

Noted but declined for the present. I prefer explaining every command whose meaning is not immediately obvious from the context.
(In reply to vas from comment #32)

But marking the active slice is pretty obvious... And BootEasy is not the sole boot manager for FreeBSD (/boot/mbr is preferable for single-boot). In fact, there is no reason NOT to mark the active slice as such, no matter what kind of boot manager is used.
The question is who has the power to edit the FreeBSD Wiki, and what chance my article has of getting there.
The zfsboot(8) manual page is fixed and the fix has been MFC'd. Discussion of the FreeBSD Wiki should not take place in Bugzilla; please use our mailing lists or web forums.