Bug 282832 - makefs zfs creates images with the same guid
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 14.1-RELEASE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Mark Johnston
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-11-18 11:14 UTC by Pat Maddox
Modified: 2024-11-28 14:41 UTC

See Also:
markj: mfc-stable14+


Description Pat Maddox 2024-11-18 11:14:36 UTC
Run makefs twice and it produces zpools with the same GUID. This makes it impossible to import them at the same time. You have to import them one at a time and `zpool reguid` them.
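
A minimal sketch of that workaround, assuming the image is already attached to the host as a disk that `zpool import` can find. The function name `reguid_pool` and the `RUN` switch are made up for illustration; by default it only prints the commands it would run.

```shell
# Print (or, with RUN=1, execute) the import/reguid/export sequence for one pool.
# Assumption: the pool name is a single word with no whitespace.
reguid_pool() {
    pool=$1
    for cmd in "zpool import -N $pool" "zpool reguid $pool" "zpool export $pool"; do
        if [ "${RUN:-0}" = 1 ]; then
            $cmd || return 1    # stop at the first failing step
        else
            echo "$cmd"         # dry run: show the command only
        fi
    done
}
```

Running `reguid_pool zdata` lists the three steps; `RUN=1 reguid_pool zdata` would actually perform them, one pool at a time.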
Comment 1 Pat Maddox 2024-11-18 12:26:29 UTC
I see there is a comment [1] about using a fixed seed:

	/*
	 * Use a fixed seed to provide reproducible pseudo-random numbers for
	 * on-disk structures when needed (e.g., GUIDs, ZAP hash salts).
	 */

When do these need to be reproducible?

Should makefs take another flag to produce random GUIDs, or have a note in the man page that it will always produce the same GUID? I spent quite a bit of time trying to load two images into bhyve before realizing it was a guid conflict.

I don't think it should be necessary to import the zpool and reguid it, so I'd be in favor of a flag if there's some reason the default should always produce the same GUID.

[1] https://cgit.freebsd.org/src/tree/usr.sbin/makefs/zfs.c#n787
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2024-11-18 14:27:11 UTC
The same GUID is used because I didn't want to break reproducibility of VM images (the main use-case for makefs -t zfs).  That is, if you and I both build an image with the same inputs, the output images should be byte-identical.
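
One way to check that property, sketched under the assumption that two builds from the same inputs have already been written to separate files. `images_identical` is a hypothetical helper name, not part of makefs.

```shell
# Return 0 if two image files are byte-for-byte identical, non-zero otherwise.
images_identical() {
    cmp -s "$1" "$2"
}
```

For example, `images_identical build1/zroot.zfs build2/zroot.zfs && echo reproducible`, where the two paths are placeholders for the outputs of two independent builds.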

Certainly the documentation is deficient, I'll work on that.  I don't have strong feelings on what the default behaviour should be, but I'm a bit inclined towards keeping the current default and adding a non-reproducible mode.  How exactly are you using makefs?
Comment 3 Pat Maddox 2024-11-18 16:47:04 UTC
I am using it to create two disk images that I attach to a single bhyve. One is zroot and gets replaced periodically (using it like a BE basically). The second is zdata which is long-lived.

So, I made two images, started bhyve, and it failed to boot [1]. Turns out it’s because the zpools have the same guid.

[1] https://gist.github.com/patmaddox/da981282718fc033b05053716bc36144#file-2_first_boot-txt
Comment 4 Pat Maddox 2024-11-18 16:50:37 UTC
bhyve command is:

bhyve -c 2 -m 4G -A -H -P \
  -s 0:0,hostbridge \
  -s 1:0,virtio-net,tap1 \
  -s 2:0,ahci-hd,/tmp/bhyve-pb/poudriere-builder-15.0-stabweek-2024-10.img \
  -s 3:0,ahci-hd,/tmp/bhyve-pb/zdata.img \
  -s 31,lpc -l com1,stdio \
  -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
  pb
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2024-11-18 16:56:26 UTC
(In reply to Pat Maddox from comment #3)
But the root pool should be reguid'ed on first boot anyway.  The official VM images configure this automatically (they set zpool_reguid=zroot in /etc/rc.conf), and in general you'd want to make sure that two VMs using the same image will have different pool GUIDs.  How did you build the root pool?
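
For a hand-built image, one way to get the same first-boot behaviour might be to add that setting to rc.conf in the staging tree before running makefs. This is only a sketch: `enable_reguid` is a hypothetical helper, and it assumes the staging directory is the one later passed to makefs.

```shell
# Append the reguid setting to etc/rc.conf inside a makefs staging tree.
# $1: staging directory (for example, the ${rootdir} given to makefs)
# $2: name of the pool to reguid on first boot
enable_reguid() {
    mkdir -p "$1/etc" &&
    printf 'zpool_reguid="%s"\n' "$2" >> "$1/etc/rc.conf"
}
```

Usage would look like `enable_reguid "${rootdir}" zroot`, run before the makefs invocation.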
Comment 6 Pat Maddox 2024-11-18 17:13:22 UTC
My root pool is copied from release / our prior convo:

    makefs -t zfs -s 20g \
	   -o poolname=zroot -o bootfs=zroot/ROOT/default -o rootpath=/ \
	   -o fs=zroot\;mountpoint=none \
	   -o fs=zroot/ROOT\;mountpoint=none \
	   -o fs=zroot/ROOT/default\;mountpoint=/ \
	   -o fs=zroot/home\;mountpoint=/home \
	   -o fs=zroot/tmp\;mountpoint=/tmp\;exec=on\;setuid=off \
	   -o fs=zroot/usr\;mountpoint=/usr\;canmount=off \
	   -o fs=zroot/usr/ports\;setuid=off \
	   -o fs=zroot/usr/src \
	   -o fs=zroot/usr/obj \
	   -o fs=zroot/var\;mountpoint=/var\;canmount=off \
	   -o fs=zroot/var/audit\;setuid=off\;exec=off \
	   -o fs=zroot/var/crash\;setuid=off\;exec=off \
	   -o fs=zroot/var/log\;setuid=off\;exec=off \
	   -o fs=zroot/var/mail\;atime=on \
	   -o fs=zroot/var/tmp\;setuid=off \
	   ${outfileroot} ${rootdir}

and the data pool is another typical invocation:

    makefs -t zfs -s 100m \
	   -o poolname=zdata -o rootpath=/ \
	   -o fs=zdata\;mountpoint=/\;canmount=noauto \
	   -o fs=zdata/usr\;mountpoint=/usr\;canmount=off \
	   -o fs=zdata/usr/local\;canmount=off \
	   -o fs=zdata/usr/local/poudriere \
	   ${BUILDDIR}/zdata.zfs ${BUILDDIR}/data

-----

> the root pool should be reguid'ed on first boot anyway.  The official VM images configure this automatically (they set zpool_reguid=zroot in /etc/rc.conf)

Good to know, I will check that out. I would kind of expect it to not work, because I think the boot process doesn't even make it that far as I showed above. I'll try it out and report back though.

So I think it may be worth providing a way to 1) randomize the guid on creation and/or 2) seed the RNG on creation.

Extend this to a third disk: I have one root pool, one read-only pool with a dataset, and a third writable pool that contains long-lived data. I need all of these to have different GUIDs.

The reason I may want to seed the RNG is because if I replace the root pool, I want the VM to think it's the same. From the VM standpoint, it's like I exported the pool, imported it to another host, did some stuff on it, and imported it back on the VM. I happen to be reconstructing the disk via code, but no reason the VM needs to know that.
Comment 7 Pat Maddox 2024-11-18 17:31:30 UTC
I'll have to think about this some more... because I wonder if `zpool reguid` should accept a fixed value?

Consider this: a build script that creates a root pool, a read-only data pool, and a writeable data pool. I would want the build script to just produce a single image each. Then when attaching them to VMs, I would want them to have a different GUID per VM - but also retain their GUIDs within a single VM.

So it would look like:

# import_pool, export_pool and lookup_guid are placeholder helpers;
# passing a GUID to reguid is the behaviour being proposed here.
for vm in vm1 vm2 vm3; do
  cp root.zfs ${vm}.root.zfs
  import_pool ${vm}.root.zfs
  zpool reguid ${vm}-root $(lookup_guid ${vm} root)
  export_pool ${vm}-root

  cp data.zfs ${vm}.data.zfs
  import_pool ${vm}.data.zfs
  zpool reguid ${vm}-data $(lookup_guid ${vm} data)
  export_pool ${vm}-data
done
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2024-11-18 18:43:08 UTC
You're booting a VM with two disks that each have their own pool generated by makefs, and the kernel can't mount root because both pools have the same GUID?  That seems surprising.  I just tried that experiment myself and was able to boot, so I think something else is going on there.
Comment 9 Pat Maddox 2024-11-18 20:51:01 UTC
Here's an example I put together: https://github.com/patmaddox/lab/blob/trunk/share/examples/bhyve/two-makefs-images/Makefile

Does that work for you? Or do you see something wrong I'm doing in it?

For me it fails with:




Mounting from zfs:zroot failed with error 22; retrying for 3 more seconds
random: unblocking device.
Mounting from zfs:zroot failed with error 22.

Loader variables:
  vfs.root.mountfrom=zfs:zroot

Manual root filesystem specification:
  <fstype>:<device> [options]
      Mount <device> using filesystem <fstype>
      and with the specified (optional) option list.

    eg. ufs:/dev/da0s1a
        zfs:zroot/ROOT/default
        cd9660:/dev/cd0 ro
          (which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)

  ?               List valid disk boot devices
  .               Yield 1 second (for background tasks)
  <empty line>    Abort manual input
Comment 10 Mark Johnston freebsd_committer freebsd_triage 2024-11-18 23:33:11 UTC
(In reply to Pat Maddox from comment #9)
Just a guess, but is the root pool missing a bootfs property?
Comment 11 Pat Maddox 2024-11-19 00:46:35 UTC
> is the root pool missing a bootfs property?

I believe per your article it's not necessary: https://freebsdfoundation.org/zfs-images-from-scratch-or-makefs-t-zfs/

And based on my observation, it's not necessary.

In any case, I have added bootfs to the example: https://github.com/patmaddox/lab/commit/5e351e22a45cfe43af0d7709b954033319abb457

Same behavior. If you boot without the second disk, it boots. If you reguid either pool, it works. It's when both pools have the same guid that it fails to boot.

If my example fails for you, then there's something different between your test and mine.

If my example passes for you, then there's something different between your machine and mine.
Comment 12 Mark Johnston freebsd_committer freebsd_triage 2024-11-19 01:24:35 UTC
(In reply to Pat Maddox from comment #11)
If I use your makefile, but modify the bhyve/bhyveload invocation to boot the official 14.2-BETA2 zfs image[1], it boots fine.  I verified that all three pools have the same guid, per zdb -u it's 4116862866898151352.  So I suspect that there's something else going on, and that importing one of the pools has some side effect which fixes the problem.

In particular, if I generate zdata.zfs using your script, then boot the 14.2 image into single user mode (so reguid hasn't run), I can see that both pools have the same GUID and yet the kernel was able to mount root successfully.  So something else is going on.

root@:/ # zdb -l /dev/ada0p4
------------------------------------
LABEL 0 
------------------------------------
    txg: 4
    version: 5000
    state: 1
    name: 'zroot'
    pool_guid: 4016146626377348012
    top_guid: 100716240520803340
    guid: 100716240520803340
    vdev_children: 1
    features_for_read:
    vdev_tree:
        type: 'disk'
        ashift: 12
        asize: 5363990528
        guid: 100716240520803340
        id: 0
        path: '/dev/null'
        whole_disk: 1
        create_txg: 4
        metaslab_array: 2
        metaslab_shift: 29
    labels = 0 1 2 3 
root@:/ # zdb -l /dev/ada1
------------------------------------
LABEL 0 
------------------------------------
    txg: 4
    version: 5000
    state: 1
    name: 'zdata'
    pool_guid: 4016146626377348012
    top_guid: 100716240520803340
    guid: 100716240520803340
    vdev_children: 1
    features_for_read:
    vdev_tree:
        type: 'disk'
        ashift: 12
        asize: 100139008
        guid: 100716240520803340
        id: 0
        path: '/dev/null'
        whole_disk: 1
        create_txg: 4
        metaslab_array: 2
        metaslab_shift: 24
    labels = 0 1 2 3 

I tried booting with all three pools, and that works too.
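
The comparison above can be automated with a small filter over `zdb -l` output. This is a sketch that assumes each label prints `name:` before `pool_guid:`, as in the listings above; `duplicate_pool_guids` is a made-up name.

```shell
# Read concatenated `zdb -l` output on stdin and print each pool_guid that is
# reported under more than one pool name, followed by those names.
duplicate_pool_guids() {
    awk '
        # Remember the most recent pool name, with its quotes stripped.
        $1 == "name:"      { gsub("\047", "", $2); name = $2 }
        # Count each (guid, name) pair once, even if several labels repeat it.
        $1 == "pool_guid:" {
            key = $2 SUBSEP name
            if (!(key in seen)) {
                seen[key] = 1
                count[$2]++
                names[$2] = names[$2] " " name
            }
        }
        END { for (g in count) if (count[g] > 1) print g names[g] }
    '
}
```

With the two listings above as input, for example `{ zdb -l /dev/ada0p4; zdb -l /dev/ada1; } | duplicate_pool_guids`, it would print `4016146626377348012 zroot zdata`. No output means no pool GUID is shared.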

[1] https://download.freebsd.org/releases/VM-IMAGES/14.2-BETA3/amd64/Latest/FreeBSD-14.2-BETA3-amd64-zfs.raw.xz
Comment 13 Pat Maddox 2024-11-19 01:28:26 UTC
Just so I'm understanding correctly: if you run `make` on my example (perhaps taking out/adjusting the tap device first), it boots all the way to the login prompt?
Comment 14 Mark Johnston freebsd_committer freebsd_triage 2024-11-19 01:54:40 UTC
(In reply to Pat Maddox from comment #13)
No, if I run your example unmodified, I can reproduce the problem.  But if I substitute a FreeBSD zfs image for your zroot.zfs, the VM boots to a login prompt, despite both the FreeBSD image and zdata.zfs having the same pool GUID.
Comment 15 commit-hook freebsd_committer freebsd_triage 2024-11-19 21:19:09 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a20249443be111e8a3cb3b7bbe4a0d0e460a6058

commit a20249443be111e8a3cb3b7bbe4a0d0e460a6058
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-11-19 21:07:56 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-11-19 21:18:38 +0000

    makefs.8: Clarify that makefs-generated zpools always have the same GUID

    PR:             282832
    MFC after:      1 week

 usr.sbin/makefs/makefs.8 | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)
Comment 16 Pat Maddox 2024-11-19 21:21:08 UTC
Doc update works for me! Thanks

(I don't know if I get to close this PR, or someone else should)
Comment 17 Mark Johnston freebsd_committer freebsd_triage 2024-11-19 21:22:46 UTC
(In reply to Pat Maddox from comment #16)
I'll take care of closing this after the change is merged to stable/14.
Comment 18 commit-hook freebsd_committer freebsd_triage 2024-11-28 14:39:42 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c9f9f1a282ab81276feb81d0509e44535ebda504

commit c9f9f1a282ab81276feb81d0509e44535ebda504
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-11-19 21:07:56 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-11-28 14:38:17 +0000

    makefs.8: Clarify that makefs-generated zpools always have the same GUID

    PR:             282832
    MFC after:      1 week

    (cherry picked from commit a20249443be111e8a3cb3b7bbe4a0d0e460a6058)

 usr.sbin/makefs/makefs.8 | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)