Attempting to create an `md0` drive image to upload to my hosting service, I'm using `bsdinstall` to perform the install. This worked in 10.3, but changes in 11 trigger issues. To reproduce, issue the following on an existing 11.0 system installed in a ZFS-on-GELI configuration:
## show the current system is on GELI
# geli list | grep Name
## show that ada0p4.eli is the only backing member of zroot
# zpool status zroot
        NAME          STATE   READ WRITE CKSUM
        zroot         ONLINE     0     0     0
          ada0p4.eli  ONLINE     0     0     0
## create a 10GB disk image file
# dd if=/dev/zero of=freebsd.img bs=10m count=1k
## attach it as a memory disk
# mdconfig -f freebsd.img -u 0
## run bsdinstall
Specify guided ZFS-on-root with a single stripe backed by `md0`, and encrypt both the pool and the swap. Proceeding with the install will show messages on the console that *ada0p4.eli* has been detached, and the host machine will hang for the obvious reason: its underlying GEOM_ELI provider has been forcibly detached and there's no longer any root file system.
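For reference, roughly the same configuration can be expressed non-interactively through bsdinstall's scripted mode; the `ZFSBOOT_*` variable names below are my recollection of what zfsboot honors and should be checked against bsdinstall(8) for your release:

```shell
# Hedged sketch of a scripted equivalent of the guided choices above;
# verify these ZFSBOOT_* names against bsdinstall(8) on your version.
ZFSBOOT_DISKS="md0"            # single stripe backed by the md device
ZFSBOOT_VDEV_TYPE="stripe"
ZFSBOOT_GELI_ENCRYPTION=1      # encrypt the pool
ZFSBOOT_SWAP_ENCRYPTION=1      # encrypt swap
export ZFSBOOT_DISKS ZFSBOOT_VDEV_TYPE \
       ZFSBOOT_GELI_ENCRYPTION ZFSBOOT_SWAP_ENCRYPTION
# bsdinstall zfsboot           # would run the partitioning step itself
```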
Digging further shows that /usr/src/usr.sbin/bsdinstall/scripts/zfsboot is forcibly detaching *all* GELI providers (around line 793, using GELI_DETACH_F), not just those that it created during the install process.
The point of this code is to detach anything left over from previous runs of the installer, since no state is kept between runs.
I didn't consider the case of people running bsdinstall on a live system. I am not sure how to deal with both of these cases at the same time.
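One way to handle both cases might be for zfsboot to record which providers it creates and detach only those during cleanup. A rough sketch; every name here (`CREATED_GELIS`, `note_created_geli`, `geli_detach_created`) is hypothetical, not code from the script:

```shell
#!/bin/sh
# Sketch (not the actual zfsboot code): record each GELI provider the
# installer itself creates, and detach only those during cleanup,
# leaving pre-existing providers such as ada0p4.eli alone.
CREATED_GELIS=""

note_created_geli() {
    # call right after the installer initializes/attaches a new provider
    CREATED_GELIS="$CREATED_GELIS $1"
}

geli_detach_created() {
    # detach only the providers recorded above
    for p in $CREATED_GELIS; do
        geli detach -f "$p"
    done
}
```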
When issuing `bsdinstall -D debug.log`, the tail of my log (right before the hang/freeze) is
DEBUG: zfs_create_diskpart: disk=[md0] index=
DEBUG: zfs_create_diskpart: Exporting ZFS pools...
DEBUG: zfs_create_diskpart: zpool export -f "zroot"
So the `zpool export -f` of my root pool also seems to be taking down the GELI, according to my console. But I need that zroot to stick around, since that's where bsdinstall and its supporting files are located.
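A possible guard: parse the pool backing the running root out of `df /` and refuse to force-export it. This is my illustration with hypothetical helper names (`root_pool`, `safe_export`), not code from zfsboot, and it assumes the first column of `df /` on a ZFS root is a dataset like `zroot/ROOT/default`:

```shell
#!/bin/sh
# Sketch only: skip force-exporting whichever pool backs the live root.
root_pool() {
    # first column of the data line of `df /`, e.g. "zroot/ROOT/default";
    # the leading path component is the pool name
    df / | awk 'NR == 2 { split($1, a, "/"); print a[1] }'
}

safe_export() {
    pool=$1
    if [ "$pool" = "$(root_pool)" ]; then
        echo "refusing to export live root pool '$pool'" >&2
        return 1
    fi
    zpool export -f "$pool"
}
```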
(In reply to Allan Jude from comment #1)
Is it possible to detect whether any pieces of a previous install are there and at least ask before attempting to drop them all on the floor? Possibly within those `while` loops:
zpool "bootpool" already exists. Drop it? (y/N) N
zpool "targetpool" already exists. Drop it? (y/N) Y
zpool "zpool" already exists. Drop it? (y/N) N
geli "md0p1" already exists. Drop it? (y/N) Y
geli "ada0p3" already exists. Drop it? (y/N) N
(there might be similar issues for the `graid` cleanup that don't come into play in my test setup since I'm not doing any flavor of raid).
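The prompts above could be a small wrapper around the existing cleanup loops; a sketch, where the `confirm` helper and `$leftover_pools` variable are hypothetical names of mine, not from zfsboot:

```shell
#!/bin/sh
# Hypothetical sketch of the proposed confirmation step.
confirm() {
    # $1 is the message; the default answer is No
    printf '%s Drop it? (y/N) ' "$1"
    read answer
    case "$answer" in
        [Yy]*) return 0 ;;
        *)     return 1 ;;
    esac
}

# $leftover_pools would be filled in by the installer's existing scan
for pool in $leftover_pools; do
    if confirm "zpool \"$pool\" already exists."; then
        zpool export -f "$pool"
    fi
done
```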
Maddeningly, I commented out lines 782-802 of zfs_create_diskpart(), where those pools get exported and the GELIs get detached, and made sure that there was nothing on md0 (no gpart partitions, no GELI, no zpools; just a zeroed-out device), but something is still detaching my local GELI (and possibly exporting my zroot pool as well) when I confirm "yes, nuke md0".
(In reply to Tim Chase from comment #2)
Are you trying to create a new zpool on md0 with the same name as your currently imported root pool? This will not work.
(In reply to Allan Jude from comment #5)
I already learned that the hard way (it's what got me poking at the zfsboot script in the first place), so I'm issuing
# export ZFSBOOT_POOL_NAME=myzpool
# export ZFSBOOT_BOOT_POOL_NAME=mybootpool
to ensure the pool-names don't conflict with my local pools (the installer lets me change ZFSBOOT_POOL_NAME at run-time but not ZFSBOOT_BOOT_POOL_NAME).
When I examine the md0 device after the hang-up, it does indeed have my specified names as the pool names, so this shouldn't be the problem.
As an absolute worst-case remedy, bsdinstall should attempt to detect that it's running on a live/installed system rather than an installer image and warn that it's only designed to be run from an installer. This would save folks from having their systems hang unexpectedly when their root FS is pulled out from under them as an unintended consequence of attempting something that should be benign.
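Even a crude heuristic would help here; for example (my sketch, not an actual bsdinstall check), inspect the filesystem type of `/` and warn when it isn't the cd9660/tmpfs/memory-disk root an installer image normally has. The `root_fstype` helper is hypothetical, and the parsing assumes the usual "device on / (fstype, options)" shape of mount(8) output:

```shell
#!/bin/sh
# Hedged heuristic: installer media typically mount / from cd9660 or a
# memory disk, while an installed system has a writable zfs/ufs root.
root_fstype() {
    # pull the fstype out of the "(fstype, options)" field for /
    mount | awk '$3 == "/" { gsub(/[(,]/, "", $4); print $4; exit }'
}

case "$(root_fstype)" in
    cd9660|tmpfs|md*) : ;;  # looks like installer media; carry on
    *)
        echo "WARNING: this looks like an installed system, not install" >&2
        echo "media; zfsboot cleanup may detach your root GELI/zpool." >&2
        ;;
esac
```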