Bug 214911 - bsdinstall/scripts/zfsboot detaches *all* GELI devices
Summary: bsdinstall/scripts/zfsboot detaches *all* GELI devices
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 11.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-sysinstall (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-28 17:39 UTC by Tim Chase
Modified: 2016-11-29 17:01 UTC
1 user

See Also:


Attachments

Description Tim Chase 2016-11-28 17:39:09 UTC
I'm using `bsdinstall` to build an `md0` disk image to upload to my hosting service. This worked in 10.3, but changes in 11 trigger issues. To reproduce, issue the following on an existing 11.0 system installed in a ZFS-on-GELI configuration:

    ## show the current system is on GELI
    # geli list | grep Name
    Name: ada0p4.eli
    Name: ada0p4
    Name: ada0p3.eli
    Name: ada0p3

    ## show that ada0p4.eli is the only backing member of zroot
    # zpool status zroot
    ...
        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          ada0p4.eli  ONLINE       0     0     0

    ## create a 10GB disk image file
    # dd if=/dev/zero of=freebsd.img bs=10m count=1k

    ## turn it into a memory disk
    # mdconfig -f freebsd.img -u 0

    ## run bsdinstall
    # bsdinstall

Specify guided ZFS-on-root, a single stripe backed by `md0`, and encrypt both the pool and the swap.  Proceeding with the install will show messages on the console that *ada0p4.eli* has been detached, and the host machine will hang for the obvious reason: its underlying GEOM_ELI provider has been forcibly detached and there is no longer any root filesystem.

Digging further shows that /usr/src/usr.sbin/bsdinstall/scripts/zfsboot forcibly detaches *all* GELI providers (around line 793, using GELI_DETACH_F), not just those it created during the install process.
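For illustration, the cleanup could instead be restricted to providers that sit on the disks selected for the install. This is a minimal sketch, not the actual zfsboot code; `filter_geli_providers` and the way the target disks are passed in are assumptions:

```shell
#!/bin/sh
# Sketch only: restrict GELI cleanup to providers backed by the target
# disks. filter_geli_providers and its argument convention are
# hypothetical, not part of the real zfsboot script.

# $1 = newline-separated provider names, $2 = space-separated target disks
filter_geli_providers() {
    providers=$1 disks=$2
    echo "$providers" | while read -r p; do
        for d in $disks; do
            case "$p" in
            "$d"*) echo "$p"; break ;;
            esac
        done
    done
}

# With md0 as the only target disk, the live system's ada0p3.eli and
# ada0p4.eli are left alone and only md0p1.eli would be detached:
providers="ada0p3.eli
ada0p4.eli
md0p1.eli"
filter_geli_providers "$providers" md0    # prints: md0p1.eli
```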
Comment 1 Allan Jude freebsd_committer 2016-11-28 19:39:17 UTC
The point of this code is to detach anything left over from previous runs of the installer, since no state is kept between runs.

I didn't consider the case of people running bsdinstall on a live system. I am not sure how to deal with both of these cases at the same time.
Comment 2 Tim Chase 2016-11-28 19:56:14 UTC
When issuing "bsdinstall -D debug.log", the tail of my log (right before the hang/freeze) is:

DEBUG: zfs_create_diskpart: disk=[md0] index=[0]
DEBUG: zfs_create_diskpart: Exporting ZFS pools...   
DEBUG: zfs_create_diskpart: zpool export -f "zroot"

So the attempt to "zpool export -f" my root pool also seems to be taking down the GELI provider, according to my console.  But I need zroot to stick around, since that's where bsdinstall and its supporting files live.
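A guard before the forced export could skip any pool backing the running system's root dataset. This is a sketch under the assumption that the root dataset name is available to compare against each export candidate; `safe_to_export` is a hypothetical helper, not existing zfsboot code:

```shell
#!/bin/sh
# Sketch only: refuse to export a pool that backs the mounted root.
# On a real system the root dataset could be read from mount output;
# here it is passed in so the logic is self-contained.

# $1 = pool name, $2 = dataset currently mounted as /
safe_to_export() {
    pool=$1 root_dataset=$2
    case "$root_dataset" in
    "$pool"/*|"$pool") return 1 ;;  # pool backs /, do not export it
    *)                 return 0 ;;
    esac
}

# On the reporter's system root is (say) zroot/ROOT/default,
# so zroot must be skipped while other pools may be exported:
if safe_to_export zroot zroot/ROOT/default; then
    echo "export zroot"
else
    echo "skip zroot"    # prints: skip zroot
fi
```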
Comment 3 Tim Chase 2016-11-28 20:04:41 UTC
(In reply to Allan Jude from comment #1)

Is it possible to detect whether any pieces of a previous install are present and at least ask before dropping them all on the floor?  Possibly within those `while` loops:

  zpool "bootpool" already exists. Drop it? (y/N) N
  zpool "targetpool" already exists. Drop it? (y/N) Y
  zpool "zpool" already exists. Drop it? (y/N) N
  geli "md0p1" already exists. Drop it? (y/N) Y
  geli "ada0p3" already exists. Drop it? (y/N) N

(there might be similar issues for the `graid` cleanup that don't come into play in my test setup since I'm not doing any flavor of raid).
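The proposed prompts could be sketched roughly as follows; `confirm()` and the exact wording are hypothetical, and the `zpool` loop is shown in comments since it only makes sense on a system with pools present:

```shell
#!/bin/sh
# Sketch only: ask before destroying each leftover pool or provider.
# confirm() and the prompt text are hypothetical, not from zfsboot.

confirm() {
    printf '%s already exists. Drop it? (y/N) ' "$1"
    read -r answer
    case "$answer" in
    [Yy]*) return 0 ;;
    *)     return 1 ;;
    esac
}

# As it might appear in zfsboot's cleanup loop:
#   for pool in $(zpool list -H -o name); do
#       confirm "zpool \"$pool\"" && zpool export -f "$pool"
#   done

# Non-interactive demonstration of the default-deny behavior:
echo n | confirm 'geli "ada0p3"' || echo 'keeping ada0p3'
```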
Comment 4 Tim Chase 2016-11-28 20:43:59 UTC
Maddeningly, I commented out lines 782-802 of zfs_create_diskpart(), where those pools get exported and the GELIs get detached, and made sure there was nothing on md0 (no gpart partitions, no geli, no zpools; just a zeroed-out device), but something is still detaching my local GELI (and possibly exporting my zroot pool as well) when I confirm "yes, nuke md0".
Comment 5 Allan Jude freebsd_committer 2016-11-28 20:48:13 UTC
(In reply to Tim Chase from comment #2)
Are you trying to create a new zpool on md0 with the same name as your currently imported root pool? This will not work.
Comment 6 Tim Chase 2016-11-28 21:23:42 UTC
(In reply to Allan Jude from comment #5)

TL;DR: no.

I already learned that the hard way (and what got me poking in the zfsboot script in the first place), so I'm issuing 

  # export ZFSBOOT_POOL_NAME=myzpool
  # export ZFSBOOT_BOOT_POOL_NAME=mybootpool
  # bsdinstall

to ensure the pool-names don't conflict with my local pools (the installer lets me change ZFSBOOT_POOL_NAME at run-time but not ZFSBOOT_BOOT_POOL_NAME).

When I examine the md0 device after the hang-up, it does indeed have my specified names as the pool names, so this shouldn't be the problem.
Comment 7 Tim Chase 2016-11-29 17:01:24 UTC
As an absolute worst-case remedy, it should attempt to detect that it's running on a live/installed system rather than an installer image and warn that it's only designed to be run from an installer.  This would save folks from having their systems hang unexpectedly when their root FS is pulled out from under them as an unintended consequence of attempting something that should be benign.
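One possible heuristic for such a check, purely as a sketch: installer media typically mounts / from cd9660, while an installed system mounts / from ufs or zfs. Both `is_live_system` and the heuristic itself are assumptions, not existing bsdinstall behavior, and mfs-root variants would need more care:

```shell
#!/bin/sh
# Sketch only: guess whether we are on an installed system based on the
# filesystem type of /. The helper name and heuristic are hypothetical.

# $1 = filesystem type of /, e.g. from: mount -p | awk '$2 == "/" { print $3 }'
is_live_system() {
    case "$1" in
    cd9660) return 1 ;;  # root mounted from install media
    *)      return 0 ;;  # ufs, zfs, etc.: probably an installed system
    esac
}

if is_live_system zfs; then
    echo "WARNING: zfsboot is intended to be run from installer media only."
fi
```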