Bug 166566 - [zfs] zfs split renders 2 disk (MBR based) mirror unbootable
Summary: [zfs] zfs split renders 2 disk (MBR based) mirror unbootable
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Andriy Gapon
Reported: 2012-04-02 04:10 UTC by hartzell
Modified: 2012-06-30 09:03 UTC
Description hartzell 2012-04-02 04:10:11 UTC
I have a mac pro running 9-STABLE.  Two disks are part of a bootable zfs mirror.  They're MBR based.

------------------------------------------------------------------------------------
(delicious)[8:00pm]~>>gpart show ada1
=>        63  1953525105  ada1  MBR  (931G)
          63  1953525105     1  freebsd  [active]  (931G)

(delicious)[8:01pm]~>>gpart show ada1s1
=>         0  1953525105  ada1s1  BSD  (931G)
           0  1941962752       1  freebsd-zfs  (926G)
  1941962752    11562353       2  freebsd-swap  (5.5G)
------------------------------------------------------------------------------------

The mirror is currently resilvering, unrelated to this bug report.

------------------------------------------------------------------------------------
(delicious)[8:01pm]~>>zpool status zroot
  pool: zroot
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Apr  1 19:55:59 2012
        12.9G scanned out of 523G at 34.7M/s, 4h11m to go
        12.9G resilvered, 2.47% done
config:

	NAME         STATE     READ WRITE CKSUM
	zroot        ONLINE       0     0     0
	  mirror-0   ONLINE       0     0     0
	    ada3s1a  ONLINE       0     0     0
	    ada1s1a  ONLINE       0     0     0  (resilvering)

errors: No known data errors
------------------------------------------------------------------------------------

/boot/loader.conf contains:
vfs.root.mountfrom="zfs:zroot"

and zroot has its bootfs set to zroot.

This system boots from either disk and runs happily.

I tried a zpool split on it

  zpool split zroot zsplitroot

and the system then booted only up to the point where the kernel tried to mount the root filesystem, which failed.

Fix: 

I was able to repair the situation by booting from a 9.0 DVD, loading the zfs kernel module, doing

  zpool import

which showed both pools, followed by

  zpool import -f -o cachefile=/tmp/zpool.cache -o altroot=/mnt zroot
  mount -t zfs zroot /mnt
  cp /tmp/zpool.cache /mnt/boot/zfs/zpool.cache

then destroying the zsplitroot pool and attaching its disk back to the mirror.

How-To-Repeat: I believe that setting up a bootable zfs mirror and running zpool split on it should repeat the problem.  It does for me.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2012-04-02 07:12:29 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 Andriy Gapon freebsd_committer freebsd_triage 2012-04-02 08:15:01 UTC
A few things missing from your report:

1. "Doesn't boot" is quite a poor description in comparison with other details
that you provided.  You should give more detailed information of the boot failure.

2. gpart information for ada3

3. You don't say which disk ended up as zroot and as zsplitroot after the split.

4. You don't say which disk is configured as a boot disk in BIOS.

-- 
Andriy Gapon
Comment 3 hartzell 2012-04-02 16:47:53 UTC
Thanks for following up on this.

Andriy Gapon writes:
 > 
 > A few things missing from your report:
 > 
 > 1. "Doesn't boot" is quite a poor description in comparison with
 > other details that you provided.  You should give more detailed
 > information of the boot failure. 

As the kernel is loading it fails to mount the root partition and
presents one with the minimal mountroot dialog.  Attempting to boot
from zfs:zroot or zfs:zsplitroot fails.  I remember that a question
mark lists various other devices but don't remember the particulars.

 > 2. gpart information for ada3

Identical to ada1.  Both disks have an MBR with one slice, which has a
BSD label with two partitions, a (926GB, type freebsd-zfs) and b
(5.5GB, type freebsd-swap).

   (delicious)[8:45am]~>>gpart show ada1
   =>        63  1953525105  ada1  MBR  (931G)
             63  1953525105     1  freebsd  [active]  (931G)
   
   (delicious)[8:45am]~>>gpart show ada1s1
   =>         0  1953525105  ada1s1  BSD  (931G)
              0  1941962752       1  freebsd-zfs  (926G)
     1941962752    11562353       2  freebsd-swap  (5.5G)
   
   (delicious)[8:46am]~>>gpart show ada3
   =>        63  1953525105  ada3  MBR  (931G)
             63  1953525105     1  freebsd  [active]  (931G)
   
   (delicious)[8:46am]~>>gpart show ada3s1
   =>         0  1953525105  ada3s1  BSD  (931G)
              0  1941962752       1  freebsd-zfs  (926G)
     1941962752    11562353       2  freebsd-swap  (5.5G)

Both have boot bits set up like this:

  gpart bootcode -b /boot/boot0 adaX
  dd if=/boot/zfsboot of=/dev/adaXs1 count=1
  dd if=/boot/zfsboot of=/dev/adaXs1a skip=1 seek=1024

 > 3. You don't say which disk ended up as zroot and as zsplitroot
 > after the split.

zpool status showed only

  zroot ada3s1a

and zpool import showed

  zsplitroot ada1s1a

 > 4. You don't say which disk is configured as a boot disk in BIOS.

This is a mac pro (tower), so BIOS is kind of a slippery concept.  I
leave the 'startup disk' set to the (other) OS X disks.  On power up I
hold down the option key and am presented with a dialog from which I
can select any of the bootable devices in the box.

When things are working correctly I can boot from either of the disks
in the ZFS mirror and things go well.  Now that I've upgraded I can
even pull one of the disks before powering up and boot from the other
(older zfs bootstrapping stuff used to have a problem with broken
mirrors).

After the zfs split I am unable to boot from either disk.

g.
Comment 4 Andriy Gapon freebsd_committer freebsd_triage 2012-04-05 09:10:32 UTC
on 02/04/2012 18:47 George Hartzell said the following:
> 
> Thanks for following up on this.
> 
> Andriy Gapon writes:
>  > 
>  > A few things missing from your report:
>  > 
>  > 1. "Doesn't boot" is quite a poor description in comparison with
>  > other details that you provided.  You should give more detailed
>  > information of the boot failure. 
> 
> As the kernel is loading it fails to mount the root partition and
> presents one with the minimal mountroot dialog.  Attempting to boot
> from zfs:zroot or zfs:zsplitroot fails.  I remember that a question
> mark lists various other devices but don't remember the particulars.

Thank you for additional detailed information.
Could you please set vfs.zfs.debug=1 in your loader.conf and reproduce the
problem and then report messages that appear just before and during mount attempt?
Pictures of your screen would do just fine if you are unable to capture the
messages as text.  If you are unsure what to report please report more rather
than less.


-- 
Andriy Gapon
Comment 5 Andriy Gapon freebsd_committer freebsd_triage 2012-06-26 11:47:50 UTC
[restoring bug-followup]

on 25/06/2012 19:48 George Hartzell said the following:
> Here are two images (more than one screen full) of the lsdev -v output
> from the loader that *actually loads the system* (when it's
> working...).

I have a theory of what's going on.

I believe that after zpool split the following items get updated with new
information:
- vdev label on the disk that remains in the main pool (ada3 + zroot)
- vdev label on the disk that goes to the new pool (ada1 + zsplitroot)
- zpool.cache file in the main/active/remaining pool (zroot)

The following item still has outdated information:
- zpool.cache file in the new pool (zsplitroot)

This happens because the new pool gets the contents of the original pool at
split start time (before any new ids are generated).  The file can not be
updated automatically because the new pool remains "un-imported" (exported)
after the split.  If it is desired that the zsplitroot's zpool.cache is updated
it has to be done manually - by importing the pool, etc.
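
One way to check this theory is to compare the pool GUID stored in a disk's vdev label with the GUIDs recorded in a given zpool.cache.  The following is only a sketch under stated assumptions and was not part of the original analysis: the helper name pool_guids is hypothetical, it assumes that zdb prints lines of the form "pool_guid: <number>", and the cache path in the comments assumes the split-off pool is mounted under /mnt.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct pool GUID found in zdb
# output read from stdin.
# Assumption: zdb emits lines of the form "pool_guid: <number>".
pool_guids() {
    awk '$1 == "pool_guid:" { print $2 }' | sort -u
}

# Intended use (needs root and the pools from this report; the device
# name and cache path are assumptions):
#   zdb -l /dev/ada1s1a | pool_guids                    # GUID in the vdev label
#   zdb -C -U /mnt/boot/zfs/zpool.cache | pool_guids    # GUIDs in that cache file
```

If the GUID from the label does not appear in the output for the cache file, the cache file predates the split and is stale in the sense described above.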

I believe that what you see is a result of you always booting in such a way that
the zfs boot code and zfsloader find zsplitroot pool before zroot pool.  This is
confirmed by the screenshot which shows that zsplitroot is listed before zroot.
Because of that the stale zpool.cache file is used and as a result the ZFS code
in kernel can not find disks/pools based on the stale IDs.

I think that you have to change the boot order using BIOS, so that you boot from
ada3 disk.  You should verify at the loader prompt that that is indeed the case
and zroot is found first and is used as a boot pool.

If your BIOS either doesn't allow changing the boot order, or lies about it, or
doesn't change BIOS disk numbering such that the boot disk is the first drive
(disk0 / "BIOS drive C"), then I recommend that you set the 'currdev' loader
variable to point to the zroot pool.  Depending on your zfsloader version it
should be done in one of the following ways:
set currdev=zfs:zroot:
set currdev=zfs1
You can examine default value of the variable (with 'show' command) to see which
scheme should be used.
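
As a rough sketch, the whole sequence at the loader prompt might look like the following.  This is not a tested recipe: the 'set' line shows only the first of the two schemes above, and the 'unload' step is an assumption that applies when the default loader.rc has already loaded the kernel and zpool.cache before the prompt appears, so that they have to be dropped and loaded again from the newly selected pool.

```
show currdev
set currdev=zfs:zroot:
unload
boot
```

Running 'lsdev -v' first shows which pools the loader found and in what order.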

Please test this.

-- 
Andriy Gapon
Comment 6 hartzell 2012-06-26 22:38:17 UTC
Andriy Gapon writes:
 > [...]
 > Please test this.

We're very close.

First thing, I discovered that I was wrong about its being able to
boot from either disk.  I don't think I ever misspoke to you (don't
see it in the bug report, can't find it in our personal emails) but I
certainly had it in my head that I had tried booting from both disks
and neither worked.  It turns out that one will boot but the other
will not.

Some background (for the bug trail).  This is a mac pro with four
internal SATA disks.  When you power on with the option key held down
you're presented with a graphic view of things that you can boot from.
In my configuration I have two disks labeled "dinky 1" and "dinky 2"
(Mac disks in a Mac OS X software raid) and two labeled "Windows" (the
boot stuff considers anything with an MBR to be Windows, sigh...).
Macs play fast and loose with device numbering/naming and I have no
way to tell the Windows devices apart.  For the rest of this
discussion I'll just refer to the four drives as A, B, C, and D (dinky
1, dinky 2, Windows, Windows) in left to right order.  

Out of habit, I tend to boot into BSD on drive C (two hits of the
right arrow key).

While I was playing with setting currdev in the loader I realized that
I could boot from disk D but not from disk C (no matter how I set
currdev).  

It turns out that when I boot from drive C and do an lsdev -v at the
loader prompt I get

   ...
   zsplitroot  ada1s1a
   zroot       ada3s1a

but when I boot from drive D and do an lsdev -v at the loader
prompt I get

   ...
   zroot       ada3s1a
   zsplitroot  ada1s1a

Notice that the order is different (confirming your observation).

When I boot off of C using vfs.zfs.debug=1 I get messages about
mismatched GUIDs and failure to open the device it's looking for.
When I boot off of D things are fine.  This is consistent with your
idea that there is incorrect information in the zpool.cache file on
the filesystem in the zsplitroot pool.

currdev does not seem to have any effect; it looks like something else
is being used to find the initial zfs pool(s).

I'm not sure what there is to do to make the situation better.  Using
zpool split on the pool that contains the filesystem that holds the
zpool.cache file is (probably) not the usual use case, so it would be
an awfully special case to handle.

It seems like "the right thing" to do would be for the user (me) to do
the zpool split with the -R option then copy the correct zpool.cache
file into the split-off pool's root filesystem.  I'll repair my
currently broken mirror and give that a try.

Thanks for all the help!

g.
Comment 7 Andriy Gapon freebsd_committer freebsd_triage 2012-06-27 06:49:44 UTC
on 27/06/2012 00:38 George Hartzell said the following:
> currdev does not seem to have any effect, it looks like something else
> is being used to find the initial zfs pool(s).

Just a note that currdev would not affect the order of the pools in lsdev
output.  It should affect from which pool the zpool.cache is loaded.

Ah!  You probably need to issue unload command as well.  I keep forgetting that
in default configuration loader loads up stuff before presenting its menu.  I've
changed my loader.rc, so that nothing is loaded before the menu.

But, yes, the best course of action seems to be to fix up zsplitroot right after
splitting it off.

Thank you for your persistence in testing and debugging!

-- 
Andriy Gapon
Comment 8 hartzell 2012-06-27 22:53:21 UTC
Andriy Gapon writes:
 > on 27/06/2012 00:38 George Hartzell said the following:
 > > currdev does not seem to have any effect, it looks like something else
 > > is being used to find the initial zfs pool(s).
 > 
 > Just a note that currdev would not affect the order of the pools in lsdev
 > output.  It should affect from which pool the zpool.cache is loaded.
 > 
 > Ah!  You probably need to issue unload command as well.  I keep forgetting that
 > in default configuration loader loads up stuff before presenting its menu.  I've
 > changed my loader.rc, so that nothing is loaded before the menu.
 > 
 > But, yes, the best course of action seems to be to fix up zsplitroot right after
 > splitting it off.
 > 
 > Thank you for your persistence in testing and debugging!

I thought the following would work, but it does not.

  zpool split -R /zsplitroot zroot zsplitroot
  zpool status  # shows both pools.
  mount -t zfs zsplitroot /zsplitroot  # my zfs stuff doesn't auto mount
  cp /boot/zfs/zpool.cache /zsplitroot/boot/zfs
  perl -pi.bak -e 's|zfs:zroot|zfs:zsplitroot|' /zsplitroot/boot/loader.conf
  umount /zsplitroot

It fails to mount zsplitroot.  Worse, setting vfs.zfs.debug=1 results
in no additional output, just that the error is number 2.

Any idea what I'm missing?

g.
Comment 9 Andriy Gapon freebsd_committer freebsd_triage 2012-06-28 07:48:26 UTC
on 28/06/2012 00:53 George Hartzell said the following:
> Andriy Gapon writes:
>  > [...]
> 
> I thought the following would work, but it does not.
> 
>   zpool split -R /zsplitroot zroot zsplitroot
>   zpool status  # shows both pools.
>   mount -t zfs zsplitroot /zsplitroot  # my zfs stuff doesn't auto mount
>   cp /boot/zfs/zpool.cache /zsplitroot/boot/zfs
>   perl -pi.bak -e 's|zfs:zroot|zfs:zsplitroot|' /zsplitroot/boot/loader.conf
>   umount /zsplitroot
> 
> It fails to mount zsplitroot.  Worse, setting vfs.zfs.debug=1 results
> in no additional output, just that the error is number 2.
> 
> Any idea what I'm missing?


/boot/zfs/zpool.cache after split contains only information about zroot.  Thus
it's kind of useless on zsplitroot.
I think that you need to do zpool import -R ... -c ... zsplitroot and copy the
proper cache file.
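
A sketch of that fix-up, modeled on the repair commands from the original description and not confirmed to work in this report.  The function name fixup_plan is hypothetical, and the pool name, altroot and scratch cache path are assumptions.  The function only prints the commands so that they can be reviewed before being run by hand as root.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands that would write
# a fresh zpool.cache into a split-off root pool.
#   $1  split-off pool name            (assumed: zsplitroot)
#   $2  temporary altroot/mount point  (assumed: /mnt)
#   $3  scratch cache file             (assumed: /tmp/zpool.cache)
fixup_plan() {
    pool=$1
    altroot=$2
    cache=$3

    # Import the pool so that a cache file describing it gets written.
    echo "zpool import -f -o cachefile=$cache -o altroot=$altroot $pool"
    # Mount the pool's root filesystem by hand (it does not auto mount here).
    echo "mount -t zfs $pool $altroot"
    # Put the fresh cache file where the loader and kernel look for it.
    echo "cp $cache $altroot/boot/zfs/zpool.cache"
}

fixup_plan zsplitroot /mnt /tmp/zpool.cache
```

The loader.conf inside the split-off pool would presumably also need vfs.root.mountfrom changed to zfs:zsplitroot, as in the attempt quoted above.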

-- 
Andriy Gapon
Comment 10 hartzell 2012-06-30 00:33:37 UTC
Andriy Gapon writes:
 > on 28/06/2012 00:53 George Hartzell said the following:
 > [...]
 > > I thought the following would work, but it does not.
 > > 
 > >   zpool split -R /zsplitroot zroot zsplitroot
 > >   zpool status  # shows both pools.
 > >   mount -t zfs zsplitroot /zsplitroot  # my zfs stuff doesn't auto mount
 > >   cp /boot/zfs/zpool.cache /zsplitroot/boot/zfs
 > >   perl -pi.bak -e 's|zfs:zroot|zfs:zsplitroot|' /zsplitroot/boot/loader.conf
 > >   umount /zsplitroot
 > > 
 > > It fails to mount zsplitroot.  Worse, setting vfs.zfs.debug=1 results
 > > in no additional output, just that the error is number 2.
 > > 
 > > Any idea what I'm missing?
 > 
 > 
 > /boot/zfs/zpool.cache after split contains only information about zroot.  Thus
 > it's kind of useless on zsplitroot.
 > I think that you need to do zpool import -R ... -c ... zsplitroot and copy the
 > proper cache file.

I thought that adding the "-R /zsplitroot" arg to the zpool split, so
that it also did the import, would result in a zpool.cache file that
contained both.  zpool status after the split shows both pools, which
I didn't think was the case if you don't use -R.

g.
Comment 11 Andriy Gapon freebsd_committer freebsd_triage 2012-06-30 08:59:33 UTC
on 30/06/2012 02:33 George Hartzell said the following:
> I thought that adding the "-R /zsplitroot" arg to the zpool split, so
> that it also did the import, would result in a zpool.cache file that
> contained both.  zpool status after the split shows both pools, which
> I didn't think was the case if you don't use -R.

With -R, zsplitroot is added to the main zpool.cache in /boot/zfs (on zroot).
Nothing is done with the zpool.cache in zsplitroot, as far as I understand.

-- 
Andriy Gapon
Comment 12 Andriy Gapon freebsd_committer freebsd_triage 2012-06-30 09:01:47 UTC
State Changed
From-To: open->closed

Analysis has not revealed any FreeBSD ZFS bug. 


Comment 13 Andriy Gapon freebsd_committer freebsd_triage 2012-06-30 09:01:47 UTC
Responsible Changed
From-To: freebsd-fs->avg

Record interest in further developments on this report.