Bug 140888 - [zfs] boot fails from ZFS root while the pool is resilvering
Summary: [zfs] boot fails from ZFS root while the pool is resilvering
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 8.0-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-26 07:20 UTC by Alexei Volkov
Modified: 2017-12-31 22:34 UTC

See Also:


Attachments

Description Alexei Volkov 2009-11-26 07:20:01 UTC
On a system that boots directly from a ZFS mirror or raidz pool (http://wiki.freebsd.org/RootOnZFS), replacing one of the components and then suffering an accidental power failure or reboot while the resilver is running stops the boot process with the message:

ZFS: can only boot from disk, mirror or raidz vdevs
ZFS: inconsistent nvlist contents
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS
ZFS: unexpected object set type lld
ZFS: unexpected object set type lld

FreeBSD/i386 boot
Default: tank0:/boot/kernel/kernel
boot:
ZFS: unexpected object set type lld

FreeBSD/i386 boot
Default: tank0:/boot/kernel/kernel
boot:

Fix: 

As a workaround, boot from CD/DVD in Fixit mode, then import the pool and wait for the resilver to complete.
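
A minimal sketch of that workaround, assuming the pool name tank0 from the examples below and that the Fixit/livefs environment provides the ZFS kernel modules (the kldload step may be unnecessary if they are already loaded):

# Boot the install CD/DVD and drop into the Fixit shell
Fixit# kldload zfs                      # load ZFS support if not already present
Fixit# zpool import -f -R /mnt tank0    # force-import the pool under an alternate root
Fixit# zpool status tank0               # the resilver resumes; rerun until it reports completion
Fixit# zpool export tank0               # cleanly export the pool, then reboot from disk
Fixit# reboot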
How-To-Repeat: Install the system as described at http://wiki.freebsd.org/RootOnZFS using a non-single-device configuration, i.e. mirror or raidz.
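
A minimal sketch of the pool-creation step from that guide, assuming the GPT labels shown in the output below and that the root filesystem lives on the pool's root dataset; the guide itself also covers partitioning, installing gptzfsboot, and the loader.conf setup:

[root@fresh-inst:~]# zpool create tank0 raidz /dev/gpt/QM00002 /dev/gpt/SN091234
[root@fresh-inst:~]# zpool set bootfs=tank0 tank0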

Eventually you have something like:

[root@fresh-inst:~]# zpool status
  pool: tank0
 state: ONLINE
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        tank0             ONLINE       0     0     0
          raidz1          ONLINE       0     0     0
            gpt/QM00002   ONLINE       0     0     0
            gpt/SN091234  ONLINE       0     0     0

errors: No known data errors

Let's assume that one of the components is in a pre-fail condition and has to be replaced with a new one. Power off the system and replace one of the HDDs with another one. Boot back into the OS; it still boots fine at this point.

Run zpool status to see the missing component.

[root@fresh-inst:~]# zpool status
  pool: tank0
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        tank0             DEGRADED     0     0     0
          raidz1          DEGRADED     0     0     0
            gpt/QM00002   UNAVAIL      0   327     0  cannot open
            gpt/SN091234  ONLINE       0     0     0

errors: No known data errors

Partition the new disk as required and get the new GPT component ready for the zpool replacement (a sketch of the commands used follows the output below).

[root@fresh-inst:~]# gpart show -l
=>     34  8388541  ad0  GPT  (4.0G)
       34      128    1  (null)  (64K)
      162  8388413    2  SN091234  (4.0G)

=>     34  8388541  ad1  GPT  (4.0G)
       34      128    1  (null)  (64K)
      162  8388413    2  SN023432  (4.0G)
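
For reference, the second disk above could have been prepared with something like the following (the disk name ad1 and label SN023432 are taken from the gpart output; the boot partition and gptzfsboot step follow the RootOnZFS guide):

[root@fresh-inst:~]# gpart create -s GPT ad1
[root@fresh-inst:~]# gpart add -b 34 -s 128 -t freebsd-boot ad1
[root@fresh-inst:~]# gpart add -t freebsd-zfs -l SN023432 ad1
[root@fresh-inst:~]# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ad1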

Run the replacement command.

[root@fresh-inst:~]# zpool replace tank0 gpt/QM00002 gpt/SN023432

[root@fresh-inst:~]# zpool status
  pool: tank0
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 7.88% done, 0h4m to go
config:

        NAME                STATE     READ WRITE CKSUM
        tank0               DEGRADED     0     0     0
          raidz1            DEGRADED     0     0     0
            replacing       DEGRADED     0     0     0
              gpt/QM00002   UNAVAIL      0 2.17K     0  cannot open
              gpt/SN023432  ONLINE       0     0     0  39.5M resilvered
            gpt/SN091234    ONLINE       0     0     0  372K resilvered

errors: No known data errors

Initiate a regular reboot (an instant power failure could be simulated instead).

[root@fresh-inst:~]# reboot

The system fails to boot with the following message:

Booting from Hard Disk...
ZFS: can only boot from disk, mirror or raidz vdevs
ZFS: inconsistent nvlist contents
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS
ZFS: unexpected object set type lld
ZFS: unexpected object set type lld

FreeBSD/i386 boot
Default: tank0:/boot/kernel/kernel
boot:
ZFS: unexpected object set type lld

FreeBSD/i386 boot
Default: tank0:/boot/kernel/kernel
boot:
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2009-11-26 08:08:58 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 kot 2009-11-26 11:02:12 UTC
I found that it keeps failing to boot if at least one device is not ONLINE
and the pool state is DEGRADED.

For instance

[root@livecd8:/]# zpool status
  pool: tank0
 state: DEGRADED
 scrub: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        tank0                       DEGRADED     0     0     0
          raidz1                    DEGRADED     0     0     0
            replacing               DEGRADED     0     0     0
              12996219703647995136  UNAVAIL      0   298     0  was /dev/gpt/QM00002
              gpt/SN023432          ONLINE       0     0     0
            gpt/SN091234            ONLINE       0     0     0

errors: No known data errors

The pool is considered degraded even though gpt/QM00002 has been replaced with the new gpt/SN023432.

Detaching the UNAVAIL component turns the pool back to the ONLINE state.

 [root@livecd8:/]# zpool detach tank0 12996219703647995136
 [root@livecd8:/]# zpool status
   pool: tank0
  state: ONLINE
  scrub: none requested
 config:

         NAME              STATE     READ WRITE CKSUM
         tank0             ONLINE       0     0     0
           raidz1          ONLINE       0     0     0
             gpt/SN023432  ONLINE       0     0     0
             gpt/SN091234  ONLINE       0     0     0

 errors: No known data errors

In this case the system can boot from tank0.

It also keeps booting fine if a component is manually turned to the
OFFLINE state, in any combination, for instance:

[root@fresh-inst:~]# zpool status
  pool: tank0
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME              STATE     READ WRITE CKSUM
        tank0             DEGRADED     0     0     0
          raidz1          DEGRADED     0     0     0
            gpt/SN023432  ONLINE       0     0     0
            gpt/SN091234  OFFLINE      0   921     0

errors: No known data errors
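
For completeness, the OFFLINE state shown above can be produced and reverted manually (device name taken from the output above):

[root@fresh-inst:~]# zpool offline tank0 gpt/SN091234
[root@fresh-inst:~]# zpool online tank0 gpt/SN091234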
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:59:14 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped