Bug 167272 - [zfs] ZFS Disks reordering causes ZFS to pick the wrong drive
Summary: [zfs] ZFS Disks reordering causes ZFS to pick the wrong drive
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 8.2-RELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-24 19:50 UTC by David Alves
Modified: 2017-12-31 22:32 UTC
CC List: 0 users

See Also:


Attachments

Description David Alves 2012-04-24 19:50:10 UTC
Hello,

ZFS shows the disk labels when invoking zpool status. If a disk is physically removed and the server is rebooted, the disks are reordered, and the old label ends up being used by a different, valid disk (the slot of the removed disk does not contain a new disk).

ZFS reports it as follows:

	  raidz2       DEGRADED     0     0     0
	    da16       ONLINE       0     0     0
	    da17       ONLINE       0     0     0
	    da18       ONLINE       0     0     0
	    da19       ONLINE       0     0     0
	    da20       ONLINE       0     0     0
	    da21       OFFLINE      0     0     0
	    da21       ONLINE       0     0     0
	    da22       ONLINE       0     0     0
	  raidz2       DEGRADED     0     0     0
	    da23       ONLINE       0     0     0
	    da24       ONLINE       0     0     0
	    da25       ONLINE       0     0     0
	    da26       ONLINE       0     0     0
	    da27       ONLINE       0     0     0
	    da27       OFFLINE      0     0     0
	    da29       ONLINE       0     0     0
	    da30       ONLINE       0     0     0




Notice the da21 and da27 drives.
The old disks da21/da27 are shown offline (because they were offlined and removed), but the reordering has assigned those labels to other running drives.

The problem is that when performing a "zpool replace", ZFS picks the first entry matching that label when attempting to replace a disk.
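
For illustration, a minimal sketch of the ambiguous invocation (the pool name is not given in this report, so "tank" is assumed here):

	zpool replace tank da21 da31

With two vdevs both listed as da21, the command cannot say which of them is meant, so the first match is taken, as the examples below show.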

Example when replacing da21:

It picked the offline da21 drive to replace because it was the first one in the list.

	  raidz2       DEGRADED     0     0     0
	    da16       ONLINE       0     0     0
	    da17       ONLINE       0     0     0
	    da18       ONLINE       0     0     0
	    da19       ONLINE       0     0     0
	    da20       ONLINE       0     0     0
	    replacing  DEGRADED     0     0     0
	      da21     OFFLINE      0     0     0
	      da31     ONLINE       0     0     0  37.1G resilvered
	    da21       ONLINE       0     0     0
	    da22       ONLINE       0     0     1  512 resilvered
	  raidz2       DEGRADED     0     0     0
	    da23       ONLINE       0     0     0
	    da24       ONLINE       0     0     0
	    da25       ONLINE       0     0     0
	    da26       ONLINE       0     0     0
	    da27       ONLINE       0     0     0
	    da27       OFFLINE      0     0     0
	    da29       ONLINE       0     0     0
	    da30       ONLINE       0     0     0

Example when replacing da27:

It picked the online da27 drive to replace because it was the first one in the list.

	  raidz2       ONLINE       0     0     0
	    da16       ONLINE       0     0     0
	    da17       ONLINE       0     0     0
	    da18       ONLINE       0     0     0
	    da19       ONLINE       0     0     0
	    da20       ONLINE       0     0     0
	    da31       ONLINE       0     0     0
	    da21       ONLINE       0     0     0
	    da22       ONLINE       0     0     1
	  raidz2       DEGRADED     0     0     0
	    da23       ONLINE       0     0     0
	    da24       ONLINE       0     0     0
	    da25       ONLINE       0     0     0
	    da26       ONLINE       0     0     0
	    replacing  ONLINE       0     0     0
	      da27     ONLINE       0     0     0
	      da28     ONLINE       0     0     0  80.5G resilvered
	    da27       OFFLINE      0     0     0
	    da29       ONLINE       0     0     0
	    da30       ONLINE       0     0     0


It would be nice if we could choose exactly which drive from the pool we are going to replace.

Thank you.

How-To-Repeat: To repeat the problem (a command sketch follows the steps below):

offline a drive
remove the drive
reboot
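
A minimal sketch of those steps as commands, assuming a pool named "tank" and a member disk da21 (neither name is taken from the report):

	zpool offline tank da21
	# physically pull the da21 disk from its slot
	shutdown -r now

After the reboot, zpool status shows the remaining disks renumbered, and the label of the removed disk can end up attached to another running drive.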
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2012-04-25 06:00:27 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-fs

Over to maintainer(s).
Comment 2 ichhasseesnamenauszuwaehlen 2012-04-26 16:40:19 UTC
You can use the "zdb" command to see the guid of the disk, and use that
to replace it.

eg.

        children[2]:
            type: 'disk'
            id: 2
            guid: 713716085525288922
            path: '/dev/da21'
            phys_path: '/dev/da21'
            whole_disk: 1
            not_present: 1
            metaslab_array: 13154506
            metaslab_shift: 25
            ashift: 9
            asize: 4289200128
            is_log: 1
            DTL: 13647709
            create_txg: 2302081

zpool replace <poolname> 713716085525288922 da31
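
For reference, a sketch of how the configuration above can be dumped (the exact zdb invocation is not given in this comment; "tank" is an assumed pool name):

	zdb -C tank      # print the pool configuration, including each child vdev's guid and path

Find the child whose path matches the stale label and pass its guid to zpool replace as shown above.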

Also, I had a loosely related problem with 8.2-RELEASE (or it may have been the 8-STABLE from April that is in an ISO on the FTP site), where rebooting would make ZFS use the wrong disk in the pool (e.g. my hot spare might end up being part of the raidz2 and constantly create checksum errors until I scrubbed it or moved it back). When I tried warning others about it, they would always say that I made no sense, that ZFS uses the GUID so it can't get confused, and that it only shows the label/device in the command output. I tried reproducing it, and it was impossible for me to reproduce with 8-STABLE (October 2011). Your problem isn't quite the same, but it may be related.

Also, in 8-STABLE, any time I do weird things like you did, the disk shows its GUID on the next reboot instead of the old device name, with a note on the right saying "was da21". (But I mostly use GPT slices, so I might just not have enough experience with that particular problem.)

So I highly recommend you try 8.3-RELEASE (or 8-STABLE).

And one more warning: if you upgrade to something newer than 8.2-RELEASE, and have an old pool from 8.2-RELEASE that you upgrade to v28, you still can't remove log devices. It is bugged; you need to destroy and recreate the pool.
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:01:22 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped