Bug 256582

Summary: ZFS unable to attach/replace disk to the mirror/raidz
Product: Base System
Reporter: Sergei Masharov <serzh>
Component: kern
Assignee: freebsd-fs (Nobody) <fs>
Status: Closed (Works As Intended)
Severity: Affects Only Me
CC: asomers, grahamperrin
Priority: ---    
Version: 13.0-RELEASE   
Hardware: amd64   
OS: Any   

Description Sergei Masharov 2021-06-13 18:21:48 UTC
I had a mirror of the root partition on four disks. It was created a long time ago; the last changes were made on 12.2-RELEASE.

Yesterday I ran into a problem attaching a new mirror device to the pool; every time I got an error like this:
cannot attach ada0s1 to diskid/DISK-K648T9125YHPs4: can only attach to mirrors and top-level disks

replace also didn't work for me:
zpool replace tz diskid/DISK-K648T9125YHPs4 ada0s1
cannot replace diskid/DISK-K648T9125YHPs4 with ada0s1: already in replacing/spare config; wait for completion or use 'zpool detach'

I was able to detach all the copies of the mirror, but even with only one drive left I was still not able to add any mirror devices to that pool.

I even tried to do it on another machine, with the same error.

OK, maybe there is some problem with the pool itself; I had wanted for a long time to replace the mirror with a raidz anyway.

I created a new raidz pool on one disk, hoping to replace the slices after a successful boot on the target machine. zfs send/receive finished and the system successfully booted from the new disk:

# zpool status -v t1
  pool: t1
 state: ONLINE
  scan: scrub repaired 0B in 00:10:16 with 0 errors on Sun Jun 13 19:43:27 2021
config:

        NAME        STATE     READ WRITE CKSUM
        t1          ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0s1  ONLINE       0     0     0
            ada0s2  ONLINE       0     0     0
            ada0s3  ONLINE       0     0     0
            ada0s4  ONLINE       0     0     0

But I am still unable to replace any disk in the pool:

# zpool replace t1 ada0s4 ada2s4
cannot replace ada0s4 with ada2s4: already in replacing/spare config; wait for completion or use 'zpool detach'

# zdb -C t1

MOS Configuration:
        version: 5000
        name: 't1'
        state: 0
        txg: 1975
        pool_guid: 6410165186213320141
        errata: 0
        hostname: 'proxy.expir.org'
        com.delphix:has_per_vdev_zaps
        vdev_children: 1
        vdev_tree:
            type: 'root'
            id: 0
            guid: 6410165186213320141
            create_txg: 4
            children[0]:
                type: 'raidz'
                id: 0
                guid: 7617222505941286006
                nparity: 1
                metaslab_array: 256
                metaslab_shift: 30
                ashift: 9
                asize: 171779817472
                is_log: 0
                create_txg: 4
                com.delphix:vdev_zap_top: 129
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 8241112573820439777
                    path: '/dev/ada0s1'
                    whole_disk: 1
                    DTL: 1111
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 130
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 9546081964875052177
                    path: '/dev/ada0s2'
                    whole_disk: 1
                    DTL: 1110
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 131
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 14962733358913721879
                    path: '/dev/ada0s3'
                    whole_disk: 1
                    DTL: 1109
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 132
                children[3]:
                    type: 'disk'
                    id: 3
                    guid: 6900695808444206279
                    path: '/dev/ada0s4'
                    whole_disk: 1
                    DTL: 1108
                    create_txg: 4
                    com.delphix:vdev_zap_leaf: 133
        features_for_read:
            com.delphix:hole_birth
            com.delphix:embedded_data

I tried to add new disks as spares, but I was still not able to replace any of the current members of the raidz.

What is going on? I think it is a bug; the error is certainly not related to the current configuration.
Comment 1 Alan Somers freebsd_committer freebsd_triage 2021-06-13 20:44:07 UTC
The first error message suggests that you tried doing "zpool attach" when you meant "zpool replace".  But the second error message might indicate a bug.  Could you please show the output of "zdb -l /dev/ada0s1"?
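For reference, a minimal sketch of the difference between the two commands, using the device names from this report (whether the intent here was to grow the mirror or to swap a disk is an assumption):

"zpool attach" adds the new device as another mirror of the existing one, so both remain in the pool:
# zpool attach tz diskid/DISK-K648T9125YHPs4 ada0s1

"zpool replace" resilvers the data onto the new device and then drops the old one:
# zpool replace tz diskid/DISK-K648T9125YHPs4 ada0s1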
Comment 2 Sergei Masharov 2021-06-14 08:13:48 UTC
I tried both attach and replace on the old pool.
After migration to the new one, only replace is possible.
It seems the error migrated to the new pool along with the data after send/receive.

# zdb -l /dev/ada0s1
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 't1'
    state: 0
    txg: 1975
    pool_guid: 6410165186213320141
    errata: 0
    hostname: 'proxy.expir.org'
    top_guid: 7617222505941286006
    guid: 8241112573820439777
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 7617222505941286006
        nparity: 1
        metaslab_array: 256
        metaslab_shift: 30
        ashift: 9
        asize: 171779817472
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 8241112573820439777
            path: '/dev/ada0s1'
            whole_disk: 1
            DTL: 1111
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 9546081964875052177
            path: '/dev/ada0s2'
            whole_disk: 1
            DTL: 1110
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 14962733358913721879
            path: '/dev/ada0s3'
            whole_disk: 1
            DTL: 1109
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 6900695808444206279
            path: '/dev/ada0s4'
            whole_disk: 1
            DTL: 1108
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
Comment 3 Alan Somers freebsd_committer freebsd_triage 2021-06-14 12:37:58 UTC
Wait.  In your original post you said that "zpool replace" didn't work, and now you say that it did.  Which is it?  That "zdb -l" output you pasted looks like it's from a disk that is a member of the pool.
Comment 4 Sergei Masharov 2021-06-14 15:03:18 UTC
Neither attach nor replace worked on the original pool. I migrated the data to the new pool via zfs send/receive. The new one is a raidz, and I am again unable to do a replace on it.
Comment 5 Sergei Masharov 2021-06-14 15:06:06 UTC
At the end, zdb gave me this:

ZFS_DBGMSG(zdb) START:
spa.c:5070:spa_open_common(): spa_open_common: opening t1
spa_misc.c:411:spa_load_note(): spa_load(t1, config trusted): LOADING
vdev.c:131:vdev_dbgmsg(): disk vdev '/dev/ada0s1': best uberblock found for spa t1. txg 4309
spa_misc.c:411:spa_load_note(): spa_load(t1, config untrusted): using uberblock with txg=4309
spa_misc.c:411:spa_load_note(): spa_load(t1, config trusted): spa_load_verify found 0 metadata errors and 1 data errors
spa.c:8246:spa_async_request(): spa=t1 async request task=2048
spa_misc.c:411:spa_load_note(): spa_load(t1, config trusted): LOADED
ZFS_DBGMSG(zdb) END
Comment 6 Alan Somers freebsd_committer freebsd_triage 2021-06-14 15:16:32 UTC
So "zpool replace" wasn't working at first, then it was working, and now it isn't?  Please show the _exact_ commands that you are typing, and the "zpool status" output at the time.  If "zpool replace" fails, then show the output of "zdb -l" on the disk you were trying to insert into the pool.
Comment 7 Sergei Masharov 2021-06-14 19:13:13 UTC
No, zpool replace never worked.
I had hoped that migrating the data to the new pool would solve the problem, but the problem migrated to the new pool along with the data.

root@proxy:/ # zpool status -v t1
  pool: t1
 state: ONLINE
  scan: scrub repaired 0B in 00:10:16 with 0 errors on Sun Jun 13 19:43:27 2021
config:

        NAME        STATE     READ WRITE CKSUM
        t1          ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0s1  ONLINE       0     0     0
            ada0s2  ONLINE       0     0     0
            ada0s3  ONLINE       0     0     0
            ada0s4  ONLINE       0     0     0

errors: No known data errors
root@proxy:/ # zpool replace t1 ada0s4 ada2s4
cannot replace ada0s4 with ada2s4: already in replacing/spare config; wait for completion or use 'zpool detach'
root@proxy:/ # zdb -l /dev/ada2s4
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 't1'
    state: 0
    txg: 0
    pool_guid: 6410165186213320141
    errata: 0
    hostname: 'proxy.expir.org'
    top_guid: 14589067438874636017
    guid: 14589067438874636017
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 14589067438874636017
        path: '/dev/ada2s4'
        whole_disk: 1
        metaslab_array: 0
        metaslab_shift: 0
        ashift: 12
        asize: 42944954368
        is_log: 0
        create_txg: 21014
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    create_txg: 21014
    labels = 0 1 2 3
Comment 8 Alan Somers freebsd_committer freebsd_triage 2021-06-14 19:28:29 UTC
If "zpool replace" never worked, then why did you say "And after migration to the new one only replace is possible"?

In any case, your problem is that ada2s4 thinks that it is already a member of the t1 pool.  That can happen for example like this:

1) You create a pool that includes ada2s4
2) You physically remove ada2
3) You "zpool replace" the missing ada2s4 with a different disk
4) You reinstall ada2.

The solution is to wipe the zpool label on that disk.  You can use "zpool labelclear ada2s4", "zpool create dummy ada2s4 && zpool destroy dummy", or simply use dd.  Then ZFS will let you add it to t1 again.
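A minimal sketch of those options, using the device name from this report (labelclear may need -f for a device that was once active; also note that ZFS keeps two labels at the start and two at the end of the device, so zeroing only the beginning is not enough):

# zpool labelclear -f /dev/ada2s4

or

# zpool create dummy ada2s4 && zpool destroy dummy

or

# dd if=/dev/zero of=/dev/ada2s4 bs=1M status=progress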
Comment 9 Sergei Masharov 2021-06-14 21:21:05 UTC
by "only replace is possible" I have meant that detach function is not available on zraid. Sorry for misled you by this phrase.

# dd if=/dev/zero of=/dev/ada2s4 bs=1M status=progress
dd: /dev/ada2s4: end of device

40961+0 records in
40960+0 records out
42949672960 bytes transferred in 535.525791 secs (80200942 bytes/sec)
root@proxy:/ # zdb -l /dev/ada2s4
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
root@proxy:/ # zpool replace t1 ada0s4 ada2s4
cannot replace ada0s4 with ada2s4: already in replacing/spare config; wait for completion or use 'zpool detach'
Comment 10 Alan Somers freebsd_committer freebsd_triage 2021-06-18 14:16:51 UTC
Ok, try this now.

sudo dtrace -i 'fbt:zfs::return /arg1 == 45/ {trace(".");}' -c "zpool replace t1 ada0s4 ada2s4"
Comment 11 Sergei Masharov 2021-06-18 14:22:41 UTC
# dtrace -i 'fbt:zfs::return /arg1 == 45/ {trace(".");}' -c "zpool replace t1 ada0s2 ada1s1"
dtrace: description 'fbt:zfs::return ' matched 4739 probes
cannot replace ada0s2 with ada1s1: already in replacing/spare config; wait for completion or use 'zpool detach'
dtrace: pid 69066 exited with status 255
CPU     ID                    FUNCTION:NAME
  1  72740             spa_vdev_exit:return   .
  1  72728           spa_vdev_attach:return   .
  1  67877       zfs_ioc_vdev_attach:return   .
  1  68955           spl_nvlist_free:return   .
  1  69499       zfsdev_ioctl_common:return   .
  1  64084              zfsdev_ioctl:return   .
Comment 12 Alan Somers freebsd_committer freebsd_triage 2021-06-18 14:56:58 UTC
I see the problem.  You originally built the pool with ashift=9.  But you're trying to replace a disk with ashift=12.  If the new disk is actually 512n, then you can still use it by setting vfs.zfs.min_auto_ashift=9.  But if the new disk is 512e, then don't do that, or your performance will suck.  Were the old disks 512n or 512e?  You can tell by running "geom disk list ada0" and looking at the Stripesize field.
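A minimal sketch of that check and the workaround (the Stripesize reading is the usual convention, not something confirmed in this report: 512e drives report a 4096-byte stripe size, while 512n drives report 0 or 512):

# geom disk list ada0 | grep -E 'Sectorsize|Stripesize'

If the old disks turn out to be 512n:

# sysctl vfs.zfs.min_auto_ashift=9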
Comment 13 Sergei Masharov 2021-06-18 19:04:47 UTC
thanks a lot.

# sysctl vfs.zfs.min_auto_ashift=9
vfs.zfs.min_auto_ashift: 12 -> 9
root@proxy:/dev/diskid # zpool replace t1 ada0s2 ada1s1

# zpool status -v t1
  pool: t1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jun 18 22:02:35 2021
        125M scanned at 2.83M/s, 1.00M issued at 23.3K/s, 44.5G total
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME             STATE     READ WRITE CKSUM
        t1               ONLINE       0     0     0
          raidz1-0       ONLINE       0     0     0
            ada0s1       ONLINE       0     0     0
            replacing-1  ONLINE       0     0     0
              ada0s2     ONLINE       0     0     0
              ada1s1     ONLINE       0     0     0
            ada0s3       ONLINE       0     0     0
            ada0s4       ONLINE       0     0     0


But I still think that the reported error is incorrect :-)
Comment 14 Sergei Masharov 2021-06-18 19:09:03 UTC
The disks are 512n for now, but I think they are the last of their kind, so I have changed vfs.zfs.min_auto_ashift back to 12, so that all new pools are created 4K-aligned.
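For what it's worth, a minimal sketch of making that default persistent across reboots (assuming the usual /etc/sysctl.conf mechanism):

# echo 'vfs.zfs.min_auto_ashift=12' >> /etc/sysctl.conf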
Comment 15 Alan Somers freebsd_committer freebsd_triage 2021-06-18 22:16:57 UTC
Glad you solved your problem.  And yes I agree that the error message is unhelpful.  You might consider opening an issue about the error message at https://github.com/openzfs/zfs .