| Summary: | ZFS unable to attach/replace disk to the mirror/raidz | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Sergei Masharov <serzh> |
| Component: | kern | Assignee: | freebsd-fs (Nobody) <fs> |
| Status: | Closed Works As Intended | | |
| Severity: | Affects Only Me | CC: | asomers, grahamperrin |
| Priority: | --- | | |
| Version: | 13.0-RELEASE | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
Description: Sergei Masharov, 2021-06-13 18:21:48 UTC
The first error message suggests that you tried doing "zpool attach" when you meant "zpool replace". But the second error message might indicate a bug. Could you please show the output of "zdb -l /dev/ada0s1"?

I have tried both attach and replace on the old pool. And after migration to the new one only replace is possible. It seems that some error has migrated to the new pool after send/receive.

```
# zdb -l /dev/ada0s1
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 't1'
    state: 0
    txg: 1975
    pool_guid: 6410165186213320141
    errata: 0
    hostname: 'proxy.expir.org'
    top_guid: 7617222505941286006
    guid: 8241112573820439777
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 7617222505941286006
        nparity: 1
        metaslab_array: 256
        metaslab_shift: 30
        ashift: 9
        asize: 171779817472
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 8241112573820439777
            path: '/dev/ada0s1'
            whole_disk: 1
            DTL: 1111
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 9546081964875052177
            path: '/dev/ada0s2'
            whole_disk: 1
            DTL: 1110
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 14962733358913721879
            path: '/dev/ada0s3'
            whole_disk: 1
            DTL: 1109
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 6900695808444206279
            path: '/dev/ada0s4'
            whole_disk: 1
            DTL: 1108
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
```

Wait. In your original post you said that "zpool replace" didn't work, and now you say that it did. Which is it? That "zdb -l" output you pasted looks like it's from a disk that is a member of the pool.

Neither attach nor replace worked on the original pool. I have migrated the data to the new one via zfs send/receive. The new one is raidz and I am unable to do a replace on it either. zdb at the end gave me this:

```
ZFS_DBGMSG(zdb) START:
spa.c:5070:spa_open_common(): spa_open_common: opening t1
spa_misc.c:411:spa_load_note(): spa_load(t1, config trusted): LOADING
vdev.c:131:vdev_dbgmsg(): disk vdev '/dev/ada0s1': best uberblock found for spa t1. txg 4309
spa_misc.c:411:spa_load_note(): spa_load(t1, config untrusted): using uberblock with txg=4309
spa_misc.c:411:spa_load_note(): spa_load(t1, config trusted): spa_load_verify found 0 metadata errors and 1 data errors
spa.c:8246:spa_async_request(): spa=t1 async request task=2048
spa_misc.c:411:spa_load_note(): spa_load(t1, config trusted): LOADED
ZFS_DBGMSG(zdb) END
```

So "zpool replace" wasn't working at first, then it was working, and now it isn't? Please show the _exact_ commands that you are typing, and the "zpool status" output at the time. If "zpool replace" fails, then show the output of "zdb -l" on the disk you were trying to insert into the pool.

No, zpool replace never worked. I had hoped that after migrating the data to the new pool the problem would be solved, but it migrated to the new pool together with the data.
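(Editor's aside: a minimal way to run the label check being asked for here, assuming a FreeBSD shell; the device list in the loop is illustrative, not taken from the report.)

```sh
# Print only the label fields that matter for attach/replace: the pool
# name/guid show whether a device is already claimed by some pool, and
# ashift shows the sector-size assumption recorded in the label.
for dev in /dev/ada0s1 /dev/ada2s4; do
    echo "== ${dev} =="
    zdb -l "${dev}" | grep -E 'name:|pool_guid:|ashift:|txg:'
done
```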
```
root@proxy:/ # zpool status -v t1
  pool: t1
 state: ONLINE
  scan: scrub repaired 0B in 00:10:16 with 0 errors on Sun Jun 13 19:43:27 2021
config:

        NAME        STATE     READ WRITE CKSUM
        t1          ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0s1  ONLINE       0     0     0
            ada0s2  ONLINE       0     0     0
            ada0s3  ONLINE       0     0     0
            ada0s4  ONLINE       0     0     0

errors: No known data errors

root@proxy:/ # zpool replace t1 ada0s4 ada2s4
cannot replace ada0s4 with ada2s4: already in replacing/spare config; wait for completion or use 'zpool detach'

root@proxy:/ # zdb -l /dev/ada2s4
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 't1'
    state: 0
    txg: 0
    pool_guid: 6410165186213320141
    errata: 0
    hostname: 'proxy.expir.org'
    top_guid: 14589067438874636017
    guid: 14589067438874636017
    vdev_children: 1
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 14589067438874636017
        path: '/dev/ada2s4'
        whole_disk: 1
        metaslab_array: 0
        metaslab_shift: 0
        ashift: 12
        asize: 42944954368
        is_log: 0
        create_txg: 21014
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    create_txg: 21014
    labels = 0 1 2 3
```

If "zpool replace" never worked, then why did you say "And after migration to the new one only replace is possible"? In any case, your problem is that ada2s4 thinks that it is already a member of the t1 pool. That can happen, for example, like this:

1) You create a pool that includes ada2s4.
2) You physically remove ada2.
3) You "zpool replace" the missing ada2s4 with a different disk.
4) You reinstall ada2.

The solution is to wipe the zpool label on that disk. You can use "zpool labelclear ada2s4", "zpool create dummy ada2s4 && zpool destroy dummy", or simply use dd. Then ZFS will let you add it to t1 again.

By "only replace is possible" I meant that the detach function is not available on raidz. Sorry for misleading you with that phrase.

```
# dd if=/dev/zero of=/dev/ada2s4 bs=1M status=progress
dd: /dev/ada2s4: end of device
GiB) transferred 535.001s, 80 MB/s
40961+0 records in
40960+0 records out
42949672960 bytes transferred in 535.525791 secs (80200942 bytes/sec)

root@proxy:/ # zdb -l /dev/ada2s4
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3

root@proxy:/ # zpool replace t1 ada0s4 ada2s4
cannot replace ada0s4 with ada2s4: already in replacing/spare config; wait for completion or use 'zpool detach'
```

Ok, try this now.

```
sudo dtrace -i 'fbt:zfs::return /arg1 == 45/ {trace(".");}' -c "zpool replace t1 ada0s4 ada2s4"
```

```
# dtrace -i 'fbt:zfs::return /arg1 == 45/ {trace(".");}' -c "zpool replace t1 ada0s2 ada1s1"
dtrace: description 'fbt:zfs::return ' matched 4739 probes
cannot replace ada0s2 with ada1s1: already in replacing/spare config; wait for completion or use 'zpool detach'
dtrace: pid 69066 exited with status 255
CPU     ID                    FUNCTION:NAME
  1  72740           spa_vdev_exit:return   .
  1  72728         spa_vdev_attach:return   .
  1  67877    zfs_ioc_vdev_attach:return    .
  1  68955         spl_nvlist_free:return   .
  1  69499    zfsdev_ioctl_common:return    .
  1  64084            zfsdev_ioctl:return   .
```

I see the problem. You originally built the pool with ashift=9, but you're trying to replace a disk with ashift=12. If the new disk is actually 512n, then you can still use it by setting vfs.zfs.min_auto_ashift=9. But if the new disk is 512e, then don't do that or your performance will suck. Were the old disks 512n or 512e? You can tell by doing "geom disk list ada0" and looking at the Stripesize field.

Thanks a lot.
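(Editor's aside: an illustration of the sector-size check suggested above. The device name is the one from this pool; the field interpretation is standard GEOM behaviour, not output copied from this system.)

```sh
# 512n drives report Sectorsize 512 with Stripesize 0 (or 512);
# 512e drives report Sectorsize 512 with Stripesize 4096.
geom disk list ada0 | grep -E 'Sectorsize|Stripesize'
```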
```
# sysctl vfs.zfs.min_auto_ashift=9
vfs.zfs.min_auto_ashift: 12 -> 9
root@proxy:/dev/diskid # zpool replace t1 ada0s2 ada1s1
# zpool status -v t1
  pool: t1
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jun 18 22:02:35 2021
        125M scanned at 2.83M/s, 1.00M issued at 23.3K/s, 44.5G total
        0B resilvered, 0.00% done, no estimated completion time
config:

        NAME             STATE     READ WRITE CKSUM
        t1               ONLINE       0     0     0
          raidz1-0       ONLINE       0     0     0
            ada0s1       ONLINE       0     0     0
            replacing-1  ONLINE       0     0     0
              ada0s2     ONLINE       0     0     0
              ada1s1     ONLINE       0     0     0
            ada0s3       ONLINE       0     0     0
            ada0s4       ONLINE       0     0     0
```

But I still think that the reported error message is incorrect :-)

The disks are 512n now, but I think they are the last of their kind, so I have changed vfs.zfs.min_auto_ashift to 12 so that all new pools are created 4K-aligned.

Glad you solved your problem. And yes, I agree that the error message is unhelpful. You might consider opening an issue about the error message at https://github.com/openzfs/zfs .
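(Editor's aside: a sketch of the complete workaround as it ended up in this thread, assuming FreeBSD 13 with OpenZFS. The restore step and the sysctl.conf line reflect the reporter's stated preference for 4K-aligned new pools; they are assumptions, not commands taken from the report.)

```sh
# Temporarily allow ashift=9 so the 512n replacement disk can join the ashift=9 pool.
sysctl vfs.zfs.min_auto_ashift=9
zpool replace t1 ada0s2 ada1s1
zpool status -v t1        # the new disk appears under "replacing-1" while resilvering

# Restore the preferred default so future pools are created 4K-aligned.
sysctl vfs.zfs.min_auto_ashift=12

# To keep that default across reboots, it can also go in /etc/sysctl.conf:
# vfs.zfs.min_auto_ashift=12
```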