Created attachment 190872 [details]
init errno = 0; in zpool_read_label()

On FreeBSD stable/11 (11.1), at any revision since r324255, zpool add (or zpool attach) may fail with "internal error: out of memory". I can reproduce this on a pool with many real physical disks: after adding 22-23 three-way mirror vdevs, the next one fails. I cannot reproduce this on a virtual configuration with simulated file-based vdevs. On a running pool with many mirror vdevs, a "zpool detach" followed by "zpool attach" also fails with the same error.

I traced it back to svn commit r324255; before that commit, zpool add/attach works without failure on the same hardware (I used beadm to boot different revisions).

A workaround that seems to work is to run "zpool create temporarypool daXX && zpool destroy temporarypool" and then try again, e.g. "zpool attach bigtank daYY daXX".

Attached is a small patch that initializes errno = 0 in zpool_read_label(), tested on r329700.
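To illustrate what I think is going on, here is a minimal standalone sketch of the failure mode; read_label_sketch() is a hypothetical stand-in, not the actual libzfs code. After r324255, some error paths in zpool_read_label() return -1 without assigning errno, so whatever value an earlier allocation left behind is what the caller ends up reporting:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Returns -1 when no label is found, but never assigns errno itself. */
static int
read_label_sketch(int has_label)
{
        if (!has_label)
                return (-1);    /* error path: errno left untouched */
        return (0);
}

int
main(void)
{
        /* Simulate earlier library work that set errno and recovered. */
        errno = ENOMEM;

        if (read_label_sketch(0) != 0) {
                /*
                 * The caller assumes errno describes this failure, so
                 * the stale ENOMEM surfaces as "internal error: out of
                 * memory" even though no allocation failed here.
                 */
                if (errno == ENOMEM)
                        fprintf(stderr, "internal error: out of memory\n");
                exit(1);
        }
        return (0);
}

Initializing errno = 0 at the top of zpool_read_label(), as the attached patch does, keeps the stale value from leaking into the caller's error reporting.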
Notify committer of r324255.
This Differential revision seems to be related: https://reviews.freebsd.org/D13088. I participated in it but dropped the ball. I'll try to get back to this issue.
I didn't have the time to propose the patch from the review upstream, as Allan asked, but yes, it seems to be the same bug.
I can reproduce this by creating 75 64MB files, attaching a vnode-backed md(4) device to each of them, creating a zpool of striped three-way mirrors from the first 72, and then running "sudo zpool add foo mirror md72 md73 md74".
A commit references this bug:

Author: asomers
Date: Fri Mar 2 21:26:49 UTC 2018
New revision: 330295
URL: https://svnweb.freebsd.org/changeset/base/330295

Log:
  ZFS: fix adding vdevs to very large pools

  r323791 changed the return value of zpool_read_label. Error paths that
  previously returned 0 began to return -1 instead. However, not all
  error paths initialized errno. When adding vdevs to a very large pool,
  errno could be prepopulated with ENOMEM, causing the operation to fail.

  Fix the bug by setting errno=ENOENT in the case that no ZFS label is
  found.

  PR:           226096
  Submitted by: Nikita Kozlov
  Reviewed by:  avg
  MFC after:    3 weeks
  Differential Revision: https://reviews.freebsd.org/D13088

Changes:
  head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
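In miniature, the committed fix makes the no-label path assign errno itself before returning, so callers never observe a stale value. A simplified standalone sketch of that pattern, with read_label_fixed() as a hypothetical stand-in rather than the verbatim libzfs_import.c change:

#include <errno.h>
#include <stdio.h>

/* The no-label error path now pairs its -1 return with an explicit errno. */
static int
read_label_fixed(int has_label)
{
        if (!has_label) {
                errno = ENOENT;         /* no ZFS label found */
                return (-1);
        }
        return (0);
}

int
main(void)
{
        errno = ENOMEM;         /* stale value from earlier, unrelated work */
        if (read_label_fixed(0) != 0 && errno == ENOENT)
                printf("caller sees ENOENT, not the stale ENOMEM\n");
        return (0);
}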
A commit references this bug:

Author: avg
Date: Fri Apr 6 11:42:09 UTC 2018
New revision: 332093
URL: https://svnweb.freebsd.org/changeset/base/332093

Log:
  MFC r330295: ZFS: fix adding vdevs to very large pools

  PR:  226096

Changes:
_U  stable/11/
  stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
A commit references this bug:

Author: avg
Date: Fri Apr 6 11:48:12 UTC 2018
New revision: 332094
URL: https://svnweb.freebsd.org/changeset/base/332094

Log:
  MFC r330295: ZFS: fix adding vdevs to very large pools

  PR:  226096

Changes:
_U  stable/10/
  stable/10/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
@Alan Could the underlying cause resolved here for `zpool add`/`zpool attach` also apply to other commands? I'm seeing this error message during a `zpool clear -Fn` on a pool whose metadata was corrupted after a power outage:

CURRENT-amd64# uname -a
FreeBSD CURRENT-amd64 13.0-CURRENT FreeBSD 13.0-CURRENT r340668 GENERIC-NODEBUG amd64

CURRENT-amd64# zpool status
  pool: storage
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Recovery is possible, but will result in some data loss.
        Returning the pool to its state as of Mon May 6 15:01:32 2019
        should correct the problem. Approximately 15 seconds of data
        must be discarded, irreversibly. Recovery can be attempted by
        executing 'zpool clear -F storage'. A scrub of the pool
        is strongly recommended after recovery.
   see: http://illumos.org/msg/ZFS-8000-72
  scan: none requested
config:

        NAME     STATE     READ WRITE CKSUM
        storage  FAULTED      0     0     1
          da1    ONLINE       0     0     6

CURRENT-amd64# zpool clear -Fn storage
internal error: out of memory

If it helps, I found another report of this error message too, which *may* (or may not) indicate it's scoped only to use of the '-n' flag: https://forums.freebsd.org/threads/rollback-after-zfs-upgrade-possible.69370/