Bug 226096 - [zfs] zpool add/attach internal error: out of memory, pool with many vdevs, since r324255
Summary: [zfs] zpool add/attach internal error: out of memory, pool with many vdevs, since r324255
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Alan Somers
URL:
Keywords: patch, regression
Depends on:
Blocks:
 
Reported: 2018-02-21 20:48 UTC by Evaldas Auryla
Modified: 2019-05-06 06:02 UTC
CC List: 3 users

See Also:


Attachments
init errno = 0; in zpool_read_label() (529 bytes, patch)
2018-02-21 20:48 UTC, Evaldas Auryla

Description Evaldas Auryla 2018-02-21 20:48:43 UTC
Created attachment 190872
init errno = 0; in zpool_read_label()

On FreeBSD 11.1-STABLE, at any revision since r324255, zpool add (or zpool attach) may fail with "internal error: out of memory". I can reproduce this on a pool with many real physical disks: after adding 22-23 three-way mirror vdevs, the next add fails. I cannot reproduce it on a virtual configuration with simulated file-based vdevs. On a running pool with many mirror vdevs, a "zpool detach" followed by "zpool attach" also fails with the same error.

I traced it back to svn commit r324255: on the same hardware, revisions before this commit run zpool add/attach without failure. I used beadm to boot different revisions.

A quick workaround that seems to work is "zpool create temporarypool daXX && zpool destroy temporarypool", then trying again, e.g. "zpool attach bigtank daYY daXX".
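To illustrate, the failure and the workaround as a session (daXX and daYY are placeholders, as above):

  # zpool attach bigtank daYY daXX
  internal error: out of memory
  # zpool create temporarypool daXX && zpool destroy temporarypool
  # zpool attach bigtank daYY daXX
  (the attach now succeeds)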

Attached is a small patch that initializes errno = 0 in zpool_read_label(), tested on the latest r329700.
Comment 1 Mark Linimon 2018-02-27 15:38:03 UTC
Notify committer of r324255.
Comment 2 Andriy Gapon 2018-02-27 16:02:29 UTC
This differential revision seems to be related: https://reviews.freebsd.org/D13088
I participated in it, but dropped the ball.
I'll try to get back to this issue.
Comment 3 Nikita Kozlov 2018-02-28 23:05:59 UTC
I didn't have the time to propose the patch from the review upstream, as Allan asked, but yes, it seems to be the same bug.
Comment 4 Alan Somers 2018-03-02 21:17:56 UTC
I can reproduce this by creating 75 64MB files, creating a vnode-backed md(4) device on each of them, creating a zpool made of striped three-way mirrors on the first 72, then doing "sudo zpool add foo mirror md72 md73 md74".
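Roughly, as a script (a sketch of the above; the /tmp file paths are illustrative, and it assumes md units 0-74 are free and that it runs as root):

  # Create 75 sparse 64MB backing files and attach an md(4) device to each.
  i=0
  while [ $i -lt 75 ]; do
      truncate -s 64m /tmp/vdev$i
      mdconfig -a -t vnode -f /tmp/vdev$i -u $i
      i=$((i + 1))
  done
  # Build the pool as 24 striped three-way mirrors on md0..md71.
  vdevs=
  i=0
  while [ $i -lt 72 ]; do
      vdevs="$vdevs mirror md$i md$((i + 1)) md$((i + 2))"
      i=$((i + 3))
  done
  zpool create foo $vdevs
  # Adding the 25th mirror then fails with "internal error: out of memory".
  zpool add foo mirror md72 md73 md74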
Comment 5 commit-hook 2018-03-02 21:27:09 UTC
A commit references this bug:

Author: asomers
Date: Fri Mar  2 21:26:49 UTC 2018
New revision: 330295
URL: https://svnweb.freebsd.org/changeset/base/330295

Log:
  ZFS: fix adding vdevs to very large pools

  r323791 changed the return value of zpool_read_label.  Error paths that
  previously returned 0 began to return -1 instead.  However, not all error
  paths initialized errno.  When adding vdevs to a very large pool, errno could
  be prepopulated with ENOMEM, causing the operation to fail.  Fix the bug by
  setting errno=ENOENT in the case that no ZFS label is found.

  PR:		226096
  Submitted by:	Nikita Kozlov
  Reviewed by:	avg
  MFC after:	3 weeks
  Differential Revision:	https://reviews.freebsd.org/D13088

Changes:
  head/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
Comment 6 commit-hook 2018-04-06 11:42:45 UTC
A commit references this bug:

Author: avg
Date: Fri Apr  6 11:42:09 UTC 2018
New revision: 332093
URL: https://svnweb.freebsd.org/changeset/base/332093

Log:
  MFC r330295: ZFS: fix adding vdevs to very large pools

  PR:		226096

Changes:
_U  stable/11/
  stable/11/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
Comment 7 commit-hook 2018-04-06 11:48:54 UTC
A commit references this bug:

Author: avg
Date: Fri Apr  6 11:48:12 UTC 2018
New revision: 332094
URL: https://svnweb.freebsd.org/changeset/base/332094

Log:
  MFC r330295: ZFS: fix adding vdevs to very large pools

  PR:		226096

Changes:
_U  stable/10/
  stable/10/cddl/contrib/opensolaris/lib/libzfs/common/libzfs_import.c
Comment 8 Kubilay Kocak 2019-05-06 06:02:42 UTC
@Alan Could the underlying cause resolved here for `zpool add`/`zpool attach` also apply to other commands?

I'm seeing this error message during a `zpool clear -Fn` on a pool with corrupted metadata after a power outage:

CURRENT-amd64# uname -a
FreeBSD CURRENT-amd64 13.0-CURRENT FreeBSD 13.0-CURRENT r340668 GENERIC-NODEBUG  amd64

CURRENT-amd64# zpool status
  pool: storage
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Recovery is possible, but will result in some data loss.
        Returning the pool to its state as of Mon May  6 15:01:32 2019
        should correct the problem.  Approximately 15 seconds of data
        must be discarded, irreversibly.  Recovery can be attempted
        by executing 'zpool clear -F storage'.  A scrub of the pool
        is strongly recommended after recovery.
   see: http://illumos.org/msg/ZFS-8000-72
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        storage     FAULTED      0     0     1
          da1       ONLINE       0     0     6

CURRENT-amd64# zpool clear -Fn storage
internal error: out of memory

If it helps, I found another report of this error message too, which *may* (or may not) indicate it's scoped only to use of the '-n' flag:

https://forums.freebsd.org/threads/rollback-after-zfs-upgrade-possible.69370/