Bug 273663 - zfsd crashes in the presence of pools with removed TLVs
Summary: zfsd crashes in the presence of pools with removed TLVs
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: 14.0-STABLE
Hardware: amd64 Any
: --- Affects Some People
Assignee: Alan Somers
URL: https://reviews.freebsd.org/D41818
Keywords: crash, regression
Depends on:
Blocks:
 
Reported: 2023-09-09 17:46 UTC by Marek Zarychta
Modified: 2023-09-21 22:28 UTC (History)
3 users (show)

See Also:
asomers: mfc-stable14+
asomers: mfc-stable13+
asomers: mfc-stable12+


Attachments
zfsd backtrace (4.50 KB, text/plain)
2023-09-09 17:46 UTC, Marek Zarychta
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marek Zarychta 2023-09-09 17:46:11 UTC
Created attachment 244734 [details]
zfsd backtrace

After successfully upgrading a few sandboxed and desktop systems from stable/13 to stable/14, I tried with the system running in production. Among others, I found the issue with zfsd(8). It cannot start. The zpool wasn't upgraded yet though.
Comment 1 Alan Somers freebsd_committer freebsd_triage 2023-09-10 00:20:50 UTC
Yikes.  To narrow it down a little, can you tell me which stable/14 revision you're using?  And do you know of a version of stable/14 that worked?  My most recent system is main from 10-Aug, and it's working there.
Comment 2 Marek Zarychta 2023-09-10 05:50:41 UTC
(In reply to Alan Somers from comment #1)
It's FreeBSD 14.0-STABLE amd64 1400500 built yesterday from the sources updated to 4b22565f8598f2890d41e179fd36a0845f781262. The problem is probably not general, maybe zfsd(8) is hitting the corner case here. The zpool was initially created as a stripe, then changed to mirror[1]. The zpool was created by old ZFS code on FreeBSD 11, then upgraded to FreeBSD 13. The further upgrade to full support of FreeBSD 14 ZFS was withheld and missing features include[2]

[1] remove: Removal of vdev 1 copied 15.1G in 0h0m, completed on Fri May 15 20:04:34 2020

[2]
POOL  FEATURE
---------------
zroot
      edonr
      zilsaxattr
      head_errlog
      blake3
      block_cloning
      vdev_zaps_v2
Comment 3 Marek Zarychta 2023-09-10 05:59:44 UTC
(In reply to Alan Somers from comment #1)
I haven't answered when zfsd(8) worked fine there. It never worked on that machine (not counting FreeBSD 13.2-STABLE and earlier). The first upgrade to stable/14 was performed a few days before, just after releng/14.0 was branched.
Comment 4 Marek Zarychta 2023-09-10 12:48:24 UTC
I have exchanged /usr/sbin/zfsd between other machines running stable/14. There was no problem running this binary on other systems. Also the binary built on other systems crashed in a similar way on the affected system, so details covered in comment #2 describe the way to reproduce.
Comment 5 Alan Somers freebsd_committer freebsd_triage 2023-09-10 13:46:08 UTC
(In reply to Marek Zarychta from comment #2)
Would it be possible for you to recreate the problem, starting from a fresh pool, and record the commands you use?  And also show me the current "zpool status" ?
Comment 6 Marek Zarychta 2023-09-10 14:11:39 UTC
(In reply to Alan Somers from comment #5)
It's not easy to fully reproduce since the pool was created originally on FreeBSD 11, the datasets were received with zfs receive, then the pool upgraded after switching to FreeBSD 12 on that machine, then the stripe was converted to mirror and finally ZFS was upgraded after installation of FreeBSD 13. Anyway, zfsd(8) never complained about that.
zpool history shows among others: 

2020-05-07.12:30:29 zpool create -o altroot=/mnt -O compress=lz4 -O atime=off -m none -f zroot da0p2 da1p2
(...)
2020-05-15.17:46:28 zpool upgrade zroot (ZFS -> OpenZFS, FBSD 11 -> FBSD 12)
(...)
2020-05-15.20:04:01 zpool remove zroot /dev/da1p2
2020-05-15.20:09:03 zpool attach zroot /dev/da0p2 /dev/da1p2
(...)
2021-02-05.17:45:16 zpool upgrade zroot (FreeBSD 12 -> 13)
(...)



zpool status output: 

  pool: zroot
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:08:46 with 0 errors on Tue Aug 29 03:10:07 2023
remove: Removal of vdev 1 copied 15.1G in 0h0m, completed on Fri May 15 20:04:34 2020
        119K memory used for removed device mappings
config:

        NAME          STATE     READ WRITE CKSUM
        zroot         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da0p2     ONLINE       0     0     0
            da1p2     ONLINE       0     0     0

errors: No known data errors
Comment 7 Alan Somers freebsd_committer freebsd_triage 2023-09-11 16:48:24 UTC
I can reproduce the bug on a build of 15.0-CURRENT from today using these commands:

service zfsd start
zpool create -f testpool vtbd1 vtbd3
zpool remove testpool vtbd3
zpool attach testpool vtbd1 vtbd3
Comment 8 Marek Zarychta 2023-09-12 04:26:46 UTC
The patch proposed in D41818 fixes the issue in our case. Thank you for taking care of this bug.
Comment 9 commit-hook freebsd_committer freebsd_triage 2023-09-12 14:47:53 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0b294a386d34f6584848ed52407687df7ae59861

commit 0b294a386d34f6584848ed52407687df7ae59861
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2023-09-12 01:20:39 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2023-09-12 14:46:12 +0000

    Fix zfsd with the device_removal pool feature.

    Previously zfsd would crash in the presence of a pool with a
    top-level-vdev that had previously been removed.  The crash happened
    because the configuration nvlist of such a TLV contains an empty
    ZPOOL_CONFIG_CHILDREN array, which led to a pop_front from an empty
    list, which has undefined behavior.

    The crash only happened in stable/14 and later, probably do to
    differences in libcxx, but the change should be MFCed anyway.

    PR:             273663
    Reported by:    Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
    MFC after:      1 week
    Sponsored by:   Axcient
    Reviewed by:    mav
    Differential Revision: https://reviews.freebsd.org/D41818

 cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc | 37 +++++++++++++++++++++++++++++++
 cddl/usr.sbin/zfsd/vdev_iterator.cc       |  5 +----
 2 files changed, 38 insertions(+), 4 deletions(-)
Comment 10 Marek Zarychta 2023-09-14 08:16:30 UTC
I found another issue after the original one reported here was fixed. It was reported as bug 273772.
Comment 11 commit-hook freebsd_committer freebsd_triage 2023-09-21 20:53:51 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a39aac5bb8e46b0d9cd77e85be8a65808f8a89ce

commit a39aac5bb8e46b0d9cd77e85be8a65808f8a89ce
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2023-09-12 01:20:39 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2023-09-21 20:39:52 +0000

    Fix zfsd with the device_removal pool feature.

    Previously zfsd would crash in the presence of a pool with a
    top-level-vdev that had previously been removed.  The crash happened
    because the configuration nvlist of such a TLV contains an empty
    ZPOOL_CONFIG_CHILDREN array, which led to a pop_front from an empty
    list, which has undefined behavior.

    The crash only happened in stable/14 and later, probably do to
    differences in libcxx, but the change should be MFCed anyway.

    PR:             273663
    Reported by:    Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
    Sponsored by:   Axcient
    Reviewed by:    mav
    Differential Revision: https://reviews.freebsd.org/D41818

    (cherry picked from commit 0b294a386d34f6584848ed52407687df7ae59861)

 cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc | 37 +++++++++++++++++++++++++++++++
 cddl/usr.sbin/zfsd/vdev_iterator.cc       |  5 +----
 2 files changed, 38 insertions(+), 4 deletions(-)
Comment 12 commit-hook freebsd_committer freebsd_triage 2023-09-21 22:22:07 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3676929d3caea131170a48f8e054299d68c81ec5

commit 3676929d3caea131170a48f8e054299d68c81ec5
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2023-09-12 01:20:39 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2023-09-21 22:20:43 +0000

    Fix zfsd with the device_removal pool feature.

    Previously zfsd would crash in the presence of a pool with a
    top-level-vdev that had previously been removed.  The crash happened
    because the configuration nvlist of such a TLV contains an empty
    ZPOOL_CONFIG_CHILDREN array, which led to a pop_front from an empty
    list, which has undefined behavior.

    The crash only happened in stable/14 and later, probably do to
    differences in libcxx, but the change should be MFCed anyway.

    PR:             273663
    Reported by:    Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
    Sponsored by:   Axcient
    Reviewed by:    mav
    Differential Revision: https://reviews.freebsd.org/D41818

    (cherry picked from commit 0b294a386d34f6584848ed52407687df7ae59861)

 cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc | 37 +++++++++++++++++++++++++++++++
 cddl/usr.sbin/zfsd/vdev_iterator.cc       |  5 +----
 2 files changed, 38 insertions(+), 4 deletions(-)
Comment 13 commit-hook freebsd_committer freebsd_triage 2023-09-21 22:25:09 UTC
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a015b9e690d87c45c73dd08840712f576894dbf9

commit a015b9e690d87c45c73dd08840712f576894dbf9
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2023-09-12 01:20:39 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2023-09-21 22:23:47 +0000

    Fix zfsd with the device_removal pool feature.

    Previously zfsd would crash in the presence of a pool with a
    top-level-vdev that had previously been removed.  The crash happened
    because the configuration nvlist of such a TLV contains an empty
    ZPOOL_CONFIG_CHILDREN array, which led to a pop_front from an empty
    list, which has undefined behavior.

    The crash only happened in stable/14 and later, probably do to
    differences in libcxx, but the change should be MFCed anyway.

    PR:             273663
    Reported by:    Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
    Sponsored by:   Axcient
    Reviewed by:    mav
    Differential Revision: https://reviews.freebsd.org/D41818
    Approved by:    gjb (re)

    (cherry picked from commit 0b294a386d34f6584848ed52407687df7ae59861)
    (cherry picked from commit a39aac5bb8e46b0d9cd77e85be8a65808f8a89ce)

 cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc | 37 +++++++++++++++++++++++++++++++
 cddl/usr.sbin/zfsd/vdev_iterator.cc       |  5 +----
 2 files changed, 38 insertions(+), 4 deletions(-)
Comment 14 Alan Somers freebsd_committer freebsd_triage 2023-09-21 22:27:52 UTC
MFCs complete.  The fix will be available in 14.0-RELEASE.
Comment 15 commit-hook freebsd_committer freebsd_triage 2023-09-21 22:28:10 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b2dff90c0be7c92a228fdbee5f69335dcc8044fb

commit b2dff90c0be7c92a228fdbee5f69335dcc8044fb
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2023-09-12 01:20:39 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2023-09-21 22:26:29 +0000

    Fix zfsd with the device_removal pool feature.

    Previously zfsd would crash in the presence of a pool with a
    top-level-vdev that had previously been removed.  The crash happened
    because the configuration nvlist of such a TLV contains an empty
    ZPOOL_CONFIG_CHILDREN array, which led to a pop_front from an empty
    list, which has undefined behavior.

    The crash only happened in stable/14 and later, probably do to
    differences in libcxx, but the change should be MFCed anyway.

    PR:             273663
    Reported by:    Marek Zarychta <zarychtam@plan-b.pwste.edu.pl>
    Sponsored by:   Axcient
    Reviewed by:    mav
    Differential Revision: https://reviews.freebsd.org/D41818

    (cherry picked from commit 0b294a386d34f6584848ed52407687df7ae59861)

 cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc | 37 +++++++++++++++++++++++++++++++
 cddl/usr.sbin/zfsd/vdev_iterator.cc       |  5 +----
 2 files changed, 38 insertions(+), 4 deletions(-)