Created attachment 244734 [details] zfsd backtrace After successfully upgrading a few sandboxed and desktop systems from stable/13 to stable/14, I tried with the system running in production. Among others, I found the issue with zfsd(8). It cannot start. The zpool wasn't upgraded yet though.
Yikes. To narrow it down a little, can you tell me which stable/14 revision you're using? And do you know of a version of stable/14 that worked? My most recent system is main from 10-Aug, and it's working there.
(In reply to Alan Somers from comment #1) It's FreeBSD 14.0-STABLE amd64 1400500 built yesterday from the sources updated to 4b22565f8598f2890d41e179fd36a0845f781262. The problem is probably not general, maybe zfsd(8) is hitting the corner case here. The zpool was initially created as a stripe, then changed to mirror[1]. The zpool was created by old ZFS code on FreeBSD 11, then upgraded to FreeBSD 13. The further upgrade to full support of FreeBSD 14 ZFS was withheld and missing features include[2] [1] remove: Removal of vdev 1 copied 15.1G in 0h0m, completed on Fri May 15 20:04:34 2020 [2] POOL FEATURE --------------- zroot edonr zilsaxattr head_errlog blake3 block_cloning vdev_zaps_v2
(In reply to Alan Somers from comment #1) I haven't answered when zfsd(8) worked fine there. It never worked on that machine (not counting FreeBSD 13.2-STABLE and earlier). The first upgrade to stable/14 was performed a few days before, just after releng/14.0 was branched.
I have exchanged /usr/sbin/zfsd between other machines running stable/14. There was no problem running this binary on other systems. Also the binary built on other systems crashed in a similar way on the affected system, so details covered in comment #2 describe the way to reproduce.
(In reply to Marek Zarychta from comment #2) Would it be possible for you to recreate the problem, starting from a fresh pool, and record the commands you use? And also show me the current "zpool status" ?
(In reply to Alan Somers from comment #5) It's not easy to fully reproduce since the pool was created originally on FreeBSD 11, the datasets were received with zfs receive, then the pool upgraded after switching to FreeBSD 12 on that machine, then the stripe was converted to mirror and finally ZFS was upgraded after installation of FreeBSD 13. Anyway, zfsd(8) never complained about that. zpool history shows among others: 2020-05-07.12:30:29 zpool create -o altroot=/mnt -O compress=lz4 -O atime=off -m none -f zroot da0p2 da1p2 (...) 2020-05-15.17:46:28 zpool upgrade zroot (ZFS -> OpenZFS, FBSD 11 -> FBSD 12) (...) 2020-05-15.20:04:01 zpool remove zroot /dev/da1p2 2020-05-15.20:09:03 zpool attach zroot /dev/da0p2 /dev/da1p2 (...) 2021-02-05.17:45:16 zpool upgrade zroot (FreeBSD 12 -> 13) (...) zpool status output: pool: zroot state: ONLINE status: Some supported and requested features are not enabled on the pool. The pool can still be used, but some features are unavailable. action: Enable all features using 'zpool upgrade'. Once this is done, the pool may no longer be accessible by software that does not support the features. See zpool-features(7) for details. scan: scrub repaired 0B in 00:08:46 with 0 errors on Tue Aug 29 03:10:07 2023 remove: Removal of vdev 1 copied 15.1G in 0h0m, completed on Fri May 15 20:04:34 2020 119K memory used for removed device mappings config: NAME STATE READ WRITE CKSUM zroot ONLINE 0 0 0 mirror-0 ONLINE 0 0 0 da0p2 ONLINE 0 0 0 da1p2 ONLINE 0 0 0 errors: No known data errors
I can reproduce the bug on a build of 15.0-CURRENT from today using these commands: service zfsd start zpool create -f testpool vtbd1 vtbd3 zpool remove testpool vtbd3 zpool attach testpool vtbd1 vtbd3
The patch proposed in D41818 fixes the issue in our case. Thank you for taking care of this bug.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=0b294a386d34f6584848ed52407687df7ae59861 commit 0b294a386d34f6584848ed52407687df7ae59861 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2023-09-12 01:20:39 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2023-09-12 14:46:12 +0000 Fix zfsd with the device_removal pool feature. Previously zfsd would crash in the presence of a pool with a top-level-vdev that had previously been removed. The crash happened because the configuration nvlist of such a TLV contains an empty ZPOOL_CONFIG_CHILDREN array, which led to a pop_front from an empty list, which has undefined behavior. The crash only happened in stable/14 and later, probably do to differences in libcxx, but the change should be MFCed anyway. PR: 273663 Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl> MFC after: 1 week Sponsored by: Axcient Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D41818 cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc | 37 +++++++++++++++++++++++++++++++ cddl/usr.sbin/zfsd/vdev_iterator.cc | 5 +---- 2 files changed, 38 insertions(+), 4 deletions(-)
I found another issue after the original one reported here was fixed. It was reported as bug 273772.
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a39aac5bb8e46b0d9cd77e85be8a65808f8a89ce commit a39aac5bb8e46b0d9cd77e85be8a65808f8a89ce Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2023-09-12 01:20:39 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2023-09-21 20:39:52 +0000 Fix zfsd with the device_removal pool feature. Previously zfsd would crash in the presence of a pool with a top-level-vdev that had previously been removed. The crash happened because the configuration nvlist of such a TLV contains an empty ZPOOL_CONFIG_CHILDREN array, which led to a pop_front from an empty list, which has undefined behavior. The crash only happened in stable/14 and later, probably do to differences in libcxx, but the change should be MFCed anyway. PR: 273663 Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl> Sponsored by: Axcient Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D41818 (cherry picked from commit 0b294a386d34f6584848ed52407687df7ae59861) cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc | 37 +++++++++++++++++++++++++++++++ cddl/usr.sbin/zfsd/vdev_iterator.cc | 5 +---- 2 files changed, 38 insertions(+), 4 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=3676929d3caea131170a48f8e054299d68c81ec5 commit 3676929d3caea131170a48f8e054299d68c81ec5 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2023-09-12 01:20:39 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2023-09-21 22:20:43 +0000 Fix zfsd with the device_removal pool feature. Previously zfsd would crash in the presence of a pool with a top-level-vdev that had previously been removed. The crash happened because the configuration nvlist of such a TLV contains an empty ZPOOL_CONFIG_CHILDREN array, which led to a pop_front from an empty list, which has undefined behavior. The crash only happened in stable/14 and later, probably do to differences in libcxx, but the change should be MFCed anyway. PR: 273663 Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl> Sponsored by: Axcient Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D41818 (cherry picked from commit 0b294a386d34f6584848ed52407687df7ae59861) cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc | 37 +++++++++++++++++++++++++++++++ cddl/usr.sbin/zfsd/vdev_iterator.cc | 5 +---- 2 files changed, 38 insertions(+), 4 deletions(-)
A commit in branch releng/14.0 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a015b9e690d87c45c73dd08840712f576894dbf9 commit a015b9e690d87c45c73dd08840712f576894dbf9 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2023-09-12 01:20:39 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2023-09-21 22:23:47 +0000 Fix zfsd with the device_removal pool feature. Previously zfsd would crash in the presence of a pool with a top-level-vdev that had previously been removed. The crash happened because the configuration nvlist of such a TLV contains an empty ZPOOL_CONFIG_CHILDREN array, which led to a pop_front from an empty list, which has undefined behavior. The crash only happened in stable/14 and later, probably do to differences in libcxx, but the change should be MFCed anyway. PR: 273663 Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl> Sponsored by: Axcient Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D41818 Approved by: gjb (re) (cherry picked from commit 0b294a386d34f6584848ed52407687df7ae59861) (cherry picked from commit a39aac5bb8e46b0d9cd77e85be8a65808f8a89ce) cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc | 37 +++++++++++++++++++++++++++++++ cddl/usr.sbin/zfsd/vdev_iterator.cc | 5 +---- 2 files changed, 38 insertions(+), 4 deletions(-)
MFCs complete. The fix will be available in 14.0-RELEASE.
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=b2dff90c0be7c92a228fdbee5f69335dcc8044fb commit b2dff90c0be7c92a228fdbee5f69335dcc8044fb Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2023-09-12 01:20:39 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2023-09-21 22:26:29 +0000 Fix zfsd with the device_removal pool feature. Previously zfsd would crash in the presence of a pool with a top-level-vdev that had previously been removed. The crash happened because the configuration nvlist of such a TLV contains an empty ZPOOL_CONFIG_CHILDREN array, which led to a pop_front from an empty list, which has undefined behavior. The crash only happened in stable/14 and later, probably do to differences in libcxx, but the change should be MFCed anyway. PR: 273663 Reported by: Marek Zarychta <zarychtam@plan-b.pwste.edu.pl> Sponsored by: Axcient Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D41818 (cherry picked from commit 0b294a386d34f6584848ed52407687df7ae59861) cddl/usr.sbin/zfsd/tests/zfsd_unittest.cc | 37 +++++++++++++++++++++++++++++++ cddl/usr.sbin/zfsd/vdev_iterator.cc | 5 +---- 2 files changed, 38 insertions(+), 4 deletions(-)