Summary:    Reproducible zpool(8) panic with 14.0-RELEASE amd64-zfs.raw VM-IMAGES
Product:    Base System
Component:  kern
Reporter:   Michael Dexter <editor>
Assignee:   Alexander Motin <mav>
Status:     Closed FIXED
Severity:   Affects Only Me
Priority:   ---
Keywords:   crash
CC:         allanjude, dch, emaste, markj, mav, rob2g2-freebsd, ronald
Version:    14.0-RELEASE
Hardware:   amd64
OS:         Any
URL:        https://github.com/openzfs/zfs/pull/16162
Description
Michael Dexter
2024-04-17 17:24:57 UTC
Created attachment 250032 [details]
Text dump of the panic
Created attachment 250033 [details]
Script to reproduce the panic (re-upload)
I can add syntax to import the zpool with a different name if 'zroot' is a problem for you.

Update: I can reproduce the panic on an AMD Ryzen 7 5800H system.

Observation: The stock VM-IMAGE only has one label:

------------------------------------
LABEL 0
------------------------------------
    txg: 4
    version: 5000
    state: 1
    name: 'zroot'
    pool_guid: 4016146626377348012
    top_guid: 100716240520803340
    guid: 100716240520803340
    vdev_children: 1
    features_for_read:
    vdev_tree:
        type: 'disk'
        ashift: 12
        asize: 5363990528
        guid: 100716240520803340
        id: 0
        path: '/dev/null'
        whole_disk: 1
        create_txg: 4
        metaslab_array: 2
        metaslab_shift: 29
    labels = 0 1 2 3

Update: I extracted the rootfs partition from the VM-IMAGE to a separate file, zroot.raw, to simplify things, and found that:

1. While 'zdb -l zroot.raw' shows the label output (with a single label), I cannot import the pool via the file.
2. Attaching it with mdconfig works fine, but it could produce the panic on resilver within two runs.
3. dmesg does not report anything.
4. zpool status -v did not show any checksum errors.

The issue appears to be associated with attach and resilver (see the command sketch a few comments below). I have no idea why it happens after two to eight or so repetitions using the exact same source image.

Next: Testing on 14-stable and 15-current.

(In reply to Michael Dexter from comment #5)
> Observation: The stock VM-IMAGE only has one label

See how it says "labels = 0 1 2 3"; that means it has all 4 labels, but they are identical. If you get output that prints multiple labels, they are in some way different, and that should be investigated more closely.

(In reply to Allan Jude from comment #7)
Thank you, Allan.

All: For context, this feature is marked as experimental and has revealed issues already. Let's get it stable!

Created attachment 250042 [details]
Core dump from last week's 15-CURRENT CLOUDINIT VM-IMAGE
Same host, 15-CURRENT VM-IMAGE.
Where did the root-on-ZFS non-CLOUDINIT raw images go?
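[For reference, here is a minimal, approximate sketch of the attach-and-resilver loop described in the comments above. It is not the attached reproduction script (attachment 250033): the file names zroot.raw and target.raw, the import/export steps, and the use of a blank attach target are illustrative assumptions; only the mdconfig/attach/resilver pattern follows the report.]

#!/bin/sh
# Approximate sketch only -- the real reproducer is attachment 250033.
# Assumes zroot.raw is the extracted freebsd-zfs partition of the VM-IMAGE
# and that the pool inside it is named 'zroot'.

truncate -s "$(stat -f %z zroot.raw)" target.raw   # blank file of equal size

md0=$(mdconfig -a -t vnode -f zroot.raw)           # original pool vdev
md1=$(mdconfig -a -t vnode -f target.raw)          # device to attach

zpool import -f -N -R /mnt zroot                   # import from the md-backed vdev
zpool attach zroot "$md0" "$md1"                   # mirror attach -> resilver (scan)
zpool wait -t resilver zroot                       # wait for the resilver to finish
zpool status -v zroot                              # reported panics hit within ~2-8 runs

zpool export zroot
mdconfig -d -u "$md0"
mdconfig -d -u "$md1"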
At the risk of having the wrong crash dump... do note the attached 15-CURRENT VM-IMAGE text dump:

KDB: stack backtrace:
#0 0xffffffff80b9009d at kdb_backtrace+0x5d
#1 0xffffffff80b431a2 at vpanic+0x132
#2 0xffffffff80b43063 at panic+0x43
#3 0xffffffff8100c85c at trap_fatal+0x40c
#4 0xffffffff8100c8af at trap_pfault+0x4f
#5 0xffffffff80fe3ad8 at calltrap+0x8
#6 0xffffffff81f3e9c3 at avl_remove+0x1a3
#7 0xffffffff820285c8 at dsl_scan_visit+0x2c8
#8 0xffffffff820275ad at dsl_scan_sync+0xc6d
#9 0xffffffff820541e6 at spa_sync+0xb36
#10 0xffffffff8206b3ab at txg_sync_thread+0x26b
#11 0xffffffff80afdb7f at fork_exit+0x7f
#12 0xffffffff80fe4b3e at fork_trampoline+0xe

Most follow this pattern.

Interesting: A clean 14.0 system does not exhibit the issue; only 14.0p2 and 14.0p5 do. I cannot yet say if the patch level plays a part in this, but I see 14.0p2 had some VFS changes: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275200

In your script you try to set different labels, but some of them are still the same. See bootfs1 and swapfs2:

gpart modify -i 1 -l bootfs1 md10
gpart modify -i 1 -l bootfs1 md10
gpart modify -i 2 -l efiesp1 md10
gpart modify -i 2 -l efiesp2 md11
gpart modify -i 3 -l swapfs2 md11
gpart modify -i 3 -l swapfs2 md11
gpart modify -i 4 -l rootfs1 md10
gpart modify -i 4 -l rootfs2 md11

Is this intentional?

(In reply to Ronald Klop from comment #11)
Good eye! Unfortunately, the issue still exists even when I work only with the freebsd-zfs partition. I have fixed the partition numbers and have it running on three systems running 14.0p6 (two Intel, one AMD), and the issue persists with:

panic: VERIFY(sds != NULL) failed

KDB: stack backtrace:
#0 0xffffffff80b9009d at kdb_backtrace+0x5d
#1 0xffffffff80b431a2 at vpanic+0x132
#2 0xffffffff81f7d07a at spl_panic+0x3a
#3 0xffffffff8202f6d1 at dsl_scan_visit+0x3d1
#4 0xffffffff8202e5ad at dsl_scan_sync+0xc6d
#5 0xffffffff8205b1e6 at spa_sync+0xb36
#6 0xffffffff820723ab at txg_sync_thread+0x26b
#7 0xffffffff80afdb7f at fork_exit+0x7f
#8 0xffffffff80fe4b2e at fork_trampoline+0xe

Perhaps you would like to experiment with a makefs -t zfs image. This is the syntax used by /usr/src/release/tools/vmimage.subr, with a 128m image:

mkdir -p /tmp/rootfs/ROOT/default
mkdir -p /tmp/rootfs/usr/ports
mkdir -p /tmp/rootfs/var/audit

makefs -t zfs -s 128m -B little \
    -o 'poolname=zroot' \
    -o 'bootfs=zroot/ROOT/default' \
    -o 'rootpath=/' \
    -o 'fs=zroot;mountpoint=none' \
    -o 'fs=zroot/ROOT;mountpoint=none' \
    -o 'fs=zroot/ROOT/default;mountpoint=/' \
    -o 'fs=zroot/home;mountpoint=/home' \
    -o 'fs=zroot/tmp;mountpoint=/tmp;exec=on;setuid=off' \
    -o 'fs=zroot/usr;mountpoint=/usr;canmount=off' \
    -o 'fs=zroot/usr/ports;setuid=off' \
    -o 'fs=zroot/usr/src' \
    -o 'fs=zroot/usr/obj' \
    -o 'fs=zroot/var;mountpoint=/var;canmount=off' \
    -o 'fs=zroot/var/audit;setuid=off;exec=off' \
    -o 'fs=zroot/var/log;setuid=off;exec=off' \
    -o 'fs=zroot/var/mail;atime=on' \
    -o 'fs=zroot/var/tmp;setuid=off' \
    /tmp/raw.zfs.img /tmp/rootfs

Note:

zdb -l /tmp/raw.zfs.img
zpool import -d /tmp/raw.zfs.img

truncate -s 128m /tmp/img.raw
zpool create foo /tmp/img.raw
zpool export foo
zpool import -d /tmp/img.raw

The img.raw created with truncate and zpool create can be imported, while the makefs one reports:

   pool: zroot
     id: 17927745092259738836
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
         see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        zroot           UNAVAIL  insufficient replicas
          /tmp/img.raw  UNAVAIL  invalid label

The makefs-generated image WILL import if md attached, but not as a file with -d.
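[For clarity, a minimal sketch of that md-attach path, assuming the makefs image produced by the commands above is at /tmp/raw.zfs.img; the -N and -R flags, altroot, and cleanup steps are illustrative, not taken from the report.]

# Attach the makefs-generated image as a memory disk and import through
# the device node instead of pointing -d at the raw file.
md=$(mdconfig -a -t vnode -f /tmp/raw.zfs.img)

zpool import -d /dev/"$md" -N -R /mnt zroot   # succeeds via the md device
zpool export zroot
mdconfig -d -u "$md"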
It seems that makefs-generated ZFS images, by skipping some optional dataset structures, activate code paths that have not been used since before the ancient zpool version 11 and that are missing some locks. I think https://github.com/openzfs/zfs/pull/16162 should fix the problem. Though I wonder whether/when ZFS will regenerate those structures without an explicit upgrade from the pre-11 pool version, or whether it will forever use the old code.

(In reply to Alexander Motin from comment #15)
Is the missing structure the "ds_next_clones_obj"? It looks like ZFS should add this one automatically.

(In reply to Mark Johnston from comment #16)
I haven't looked deeply into what this code does, but as I see it, it is activated by the absence of dp_origin_snap, added in SPA_VERSION_DSL_SCRUB, and of ds_next_clones_obj, added in SPA_VERSION_NEXT_CLONES.

(In reply to Alexander Motin from comment #17)
I don't think ZFS will automatically regenerate these structures; makefs needs to handle it.

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=49086aa35d987b78dbc3c9ec94814fe338e07164

commit 49086aa35d987b78dbc3c9ec94814fe338e07164
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2024-05-23 16:20:37 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2024-05-23 16:20:37 +0000

    Fix scn_queue races on very old pools

    Code for pools before version 11 uses dmu_objset_find_dp() to scan
    for children datasets/clones. It calls enqueue_clones_cb() and
    enqueue_cb() callbacks in parallel from multiple taskq threads. It
    ends up bad for scan_ds_queue_insert(), corrupting scn_queue
    AVL-tree. Fix it by introducing a mutex to protect those two
    scan_ds_queue_insert() calls. All other calls are done from the
    sync thread and so serialized.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored by: iXsystems, Inc.
    Closes #16162

    PR: 278414

 sys/contrib/openzfs/include/sys/dsl_scan.h | 1 +
 sys/contrib/openzfs/module/zfs/dsl_scan.c  | 6 ++++++
 2 files changed, 7 insertions(+)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=455ce1729353f2ffce9713ccc3574e73186a22f0

commit 455ce1729353f2ffce9713ccc3574e73186a22f0
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2024-05-23 16:20:37 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2024-05-23 16:24:55 +0000

    Fix scn_queue races on very old pools

    Code for pools before version 11 uses dmu_objset_find_dp() to scan
    for children datasets/clones. It calls enqueue_clones_cb() and
    enqueue_cb() callbacks in parallel from multiple taskq threads. It
    ends up bad for scan_ds_queue_insert(), corrupting scn_queue
    AVL-tree. Fix it by introducing a mutex to protect those two
    scan_ds_queue_insert() calls. All other calls are done from the
    sync thread and so serialized.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored by: iXsystems, Inc.
    Closes #16162

    PR: 278414
    (cherry picked from commit 49086aa35d987b78dbc3c9ec94814fe338e07164)

 sys/contrib/openzfs/include/sys/dsl_scan.h | 1 +
 sys/contrib/openzfs/module/zfs/dsl_scan.c  | 6 ++++++
 2 files changed, 7 insertions(+)

A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9898f936aa69d1b67bcd83d189acb6013f76bd43

commit 9898f936aa69d1b67bcd83d189acb6013f76bd43
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2024-05-23 16:20:37 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2024-05-23 17:43:02 +0000

    Fix scn_queue races on very old pools

    Code for pools before version 11 uses dmu_objset_find_dp() to scan
    for children datasets/clones. It calls enqueue_clones_cb() and
    enqueue_cb() callbacks in parallel from multiple taskq threads. It
    ends up bad for scan_ds_queue_insert(), corrupting scn_queue
    AVL-tree. Fix it by introducing a mutex to protect those two
    scan_ds_queue_insert() calls. All other calls are done from the
    sync thread and so serialized.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored by: iXsystems, Inc.
    Closes #16162

    PR: 278414
    (cherry picked from commit 49086aa35d987b78dbc3c9ec94814fe338e07164)

 sys/contrib/openzfs/include/sys/dsl_scan.h | 1 +
 sys/contrib/openzfs/module/zfs/dsl_scan.c  | 6 ++++++
 2 files changed, 7 insertions(+)

A commit in branch releng/14.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=856d35337225d77948b43ee5d479baa2588963ec

commit 856d35337225d77948b43ee5d479baa2588963ec
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2024-05-23 16:20:37 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2024-05-23 18:11:36 +0000

    Fix scn_queue races on very old pools

    Code for pools before version 11 uses dmu_objset_find_dp() to scan
    for children datasets/clones. It calls enqueue_clones_cb() and
    enqueue_cb() callbacks in parallel from multiple taskq threads. It
    ends up bad for scan_ds_queue_insert(), corrupting scn_queue
    AVL-tree. Fix it by introducing a mutex to protect those two
    scan_ds_queue_insert() calls. All other calls are done from the
    sync thread and so serialized.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored by: iXsystems, Inc.
    Closes #16162

    PR: 278414
    Approved by: re (cperciva)
    (cherry picked from commit 49086aa35d987b78dbc3c9ec94814fe338e07164)
    (cherry picked from commit 455ce1729353f2ffce9713ccc3574e73186a22f0)

 sys/contrib/openzfs/include/sys/dsl_scan.h | 1 +
 sys/contrib/openzfs/module/zfs/dsl_scan.c  | 6 ++++++
 2 files changed, 7 insertions(+)

The fix for the ZFS panic is merged into releng/14.1 and the stable branches. I hope makefs will also be updated some day.

I have built this on 15-CURRENT shortly after the commit and SO FAR SO GOOD! Thank you everyone!

On 15.0-CURRENT #0 main-n270474-d2f1f71ec8c6, one can image the weekly VM-IMAGE to a hardware device, boot it on hardware, back up its partitions to a second device, dd over the first two partitions, 'zpool attach' the second data partition, wait for the resilver, pull the original drive during a reboot, boot, and online it for full restoration of the pool.

This is resolved until further notice. Thank you to everyone who made this happen!
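[For reference, a rough command-level sketch of the restore-by-mirroring workflow described in the closing comment, assuming the VM-IMAGE partition layout shown earlier (p1 freebsd-boot, p2 EFI, p3 swap, p4 freebsd-zfs) and the illustrative device names ada0 (drive imaged from the VM-IMAGE) and ada1 (second drive); the gpart step, partition indices, and device names are assumptions, not taken from the report.]

# On the system booted from ada0, with ada1 as the second drive:

gpart backup ada0 | gpart restore -F ada1   # replicate the partition table

dd if=/dev/ada0p1 of=/dev/ada1p1 bs=1m      # copy the freebsd-boot partition
dd if=/dev/ada0p2 of=/dev/ada1p2 bs=1m      # copy the EFI system partition

zpool attach zroot ada0p4 ada1p4            # mirror the freebsd-zfs partition
zpool wait -t resilver zroot                # wait for the resilver to complete

# Reboot with the original drive pulled, confirm the pool imports from ada1p4,
# then reinsert the original drive and bring its vdev back online:
zpool online zroot ada0p4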