Created attachment 250031 [details]
Script to reproduce the issues

I have been exercising the 14.0-RELEASE amd64-zfs.raw VM-IMAGES produced by Release Engineering (thank you for these!) and have two reproducible issues when mirroring two images (thank you for fixing mkimg/makefs to allow this!):

Some runs of the attached reproduction script complete flawlessly, while others report between 4 and 50K checksum errors on the attached device:

        NAME             STATE     READ WRITE CKSUM
        zroot            ONLINE       0     0     0
          mirror-0       ONLINE       0     0     0
            gpt/rootfs1  ONLINE       0     0     0
            gpt/rootfs2  ONLINE       0     0 51.4K

Some runs cause a panic:

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x10
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff81f54447
stack pointer           = 0x28:0xfffffe016703cce8
frame pointer           = 0x28:0xfffffe016703cd20
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 6 (dmu_objset_find_2)
rdi: fffff8045d553ac8 rsi: fffff80449059b50 rdx: fffff804490598d0
rcx: fffff80449059b50  r8: 0000000000000001  r9: 0000000000000002
rax: 0000000000000001 rbx: 00000000ffffffff rbp: fffffe016703cd20
r10: 0000000000000000 r11: 0000000000000001 r12: 0000000000000003
r13: 0000000000000001 r14: 0000000000000000 r15: 0000000000000046

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0x12
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff84161a20
stack pointer           = 0x28:0xfffffe016703c440
frame pointer           = 0x28:0xfffffe016703c480
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 6 (dmu_objset_find_2)
rdi: 0000000000000012 rsi: fffff80102350828 rdx: fffff80102350810
rcx: fffffe01652553f8  r8: ffffffffffffffda  r9: 0000000000000000
rax: fffffe016703c818 rbx: fffffe01652582b0 rbp: fffffe016703c480
r10: fffffe0165e16a72 r11: fffff80024eac800 r12: fffff8017c2f1900
r13: fffffe0165255000 r14: fffff80024eac800 r15: 0000000000000000

KDB: stack backtrace:
#0 0xffffffff80b9009d at kdb_backtrace+0x5d
#1 0xffffffff80b431a2 at vpanic+0x132
#2 0xffffffff80b43063 at panic+0x43
#3 0xffffffff8100c85c at trap_fatal+0x40c
#4 0xffffffff8100c8af at trap_pfault+0x4f
#5 0xffffffff80fe3ad8 at calltrap+0x8
#6 0xffffffff8411829a at skl_compute_wm+0xa6a
#7 0xffffffff840df49f at intel_atomic_check+0xf0f
#8 0xffffffff83d15783 at drm_atomic_check_only+0x4a3
#9 0xffffffff83d15bc3 at drm_atomic_commit+0x13
#10 0xffffffff83d252c8 at drm_client_modeset_commit_atomic+0x158
#11 0xffffffff83d253b4 at drm_client_modeset_commit_locked+0x74
#12 0xffffffff83d25541 at drm_client_modeset_commit+0x21
#13 0xffffffff83d68303 at drm_fb_helper_restore_fbdev_mode_unlocked+0x83
#14 0xffffffff83d55661 at vt_kms_postswitch+0x181
#15 0xffffffff8098a01f at vt_window_switch+0x11f
#16 0xffffffff8098b45f at vtterm_cngrab+0x4f
#17 0xffffffff80ad7556 at cngrab+0x26

I am attaching the text dump and can provide a core dump, but hopefully the reproduction script will help you create crashes of your own. The script does not assist with downloading the AMD64 VM-IMAGE; simply expand the image with unxz.

Caveat: The VM-IMAGES include the zpool name 'zroot' and will conflict with a host pool using the same name. I can add rename-on-import syntax if you like.

Let me know what other information might be helpful. Thanks!
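For reference, a minimal sketch of the kind of steps the attached script performs (the md unit numbers, file names, partition index 4, and the labelclear step are assumptions on my part, not the script's exact contents):

# attach two copies of the expanded VM image as memory disks
mdconfig -a -t vnode -u 10 -f vm1.raw
mdconfig -a -t vnode -u 11 -f vm2.raw

# give the two root partitions distinct GPT labels
gpart modify -i 4 -l rootfs1 md10
gpart modify -i 4 -l rootfs2 md11

# clear the stale (identical) ZFS label on the second copy
zpool labelclear -f /dev/gpt/rootfs2

# import the pool from the first image, mirror it onto the second, and watch the resilver
zpool import -f -d /dev/gpt zroot
zpool attach zroot gpt/rootfs1 gpt/rootfs2
zpool status -v zroot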
Created attachment 250032 [details] Text dump of the panic
Created attachment 250033 [details] Script to reproduce the panic (re-upload)
I can add syntax to import the zpool with a different name if 'zroot' is a problem for you.
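For example, zpool import accepts an alternate pool name at import time; something like the following (the search directory and new name are illustrative):

# import the image's pool under a temporary name so it does not clash with the host's zroot
zpool import -f -d /dev/gpt zroot zroot_img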
Update: I can reproduce the panic on an AMD Ryzen 7 5800H system.
Observation: The stock VM-IMAGE only has one label:

------------------------------------
LABEL 0
------------------------------------
    txg: 4
    version: 5000
    state: 1
    name: 'zroot'
    pool_guid: 4016146626377348012
    top_guid: 100716240520803340
    guid: 100716240520803340
    vdev_children: 1
    features_for_read:
    vdev_tree:
        type: 'disk'
        ashift: 12
        asize: 5363990528
        guid: 100716240520803340
        id: 0
        path: '/dev/null'
        whole_disk: 1
        create_txg: 4
        metaslab_array: 2
        metaslab_shift: 29
    labels = 0 1 2 3
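For anyone reproducing this, a label dump like the one above can be produced with zdb against the image file or its root partition (the device name below is an assumption):

# dump the ZFS vdev labels on the attached VM image's root partition
zdb -l /dev/md10p4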
Update: I extracted the rootfs partition from the VM-IMAGE to a separate file, zroot.raw, to simplify things, and found that:

1. While 'zdb -l zroot.raw' shows the label output (with a single label), I cannot import the pool from the file.
2. Attaching it with mdconfig works fine, but it could still produce the panic on resilver within two runs.
3. dmesg does not report anything.
4. zpool status -v did not show any checksum errors.

The issue appears to be associated with attach and resilver. I have no idea why it only happens after two to eight or so repetitions using the exact same source image.

Next: Testing on 14-stable and 15-current.
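The extraction and md-attach path described above, as a rough sketch (the source partition index p4 and the md unit numbers are assumptions):

# copy the freebsd-zfs partition out of the attached VM image into its own file
dd if=/dev/md10p4 of=zroot.raw bs=1m

# the label is visible in the file, but importing straight from the file fails here
zdb -l zroot.raw
zpool import -d zroot.raw

# attaching the file as a memory disk and importing from that works
mdconfig -a -t vnode -u 20 -f zroot.raw
zpool import -f -d /dev/md20 zroot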
(In reply to Michael Dexter from comment #5)
> Observation: The stock VM-IMAGE only has one label

See how it says "labels = 0 1 2 3": that means it has all 4 labels, but they are identical. If you get output that prints multiple labels, they differ in some way, and that should be investigated more closely.
(In reply to Allan Jude from comment #7)
Thank you, Allan.

All: For context, this feature is marked as experimental and has already revealed issues. Let's get it stable!
Created attachment 250042 [details]
Core dump from last week's 15-CURRENT CLOUDINIT VM-IMAGE

Same host, 15-CURRENT VM-IMAGE. Where did the root-on-ZFS non-CLOUDINIT raw images go?
At the risk of having the wrong crash dump... do note the attached 15-CURRENT VM-IMAGE text dump:

KDB: stack backtrace:
#0 0xffffffff80b9009d at kdb_backtrace+0x5d
#1 0xffffffff80b431a2 at vpanic+0x132
#2 0xffffffff80b43063 at panic+0x43
#3 0xffffffff8100c85c at trap_fatal+0x40c
#4 0xffffffff8100c8af at trap_pfault+0x4f
#5 0xffffffff80fe3ad8 at calltrap+0x8
#6 0xffffffff81f3e9c3 at avl_remove+0x1a3
#7 0xffffffff820285c8 at dsl_scan_visit+0x2c8
#8 0xffffffff820275ad at dsl_scan_sync+0xc6d
#9 0xffffffff820541e6 at spa_sync+0xb36
#10 0xffffffff8206b3ab at txg_sync_thread+0x26b
#11 0xffffffff80afdb7f at fork_exit+0x7f
#12 0xffffffff80fe4b3e at fork_trampoline+0xe

Most crashes follow this pattern.

Interesting: a clean 14.0 system does not exhibit the issue; only 14.0p2 and 14.0p5 do. I cannot yet say whether the patch level plays a part in this, but I see 14.0p2 had some VFS changes: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275200
In your script you try to set different labels, but some of them are still the same. See bootfs1 and swapfs2:

gpart modify -i 1 -l bootfs1 md10
gpart modify -i 1 -l bootfs1 md10
gpart modify -i 2 -l efiesp1 md10
gpart modify -i 2 -l efiesp2 md11
gpart modify -i 3 -l swapfs2 md11
gpart modify -i 3 -l swapfs2 md11
gpart modify -i 4 -l rootfs1 md10
gpart modify -i 4 -l rootfs2 md11

Is this intentional?
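Presumably the intent was one distinct label per md device, something along these lines (a guess at the intended commands, not the script's actual contents):

gpart modify -i 1 -l bootfs1 md10
gpart modify -i 1 -l bootfs2 md11
gpart modify -i 2 -l efiesp1 md10
gpart modify -i 2 -l efiesp2 md11
gpart modify -i 3 -l swapfs1 md10
gpart modify -i 3 -l swapfs2 md11
gpart modify -i 4 -l rootfs1 md10
gpart modify -i 4 -l rootfs2 md11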
(In reply to Ronald Klop from comment #11)
Good eye! Unfortunately, the issue still exists even when I work only with the freebsd-zfs partition. I have fixed the partition numbers and have it running on three systems running 14.0p6 (two Intel, one AMD), and the issue persists with:

panic: VERIFY(sds != NULL) failed

KDB: stack backtrace:
#0 0xffffffff80b9009d at kdb_backtrace+0x5d
#1 0xffffffff80b431a2 at vpanic+0x132
#2 0xffffffff81f7d07a at spl_panic+0x3a
#3 0xffffffff8202f6d1 at dsl_scan_visit+0x3d1
#4 0xffffffff8202e5ad at dsl_scan_sync+0xc6d
#5 0xffffffff8205b1e6 at spa_sync+0xb36
#6 0xffffffff820723ab at txg_sync_thread+0x26b
#7 0xffffffff80afdb7f at fork_exit+0x7f
#8 0xffffffff80fe4b2e at fork_trampoline+0xe
Perhaps you would like to experiment with a makefs -t zfs image. This is the syntax used by /usr/src/release/tools/vmimage.subr, with a 128m image:

mkdir -p /tmp/rootfs/ROOT/default
mkdir -p /tmp/rootfs/usr/ports
mkdir -p /tmp/rootfs/var/audit
makefs -t zfs -s 128m -B little \
    -o 'poolname=zroot' \
    -o 'bootfs=zroot/ROOT/default' \
    -o 'rootpath=/' \
    -o 'fs=zroot;mountpoint=none' \
    -o 'fs=zroot/ROOT;mountpoint=none' \
    -o 'fs=zroot/ROOT/default;mountpoint=/' \
    -o 'fs=zroot/home;mountpoint=/home' \
    -o 'fs=zroot/tmp;mountpoint=/tmp;exec=on;setuid=off' \
    -o 'fs=zroot/usr;mountpoint=/usr;canmount=off' \
    -o 'fs=zroot/usr/ports;setuid=off' \
    -o 'fs=zroot/usr/src' \
    -o 'fs=zroot/usr/obj' \
    -o 'fs=zroot/var;mountpoint=/var;canmount=off' \
    -o 'fs=zroot/var/audit;setuid=off;exec=off' \
    -o 'fs=zroot/var/log;setuid=off;exec=off' \
    -o 'fs=zroot/var/mail;atime=on' \
    -o 'fs=zroot/var/tmp;setuid=off' \
    /tmp/raw.zfs.img /tmp/rootfs

Note:

zdb -l /tmp/raw.zfs.img
zpool import -d /tmp/raw.zfs.img

truncate -s 128m /tmp/img.raw
zpool create foo /tmp/img.raw
zpool export foo
zpool import -d /tmp/img.raw

The img.raw created with truncate and zpool create can be imported, while the makefs one reports:

   pool: zroot
     id: 17927745092259738836
  state: UNAVAIL
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        zroot           UNAVAIL  insufficient replicas
          /tmp/img.raw  UNAVAIL  invalid label
The makefs-generated image WILL import if md-attached, but not as a file with -d.
It seems that makefs-generated ZFS images, by skipping some optional dataset structures, activate code paths that have not been used since before the ancient zpool version 11 and that are missing some locks. I think https://github.com/openzfs/zfs/pull/16162 should fix the problem. Though I wonder whether/when ZFS will regenerate those structures without an explicit upgrade from the pre-11 pool version, or will forever use the old code.
(In reply to Alexander Motin from comment #15)
Is the missing structure the "ds_next_clones_obj"? It looks like ZFS should add this one automatically.
(In reply to Mark Johnston from comment #16)
I haven't looked deeply into what this code does, but as far as I can see it is activated by the absence of dp_origin_snap (added in SPA_VERSION_DSL_SCRUB) and ds_next_clones_obj (added in SPA_VERSION_NEXT_CLONES).
(In reply to Alexander Motin from comment #17)
I don't think ZFS will automatically regenerate these structures; makefs needs to handle it.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=49086aa35d987b78dbc3c9ec94814fe338e07164

commit 49086aa35d987b78dbc3c9ec94814fe338e07164
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2024-05-23 16:20:37 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2024-05-23 16:20:37 +0000

    Fix scn_queue races on very old pools

    Code for pools before version 11 uses dmu_objset_find_dp() to scan for
    children datasets/clones. It calls enqueue_clones_cb() and enqueue_cb()
    callbacks in parallel from multiple taskq threads. It ends up bad for
    scan_ds_queue_insert(), corrupting scn_queue AVL-tree. Fix it by
    introducing a mutex to protect those two scan_ds_queue_insert() calls.
    All other calls are done from the sync thread and so serialized.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored by:   iXsystems, Inc.
    Closes #16162

    PR:             278414

 sys/contrib/openzfs/include/sys/dsl_scan.h | 1 +
 sys/contrib/openzfs/module/zfs/dsl_scan.c  | 6 ++++++
 2 files changed, 7 insertions(+)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=455ce1729353f2ffce9713ccc3574e73186a22f0

commit 455ce1729353f2ffce9713ccc3574e73186a22f0
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2024-05-23 16:20:37 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2024-05-23 16:24:55 +0000

    Fix scn_queue races on very old pools

    Code for pools before version 11 uses dmu_objset_find_dp() to scan for
    children datasets/clones. It calls enqueue_clones_cb() and enqueue_cb()
    callbacks in parallel from multiple taskq threads. It ends up bad for
    scan_ds_queue_insert(), corrupting scn_queue AVL-tree. Fix it by
    introducing a mutex to protect those two scan_ds_queue_insert() calls.
    All other calls are done from the sync thread and so serialized.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored by:   iXsystems, Inc.
    Closes #16162

    PR:             278414

    (cherry picked from commit 49086aa35d987b78dbc3c9ec94814fe338e07164)

 sys/contrib/openzfs/include/sys/dsl_scan.h | 1 +
 sys/contrib/openzfs/module/zfs/dsl_scan.c  | 6 ++++++
 2 files changed, 7 insertions(+)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9898f936aa69d1b67bcd83d189acb6013f76bd43

commit 9898f936aa69d1b67bcd83d189acb6013f76bd43
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2024-05-23 16:20:37 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2024-05-23 17:43:02 +0000

    Fix scn_queue races on very old pools

    Code for pools before version 11 uses dmu_objset_find_dp() to scan for
    children datasets/clones. It calls enqueue_clones_cb() and enqueue_cb()
    callbacks in parallel from multiple taskq threads. It ends up bad for
    scan_ds_queue_insert(), corrupting scn_queue AVL-tree. Fix it by
    introducing a mutex to protect those two scan_ds_queue_insert() calls.
    All other calls are done from the sync thread and so serialized.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored by:   iXsystems, Inc.
    Closes #16162

    PR:             278414

    (cherry picked from commit 49086aa35d987b78dbc3c9ec94814fe338e07164)

 sys/contrib/openzfs/include/sys/dsl_scan.h | 1 +
 sys/contrib/openzfs/module/zfs/dsl_scan.c  | 6 ++++++
 2 files changed, 7 insertions(+)
A commit in branch releng/14.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=856d35337225d77948b43ee5d479baa2588963ec

commit 856d35337225d77948b43ee5d479baa2588963ec
Author:     Alexander Motin <mav@FreeBSD.org>
AuthorDate: 2024-05-23 16:20:37 +0000
Commit:     Alexander Motin <mav@FreeBSD.org>
CommitDate: 2024-05-23 18:11:36 +0000

    Fix scn_queue races on very old pools

    Code for pools before version 11 uses dmu_objset_find_dp() to scan for
    children datasets/clones. It calls enqueue_clones_cb() and enqueue_cb()
    callbacks in parallel from multiple taskq threads. It ends up bad for
    scan_ds_queue_insert(), corrupting scn_queue AVL-tree. Fix it by
    introducing a mutex to protect those two scan_ds_queue_insert() calls.
    All other calls are done from the sync thread and so serialized.

    Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
    Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
    Signed-off-by: Alexander Motin <mav@FreeBSD.org>
    Sponsored by:   iXsystems, Inc.
    Closes #16162

    PR:             278414
    Approved by:    re (cperciva)

    (cherry picked from commit 49086aa35d987b78dbc3c9ec94814fe338e07164)
    (cherry picked from commit 455ce1729353f2ffce9713ccc3574e73186a22f0)

 sys/contrib/openzfs/include/sys/dsl_scan.h | 1 +
 sys/contrib/openzfs/module/zfs/dsl_scan.c  | 6 ++++++
 2 files changed, 7 insertions(+)
The fix for the ZFS panic has been merged into releng/14.1 and the stable branches. I hope makefs will also be updated some day.
I have built this on 15-CURRENT shortly after the commit and SO FAR SO GOOD! Thank you everyone!
On 15.0-CURRENT #0 main-n270474-d2f1f71ec8c6, one can image the weekly VM-IMAGE to a hardware device, boot it on hardware, back up its partitions to a second device, dd over the first two partitions, 'zpool attach' the second data partition, wait for the resilver, pull the original drive during a reboot, boot, and online it for full restoration of the pool. This is resolved until further notice. Thank you to everyone who made this happen!
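For the record, a rough sketch of that drive-replacement exercise as commands (the device names ada0/ada1 and the partition indices are assumptions; the real layout comes from the VM-IMAGE):

# copy the first two partitions from the original drive to the spare
dd if=/dev/ada0p1 of=/dev/ada1p1 bs=1m
dd if=/dev/ada0p2 of=/dev/ada1p2 bs=1m

# mirror the data partition onto the spare and wait for the resilver to finish
zpool attach zroot ada0p4 ada1p4
zpool status zroot

# after pulling the original drive and booting from the spare, bring the original back online
zpool online zroot ada0p4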