Summary:     kernel panic while dd'ing a USB disk to a ZFS directory
Product:     Base System          Reporter:  David Gilbert <dave>
Component:   kern                 Assignee:  freebsd-fs (Nobody) <fs>
Status:      Open
Severity:    Affects Only Me      CC:        marklmi26-fbsd
Priority:    ---                  Keywords:  crash, needs-qa
Version:     CURRENT
Hardware:    riscv
OS:          Any
Attachments: info file for coredump (attachment 242031)
Description
David Gilbert
2023-05-07 04:51:29 UTC
Created attachment 242031: info file for coredump
I'll note that FreeBSD 14.0-CURRENT #4 main-n262103-c3c5e6c3e6c4-dirty reportin the cor core.txt file is from during the "import openzfs update" disaster, with various fixes/workarounds occurring after that. It may be too much of a mess to figure out if this is a new problem vs. not. Testing versions from after the known problems were adjusted for may be a requirement.

(In reply to Mark Millard from comment #2)
"reportin the cor core.txt file" should have been: "reported in the core.txt file" [I seem to be good at mangling things this morning.]

Also: There might be questions about the status of the pool for, say, block cloning possibly having been active for a time before its software adjustments were made. This might be true even if a recent version of main [so: 14] is used with the pool now.

(In reply to Mark Millard from comment #3)
Does zpool-get(8) not tell whether feature@block_cloning is enabled or active in this case?

feature@block_cloning is active. So I need to update current to now and reboot?

(In reply to Graham Perrin from comment #4)
zpool get should report disabled vs. enabled vs. active. But I'll note that I referenced it because it is a pool-level issue, not just a zfs one, and one that also leads to incompatibility with older openzfs vintages. Even if it is just enabled, there is no way back to disabled, unless one has a zpool checkpoint that predates the zpool upgrade and can revert to the checkpoint. (Snapshots need not be sufficient for pool feature problems.)

However, this openzfs import also had other corruption-generation problems not involving the block cloning feature misbehavior. If one has snapshots predating the corruptions that one can revert to being based on, these may allow avoiding those specific corruptions. But that will not deal with block cloning problems if some occurred.

(In reply to dgilbert from comment #0)
> … https://v4.nextcloud.towernet.ca/s/NkD6E4Bd7TAQdGH

In one of the files:

FreeBSD ump.daveg.ca 14.0-CURRENT FreeBSD 14.0-CURRENT #4 main-n262103-c3c5e6c3e6c4-dirty: Thu Apr 13 13:45:26 EDT 2023 root@ump.daveg.ca:/usr/obj/usr/src/riscv.riscv64/sys/GENERIC riscv

Your c3c5e6c3e6c4 predates <https://github.com/freebsd/freebsd-src/commit/068913e4ba3dd9b3067056e832cefc5ed264b5cc> by a few days. c3c5e6c3e6c4 lacks the vfs.zfs.bclone_enabled 0 (zero) by default approach.

----

Mark, I misread your comment 3 as coming from the opening poster. Sorry.

(In reply to dgilbert from comment #5)
You have other corruption issues not related to block cloning. Or you may have both types of corruptions. Part of the issue with some corruptions was that they predated the checksumming that is recorded, so scrubbing and the like does not find or fix such corruptions.

There is a crash bug that can be temporarily avoided by:

QUOTE
When in single user mode, set the compression property to "off" on any active zfs dataset that has compression other than "off", and set the sync property to something other than "disabled".
END QUOTE

and then working on that basis until you are using an adjusted system version (see the sketch below).

Do you have a pool checkpoint that predates the zpool upgrade? If yes, would it be reasonable to revert to that? Would the result predate the import? If yes, this could then allow progressing by jumping over the problem period completely, but it means having just older data. Similarly for creating a new pool and restoring from backups.

Definitely get to a system based on a version outside the time range that runs from the bad import until fairly recently.
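As a rough sketch of that checkpoint check and the single-user-mode workaround (the pool name "zump" and the dataset "zump/ROOT/default" are taken from the zpool output later in this report; the actual dataset list will differ), something along these lines could be used:

  # Is there a pool checkpoint to fall back to?
  zpool get checkpoint zump

  # Which datasets have compression enabled, and how is sync set?
  zfs list -r -o name,compression,sync zump

  # From single-user mode, for each affected dataset: compression off,
  # and sync set to anything other than "disabled"
  zfs set compression=off zump/ROOT/default
  zfs set sync=standard zump/ROOT/default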
Then deal with whatever corruptions-mess may be present, if you can.

(In reply to Mark Millard from comment #8)
Missing word "MAY": You MAY have other corruption issues not related to block cloning. Sorry about that.

Urm... the disk scrubs fine. Does this corruption bypass scrub?

(In reply to dgilbert from comment #10)
See the 1st paragraph of Comment 8 (other than the missing "MAY"): YES.

FYI for what has been committed to openzfs, starting with the bad import (each entry: commit message, author, age, files, lines changed):

* zfs: merge openzfs/zfs@d96e29576 (Martin Matuska, 4 days, 125, -625/+1965)
* stand: Add isspace to FreeBSD ctypes.h (Warner Losh, 6 days, 1, -0/+1)
* stand: back out the most of the horrible aarch64 kludge (Warner Losh, 6 days, 1, -5/+8)
* openzfs: re-enable FPU usage on aarch64 (Kyle Evans, 11 days, 1, -6/+1)
* Fix BLAKE3 aarch64 assembly for FreeBSD and macOS (Tino Reichardt, 11 days, 2, -4511/+4057)
* zfs: Fix positive ABD size assertion in abd_verify(). (Mateusz Guzik, 11 days, 1, -1/+2)
* openzfs: arm64: implement kfpu_begin/kfpu_end (Kyle Evans, 11 days, 1, -1/+27)
* zfs: make zfs_vfs_held() definition consistent with declaration (Dimitry Andric, 12 days, 1, -1/+1)
* zfs: Revert "Fix data race between zil_commit() and zil_suspend()" (Mateusz Guzik, 12 days, 2, -28/+0)
* Add support for zpool user properties (Allan Jude, 12 days, 12, -48/+627)
* zfs: fix up bogus checksums with blake3 in face of cpu migration (Mateusz Guzik, 12 days, 1, -2/+5)
* zfs/powerpc64: Fix big-endian powerpc64 asm (Justin Hibbits, 2023-04-22, 5, -1/+62)
* zfs: fix up EINVAL from getdirentries on .zfs (Mateusz Guzik, 2023-04-20, 1, -0/+11)
* zfs: add missing vn state transition for .zfs (Mateusz Guzik, 2023-04-20, 1, -0/+4)
* zfs: Add vfs.zfs.bclone_enabled sysctl. (Pawel Jakub Dawidek, 2023-04-17, 3, -2/+11)
* zfs: Merge https://github.com/openzfs/zfs/pull/14739 (Pawel Jakub Dawidek, 2023-04-17, 1, -3/+1)
* zfs: cherry-pick openzfs/zfs@c71fe7164 (Pawel Jakub Dawidek, 2023-04-17, 1, -2/+4)
* zfs: Revert "ZFS_IOC_COUNT_FILLED does unnecessary txg_wait_synced()" (Mateusz Guzik, 2023-04-15, 1, -16/+5)
* zfs: don't use zfs_freebsd_copy_file_range (Mateusz Guzik, 2023-04-15, 1, -1/+2)
* zfs: Appease set by unused warnings for spl_fstrans_*mark stubs. (John Baldwin, 2023-04-10, 1, -1/+1)
* openzfs: adopt to the new vn_lock_pair() interface (Konstantin Belousov, 2023-04-07, 1, -1/+2)
* zfs: disable kernel fpu usage on arm and aarc64 (Mateusz Guzik, 2023-04-07, 2, -2/+2)
* zfs: try to fallback early if can't do optimized copy (Mateusz Guzik, 2023-04-07, 1, -0/+8)
* zfs: fix up EXDEV handling for clone_range (Mateusz Guzik, 2023-04-07, 1, -29/+27)
* zfs: add missing vop_fplookup_vexec assignments (Mateusz Guzik, 2023-04-06, 1, -0/+9)
* zfs: fix null ap->a_fsizetd NULL pointer derefernce (Martin Matuska, 2023-04-05, 1, -1/+1)
* Revert "zfs: fall back if block_cloning feature is disabled" (Martin Matuska, 2023-04-04, 1, -10/+7)
* zfs: fall back if block_cloning feature is disabled (Martin Matuska, 2023-04-04, 1, -7/+10)
* zfs: merge openzfs/zfs@431083f75 (Martin Matuska, 2023-04-03, 309, -15331/+45241)

(In reply to Mark Millard from comment #11)
Hmm. That list has both ends being imports from openzfs. The time order is most-recent to oldest, so the "starting with" that I referenced is actually at the bottom of the list.

Ok. Previous kernel is from Feb 13th, apparently. Is it useful to go back, or is there some other way of validating the machine ... or am I reinstalling?

(In reply to dgilbert from comment #13)
Of the following features, which are active?
(These are ones added after what is listed in /usr/share/zfs/compatibility.d/openzfs-2.1-freebsd):

  edonr
  zilsaxattr
  head_errlog
  blake3
  block_cloning

(A query that reports just these features is sketched below, after this comment.)

I do not know just when main started supporting each of these. (The list has grown over time.) Presuming one knew for the Feb-13 kernel that you reference . . . (I expect that, for FreeBSD, blake3 and block_cloning may be the new ones vs. Feb-13. But I do not know for sure.)

Any active "not even read-only compatible" feature is sufficient to block access. Absent such, any active "just read-only compatible" feature is sufficient to block write access.

block_cloning is read-only compatible with an older zfs that does not support it. The pool will go back to just enabled status when the last cloned block is freed. (Not that one can directly cause such freeing, as far as I know.)

blake3 being active is not even read-only compatible with an older zfs that does not support it. blake3 status is per dataset/filesystem. The pool will go back to just enabled status once all filesystems that have ever had their checksum set to blake3 are destroyed.

head_errlog being active is not even read-only compatible with an older zfs that does not support it. Once active, it cannot be put back to enabled for the pool.

zilsaxattr is read-only compatible with an older zfs that does not support it. zilsaxattr status is per dataset/filesystem. The pool will go back to just enabled status when all datasets that use the feature have been destroyed.

edonr being active is not even read-only compatible with an older zfs that does not support it. edonr status is per dataset/filesystem. The pool will go back to just enabled status once all filesystems that have ever had their checksum set to edonr are destroyed.

REMINDER (quoting):

active
    This feature's on-disk format changes are in effect on the pool. Support for this feature is required to import the pool in read-write mode. If this feature is not read-only compatible, support is also required to import the pool in read-only mode (see Read-only compatibility).

enabled
    An administrator has marked this feature as enabled on the pool, but the feature's on-disk format changes have not been made yet. The pool can still be imported by software that does not support this feature, but changes may be made to the on-disk format at any time which will move the feature to the active state. Some features may support returning to the enabled state after becoming active. See feature-specific documentation for details.

disabled
    This feature's on-disk format changes have not been made and will not be made unless an administrator moves the feature to the enabled state. Features cannot be disabled once they have been enabled.
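As a sketch, assuming the pool is named "zump" as shown in the output below, those five statuses can be pulled out in one command (zpool get accepts a comma-separated property list), or by filtering the full listing:

  zpool get feature@edonr,feature@zilsaxattr,feature@head_errlog,feature@blake3,feature@block_cloning zump

  zpool get all zump | grep -E 'feature@(edonr|zilsaxattr|head_errlog|blake3|block_cloning)'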
edonr "enabled" zilsaxattr "active" head_errlog "active" blake3 "enabled" block_cloning "active" [2:26:325]root@ump:/var/crash> zpool get all NAME PROPERTY VALUE SOURCE zump size 1.75T - zump capacity 8% - zump altroot - default zump health ONLINE - zump guid 10904999209893387658 - zump version - default zump bootfs zump/ROOT/default local zump delegation on default zump autoreplace off default zump cachefile - default zump failmode wait default zump listsnapshots off default zump autoexpand off default zump dedupratio 1.00x - zump free 1.60T - zump allocated 154G - zump readonly off - zump ashift 0 default zump comment - default zump expandsize - - zump freeing 0 - zump fragmentation 31% - zump leaked 0 - zump multihost off default zump checkpoint - - zump load_guid 9590921689347713566 - zump autotrim off default zump compatibility off default zump bcloneused 828K - zump bclonesaved 828K - zump bcloneratio 2.00x - zump feature@async_destroy enabled local zump feature@empty_bpobj active local zump feature@lz4_compress active local zump feature@multi_vdev_crash_dump enabled local zump feature@spacemap_histogram active local zump feature@enabled_txg active local zump feature@hole_birth active local zump feature@extensible_dataset active local zump feature@embedded_data active local zump feature@bookmarks enabled local zump feature@filesystem_limits enabled local zump feature@large_blocks enabled local zump feature@large_dnode enabled local zump feature@sha512 enabled local zump feature@skein enabled local zump feature@edonr enabled local zump feature@userobj_accounting active local zump feature@encryption enabled local zump feature@project_quota active local zump feature@device_removal enabled local zump feature@obsolete_counts enabled local zump feature@zpool_checkpoint enabled local zump feature@spacemap_v2 active local zump feature@allocation_classes enabled local zump feature@resilver_defer enabled local zump feature@bookmark_v2 enabled local zump feature@redaction_bookmarks enabled local zump feature@redacted_datasets enabled local zump feature@bookmark_written enabled local zump feature@log_spacemap active local zump feature@livelist enabled local zump feature@device_rebuild enabled local zump feature@zstd_compress active local zump feature@draid enabled local zump feature@zilsaxattr active local zump feature@head_errlog active local zump feature@blake3 enabled local zump feature@block_cloning active local (In reply to dgilbert from comment #15) Looking at the zfs source it appears that ( in openzfs/module/zcommon/zfeature_common.c ) the following means that, sort of special code changed like the temporary sysctl for block cloning, FreeBSD gets all features as soon as they are imported: static boolean_t zfs_mod_supported_feature(const char *name, const struct zfs_mod_supported_features *sfeatures) { /* * The zfs module spa_feature_table[], whether in-kernel or in * libzpool, always supports all the features. libzfs needs to * query the running module, via sysfs, to determine which * features are supported. * * The equivalent _can_ be done on FreeBSD by way of the sysctl * tree, but this has not been done yet. Therefore, we return * that all features are supported. 
	 */
#if defined(_KERNEL) || defined(LIB_ZPOOL_BUILD) || defined(__FreeBSD__)
	(void) name, (void) sfeatures;
	return (B_TRUE);
#else
	return (zfs_mod_supported(ZFS_SYSFS_POOL_FEATURES, name, sfeatures));
#endif
}

This means I'm confused about how/why edonr's status has long not been listed in the likes of the openzfs-2.*-freebsd files, but is listed for the openzfs-2.*-linux files:

# diff /usr/share/zfs/compatibility.d/openzfs-2.1-*
1c1
< # Features supported by OpenZFS 2.1 on FreeBSD
---
> # Features supported by OpenZFS 2.1 on Linux
9a10
> edonr

amd64_ZFS amd64 1400088 1400088 # diff /usr/share/zfs/compatibility.d/openzfs-2.0-*
1c1
< # Features supported by OpenZFS 2.0 on FreeBSD
---
> # Features supported by OpenZFS 2.0 on Linux
8a9
> edonr

Despite edonr in those files, it looks to me like:

zilsaxattr "active"
head_errlog "active"

have been handled in main's kernel well before your Feb-13 time frame. But that still leaves:

block_cloning "active"

So, if I understand right, you would end up with just Read-Only status for the pool for a kernel from around Feb-13. I've not investigated the loader for back then, but I'd expect that using a more modern loader would deal with any issue there (if it even is a boot pool).

Not really answering the question. I suppose you're saying that Feb 13th won't boot this drive anymore. Buildworld and buildkernel are still running (takes a day and a bit). Do I need to reinstall or just get a new kernel installed?

(In reply to Mark Millard from comment #17)
FYI: Looking at the logs listed by:

https://cgit.freebsd.org/src/log/sys/contrib/openzfs/include/zfeature_common.h

is a way to find when features were likely first imported (a local git equivalent is sketched at the end of this report):

SPA_FEATURE_AVZ_V2        : 2023-May-03
SPA_FEATURE_BLOCK_CLONING : 2023-Apr-03
SPA_FEATURE_BLAKE3        : 2022-Jun-23
SPA_FEATURE_HEAD_ERRLOG   : 2022-May-12
SPA_FEATURE_ZILSAXATTR    : 2022-Mar-08

And on 2021-Feb-18:

-#if !defined(__FreeBSD__)
 	SPA_FEATURE_EDONR,
-#endif

(Past that gets into the SVN time frame.)

(In reply to dgilbert from comment #18)
It has been a research effort to even provide the properties that I've reported. I'm not likely to be able to give you simple or complete answers to the complicated context --and I most definitely do not know your overall context or constraints.

If FreeBSD can boot read-only media to some degree, which it can as I understand, the read-only boot pool might fit in that category for all I know. I've no clue how useful such would be to you.

If you have no way to identify and fix potential corruptions, I'd expect you would initialize a new boot pool, avoiding sources of potentially corrupt data. But I've no clue if you have snapshots, checkpoints, or other such for comparisons, or if it would be reasonable for you to back up in time if you do have some known-at-the-time good data, even if now old.

Good luck --or at least better luck than landing in the disastrous-openzfs-import time frame in the first place.
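As a rough local equivalent of browsing that cgit history, assuming a checked-out freebsd-src tree (the /usr/src path is only an example), something like the following could be used:

  cd /usr/src
  # Full history of the feature table header:
  git log --oneline -- sys/contrib/openzfs/include/zfeature_common.h

  # Find the commit(s) that introduced a specific feature, e.g. block cloning:
  git log --oneline -S SPA_FEATURE_BLOCK_CLONING -- sys/contrib/openzfs/include/zfeature_common.h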