panic: VERIFY3(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed (0 == 5)

I have suddenly started consistently getting a panic and crash on my overnight run. It can be triggered by a find / (with various parameters) plus the overnight security run, etc. I can now trigger a crash at will (after 10 minutes or so). I initially had trouble getting a crash dump; I now have one, but am having problems with kgdb:

-------------
triple0# kgdb /boot/kernel/kernel /var/crash/vmcore.6
GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
/usr/ports/devel/gdb/work-py39/gdb-12.1/gdb/thread.c:1328: internal-error: switch_to_thread: Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
----- Backtrace -----
0x12c6931 ???
0x16d2678 ???
0x16d24f8 ???
0x1b21d63 ???
0x16971b8 ???
0x13e32ee ???
0x167841f ???
0x12fb704 ???
0x16a3392 ???
0x1498ac5 ???
0x149748b ???
0x1495e8e ???
0x11f317b ???
---------------------
/usr/ports/devel/gdb/work-py39/gdb-12.1/gdb/thread.c:1328: internal-error: switch_to_thread: Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)
----------------------------

This is a standard kernel (i.e. freebsd-update ...):

triple0# uname -a
FreeBSD triple0.internal 13.1-RELEASE-p1 FreeBSD 13.1-RELEASE-p1 GENERIC amd64

kgdb was installed from pkg, then from ports; still the same problem.

================================================

Kernel panic details:

Since I haven't been able to get the debugger to work, I have instead managed a photo of the stack backtrace on screen. Basically:

#0 kdb_backtrace+0x65
#1 vpanic+0x17f
#2 spl_panic+0x3a
#3 zfs_znode_alloc+0x522
#4 zfs_zget+0x3b5
#5 zfs_dirent_lookup+0x16b
#6 zfs_dirlook+0x7a
#7 zfs_lookup+0x3d0
#8 zfs_freebsd_cachedlookup+0x3d0
#9 vfs_cache_lookup+0xad
#10 VOP_LOOKUP+0x30
#11 cache_fplookup_noentry+0x1a3
#12 cache_fplookup+0x366
#13 namei+0x12a
#14 kern_statat+0xf3
#15 sys_fstatat+0x2f
#16 amd64_syscall+0x10c
#17 fast_syscall_common+0xf8

I have attached the photo. Hope someone can point me in the right direction (kgdb etc).

Thanks
Duncan

------
------
cat /var/crash/info.6

Dump header from device: /dev/da15p1
  Architecture: amd64
  Architecture Version: 2
  Dump Length: 4367545370
  Blocksize: 512
  Compression: zstd
  Dumptime: 2022-08-24 14:33:16 +1000
  Hostname: triple0.internal
  Magic: FreeBSD Kernel Dump
  Version String: FreeBSD 13.1-RELEASE-p1 GENERIC
  Panic String: VERIFY3(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed (0 == 5)
  Dump Parity: 1622049469
  Bounds: 6
  Dump Status: good

---------
---------
cat /var/crash/core.txt.6

r/crash/vmcore.6.zst : Read error (39) : premature end
triple0.internal dumped core - see /var/crash/vmcore.6

Wed Aug 24 14:52:35 AEST 2022

FreeBSD triple0.internal 13.1-RELEASE-p1 FreeBSD 13.1-RELEASE-p1 GENERIC amd64

panic: VERIFY3(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed (0 == 5)

GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
/wrkdirs/usr/ports/devel/gdb/work-py39/gdb-12.1/gdb/thread.c:1328: internal-error: switch_to_thread: Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n) [answered Y; input not from terminal]

This is a bug, please report it. For instructions, see:
<https://www.gnu.org/software/gdb/bugs/>.
/wrkdirs/usr/ports/devel/gdb/work-py39/gdb-12.1/gdb/thread.c:1328: internal-error: switch_to_thread: Assertion `thr != NULL' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Create a core file of GDB?
(y or n) [answered Y; input not from terminal]
Abort trap (core dumped)
------------------------------------------------------------------------
ps -axlww

ps: can't read nprocs
------------------------------------------------------------------------
vmstat -s

vmstat: vm_cnt:
------------------------------------------------------------------------
vmstat -m

vmstat: memstat_kvm_malloc: _kvm_vnet_selectpid: dumptid
Type InUse MemUse Requests  Size(s)
------------------------------------------------------------------------
vmstat -z

vmstat: memstat_kvm_uma: KVM short read
ITEM SIZE LIMIT USED FREE REQ FAILSLEEP XDOMAIN
------------------------------------------------------------------------
vmstat -i

vmstat: sintrnames:
------------------------------------------------------------------------
pstat -T

pstat: kvm_read():
------------------------------------------------------------------------
pstat -s

Device 1K-blocks Used Avail Capacity
------------------------------------------------------------------------
iostat

iostat: devstat_checkversion: userland devstat version 6 is not the same as the kernel
devstat_checkversion: devstat version 0
devstat_checkversion: libdevstat newer than kernel
------------------------------------------------------------------------
ipcs -a

ipcs: msginfo:
------------------------------------------------------------------------
ipcs -T

ipcs: msginfo:
------------------------------------------------------------------------
netstat -s
------------------------------------------------------------------------
netstat -m

netstat: memstat_kvm_all: _kvm_vnet_selectpid: dumptid
------------------------------------------------------------------------
netstat -anA

netstat: _kvm_vnet_selectpid: dumptid
netstat: _kvm_vnet_selectpid: dumptid
netstat: _kvm_vnet_selectpid: dumptid
netstat: _kvm_vnet_selectpid: dumptid
netstat: _kvm_vnet_selectpid: dumptid
netstat: _kvm_vnet_selectpid: dumptid
netstat: _kvm_vnet_selectpid: dumptid
netstat: _kvm_vnet_selectpid:
dumptid netstat: _kvm_vnet_selectpid: dumptid ------------------------------------------------------------------------ netstat -aL netstat: _kvm_vnet_selectpid: dumptid netstat: _kvm_vnet_selectpid: dumptid netstat: _kvm_vnet_selectpid: dumptid netstat: _kvm_vnet_selectpid: dumptid netstat: _kvm_vnet_selectpid: dumptid netstat: _kvm_vnet_selectpid: dumptid netstat: _kvm_vnet_selectpid: dumptid netstat: _kvm_vnet_selectpid: dumptid netstat: _kvm_vnet_selectpid: dumptid ------------------------------------------------------------------------ fstat fstat: procstat_getprocs() ------------------------------------------------------------------------ dmesg dmesg: kvm_read: ------------------------------------------------------------------------ kernel config options CONFIG_AUTOGENERATED ident GENERIC machine amd64 cpu HAMMER makeoptions WITH_CTF=1 makeoptions DEBUG=-g options IICHID_SAMPLING options HID_DEBUG options EVDEV_SUPPORT options XENHVM options USB_DEBUG options ATH_ENABLE_11N options AH_AR5416_INTERRUPT_MITIGATION options IEEE80211_SUPPORT_MESH options IEEE80211_DEBUG options SC_PIXEL_MODE options VESA options PPS_SYNC options COMPAT_LINUXKPI options PCI_IOV options PCI_HP options IOMMU options EARLY_AP_STARTUP options SMP options NETGDB options NETDUMP options DEBUGNET options ZSTDIO options GZIO options EKCD options KDB_TRACE options KDB options RCTL options RACCT_DEFAULT_TO_DISABLED options RACCT options INCLUDE_CONFIG_FILE options DDB_CTF options KDTRACE_HOOKS options KDTRACE_FRAME options MAC options CAPABILITIES options CAPABILITY_MODE options AUDIT options HWPMC_HOOKS options KBD_INSTALL_CDEV options PRINTF_BUFR_SIZE=128 options _KPOSIX_PRIORITY_SCHEDULING options SYSVSEM options SYSVMSG options SYSVSHM options STACK options KTRACE options SCSI_DELAY=5000 options COMPAT_FREEBSD12 options COMPAT_FREEBSD11 options COMPAT_FREEBSD10 options COMPAT_FREEBSD9 options COMPAT_FREEBSD7 options COMPAT_FREEBSD6 options COMPAT_FREEBSD5 options COMPAT_FREEBSD4 options 
COMPAT_FREEBSD32 options EFIRT options GEOM_LABEL options GEOM_RAID options TMPFS options PSEUDOFS options PROCFS options CD9660 options MSDOSFS options NFS_ROOT options NFSLOCKD options NFSD options NFSCL options MD_ROOT options QUOTA options UFS_GJOURNAL options UFS_DIRHASH options UFS_ACL options SOFTUPDATES options FFS options KERN_TLS options SCTP_SUPPORT options TCP_RFC7413 options TCP_HHOOK options TCP_BLACKBOX options TCP_OFFLOAD options ROUTE_MPATH options IPSEC_SUPPORT options INET6 options INET options VIMAGE options PREEMPTION options NUMA options SCHED_ULE options NEW_PCIB options GEOM_PART_GPT options GEOM_PART_MBR options GEOM_PART_EBR options GEOM_PART_BSD device isa device mem device io device uart_ns8250 device cpufreq device acpi device smbios device pci device fdc device ahci device ata device mvs device siis device ahc device ahd device esp device hptiop device isp device mpt device mps device mpr device sym device isci device ocs_fc device pvscsi device scbus device ch device da device sa device cd device pass device ses device amr device arcmsr device ciss device iir device ips device mly device twa device smartpqi device tws device aac device aacp device aacraid device ida device mfi device mlx device mrsas device pmspcv device twe device nvme device nvd device vmd device atkbdc device atkbd device psm device kbdmux device vga device splash device sc device vt device vt_vga device vt_efifb device vt_vbefb device agp device cbb device pccard device cardbus device uart device ppc device ppbus device lpt device ppi device puc device iflib device em device igc device ix device ixv device ixl device iavf device ice device vmx device axp device bxe device le device ti device mlx5 device mlxfw device mlx5en device miibus device ae device age device alc device ale device bce device bfe device bge device cas device dc device et device fxp device gem device jme device lge device msk device nfe device nge device re device rl device sge device sis 
device sk device ste device stge device vge device vr device xl device wlan device wlan_wep device wlan_ccmp device wlan_tkip device wlan_amrr device an device ath device ath_pci device ath_hal device ath_rate_sample device ipw device iwi device iwn device malo device mwl device ral device wpi device crypto device aesni device loop device padlock_rng device rdrand_rng device ether device vlan device tuntap device md device gif device firmware device xz device bpf device uhci device ohci device ehci device xhci device usb device ukbd device umass device sound device snd_cmi device snd_csa device snd_emu10kx device snd_es137x device snd_hda device snd_ich device snd_via8233 device mmc device mmcsd device sdhci device rtsx device virtio device virtio_pci device vtnet device virtio_blk device virtio_scsi device virtio_balloon device kvm_clock device hyperv device xenpci device netmap device evdev device uinput device hid
------------------------------------------------------------------------
ddb capture buffer

ddb: ddb_capture: kvm_nlist
-----
-----
128Gb Memory, AMD Ryzen Threadripper 1920X 12-core
6 running virtual machines (bhyve)
8 jails
quite a few vlans/bridges etc

triple0# zpool list -v
NAME                     SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data_pool               54.6T  32.7T  21.8T        -         -    13%    59%  1.00x    ONLINE  -
  raidz2-0              54.6T  32.7T  21.8T        -         -    13%  60.0%      -    ONLINE
    gpt/SG601               -      -      -        -         -      -      -      -    ONLINE
    gpt/SG602               -      -      -        -         -      -      -      -    ONLINE
    gpt/SG603               -      -      -        -         -      -      -      -    ONLINE
    gpt/SG609               -      -      -        -         -      -      -      -    ONLINE
    gpt/SG605               -      -      -        -         -      -      -      -    ONLINE
    gpt/SG606               -      -      -        -         -      -      -      -    ONLINE
    gpt/SG607               -      -      -        -         -      -      -      -    ONLINE
    gpt/SG608               -      -      -        -         -      -      -      -    ONLINE
    gpt/T6_7                -      -      -        -         -      -      -      -    ONLINE
    gpt/T6_10               -      -      -        -         -      -      -      -    ONLINE
logs                        -      -      -        -         -      -      -      -    -
  gpt/zil01             31.5G  76.1M  31.4G        -         -     0%  0.23%      -    ONLINE
cache                       -      -      -        -         -      -      -      -    -
  gpt/data_cache         900G   328G   572G        -         -     0%  36.4%      -    ONLINE
  gpt/data_cache2        466G   464G  1.69G        -         -     0%  99.6%      -    ONLINE
pool_18a                81.8T  67.6T  14.2T        -         -     1%    82%  1.00x    ONLINE  -
  raidz1-0              81.8T  67.6T  14.2T        -         -     1%  82.6%      -    ONLINE
    gpt/T18_1               -      -      -        -         -      -      -      -    ONLINE
    gpt/T18_3               -      -      -        -         -      -      -      -    ONLINE
    gpt/T18_2               -      -      -        -         -      -      -      -    ONLINE
    gpt/T18_4               -      -      -        -         -      -      -      -    ONLINE
    gpt/T18_0               -      -      -        -         -      -      -      -    ONLINE
cache                       -      -      -        -         -      -      -      -    -
  gpt/ssd120_R3C2        112G  3.39G   108G        -         -     0%  3.03%      -    ONLINE
system                   432G   194G   238G        -         -    47%    44%  1.00x    ONLINE  -
  mirror-0               432G   194G   238G        -         -    47%  44.9%      -    ONLINE
    gpt/system_A            -      -      -        -         -      -      -      -    ONLINE
    gpt/system_pool_R1C1    -      -      -        -         -      -      -      -    ONLINE
Created attachment 236087 [details]
Photo of stack trace

The image was too large to be accepted in the initial creation of the bug report.
The number 5 comes from the function sa_build_index in sys/contrib/openzfs/module/zfs/sa.c, which returns EIO (error 5) if an sa_hdr_phys object has a bad magic number.
Created attachment 236355 [details] Full core.txt.0 from dump
I have compiled my system from source and caused a panic (running periodic daily will do it; daily_clean_disks_enabled is the trigger). I turned off all the VMs and jails before causing the crash (to remove extra complexity and reduce the size of the core.txt file). I also added a larger dump disk and now have a complete, viable dump. I have attached the complete core.txt.0 file, and the following is the starting portion:

triple0.internal dumped core - see /var/crash/vmcore.0

Sun Sep 4 20:09:55 AEST 2022

FreeBSD triple0.internal 13.1-RELEASE-p2 FreeBSD 13.1-RELEASE-p2 releng/13.1-n250158-752f813d6cc GENERIC amd64

panic: VERIFY3(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed (0 == 5)

GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd13.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /boot/kernel/kernel...
Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
Unread portion of the kernel message buffer:
panic: VERIFY3(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed (0 == 5)
cpuid = 8
time = 1662284816
KDB: stack backtrace:
#0 0xffffffff80c694a5 at kdb_backtrace+0x65
#1 0xffffffff80c1bb5f at vpanic+0x17f
#2 0xffffffff8500bf4a at spl_panic+0x3a
#3 0xffffffff85032c12 at zfs_znode_alloc+0x522
#4 0xffffffff85033c35 at zfs_zget+0x3b5
#5 0xffffffff8501e84b at zfs_dirent_lookup+0x16b
#6 0xffffffff8501e91a at zfs_dirlook+0x7a
#7 0xffffffff85030a10 at zfs_lookup+0x3d0
#8 0xffffffff8502c11b at zfs_freebsd_cachedlookup+0x6b
#9 0xffffffff80cdc4ed at vfs_cache_lookup+0xad
#10 0xffffffff80ce0cf0 at VOP_LOOKUP+0x30
#11 0xffffffff80ce0723 at cache_fplookup_noentry+0x1a3
#12 0xffffffff80cddfa6 at cache_fplookup+0x366
#13 0xffffffff80ce8cfa at namei+0x12a
#14 0xffffffff80d06993 at kern_statat+0xf3
#15 0xffffffff80d0708f at sys_fstatat+0x2f
#16 0xffffffff810b06ec at amd64_syscall+0x10c
#17 0xffffffff81087e8b at fast_syscall_common+0xf8
Uptime: 56m13s
(ada0:ahcich1:0:0:0): spin-down
(ada1:ahcich2:0:0:0): spin-down
(ada2:ahcich3:0:0:0): spin-down
(ada3:ahcich4:0:0:0): spin-down
Dumping 31966 out of 130858 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c1b75c in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff80c1bbce in vpanic (
    fmt=0xffffffff8527c012 "VERIFY3(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp, SA_HDL_SHARED, &zp->z_sa_hdl)) failed (0 == %lld)\n",
    ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff8500bf4a in spl_panic (file=<optimized out>,
    func=<optimized out>, line=<unavailable>, fmt=<unavailable>)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_misc.c:107
#5  0xffffffff85032c12 in zfs_znode_sa_init (zfsvfs=0xfffff8027a9c5000,
    zp=0xfffff807415f9ce8, db=0xfffff80f7c49acb8, obj_type=DMU_OT_SA,
    sa_hdl=<optimized out>)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_znode.c:364
#6  zfs_znode_alloc (zfsvfs=zfsvfs@entry=0xfffff8027a9c5000,
    db=0xfffff80f7c49acb8, blksz=3584, obj_type=DMU_OT_SA, hdl=hdl@entry=0x0)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_znode.c:466
#7  0xffffffff85033c35 in zfs_zget (zfsvfs=<optimized out>,
    zfsvfs@entry=0xfffff8027a9c5000, obj_num=<optimized out>,
    zpp=zpp@entry=0xfffffe03109916f0)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_znode.c:1042
#8  0xffffffff8501e84b in zfs_dirent_lookup (dzp=dzp@entry=0xfffff80741474588,
    name=0xfffffe0310991860 "tb_pkmeth.c", zpp=zpp@entry=0xfffffe0310991740,
    flag=flag@entry=2)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_dir.c:191
#9  0xffffffff8501e91a in zfs_dirlook (dzp=dzp@entry=0xfffff80741474588,
    name=<unavailable>, name@entry=0xfffffe0310991860 "tb_pkmeth.c",
    zpp=zpp@entry=0xfffffe03109917e0)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_dir.c:247
#10 0xffffffff85030a10 in zfs_lookup (dvp=<optimized out>,
    nm=nm@entry=0xfffffe0310991860 "tb_pkmeth.c", vpp=<optimized out>,
    cnp=cnp@entry=0xfffffe0310991c58, nameiop=0, cr=<optimized out>, flags=0,
    cached=1)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:929
#11 0xffffffff8502c11b in zfs_freebsd_lookup (ap=0xfffffe0310991990,
    ap@entry=<error reading variable: value is not available>, cached=1)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:4593
#12 zfs_freebsd_cachedlookup (ap=0xfffffe0310991990,
    ap@entry=<error reading variable: value is not available>)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vnops_os.c:4601
#13 0xffffffff80cdc4ed in VOP_CACHEDLOOKUP (dvp=0xfffff80730da45b8,
    vpp=0xfffffe0310991a10, cnp=0xfffffe0310991c58) at ./vnode_if.h:99
#14 vfs_cache_lookup (ap=<unavailable>,
    ap@entry=<error reading variable: value is not available>)
    at /usr/src/sys/kern/vfs_cache.c:3069
#15 0xffffffff80ce0cf0 in VOP_LOOKUP (dvp=dvp@entry=0xfffff80730da45b8,
    vpp=<unavailable>, vpp@entry=0xfffffe0310991a10, cnp=<unavailable>,
    cnp@entry=0xfffffe0310991c58) at ./vnode_if.h:65
#16 0xffffffff80ce0723 in cache_fplookup_noentry (fpl=fpl@entry=0xfffffe0310991a88)
    at /usr/src/sys/kern/vfs_cache.c:4928
#17 0xffffffff80cddfa6 in cache_fplookup_next (fpl=0xfffffe0310991a88)
    at /usr/src/sys/kern/vfs_cache.c:5284
#18 cache_fplookup_impl (dvp=<optimized out>, fpl=0xfffffe0310991a88)
    at /usr/src/sys/kern/vfs_cache.c:5932
#19 cache_fplookup (ndp=ndp@entry=0xfffffe0310991bd8,
    status=status@entry=0xfffffe0310991b84, pwdp=pwdp@entry=0xfffffe0310991b88)
    at /usr/src/sys/kern/vfs_cache.c:6104
#20 0xffffffff80ce8cfa in namei (ndp=ndp@entry=0xfffffe0310991bd8)
    at /usr/src/sys/kern/vfs_lookup.c:570
#21 0xffffffff80d06993 in kern_statat (td=0xfffffe030c6b5000,
    flag=<optimized out>, fd=-100, path=<unavailable>, pathseg=<unavailable>,
    pathseg@entry=UIO_USERSPACE, sbp=sbp@entry=0xfffffe0310991d18, hook=0x0)
    at /usr/src/sys/kern/vfs_syscalls.c:2441
#22 0xffffffff80d0708f in sys_fstatat (td=<unavailable>, uap=0xfffffe030c6b53e8)
    at /usr/src/sys/kern/vfs_syscalls.c:2418
#23 0xffffffff810b06ec in syscallenter (td=0xfffffe030c6b5000)
    at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:189
#24 amd64_syscall (td=0xfffffe030c6b5000, traced=0)
    at /usr/src/sys/amd64/amd64/trap.c:1185
#25 <signal handler called>
#26 0x00000008011ad39a in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffffffd588
(kgdb)
------------------------------------------------------------------------
I got back to trying to move forward with this issue (re-enabling full EOD runs) and found out where the problem was. In my nextcloud jail, part of the /usr/src file system would cause a panic if accessed (i.e. running a find over it). I haven't yet located the exact directory/file.

Now the interesting thing is that this dataset is encrypted and would mount when decrypted (using a key from higher up the filesystem hierarchy; the password is typed in as part of startup). The panic would only occur on access to parts of the dataset.

I tried replicating the dataset (to keep for later diagnosis), but upon mounting the copy, the machine would panic, requiring a boot into single-user mode and deletion of the copied dataset (I probably should just have modified "canmount") before booting would complete without a panic.

My backups(?) consisted of dataset replication onto other pools, both in the same machine and on another, soon-to-be-offsite machine (running TrueNAS). When I entered the key and mounting occurred, both of the other systems would panic as well.

The only solution I could think of was to create a new dataset and copy over (using rsync in this case) all the folders except /usr/src; I copied /usr/src from another jail. I have renamed and kept the original dataset for potential debugging in the future.

Moral of the story: proof that ZFS replication is actually NOT the same as a backup. The corruption was propagated, in a more virulent form (mount == panic), to the replicated datasets. At some point I would appreciate being able to help someone figure out what has happened to the dataset, and how to stop similar in the future. It has shaken my faith a little (in ZFS).
(In reply to Duncan from comment #5)

> … I tried replicating the dataset (to keep for later diagnosis), but upon
> mounting, machine would panic, …

Was the try entirely successful? Is it the same type of panic?

From <https://old.reddit.com/r/zfs/comments/kx6865/-/gjjmt8n/?context=4> (I cannot verify this):

>> … Replicating broken snapshots/datasets isn't possible. Even if there's an
>> unrepairable CKSUM error in a data block inside a file, and all the actual
>> metadata is intact, zfs will refuse to send that snapshot. …
(In reply to Graham Perrin from comment #6)

The replication, I believe, works fine as long as one doesn't then try to mount the dataset. I will check this properly and perhaps try setting up another machine to run the panics on; it is a bit of a pain to keep knocking over my main server. I should get back to this within the week.

I did get a different type of crash dump, I believe from the mount, and it is different, i.e.:

Unread portion of the kernel message buffer:
panic: VERIFY3(sa.sa_magic == SA_MAGIC) failed (1122422741 == 3100762)
cpuid = 5
time = 1666406111
KDB: stack backtrace:
#0 0xffffffff80c694a5 at kdb_backtrace+0x65
#1 0xffffffff80c1bb5f at vpanic+0x17f
#2 0xffffffff84ff4f4a at spl_panic+0x3a
#3 0xffffffff851948f8 at zpl_get_file_info+0x1d8
#4 0xffffffff85060388 at dmu_objset_userquota_get_ids+0x298
#5 0xffffffff85073f24 at dnode_setdirty+0x34
#6 0xffffffff8504bd49 at dbuf_dirty+0x9d9
#7 0xffffffff85061fc0 at dmu_objset_space_upgrade+0x40
#8 0xffffffff85060a5f at dmu_objset_id_quota_upgrade_cb+0x14f
#9 0xffffffff85061eaf at dmu_objset_upgrade_task_cb+0x7f
#10 0xffffffff84ff6a0f at taskq_run+0x1f
#11 0xffffffff80c7da81 at taskqueue_run_locked+0x181
#12 0xffffffff80c7ed92 at taskqueue_thread_loop+0xc2
#13 0xffffffff80bd8a9e at fork_exit+0x7e
#14 0xffffffff810885ee at fork_trampoline+0xe
Uptime: 13m13s
(ada0:ahcich1:0:0:0): spin-down
(ada1:ahcich2:0:0:0): spin-down
(ada2:ahcich3:0:0:0): spin-down
(ada3:ahcich4:0:0:0): spin-down
Dumping 13911 out of 130858 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /usr/src/sys/amd64/include/pcpu_aux.h:55
#1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:399
#2  0xffffffff80c1b75c in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:487
#3  0xffffffff80c1bbce in vpanic (
    fmt=0xffffffff85250fe8 "VERIFY3(sa.sa_magic == SA_MAGIC) failed (%llu == %llu)\n",
    ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:920
#4  0xffffffff84ff4f4a in spl_panic (file=<optimized out>,
    func=<optimized out>, line=<unavailable>, fmt=<unavailable>)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_misc.c:107
#5  0xffffffff851948f8 in zpl_get_file_info (bonustype=<optimized out>,
    data=0xfffffe035db250c0, zoi=0xfffffe027e72bc50)
    at /usr/src/sys/contrib/openzfs/module/zfs/zfs_quota.c:89
#6  0xffffffff85060388 in dmu_objset_userquota_get_ids (dn=0xfffff8160ebcf660,
    before=before@entry=1, tx=<optimized out>, tx@entry=0xfffff80ec760a100)
    at /usr/src/sys/contrib/openzfs/module/zfs/dmu_objset.c:2215
#7  0xffffffff85073f24 in dnode_setdirty (dn=0xfffff8160ebcf660,
    tx=0xfffff80ec760a100)
    at /usr/src/sys/contrib/openzfs/module/zfs/dnode.c:1691
#8  0xffffffff8504bd49 in dbuf_dirty (db=0xfffff8160ebd3b90, db@entry=0x0,
    tx=tx@entry=0xfffff8160ebd3b90)
    at /usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:2367
#9  0xffffffff8504c074 in dmu_buf_will_dirty_impl (db_fake=<optimized out>,
    flags=<optimized out>, flags@entry=9, tx=0xfffff8160ebd3b90,
    tx@entry=0xfffff80ec760a100)
    at /usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:2517
#10 0xffffffff8504aea2 in dmu_buf_will_dirty (db_fake=<unavailable>,
    tx=<unavailable>, tx@entry=0xfffff80ec760a100)
    at /usr/src/sys/contrib/openzfs/module/zfs/dbuf.c:2523
#11 0xffffffff85061fc0 in dmu_objset_space_upgrade (os=os@entry=0xfffff80408629800)
    at /usr/src/sys/contrib/openzfs/module/zfs/dmu_objset.c:2328
#12 0xffffffff85060a5f in dmu_objset_id_quota_upgrade_cb (os=0xfffff80408629800)
    at /usr/src/sys/contrib/openzfs/module/zfs/dmu_objset.c:2385
#13 0xffffffff85061eaf in dmu_objset_upgrade_task_cb (data=0xfffff80408629800)
    at /usr/src/sys/contrib/openzfs/module/zfs/dmu_objset.c:1447
#14 0xffffffff84ff6a0f in taskq_run (arg=0xfffff801e5ab5300,
    pending=<unavailable>)
    at /usr/src/sys/contrib/openzfs/module/os/freebsd/spl/spl_taskq.c:315
#15 0xffffffff80c7da81 in taskqueue_run_locked (queue=queue@entry=0xfffff80116004300)
    at /usr/src/sys/kern/subr_taskqueue.c:477
#16 0xffffffff80c7ed92 in taskqueue_thread_loop (arg=<optimized out>,
    arg@entry=0xfffff801dfb570d0) at /usr/src/sys/kern/subr_taskqueue.c:794
#17 0xffffffff80bd8a9e in fork_exit (callout=0xffffffff80c7ecd0 <taskqueue_thread_loop>,
    arg=0xfffff801dfb570d0, frame=0xfffffe027e72bf40)
    at /usr/src/sys/kern/kern_fork.c:1093
#18 <signal handler called>
#19 mi_startup () at /usr/src/sys/kern/init_main.c:322
#20 0xffffffff80f791d9 in swapper () at /usr/src/sys/vm/vm_swapout.c:755
#21 0xffffffff80385022 in btext () at /usr/src/sys/amd64/amd64/locore.S:80
----------------------

I would say this is a similar but different problem. I had months of replicated copies on two different pools. Because I copied (send/receive) them encrypted and unmounted on the destination, nothing showed up. As soon as I tried to mount them: panic. Currently I have renamed the original dataset (currently unmounted), but I deleted the backups (they wouldn't mount, but I'm sure I can re-create them). I will do more experimentation when I have a couple of hours spare (within the week).
(In reply to Duncan from comment #7) Thanks. Given the differences, I suggest a separate bug report with 266014 as 'see also'. I have bookmarked a pull request and some (GitHub) issues as food for thought; not sharing them here, because I don't imagine this particular food relating to comment #0.