While rsync'ing some files, the system eventually panics with:

> panic: solaris assert: sa.sa_magic == 0x2F505A (0x51786a88 == 0x2f505a), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c, line: 610

The backtrace is:

> #0 0xffffffff804f23c0 at kdb_backtrace+0x60
> #1 0xffffffff804b3824 at vpanic+0x124
> #2 0xffffffff804b36f3 at panic+0x43
> #3 0xffffffff8109122f at assfail3+0x2f
> #4 0xffffffff80e4d417 at zfs_space_delta_cb+0x107
> #5 0xffffffff80dc156f at dmu_objset_userquota_get_ids+0x36f
> #6 0xffffffff80dc952f at dnode_sync+0x9f
> #7 0xffffffff80dc0aeb at dmu_objset_sync_dnodes+0x2b
> #8 0xffffffff80dc0920 at dmu_objset_sync+0x220
> #9 0xffffffff80dd241a at dsl_dataset_sync+0x1ca
> #10 0xffffffff80ddabca at dsl_pool_sync+0x7a
> #11 0xffffffff80dfa77d at spa_sync+0x57d
> #12 0xffffffff80e03f56 at txg_sync_thread+0x3a6
> #13 0xffffffff8048037a at fork_exit+0x9a
> #14 0xffffffff806ccfde at fork_trampoline+0xe

This happens on:

> FreeBSD fred 10.3-STABLE FreeBSD 10.3-STABLE #1 r308740: Sat Nov 19 21:15:27 GMT 2016 root@fred:/usr/obj/usr/src/sys/FRED amd64

as well as on a newer version:

> FreeBSD fred 10.3-STABLE FreeBSD 10.3-STABLE #0 r312924: Sun Jan 29 01:45:50 GMT 2017 root@fred:/usr/obj/usr/src/sys/FRED amd64

The machine has ECC RAM, and a zpool scrub did not report any errors. The pool is a 2-way mirror. The actual value of sa.sa_magic changes (though not every time), e.g. 0x51786a88 or 0x5178691e.

The code in question at zfs_vfsops.c, line 610, is:

> static int
> zfs_space_delta_cb(dmu_object_type_t bonustype, void *data,
>     uint64_t *userp, uint64_t *groupp)
> {
>         /*
>          * Is it a valid type of object to track?
>          */
>         if (bonustype != DMU_OT_ZNODE && bonustype != DMU_OT_SA)
>                 return (SET_ERROR(ENOENT));
>
>         /*
>          * If we have a NULL data pointer
>          * then assume the id's aren't changing and
>          * return EEXIST to the dmu to let it know to
>          * use the same ids
>          */
>         if (data == NULL)
>                 return (SET_ERROR(EEXIST));
>
>         if (bonustype == DMU_OT_ZNODE) {
>                 znode_phys_t *znp = data;
>                 *userp = znp->zp_uid;
>                 *groupp = znp->zp_gid;
>         } else {
>                 int hdrsize;
>                 sa_hdr_phys_t *sap = data;
>                 sa_hdr_phys_t sa = *sap;
>                 boolean_t swap = B_FALSE;
>
>                 ASSERT(bonustype == DMU_OT_SA);
>
>                 if (sa.sa_magic == 0) {
>                         /*
>                          * This should only happen for newly created
>                          * files that haven't had the znode data filled
>                          * in yet.
>                          */
>                         *userp = 0;
>                         *groupp = 0;
>                         return (0);
>                 }
>                 if (sa.sa_magic == BSWAP_32(SA_MAGIC)) {
>                         sa.sa_magic = SA_MAGIC;
>                         sa.sa_layout_info = BSWAP_16(sa.sa_layout_info);
>                         swap = B_TRUE;
>                 } else {
> /*610*/                 VERIFY3U(sa.sa_magic, ==, SA_MAGIC);  // FAILS
>                 }
>
>                 hdrsize = sa_hdrsize(&sa);
>                 VERIFY3U(hdrsize, >=, sizeof (sa_hdr_phys_t));
>                 *userp = *((uint64_t *)((uintptr_t)data + hdrsize +
>                     SA_UID_OFFSET));
>                 *groupp = *((uint64_t *)((uintptr_t)data + hdrsize +
>                     SA_GID_OFFSET));
>                 if (swap) {
>                         *userp = BSWAP_64(*userp);
>                         *groupp = BSWAP_64(*groupp);
>                 }
>         }
>         return (0);
> }

Note that dmu_objset_userquota_get_ids appears in the backtrace.
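To summarize the logic for readers skimming the function above: the callback accepts three states of the SA header and panics on anything else. Below is a condensed userland sketch of that classification. It is an illustration only, not the kernel code; the struct is a simplified stand-in for sa_hdr_phys_t (see sys/sa_impl.h), and the names sa_hdr_stub_t and classify_sa_header are made up here.

/*
 * Illustration only, not the kernel code: a condensed view of how
 * zfs_space_delta_cb() classifies the sa_magic field of a DMU_OT_SA
 * bonus buffer.  The struct below is a simplified stand-in for
 * sa_hdr_phys_t; SA_MAGIC is 0x2F505A per the panic message.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SA_MAGIC        0x2F505AU

typedef struct {
        uint32_t sa_magic;
        uint16_t sa_layout_info;
} sa_hdr_stub_t;                /* made-up name, simplified layout */

enum sa_state { SA_EMPTY, SA_NATIVE, SA_SWAPPED, SA_CORRUPT };

static enum sa_state
classify_sa_header(const void *bonus)
{
        sa_hdr_stub_t hdr;

        memcpy(&hdr, bonus, sizeof (hdr));
        if (hdr.sa_magic == 0)
                return (SA_EMPTY);      /* new file, ids treated as 0 */
        if (hdr.sa_magic == SA_MAGIC)
                return (SA_NATIVE);     /* normal case */
        if (hdr.sa_magic == __builtin_bswap32(SA_MAGIC))
                return (SA_SWAPPED);    /* pool written on other-endian host */
        /* Anything else is what VERIFY3U() panics on at line 610. */
        fprintf(stderr, "bad sa_magic 0x%08x (expected 0x%08x)\n",
            (unsigned)hdr.sa_magic, (unsigned)SA_MAGIC);
        return (SA_CORRUPT);
}

int
main(void)
{
        /* Feed in one of the values reported by the panic. */
        unsigned char bonus[8] = { 0 };
        uint32_t bogus = 0x51786a88;

        memcpy(bonus, &bogus, sizeof (bogus));
        (void) classify_sa_header(bonus);
        return (0);
}

The values reported in the panics (0x51786a88, 0x5178691e) fall into the last branch: they are neither zero, SA_MAGIC, nor byte-swapped SA_MAGIC.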
And the target filesystem (into which rsync is writing) has a quota set:

> NAME                        PROPERTY              VALUE                  SOURCE
> freddata/backup/alexs-imac  type                  filesystem             -
> freddata/backup/alexs-imac  creation              Wed Mar 7 18:01 2012   -
> freddata/backup/alexs-imac  used                  521G                   -
> freddata/backup/alexs-imac  available             29.2G                  -
> freddata/backup/alexs-imac  referenced            213G                   -
> freddata/backup/alexs-imac  compressratio         1.11x                  -
> freddata/backup/alexs-imac  mounted               yes                    -
> freddata/backup/alexs-imac  quota                 550G                   local
> freddata/backup/alexs-imac  reservation           300G                   local
> freddata/backup/alexs-imac  recordsize            128K                   default
> freddata/backup/alexs-imac  mountpoint            /backup/alexs-imac     inherited from freddata/backup
> freddata/backup/alexs-imac  sharenfs              off                    default
> freddata/backup/alexs-imac  checksum              fletcher4              inherited from freddata
> freddata/backup/alexs-imac  compression           lz4                    local
> freddata/backup/alexs-imac  atime                 off                    inherited from freddata/backup
> freddata/backup/alexs-imac  devices               on                     default
> freddata/backup/alexs-imac  exec                  on                     default
> freddata/backup/alexs-imac  setuid                on                     default
> freddata/backup/alexs-imac  readonly              off                    default
> freddata/backup/alexs-imac  jailed                off                    default
> freddata/backup/alexs-imac  snapdir               hidden                 default
> freddata/backup/alexs-imac  aclmode               discard                default
> freddata/backup/alexs-imac  aclinherit            restricted             default
> freddata/backup/alexs-imac  canmount              on                     default
> freddata/backup/alexs-imac  xattr                 off                    temporary
> freddata/backup/alexs-imac  copies                1                      default
> freddata/backup/alexs-imac  version               5                      -
> freddata/backup/alexs-imac  utf8only              off                    -
> freddata/backup/alexs-imac  normalization         none                   -
> freddata/backup/alexs-imac  casesensitivity       sensitive              -
> freddata/backup/alexs-imac  vscan                 off                    default
> freddata/backup/alexs-imac  nbmand                off                    default
> freddata/backup/alexs-imac  sharesmb              off                    default
> freddata/backup/alexs-imac  refquota              none                   default
> freddata/backup/alexs-imac  refreservation        none                   default
> freddata/backup/alexs-imac  primarycache          all                    default
> freddata/backup/alexs-imac  secondarycache        all                    default
> freddata/backup/alexs-imac  usedbysnapshots       308G                   -
> freddata/backup/alexs-imac  usedbydataset         213G                   -
> freddata/backup/alexs-imac  usedbychildren        0                      -
> freddata/backup/alexs-imac  usedbyrefreservation  0                      -
> freddata/backup/alexs-imac  logbias               latency                default
> freddata/backup/alexs-imac  dedup                 off                    default
> freddata/backup/alexs-imac  mlslabel              -
> freddata/backup/alexs-imac  sync                  standard               default
> freddata/backup/alexs-imac  refcompressratio      1.06x                  -
> freddata/backup/alexs-imac  written               145G                   -
> freddata/backup/alexs-imac  logicalused           569G                   -
> freddata/backup/alexs-imac  logicalreferenced     221G                   -
> freddata/backup/alexs-imac  volmode               default                default
> freddata/backup/alexs-imac  filesystem_limit      none                   default
> freddata/backup/alexs-imac  snapshot_limit        none                   default
> freddata/backup/alexs-imac  filesystem_count      none                   default
> freddata/backup/alexs-imac  snapshot_count        none                   default
> freddata/backup/alexs-imac  redundant_metadata    all                    default
Did you do 'zfs upgrade' any time recently?
Haven't done any zfs/zpool upgrades in a long time. zfs version is 5 (for this and all other fs in the pool). zpool version is:

> NAME      PROPERTY                       VALUE     SOURCE
> [...snip...]
> freddata  version                        -         default
> freddata  feature@async_destroy          enabled   local
> freddata  feature@empty_bpobj            active    local
> freddata  feature@lz4_compress           active    local
> freddata  feature@multi_vdev_crash_dump  enabled   local
> freddata  feature@spacemap_histogram     active    local
> freddata  feature@enabled_txg            active    local
> freddata  feature@hole_birth             active    local
> freddata  feature@extensible_dataset     enabled   local
> freddata  feature@embedded_data          disabled  local
> freddata  feature@bookmarks              enabled   local
> freddata  feature@filesystem_limits      disabled  local
> freddata  feature@large_blocks           disabled  local
Could it be that you have some old files that were created before your last zfs upgrade (even if that was a long time ago)? Could it be that rsync updates those files? You might find this issue (closed, but with no associated commit) interesting: https://github.com/zfsonlinux/zfs/issues/2025

I see that in your case the wrong magic values also look like Unix timestamps, both from 25 April 2013. Could you please check whether you still have more files from that era?

P.S. Some other similar reports in the ZoL repository:
https://github.com/zfsonlinux/zfs/issues/1303
https://github.com/zfsonlinux/zfs/issues/3968
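As a quick sanity check of the timestamp observation, the two values from the panics can be run through gmtime(). A minimal sketch (the two constants are taken from the panic messages in this report; everything else here is just illustration):

/* Decode the bogus sa_magic values as Unix timestamps (UTC). */
#include <stdio.h>
#include <time.h>

int
main(void)
{
        /* The two sa_magic values observed in the panics. */
        const unsigned long magics[] = { 0x51786a88UL, 0x5178691eUL };
        char buf[64];

        for (size_t i = 0; i < sizeof (magics) / sizeof (magics[0]); i++) {
                time_t t = (time_t)magics[i];
                struct tm tm;

                gmtime_r(&t, &tm);
                strftime(buf, sizeof (buf), "%Y-%m-%d %H:%M:%S UTC", &tm);
                printf("0x%08lx -> %s\n", magics[i], buf);
        }
        return (0);
}

Both print dates in late April 2013 (UTC), i.e. plausible file timestamps rather than the expected SA_MAGIC constant 0x2F505A, which supports the suggestion above that these bytes are really timestamp data from old-format files.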
Thanks Andriy. There are indeed lots of (very) old files there, and an earlier snapshot from a few years ago still references all of them.

I deleted all files in the filesystem and rsync'd everything anew: no panic.

I also tried (after first rolling back to the old snapshot, so all the old files were present again) running rsync both with and without --xattrs. It panics in both cases, but the time until the panic varies a lot, so it looks like a race condition.

The question now is: is there bogus data on disk (old-format data that wasn't properly upgraded on the fly) with a correct checksum, which the current code trips over? Or is the race condition still present when ZFS tries to upgrade old files on the fly?
(In reply to johannes from comment #4)
I believe that it is the race condition; see
https://github.com/zfsonlinux/zfs/issues/2025#issuecomment-40459546
I think that what I wrote there still holds, and there hasn't been any fix in this area.
Haven't encountered this problem in a long time. Closing it.
(In reply to johannes from comment #6) I believe that the problem still exists, but it's very specific (post zfs upgrade) and rare.