Bug 216586 - zfs panic: sa.sa_magic == 0x2F505A in zfs_space_delta_cb()
Summary: zfs panic: sa.sa_magic == 0x2F505A in zfs_space_delta_cb()
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.3-STABLE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: freebsd-fs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-29 22:57 UTC by johannes
Modified: 2020-01-11 03:05 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description johannes 2017-01-29 22:57:42 UTC
While rsync'ing some files, the system eventually panics with:

> panic: solaris assert: sa.sa_magic == 0x2F505A (0x51786a88 == 0x2f505a), file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c, line: 610

The backtrace is:

> #0 0xffffffff804f23c0 at kdb_backtrace+0x60
> #1 0xffffffff804b3824 at vpanic+0x124
> #2 0xffffffff804b36f3 at panic+0x43
> #3 0xffffffff8109122f at assfail3+0x2f
> #4 0xffffffff80e4d417 at zfs_space_delta_cb+0x107
> #5 0xffffffff80dc156f at dmu_objset_userquota_get_ids+0x36f
> #6 0xffffffff80dc952f at dnode_sync+0x9f
> #7 0xffffffff80dc0aeb at dmu_objset_sync_dnodes+0x2b
> #8 0xffffffff80dc0920 at dmu_objset_sync+0x220
> #9 0xffffffff80dd241a at dsl_dataset_sync+0x1ca
> #10 0xffffffff80ddabca at dsl_pool_sync+0x7a
> #11 0xffffffff80dfa77d at spa_sync+0x57d
> #12 0xffffffff80e03f56 at txg_sync_thread+0x3a6
> #13 0xffffffff8048037a at fork_exit+0x9a
> #14 0xffffffff806ccfde at fork_trampoline+0xe

This happens on:
> FreeBSD fred 10.3-STABLE FreeBSD 10.3-STABLE #1 r308740: Sat Nov 19 21:15:27 GMT 2016 root@fred:/usr/obj/usr/src/sys/FRED amd64

and as well on a newer version:
> FreeBSD fred 10.3-STABLE FreeBSD 10.3-STABLE #0 r312924: Sun Jan 29 01:45:50 GMT 2017     root@fred:/usr/obj/usr/src/sys/FRED  amd64

The machine has ECC RAM, and a zpool scrub did not report any errors. The pool is a 2-way-mirror.

The actual value of sa.sa_magic changes (not everytime though): e.g. 0x51786a88 or 0x5178691e.

The code in question at zfs_vfsops.c, line 610 is:

> static int
> zfs_space_delta_cb(dmu_object_type_t bonustype, void *data,
>     uint64_t *userp, uint64_t *groupp)
> {
> 	/*
> 	 * Is it a valid type of object to track?
> 	 */
> 	if (bonustype != DMU_OT_ZNODE && bonustype != DMU_OT_SA)
> 		return (SET_ERROR(ENOENT));
> 
> 	/*
> 	 * If we have a NULL data pointer
> 	 * then assume the id's aren't changing and
> 	 * return EEXIST to the dmu to let it know to
> 	 * use the same ids
> 	 */
> 	if (data == NULL)
> 		return (SET_ERROR(EEXIST));
> 
> 	if (bonustype == DMU_OT_ZNODE) {
> 		znode_phys_t *znp = data;
> 		*userp = znp->zp_uid;
> 		*groupp = znp->zp_gid;
> 	} else {
> 		int hdrsize;
> 		sa_hdr_phys_t *sap = data;
> 		sa_hdr_phys_t sa = *sap;
> 		boolean_t swap = B_FALSE;
> 
> 		ASSERT(bonustype == DMU_OT_SA);
> 
> 		if (sa.sa_magic == 0) {
> 			/*
> 			 * This should only happen for newly created
> 			 * files that haven't had the znode data filled
> 			 * in yet.
> 			 */
> 			*userp = 0;
> 			*groupp = 0;
> 			return (0);
> 		}
> 		if (sa.sa_magic == BSWAP_32(SA_MAGIC)) {
> 			sa.sa_magic = SA_MAGIC;
> 			sa.sa_layout_info = BSWAP_16(sa.sa_layout_info);
> 			swap = B_TRUE;
> 		} else {
> /*610*/		VERIFY3U(sa.sa_magic, ==, SA_MAGIC);   // FAILS
> 		}
> 
> 		hdrsize = sa_hdrsize(&sa);
> 		VERIFY3U(hdrsize, >=, sizeof (sa_hdr_phys_t));
> 		*userp = *((uint64_t *)((uintptr_t)data + hdrsize +
> 		    SA_UID_OFFSET));
> 		*groupp = *((uint64_t *)((uintptr_t)data + hdrsize +
> 		    SA_GID_OFFSET));
> 		if (swap) {
> 			*userp = BSWAP_64(*userp);
> 			*groupp = BSWAP_64(*groupp);
> 		}
> 	}
> 	return (0);
> }

The stacktrace has dmu_objset_userquota_get_ids listed. And the target file system (into which rsync is writing) has quota enabled:

> NAME                        PROPERTY              VALUE                  SOURCE
> freddata/backup/alexs-imac  type                  filesystem             -
> freddata/backup/alexs-imac  creation              Wed Mar  7 18:01 2012  -
> freddata/backup/alexs-imac  used                  521G                   -
> freddata/backup/alexs-imac  available             29.2G                  -
> freddata/backup/alexs-imac  referenced            213G                   -
> freddata/backup/alexs-imac  compressratio         1.11x                  -
> freddata/backup/alexs-imac  mounted               yes                    -
> freddata/backup/alexs-imac  quota                 550G                   local
> freddata/backup/alexs-imac  reservation           300G                   local
> freddata/backup/alexs-imac  recordsize            128K                   default
> freddata/backup/alexs-imac  mountpoint            /backup/alexs-imac     inherited from freddata/backup
> freddata/backup/alexs-imac  sharenfs              off                    default
> freddata/backup/alexs-imac  checksum              fletcher4              inherited from freddata
> freddata/backup/alexs-imac  compression           lz4                    local
> freddata/backup/alexs-imac  atime                 off                    inherited from freddata/backup
> freddata/backup/alexs-imac  devices               on                     default
> freddata/backup/alexs-imac  exec                  on                     default
> freddata/backup/alexs-imac  setuid                on                     default
> freddata/backup/alexs-imac  readonly              off                    default
> freddata/backup/alexs-imac  jailed                off                    default
> freddata/backup/alexs-imac  snapdir               hidden                 default
> freddata/backup/alexs-imac  aclmode               discard                default
> freddata/backup/alexs-imac  aclinherit            restricted             default
> freddata/backup/alexs-imac  canmount              on                     default
> freddata/backup/alexs-imac  xattr                 off                    temporary
> freddata/backup/alexs-imac  copies                1                      default
> freddata/backup/alexs-imac  version               5                      -
> freddata/backup/alexs-imac  utf8only              off                    -
> freddata/backup/alexs-imac  normalization         none                   -
> freddata/backup/alexs-imac  casesensitivity       sensitive              -
> freddata/backup/alexs-imac  vscan                 off                    default
> freddata/backup/alexs-imac  nbmand                off                    default
> freddata/backup/alexs-imac  sharesmb              off                    default
> freddata/backup/alexs-imac  refquota              none                   default
> freddata/backup/alexs-imac  refreservation        none                   default
> freddata/backup/alexs-imac  primarycache          all                    default
> freddata/backup/alexs-imac  secondarycache        all                    default
> freddata/backup/alexs-imac  usedbysnapshots       308G                   -
> freddata/backup/alexs-imac  usedbydataset         213G                   -
> freddata/backup/alexs-imac  usedbychildren        0                      -
> freddata/backup/alexs-imac  usedbyrefreservation  0                      -
> freddata/backup/alexs-imac  logbias               latency                default
> freddata/backup/alexs-imac  dedup                 off                    default
> freddata/backup/alexs-imac  mlslabel                                     -
> freddata/backup/alexs-imac  sync                  standard               default
> freddata/backup/alexs-imac  refcompressratio      1.06x                  -
> freddata/backup/alexs-imac  written               145G                   -
> freddata/backup/alexs-imac  logicalused           569G                   -
> freddata/backup/alexs-imac  logicalreferenced     221G                   -
> freddata/backup/alexs-imac  volmode               default                default
> freddata/backup/alexs-imac  filesystem_limit      none                   default
> freddata/backup/alexs-imac  snapshot_limit        none                   default
> freddata/backup/alexs-imac  filesystem_count      none                   default
> freddata/backup/alexs-imac  snapshot_count        none                   default
> freddata/backup/alexs-imac  redundant_metadata    all                    default
Comment 1 Andriy Gapon freebsd_committer 2017-01-30 09:36:28 UTC
Did you do 'zfs upgrade' any time recently?
Comment 2 johannes 2017-01-31 00:57:05 UTC
Haven't done any zfs/zpool upgrades in a long time.

zfs version is 5 (for this and all other fs in the pool).

zpool version is:
> NAME      PROPERTY                       VALUE                          SOURCE
> [...snip...]
> freddata  version                        -                              default
> freddata  feature@async_destroy          enabled                        local
> freddata  feature@empty_bpobj            active                         local
> freddata  feature@lz4_compress           active                         local
> freddata  feature@multi_vdev_crash_dump  enabled                        local
> freddata  feature@spacemap_histogram     active                         local
> freddata  feature@enabled_txg            active                         local
> freddata  feature@hole_birth             active                         local
> freddata  feature@extensible_dataset     enabled                        local
> freddata  feature@embedded_data          disabled                       local
> freddata  feature@bookmarks              enabled                        local
> freddata  feature@filesystem_limits      disabled                       local
> freddata  feature@large_blocks           disabled                       local
Comment 3 Andriy Gapon freebsd_committer 2017-01-31 13:04:48 UTC
Could it be that you have some old files that were created before your last zfs upgrade (even if that was a long time ago)?
Could it be that rsync updates those files?

You might find this issue (close, but no associated commit) to be interesting:
https://github.com/zfsonlinux/zfs/issues/2025

I see that in your case the wrong magic values also look like Unix timestamps, both from 25 April 2013.
Could you please check if you still have more files from that era?

P.S.
Some other similar reports in ZoL repository:
https://github.com/zfsonlinux/zfs/issues/1303
https://github.com/zfsonlinux/zfs/issues/3968
Comment 4 johannes 2017-02-03 23:02:17 UTC
Thanks Andriy.

There are indeed lots of (very) old files there. I had an earlier snapshot from a few years ago (that refs all these files). Deleted all files in the fs, and rsync'd everything anew. No panic.

I've also tried (first rolling back to the old snapshot with all old files present) rsync with either --xattrs present or not. Panics for both cases.
But the time until panic varies a lot. So looks like a race condition.

Question is now, is there bogus data (old format that wasn't properly upgraded on-the-fly) with the correct checksum on the disk, and the current code trips over it.
Or is the race condition still present when it tries to on-the-fly-upgrade old files?
Comment 5 Andriy Gapon freebsd_committer 2017-02-04 12:36:05 UTC
(In reply to johannes from comment #4)
I believe that it is the race condition.
https://github.com/zfsonlinux/zfs/issues/2025#issuecomment-40459546
I think that what I wrote there still holds and there hasn't been any fix in this area.
Comment 6 johannes 2020-01-11 03:05:21 UTC
Haven't encountered this problem in a long time. Closing it.