| Summary: | zfs panic: sa.sa_magic == 0x2F505A in zfs_space_delta_cb() | | |
|---|---|---|---|
| Product: | Base System | Reporter: | jo |
| Component: | kern | Assignee: | freebsd-fs (Nobody) <fs> |
| Status: | Open | | |
| Severity: | Affects Only Me | CC: | ae, avg, dpetrov67, grahamperrin |
| Priority: | --- | Keywords: | crash |
| Version: | 10.3-STABLE | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
| See Also: | https://github.com/openzfs/zfs/issues/2025, https://github.com/openzfs/zfs/issues/6332, https://github.com/openzfsonosx/openzfs/issues/5 | | |
Description (jo, 2017-01-29 22:57:42 UTC)
> Did you do 'zfs upgrade' any time recently?

Haven't done any zfs/zpool upgrades in a long time.
zfs version is 5 (for this and all other fs in the pool).
zpool version is:
> NAME PROPERTY VALUE SOURCE
> [...snip...]
> freddata version - default
> freddata feature@async_destroy enabled local
> freddata feature@empty_bpobj active local
> freddata feature@lz4_compress active local
> freddata feature@multi_vdev_crash_dump enabled local
> freddata feature@spacemap_histogram active local
> freddata feature@enabled_txg active local
> freddata feature@hole_birth active local
> freddata feature@extensible_dataset enabled local
> freddata feature@embedded_data disabled local
> freddata feature@bookmarks enabled local
> freddata feature@filesystem_limits disabled local
> freddata feature@large_blocks disabled local
Andriy Gapon:
Could it be that you have some old files that were created before your last zfs upgrade (even if that was a long time ago)? Could it be that rsync updates those files? You might find this issue (closed, but with no associated commit) interesting: https://github.com/zfsonlinux/zfs/issues/2025

I see that in your case the wrong magic values also look like Unix timestamps, both from 25 April 2013. Could you please check whether you still have more files from that era?

P.S. Some other similar reports in the ZoL repository:
https://github.com/zfsonlinux/zfs/issues/1303
https://github.com/zfsonlinux/zfs/issues/3968

jo (comment #4):
Thanks Andriy. There are indeed lots of (very) old files there. I had an earlier snapshot from a few years ago that references all of these files. I deleted all the files in the filesystem and rsync'd everything anew: no panic. I also tried (after first rolling back to the old snapshot, with all the old files present) rsync both with and without --xattrs. It panics in both cases, but the time until the panic varies a lot, so it looks like a race condition. The question now is: is there bogus data on disk (old format that wasn't properly upgraded on the fly) with a correct checksum that the current code trips over, or is the race condition still present when old files are upgraded on the fly?

Andriy Gapon:
(In reply to johannes from comment #4)
I believe that it is the race condition: https://github.com/zfsonlinux/zfs/issues/2025#issuecomment-40459546
I think that what I wrote there still holds, and there has not been any fix in this area.

jo (comment #6):
Haven't encountered this problem in a long time. Closing it.

Andriy Gapon (comment #7):
(In reply to johannes from comment #6)
I believe that the problem still exists, but it's very specific (post zfs upgrade) and rare.
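To make the "wrong magic values look like Unix timestamps" observation concrete, here is a minimal userland sketch of that reinterpretation. It assumes the race hypothesis discussed above (the bonus type already reads as SA while the buffer still holds legacy znode data); the two structs are cut-down stand-ins for the on-disk znode_phys_t and sa_hdr_phys_t, not the actual kernel definitions.

```c
/*
 * Userland-only sketch: why a stale legacy (DMU_OT_ZNODE-style) bonus buffer,
 * reinterpreted as an SA bonus buffer, yields a timestamp-looking value where
 * sa_magic is expected.  Layouts below are simplified stand-ins, NOT the
 * kernel's znode_phys_t / sa_hdr_phys_t definitions.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SA_MAGIC 0x2F505AU  /* value the assertion in zfs_space_delta_cb() expects */

/* Legacy ZPL bonus layout: the access time comes first. */
typedef struct legacy_znode_phys {
    uint64_t zp_atime[2];   /* seconds, nanoseconds */
    uint64_t zp_mtime[2];
    /* uid, gid, size, ... follow in the real layout */
} legacy_znode_phys_t;

/* SA bonus layout: a small header carrying the magic comes first. */
typedef struct sa_hdr_phys {
    uint32_t sa_magic;
    uint16_t sa_layout_info;
    uint16_t sa_lengths[1];
} sa_hdr_phys_t;

int
main(void)
{
    /* An old file, last touched 25 April 2013, still in the legacy layout. */
    legacy_znode_phys_t znp = { .zp_atime = { 1366848000, 0 } };

    uint8_t bonus[sizeof (znp)];
    memcpy(bonus, &znp, sizeof (znp));  /* contents of the bonus buffer */

    /*
     * If the sync path is told the bonus type is already SA (because the
     * on-the-fly upgrade switched it) while the buffer still holds the
     * legacy znode, then on little-endian amd64 the low 32 bits of
     * zp_atime[0] are read where the magic is expected.
     */
    sa_hdr_phys_t sa;
    memcpy(&sa, bonus, sizeof (sa));

    if (sa.sa_magic != SA_MAGIC)
        printf("assertion would fire: sa.sa_magic == 0x%x, expected 0x%x\n",
            sa.sa_magic, SA_MAGIC);
    return (0);
}
```

On amd64 this prints sa.sa_magic == 0x51787200, i.e. the April 2013 atime seconds, in place of the expected 0x2f505a, which matches the observation above.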
Andrey V. Elsukov (comment #8):
(In reply to Andriy Gapon from comment #7)
We hit this problem twice last week. One machine is based on 12.0, the second one on 13.0 (before the ZoL migration). I looked at the ZoL code, and it seems to me that the code has not changed here, so I think it is possible to get this panic on up-to-date OpenZFS too. I have a core dump from one of the panics; let me know if you are interested in seeing anything from it.

panic: solaris assert: sa.sa_magic == 0x2F505A (0xb656f0e0 == 0x2f505a), file: /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c, line: 609
cpuid = 4
time = 1623060642
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe3fd9321780
vpanic() at vpanic+0x194/frame 0xfffffe3fd93217e0
panic() at panic+0x43/frame 0xfffffe3fd9321840
assfail3() at assfail3+0x2c/frame 0xfffffe3fd9321860
zfs_space_delta_cb() at zfs_space_delta_cb+0x100/frame 0xfffffe3fd93218a0
dmu_objset_userquota_get_ids() at dmu_objset_userquota_get_ids+0x1b7/frame 0xfffffe3fd93218f0
dnode_sync() at dnode_sync+0xa6/frame 0xfffffe3fd9321980
sync_dnodes_task() at sync_dnodes_task+0x92/frame 0xfffffe3fd93219c0
taskq_run() at taskq_run+0x10/frame 0xfffffe3fd93219e0
taskqueue_run_locked() at taskqueue_run_locked+0x147/frame 0xfffffe3fd9321a40
taskqueue_thread_loop() at taskqueue_thread_loop+0xb8/frame 0xfffffe3fd9321a70
fork_exit() at fork_exit+0x86/frame 0xfffffe3fd9321ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe3fd9321ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 14d8h32m53s
Dumping 14912 out of 261996 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
__curthread () at ./machine/pcpu.h:232
232     ./machine/pcpu.h: No such file or directory.

(kgdb) bt
#0  __curthread () at ./machine/pcpu.h:232
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:318
#2  0xffffffff80aa31c3 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:386
#3  0xffffffff80aa36ae in vpanic (fmt=<optimized out>, ap=0xfffffe3fd9321820) at /usr/src/sys/kern/kern_shutdown.c:779
#4  0xffffffff80aa34d3 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:710
#5  0xffffffff822df23c in assfail3 (a=<unavailable>, lv=<unavailable>, op=<unavailable>, rv=<unavailable>, f=<unavailable>, l=<optimized out>) at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:91
#6  0xffffffff8209e9c0 in zfs_space_delta_cb (bonustype=<optimized out>, data=0xfffff803e64b2f40, userp=0xfffff8052a3b2278, groupp=0xfffff8052a3b2280) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:609
#7  0xffffffff8200d917 in dmu_objset_userquota_get_ids (dn=0xfffff8052a3b2000, before=0, tx=<optimized out>) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:1592
#8  0xffffffff82015396 in dnode_sync (dn=0xfffff8052a3b2000, tx=0xfffff803ae49e800) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode_sync.c:570
#9  0xffffffff8200d192 in dmu_objset_sync_dnodes (list=0xfffff801763bb420, tx=0xfffff803ae49e800) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:1093
#10 sync_dnodes_task (arg=0xfffff82419cdd260) at /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_objset.c:1160
#11 0xffffffff81fd3750 in taskq_run (arg=0xfffff801c66d08d0, pending=<unavailable>) at /usr/src/sys/cddl/compat/opensolaris/kern/opensolaris_taskq.c:109
#12 0xffffffff80afc0e7 in taskqueue_run_locked (queue=0xfffff8016b01cd00) at /usr/src/sys/kern/subr_taskqueue.c:463
#13 0xffffffff80afd2c8 in taskqueue_thread_loop (arg=<optimized out>) at /usr/src/sys/kern/subr_taskqueue.c:755
#14 0xffffffff80a65796 in fork_exit (callout=0xffffffff80afd210 <taskqueue_thread_loop>, arg=0xfffff801281c3f20, frame=0xfffffe3fd9321ac0) at /usr/src/sys/kern/kern_fork.c:1038
#15 <signal handler called>

Andriy Gapon:
(In reply to Andrey V. Elsukov from comment #8)
I think that at this point there is enough information about the problem. So far I could not come up with a good way to fix it, but to be honest I did not try very hard. The problem has not been fixed, so there is no reason to close this report.

Also see:
- https://github.com/zfsonlinux/zfs/issues/2025 (closed)
- https://github.com/openzfs/zfs/issues/6332 (open)
- https://github.com/openzfsonosx/openzfs/issues/5 (open)
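The backtrace pins the failing check to zfs_space_delta_cb(), called from dmu_objset_userquota_get_ids() during dnode_sync(). The sketch below is a sequential, userland model of one possible interleaving with an on-the-fly SA upgrade (zfs_sa_upgrade() in the ZPL); the two-step "switch the bonus type, then rewrite the contents" ordering and all _model names are assumptions made only to illustrate the window, not a description of the actual kernel code.

```c
/*
 * Sequential model of one possible interleaving between an on-the-fly SA
 * upgrade and the sync path.  Names with the _model suffix are invented for
 * this sketch; only SA_MAGIC and the shape of the failing check come from
 * the panic message above.
 */
#include <stdint.h>
#include <stdio.h>

#define SA_MAGIC 0x2F505AU

enum bonus_type_model { OT_ZNODE_MODEL, OT_SA_MODEL };

struct dnode_model {
    enum bonus_type_model bonustype;  /* what the callback dispatches on */
    uint32_t bonus_word0;             /* first 32 bits of the bonus buffer */
};

/* Stand-in for the check at zfs_vfsops.c:609 seen in the backtrace. */
static void
space_delta_model(const struct dnode_model *dn)
{
    if (dn->bonustype == OT_SA_MODEL && dn->bonus_word0 != SA_MAGIC)
        printf("panic: solaris assert: sa.sa_magic == 0x2F505A (0x%x == 0x2f505a)\n",
            dn->bonus_word0);
    else
        printf("ids accounted, no panic\n");
}

int
main(void)
{
    /* Old file: legacy bonus; the first word is the atime seconds. */
    struct dnode_model dn = { OT_ZNODE_MODEL, 1366848000 };

    space_delta_model(&dn);        /* before the upgrade: fine */

    dn.bonustype = OT_SA_MODEL;    /* upgrade step 1: bonus type switched */
    space_delta_model(&dn);        /* sync observes the stale pair: would panic */

    dn.bonus_word0 = SA_MAGIC;     /* upgrade step 2: SA header written */
    space_delta_model(&dn);        /* later syncs: fine again */
    return (0);
}
```

Whether the real window is exactly this ordering is what the linked zfsonlinux/zfs#2025 comment discusses; the model only shows that a single consistency check on sa_magic is where such a type/contents mismatch first becomes visible, which matches where the assertion fires in the backtrace.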