Under serious delete load, the server crashes with the above report. Bug submitted on request of avg@FreeBSD.org --WjW Architecture: amd64 Architecture Version: 2 Dump Length: 7031291904B (6705 MB) Blocksize: 512 Dumptime: Tue Sep 29 12:55:01 2015 Hostname: zfs.digiware.nl Magic: FreeBSD Kernel Dump Version String: FreeBSD 10.2-STABLE #2 r287746M: Sun Sep 13 12:33:23 CEST 2015 root@zfs.digiware.nl:/home/obj/amd64/usr/srcs/work10/src/sys/ZFS Panic String: solaris assert: start < end, file: /usr/srcs/work10/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c, line: 219 (kgdb) bt #0 doadump (textdump=1) at pcpu.h:219 #1 0xffffffff804f9d97 in kern_reboot (howto=260) at /usr/srcs/work10/src/sys/kern/kern_shutdown.c:451 #2 0xffffffff804fa195 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/srcs/work10/src/sys/kern/kern_shutdown.c:758 #3 0xffffffff804fa023 in panic (fmt=0x0) at /usr/srcs/work10/src/sys/kern/kern_shutdown.c:687 #4 0xffffffff8106a1fd in assfail (a=<value optimized out>, f=<value optimized out>, l=<value optimized out>) at /usr/srcs/work10/src/sys/modules/opensolaris/../../cddl/compat/opensolaris/kern/opensolaris_cmn_err.c:81 #5 0xffffffff80e65161 in trim_map_segment_add (tm=0xfffff8001432ba00, start=7704576, end=7704576, txg=0) at /usr/srcs/work10/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:219 #6 0xffffffff80e6464f in trim_map_free_locked (tm=<value optimized out>, start=<value optimized out>, end=<value optimized out>, txg=<value optimized out>) at /usr/srcs/work10/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:319 #7 0xffffffff80e64588 in trim_map_free (vd=0xfffff8000ae3b800, offset=7704576, size=<value optimized out>, txg=0) at /usr/srcs/work10/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:337 #8 0xffffffff80db490c in arc_hdr_destroy (hdr=0xfffff800198ac5e8) at /usr/srcs/work10/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:1825 #9 0xffffffff80db980a in l2arc_evict (dev=0xfffff8000aa2d480, distance=<value optimized out>, all=0) at /usr/srcs/work10/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:5004 #10 0xffffffff80dba0da in l2arc_feed_thread (dummy=<value optimized out>) at /usr/srcs/work10/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c:5581 #11 0xffffffff804c9855 in fork_exit (callout=0xffffffff80db9b80 <l2arc_feed_thread>, arg=0x0, frame=0xfffffe023bbc4bc0) at /usr/srcs/work10/src/sys/kern/kern_fork.c:1018 #12 0xffffffff807773ee in fork_trampoline () at /usr/srcs/work10/src/sys/amd64/amd64/exception.S:611 #13 0x0000000000000000 in ?? () And the values there are: #5 0xffffffff80e65161 in trim_map_segment_add (tm=0xfffff8001432ba00, start=7704576, end=7704576, txg=0) at /usr/srcs/work10/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/trim_map.c:219 219 VERIFY(start < end); (kgdb) p start $1 = 7704576 (kgdb) p end $2 = 7704576
Created attachment 161810 [details] proposed patch for testing Could you please try this patch? It would especially useful if a kernel (and zfs module) is built with INVARIANTS. As I understand the issue is not readily reproducible?
(In reply to Andriy Gapon from comment #1) You are right, the problem is hard to trigger. This problem was only triggered once. And since the last upgrade to 10-STABLE the system is surprisingly stable... I used to crash every few days on the "deadmansbutton" for missing a write status return. The trim-assert only happened once, while removing large ZVOLs. I'll upgrade, and start some work on ZVOLs, to see if it happens again. --WjW
(In reply to Willem Jan Withagen from comment #2) Is the patch against 10.2 or against HEAD? I'd expect the latter since I'm getting an anry patch... Patching file sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c using Plan A... Hunk #1 succeeded at 890 (offset -244 lines). Hunk #2 failed at 2223. Hunk #3 failed at 4602. Hunk #4 failed at 5764. 3 out of 4 hunks failed--saving rejects to sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c.rej Hmm... Ignoring the trailing garbage. done Does it also work against 10.2 if I manually try to patch up? --WjW
The patch was made against head, yes, but I think that it should also apply to the latest stable/10 (after the recent MFC wave from mav@). You can try to make the changes by hand if the differences are not that significant.
Created attachment 161919 [details] Patch agains 10.2-STABLE
(In reply to Willem Jan Withagen from comment #5) Patched and rebooted the server running: FreeBSD 10.2-STABLE #3 r289060M Sofar no bad effects. Server is up for 2 days.
The patch looks good. Thank you for testing.
A commit references this bug: Author: avg Date: Fri Oct 30 12:00:35 UTC 2015 New revision: 290191 URL: https://svnweb.freebsd.org/changeset/base/290191 Log: l2arc: do not call trim_map_free() for blocks with zero b_asize b_asize can be zero if the block is compressed into an empty block (ZIO_COMPRESS_EMPTY) and the trim code asserts that meaningless zero-sized trimming is not attempted. The logic for calling trim_map_free() is extracted into a new function l2arc_trim() to minimize code duplication. PR: 203473 Reported by: Willem Jan Withagen <wjw@digiware.nl> Tested by: Willem Jan Withagen <wjw@digiware.nl> MFC after: 11 days Changes: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
Patch is still holding itself. wjw@zfs.digiware.nl> uptime 1:39PM up 21 days, 2:33, 1 user, load averages: 0.21, 0.13, 0.13 Thanx for the commit
(In reply to Willem Jan Withagen from comment #9) Something that I should have asked earlier: what is a value of kstat.zfs.misc.arcstats.l2_compress_zeros sysctl?
(In reply to Andriy Gapon from comment #10) wjw@zfs.digiware.nl> sysctl kstat.zfs.misc.arcstats.l2_compress_zeros kstat.zfs.misc.arcstats.l2_compress_zeros: 0 There is no description with this sysctl. So is it good or bad that it is 0. Or in other words: What does the sysctl register?
(In reply to Willem Jan Withagen from comment #11) It means that there hasn't been any buffers so far that could have provoked the bug.
(In reply to Andriy Gapon from comment #12) Any suggestions as to how to get buffers into the state that they might provoke this situation? Last time I think I had a 5Tb fully written ZVOl which I deleted. Would that run into the buffer problem?
Running 10.2-STABLE r289928 and have been experiencing the panic detailed above. As of this post, I show sysctl kstat.zfs.misc.arcstats.l2_compress_zeros: 228 with an uptime of 6hr 49min. I will apply the patch and let you know if this rectifies the problem. Should I expect the sysctl to report 0 after the patch is applied?
(In reply to Paul Boehmer from comment #14) No.
A commit references this bug: Author: mav Date: Fri Nov 13 09:52:36 UTC 2015 New revision: 290766 URL: https://svnweb.freebsd.org/changeset/base/290766 Log: MFC r290191 (by avg): l2arc: do not call trim_map_free() for blocks with zero b_asize b_asize can be zero if the block is compressed into an empty block (ZIO_COMPRESS_EMPTY) and the trim code asserts that meaningless zero-sized trimming is not attempted. The logic for calling trim_map_free() is extracted into a new function l2arc_trim() to minimize code duplication. PR: 203473 Reported by: Willem Jan Withagen <wjw@digiware.nl> Tested by: Willem Jan Withagen <wjw@digiware.nl> Changes: _U stable/10/ stable/10/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
It looks like it was easy to provoke on my system in the following scenarios: rsync -avu from my server to a zfs fs used for backup running a couple of linux bhyve systems backed by zvols doing heavy io (yum / apt-get upgrade) given the patch has just been synced to RELENG_10 I'll update and try to provoke the system again to see if it is the solution for my case
This has solved the issue for me too, thanks!
Thank you, everyone, for testing and reporting.
Uptime of 12 days on the patch (5 days max prior to patching), no issues on my end. Excellent work, thanks.
A commit references this bug: Author: avg Date: Tue Nov 17 15:10:25 UTC 2015 New revision: 290983 URL: https://svnweb.freebsd.org/changeset/base/290983 Log: MFC r290191: l2arc: do not call trim_map_free() for blocks with zero b_asize The code in this branch has become quite outdated, so the change had to be applied by hand. PR: 203473 Changes: _U stable/9/sys/ _U stable/9/sys/cddl/contrib/opensolaris/ stable/9/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c