Bug 205163 - Periodic jobs triggering panics on ZFS host with jails
Summary: Periodic jobs triggering panics on ZFS host with jails
Status: Closed Unable to Reproduce
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.2-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2015-12-09 17:00 UTC by dustinwenz
Modified: 2017-03-21 12:12 UTC
CC: 6 users

See Also:



Description dustinwenz 2015-12-09 17:00:27 UTC
I have multiple machines that have had occasional panics while the daily and weekly periodic scripts run. The panic is always "Fatal trap 18: integer divide fault while in kernel mode".

A notable common factor is that all affected systems have ZFS-based startup disks and 20-40 jails. Each jail has its own filesystem that was created by cloning the boot filesystem. I suspect this is a ZFS bug that is triggered by the access patterns in the periodic scripts. There is significant load on the system when the scheduled processes start, because all jails execute the same scripts at the same time.

I've been able to alleviate this problem by disabling the security scans within the jails, but leaving them enabled on the root host.


This is the message logged related to the panic:

Dec  5 04:16:47 <kern.crit> svr-033-08 kernel: 
Dec  5 04:16:47 <kern.crit> svr-033-08 kernel: 
Dec  5 04:16:47 <kern.crit> svr-033-08 kernel: Fatal trap 18: integer divide fault while in kernel mode
Dec  5 04:16:47 <kern.crit> svr-033-08 kernel: cpuid = 19; apic id = 27
Dec  5 04:16:47 <kern.crit> svr-033-08 kernel: instruction pointer	= 0x20:0xffffffff819f54d4
Dec  5 04:16:47 <kern.crit> svr-033-08 kernel: stack pointer	        = 0x28:0xfffffe085fec76f0
Dec  5 04:16:47 <kern.crit> svr-033-08 kernel: frame pointer	        = 0x28:0xfffffe085fec7740
Dec  5 04:23:18 <kern.info> svr-033-08 syslogd: kernel boot file is /boot/kernel/kernel
Dec  5 04:23:18 <kern.crit> svr-033-08 kernel: code segment		= base 0x0, limit 0xfffff, type 0x1b
Dec  5 04:23:18 <kern.crit> svr-033-08 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Dec  5 04:23:18 <kern.crit> svr-033-08 kernel: processor eflags	= interrupt enabled, resume, IOPL = 0
Dec  5 04:23:18 <kern.crit> svr-033-08 kernel: current process		= 20355 (find)
Dec  5 04:23:18 <kern.crit> svr-033-08 kernel: trap number		= 18
Dec  5 04:23:18 <kern.crit> svr-033-08 kernel: panic: integer divide fault
Dec  5 04:23:18 <kern.crit> svr-033-08 kernel: cpuid = 19
Comment 1 dustinwenz 2015-12-09 17:03:03 UTC
This is a kgdb trace on an affected machine, running STABLE r286534M:

kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 20355 (find)
trap number		= 18
panic: integer divide fault
cpuid = 19
KDB: stack backtrace:
#0 0xffffffff80986710 at kdb_backtrace+0x60
#1 0xffffffff80949e76 at vpanic+0x126
#2 0xffffffff80949d43 at panic+0x43
#3 0xffffffff80d5d3db at trap_fatal+0x36b
#4 0xffffffff80d5d05c at trap+0x75c
#5 0xffffffff80d42f12 at calltrap+0x8
#6 0xffffffff819f4fc8 at dmu_tx_assign+0xf8
#7 0xffffffff81a7a887 at zfs_inactive+0x157
#8 0xffffffff81a8369d at zfs_freebsd_inactive+0xd
#9 0xffffffff80e85ed7 at VOP_INACTIVE_APV+0xa7
#10 0xffffffff809ed182 at vinactive+0x102
#11 0xffffffff809ed572 at vputx+0x272
#12 0xffffffff809f40ea at sys_fchdir+0x2aa
#13 0xffffffff80d5dcf7 at amd64_syscall+0x357
#14 0xffffffff80d431fb at Xfast_syscall+0xfb
Uptime: 16d16h5m25s
Dumping 9283 out of 32705 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/dtraceall.ko.symbols...done.
Loaded symbols for /boot/kernel/dtraceall.ko.symbols
Reading symbols from /boot/kernel/profile.ko.symbols...done.
Loaded symbols for /boot/kernel/profile.ko.symbols
Reading symbols from /boot/kernel/dtrace.ko.symbols...done.
Loaded symbols for /boot/kernel/dtrace.ko.symbols
Reading symbols from /boot/kernel/systrace_freebsd32.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace_freebsd32.ko.symbols
Reading symbols from /boot/kernel/systrace.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace.ko.symbols
Reading symbols from /boot/kernel/sdt.ko.symbols...done.
Loaded symbols for /boot/kernel/sdt.ko.symbols
Reading symbols from /boot/kernel/lockstat.ko.symbols...done.
Loaded symbols for /boot/kernel/lockstat.ko.symbols
Reading symbols from /boot/kernel/fasttrap.ko.symbols...done.
Loaded symbols for /boot/kernel/fasttrap.ko.symbols
Reading symbols from /boot/kernel/fbt.ko.symbols...done.
Loaded symbols for /boot/kernel/fbt.ko.symbols
Reading symbols from /boot/kernel/dtnfscl.ko.symbols...done.
Loaded symbols for /boot/kernel/dtnfscl.ko.symbols
Reading symbols from /boot/kernel/dtmalloc.ko.symbols...done.
Loaded symbols for /boot/kernel/dtmalloc.ko.symbols
Reading symbols from /boot/kernel/ipmi.ko.symbols...done.
Loaded symbols for /boot/kernel/ipmi.ko.symbols
Reading symbols from /boot/kernel/smbus.ko.symbols...done.
Loaded symbols for /boot/kernel/smbus.ko.symbols
Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
Loaded symbols for /boot/kernel/if_lagg.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219		__asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80949ad2 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff80949eb5 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758
#3  0xffffffff80949d43 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:687
#4  0xffffffff80d5d3db in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
#5  0xffffffff80d5d05c in trap (frame=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:203
#6  0xffffffff80d42f12 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#7  0xffffffff819f54d4 in dmu_tx_wait (tx=0xfffff805786ae400) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:1078
#8  0xffffffff819f4fc8 in dmu_tx_assign (tx=0xfffff805786ae400, txg_how=TXG_WAIT) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:1339
#9  0xffffffff81a7a887 in zfs_inactive (vp=<value optimized out>, cr=<value optimized out>, ct=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4716
#10 0xffffffff81a8369d in zfs_freebsd_inactive (ap=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6540
#11 0xffffffff80e85ed7 in VOP_INACTIVE_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1953
#12 0xffffffff809ed182 in vinactive (vp=0xfffff80563f32ce8, td=0xfffff8084181b940) at vnode_if.h:807
#13 0xffffffff809ed572 in vputx (vp=0xfffff80563f32ce8, func=1) at /usr/src/sys/kern/vfs_subr.c:2307
#14 0xffffffff809f40ea in sys_fchdir (td=<value optimized out>, uap=<value optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:763
#15 0xffffffff80d5dcf7 in amd64_syscall (td=0xfffff8084181b940, traced=0) at subr_syscall.c:134
#16 0xffffffff80d431fb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#17 0x000000080089aa2a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
Comment 2 dustinwenz 2015-12-09 17:39:33 UTC
Note that I see this with a kernel config based on GENERIC, with these options added:

device pf
device pflog
device pfsync
options ALTQ
options ALTQ_CBQ
options ALTQ_RED
options ALTQ_RIO
options ALTQ_HFSC
options ALTQ_PRIQ
options ALTQ_NOPCC

There are also two kernel patches applied from these PRs:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=189870
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

The last patch might be interesting here, because the stack trace hits lines (1077-1078 in dmu_tx.c) modified by the ARC patch:

        min_tx_time = zfs_delay_scale *
            (dirty - delay_min_bytes) / (zfs_dirty_data_max_internal - dirty);
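
For illustration, here is a minimal user-space sketch (not the kernel code; the values are made up) of the corner cases in that expression: with unsigned 64-bit arithmetic, the divisor (zfs_dirty_data_max_internal - dirty) is zero whenever the two values are equal, and wraps to a huge positive value whenever dirty is the larger one:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
            /* Hypothetical values; only their relative sizes matter. */
            uint64_t max_internal = 1000;
            uint64_t dirty = 1000;

            /* dirty == max_internal: the kernel's div would trap here. */
            printf("divisor = %ju\n", (uintmax_t)(max_internal - dirty));

            /* dirty > max_internal: no trap, but the divisor wraps. */
            dirty = 1001;
            printf("divisor = %ju\n", (uintmax_t)(max_internal - dirty));
            return (0);
    }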
Comment 3 dustinwenz 2015-12-09 21:06:50 UTC
I was checking out some of the variables as they are set at frame 7 (below). If it's to be believed, zfs_dirty_data_max_internal is much smaller than it should be at that point in the code (maybe due to memory corruption, or a race condition involving zfs_dirty_data_max_internal). That would cause the subtraction at line 1079 to wrap to a value a bit less than the uint64 maximum (though cast as a long long). I'm pretty sure that isn't how the code is intended to work. If that is the problem, disabling the vfs.zfs_dynamic_write_buffer tunable may help, because it should force zfs_dirty_data_max_internal to stay equal to zfs_dirty_data_max.

(kgdb) frame 7
#7  0xffffffff819f54d4 in dmu_tx_wait (tx=0xfffff805786ae400) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:1078
1078		min_tx_time = zfs_delay_scale *
(kgdb) list
1073		 * cause a divide-by-zero if it's == the max.
1074		 */
1075		ASSERT3U(dirty, <, zfs_dirty_data_max_internal);
1076	
1077		now = gethrtime();
1078		min_tx_time = zfs_delay_scale *
1079		    (dirty - delay_min_bytes) / (zfs_dirty_data_max_internal - dirty);
1080		if (now > tx->tx_start + min_tx_time)
1081			return;
1082	
(kgdb) print zfs_dirty_data_max 
$1 = 3429452595
(kgdb) print zfs_dirty_data_max_internal
$2 = 331001856
(kgdb) print dirty
$3 = 336166912
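
For reference, plugging the printed values into the calculation (again a user-space sketch, not the kernel code; zfs_delay_scale is assumed at its stock default of 500000 and delay_min_bytes at zero for simplicity) shows that with these values the divisor is nowhere near zero -- it wraps to almost 2^64, and the computed delay collapses to nothing:

    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
            /* Values printed by kgdb above. */
            uint64_t max_internal = 331001856;  /* zfs_dirty_data_max_internal */
            uint64_t dirty = 336166912;

            /* Assumed for illustration only. */
            uint64_t zfs_delay_scale = 500000;
            uint64_t delay_min_bytes = 0;

            uint64_t divisor = max_internal - dirty;        /* wraps */
            uint64_t min_tx_time = zfs_delay_scale *
                (dirty - delay_min_bytes) / divisor;

            printf("divisor     = %ju\n", (uintmax_t)divisor);      /* 18446744073704386560 */
            printf("min_tx_time = %ju\n", (uintmax_t)min_tx_time);  /* 0 */
            return (0);
    }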
Comment 4 karl 2015-12-10 03:28:48 UTC
I have sent you a patch that MIGHT address this.

I don't think the trace you have is valid, as the divisor isn't zero according to the prints, but the return of a value that unexpectedly underflows might cause problems elsewhere (and once the stack gets crashed, all bets are off).

Let me know if you can provoke another panic with that in.
Comment 5 Andriy Gapon freebsd_committer freebsd_triage 2015-12-10 08:36:10 UTC
(In reply to dustinwenz from comment #3)
You can also do 'disassemble' in frame 7 and see instructions leading to 0xffffffff819f54d4.  You can examine register values as well ('i reg').
Comment 6 Andriy Gapon freebsd_committer freebsd_triage 2015-12-10 09:02:09 UTC
(In reply to dustinwenz from comment #2)
I suspect that the patch from bug 187594 is buggy. It changes zfs_dirty_data_max_internal without any synchronization.
Comment 7 karl 2015-12-10 15:28:06 UTC
(In reply to Andriy Gapon from comment #6)

The tunable that was turned into a dynamic variable was already runtime-adjustable without any sort of mutex locking; a reasonable perusal of the code disclosed no reason not to do it.

However, the potential for a sign inversion (which incidentally MIGHT occur in the base code as well), or for a corner case that could result in a divide-by-zero, is easily removed; I have done so and sent the patch to the reporter, asking whether it stops the panics.

Given the stack trace involved here, I don't believe that's implicated as the cause, since the panic is a divide-by-zero trap yet examination shows no division by zero. However, the potential for the stack to get smashed by an integer division that sets the sign bit and leads to misbehavior somewhere else in the code is plausible, and defending against both that and the potential for an actual divide-by-zero obviously should be done.
Comment 8 Andriy Gapon freebsd_committer freebsd_triage 2015-12-10 17:00:44 UTC
(In reply to karl from comment #7)
The original tunable was expected to be set only once, at boot time.
Your new "internal" variable can be changed on every call to dmu_tx_wait(), and those calls can be concurrent.
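
A minimal sketch of the hazard being described (illustrative user-space code, not the actual patch): the check and the division read the shared variable separately, and with no synchronization a concurrent store can land between them, so the divide can execute with a divisor of zero even though the check passed:

    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative stand-in for the unlocked global; not the real symbol. */
    static volatile uint64_t dirty_max_internal;

    int
    main(void)
    {
            uint64_t dirty = 336166912;
            uint64_t scale = 500000;

            dirty_max_internal = 3429452595;    /* check sees a large value */
            if (dirty >= dirty_max_internal)    /* load #1: check passes */
                    return (0);

            /*
             * Window: nothing serializes writers against this reader, so
             * another CPU's call can store a new, smaller value here.
             */
            dirty_max_internal = dirty;         /* simulate concurrent store */

            /* load #2: divisor is now zero -> SIGFPE, like trap 18 above. */
            printf("%ju\n", (uintmax_t)(scale * dirty /
                (dirty_max_internal - dirty)));
            return (0);
    }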
Comment 9 karl 2015-12-10 17:21:31 UTC
(In reply to Andriy Gapon from comment #8)

That sysctl was not declared as settable only in loader.conf; it was (and is) runtime-tunable. Anything that can be changed at runtime, the base code had better handle being modified asynchronously.

What's in the dump appears to be inconsistent; the panic claims to be on an attempt to perform integer division by zero, but the divisor (on examination) is not zero.

Since the output of that calculation is an expected delay time, however, correcting it so that no division takes place if that condition arises should be effective, provided the panic is actually what was represented. I'm skeptical due to the above, but it's simple enough to find out -- hence the patch I sent to the reporter (and posted over on the ARC bug as well this morning).
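
The patch itself was sent privately and is not attached here, but the shape of such a correction would be along these lines (a sketch under stated assumptions, not the actual patch; all names are illustrative): snapshot the tunable into a local once, and never divide unless the divisor is strictly positive:

    #include <stdint.h>

    /*
     * Sketch only. Read the tunable a single time into a local so the
     * check and the division see the same value, and saturate instead
     * of dividing when the divisor would be zero or would wrap.
     */
    static uint64_t
    delay_calc(uint64_t dirty, uint64_t delay_min_bytes, uint64_t delay_scale,
        volatile uint64_t *dirty_max_internal)
    {
            uint64_t dmax = *dirty_max_internal;    /* single snapshot */

            if (dirty <= delay_min_bytes)
                    return (0);                     /* below the delay floor */
            if (dirty >= dmax)
                    return (UINT64_MAX);            /* saturate; never divide */
            return (delay_scale * (dirty - delay_min_bytes) / (dmax - dirty));
    }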
Comment 10 dustinwenz 2015-12-10 20:05:03 UTC
(kgdb) frame 7
#7  0xffffffff819f54d4 in dmu_tx_wait (tx=0xfffff805786ae400) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:1078
1078		min_tx_time = zfs_delay_scale *
(kgdb) disassemble
Dump of assembler code for function dmu_tx_wait:
0xffffffff819f53e0 <dmu_tx_wait+0>:	push   %rbp
0xffffffff819f53e1 <dmu_tx_wait+1>:	mov    %rsp,%rbp
0xffffffff819f53e4 <dmu_tx_wait+4>:	push   %r15
0xffffffff819f53e6 <dmu_tx_wait+6>:	push   %r14
0xffffffff819f53e8 <dmu_tx_wait+8>:	push   %r13
0xffffffff819f53ea <dmu_tx_wait+10>:	push   %r12
0xffffffff819f53ec <dmu_tx_wait+12>:	push   %rbx
0xffffffff819f53ed <dmu_tx_wait+13>:	sub    $0x18,%rsp
0xffffffff819f53f1 <dmu_tx_wait+17>:	mov    %rdi,%r14
0xffffffff819f53f4 <dmu_tx_wait+20>:	mov    0x30(%r14),%r15
0xffffffff819f53f8 <dmu_tx_wait+24>:	cmpl   $0x0,0xa0(%r14)
0xffffffff819f5400 <dmu_tx_wait+32>:	je     0xffffffff819f55ae <dmu_tx_wait+462>
0xffffffff819f5406 <dmu_tx_wait+38>:	lea    0x138(%r15),%r12
0xffffffff819f540d <dmu_tx_wait+45>:	xor    %esi,%esi
0xffffffff819f540f <dmu_tx_wait+47>:	mov    $0xffffffff81adcb5d,%rdx
0xffffffff819f5416 <dmu_tx_wait+54>:	mov    $0x552,%ecx
0xffffffff819f541b <dmu_tx_wait+59>:	mov    %r12,%rdi
0xffffffff819f541e <dmu_tx_wait+62>:	callq  0xffffffff80951c30 <_sx_xlock>
0xffffffff819f5423 <dmu_tx_wait+67>:	mov    0x188(%r15),%rbx
0xffffffff819f542a <dmu_tx_wait+74>:	cmp    0xffffffff81b11ad0,%rbx
0xffffffff819f5432 <dmu_tx_wait+82>:	jb     0xffffffff819f545c <dmu_tx_wait+124>
0xffffffff819f5434 <dmu_tx_wait+84>:	lea    0x158(%r15),%r13
0xffffffff819f543b <dmu_tx_wait+91>:	nopl   0x0(%rax,%rax,1)
0xffffffff819f5440 <dmu_tx_wait+96>:	mov    %r13,%rdi
0xffffffff819f5443 <dmu_tx_wait+99>:	mov    %r12,%rsi
0xffffffff819f5446 <dmu_tx_wait+102>:	callq  0xffffffff808f4890 <_cv_wait>
0xffffffff819f544b <dmu_tx_wait+107>:	mov    0x188(%r15),%rbx
0xffffffff819f5452 <dmu_tx_wait+114>:	cmp    0xffffffff81b11ad0,%rbx
0xffffffff819f545a <dmu_tx_wait+122>:	jae    0xffffffff819f5440 <dmu_tx_wait+96>
0xffffffff819f545c <dmu_tx_wait+124>:	mov    $0xffffffff81adcb5d,%rsi
0xffffffff819f5463 <dmu_tx_wait+131>:	mov    $0x556,%edx
0xffffffff819f5468 <dmu_tx_wait+136>:	mov    %r12,%rdi
0xffffffff819f546b <dmu_tx_wait+139>:	callq  0xffffffff80951dc0 <_sx_xunlock>
0xffffffff819f5470 <dmu_tx_wait+144>:	movslq 0xffffffff81af6de8,%rax
0xffffffff819f5478 <dmu_tx_wait+152>:	imul   0xffffffff81b11ad0,%rax
0xffffffff819f5481 <dmu_tx_wait+161>:	shr    $0x2,%rax
0xffffffff819f5485 <dmu_tx_wait+165>:	mov    $0x28f5c28f5c28f5c3,%rcx
0xffffffff819f548f <dmu_tx_wait+175>:	mul    %rcx
0xffffffff819f5492 <dmu_tx_wait+178>:	shr    $0x2,%rdx
0xffffffff819f5496 <dmu_tx_wait+182>:	mov    %rbx,%r15
0xffffffff819f5499 <dmu_tx_wait+185>:	sub    %rdx,%r15
0xffffffff819f549c <dmu_tx_wait+188>:	jbe    0xffffffff819f5593 <dmu_tx_wait+435>
0xffffffff819f54a2 <dmu_tx_wait+194>:	mov    0x30(%r14),%r12
0xffffffff819f54a6 <dmu_tx_wait+198>:	lea    -0x38(%rbp),%rdi
0xffffffff819f54aa <dmu_tx_wait+202>:	callq  0xffffffff80956dc0 <getnanouptime>
0xffffffff819f54af <dmu_tx_wait+207>:	imul   $0x3b9aca00,-0x38(%rbp),%rcx
0xffffffff819f54b7 <dmu_tx_wait+215>:	add    -0x30(%rbp),%rcx
0xffffffff819f54bb <dmu_tx_wait+219>:	imul   0xffffffff81af6df0,%r15
0xffffffff819f54c4 <dmu_tx_wait+228>:	mov    0xffffffff81b11ad0,%rsi
0xffffffff819f54cc <dmu_tx_wait+236>:	sub    %rbx,%rsi
0xffffffff819f54cf <dmu_tx_wait+239>:	xor    %edx,%edx
0xffffffff819f54d1 <dmu_tx_wait+241>:	mov    %r15,%rax
0xffffffff819f54d4 <dmu_tx_wait+244>:	div    %rsi
0xffffffff819f54d7 <dmu_tx_wait+247>:	mov    0x98(%r14),%rdx
0xffffffff819f54de <dmu_tx_wait+254>:	add    %rax,%rdx
0xffffffff819f54e1 <dmu_tx_wait+257>:	cmp    %rdx,%rcx
0xffffffff819f54e4 <dmu_tx_wait+260>:	jg     0xffffffff819f5593 <dmu_tx_wait+435>
0xffffffff819f54ea <dmu_tx_wait+266>:	mov    0xffffffff81af6bd8,%r15
0xffffffff819f54f2 <dmu_tx_wait+274>:	cmp    %r15,%rax
0xffffffff819f54f5 <dmu_tx_wait+277>:	cmovle %rax,%r15
0xffffffff819f54f9 <dmu_tx_wait+281>:	mov    0xffffffff81af6cc8,%rdi
0xffffffff819f5501 <dmu_tx_wait+289>:	test   %rdi,%rdi
0xffffffff819f5504 <dmu_tx_wait+292>:	je     0xffffffff819f551c <dmu_tx_wait+316>
0xffffffff819f5506 <dmu_tx_wait+294>:	xor    %r8d,%r8d
0xffffffff819f5509 <dmu_tx_wait+297>:	xor    %r9d,%r9d
0xffffffff819f550c <dmu_tx_wait+300>:	mov    %r14,%rsi
0xffffffff819f550f <dmu_tx_wait+303>:	mov    %rbx,%rdx
0xffffffff819f5512 <dmu_tx_wait+306>:	mov    %r15,%rcx
0xffffffff819f5515 <dmu_tx_wait+309>:	callq  *0xffffffff814a3d48
0xffffffff819f551c <dmu_tx_wait+316>:	lea    0x138(%r12),%rbx
0xffffffff819f5524 <dmu_tx_wait+324>:	xor    %esi,%esi
0xffffffff819f5526 <dmu_tx_wait+326>:	mov    $0xffffffff81adcb5d,%rdx
0xffffffff819f552d <dmu_tx_wait+333>:	mov    $0x440,%ecx
0xffffffff819f5532 <dmu_tx_wait+338>:	mov    %rbx,%rdi
0xffffffff819f5535 <dmu_tx_wait+341>:	callq  0xffffffff80951c30 <_sx_xlock>
0xffffffff819f553a <dmu_tx_wait+346>:	mov    0x98(%r14),%rax
0xffffffff819f5541 <dmu_tx_wait+353>:	add    %r15,%rax
0xffffffff819f5544 <dmu_tx_wait+356>:	add    0x1a8(%r12),%r15
0xffffffff819f554c <dmu_tx_wait+364>:	cmp    %r15,%rax
0xffffffff819f554f <dmu_tx_wait+367>:	cmovge %rax,%r15
0xffffffff819f5553 <dmu_tx_wait+371>:	mov    %r15,0x1a8(%r12)
0xffffffff819f555b <dmu_tx_wait+379>:	mov    $0xffffffff81adcb5d,%rsi
0xffffffff819f5562 <dmu_tx_wait+386>:	mov    $0x444,%edx
0xffffffff819f5567 <dmu_tx_wait+391>:	mov    %rbx,%rdi
0xffffffff819f556a <dmu_tx_wait+394>:	callq  0xffffffff80951dc0 <_sx_xunlock>
0xffffffff819f556f <dmu_tx_wait+399>:	shl    $0x2,%r15
0xffffffff819f5573 <dmu_tx_wait+403>:	movslq 0xffffffff81af6be0,%rdx
0xffffffff819f557b <dmu_tx_wait+411>:	shl    $0x2,%rdx
0xffffffff819f557f <dmu_tx_wait+415>:	mov    $0xffffffff81adcbdb,%rdi
0xffffffff819f5586 <dmu_tx_wait+422>:	mov    $0x200,%ecx
0xffffffff819f558b <dmu_tx_wait+427>:	mov    %r15,%rsi
0xffffffff819f558e <dmu_tx_wait+430>:	callq  0xffffffff809532a0 <pause_sbt>
0xffffffff819f5593 <dmu_tx_wait+435>:	movl   $0x0,0xa0(%r14)
0xffffffff819f559e <dmu_tx_wait+446>:	movl   $0x1,0x94(%r14)
0xffffffff819f55a9 <dmu_tx_wait+457>:	jmpq   0xffffffff819f5671 <dmu_tx_wait+657>
0xffffffff819f55ae <dmu_tx_wait+462>:	mov    (%r15),%rbx
0xffffffff819f55b1 <dmu_tx_wait+465>:	mov    %rbx,%rdi
0xffffffff819f55b4 <dmu_tx_wait+468>:	callq  0xffffffff81a2cb50 <spa_suspended>
0xffffffff819f55b9 <dmu_tx_wait+473>:	test   %eax,%eax
0xffffffff819f55bb <dmu_tx_wait+475>:	jne    0xffffffff819f565d <dmu_tx_wait+637>
0xffffffff819f55c1 <dmu_tx_wait+481>:	mov    0x48(%r14),%rsi
0xffffffff819f55c5 <dmu_tx_wait+485>:	test   %rsi,%rsi
0xffffffff819f55c8 <dmu_tx_wait+488>:	je     0xffffffff819f565d <dmu_tx_wait+637>
0xffffffff819f55ce <dmu_tx_wait+494>:	mov    0x68(%r14),%rax
0xffffffff819f55d2 <dmu_tx_wait+498>:	test   %rax,%rax
0xffffffff819f55d5 <dmu_tx_wait+501>:	je     0xffffffff819f5680 <dmu_tx_wait+672>
0xffffffff819f55db <dmu_tx_wait+507>:	mov    0x18(%rax),%r12
0xffffffff819f55df <dmu_tx_wait+511>:	lea    0xf8(%r12),%r15
0xffffffff819f55e7 <dmu_tx_wait+519>:	xor    %esi,%esi
0xffffffff819f55e9 <dmu_tx_wait+521>:	mov    $0xffffffff81adcb5d,%rdx
0xffffffff819f55f0 <dmu_tx_wait+528>:	mov    $0x573,%ecx
0xffffffff819f55f5 <dmu_tx_wait+533>:	mov    %r15,%rdi
0xffffffff819f55f8 <dmu_tx_wait+536>:	callq  0xffffffff80951c30 <_sx_xlock>
0xffffffff819f55fd <dmu_tx_wait+541>:	mov    0x1c8(%r12),%rax
0xffffffff819f5605 <dmu_tx_wait+549>:	mov    0x48(%r14),%rcx
0xffffffff819f5609 <dmu_tx_wait+553>:	dec    %rcx
0xffffffff819f560c <dmu_tx_wait+556>:	cmp    %rcx,%rax
0xffffffff819f560f <dmu_tx_wait+559>:	jne    0xffffffff819f563f <dmu_tx_wait+607>
0xffffffff819f5611 <dmu_tx_wait+561>:	lea    0x1d0(%r12),%rbx
0xffffffff819f5619 <dmu_tx_wait+569>:	nopl   0x0(%rax)
0xffffffff819f5620 <dmu_tx_wait+576>:	mov    %rbx,%rdi
0xffffffff819f5623 <dmu_tx_wait+579>:	mov    %r15,%rsi
0xffffffff819f5626 <dmu_tx_wait+582>:	callq  0xffffffff808f4890 <_cv_wait>
0xffffffff819f562b <dmu_tx_wait+587>:	mov    0x1c8(%r12),%rax
0xffffffff819f5633 <dmu_tx_wait+595>:	mov    0x48(%r14),%rcx
0xffffffff819f5637 <dmu_tx_wait+599>:	dec    %rcx
0xffffffff819f563a <dmu_tx_wait+602>:	cmp    %rcx,%rax
0xffffffff819f563d <dmu_tx_wait+605>:	je     0xffffffff819f5620 <dmu_tx_wait+576>
0xffffffff819f563f <dmu_tx_wait+607>:	mov    $0xffffffff81adcb5d,%rsi
0xffffffff819f5646 <dmu_tx_wait+614>:	mov    $0x576,%edx
0xffffffff819f564b <dmu_tx_wait+619>:	mov    %r15,%rdi
0xffffffff819f564e <dmu_tx_wait+622>:	callq  0xffffffff80951dc0 <_sx_xunlock>
0xffffffff819f5653 <dmu_tx_wait+627>:	movq   $0x0,0x68(%r14)
0xffffffff819f565b <dmu_tx_wait+635>:	jmp    0xffffffff819f5671 <dmu_tx_wait+657>
0xffffffff819f565d <dmu_tx_wait+637>:	mov    %rbx,%rdi
0xffffffff819f5660 <dmu_tx_wait+640>:	callq  0xffffffff81a2dec0 <spa_last_synced_txg>
0xffffffff819f5665 <dmu_tx_wait+645>:	lea    0x1(%rax),%rsi
0xffffffff819f5669 <dmu_tx_wait+649>:	mov    %r15,%rdi
0xffffffff819f566c <dmu_tx_wait+652>:	callq  0xffffffff81a31470 <txg_wait_synced>
0xffffffff819f5671 <dmu_tx_wait+657>:	add    $0x18,%rsp
0xffffffff819f5675 <dmu_tx_wait+661>:	pop    %rbx
0xffffffff819f5676 <dmu_tx_wait+662>:	pop    %r12
0xffffffff819f5678 <dmu_tx_wait+664>:	pop    %r13
0xffffffff819f567a <dmu_tx_wait+666>:	pop    %r14
0xffffffff819f567c <dmu_tx_wait+668>:	pop    %r15
0xffffffff819f567e <dmu_tx_wait+670>:	pop    %rbp
0xffffffff819f567f <dmu_tx_wait+671>:	retq   
0xffffffff819f5680 <dmu_tx_wait+672>:	mov    0x30(%r14),%rdi
0xffffffff819f5684 <dmu_tx_wait+676>:	inc    %rsi
0xffffffff819f5687 <dmu_tx_wait+679>:	callq  0xffffffff81a31880 <txg_wait_open>
0xffffffff819f568c <dmu_tx_wait+684>:	jmp    0xffffffff819f5671 <dmu_tx_wait+657>
End of assembler dump.

(kgdb) x 0xffffffff81b11ad0
0xffffffff81b11ad0 <zfs_dirty_data_max_internal>:	0x13bab000

(kgdb) info registers 
rax            0x3d25fde186a0	67233382500000
rbx            0x14098000	336166912
rcx            0x51df7df23d8aa	1440325321349290
rdx            0x0	0
rsi            0x0	0
rdi            0xfffffe085fec7708	-2163054184696
rbp            0xfffffe085fec7740	0xfffffe085fec7740
rsp            0xfffffe085fec7700	0xfffffe085fec7700
r8             0xfffffe085fec7b80	-2163054183552
r9             0x55d972ee	1440314094
r10            0xfffff80053160938	-8794699069128
r11            0xffffffff806c2460	-2140396448
r12            0xfffff80053160800	-8794699069440
r13            0xfffff80053160958	-8794699069096
r14            0xfffff805786ae400	-8772597914624
r15            0x3d25fde186a0	67233382500000
rip            0xffffffff819f54d4	0xffffffff819f54d4 <dmu_tx_wait+244>
eflags         0x10246	66118
cs             0x20	32
ss             0x28	40
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0
Comment 11 dustinwenz 2015-12-10 20:16:24 UTC
The instruction pointer refers to the division at 0xffffffff819f54d4, where it performs the unsigned divide rdx:rax/rsi.

Somehow, rsi is reported as 0 in frame 7. I'd expect it to be -5165056 (or the really big unsigned equivalent) because 331001856 is loaded into rsi from memory and rbx (336166912) is subtracted from it.
Comment 12 karl 2015-12-10 21:59:26 UTC
(In reply to dustinwenz from comment #11)

Right -- it should be the unsigned (underflowed) equivalent, which of course is not division by zero.

I don't think you should be able to get there with dirty > dirty_max_internal, but obviously (assuming what's on the stack is correct) you did. I stuck a DTrace probe in there on the copy I'm running (which isn't in the copy I posted and sent to you) and am torture-testing to try to reproduce this, so far without success (and without the probe firing).
Comment 13 Andriy Gapon freebsd_committer freebsd_triage 2015-12-10 23:18:58 UTC
(In reply to dustinwenz from comment #11)
I think that gdb does not lie here.
Do you see any code that prevents a value stored at address 0xffffffff81b11ad0 from changing after a load to %rsi at 0xffffffff819f54c4 is executed? I do not.
Comment 14 dustinwenz 2015-12-16 22:52:53 UTC
I've been unable to reproduce this panic in a test environment, with 60 cloned jails running periodic jobs every hour. Since the problem occurred often on production hosts, there might be some other factor that contributes to it. The only difference that comes to mind is that affected hosts also had other moderately-sized ZFS filesystem clones (each having several thousand files totaling between 10 and 100TB per FS) mounted inside of the jails that were scanned by periodic. Those clones were destroyed and recreated multiple times daily.

So, I guess my next step is to add some bigger data storage to my test host in order to make progress on this.
Comment 15 karl 2015-12-16 22:57:59 UTC
(In reply to dustinwenz from comment #14)
Hmmmm. This sounds like the panic I've seen on backup copies.... And that one looked to be in-core corruption but I haven't been able to reproduce it under controlled conditions.

Interestingly enough, I've yet to get one with my inversion-protection change in... but I can't call this dispositive, as it often took a week or more to show up.
Comment 16 Andriy Gapon freebsd_committer freebsd_triage 2017-03-21 12:12:38 UTC
AFAIK, the problem has never been reported for a clean FreeBSD build (without any patches).