I have multiple machines that have had occasional panics while the daily and weekly periodic scripts run. The panic is always "Fatal trap 18: integer divide fault while in kernel mode". A notable common factor is that all affected systems have ZFS-based startup disks and 20-40 jails. Each jail has its own filesystem that was created by cloning the boot filesystem. I suspect this is a ZFS bug triggered by the access patterns of the periodic scripts. There is significant load on the system when the scheduled processes start, because all jails execute the same scripts at the same time. I've been able to alleviate the problem by disabling the security scans within the jails while leaving them enabled on the root host.

This is the message logged for the panic:

Dec 5 04:16:47 <kern.crit> svr-033-08 kernel:
Dec 5 04:16:47 <kern.crit> svr-033-08 kernel:
Dec 5 04:16:47 <kern.crit> svr-033-08 kernel: Fatal trap 18: integer divide fault while in kernel mode
Dec 5 04:16:47 <kern.crit> svr-033-08 kernel: cpuid = 19; apic id = 27
Dec 5 04:16:47 <kern.crit> svr-033-08 kernel: instruction pointer = 0x20:0xffffffff819f54d4
Dec 5 04:16:47 <kern.crit> svr-033-08 kernel: stack pointer = 0x28:0xfffffe085fec76f0
Dec 5 04:16:47 <kern.crit> svr-033-08 kernel: frame pointer = 0x28:0xfffffe085fec7740
Dec 5 04:23:18 <kern.info> svr-033-08 syslogd: kernel boot file is /boot/kernel/kernel
Dec 5 04:23:18 <kern.crit> svr-033-08 kernel: code segment = base rx0, limit 0xfffff, type 0x1b
Dec 5 04:23:18 <kern.crit> svr-033-08 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Dec 5 04:23:18 <kern.crit> svr-033-08 kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Dec 5 04:23:18 <kern.crit> svr-033-08 kernel: current process = 20355 (find)
Dec 5 04:23:18 <kern.crit> svr-033-08 kernel: trap number = 18
Dec 5 04:23:18 <kern.crit> svr-033-08 kernel: panic: integer divide fault
Dec 5 04:23:18 <kern.crit> svr-033-08 kernel: cpuid = 19
This is a kgdb trace on an affected machine, running STABLE r286534M:

kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
code segment = base rx0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 20355 (find)
trap number = 18
panic: integer divide fault
cpuid = 19
KDB: stack backtrace:
#0 0xffffffff80986710 at kdb_backtrace+0x60
#1 0xffffffff80949e76 at vpanic+0x126
#2 0xffffffff80949d43 at panic+0x43
#3 0xffffffff80d5d3db at trap_fatal+0x36b
#4 0xffffffff80d5d05c at trap+0x75c
#5 0xffffffff80d42f12 at calltrap+0x8
#6 0xffffffff819f4fc8 at dmu_tx_assign+0xf8
#7 0xffffffff81a7a887 at zfs_inactive+0x157
#8 0xffffffff81a8369d at zfs_freebsd_inactive+0xd
#9 0xffffffff80e85ed7 at VOP_INACTIVE_APV+0xa7
#10 0xffffffff809ed182 at vinactive+0x102
#11 0xffffffff809ed572 at vputx+0x272
#12 0xffffffff809f40ea at sys_fchdir+0x2aa
#13 0xffffffff80d5dcf7 at amd64_syscall+0x357
#14 0xffffffff80d431fb at Xfast_syscall+0xfb
Uptime: 16d16h5m25s
Dumping 9283 out of 32705 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

Reading symbols from /boot/kernel/zfs.ko.symbols...done.
Loaded symbols for /boot/kernel/zfs.ko.symbols
Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
Loaded symbols for /boot/kernel/opensolaris.ko.symbols
Reading symbols from /boot/kernel/dtraceall.ko.symbols...done.
Loaded symbols for /boot/kernel/dtraceall.ko.symbols
Reading symbols from /boot/kernel/profile.ko.symbols...done.
Loaded symbols for /boot/kernel/profile.ko.symbols
Reading symbols from /boot/kernel/dtrace.ko.symbols...done.
Loaded symbols for /boot/kernel/dtrace.ko.symbols
Reading symbols from /boot/kernel/systrace_freebsd32.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace_freebsd32.ko.symbols
Reading symbols from /boot/kernel/systrace.ko.symbols...done.
Loaded symbols for /boot/kernel/systrace.ko.symbols
Reading symbols from /boot/kernel/sdt.ko.symbols...done.
Loaded symbols for /boot/kernel/sdt.ko.symbols
Reading symbols from /boot/kernel/lockstat.ko.symbols...done.
Loaded symbols for /boot/kernel/lockstat.ko.symbols
Reading symbols from /boot/kernel/fasttrap.ko.symbols...done.
Loaded symbols for /boot/kernel/fasttrap.ko.symbols
Reading symbols from /boot/kernel/fbt.ko.symbols...done.
Loaded symbols for /boot/kernel/fbt.ko.symbols
Reading symbols from /boot/kernel/dtnfscl.ko.symbols...done.
Loaded symbols for /boot/kernel/dtnfscl.ko.symbols
Reading symbols from /boot/kernel/dtmalloc.ko.symbols...done.
Loaded symbols for /boot/kernel/dtmalloc.ko.symbols
Reading symbols from /boot/kernel/ipmi.ko.symbols...done.
Loaded symbols for /boot/kernel/ipmi.ko.symbols
Reading symbols from /boot/kernel/smbus.ko.symbols...done.
Loaded symbols for /boot/kernel/smbus.ko.symbols
Reading symbols from /boot/kernel/if_lagg.ko.symbols...done.
Loaded symbols for /boot/kernel/if_lagg.ko.symbols
Reading symbols from /boot/kernel/ums.ko.symbols...done.
Loaded symbols for /boot/kernel/ums.ko.symbols
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219             __asm("movq %%gs:%1,%0" : "=r" (td)
(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80949ad2 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff80949eb5 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:758
#3  0xffffffff80949d43 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:687
#4  0xffffffff80d5d3db in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:851
#5  0xffffffff80d5d05c in trap (frame=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:203
#6  0xffffffff80d42f12 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#7  0xffffffff819f54d4 in dmu_tx_wait (tx=0xfffff805786ae400) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:1078
#8  0xffffffff819f4fc8 in dmu_tx_assign (tx=0xfffff805786ae400, txg_how=TXG_WAIT) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:1339
#9  0xffffffff81a7a887 in zfs_inactive (vp=<value optimized out>, cr=<value optimized out>, ct=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4716
#10 0xffffffff81a8369d in zfs_freebsd_inactive (ap=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6540
#11 0xffffffff80e85ed7 in VOP_INACTIVE_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1953
#12 0xffffffff809ed182 in vinactive (vp=0xfffff80563f32ce8, td=0xfffff8084181b940) at vnode_if.h:807
#13 0xffffffff809ed572 in vputx (vp=0xfffff80563f32ce8, func=1) at /usr/src/sys/kern/vfs_subr.c:2307
#14 0xffffffff809f40ea in sys_fchdir (td=<value optimized out>, uap=<value optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:763
#15 0xffffffff80d5dcf7 in amd64_syscall (td=0xfffff8084181b940, traced=0) at subr_syscall.c:134
#16 0xffffffff80d431fb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#17 0x000000080089aa2a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
Note I see this with a kernel config based on GENERIC, with these options added:

device pf
device pflog
device pfsync
options ALTQ
options ALTQ_CBQ
options ALTQ_RED
options ALTQ_RIO
options ALTQ_HFSC
options ALTQ_PRIQ
options ALTQ_NOPCC

There are also two kernel patches applied, from these PRs:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=189870
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

The last patch might be interesting here, because the stack trace below hits lines (1077-1078 in dmu_tx.c) modified by the arc patch:

min_tx_time = zfs_delay_scale *
    (dirty - delay_min_bytes) / (zfs_dirty_data_max_internal - dirty);
I was checking out some of the variables as they are set at frame 7 (below). If it's to be believed, zfs_dirty_data_max_internal is much smaller than it should be at that point in the code (maybe due to memory corruption, or a race condition involving zfs_dirty_data_max_internal). That would cause the subtraction at line 1079 (zfs_dirty_data_max_internal - dirty) to wrap to a value just below UINT64_MAX, which is then treated as a long long. I'm pretty sure that isn't how the code is intended to work. If that is the problem, disabling the vfs.zfs_dynamic_write_buffer tunable may help, because it should force zfs_dirty_data_max_internal to be the same as zfs_dirty_data_max.

(kgdb) frame 7
#7  0xffffffff819f54d4 in dmu_tx_wait (tx=0xfffff805786ae400) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:1078
1078            min_tx_time = zfs_delay_scale *
(kgdb) list
1073     * cause a divide-by-zero if it's == the max.
1074     */
1075    ASSERT3U(dirty, <, zfs_dirty_data_max_internal);
1076
1077    now = gethrtime();
1078    min_tx_time = zfs_delay_scale *
1079        (dirty - delay_min_bytes) / (zfs_dirty_data_max_internal - dirty);
1080    if (now > tx->tx_start + min_tx_time)
1081        return;
1082
(kgdb) print zfs_dirty_data_max
$1 = 3429452595
(kgdb) print zfs_dirty_data_max_internal
$2 = 331001856
(kgdb) print dirty
$3 = 336166912
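To show what that wrap does to the numbers, here is a small userland sketch of the arithmetic at lines 1078-1079 using the values printed above. zfs_delay_scale and delay_min_bytes are not in the dump, so the figures used for them below are assumed defaults; only what happens to the divisor matters here:

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* Values printed in frame 7 of the dump. */
	uint64_t dirty = 336166912;
	uint64_t dirty_max_internal = 331001856;

	/* Assumed defaults, NOT taken from the dump: zfs_delay_scale and a
	 * delay_min_bytes of 60% of the dirty maximum.  Their exact values
	 * do not change the behaviour being demonstrated. */
	uint64_t zfs_delay_scale = 500000;
	uint64_t delay_min_bytes = dirty_max_internal * 60 / 100;

	/*
	 * Same shape as dmu_tx.c lines 1078-1079.  Because dirty is larger
	 * than dirty_max_internal, the unsigned divisor wraps to a value
	 * just below 2^64 instead of going negative -- huge, not zero.
	 */
	uint64_t divisor = dirty_max_internal - dirty;
	uint64_t min_tx_time = zfs_delay_scale *
	    (dirty - delay_min_bytes) / divisor;

	printf("divisor     = %ju\n", (uintmax_t)divisor);
	printf("min_tx_time = %ju\n", (uintmax_t)min_tx_time);
	return (0);
}

With these numbers the divisor comes out as 18446744073704386560 rather than zero, and min_tx_time collapses to 0, i.e. no throttling delay at all.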
I have sent you a patch that MIGHT address this. I don't think the trace you have is valid, since the divisor isn't zero according to the prints, but the return of a value that unexpectedly underflows might cause problems elsewhere (and once the stack gets smashed, all bets are off). Let me know if you can provoke another panic with that patch in.
(In reply to dustinwenz from comment #3) You can also do 'disassemble' in frame 7 and see instructions leading to 0xffffffff819f54d4. You can examine register values as well ('i reg').
(In reply to dustinwenz from comment #2) I suspect that the patch from bug 187594 is buggy. It changes zfs_dirty_data_max_internal without any synchronization.
(In reply to Andriy Gapon from comment #6) The predicate that was turned into a dynamic variable was already runtime-tunable without any sort of mutex locking; a reasonable perusal of the code disclosed no reason not to do it. However, the potential for a sign inversion (which incidentally MIGHT occur in the base code as well), or for a corner case that could result in a divide-by-zero, is easily removed, and I have done so and sent the patch to the reporter, asking whether it stops the panics. Given the stack trace involved here I don't believe that's implicated as the cause, since the panic is a divide-by-zero trap yet examination shows no division by zero. However, it is plausible that an integer division that sets the sign bit could lead to misbehavior somewhere else in the code and smash the stack, so defending against both that and an actual divide-by-zero obviously should be done.
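In outline, the sort of guard being discussed looks something like the sketch below. This is illustrative only -- not the literal patch that was sent -- and the numeric arguments other than the two values from the dump are just assumed defaults:

#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative sketch only: compute the throttle delay defensively.  If
 * "dirty" has already reached or passed the cap, the subtractions in the
 * real expression would wrap, so clamp to the maximum delay instead of
 * dividing.
 */
static uint64_t
delay_guard(uint64_t dirty, uint64_t dirty_max,
    uint64_t delay_min_bytes, uint64_t delay_scale, uint64_t delay_max)
{
	if (dirty <= delay_min_bytes)
		return (0);		/* not dirty enough to throttle */
	if (dirty >= dirty_max)
		return (delay_max);	/* subtraction would wrap; don't divide */
	return (delay_scale * (dirty - delay_min_bytes) /
	    (dirty_max - dirty));
}

int
main(void)
{
	/* dirty and dirty_max from the dump; the other arguments are
	 * assumed defaults for the sake of the example. */
	printf("%ju\n", (uintmax_t)delay_guard(336166912, 331001856,
	    198601113, 500000, 100000000));
	return (0);
}

With the dump's values the guard trips and the function returns the clamp value instead of ever reaching the division.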
(In reply to karl from comment #7) The original tunable was expected to be set only once, at boot time. Your new "internal" variable can be changed on every call to "dmu_tx_wait", and those calls can be concurrent.
(In reply to Andriy Gapon from comment #8) That sysctl was not declared as settable only from loader.conf; it was (and is) runtime-tunable. Anything that can be changed at runtime, the base code had better handle being modified asynchronously. What's in the dump appears to be inconsistent: the panic claims to be an attempt to perform integer division by zero, but the divisor (on examination) is not zero. Since the output of that calculation is only an expected delay time, correcting the code so that no division takes place when that condition arises should be effective, provided the panic really is what it claims to be. I'm skeptical, for the reason above, but it's simple enough to find out -- thus what I sent to the reporter (and posted over on the ARC bug as well this morning).
(kgdb) frame 7
#7  0xffffffff819f54d4 in dmu_tx_wait (tx=0xfffff805786ae400) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_tx.c:1078
1078            min_tx_time = zfs_delay_scale *
(kgdb) disassemble
Dump of assembler code for function dmu_tx_wait:
0xffffffff819f53e0 <dmu_tx_wait+0>:     push   %rbp
0xffffffff819f53e1 <dmu_tx_wait+1>:     mov    %rsp,%rbp
0xffffffff819f53e4 <dmu_tx_wait+4>:     push   %r15
0xffffffff819f53e6 <dmu_tx_wait+6>:     push   %r14
0xffffffff819f53e8 <dmu_tx_wait+8>:     push   %r13
0xffffffff819f53ea <dmu_tx_wait+10>:    push   %r12
0xffffffff819f53ec <dmu_tx_wait+12>:    push   %rbx
0xffffffff819f53ed <dmu_tx_wait+13>:    sub    $0x18,%rsp
0xffffffff819f53f1 <dmu_tx_wait+17>:    mov    %rdi,%r14
0xffffffff819f53f4 <dmu_tx_wait+20>:    mov    0x30(%r14),%r15
0xffffffff819f53f8 <dmu_tx_wait+24>:    cmpl   $0x0,0xa0(%r14)
0xffffffff819f5400 <dmu_tx_wait+32>:    je     0xffffffff819f55ae <dmu_tx_wait+462>
0xffffffff819f5406 <dmu_tx_wait+38>:    lea    0x138(%r15),%r12
0xffffffff819f540d <dmu_tx_wait+45>:    xor    %esi,%esi
0xffffffff819f540f <dmu_tx_wait+47>:    mov    $0xffffffff81adcb5d,%rdx
0xffffffff819f5416 <dmu_tx_wait+54>:    mov    $0x552,%ecx
0xffffffff819f541b <dmu_tx_wait+59>:    mov    %r12,%rdi
0xffffffff819f541e <dmu_tx_wait+62>:    callq  0xffffffff80951c30 <_sx_xlock>
0xffffffff819f5423 <dmu_tx_wait+67>:    mov    0x188(%r15),%rbx
0xffffffff819f542a <dmu_tx_wait+74>:    cmp    0xffffffff81b11ad0,%rbx
0xffffffff819f5432 <dmu_tx_wait+82>:    jb     0xffffffff819f545c <dmu_tx_wait+124>
0xffffffff819f5434 <dmu_tx_wait+84>:    lea    0x158(%r15),%r13
0xffffffff819f543b <dmu_tx_wait+91>:    nopl   0x0(%rax,%rax,1)
0xffffffff819f5440 <dmu_tx_wait+96>:    mov    %r13,%rdi
0xffffffff819f5443 <dmu_tx_wait+99>:    mov    %r12,%rsi
0xffffffff819f5446 <dmu_tx_wait+102>:   callq  0xffffffff808f4890 <_cv_wait>
0xffffffff819f544b <dmu_tx_wait+107>:   mov    0x188(%r15),%rbx
0xffffffff819f5452 <dmu_tx_wait+114>:   cmp    0xffffffff81b11ad0,%rbx
0xffffffff819f545a <dmu_tx_wait+122>:   jae    0xffffffff819f5440 <dmu_tx_wait+96>
0xffffffff819f545c <dmu_tx_wait+124>:   mov    $0xffffffff81adcb5d,%rsi
0xffffffff819f5463 <dmu_tx_wait+131>:   mov    $0x556,%edx
0xffffffff819f5468 <dmu_tx_wait+136>:   mov    %r12,%rdi
0xffffffff819f546b <dmu_tx_wait+139>:   callq  0xffffffff80951dc0 <_sx_xunlock>
0xffffffff819f5470 <dmu_tx_wait+144>:   movslq 0xffffffff81af6de8,%rax
0xffffffff819f5478 <dmu_tx_wait+152>:   imul   0xffffffff81b11ad0,%rax
0xffffffff819f5481 <dmu_tx_wait+161>:   shr    $0x2,%rax
0xffffffff819f5485 <dmu_tx_wait+165>:   mov    $0x28f5c28f5c28f5c3,%rcx
0xffffffff819f548f <dmu_tx_wait+175>:   mul    %rcx
0xffffffff819f5492 <dmu_tx_wait+178>:   shr    $0x2,%rdx
0xffffffff819f5496 <dmu_tx_wait+182>:   mov    %rbx,%r15
0xffffffff819f5499 <dmu_tx_wait+185>:   sub    %rdx,%r15
0xffffffff819f549c <dmu_tx_wait+188>:   jbe    0xffffffff819f5593 <dmu_tx_wait+435>
0xffffffff819f54a2 <dmu_tx_wait+194>:   mov    0x30(%r14),%r12
0xffffffff819f54a6 <dmu_tx_wait+198>:   lea    -0x38(%rbp),%rdi
0xffffffff819f54aa <dmu_tx_wait+202>:   callq  0xffffffff80956dc0 <getnanouptime>
0xffffffff819f54af <dmu_tx_wait+207>:   imul   $0x3b9aca00,-0x38(%rbp),%rcx
0xffffffff819f54b7 <dmu_tx_wait+215>:   add    -0x30(%rbp),%rcx
0xffffffff819f54bb <dmu_tx_wait+219>:   imul   0xffffffff81af6df0,%r15
0xffffffff819f54c4 <dmu_tx_wait+228>:   mov    0xffffffff81b11ad0,%rsi
0xffffffff819f54cc <dmu_tx_wait+236>:   sub    %rbx,%rsi
0xffffffff819f54cf <dmu_tx_wait+239>:   xor    %edx,%edx
0xffffffff819f54d1 <dmu_tx_wait+241>:   mov    %r15,%rax
0xffffffff819f54d4 <dmu_tx_wait+244>:   div    %rsi
0xffffffff819f54d7 <dmu_tx_wait+247>:   mov    0x98(%r14),%rdx
0xffffffff819f54de <dmu_tx_wait+254>:   add    %rax,%rdx
0xffffffff819f54e1 <dmu_tx_wait+257>:   cmp    %rdx,%rcx
0xffffffff819f54e4 <dmu_tx_wait+260>:   jg     0xffffffff819f5593 <dmu_tx_wait+435>
0xffffffff819f54ea <dmu_tx_wait+266>:   mov    0xffffffff81af6bd8,%r15
0xffffffff819f54f2 <dmu_tx_wait+274>:   cmp    %r15,%rax
0xffffffff819f54f5 <dmu_tx_wait+277>:   cmovle %rax,%r15
0xffffffff819f54f9 <dmu_tx_wait+281>:   mov    0xffffffff81af6cc8,%rdi
0xffffffff819f5501 <dmu_tx_wait+289>:   test   %rdi,%rdi
0xffffffff819f5504 <dmu_tx_wait+292>:   je     0xffffffff819f551c <dmu_tx_wait+316>
0xffffffff819f5506 <dmu_tx_wait+294>:   xor    %r8d,%r8d
0xffffffff819f5509 <dmu_tx_wait+297>:   xor    %r9d,%r9d
0xffffffff819f550c <dmu_tx_wait+300>:   mov    %r14,%rsi
0xffffffff819f550f <dmu_tx_wait+303>:   mov    %rbx,%rdx
0xffffffff819f5512 <dmu_tx_wait+306>:   mov    %r15,%rcx
0xffffffff819f5515 <dmu_tx_wait+309>:   callq  *0xffffffff814a3d48
0xffffffff819f551c <dmu_tx_wait+316>:   lea    0x138(%r12),%rbx
0xffffffff819f5524 <dmu_tx_wait+324>:   xor    %esi,%esi
0xffffffff819f5526 <dmu_tx_wait+326>:   mov    $0xffffffff81adcb5d,%rdx
0xffffffff819f552d <dmu_tx_wait+333>:   mov    $0x440,%ecx
0xffffffff819f5532 <dmu_tx_wait+338>:   mov    %rbx,%rdi
0xffffffff819f5535 <dmu_tx_wait+341>:   callq  0xffffffff80951c30 <_sx_xlock>
0xffffffff819f553a <dmu_tx_wait+346>:   mov    0x98(%r14),%rax
0xffffffff819f5541 <dmu_tx_wait+353>:   add    %r15,%rax
0xffffffff819f5544 <dmu_tx_wait+356>:   add    0x1a8(%r12),%r15
0xffffffff819f554c <dmu_tx_wait+364>:   cmp    %r15,%rax
0xffffffff819f554f <dmu_tx_wait+367>:   cmovge %rax,%r15
0xffffffff819f5553 <dmu_tx_wait+371>:   mov    %r15,0x1a8(%r12)
0xffffffff819f555b <dmu_tx_wait+379>:   mov    $0xffffffff81adcb5d,%rsi
0xffffffff819f5562 <dmu_tx_wait+386>:   mov    $0x444,%edx
0xffffffff819f5567 <dmu_tx_wait+391>:   mov    %rbx,%rdi
0xffffffff819f556a <dmu_tx_wait+394>:   callq  0xffffffff80951dc0 <_sx_xunlock>
0xffffffff819f556f <dmu_tx_wait+399>:   shl    $0x2,%r15
0xffffffff819f5573 <dmu_tx_wait+403>:   movslq 0xffffffff81af6be0,%rdx
0xffffffff819f557b <dmu_tx_wait+411>:   shl    $0x2,%rdx
0xffffffff819f557f <dmu_tx_wait+415>:   mov    $0xffffffff81adcbdb,%rdi
0xffffffff819f5586 <dmu_tx_wait+422>:   mov    $0x200,%ecx
0xffffffff819f558b <dmu_tx_wait+427>:   mov    %r15,%rsi
0xffffffff819f558e <dmu_tx_wait+430>:   callq  0xffffffff809532a0 <pause_sbt>
0xffffffff819f5593 <dmu_tx_wait+435>:   movl   $0x0,0xa0(%r14)
0xffffffff819f559e <dmu_tx_wait+446>:   movl   $0x1,0x94(%r14)
0xffffffff819f55a9 <dmu_tx_wait+457>:   jmpq   0xffffffff819f5671 <dmu_tx_wait+657>
0xffffffff819f55ae <dmu_tx_wait+462>:   mov    (%r15),%rbx
0xffffffff819f55b1 <dmu_tx_wait+465>:   mov    %rbx,%rdi
0xffffffff819f55b4 <dmu_tx_wait+468>:   callq  0xffffffff81a2cb50 <spa_suspended>
0xffffffff819f55b9 <dmu_tx_wait+473>:   test   %eax,%eax
0xffffffff819f55bb <dmu_tx_wait+475>:   jne    0xffffffff819f565d <dmu_tx_wait+637>
0xffffffff819f55c1 <dmu_tx_wait+481>:   mov    0x48(%r14),%rsi
0xffffffff819f55c5 <dmu_tx_wait+485>:   test   %rsi,%rsi
0xffffffff819f55c8 <dmu_tx_wait+488>:   je     0xffffffff819f565d <dmu_tx_wait+637>
0xffffffff819f55ce <dmu_tx_wait+494>:   mov    0x68(%r14),%rax
0xffffffff819f55d2 <dmu_tx_wait+498>:   test   %rax,%rax
0xffffffff819f55d5 <dmu_tx_wait+501>:   je     0xffffffff819f5680 <dmu_tx_wait+672>
0xffffffff819f55db <dmu_tx_wait+507>:   mov    0x18(%rax),%r12
0xffffffff819f55df <dmu_tx_wait+511>:   lea    0xf8(%r12),%r15
0xffffffff819f55e7 <dmu_tx_wait+519>:   xor    %esi,%esi
0xffffffff819f55e9 <dmu_tx_wait+521>:   mov    $0xffffffff81adcb5d,%rdx
0xffffffff819f55f0 <dmu_tx_wait+528>:   mov    $0x573,%ecx
0xffffffff819f55f5 <dmu_tx_wait+533>:   mov    %r15,%rdi
0xffffffff819f55f8 <dmu_tx_wait+536>:   callq  0xffffffff80951c30 <_sx_xlock>
0xffffffff819f55fd <dmu_tx_wait+541>:   mov    0x1c8(%r12),%rax
0xffffffff819f5605 <dmu_tx_wait+549>:   mov    0x48(%r14),%rcx
0xffffffff819f5609 <dmu_tx_wait+553>:   dec    %rcx
0xffffffff819f560c <dmu_tx_wait+556>:   cmp    %rcx,%rax
0xffffffff819f560f <dmu_tx_wait+559>:   jne    0xffffffff819f563f <dmu_tx_wait+607>
0xffffffff819f5611 <dmu_tx_wait+561>:   lea    0x1d0(%r12),%rbx
0xffffffff819f5619 <dmu_tx_wait+569>:   nopl   0x0(%rax)
0xffffffff819f5620 <dmu_tx_wait+576>:   mov    %rbx,%rdi
0xffffffff819f5623 <dmu_tx_wait+579>:   mov    %r15,%rsi
0xffffffff819f5626 <dmu_tx_wait+582>:   callq  0xffffffff808f4890 <_cv_wait>
0xffffffff819f562b <dmu_tx_wait+587>:   mov    0x1c8(%r12),%rax
0xffffffff819f5633 <dmu_tx_wait+595>:   mov    0x48(%r14),%rcx
0xffffffff819f5637 <dmu_tx_wait+599>:   dec    %rcx
0xffffffff819f563a <dmu_tx_wait+602>:   cmp    %rcx,%rax
0xffffffff819f563d <dmu_tx_wait+605>:   je     0xffffffff819f5620 <dmu_tx_wait+576>
0xffffffff819f563f <dmu_tx_wait+607>:   mov    $0xffffffff81adcb5d,%rsi
0xffffffff819f5646 <dmu_tx_wait+614>:   mov    $0x576,%edx
0xffffffff819f564b <dmu_tx_wait+619>:   mov    %r15,%rdi
0xffffffff819f564e <dmu_tx_wait+622>:   callq  0xffffffff80951dc0 <_sx_xunlock>
0xffffffff819f5653 <dmu_tx_wait+627>:   movq   $0x0,0x68(%r14)
0xffffffff819f565b <dmu_tx_wait+635>:   jmp    0xffffffff819f5671 <dmu_tx_wait+657>
0xffffffff819f565d <dmu_tx_wait+637>:   mov    %rbx,%rdi
0xffffffff819f5660 <dmu_tx_wait+640>:   callq  0xffffffff81a2dec0 <spa_last_synced_txg>
0xffffffff819f5665 <dmu_tx_wait+645>:   lea    0x1(%rax),%rsi
0xffffffff819f5669 <dmu_tx_wait+649>:   mov    %r15,%rdi
0xffffffff819f566c <dmu_tx_wait+652>:   callq  0xffffffff81a31470 <txg_wait_synced>
0xffffffff819f5671 <dmu_tx_wait+657>:   add    $0x18,%rsp
0xffffffff819f5675 <dmu_tx_wait+661>:   pop    %rbx
0xffffffff819f5676 <dmu_tx_wait+662>:   pop    %r12
0xffffffff819f5678 <dmu_tx_wait+664>:   pop    %r13
0xffffffff819f567a <dmu_tx_wait+666>:   pop    %r14
0xffffffff819f567c <dmu_tx_wait+668>:   pop    %r15
0xffffffff819f567e <dmu_tx_wait+670>:   pop    %rbp
0xffffffff819f567f <dmu_tx_wait+671>:   retq
0xffffffff819f5680 <dmu_tx_wait+672>:   mov    0x30(%r14),%rdi
0xffffffff819f5684 <dmu_tx_wait+676>:   inc    %rsi
0xffffffff819f5687 <dmu_tx_wait+679>:   callq  0xffffffff81a31880 <txg_wait_open>
0xffffffff819f568c <dmu_tx_wait+684>:   jmp    0xffffffff819f5671 <dmu_tx_wait+657>
End of assembler dump.
(kgdb) x 0xffffffff81b11ad0
0xffffffff81b11ad0 <zfs_dirty_data_max_internal>:       0x13bab000
(kgdb) info registers
rax            0x3d25fde186a0       67233382500000
rbx            0x14098000           336166912
rcx            0x51df7df23d8aa      1440325321349290
rdx            0x0                  0
rsi            0x0                  0
rdi            0xfffffe085fec7708   -2163054184696
rbp            0xfffffe085fec7740   0xfffffe085fec7740
rsp            0xfffffe085fec7700   0xfffffe085fec7700
r8             0xfffffe085fec7b80   -2163054183552
r9             0x55d972ee           1440314094
r10            0xfffff80053160938   -8794699069128
r11            0xffffffff806c2460   -2140396448
r12            0xfffff80053160800   -8794699069440
r13            0xfffff80053160958   -8794699069096
r14            0xfffff805786ae400   -8772597914624
r15            0x3d25fde186a0       67233382500000
rip            0xffffffff819f54d4   0xffffffff819f54d4 <dmu_tx_wait+244>
eflags         0x10246              66118
cs             0x20                 32
ss             0x28                 40
ds             0x0                  0
es             0x0                  0
fs             0x0                  0
gs             0x0                  0
The instruction pointer refers to the division at 0xffffffff819f54d4, where it performs the unsigned divide rdx:rax/rsi. Somehow, rsi is reported as 0 in frame 7. I'd expect it to be -5165056 (or the really big unsigned equivalent) because 331001856 is loaded into rsi from memory and rbx (336166912) is subtracted from it.
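For reference, the wrapped value %rsi should have held is easy to confirm with a trivial computation (values copied from the register dump and the x output above):

#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* The two operands from the dump: the load from
	 * zfs_dirty_data_max_internal (0x13bab000) and %rbx (dirty). */
	uint64_t max_internal = 0x13bab000;	/* 331001856 */
	uint64_t dirty = 336166912;		/* 0x14098000 */

	/* What %rsi should have held after the sub at +236: the wrapped
	 * unsigned equivalent of -5165056, not zero. */
	uint64_t expected_rsi = max_internal - dirty;
	printf("expected %%rsi = %ju (0x%jx)\n",
	    (uintmax_t)expected_rsi, (uintmax_t)expected_rsi);
	return (0);
}

This prints 18446744073704386560 (0xffffffffffb13000), so dividing by it would not trap; only a divisor of exactly zero would.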
(In reply to dustinwenz from comment #11) Right -- it should be the unsigned (underflowed) equivalent, which of course is not division by zero. I don't think you should be able to get there with dirty > dirty_max_internal, but obviously (assuming what's on the stack is correct) you did. I stuck a DTrace probe in there on the copy I'm running (which isn't in the copy I posted and sent to you) and am torture-testing to try to reproduce this, so far without success (and without the probe firing).
(In reply to dustinwenz from comment #11) I think that gdb does not lie here. Do you see any code that prevents a value stored at address 0xffffffff81b11ad0 from changing after a load to %rsi at 0xffffffff819f54c4 is executed? I do not.
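Nothing between the earlier range checks and the load at 0xffffffff819f54c4 re-reads or revalidates that location, so a concurrent write can make the value used for the division differ from the value the checks saw -- including making it exactly equal to dirty, which yields a zero divisor. One way to avoid that is to read the tunable exactly once into a local and use that snapshot for both the comparison and the division. A sketch only (variable names loosely follow dmu_tx.c; this is not a reviewed fix):

#include <stdint.h>
#include <stdio.h>

/* Stand-in for the runtime-tunable global; a sysctl handler may rewrite it
 * concurrently with readers.  Initialized here with the dump's value. */
volatile uint64_t zfs_dirty_data_max_internal = 331001856;

/*
 * Take a single snapshot of the tunable and use it for both the bounds
 * check and the division, so a concurrent update cannot make the check
 * and the divisor disagree.  (Skipping the delay here; clamping to a
 * maximum delay as in the earlier sketch would work as well.)
 */
static uint64_t
delay_with_snapshot(uint64_t dirty, uint64_t delay_min_bytes,
    uint64_t delay_scale)
{
	uint64_t dirty_max = zfs_dirty_data_max_internal;	/* one load */

	if (dirty <= delay_min_bytes || dirty >= dirty_max)
		return (0);	/* divisor would be zero or wrapped; skip it */
	return (delay_scale * (dirty - delay_min_bytes) /
	    (dirty_max - dirty));
}

int
main(void)
{
	/* With the dump's dirty value the guard trips and no division runs;
	 * the last two arguments are assumed defaults. */
	printf("%ju\n",
	    (uintmax_t)delay_with_snapshot(336166912, 198601113, 500000));
	return (0);
}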
I've been unable to reproduce this panic in a test environment, with 60 cloned jails running periodic jobs every hour. Since the problem occurred often on production hosts, there might be some other factor that contributes to it. The only difference that comes to mind is that affected hosts also had other moderately-sized ZFS filesystem clones (each having several thousand files totaling between 10 and 100TB per FS) mounted inside of the jails that were scanned by periodic. Those clones were destroyed and recreated multiple times daily. So, I guess my next step is to add some bigger data storage to my test host in order to make progress on this.
(In reply to dustinwenz from comment #14) Hmmmm. This sounds like the panic I've seen on backup copies.... And that one looked to be in-core corruption but I haven't been able to reproduce it under controlled conditions. Interestingly enough I've yet to get one with my inversion protection change in... but I can't call this dispositive as it often took a week or more to show up.
AFAIK, the problem has never been reported for a clean FreeBSD build (without any patches).