FreeBSD Tango.HML1.ScaleEngine.net 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r269281: Wed Jul 30 01:07:09 UTC 2014 root@Tango.HML1.ScaleEngine.net:/usr/obj/usr/src/sys/SCALEENGINE amd64 Trying to run 'poudriere bulk' on yesterdays -CURRENT and it crashes Jul 31 03:11:11 Tango syslogd: kernel boot file is /boot/kernel/kernel Jul 31 03:11:11 Tango kernel: panic: avl_find() succeeded inside avl_add() Jul 31 03:11:11 Tango kernel: cpuid = 6 Jul 31 03:11:11 Tango kernel: KDB: stack backtrace: Jul 31 03:11:11 Tango kernel: #0 0xffffffff809ab070 at kdb_backtrace+0x60 Jul 31 03:11:11 Tango kernel: #1 0xffffffff8096c3a5 at panic+0x155 Jul 31 03:11:11 Tango kernel: #2 0xffffffff819dbc04 at avl_add+0x84 Jul 31 03:11:11 Tango kernel: #3 0xffffffff81a04560 at dbuf_create+0x390 Jul 31 03:11:11 Tango kernel: #4 0xffffffff81a03f5a at dbuf_hold_impl+0x11a Jul 31 03:11:11 Tango kernel: #5 0xffffffff81a0467b at dbuf_hold+0x1b Jul 31 03:11:11 Tango kernel: #6 0xffffffff81a09593 at dmu_buf_hold_array_by_dnode+0x153 Jul 31 03:11:11 Tango kernel: #7 0xffffffff81a0a526 at dmu_read_uio+0x66 Jul 31 03:11:11 Tango kernel: #8 0xffffffff81a9b923 at zfs_freebsd_read+0x3a3 Jul 31 03:11:11 Tango kernel: #9 0xffffffff80eddf11 at VOP_READ_APV+0xa1 Jul 31 03:11:11 Tango kernel: #10 0xffffffff80a32427 at vn_read+0x167 Jul 31 03:11:11 Tango kernel: #11 0xffffffff80a2e5db at vn_io_fault+0x10b Jul 31 03:11:11 Tango kernel: #12 0xffffffff809c4a95 at dofileread+0x95 Jul 31 03:11:11 Tango kernel: #13 0xffffffff809c48f2 at kern_preadv+0x92 Jul 31 03:11:11 Tango kernel: #14 0xffffffff809c4859 at sys_pread+0x69 Jul 31 03:11:11 Tango kernel: #15 0xffffffff80db84fb at amd64_syscall+0x3fb Jul 31 03:11:11 Tango kernel: #16 0xffffffff80d9a47b at Xfast_syscall+0xfb Jul 31 03:11:11 Tango kernel: Uptime: 1d1h38m16s Jul 31 03:25:23 Tango syslogd: kernel boot file is /boot/kernel/kernel Jul 31 03:25:23 Tango kernel: panic: avl_find() succeeded inside avl_add() Jul 31 03:25:23 Tango kernel: cpuid = 0 Jul 31 03:25:23 Tango kernel: KDB: stack backtrace: Jul 31 03:25:23 Tango kernel: #0 0xffffffff809ab070 at kdb_backtrace+0x60 Jul 31 03:25:23 Tango kernel: #1 0xffffffff8096c3a5 at panic+0x155 Jul 31 03:25:23 Tango kernel: #2 0xffffffff819dbc04 at avl_add+0x84 Jul 31 03:25:23 Tango kernel: #3 0xffffffff81a04560 at dbuf_create+0x390 Jul 31 03:25:23 Tango kernel: #4 0xffffffff81a03f5a at dbuf_hold_impl+0x11a Jul 31 03:25:23 Tango kernel: #5 0xffffffff81a0467b at dbuf_hold+0x1b Jul 31 03:25:23 Tango kernel: #6 0xffffffff81a09593 at dmu_buf_hold_array_by_dnode+0x153 Jul 31 03:25:23 Tango kernel: #7 0xffffffff81a0a526 at dmu_read_uio+0x66 Jul 31 03:25:23 Tango kernel: #8 0xffffffff81a9bb0a at zfs_freebsd_read+0x58a Jul 31 03:25:23 Tango kernel: #9 0xffffffff80eddf11 at VOP_READ_APV+0xa1 Jul 31 03:25:23 Tango kernel: #10 0xffffffff80a32427 at vn_read+0x167 Jul 31 03:25:23 Tango kernel: #11 0xffffffff80a2e5db at vn_io_fault+0x10b Jul 31 03:25:23 Tango kernel: #12 0xffffffff809c4a95 at dofileread+0x95 Jul 31 03:25:23 Tango kernel: #13 0xffffffff809c47b8 at kern_readv+0x68 Jul 31 03:25:23 Tango kernel: #14 0xffffffff809c4743 at sys_read+0x63 Jul 31 03:25:23 Tango kernel: #15 0xffffffff80db84fb at amd64_syscall+0x3fb Jul 31 03:25:23 Tango kernel: #16 0xffffffff80d9a47b at Xfast_syscall+0xfb Jul 31 03:25:23 Tango kernel: Uptime: 12m11s
Another crash Jul 31 04:04:36 Tango syslogd: kernel boot file is /boot/kernel/kernel Jul 31 04:04:36 Tango kernel: panic: avl_find() succeeded inside avl_add() Jul 31 04:04:36 Tango kernel: cpuid = 23 Jul 31 04:04:36 Tango kernel: KDB: stack backtrace: Jul 31 04:04:36 Tango kernel: #0 0xffffffff809ab070 at kdb_backtrace+0x60 Jul 31 04:04:36 Tango kernel: #1 0xffffffff8096c3a5 at panic+0x155 Jul 31 04:04:36 Tango kernel: #2 0xffffffff819dbc04 at avl_add+0x84 Jul 31 04:04:36 Tango kernel: #3 0xffffffff81a04560 at dbuf_create+0x390 Jul 31 04:04:36 Tango kernel: #4 0xffffffff81a03f5a at dbuf_hold_impl+0x11a Jul 31 04:04:36 Tango kernel: #5 0xffffffff81a0467b at dbuf_hold+0x1b Jul 31 04:04:36 Tango kernel: #6 0xffffffff81a09593 at dmu_buf_hold_array_by_dno de+0x153 Jul 31 04:04:36 Tango kernel: #7 0xffffffff81a0a526 at dmu_read_uio+0x66 Jul 31 04:04:36 Tango kernel: #8 0xffffffff81a9b923 at zfs_freebsd_read+0x3a3 Jul 31 04:04:36 Tango kernel: #9 0xffffffff80eddf11 at VOP_READ_APV+0xa1 Jul 31 04:04:36 Tango kernel: #10 0xffffffff80a32427 at vn_read+0x167 Jul 31 04:04:36 Tango kernel: #11 0xffffffff80a2e5db at vn_io_fault+0x10b Jul 31 04:04:36 Tango kernel: #12 0xffffffff809c4a95 at dofileread+0x95 Jul 31 04:04:36 Tango kernel: #13 0xffffffff809c47b8 at kern_readv+0x68 Jul 31 04:04:36 Tango kernel: #14 0xffffffff809c4743 at sys_read+0x63 Jul 31 04:04:36 Tango kernel: #15 0xffffffff80db84fb at amd64_syscall+0x3fb Jul 31 04:04:36 Tango kernel: #16 0xffffffff80d9a47b at Xfast_syscall+0xfb Jul 31 04:04:36 Tango kernel: Uptime: 36m46s
Can you try reverting r221991? It looks like our gethrtime() uses the getnanouptime() instead of nanouptime(), which is faster but have less accuracy. Andriy -- is there any reason why the portion of change from 221990 gets reverted? (I think gethrtime_waitfree() should go through dtrace_gethrtime() by the way).
Reverting r221990 does seem to have solved the issue. I'll keep an eye on it for a few more days before saying it is fixed conclusively. #uname -a FreeBSD Tango.HML1.ScaleEngine.net 11.0-CURRENT FreeBSD 11.0-CURRENT #2 r269358M: Thu Jul 31 23:33:29 UTC 2014 root@Tango.HML1.ScaleEngine.net:/usr/obj/usr/src/sys/SCALEENGINE amd64 #uptime 2:02AM up 2:15, 5 users, load averages: 24.19, 22.90, 20.78 (high load average is from poudriere bulk runs)
A commit references this bug: Author: delphij Date: Fri Aug 1 22:33:24 UTC 2014 New revision: 269404 URL: http://svnweb.freebsd.org/changeset/base/269404 Log: Split gethrtime() and gethrtime_waitfree() and make the former use nanouptime() instead of getnanouptime(). nanouptime(9) provides more precise result at expense of being slower. In r269223, gethrtime() is used as creation time of dbuf, which in turn acts as portion of lookup key to maintain AVL invariant where there can not be duplicate items. Before this change, gethrtime() have preferred better execution time by sacrificing precision, which may lead to panic on busy systems with: panic: avl_find() succeeded inside avl_add() Reported by: allanjude, mav PR: kern/192284 MFC after: 11 days X-MFC-with: r269223 Changes: head/sys/cddl/compat/opensolaris/sys/time.h
Fixed in -HEAD.