Bug 192284

Summary: panic: avl_find() succeeded inside avl_add() while running poudriere bulk on -CURRENT
Product: Base System Reporter: Allan Jude <allanjude>
Component: kernAssignee: Xin LI <delphij>
Status: Closed FIXED    
Severity: Affects Some People CC: delphij, swills
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   

Description Allan Jude freebsd_committer freebsd_triage 2014-07-31 03:31:04 UTC
FreeBSD Tango.HML1.ScaleEngine.net 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r269281: Wed Jul 30 01:07:09 UTC 2014     root@Tango.HML1.ScaleEngine.net:/usr/obj/usr/src/sys/SCALEENGINE  amd64


Trying to run 'poudriere bulk' on yesterdays -CURRENT and it crashes


Jul 31 03:11:11 Tango syslogd: kernel boot file is /boot/kernel/kernel
Jul 31 03:11:11 Tango kernel: panic: avl_find() succeeded inside avl_add()
Jul 31 03:11:11 Tango kernel: cpuid = 6
Jul 31 03:11:11 Tango kernel: KDB: stack backtrace:
Jul 31 03:11:11 Tango kernel: #0 0xffffffff809ab070 at kdb_backtrace+0x60
Jul 31 03:11:11 Tango kernel: #1 0xffffffff8096c3a5 at panic+0x155
Jul 31 03:11:11 Tango kernel: #2 0xffffffff819dbc04 at avl_add+0x84
Jul 31 03:11:11 Tango kernel: #3 0xffffffff81a04560 at dbuf_create+0x390
Jul 31 03:11:11 Tango kernel: #4 0xffffffff81a03f5a at dbuf_hold_impl+0x11a
Jul 31 03:11:11 Tango kernel: #5 0xffffffff81a0467b at dbuf_hold+0x1b
Jul 31 03:11:11 Tango kernel: #6 0xffffffff81a09593 at dmu_buf_hold_array_by_dnode+0x153
Jul 31 03:11:11 Tango kernel: #7 0xffffffff81a0a526 at dmu_read_uio+0x66
Jul 31 03:11:11 Tango kernel: #8 0xffffffff81a9b923 at zfs_freebsd_read+0x3a3
Jul 31 03:11:11 Tango kernel: #9 0xffffffff80eddf11 at VOP_READ_APV+0xa1
Jul 31 03:11:11 Tango kernel: #10 0xffffffff80a32427 at vn_read+0x167
Jul 31 03:11:11 Tango kernel: #11 0xffffffff80a2e5db at vn_io_fault+0x10b
Jul 31 03:11:11 Tango kernel: #12 0xffffffff809c4a95 at dofileread+0x95
Jul 31 03:11:11 Tango kernel: #13 0xffffffff809c48f2 at kern_preadv+0x92
Jul 31 03:11:11 Tango kernel: #14 0xffffffff809c4859 at sys_pread+0x69
Jul 31 03:11:11 Tango kernel: #15 0xffffffff80db84fb at amd64_syscall+0x3fb
Jul 31 03:11:11 Tango kernel: #16 0xffffffff80d9a47b at Xfast_syscall+0xfb
Jul 31 03:11:11 Tango kernel: Uptime: 1d1h38m16s


Jul 31 03:25:23 Tango syslogd: kernel boot file is /boot/kernel/kernel
Jul 31 03:25:23 Tango kernel: panic: avl_find() succeeded inside avl_add()
Jul 31 03:25:23 Tango kernel: cpuid = 0
Jul 31 03:25:23 Tango kernel: KDB: stack backtrace:
Jul 31 03:25:23 Tango kernel: #0 0xffffffff809ab070 at kdb_backtrace+0x60
Jul 31 03:25:23 Tango kernel: #1 0xffffffff8096c3a5 at panic+0x155
Jul 31 03:25:23 Tango kernel: #2 0xffffffff819dbc04 at avl_add+0x84
Jul 31 03:25:23 Tango kernel: #3 0xffffffff81a04560 at dbuf_create+0x390
Jul 31 03:25:23 Tango kernel: #4 0xffffffff81a03f5a at dbuf_hold_impl+0x11a
Jul 31 03:25:23 Tango kernel: #5 0xffffffff81a0467b at dbuf_hold+0x1b
Jul 31 03:25:23 Tango kernel: #6 0xffffffff81a09593 at dmu_buf_hold_array_by_dnode+0x153
Jul 31 03:25:23 Tango kernel: #7 0xffffffff81a0a526 at dmu_read_uio+0x66
Jul 31 03:25:23 Tango kernel: #8 0xffffffff81a9bb0a at zfs_freebsd_read+0x58a
Jul 31 03:25:23 Tango kernel: #9 0xffffffff80eddf11 at VOP_READ_APV+0xa1
Jul 31 03:25:23 Tango kernel: #10 0xffffffff80a32427 at vn_read+0x167
Jul 31 03:25:23 Tango kernel: #11 0xffffffff80a2e5db at vn_io_fault+0x10b
Jul 31 03:25:23 Tango kernel: #12 0xffffffff809c4a95 at dofileread+0x95
Jul 31 03:25:23 Tango kernel: #13 0xffffffff809c47b8 at kern_readv+0x68
Jul 31 03:25:23 Tango kernel: #14 0xffffffff809c4743 at sys_read+0x63
Jul 31 03:25:23 Tango kernel: #15 0xffffffff80db84fb at amd64_syscall+0x3fb
Jul 31 03:25:23 Tango kernel: #16 0xffffffff80d9a47b at Xfast_syscall+0xfb
Jul 31 03:25:23 Tango kernel: Uptime: 12m11s
Comment 1 Allan Jude freebsd_committer freebsd_triage 2014-07-31 04:08:34 UTC
Another crash

Jul 31 04:04:36 Tango syslogd: kernel boot file is /boot/kernel/kernel
Jul 31 04:04:36 Tango kernel: panic: avl_find() succeeded inside avl_add()
Jul 31 04:04:36 Tango kernel: cpuid = 23
Jul 31 04:04:36 Tango kernel: KDB: stack backtrace:
Jul 31 04:04:36 Tango kernel: #0 0xffffffff809ab070 at kdb_backtrace+0x60
Jul 31 04:04:36 Tango kernel: #1 0xffffffff8096c3a5 at panic+0x155
Jul 31 04:04:36 Tango kernel: #2 0xffffffff819dbc04 at avl_add+0x84
Jul 31 04:04:36 Tango kernel: #3 0xffffffff81a04560 at dbuf_create+0x390
Jul 31 04:04:36 Tango kernel: #4 0xffffffff81a03f5a at dbuf_hold_impl+0x11a
Jul 31 04:04:36 Tango kernel: #5 0xffffffff81a0467b at dbuf_hold+0x1b
Jul 31 04:04:36 Tango kernel: #6 0xffffffff81a09593 at dmu_buf_hold_array_by_dno
de+0x153
Jul 31 04:04:36 Tango kernel: #7 0xffffffff81a0a526 at dmu_read_uio+0x66
Jul 31 04:04:36 Tango kernel: #8 0xffffffff81a9b923 at zfs_freebsd_read+0x3a3
Jul 31 04:04:36 Tango kernel: #9 0xffffffff80eddf11 at VOP_READ_APV+0xa1
Jul 31 04:04:36 Tango kernel: #10 0xffffffff80a32427 at vn_read+0x167
Jul 31 04:04:36 Tango kernel: #11 0xffffffff80a2e5db at vn_io_fault+0x10b
Jul 31 04:04:36 Tango kernel: #12 0xffffffff809c4a95 at dofileread+0x95
Jul 31 04:04:36 Tango kernel: #13 0xffffffff809c47b8 at kern_readv+0x68
Jul 31 04:04:36 Tango kernel: #14 0xffffffff809c4743 at sys_read+0x63
Jul 31 04:04:36 Tango kernel: #15 0xffffffff80db84fb at amd64_syscall+0x3fb
Jul 31 04:04:36 Tango kernel: #16 0xffffffff80d9a47b at Xfast_syscall+0xfb
Jul 31 04:04:36 Tango kernel: Uptime: 36m46s
Comment 2 Xin LI freebsd_committer freebsd_triage 2014-07-31 05:57:54 UTC
Can you try reverting r221991?  It looks like our gethrtime() uses the getnanouptime() instead of nanouptime(), which is faster but have less accuracy.

Andriy -- is there any reason why the portion of change from 221990 gets reverted?  (I think gethrtime_waitfree() should go through dtrace_gethrtime() by the way).
Comment 3 Allan Jude freebsd_committer freebsd_triage 2014-08-01 02:03:36 UTC
Reverting r221990 does seem to have solved the issue.

I'll keep an eye on it for a few more days before saying it is fixed conclusively.

#uname -a
FreeBSD Tango.HML1.ScaleEngine.net 11.0-CURRENT FreeBSD 11.0-CURRENT #2 r269358M: Thu Jul 31 23:33:29 UTC 2014     root@Tango.HML1.ScaleEngine.net:/usr/obj/usr/src/sys/SCALEENGINE  amd64

#uptime
 2:02AM  up  2:15, 5 users, load averages: 24.19, 22.90, 20.78



(high load average is from poudriere bulk runs)
Comment 4 commit-hook freebsd_committer freebsd_triage 2014-08-01 22:34:08 UTC
A commit references this bug:

Author: delphij
Date: Fri Aug  1 22:33:24 UTC 2014
New revision: 269404
URL: http://svnweb.freebsd.org/changeset/base/269404

Log:
  Split gethrtime() and gethrtime_waitfree() and make the former use
  nanouptime() instead of getnanouptime().  nanouptime(9) provides more
  precise result at expense of being slower.

  In r269223, gethrtime() is used as creation time of dbuf, which in turn
  acts as portion of lookup key to maintain AVL invariant where there can
  not be duplicate items.  Before this change, gethrtime() have preferred
  better execution time by sacrificing precision, which may lead to panic
  on busy systems with:

  	panic: avl_find() succeeded inside avl_add()

  Reported by:	allanjude, mav
  PR:		kern/192284
  MFC after:	11 days
  X-MFC-with:	r269223

Changes:
  head/sys/cddl/compat/opensolaris/sys/time.h
Comment 5 Xin LI freebsd_committer freebsd_triage 2014-08-01 22:35:06 UTC
Fixed in -HEAD.