Bug 219070 - vm_radix_remove: invalid key found panic
Summary: vm_radix_remove: invalid key found panic
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 11.0-STABLE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-04 14:53 UTC by Jeroen Ruigrok van der Werven
Modified: 2017-06-27 13:02 UTC (History)
1 user (show)

See Also:


Attachments
dmesg.boot after panic (6.82 KB, text/plain)
2017-05-04 14:53 UTC, Jeroen Ruigrok van der Werven
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jeroen Ruigrok van der Werven freebsd_committer freebsd_triage 2017-05-04 14:53:13 UTC
Created attachment 182301 [details]
dmesg.boot after panic

This is an ASROCK Q2900 motherboard (http://www.asrock.com/mb/intel/q2900-itx/).

Had similar panics with 10-STABLE and 11-STABLE and in the end replaced both memory banks with new SO-DIMMS (Kingston KVR16LS11K2 in a 2-kit that's on their supported memory list) and still received this. So I am excluding memory stick issues.

The problem only seems to occur when under heavy load, say compiling ports with poudriere or building world.

Panic details:

panic: vm_radix_remove: invalid key found
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0455872360
vpanic() at vpanic+0x186/frame 0xfffffe04558723e0
panic() at panic+0x43/frame 0xfffffe0455872440
vm_radix_remove() at vm_radix_remove+0x200/frame 0xfffffe0455872450
vm_page_free_toq() at vm_page_free_toq+0x56/frame 0xfffffe0455872470
vm_object_page_remove() at vm_object_page_remove+0x145/frame 0xfffffe04558724d0
vnode_pager_setsize() at vnode_pager_setsize+0xfd/frame 0xfffffe0455872520
trunc_pages() at trunc_pages+0x126/frame 0xfffffe0455872580
softdep_setup_freeblocks() at softdep_setup_freeblocks+0x9d6/frame 0xfffffe0455872600
ffs_truncate() at ffs_truncate+0x32e/frame 0xfffffe0455872760
ufs_inactive() at ufs_inactive+0x1c7/frame 0xfffffe04558727a0
VOP_INACTIVE_APV() at VOP_INACTIVE_APV+0x89/frame 0xfffffe04558727d0
vinactive() at vinactive+0xf2/frame 0xfffffe0455872830
vputx() at vputx+0x2c5/frame 0xfffffe0455872890
kern_unlinkat() at kern_unlinkat+0x22a/frame 0xfffffe0455872a60
amd64_syscall() at amd64_syscall+0x6c4/frame 0xfffffe0455872bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0455872bf0
--- syscall (10, FreeBSD ELF64, sys_unlink), rip = 0x8009756aa, rsp = 0x7fffffffe718, rbp = 0x7fffffffe7e0 ---
KDB: enter: panic

(kgdb) bt
#0  doadump (textdump=0) at pcpu.h:222
#1  0xffffffff8030b2ab in db_dump (dummy=<value optimized out>, dummy2=<value optimized out>, dummy3=<value optimized out>, dummy4=<value optimized out>) at /usr/src/sys/ddb/db_command.c:533
#2  0xffffffff8030b09f in db_command (cmd_table=<value optimized out>) at /usr/src/sys/ddb/db_command.c:440
#3  0xffffffff8030add4 in db_command_loop () at /usr/src/sys/ddb/db_command.c:493
#4  0xffffffff8030de8f in db_trap (type=<value optimized out>, code=<value optimized out>) at /usr/src/sys/ddb/db_main.c:248
#5  0xffffffff8054b833 in kdb_trap (type=3, code=-61456, tf=<value optimized out>) at /usr/src/sys/kern/subr_kdb.c:654
#6  0xffffffff807ed79f in trap (frame=0xfffffe0455872290) at /usr/src/sys/amd64/amd64/trap.c:537
#7  0xffffffff807d3111 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#8  0xffffffff8054af6b in kdb_enter (why=0xffffffff808f0e3a "panic", msg=<value optimized out>) at cpufunc.h:63
#9  0xffffffff80509303 in vpanic (fmt=<value optimized out>, ap=0xfffffe0455872420) at /usr/src/sys/kern/kern_shutdown.c:752
#10 0xffffffff80509153 in panic (fmt=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:690
#11 0xffffffff807c88c0 in vm_radix_remove (rtree=0x0, index=18446741893301083792) at uma.h:364
#12 0xffffffff807bcd56 in vm_page_free_toq (m=0xfffff804385afe30) at /usr/src/sys/vm/vm_page.c:1287
#13 0xffffffff807b9055 in vm_object_page_remove (object=<value optimized out>, start=<value optimized out>, end=119283, options=<value optimized out>) at /usr/src/sys/vm/vm_object.c:1948
#14 0xffffffff807cbbcd in vnode_pager_setsize (vp=<value optimized out>, nsize=0) at /usr/src/sys/vm/vnode_pager.c:425
#15 0xffffffff8076edd6 in trunc_pages (ip=<value optimized out>, length=0, extblocks=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/ufs/ffs/ffs_softdep.c:7015
#16 0xffffffff8076fb56 in softdep_setup_freeblocks (ip=0xfffff8017d859000, length=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/ufs/ffs/ffs_softdep.c:6981
#17 0xffffffff807599ce in ffs_truncate (vp=<value optimized out>, length=<value optimized out>, flags=<value optimized out>, cred=<value optimized out>) at /usr/src/sys/ufs/ffs/ffs_inode.c:393
#18 0xffffffff8078d507 in ufs_inactive (ap=<value optimized out>) at /usr/src/sys/ufs/ufs/ufs_inode.c:131
#19 0xffffffff808804b9 in VOP_INACTIVE_APV (vop=<value optimized out>, a=0xfffffe04558727e8) at vnode_if.c:1955
#20 0xffffffff805d1022 in vinactive (vp=0xfffff8019851c1d8, td=0xfffff802e2097560) at vnode_if.h:807
#21 0xffffffff805d1555 in vputx (vp=0xfffff8019851c1d8, func=2) at /usr/src/sys/kern/vfs_subr.c:2715
#22 0xffffffff805db81a in kern_unlinkat (td=0xfffff802e2097560, fd=-100, path=0x800e5d908 <Address 0x800e5d908 out of bounds>, pathseg=UIO_USERSPACE, oldinum=<value optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1791
#23 0xffffffff807eea74 in amd64_syscall (td=0xfffff802e2097560, traced=0) at subr_syscall.c:135
#24 0xffffffff807d33fb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#25 0x00000008009756aa in ?? ()


(kgdb) fr 11
#11 0xffffffff807c88c0 in vm_radix_remove (rtree=0x0, index=18446741893301083792) at uma.h:364
364             uma_zfree_arg(zone, item, NULL);
(kgdb) p/x index
$2 = 0xfffffe0455871e90

(kgdb) fr 12
#12 0xffffffff807bcd56 in vm_page_free_toq (m=0xfffff804385afe30) at /usr/src/sys/vm/vm_page.c:1287
1287            vm_radix_remove(&object->rtree, m->pindex);
(kgdb) p (struct vm_page)m
$7 = {plinks = {q = {tqe_next = 0xfffff804385afe30, tqe_prev = 0x0}, s = {ss = {sle_next = 0xfffff804385afe30}, pv = 0x0}, memguard = {p = 18446735295741886000, v = 0}}, listq = {tqe_next = 0xfffff802867fed68, 
    tqe_prev = 0xfffff804385afe98}, object = 0xfffffe0455872440, pindex = 18446744071567348051, phys_addr = 18446741893301085264, md = {pv_list = {tqh_first = 0xffffffff8092c3ed, tqh_last = 0xe}, pv_gen = 979041840, pat_mode = -2044}, 
  wire_count = 2178760728, busy_lock = 4294967295, hold_count = 3, flags = 0, aflags = 0 '\0', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0', segind = 16 '\020', order = 0 '\0', pool = 0 '\0', act_count = 0 '\0', valid = 48 '0', 
  dirty = 0 '\0'}
(kgdb) p/x 18446735295741886000
$8 = 0xfffff804385afe30
(kgdb) p/x 18446744071567348051
$9 = 0xffffffff80509153
(kgdb) p/x 18446741893301085264
$10 = 0xfffffe0455872450

(kgdb) fr 13
#13 0xffffffff807b9055 in vm_object_page_remove (object=<value optimized out>, start=<value optimized out>, end=119283, options=<value optimized out>) at /usr/src/sys/vm/vm_object.c:1948
1948                    vm_page_free(p);
(kgdb) print (struct vm_page)p
$5 = {plinks = {q = {tqe_next = 0xfffff804385afe98, tqe_prev = 0xfffffe0455872440}, s = {ss = {sle_next = 0xfffff804385afe98}, pv = 0xfffffe0455872440}, memguard = {p = 18446735295741886104, v = 18446741893301085248}}, listq = {
    tqe_next = 0xffffffff80509153, tqe_prev = 0xfffffe0455872450}, object = 0xffffffff8092c3ed, pindex = 14, phys_addr = 18446735295775440432, md = {pv_list = {tqh_first = 0xffffffff81dd4018, tqh_last = 0x3}, pv_gen = 16, 
    pat_mode = 48}, wire_count = 1434920016, busy_lock = 4294966788, hold_count = 9200, flags = 21895, aflags = 4 '\004', oflags = 254 '', queue = 255 '', psind = -1 '', segind = 0 '\0', order = 0 '\0', pool = 0 '\0', 
  act_count = 0 '\0', valid = 0 '\0', dirty = 0 '\0'}
(kgdb) p/x 18446735295741886104
$14 = 0xfffff804385afe98
Comment 1 Konstantin Belousov freebsd_committer freebsd_triage 2017-05-05 15:04:35 UTC
The page in frame 12 looks like a garbage: the wire count practically cannot be that large, pat mode also looks wrong.  This might be a cause of the panic you see, or it might be due to gdb troubles.

Did you compiled your kernel with INVARIANTS and WITNESS ?  If not, enable these options and try to reproduce the problem.  I wonder if it triggers something else faster.

Also, do you use any third-party modules ?
Comment 2 Jeroen Ruigrok van der Werven freebsd_committer freebsd_triage 2017-05-05 20:08:09 UTC
(In reply to Konstantin Belousov from comment #1)
No third party modules:

% kldstat
Id Refs Address            Size     Name
 1   25 0xffffffff80200000 1e08780  kernel
 2    1 0xffffffff82221000 593e     fdescfs.ko
 3    1 0xffffffff82227000 6683     nullfs.ko
 4    1 0xffffffff8222e000 a865     linprocfs.ko
 5    3 0xffffffff82239000 7b44     linux_common.ko
 6    1 0xffffffff82241000 bb08     tmpfs.ko
 7    1 0xffffffff8224d000 41d65    linux.ko
 8    1 0xffffffff8228f000 3bea5    linux64.ko

I had INVARIANTS and WITNESS on during 10-STABLE, but turned it off during my entire upgrade to 11-STABLE. I'll turn them on and run another poudriere llvm build to see if I can trigger it again.

The wire_count is the wired page count statistic, I assume?
Comment 3 Konstantin Belousov freebsd_committer freebsd_triage 2017-05-05 20:38:36 UTC
(In reply to Jeroen Ruigrok van der Werven from comment #2)
It is a counter of number of times the page is currently wired.  Each user must unwire the page, eventually.
Comment 4 Jeroen Ruigrok van der Werven freebsd_committer freebsd_triage 2017-06-05 10:16:46 UTC
Of course, with INVARIANTS and WITNESS currently in the kernel the system stays stable.

Maybe I should remove WITNESS at some point and just run with INVARIANTS, see how it behaves. And maybe even ditch INVARIANTS at some point. If it stays stable then it must have been a fluke.
Comment 5 Jeroen Ruigrok van der Werven freebsd_committer freebsd_triage 2017-06-27 13:02:44 UTC
Will test this week. Haven't had time yet.