I've to a repeatable crash on a server that is busy, but not overly busy. The server is a sunfire x4140 (dual 6 core opteron w/ 48 G RAM and 6 1G disks). The 6 1G disks are in a ZFS z2 array. The crash happens on line 198 of my source, which is inside the ZAP_LEAF_CHUNK() macro. 193 chunk = zap_leaf_phys(l)->l_hdr.lh_freelist; 194 ASSERT3U(chunk, <, ZAP_LEAF_NUMCHUNKS(l)); 195 ASSERT3U(ZAP_LEAF_CHUNK(l, chunk).l_free.lf_type, ==, ZAP_CHUNK_FREE); 196 197 zap_leaf_phys(l)->l_hdr.lh_freelist = 198 ZAP_LEAF_CHUNK(l, chunk).l_free.lf_next; 199 200 zap_leaf_phys(l)->l_hdr.lh_nfree--; 201 202 return (chunk); for some reason, I can't print l->l_phys (which is what the macro does first). gdb doesn't see l as having l->phys. The following is the GDB output of bt: panic: vm_fault: fault on nofault entry, addr: fffffe00c79bd000 cpuid = 8 KDB: stack backtrace: #0 0xffffffff8098e390 at kdb_backtrace+0x60 #1 0xffffffff80951066 at vpanic+0x126 #2 0xffffffff80950f33 at panic+0x43 #3 0xffffffff80bcfa4c at vm_fault_hold+0x1b2c #4 0xffffffff80bcded7 at vm_fault+0x77 #5 0xffffffff80d5612c at trap_pfault+0x19c #6 0xffffffff80d558fa at trap+0x47a #7 0xffffffff80d3b8d2 at calltrap+0x8 #8 0xffffffff81a49a5a at zap_entry_create+0x27a #9 0xffffffff81a45eee at fzap_add_cd+0xde #10 0xffffffff81a4c051 at zap_add+0x101 #11 0xffffffff81a6bfb5 at zfs_link_create+0x415 #12 0xffffffff81a87a34 at zfs_freebsd_rename+0xac4 #13 0xffffffff80e81e1b at VOP_RENAME_APV+0xab #14 0xffffffff80a03476 at kern_renameat+0x4a6 #15 0xffffffff80d5694f at amd64_syscall+0x40f #16 0xffffffff80d3bbbb at Xfast_syscall+0xfb Uptime: 21m11s GEOM_MIRROR: Device yak-boot: rebuilding provider da7 stopped. Dumping 1978 out of 49123 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% *** Deleted all the "loading symbols" lines *** #0 doadump (textdump=<value optimized out>) at pcpu.h:219 219 pcpu.h: No such file or directory. in pcpu.h (kgdb) bt #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff80950cc2 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486 #2 0xffffffff809510a5 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:889 #3 0xffffffff80950f33 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:818 #4 0xffffffff80bcfa4c in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>, fault_type=<value optimized out>, fault_flags=<value optimized out>, m_hold=<value optimized out>) at /usr/src/sys/vm/vm_fault.c:329 #5 0xffffffff80bcded7 in vm_fault (map=0xfffff80002000000, vaddr=<value optimized out>, fault_type=1 '\001', fault_flags=0) at /usr/src/sys/vm/vm_fault.c:273 #6 0xffffffff80d5612c in trap_pfault (frame=0xfffffe0c56854320, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:757 #7 0xffffffff80d558fa in trap (frame=0xfffffe0c56854320) at /usr/src/sys/amd64/amd64/trap.c:447 #8 0xffffffff80d3b8d2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236 #9 0xffffffff81a494c5 in zap_leaf_array_create (l=0xfffff8008b3a3b00, buf=<value optimized out>, integer_size=1, num_integers=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:198 #10 0xffffffff81a49a5a in zap_entry_create (l=0xfffff8008b3a3b00, zn=0xfffff8008bf19e00, cd=Cannot access memory at address 0x0 ) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:644 #11 0xffffffff81a45eee in fzap_add_cd (zn=0xfffff8008bf19e00, integer_size=<value optimized out>, num_integers=1, val=0xfffffe0c568546d0, cd=4294967295, tx=0xfffff80787758c00) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:814 #12 0xffffffff81a4c051 in zap_add (os=<value optimized out>, zapobj=<value optimized out>, key=<value optimized out>, integer_size=<value optimized out>, num_integers=<value optimized out>, val=<value optimized out>, tx=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:1014 #13 0xffffffff81a6bfb5 in zfs_link_create (dl=0xfffff800966cce00, zp=<value optimized out>, tx=0xfffff80787758c00, flag=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:767 #14 0xffffffff81a87a34 in zfs_freebsd_rename (ap=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4067 #15 0xffffffff80e81e1b in VOP_RENAME_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1546 #16 0xffffffff80a03476 in kern_renameat (td=<value optimized out>, oldfd=<value optimized out>, old=<value optimized out>, newfd=<value optimized out>, new=<value optimized out>, pathseg=<value optimized out>) at vnode_if.h:636 #17 0xffffffff80d5694f in amd64_syscall (td=0xfffff8048f7fd000, traced=0) at subr_syscall.c:141 #18 0xffffffff80d3bbbb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 #19 0x000000080381fbaa in ?? () Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal
I'm struggling with kgdb and the source. But from what I can tell, this may be happening with a rename() kernel call operating on a directory of 350k files that icinga2's statistics addon had manage to generate. Wow. That's big as directories go.
A little more data. There are two directories: /var/spool/icinga2/tmp and /var/spool/icinga2/perfdata (technically, /jail/icinga exist below that ... both filesystems, but the rest of this is one filesystem) in tmp are two files: host-perfdata and service-perfdata each interval (looks like every minute), these files are renamed into the perfdata directory with a date-time string inserted into the name. -rw-r--r-- 1 183 icinga 4130 Aug 3 16:42 host-perfdata.1470256929 -rw-r--r-- 1 183 icinga 5298 Aug 3 16:42 service-perfdata.1470256959 (as an example of two files in perfdata.) so... with the 350k files in the directory, 2 renames firing each minute, the crash happens about once every 12-ish hours.
I got another crash with a slightly different signature. Still ending in the same stuff: (kgdb) bt #0 doadump (textdump=1) at ../../../kern/kern_shutdown.c:298 #1 0xffffffff80df6d9a in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:486 #2 0xffffffff80df770c in vpanic (fmt=0xffffffff81796b99 "vm_fault: fault on nofault entry, addr: %lx", ap=0xfffffe0c563a2560) at ../../../kern/kern_shutdown.c:889 #3 0xffffffff80df74d0 in panic (fmt=0xffffffff81796b99 "vm_fault: fault on nofault entry, addr: %lx") at ../../../kern/kern_shutdown.c:818 #4 0xffffffff8126b4d6 in vm_fault_hold (map=0xfffff80003000000, vaddr=18446741878753259520, fault_type=1 '\001', fault_flags=0, m_hold=0x0) at ../../../vm/vm_fault.c:329 #5 0xffffffff8126b259 in vm_fault (map=0xfffff80003000000, vaddr=18446741878753259520, fault_type=1 '\001', fault_flags=0) at ../../../vm/vm_fault.c:273 #6 0xffffffff8142001a in trap_pfault (frame=0xfffffe0c563a2e60, usermode=0) at ../../../amd64/amd64/trap.c:757 #7 0xffffffff8141ed93 in trap (frame=0xfffffe0c563a2e60) at ../../../amd64/amd64/trap.c:447 #8 0xffffffff813f78d2 in calltrap () at ../../../amd64/amd64/exception.S:236 #9 0xffffffff822daba7 in zap_leaf_chunk_alloc (l=0xfffff80566424b00) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:198 #10 0xffffffff822da1d8 in zap_leaf_array_create (l=0xfffff80566424b00, buf=0xfffff80848a09a30 "service-perfdata.1470234364", integer_size=1, num_integers=28) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:239 #11 0xffffffff822daa39 in zap_entry_create (l=0xfffff80566424b00, zn=0xfffff8001ca1c200, cd=0, integer_size=8 '\b', num_integers=1, buf=0xfffffe0c563a33a8, zeh=0xfffffe0c563a3100) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:644 #12 0xffffffff822d4d1d in fzap_add_cd (zn=0xfffff8001ca1c200, integer_size=8, num_integers=1, val=0xfffffe0c563a33a8, cd=4294967295, tx=0xfffff805f57b1300) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:814 #13 0xffffffff822d5427 in fzap_add (zn=0xfffff8001ca1c200, integer_size=8, num_integers=1, val=0xfffffe0c563a33a8, tx=0xfffff805f57b1300) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:841 #14 0xffffffff822de5d1 in zap_add (os=0xfffff8002d0bac00, zapobj=60627, key=0xfffff80848a09a30 "service-perfdata.1470234364", integer_size=8, num_integers=1, val=0xfffffe0c563a33a8, tx=0xfffff805f57b1300) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:1008 #15 0xffffffff82313c59 in zfs_link_create (dl=0xfffff80848a09a00, zp=0xfffff806f6a4cb80, tx=0xfffff805f57b1300, flag=16) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:767 #16 0xffffffff8233f5fb in zfs_rename (sdvp=0xfffff804d7613938, snm=0xfffff8001fa3e817 "service-perfdata", tdvp=0xfffff804d7613588, tnm=0xfffff8001fa8cc1c "service-perfdata.1470234364", cr=0xfffff8049f296e00, ct=0x0, flags=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4067 #17 0xffffffff82339aaf in zfs_freebsd_rename (ap=0xfffffe0c563a3628) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6513 #18 0xffffffff8161877b in VOP_RENAME_APV (vop=0xffffffff823c75f0, a=0xfffffe0c563a3628) at vnode_if.c:1546 #19 0xffffffff80f22519 in VOP_RENAME (fdvp=0xfffff804d7613938, fvp=0xfffff809a51ddb10, fcnp=0xfffffe0c563a38b0, tdvp=0xfffff804d7613588, tvp=0x0, tcnp=0xfffffe0c563a37e0) at vnode_if.h:636 #20 0xffffffff80f223a3 in kern_renameat (td=0xfffff804258d5960, oldfd=-100, old=0x806c201c0 <Address 0x806c201c0 out of bounds>, newfd=-100, new=0x806c51180 <Address 0x806c51180 out of bounds>, pathseg=UIO_USERSPACE) at ../../../kern/vfs_syscalls.c:3752 #21 0xffffffff80f21d63 in kern_rename (td=0xfffff804258d5960, from=0x806c201c0 <Address 0x806c201c0 out of bounds>, to=0x806c51180 <Address 0x806c51180 out of bounds>, pathseg=UIO_USERSPACE) at ../../../kern/vfs_syscalls.c:3644 #22 0xffffffff80f21d2a in sys_rename (td=0xfffff804258d5960, uap=0xfffffe0c563a3b98) at ../../../kern/vfs_syscalls.c:3621 #23 0xffffffff81421163 in syscallenter (td=0xfffff804258d5960, sa=0xfffffe0c563a3b88) at subr_syscall.c:141 #24 0xffffffff8142073f in amd64_syscall (td=0xfffff804258d5960, traced=0) at ../../../amd64/amd64/trap.c:959 #25 0xffffffff813f7bbb in Xfast_syscall () at ../../../amd64/amd64/exception.S:396 #26 0x000000080381fbaa in ?? ()