Bug 211535 - ZFS crash zap_leaf_array_create() in zap_leaf.c
Summary: ZFS crash zap_leaf_array_create() in zap_leaf.c
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.3-RELEASE
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-fs (Nobody)
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2016-08-02 21:32 UTC by David Gilbert
Modified: 2016-08-22 05:54 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description David Gilbert 2016-08-02 21:32:07 UTC
I've to a repeatable crash on a server that is busy, but not overly busy.  The server is a sunfire x4140 (dual 6 core opteron w/ 48 G RAM and 6 1G disks).  The 6 1G disks are in a ZFS z2 array.

The crash happens on line 198 of my source, which is inside the ZAP_LEAF_CHUNK() macro.

193             chunk = zap_leaf_phys(l)->l_hdr.lh_freelist;
194             ASSERT3U(chunk, <, ZAP_LEAF_NUMCHUNKS(l));
195             ASSERT3U(ZAP_LEAF_CHUNK(l, chunk).l_free.lf_type, ==, ZAP_CHUNK_FREE);
196
197             zap_leaf_phys(l)->l_hdr.lh_freelist =
198                 ZAP_LEAF_CHUNK(l, chunk).l_free.lf_next;
199
200             zap_leaf_phys(l)->l_hdr.lh_nfree--;
201
202             return (chunk);

for some reason, I can't print l->l_phys (which is what the macro does first).  gdb doesn't see l as having l->phys.

The following is the GDB output of bt:

panic: vm_fault: fault on nofault entry, addr: fffffe00c79bd000
cpuid = 8
KDB: stack backtrace:
#0 0xffffffff8098e390 at kdb_backtrace+0x60
#1 0xffffffff80951066 at vpanic+0x126
#2 0xffffffff80950f33 at panic+0x43
#3 0xffffffff80bcfa4c at vm_fault_hold+0x1b2c
#4 0xffffffff80bcded7 at vm_fault+0x77
#5 0xffffffff80d5612c at trap_pfault+0x19c
#6 0xffffffff80d558fa at trap+0x47a
#7 0xffffffff80d3b8d2 at calltrap+0x8
#8 0xffffffff81a49a5a at zap_entry_create+0x27a
#9 0xffffffff81a45eee at fzap_add_cd+0xde
#10 0xffffffff81a4c051 at zap_add+0x101
#11 0xffffffff81a6bfb5 at zfs_link_create+0x415
#12 0xffffffff81a87a34 at zfs_freebsd_rename+0xac4
#13 0xffffffff80e81e1b at VOP_RENAME_APV+0xab
#14 0xffffffff80a03476 at kern_renameat+0x4a6
#15 0xffffffff80d5694f at amd64_syscall+0x40f
#16 0xffffffff80d3bbbb at Xfast_syscall+0xfb
Uptime: 21m11s
GEOM_MIRROR: Device yak-boot: rebuilding provider da7 stopped.
Dumping 1978 out of 49123 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

*** Deleted all the "loading symbols" lines ***

#0  doadump (textdump=<value optimized out>) at pcpu.h:219
219     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump (textdump=<value optimized out>) at pcpu.h:219
#1  0xffffffff80950cc2 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486
#2  0xffffffff809510a5 in vpanic (fmt=<value optimized out>, ap=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:889
#3  0xffffffff80950f33 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:818
#4  0xffffffff80bcfa4c in vm_fault_hold (map=<value optimized out>, vaddr=<value optimized out>,
    fault_type=<value optimized out>, fault_flags=<value optimized out>, m_hold=<value optimized out>)
    at /usr/src/sys/vm/vm_fault.c:329
#5  0xffffffff80bcded7 in vm_fault (map=0xfffff80002000000, vaddr=<value optimized out>, fault_type=1 '\001',
    fault_flags=0) at /usr/src/sys/vm/vm_fault.c:273
#6  0xffffffff80d5612c in trap_pfault (frame=0xfffffe0c56854320, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:757
#7  0xffffffff80d558fa in trap (frame=0xfffffe0c56854320) at /usr/src/sys/amd64/amd64/trap.c:447
#8  0xffffffff80d3b8d2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236
#9  0xffffffff81a494c5 in zap_leaf_array_create (l=0xfffff8008b3a3b00, buf=<value optimized out>, integer_size=1,
    num_integers=<value optimized out>)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:198
#10 0xffffffff81a49a5a in zap_entry_create (l=0xfffff8008b3a3b00, zn=0xfffff8008bf19e00, cd=Cannot access memory at address 0x0
)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:644
#11 0xffffffff81a45eee in fzap_add_cd (zn=0xfffff8008bf19e00, integer_size=<value optimized out>, num_integers=1,
    val=0xfffffe0c568546d0, cd=4294967295, tx=0xfffff80787758c00)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:814
#12 0xffffffff81a4c051 in zap_add (os=<value optimized out>, zapobj=<value optimized out>, key=<value optimized out>,
    integer_size=<value optimized out>, num_integers=<value optimized out>, val=<value optimized out>,
    tx=<value optimized out>)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:1014
#13 0xffffffff81a6bfb5 in zfs_link_create (dl=0xfffff800966cce00, zp=<value optimized out>, tx=0xfffff80787758c00,
    flag=<value optimized out>)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:767
#14 0xffffffff81a87a34 in zfs_freebsd_rename (ap=<value optimized out>)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4067
#15 0xffffffff80e81e1b in VOP_RENAME_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:1546
#16 0xffffffff80a03476 in kern_renameat (td=<value optimized out>, oldfd=<value optimized out>,
    old=<value optimized out>, newfd=<value optimized out>, new=<value optimized out>, pathseg=<value optimized out>)
    at vnode_if.h:636
#17 0xffffffff80d5694f in amd64_syscall (td=0xfffff8048f7fd000, traced=0) at subr_syscall.c:141
#18 0xffffffff80d3bbbb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#19 0x000000080381fbaa in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal
Comment 1 David Gilbert 2016-08-03 21:17:31 UTC
I'm struggling with kgdb and the source.  But from what I can tell, this may be happening with a rename() kernel call operating on a directory of 350k files that icinga2's statistics addon had manage to generate.

Wow.  That's big as directories go.
Comment 2 David Gilbert 2016-08-04 05:03:48 UTC
A little more data.  There are two directories:

/var/spool/icinga2/tmp and /var/spool/icinga2/perfdata

(technically, /jail/icinga exist below that ... both filesystems, but the rest of this is one filesystem)

in tmp are two files: host-perfdata and service-perfdata

each interval (looks like every minute), these files are renamed into the perfdata directory with a date-time string inserted into the name.

-rw-r--r--  1 183  icinga   4130 Aug  3 16:42 host-perfdata.1470256929
-rw-r--r--  1 183  icinga   5298 Aug  3 16:42 service-perfdata.1470256959

(as an example of two files in perfdata.)

so... with the 350k files in the directory, 2 renames firing each minute, the crash happens about once every 12-ish hours.
Comment 3 David Gilbert 2016-08-05 15:45:28 UTC
I got another crash with a slightly different signature.  Still ending in the same stuff:

(kgdb) bt
#0  doadump (textdump=1) at ../../../kern/kern_shutdown.c:298
#1  0xffffffff80df6d9a in kern_reboot (howto=260) at ../../../kern/kern_shutdown.c:486
#2  0xffffffff80df770c in vpanic (fmt=0xffffffff81796b99 "vm_fault: fault on nofault entry, addr: %lx",
    ap=0xfffffe0c563a2560) at ../../../kern/kern_shutdown.c:889
#3  0xffffffff80df74d0 in panic (fmt=0xffffffff81796b99 "vm_fault: fault on nofault entry, addr: %lx")
    at ../../../kern/kern_shutdown.c:818
#4  0xffffffff8126b4d6 in vm_fault_hold (map=0xfffff80003000000, vaddr=18446741878753259520, fault_type=1 '\001',
    fault_flags=0, m_hold=0x0) at ../../../vm/vm_fault.c:329
#5  0xffffffff8126b259 in vm_fault (map=0xfffff80003000000, vaddr=18446741878753259520, fault_type=1 '\001',
    fault_flags=0) at ../../../vm/vm_fault.c:273
#6  0xffffffff8142001a in trap_pfault (frame=0xfffffe0c563a2e60, usermode=0) at ../../../amd64/amd64/trap.c:757
#7  0xffffffff8141ed93 in trap (frame=0xfffffe0c563a2e60) at ../../../amd64/amd64/trap.c:447
#8  0xffffffff813f78d2 in calltrap () at ../../../amd64/amd64/exception.S:236
#9  0xffffffff822daba7 in zap_leaf_chunk_alloc (l=0xfffff80566424b00)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:198
#10 0xffffffff822da1d8 in zap_leaf_array_create (l=0xfffff80566424b00,
    buf=0xfffff80848a09a30 "service-perfdata.1470234364", integer_size=1, num_integers=28)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:239
#11 0xffffffff822daa39 in zap_entry_create (l=0xfffff80566424b00, zn=0xfffff8001ca1c200, cd=0, integer_size=8 '\b',
    num_integers=1, buf=0xfffffe0c563a33a8, zeh=0xfffffe0c563a3100)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_leaf.c:644
#12 0xffffffff822d4d1d in fzap_add_cd (zn=0xfffff8001ca1c200, integer_size=8, num_integers=1, val=0xfffffe0c563a33a8,
    cd=4294967295, tx=0xfffff805f57b1300)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:814
#13 0xffffffff822d5427 in fzap_add (zn=0xfffff8001ca1c200, integer_size=8, num_integers=1, val=0xfffffe0c563a33a8,
    tx=0xfffff805f57b1300) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap.c:841
#14 0xffffffff822de5d1 in zap_add (os=0xfffff8002d0bac00, zapobj=60627,
    key=0xfffff80848a09a30 "service-perfdata.1470234364", integer_size=8, num_integers=1, val=0xfffffe0c563a33a8,
    tx=0xfffff805f57b1300)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zap_micro.c:1008
#15 0xffffffff82313c59 in zfs_link_create (dl=0xfffff80848a09a00, zp=0xfffff806f6a4cb80, tx=0xfffff805f57b1300,
    flag=16) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:767
#16 0xffffffff8233f5fb in zfs_rename (sdvp=0xfffff804d7613938, snm=0xfffff8001fa3e817 "service-perfdata",
    tdvp=0xfffff804d7613588, tnm=0xfffff8001fa8cc1c "service-perfdata.1470234364", cr=0xfffff8049f296e00, ct=0x0,
    flags=0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:4067
#17 0xffffffff82339aaf in zfs_freebsd_rename (ap=0xfffffe0c563a3628)
    at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6513
#18 0xffffffff8161877b in VOP_RENAME_APV (vop=0xffffffff823c75f0, a=0xfffffe0c563a3628) at vnode_if.c:1546
#19 0xffffffff80f22519 in VOP_RENAME (fdvp=0xfffff804d7613938, fvp=0xfffff809a51ddb10, fcnp=0xfffffe0c563a38b0,
    tdvp=0xfffff804d7613588, tvp=0x0, tcnp=0xfffffe0c563a37e0) at vnode_if.h:636
#20 0xffffffff80f223a3 in kern_renameat (td=0xfffff804258d5960, oldfd=-100,
    old=0x806c201c0 <Address 0x806c201c0 out of bounds>, newfd=-100,
    new=0x806c51180 <Address 0x806c51180 out of bounds>, pathseg=UIO_USERSPACE) at ../../../kern/vfs_syscalls.c:3752
#21 0xffffffff80f21d63 in kern_rename (td=0xfffff804258d5960, from=0x806c201c0 <Address 0x806c201c0 out of bounds>,
    to=0x806c51180 <Address 0x806c51180 out of bounds>, pathseg=UIO_USERSPACE) at ../../../kern/vfs_syscalls.c:3644
#22 0xffffffff80f21d2a in sys_rename (td=0xfffff804258d5960, uap=0xfffffe0c563a3b98)
    at ../../../kern/vfs_syscalls.c:3621
#23 0xffffffff81421163 in syscallenter (td=0xfffff804258d5960, sa=0xfffffe0c563a3b88) at subr_syscall.c:141
#24 0xffffffff8142073f in amd64_syscall (td=0xfffff804258d5960, traced=0) at ../../../amd64/amd64/trap.c:959
#25 0xffffffff813f7bbb in Xfast_syscall () at ../../../amd64/amd64/exception.S:396
#26 0x000000080381fbaa in ?? ()