| Summary: | Kernel panic in getnewvnode (possibly ZFS related) | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Raimo Niskanen <raimo+freebsd> |
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed Overcome By Events | | |
| Severity: | Affects Some People | CC: | avg |
| Priority: | --- | | |
| Version: | 10.3-RELEASE | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
Description
Raimo Niskanen, 2017-06-12 09:44:50 UTC
If you still have the core file, could you please print *vp in frame 8? It is better to set print pretty before that. Thanks!

Nope. I do not have that core file; it wrapped around long ago. But I have a fresher one (I get a new one about every three days):

sasquatch.otp.ericsson.se dumped core - see /var/crash/vmcore.4
Sat Aug 12 04:18:54 CEST 2017
FreeBSD sasquatch.otp.ericsson.se 10.3-RELEASE-p20 FreeBSD 10.3-RELEASE-p20 #0: Wed Jul 12 03:13:07 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
panic: page fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 92526 (find)
trap number = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff8098e810 at kdb_backtrace+0x60
#1 0xffffffff809514e6 at vpanic+0x126
#2 0xffffffff809513b3 at panic+0x43
#3 0xffffffff80d5649b at trap_fatal+0x36b
#4 0xffffffff80d5679d at trap_pfault+0x2ed
#5 0xffffffff80d55e1a at trap+0x47a
#6 0xffffffff80d3bde2 at calltrap+0x8
#7 0xffffffff809f59b0 at getnewvnode+0x1c0
#8 0xffffffff81a51dda at zfs_znode_alloc+0x5a
#9 0xffffffff81a52ee2 at zfs_zget+0x2a2
#10 0xffffffff81a6b195 at zfs_dirent_lock+0x4a5
#11 0xffffffff81a6b43d at zfs_dirlook+0x17d
#12 0xffffffff81a817f0 at zfs_lookup+0x2e0
#13 0xffffffff81a81c9d at zfs_freebsd_lookup+0x6d
#14 0xffffffff80e80f01 at VOP_CACHEDLOOKUP_APV+0xa1
#15 0xffffffff809e4906 at vfs_cache_lookup+0xd6
#16 0xffffffff80e80df1 at VOP_LOOKUP_APV+0xa1
#17 0xffffffff809ed021 at lookup+0x5a1
Uptime: 3d1h11m26s
Dumping 1841 out of 12215 
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Reading symbols from /mnt/boot/kernel/zfs.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/zfs.ko.symbols Reading symbols from /mnt/boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/opensolaris.ko.symbols Reading symbols from /mnt/boot/kernel/geom_mirror.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/geom_mirror.ko.symbols Reading symbols from /mnt/boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/fdescfs.ko.symbols Reading symbols from /mnt/boot/kernel/ums.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/ums.ko.symbols Reading symbols from /mnt/boot/kernel/uhid.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/uhid.ko.symbols Reading symbols from /mnt/boot/kernel/pf.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/pf.ko.symbols #0 doadump (textdump=<value optimized out>) at pcpu.h:219 219 pcpu.h: No such file or directory. in pcpu.h (kgdb) bt #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff80951142 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486 #2 0xffffffff80951525 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:889 #3 0xffffffff809513b3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:818 #4 0xffffffff80d5649b in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:858 #5 0xffffffff80d5679d in trap_pfault (frame=0xfffffe034fc3fec0, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:681 #6 0xffffffff80d55e1a in trap (frame=0xfffffe034fc3fec0) at /usr/src/sys/amd64/amd64/trap.c:447 #7 0xffffffff80d3bde2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236 #8 0xffffffff809f7755 in vholdl (vp=0xfffff801ed8b0ce8) at /usr/src/sys/kern/vfs_subr.c:2468 #9 0xffffffff809f59b0 in getnewvnode (tag=0xffffffff81af1c7b "zfs", mp=0xfffff8002cfae660, 
vops=0xffffffff81b048e0, vpp=0xfffffe034fc40170) at /usr/src/sys/kern/vfs_subr.c:2163 #10 0xffffffff81a51dda in zfs_znode_alloc (zfsvfs=0xfffff8002ce75000, db=0xfffff803145caaf8, blksz=1536, obj_type=DMU_OT_SA, hdl=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:630 #11 0xffffffff81a52ee2 in zfs_zget (zfsvfs=0xfffff8002ce75000, obj_num=143239, zpp=0xfffffe034fc40328) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1221 #12 0xffffffff81a6b195 in zfs_dirent_lock (dlpp=0xfffffe034fc40330, dzp=<value optimized out>, name=<value optimized out>, zpp=0xfffffe034fc40328, flag=<value optimized out>, direntflags=<value optimized out>, realpnp=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:318 #13 0xffffffff81a6b43d in zfs_dirlook (dzp=0xfffff8003d7525c0, name=0xfffffe034fc40410 "msg-12736-1.txt", vpp=0xfffffe034fc40858, flags=<value optimized out>, deflg=0x0, rpnp=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:426 #14 0xffffffff81a817f0 in zfs_lookup () at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1545 #15 0xffffffff81a81c9d in zfs_freebsd_lookup (ap=0xfffffe034fc40578) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6195 #16 0xffffffff80e80f01 in VOP_CACHEDLOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:197 #17 0xffffffff809e4906 in vfs_cache_lookup (ap=<value optimized out>) at vnode_if.h:80 #18 0xffffffff80e80df1 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:129 #19 0xffffffff809ed021 in lookup (ndp=0xfffffe034fc407f8) at vnode_if.h:54 #20 0xffffffff809ec784 in namei (ndp=0xfffffe034fc407f8) at /usr/src/sys/kern/vfs_lookup.c:302 #21 0xffffffff80a00ece in kern_statat_vnhook (td=0xfffff801ed35c000, flag=<value optimized out>, 
fd=-100, path=0x801022308 <Address 0x801022308 out of bounds>, pathseg=UIO_USERSPACE, sbp=0xfffffe034fc40910, hook=0xfffff801ed8b0da8) at /usr/src/sys/kern/vfs_syscalls.c:2301 #22 0xffffffff80a00ddc in sys_fstatat (td=0xfffff801ed8b0ce8, uap=0xfffffe034fc40a40) at /usr/src/sys/kern/vfs_syscalls.c:2281 #23 0xffffffff80d56e6f in amd64_syscall (td=0xfffff801ed35c000, traced=0) at subr_syscall.c:141 #24 0xffffffff80d3c0cb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 #25 0x00000008008b6cba in ?? () Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal (kgdb) fr 8 #8 0xffffffff809f7755 in vholdl (vp=0xfffff801ed8b0ce8) at /usr/src/sys/kern/vfs_subr.c:2468 2468 TAILQ_INSERT_HEAD(&mp->mnt_activevnodelist, vp, v_actfreelist); (kgdb) set print pretty (kgdb) p *vp $1 = { v_tag = 0xffffffff81af1c7b "zfs", v_op = 0xffffffff81b048e0, v_data = 0x0, v_mount = 0x0, v_nmntvnodes = { tqe_next = 0xfffff801eda4b938, tqe_prev = 0xfffff80118c5c1f8 }, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0 }, v_hashlist = { le_next = 0x0, le_prev = 0xfffffe0000a39748 }, v_cache_src = { lh_first = 0x0 }, v_cache_dst = { tqh_first = 0xfffff800178c6a80, tqh_last = 0xfffff800178c6aa0 }, v_cache_dd = 0x0, v_lock = { lock_object = { lo_name = 0xffffffff81af1c7b "zfs", lo_flags = 117112840, lo_data = 0, lo_witness = 0x0 }, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96 }, v_interlock = { lock_object = { lo_name = 0xffffffff80fe4d4c "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0 }, mtx_lock = 4 }, v_vnlock = 0xfffff801ed8b0d50, v_actfreelist = { tqe_next = 0xfffff80227ca4b10, tqe_prev = 0xfffff80023a5a648 }, v_bufobj = { bo_lock = { lock_object = { lo_name = 0xffffffff80fe4d5c "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0 }, rw_lock = 1 }, bo_ops = 0xffffffff814b7380, bo_object = 0x0, bo_synclist = { le_next = 0x0, le_prev = 0x0 }, bo_private = 
0xfffff801ed8b0ce8, __bo_vnode = 0xfffff801ed8b0ce8, bo_clean = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff801ed8b0e08 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_dirty = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff801ed8b0e28 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072 }, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = { rl_waiters = { tqh_first = 0x0, tqh_last = 0xfffff801ed8b0e70 }, rl_currdep = 0x0 }, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 1, v_usecount = 0, v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 53603, v_type = VNON }

(In reply to Raimo Niskanen from comment #2)
Could you please also print *vp->v_bufobj.bo_ops and do x/10a vp->v_bufobj.bo_ops ? Given that the vnode has been through at least one recycle cycle and is already partially initialized in getnewvnode, there is not much of the original data to chew on. Thank you!

There are some other things to try. For example, following the pointers in v_nmntvnodes on the chance that they still point to valid vnodes on the original mount list. Also, v_cache_dst.tqh_first may happen to point to a struct namecache that has not been trashed yet, so it might be possible to find some useful bits there, like a pointer to a parent directory vnode. 
(kgdb) p *vp->v_bufobj.bo_ops $1 = { bop_name = 0xffffffff80fe28f1 "buf_ops_bio", bop_write = 0xffffffff809dc460 <bufwrite>, bop_strategy = 0xffffffff809dc690 <bufstrategy>, bop_sync = 0xffffffff809dc6d0 <bufsync>, bop_bdflush = 0xffffffff809dc710 <bufbdflush> } (kgdb) x/10a vp->v_bufobj.bo_ops 0xffffffff814b7380 <buf_ops_bio>: 0xffffffff80fe28f1 <sun_noname+849> 0xffffffff809dc460 <bufwrite> 0xffffffff814b7390 <buf_ops_bio+16>: 0xffffffff809dc690 <bufstrategy> 0xffffffff809dc6d0 <bufsync> 0xffffffff814b73a0 <buf_ops_bio+32>: 0xffffffff809dc710 <bufbdflush> 0x1 0xffffffff814b73b0 <sysctl___vfs_vmiodirenable>: 0xffffffff8169cd80 <sysctl__vfs_children> 0xffffffff814afa58 <sysctl___vfs_acl_nfs4_old_semantics> 0xffffffff814b73c0 <sysctl___vfs_vmiodirenable+16>: 0xc00400027ffffc90 0xffffffff814b73a8 <vmiodirenable> (kgdb) p v_nmntvnodes No symbol "v_nmntvnodes" in current context. (kgdb) p v_cache_dst.tqh_first No symbol "v_cache_dst" in current context. Something seems to be missing there... Tabbing suggests that these symbols are recognized, but apparently not known. What trick have I missed? (In reply to Raimo Niskanen from comment #5) No trick, they are fields in struct vnode. So, you have to write vp->xxxxx. Sorry. Now I got it. (kgdb) p *vp[0]->v_cache_dst.tqh_first $7 = { nc_hash = { le_next = 0x0, le_prev = 0xfffffe0000ebe650 }, nc_src = { le_next = 0x0, le_prev = 0xfffff802cb689980 }, nc_dst = { tqe_next = 0x0, tqe_prev = 0xfffff801ed8b0d38 }, nc_dvp = 0xfffff802cb689938, nc_vp = 0xfffff801ed8b0ce8, nc_flag = 0 '\0', nc_nlen = 15 '\017', nc_name = 0xfffff800178c6ac2 "msg-19872-1.txt" } What should I look for in the vnodes in this linked list? (kgdb) p vp[0]->v_nmntvnodes.tqe_next $8 = (struct vnode *) 0xfffff801eda4b938 Thank you! Although, I must say that "vp[0]->" looks strange. It would be interesting to print nc_dvp in the struct cache. E.g. p *$7.nc_dvp if you are still in the same gdb session. Also, p *vp->v_nmntvnodes.tqe_next. Yes, well. 
It is so seldom I use gdb so I am a noob every time. [0] was a misunderstanding with no harm in this case... (kgdb) p *$7.nc_dvp $17 = { v_tag = 0xffffffff81af1c7b "zfs", v_op = 0xffffffff81b048e0, v_data = 0xfffff80256a1a2e0, v_mount = 0xfffff8002cfae660, v_nmntvnodes = { tqe_next = 0xfffff80023a5a588, tqe_prev = 0xfffff8007ae3ad08 }, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0 }, v_hashlist = { le_next = 0x0, le_prev = 0x0 }, v_cache_src = { lh_first = 0xfffff800178c6a80 }, v_cache_dst = { tqh_first = 0xfffff801cce59cb0, tqh_last = 0xfffff801cce59cd0 }, v_cache_dd = 0xfffff801cce59cb0, v_lock = { lock_object = { lo_name = 0xffffffff81af1c7b "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0 }, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96 }, v_interlock = { lock_object = { lo_name = 0xffffffff80fe4d4c "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0 }, mtx_lock = 4 }, v_vnlock = 0xfffff802cb6899a0, v_actfreelist = { tqe_next = 0xfffff80256787760, tqe_prev = 0xfffff802f39e2820 }, v_bufobj = { bo_lock = { lock_object = { lo_name = 0xffffffff80fe4d5c "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0 }, rw_lock = 1 }, bo_ops = 0xffffffff814b7380, bo_object = 0xfffff801cc3e1500, bo_synclist = { le_next = 0x0, le_prev = 0x0 }, bo_private = 0xfffff802cb689938, __bo_vnode = 0xfffff802cb689938, bo_clean = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff802cb689a58 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_dirty = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff802cb689a78 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072 }, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = { rl_waiters = { tqh_first = 0x0, tqh_last = 0xfffff802cb689ac0 }, rl_currdep = 0x0 }, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 1, v_usecount = 0, v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 111896, v_type = VDIR } (kgdb) p 
*vp->v_nmntvnodes.tqe_next $18 = { v_tag = 0xffffffff81af1c7b "zfs", v_op = 0xffffffff81b048e0, v_data = 0xfffff801181e4450, v_mount = 0xfffff8002cfae660, v_nmntvnodes = { tqe_next = 0xfffff8030f9a3938, tqe_prev = 0xfffff802f3fc93d0 }, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0 }, v_hashlist = { le_next = 0x0, le_prev = 0x0 }, v_cache_src = { lh_first = 0xfffff801185a1690 }, v_cache_dst = { tqh_first = 0xfffff8011865a4d0, tqh_last = 0xfffff8011865a4f0 }, v_cache_dd = 0xfffff8011865a4d0, v_lock = { lock_object = { lo_name = 0xffffffff81af1c7b "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0 }, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96 }, v_interlock = { lock_object = { lo_name = 0xffffffff80fe4d4c "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0 }, mtx_lock = 4 }, v_vnlock = 0xfffff801eda4b9a0, v_actfreelist = { tqe_next = 0xfffff8001f187000, tqe_prev = 0xfffff8007e4309f8 }, v_bufobj = { bo_lock = { lock_object = { lo_name = 0xffffffff80fe4d5c "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0 }, rw_lock = 1 }, bo_ops = 0xffffffff814b7380, bo_object = 0xfffff80059ffc000, bo_synclist = { le_next = 0x0, le_prev = 0x0 }, bo_private = 0xfffff801eda4b938, __bo_vnode = 0xfffff801eda4b938, bo_clean = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff801eda4ba58 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_dirty = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff801eda4ba78 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072 }, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = { rl_waiters = { tqh_first = 0x0, tqh_last = 0xfffff801eda4bac0 }, rl_currdep = 0x0 }, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 1, v_usecount = 0, v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 143216, v_type = VDIR } Lo and behold! 
With the following patch:
===================================================================
--- sys/kern/vfs_subr.c	(revision 322570)
+++ sys/kern/vfs_subr.c	(working copy)
@@ -2449,6 +2449,9 @@ vholdl(struct vnode *vp)
 	ASSERT_VI_LOCKED(vp, "vholdl");
 #endif
 	vp->v_holdcnt++;
+	if (vp->v_holdcnt == 1 && vp->v_type == VBAD)
+		panic("vhold: reclaimed vnode");
+
 	if ((vp->v_iflag & VI_FREE) == 0)
 		return;
 	VNASSERT(vp->v_holdcnt == 1, vp, ("vholdl: wrong hold count"));

I got this panic:
===================================================================
sasquatch.otp.ericsson.se dumped core - see /var/crash/vmcore.3
Wed Aug 30 07:18:08 CEST 2017
FreeBSD sasquatch.otp.ericsson.se 10.3-RELEASE-p21 FreeBSD 10.3-RELEASE-p21 #0: Mon Aug 28 11:05:53 CEST 2017 root@sasquatch.otp.ericsson.se:/usr/obj/usr/src/sys/DEBUG amd64
panic: vhold: reclaimed vnode
Unread portion of the kernel message buffer:
panic: vhold: reclaimed vnode
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff8098e810 at kdb_backtrace+0x60
#1 0xffffffff809514e6 at vpanic+0x126
#2 0xffffffff809513b3 at panic+0x43
#3 0xffffffff809f77e5 at vholdl+0x145
#4 0xffffffff809f0f40 at dounmount+0x60
#5 0xffffffff81a6dfe4 at zfs_unmount_snap+0x114
#6 0xffffffff81a10663 at dsl_dataset_user_release_impl+0x103
#7 0xffffffff81a10f2c at dsl_dataset_user_release_onexit+0x5c
#8 0xffffffff81a79fb6 at zfs_onexit_destroy+0x56
#9 0xffffffff81a70072 at zfsdev_close+0x52
#10 0xffffffff80833e19 at devfs_fpdrop+0xa9
#11 0xffffffff80836585 at devfs_close_f+0x45
#12 0xffffffff80905fa9 at _fdrop+0x29
#13 0xffffffff8090884e at closef+0x21e
#14 0xffffffff80906358 at closefp+0x98
#15 0xffffffff80d56e9f at amd64_syscall+0x40f
#16 0xffffffff80d3c0fb at Xfast_syscall+0xfb
Uptime: 1d19h20m50s

(In reply to Raimo Niskanen from comment #10)
Interesting! Are you able to associate this stack trace with any activity or commands? Also, could you please print *mp in the dounmount frame? I already see one potential problem, but want to confirm that it is indeed it.

It seems that this time it was most probably executing a backup script that did a 'zfs send':
zfs send -R -I weekly-2017-08-26_04.25.43--1m zroot@daily-2017-08-31_03.38.14--1w
This does not seem to be one of my typical panics, though. The last time this zfs send failed was June 20th.

#0 doadump (textdump=<value optimized out>) at pcpu.h:219
219 pcpu.h: No such file or directory. 
in pcpu.h (kgdb) bt #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff80951142 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486 #2 0xffffffff80951525 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:889 #3 0xffffffff809513b3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:818 #4 0xffffffff809f77e5 in vholdl (vp=<value optimized out>) at /usr/src/sys/kern/vfs_subr.c:2453 #5 0xffffffff809f0f40 in dounmount (mp=0xfffff8003414a660, flags=524288, td=0xfffff8002ca3b000) at /usr/src/sys/kern/vfs_mount.c:1223 #6 0xffffffff81a6dfe4 in zfs_unmount_snap (snapname=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:3485 #7 0xffffffff81a10663 in dsl_dataset_user_release_impl (holds=0xfffff801d2054740, errlist=0x0, tmpdp=0xfffff8002c5cb000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c:581 #8 0xffffffff81a10f2c in dsl_dataset_user_release_onexit (arg=0xfffff80034930600) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c:629 #9 0xffffffff81a79fb6 in zfs_onexit_destroy (zo=0xfffff8020ee82140) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_onexit.c:93 #10 0xffffffff81a70072 in zfsdev_close (data=0x2) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:5995 #11 0xffffffff80833e19 in devfs_fpdrop (fp=<value optimized out>) at /usr/src/sys/fs/devfs/devfs_vnops.c:186 #12 0xffffffff80836585 in devfs_close_f (fp=<value optimized out>, td=<value optimized out>) at /usr/src/sys/fs/devfs/devfs_vnops.c:646 #13 0xffffffff80905fa9 in _fdrop (fp=0xfffff8002cf50050, td=0x0) at file.h:344 #14 0xffffffff8090884e in closef (fp=<value optimized out>, td=<value optimized out>) at /usr/src/sys/kern/kern_descrip.c:2339 #15 0xffffffff80906358 in closefp (fdp=0xfffff801c76f1800, fd=<value 
optimized out>, fp=0xfffff8002cf50050, td=0xfffff8002ca3b000, holdleaders=<value optimized out>) at /usr/src/sys/kern/kern_descrip.c:1195 #16 0xffffffff80d56e9f in amd64_syscall (td=0xfffff8002ca3b000, traced=0) at subr_syscall.c:141 #17 0xffffffff80d3c0fb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 #18 0x0000000801a05f3a in ?? () Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal (kgdb) fr 5 #5 0xffffffff809f0f40 in dounmount (mp=0xfffff8003414a660, flags=524288, td=0xfffff8002ca3b000) at /usr/src/sys/kern/vfs_mount.c:1223 1223 vholdl(coveredvp); (kgdb) set print pretty (kgdb) p *mp $1 = { mnt_mtx = { lock_object = { lo_name = 0xffffffff80fe41c1 "struct mount mtx", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0 }, mtx_lock = 4 }, mnt_gen = 1, mnt_list = { tqe_next = 0xfffff8003414a330, tqe_prev = 0xfffff8003414a9b8 }, mnt_op = 0xffffffff81b047c8, mnt_vfc = 0xffffffff81b04780, mnt_vnodecovered = 0xfffff8017144e588, mnt_syncer = 0x0, mnt_ref = 1, mnt_nvnodelist = { tqh_first = 0x0, tqh_last = 0xfffff8003414a6c0 }, mnt_nvnodelistsize = 0, mnt_activevnodelist = { tqh_first = 0x0, tqh_last = 0xfffff8003414a6d8 }, mnt_activevnodelistsize = 0, mnt_writeopcount = 0, mnt_kern_flag = 1073742016, mnt_flag = 276828185, mnt_opt = 0xfffff800842bc9a0, mnt_optnew = 0x0, mnt_maxsymlinklen = 0, mnt_stat = { f_version = 537068824, f_type = 222, f_flags = 0, f_bsize = 512, f_iosize = 131072, f_blocks = 17666176, f_bfree = 7687944, f_bavail = 7687944, f_files = 7962179, f_ffree = 7687944, f_syncwrites = 0, f_asyncwrites = 0, f_syncreads = 0, f_asyncreads = 0, f_spare = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, f_namemax = 255, f_owner = 0, f_fsid = { val = {1516185005, -21157410} }, f_charspare = '\0' <repeats 79 times>, f_fstypename = "zfs", '\0' <repeats 12 times>, f_mntfromname = "zroot/export/otp_support@weekly-2017-08-26_04.25.43--1m", '\0' <repeats 32 times>, f_mntonname = 
"/export/otp_support/.zfs/snapshot/weekly-2017-08-26_04.25.43--1m", '\0' <repeats 23 times> }, mnt_cred = 0xfffff80181c3a500, mnt_data = 0xfffff800aa443000, mnt_time = 0, mnt_iosize_max = 65536, mnt_export = 0x0, mnt_label = 0x0, mnt_hashseed = 72392620, mnt_lockref = 0, mnt_secondary_writes = 0, mnt_secondary_accwrites = 0, mnt_susp_owner = 0x0, mnt_gjprovider = 0x0, mnt_explock = { lock_object = { lo_name = 0xffffffff80fc9c44 "explock", lo_flags = 108199936, lo_data = 0, lo_witness = 0x0 }, lk_lock = 1, lk_exslpfail = 0, lk_timo = 0, lk_pri = 96 }, mnt_upper_link = { tqe_next = 0x0, tqe_prev = 0x0 }, mnt_uppers = { tqh_first = 0x0, tqh_last = 0xfffff8003414a980 } } (In reply to Raimo Niskanen from comment #13) Maybe there will be another kind of panic from the debug code... Meanwhile, it seems that my original theory about this panic was wrong. Could you please print *mp->mnt_vnodecovered as well? (kgdb) p *mp->mnt_vnodecovered $2 = { v_tag = 0xffffffff80ea8ace "none", v_op = 0xffffffff8148eb18, v_data = 0x0, v_mount = 0x0, v_nmntvnodes = { tqe_next = 0xfffff801d028c588, tqe_prev = 0xfffff80250a29d08 }, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0 }, v_hashlist = { le_next = 0x0, le_prev = 0xfffffe0000b80be0 }, v_cache_src = { lh_first = 0x0 }, v_cache_dst = { tqh_first = 0x0, tqh_last = 0xfffff8017144e5d8 }, v_cache_dd = 0x0, v_lock = { lock_object = { lo_name = 0xffffffff81af1c7b "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0 }, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96 }, v_interlock = { lock_object = { lo_name = 0xffffffff80fe4da3 "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0 }, mtx_lock = 18446735278365454336 }, v_vnlock = 0xfffff8017144e5f0, v_actfreelist = { tqe_next = 0xfffff801d028c588, tqe_prev = 0xffffffff81623330 }, v_bufobj = { bo_lock = { lock_object = { lo_name = 0xffffffff80fe4db3 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0 }, rw_lock = 1 }, 
bo_ops = 0xffffffff814b73e0, bo_object = 0x0, bo_synclist = { le_next = 0x0, le_prev = 0x0 }, bo_private = 0xfffff8017144e588, __bo_vnode = 0xfffff8017144e588, bo_clean = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff8017144e6a8 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_dirty = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff8017144e6c8 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072 }, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = { rl_waiters = { tqh_first = 0x0, tqh_last = 0xfffff8017144e710 }, rl_currdep = 0x0 }, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 1, v_usecount = 0, v_iflag = 0, v_vflag = 0, v_writecount = 0, v_hash = 119749, v_type = VBAD }

(In reply to Raimo Niskanen from comment #15)
Okay, I believe that the snapshot problem is already fixed in the newer ZFS code (e.g. in stable/10, 11.1 and the FreeBSD 10.4 betas). Let's hope that the other (original) problem can be caught as well.

Does that mean that we now wait for a different panic? Or should I upgrade the server first?

(In reply to Raimo Niskanen from comment #17)
If you can afford to upgrade the system, that would be best. Please re-apply the debug patch in that case. Otherwise, we would have to wait for a different panic.

Upgrade to what version? (the shorter the better right now)

(In reply to Raimo Niskanen from comment #19)
Whichever is more comfortable for you. 11.1 or 10.4 beta aka stable/10.

I do not know how to upgrade to stable/10 or 10.4 beta with freebsd-update... And I have locally compiled applications that I do not know whether I can easily recompile for 11.1. I just got another panic in zfs_unmount_snap. This machine cannot complete my backup script now. It seems this hides the bug we are hunting, or the bug we are hunting is secondary to this one... Do you know when 10.4 is due? I wonder if it is an option to wait for 10.4 to be released? 
If not, suppose I build and install 10.4-beta from source - is it easy to upgrade to 10.4-release or 11.1-release from that using freebsd-update?

(In reply to Raimo Niskanen from comment #22)
I think that the 10.4 beta builds have already started (or are about to start), so I expect that they will be published soon. Another possibility is to temporarily disable the backups while the hunt for the other problem is going on...

I will disable backups, but fear that might stop provoking the bug. If so I will wait for 10.4.

(In reply to Raimo Niskanen from comment #24)
I am now starting to think that the snapshot unmount issue was _the_ original problem as well. It's just that the problematic access was not caught until later. But let's see.

Uptime now 5 days. New record! I guess the bug will not be provoked without my backup snapshots. I'll wait for 10.4, then, before being able to take backups again; unfortunately the release plan says the announcement is due 3rd of October... Strangely enough I have a different server, also with 10.3, on which this has never happened. They are very alike, especially in the way both send recursive ZFS streams of a zroot mirror with snapshots. Can you say what may provoke the bug on this particular machine?

Any news on this issue? Have you been able to upgrade to 10.4? Thanks!

I take it that 10.4 is out, then :-) I polled a few times but have not noticed the release. Thank you for the notification! I'll upgrade shortly.

Upgraded to 10.4. Uptime before that 54 days without a glitch. Monthly ZFS stream backup script for October now running (26 days late :-). Waiting with antici..............pation!

The monthly backup script completed without kernel panic. I have enabled the crontab backups. All back to normal operation. If I get a week without kernel panic then the problem is definitely solved!

Good news so far. Thanks! Let's see if it holds.

We can declare this bug as fixed now. No sign of trouble. Thank you!