| Summary: | Kernel panic in getnewvnode (possibly ZFS related) | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Raimo Niskanen <raimo+freebsd> |
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed Overcome By Events | | |
| Severity: | Affects Some People | CC: | avg |
| Priority: | --- | | |
| Version: | 10.3-RELEASE | | |
| Hardware: | amd64 | | |
| OS: | Any | | |
Description
Raimo Niskanen, 2017-06-12 09:44:50 UTC
If you still have the core file, could you please print *vp in frame 8? It is better to set print pretty before that. Thanks!

Nope. I do not have that core file; it wrapped around long ago. But I have a fresher one (I get a new one about every three days):

sasquatch.otp.ericsson.se dumped core - see /var/crash/vmcore.4
Sat Aug 12 04:18:54 CEST 2017
FreeBSD sasquatch.otp.ericsson.se 10.3-RELEASE-p20 FreeBSD 10.3-RELEASE-p20 #0: Wed Jul 12 03:13:07 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
panic: page fault

GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type "show copying" to see the conditions. There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 92526 (find)
trap number = 12
panic: page fault
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff8098e810 at kdb_backtrace+0x60
#1 0xffffffff809514e6 at vpanic+0x126
#2 0xffffffff809513b3 at panic+0x43
#3 0xffffffff80d5649b at trap_fatal+0x36b
#4 0xffffffff80d5679d at trap_pfault+0x2ed
#5 0xffffffff80d55e1a at trap+0x47a
#6 0xffffffff80d3bde2 at calltrap+0x8
#7 0xffffffff809f59b0 at getnewvnode+0x1c0
#8 0xffffffff81a51dda at zfs_znode_alloc+0x5a
#9 0xffffffff81a52ee2 at zfs_zget+0x2a2
#10 0xffffffff81a6b195 at zfs_dirent_lock+0x4a5
#11 0xffffffff81a6b43d at zfs_dirlook+0x17d
#12 0xffffffff81a817f0 at zfs_lookup+0x2e0
#13 0xffffffff81a81c9d at zfs_freebsd_lookup+0x6d
#14 0xffffffff80e80f01 at VOP_CACHEDLOOKUP_APV+0xa1
#15 0xffffffff809e4906 at vfs_cache_lookup+0xd6
#16 0xffffffff80e80df1 at VOP_LOOKUP_APV+0xa1
#17 0xffffffff809ed021 at lookup+0x5a1
Uptime: 3d1h11m26s
Dumping 1841 out of 12215 
MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91% Reading symbols from /mnt/boot/kernel/zfs.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/zfs.ko.symbols Reading symbols from /mnt/boot/kernel/opensolaris.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/opensolaris.ko.symbols Reading symbols from /mnt/boot/kernel/geom_mirror.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/geom_mirror.ko.symbols Reading symbols from /mnt/boot/kernel/fdescfs.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/fdescfs.ko.symbols Reading symbols from /mnt/boot/kernel/ums.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/ums.ko.symbols Reading symbols from /mnt/boot/kernel/uhid.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/uhid.ko.symbols Reading symbols from /mnt/boot/kernel/pf.ko.symbols...done. Loaded symbols for /mnt/boot/kernel/pf.ko.symbols #0 doadump (textdump=<value optimized out>) at pcpu.h:219 219 pcpu.h: No such file or directory. in pcpu.h (kgdb) bt #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff80951142 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486 #2 0xffffffff80951525 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:889 #3 0xffffffff809513b3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:818 #4 0xffffffff80d5649b in trap_fatal (frame=<value optimized out>, eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:858 #5 0xffffffff80d5679d in trap_pfault (frame=0xfffffe034fc3fec0, usermode=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:681 #6 0xffffffff80d55e1a in trap (frame=0xfffffe034fc3fec0) at /usr/src/sys/amd64/amd64/trap.c:447 #7 0xffffffff80d3bde2 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:236 #8 0xffffffff809f7755 in vholdl (vp=0xfffff801ed8b0ce8) at /usr/src/sys/kern/vfs_subr.c:2468 #9 0xffffffff809f59b0 in getnewvnode (tag=0xffffffff81af1c7b "zfs", mp=0xfffff8002cfae660, 
vops=0xffffffff81b048e0, vpp=0xfffffe034fc40170) at /usr/src/sys/kern/vfs_subr.c:2163 #10 0xffffffff81a51dda in zfs_znode_alloc (zfsvfs=0xfffff8002ce75000, db=0xfffff803145caaf8, blksz=1536, obj_type=DMU_OT_SA, hdl=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:630 #11 0xffffffff81a52ee2 in zfs_zget (zfsvfs=0xfffff8002ce75000, obj_num=143239, zpp=0xfffffe034fc40328) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c:1221 #12 0xffffffff81a6b195 in zfs_dirent_lock (dlpp=0xfffffe034fc40330, dzp=<value optimized out>, name=<value optimized out>, zpp=0xfffffe034fc40328, flag=<value optimized out>, direntflags=<value optimized out>, realpnp=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:318 #13 0xffffffff81a6b43d in zfs_dirlook (dzp=0xfffff8003d7525c0, name=0xfffffe034fc40410 "msg-12736-1.txt", vpp=0xfffffe034fc40858, flags=<value optimized out>, deflg=0x0, rpnp=0x0) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c:426 #14 0xffffffff81a817f0 in zfs_lookup () at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1545 #15 0xffffffff81a81c9d in zfs_freebsd_lookup (ap=0xfffffe034fc40578) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6195 #16 0xffffffff80e80f01 in VOP_CACHEDLOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:197 #17 0xffffffff809e4906 in vfs_cache_lookup (ap=<value optimized out>) at vnode_if.h:80 #18 0xffffffff80e80df1 in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:129 #19 0xffffffff809ed021 in lookup (ndp=0xfffffe034fc407f8) at vnode_if.h:54 #20 0xffffffff809ec784 in namei (ndp=0xfffffe034fc407f8) at /usr/src/sys/kern/vfs_lookup.c:302 #21 0xffffffff80a00ece in kern_statat_vnhook (td=0xfffff801ed35c000, flag=<value optimized out>, 
fd=-100, path=0x801022308 <Address 0x801022308 out of bounds>, pathseg=UIO_USERSPACE, sbp=0xfffffe034fc40910, hook=0xfffff801ed8b0da8) at /usr/src/sys/kern/vfs_syscalls.c:2301 #22 0xffffffff80a00ddc in sys_fstatat (td=0xfffff801ed8b0ce8, uap=0xfffffe034fc40a40) at /usr/src/sys/kern/vfs_syscalls.c:2281 #23 0xffffffff80d56e6f in amd64_syscall (td=0xfffff801ed35c000, traced=0) at subr_syscall.c:141 #24 0xffffffff80d3c0cb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 #25 0x00000008008b6cba in ?? () Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal (kgdb) fr 8 #8 0xffffffff809f7755 in vholdl (vp=0xfffff801ed8b0ce8) at /usr/src/sys/kern/vfs_subr.c:2468 2468 TAILQ_INSERT_HEAD(&mp->mnt_activevnodelist, vp, v_actfreelist); (kgdb) set print pretty (kgdb) p *vp $1 = { v_tag = 0xffffffff81af1c7b "zfs", v_op = 0xffffffff81b048e0, v_data = 0x0, v_mount = 0x0, v_nmntvnodes = { tqe_next = 0xfffff801eda4b938, tqe_prev = 0xfffff80118c5c1f8 }, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0 }, v_hashlist = { le_next = 0x0, le_prev = 0xfffffe0000a39748 }, v_cache_src = { lh_first = 0x0 }, v_cache_dst = { tqh_first = 0xfffff800178c6a80, tqh_last = 0xfffff800178c6aa0 }, v_cache_dd = 0x0, v_lock = { lock_object = { lo_name = 0xffffffff81af1c7b "zfs", lo_flags = 117112840, lo_data = 0, lo_witness = 0x0 }, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96 }, v_interlock = { lock_object = { lo_name = 0xffffffff80fe4d4c "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0 }, mtx_lock = 4 }, v_vnlock = 0xfffff801ed8b0d50, v_actfreelist = { tqe_next = 0xfffff80227ca4b10, tqe_prev = 0xfffff80023a5a648 }, v_bufobj = { bo_lock = { lock_object = { lo_name = 0xffffffff80fe4d5c "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0 }, rw_lock = 1 }, bo_ops = 0xffffffff814b7380, bo_object = 0x0, bo_synclist = { le_next = 0x0, le_prev = 0x0 }, bo_private = 
0xfffff801ed8b0ce8, __bo_vnode = 0xfffff801ed8b0ce8, bo_clean = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff801ed8b0e08 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_dirty = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff801ed8b0e28 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072 }, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = { rl_waiters = { tqh_first = 0x0, tqh_last = 0xfffff801ed8b0e70 }, rl_currdep = 0x0 }, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 1, v_usecount = 0, v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 53603, v_type = VNON }

(In reply to Raimo Niskanen from comment #2)
Could you please also print *vp->v_bufobj.bo_ops and do x/10a vp->v_bufobj.bo_ops ? Given that the vnode has been through at least one recycle cycle and is already partially initialized in getnewvnode, there is not much of the original data to chew on. Thank you!

There are some other things to try. For example, following the pointers in v_nmntvnodes on the chance that they still point to valid vnodes on the original mount list. Also, v_cache_dst.tqh_first may happen to point to a struct namecache that has not been trashed yet, so it might be possible to find some useful bits there, like a pointer to a parent directory vnode. 
(kgdb) p *vp->v_bufobj.bo_ops $1 = { bop_name = 0xffffffff80fe28f1 "buf_ops_bio", bop_write = 0xffffffff809dc460 <bufwrite>, bop_strategy = 0xffffffff809dc690 <bufstrategy>, bop_sync = 0xffffffff809dc6d0 <bufsync>, bop_bdflush = 0xffffffff809dc710 <bufbdflush> } (kgdb) x/10a vp->v_bufobj.bo_ops 0xffffffff814b7380 <buf_ops_bio>: 0xffffffff80fe28f1 <sun_noname+849> 0xffffffff809dc460 <bufwrite> 0xffffffff814b7390 <buf_ops_bio+16>: 0xffffffff809dc690 <bufstrategy> 0xffffffff809dc6d0 <bufsync> 0xffffffff814b73a0 <buf_ops_bio+32>: 0xffffffff809dc710 <bufbdflush> 0x1 0xffffffff814b73b0 <sysctl___vfs_vmiodirenable>: 0xffffffff8169cd80 <sysctl__vfs_children> 0xffffffff814afa58 <sysctl___vfs_acl_nfs4_old_semantics> 0xffffffff814b73c0 <sysctl___vfs_vmiodirenable+16>: 0xc00400027ffffc90 0xffffffff814b73a8 <vmiodirenable> (kgdb) p v_nmntvnodes No symbol "v_nmntvnodes" in current context. (kgdb) p v_cache_dst.tqh_first No symbol "v_cache_dst" in current context. Something seems to be missing there... Tabbing suggests that these symbols are recognized, but apparently not known. What trick have I missed? (In reply to Raimo Niskanen from comment #5) No trick, they are fields in struct vnode. So, you have to write vp->xxxxx. Sorry. Now I got it. (kgdb) p *vp[0]->v_cache_dst.tqh_first $7 = { nc_hash = { le_next = 0x0, le_prev = 0xfffffe0000ebe650 }, nc_src = { le_next = 0x0, le_prev = 0xfffff802cb689980 }, nc_dst = { tqe_next = 0x0, tqe_prev = 0xfffff801ed8b0d38 }, nc_dvp = 0xfffff802cb689938, nc_vp = 0xfffff801ed8b0ce8, nc_flag = 0 '\0', nc_nlen = 15 '\017', nc_name = 0xfffff800178c6ac2 "msg-19872-1.txt" } What should I look for in the vnodes in this linked list? (kgdb) p vp[0]->v_nmntvnodes.tqe_next $8 = (struct vnode *) 0xfffff801eda4b938 Thank you! Although, I must say that "vp[0]->" looks strange. It would be interesting to print nc_dvp in the struct cache. E.g. p *$7.nc_dvp if you are still in the same gdb session. Also, p *vp->v_nmntvnodes.tqe_next. Yes, well. 
It is so seldom I use gdb so I am a noob every time. [0] was a misunderstanding with no harm in this case... (kgdb) p *$7.nc_dvp $17 = { v_tag = 0xffffffff81af1c7b "zfs", v_op = 0xffffffff81b048e0, v_data = 0xfffff80256a1a2e0, v_mount = 0xfffff8002cfae660, v_nmntvnodes = { tqe_next = 0xfffff80023a5a588, tqe_prev = 0xfffff8007ae3ad08 }, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0 }, v_hashlist = { le_next = 0x0, le_prev = 0x0 }, v_cache_src = { lh_first = 0xfffff800178c6a80 }, v_cache_dst = { tqh_first = 0xfffff801cce59cb0, tqh_last = 0xfffff801cce59cd0 }, v_cache_dd = 0xfffff801cce59cb0, v_lock = { lock_object = { lo_name = 0xffffffff81af1c7b "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0 }, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96 }, v_interlock = { lock_object = { lo_name = 0xffffffff80fe4d4c "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0 }, mtx_lock = 4 }, v_vnlock = 0xfffff802cb6899a0, v_actfreelist = { tqe_next = 0xfffff80256787760, tqe_prev = 0xfffff802f39e2820 }, v_bufobj = { bo_lock = { lock_object = { lo_name = 0xffffffff80fe4d5c "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0 }, rw_lock = 1 }, bo_ops = 0xffffffff814b7380, bo_object = 0xfffff801cc3e1500, bo_synclist = { le_next = 0x0, le_prev = 0x0 }, bo_private = 0xfffff802cb689938, __bo_vnode = 0xfffff802cb689938, bo_clean = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff802cb689a58 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_dirty = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff802cb689a78 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072 }, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = { rl_waiters = { tqh_first = 0x0, tqh_last = 0xfffff802cb689ac0 }, rl_currdep = 0x0 }, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 1, v_usecount = 0, v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 111896, v_type = VDIR } (kgdb) p 
*vp->v_nmntvnodes.tqe_next $18 = { v_tag = 0xffffffff81af1c7b "zfs", v_op = 0xffffffff81b048e0, v_data = 0xfffff801181e4450, v_mount = 0xfffff8002cfae660, v_nmntvnodes = { tqe_next = 0xfffff8030f9a3938, tqe_prev = 0xfffff802f3fc93d0 }, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0 }, v_hashlist = { le_next = 0x0, le_prev = 0x0 }, v_cache_src = { lh_first = 0xfffff801185a1690 }, v_cache_dst = { tqh_first = 0xfffff8011865a4d0, tqh_last = 0xfffff8011865a4f0 }, v_cache_dd = 0xfffff8011865a4d0, v_lock = { lock_object = { lo_name = 0xffffffff81af1c7b "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0 }, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96 }, v_interlock = { lock_object = { lo_name = 0xffffffff80fe4d4c "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0 }, mtx_lock = 4 }, v_vnlock = 0xfffff801eda4b9a0, v_actfreelist = { tqe_next = 0xfffff8001f187000, tqe_prev = 0xfffff8007e4309f8 }, v_bufobj = { bo_lock = { lock_object = { lo_name = 0xffffffff80fe4d5c "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0 }, rw_lock = 1 }, bo_ops = 0xffffffff814b7380, bo_object = 0xfffff80059ffc000, bo_synclist = { le_next = 0x0, le_prev = 0x0 }, bo_private = 0xfffff801eda4b938, __bo_vnode = 0xfffff801eda4b938, bo_clean = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff801eda4ba58 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_dirty = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff801eda4ba78 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072 }, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = { rl_waiters = { tqh_first = 0x0, tqh_last = 0xfffff801eda4bac0 }, rl_currdep = 0x0 }, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 1, v_usecount = 0, v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 143216, v_type = VDIR } Lo and behold! 
With the following patch:
===================================================================
--- sys/kern/vfs_subr.c	(revision 322570)
+++ sys/kern/vfs_subr.c	(working copy)
@@ -2449,6 +2449,9 @@ vholdl(struct vnode *vp)
 	ASSERT_VI_LOCKED(vp, "vholdl");
 #endif
 	vp->v_holdcnt++;
+	if (vp->v_holdcnt == 1 && vp->v_type == VBAD)
+		panic("vhold: reclaimed vnode");
+
 	if ((vp->v_iflag & VI_FREE) == 0)
 		return;
 	VNASSERT(vp->v_holdcnt == 1, vp, ("vholdl: wrong hold count"));

I got this panic:
===================================================================
sasquatch.otp.ericsson.se dumped core - see /var/crash/vmcore.3
Wed Aug 30 07:18:08 CEST 2017
FreeBSD sasquatch.otp.ericsson.se 10.3-RELEASE-p21 FreeBSD 10.3-RELEASE-p21 #0: Mon Aug 28 11:05:53 CEST 2017 root@sasquatch.otp.ericsson.se:/usr/obj/usr/src/sys/DEBUG amd64
panic: vhold: reclaimed vnode
Unread portion of the kernel message buffer:
panic: vhold: reclaimed vnode
cpuid = 1
KDB: stack backtrace:
#0 0xffffffff8098e810 at kdb_backtrace+0x60
#1 0xffffffff809514e6 at vpanic+0x126
#2 0xffffffff809513b3 at panic+0x43
#3 0xffffffff809f77e5 at vholdl+0x145
#4 0xffffffff809f0f40 at dounmount+0x60
#5 0xffffffff81a6dfe4 at zfs_unmount_snap+0x114
#6 0xffffffff81a10663 at dsl_dataset_user_release_impl+0x103
#7 0xffffffff81a10f2c at dsl_dataset_user_release_onexit+0x5c
#8 0xffffffff81a79fb6 at zfs_onexit_destroy+0x56
#9 0xffffffff81a70072 at zfsdev_close+0x52
#10 0xffffffff80833e19 at devfs_fpdrop+0xa9
#11 0xffffffff80836585 at devfs_close_f+0x45
#12 0xffffffff80905fa9 at _fdrop+0x29
#13 0xffffffff8090884e at closef+0x21e
#14 0xffffffff80906358 at closefp+0x98
#15 0xffffffff80d56e9f at amd64_syscall+0x40f
#16 0xffffffff80d3c0fb at Xfast_syscall+0xfb
Uptime: 1d19h20m50s

(In reply to Raimo Niskanen from comment #10)
Interesting! Are you able to associate this stack trace with any activity or commands? Also, could you please print *mp in the dounmount frame? I already see one potential problem, but want to confirm that it is indeed it.

It seems that this time it was most probably executing a backup script that did a 'zfs send':
zfs send -R -I weekly-2017-08-26_04.25.43--1m zroot@daily-2017-08-31_03.38.14--1w
This does not seem to be one of my typical panics, though. The last time this zfs send failed was June 20th.

#0 doadump (textdump=<value optimized out>) at pcpu.h:219
219 pcpu.h: No such file or directory. 
in pcpu.h (kgdb) bt #0 doadump (textdump=<value optimized out>) at pcpu.h:219 #1 0xffffffff80951142 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:486 #2 0xffffffff80951525 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:889 #3 0xffffffff809513b3 in panic (fmt=0x0) at /usr/src/sys/kern/kern_shutdown.c:818 #4 0xffffffff809f77e5 in vholdl (vp=<value optimized out>) at /usr/src/sys/kern/vfs_subr.c:2453 #5 0xffffffff809f0f40 in dounmount (mp=0xfffff8003414a660, flags=524288, td=0xfffff8002ca3b000) at /usr/src/sys/kern/vfs_mount.c:1223 #6 0xffffffff81a6dfe4 in zfs_unmount_snap (snapname=<value optimized out>) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:3485 #7 0xffffffff81a10663 in dsl_dataset_user_release_impl (holds=0xfffff801d2054740, errlist=0x0, tmpdp=0xfffff8002c5cb000) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c:581 #8 0xffffffff81a10f2c in dsl_dataset_user_release_onexit (arg=0xfffff80034930600) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_userhold.c:629 #9 0xffffffff81a79fb6 in zfs_onexit_destroy (zo=0xfffff8020ee82140) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_onexit.c:93 #10 0xffffffff81a70072 in zfsdev_close (data=0x2) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c:5995 #11 0xffffffff80833e19 in devfs_fpdrop (fp=<value optimized out>) at /usr/src/sys/fs/devfs/devfs_vnops.c:186 #12 0xffffffff80836585 in devfs_close_f (fp=<value optimized out>, td=<value optimized out>) at /usr/src/sys/fs/devfs/devfs_vnops.c:646 #13 0xffffffff80905fa9 in _fdrop (fp=0xfffff8002cf50050, td=0x0) at file.h:344 #14 0xffffffff8090884e in closef (fp=<value optimized out>, td=<value optimized out>) at /usr/src/sys/kern/kern_descrip.c:2339 #15 0xffffffff80906358 in closefp (fdp=0xfffff801c76f1800, fd=<value 
optimized out>, fp=0xfffff8002cf50050, td=0xfffff8002ca3b000, holdleaders=<value optimized out>) at /usr/src/sys/kern/kern_descrip.c:1195 #16 0xffffffff80d56e9f in amd64_syscall (td=0xfffff8002ca3b000, traced=0) at subr_syscall.c:141 #17 0xffffffff80d3c0fb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 #18 0x0000000801a05f3a in ?? () Previous frame inner to this frame (corrupt stack?) Current language: auto; currently minimal (kgdb) fr 5 #5 0xffffffff809f0f40 in dounmount (mp=0xfffff8003414a660, flags=524288, td=0xfffff8002ca3b000) at /usr/src/sys/kern/vfs_mount.c:1223 1223 vholdl(coveredvp); (kgdb) set print pretty (kgdb) p *mp $1 = { mnt_mtx = { lock_object = { lo_name = 0xffffffff80fe41c1 "struct mount mtx", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0 }, mtx_lock = 4 }, mnt_gen = 1, mnt_list = { tqe_next = 0xfffff8003414a330, tqe_prev = 0xfffff8003414a9b8 }, mnt_op = 0xffffffff81b047c8, mnt_vfc = 0xffffffff81b04780, mnt_vnodecovered = 0xfffff8017144e588, mnt_syncer = 0x0, mnt_ref = 1, mnt_nvnodelist = { tqh_first = 0x0, tqh_last = 0xfffff8003414a6c0 }, mnt_nvnodelistsize = 0, mnt_activevnodelist = { tqh_first = 0x0, tqh_last = 0xfffff8003414a6d8 }, mnt_activevnodelistsize = 0, mnt_writeopcount = 0, mnt_kern_flag = 1073742016, mnt_flag = 276828185, mnt_opt = 0xfffff800842bc9a0, mnt_optnew = 0x0, mnt_maxsymlinklen = 0, mnt_stat = { f_version = 537068824, f_type = 222, f_flags = 0, f_bsize = 512, f_iosize = 131072, f_blocks = 17666176, f_bfree = 7687944, f_bavail = 7687944, f_files = 7962179, f_ffree = 7687944, f_syncwrites = 0, f_asyncwrites = 0, f_syncreads = 0, f_asyncreads = 0, f_spare = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, f_namemax = 255, f_owner = 0, f_fsid = { val = {1516185005, -21157410} }, f_charspare = '\0' <repeats 79 times>, f_fstypename = "zfs", '\0' <repeats 12 times>, f_mntfromname = "zroot/export/otp_support@weekly-2017-08-26_04.25.43--1m", '\0' <repeats 32 times>, f_mntonname = 
"/export/otp_support/.zfs/snapshot/weekly-2017-08-26_04.25.43--1m", '\0' <repeats 23 times> }, mnt_cred = 0xfffff80181c3a500, mnt_data = 0xfffff800aa443000, mnt_time = 0, mnt_iosize_max = 65536, mnt_export = 0x0, mnt_label = 0x0, mnt_hashseed = 72392620, mnt_lockref = 0, mnt_secondary_writes = 0, mnt_secondary_accwrites = 0, mnt_susp_owner = 0x0, mnt_gjprovider = 0x0, mnt_explock = { lock_object = { lo_name = 0xffffffff80fc9c44 "explock", lo_flags = 108199936, lo_data = 0, lo_witness = 0x0 }, lk_lock = 1, lk_exslpfail = 0, lk_timo = 0, lk_pri = 96 }, mnt_upper_link = { tqe_next = 0x0, tqe_prev = 0x0 }, mnt_uppers = { tqh_first = 0x0, tqh_last = 0xfffff8003414a980 } } (In reply to Raimo Niskanen from comment #13) Maybe there will be another kind of panic from the debug code... Meanwhile, it seems that my original theory about this panic was wrong. Could you please print *mp->mnt_vnodecovered as well? (kgdb) p *mp->mnt_vnodecovered $2 = { v_tag = 0xffffffff80ea8ace "none", v_op = 0xffffffff8148eb18, v_data = 0x0, v_mount = 0x0, v_nmntvnodes = { tqe_next = 0xfffff801d028c588, tqe_prev = 0xfffff80250a29d08 }, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0 }, v_hashlist = { le_next = 0x0, le_prev = 0xfffffe0000b80be0 }, v_cache_src = { lh_first = 0x0 }, v_cache_dst = { tqh_first = 0x0, tqh_last = 0xfffff8017144e5d8 }, v_cache_dd = 0x0, v_lock = { lock_object = { lo_name = 0xffffffff81af1c7b "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0 }, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96 }, v_interlock = { lock_object = { lo_name = 0xffffffff80fe4da3 "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0 }, mtx_lock = 18446735278365454336 }, v_vnlock = 0xfffff8017144e5f0, v_actfreelist = { tqe_next = 0xfffff801d028c588, tqe_prev = 0xffffffff81623330 }, v_bufobj = { bo_lock = { lock_object = { lo_name = 0xffffffff80fe4db3 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0 }, rw_lock = 1 }, 
bo_ops = 0xffffffff814b73e0, bo_object = 0x0, bo_synclist = { le_next = 0x0, le_prev = 0x0 }, bo_private = 0xfffff8017144e588, __bo_vnode = 0xfffff8017144e588, bo_clean = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff8017144e6a8 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_dirty = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff8017144e6c8 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072 }, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = { rl_waiters = { tqh_first = 0x0, tqh_last = 0xfffff8017144e710 }, rl_currdep = 0x0 }, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 1, v_usecount = 0, v_iflag = 0, v_vflag = 0, v_writecount = 0, v_hash = 119749, v_type = VBAD }

(In reply to Raimo Niskanen from comment #15)
Okay, I believe that the snapshot problem is already fixed in the newer ZFS code (e.g. in stable/10, 11.1 and the FreeBSD 10.4 betas). Let's hope that the other (original) problem can be caught as well.

Does that mean that we now wait for a different panic? Or should I upgrade the server first?

(In reply to Raimo Niskanen from comment #17)
If you can afford to upgrade the system, that would be best. Please re-apply the debug patch in that case. Otherwise, we would have to wait for a different panic.

Upgrade to what version? (the shorter the better right now)

(In reply to Raimo Niskanen from comment #19)
Whichever is more comfortable for you. 11.1 or 10.4 beta aka stable/10.

I do not know how to upgrade to stable/10 or 10.4 beta with freebsd-update... And I have locally compiled applications that I do not know whether I can easily recompile for 11.1. I just got another panic in zfs_unmount_snap. This machine cannot complete my backup script now. It seems this hides the bug we are hunting, or the bug we are hunting is secondary to this one... Do you know when 10.4 is due? I wonder if it is an option to wait for 10.4 to be released? 
If not, suppose I build and install 10.4-beta from source - is it easy to upgrade to 10.4-release or 11.1-release from that using freebsd-update?

(In reply to Raimo Niskanen from comment #22)
I think that the 10.4 beta builds have already started (or are about to start), so I expect that they will be published soon. Another possibility is to temporarily disable the backups while the hunt for the other problem is going on...

I will disable backups, but fear that might stop provoking the bug. If so I will wait for 10.4.

(In reply to Raimo Niskanen from comment #24)
I am now starting to think that the snapshot unmount issue was _the_ original problem as well. It's just that the problematic access was not caught until later. But let's see.

Uptime now 5 days. New record! I guess the bug will not be provoked without my backup snapshots. I'll wait for 10.4, then, before being able to take backups again; unfortunately the release plan says the announcement is due 3rd of October... Strangely enough I have a different server, also with 10.3, on which this has never happened. They are very alike, especially in the way both send recursive ZFS streams of a zroot mirror with snapshots. Can you say what may provoke the bug on this particular machine?

Any news on this issue? Have you been able to upgrade to 10.4? Thanks!

I take it that 10.4 is out, then :-) I polled a few times but have not noticed the release. Thank you for the notification! I'll upgrade shortly.

Upgraded to 10.4. Uptime before that 54 days without a glitch. Monthly ZFS stream backup script for October now running (26 days late :-). Waiting with antici..............pation!

The monthly backup script completed without kernel panic. I have enabled the crontab backups. All back to normal operation. If I get a week without kernel panic then the problem is definitely solved!

Good news so far. Thanks! Let's see if it holds.

We can declare this bug as fixed now. No sign of trouble. Thank you!