Created attachment 163453 [details]
procstat -kk -a output

FreeBSD 10.2-RELEASE-p5 r289218

Deadlock is on an SSD-only mirrored ZFS pool:

        home                 ONLINE       0     0     0
          mirror-0           ONLINE       0     0     0
            gpt/mirror0-a    ONLINE       0     0     0
            gpt/mirror0-b    ONLINE       0     0     0
          mirror-1           ONLINE       0     0     0
            gpt/mirror1-a    ONLINE       0     0     0
            gpt/mirror1-b    ONLINE       0     0     0
          mirror-2           ONLINE       0     0     0
            gpt/mirror2-a    ONLINE       0     0     0
            gpt/mirror2-b    ONLINE       0     0     0
          mirror-3           ONLINE       0     0     0
            gpt/mirror3-a    ONLINE       0     0     0
            gpt/mirror3-b    ONLINE       0     0     0

The deadlocked process is a chksetuid run from periodic. The affected directory is identified. Here is a procstat -kk output:

 1816 102485 find - mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 zfs_freebsd_open+0xf5 VOP_OPEN_APV+0xa1 vn_open_vnode+0x234 vn_open_cred+0x33e kern_openat+0x26f amd64_syscall+0x33a Xfast_syscall+0xfb

Trying to ls the affected directory ends up in the same:

71376 102400 ls - mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 zfs_freebsd_open+0xf5 VOP_OPEN_APV+0xa1 vn_open_vnode+0x234 vn_open_cred+0x33e kern_openat+0x26f amd64_syscall+0x33a Xfast_syscall+0xfb

However, I'm able to list directory entries via zdb(8). The processes are not killable. The machine is running and non-productive, so I'm able to give any debugging information, just give me some pointers.

Attached is the output from procstat -a -kk.
I'm actually having the exact same problem. It started when we upgraded to 10.2 (we were running 10.1 before and it never happened there), it happens about once a month, and after a reboot it's good for about another month. It seems that it's the periodic find that always gets stuck, and after that a lot of other I/O processes get stuck.

Here's the periodic find:

23293 100188 find - mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 ufs_lookup_ino+0xa0 VOP_CACHEDLOOKUP_APV+0xa1 vfs_cache_lookup+0xd6 VOP_LOOKUP_APV+0xa1 lookup+0x5a1 namei+0x4d4 kern_statat_vnhook+0xae sys_fstatat+0x2c amd64_syscall+0x357 Xfast_syscall+0xfb

And here's an rsync call that's stuck:

83577 100644 rsync - mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 ufs_open+0x6d VOP_OPEN_APV+0xa1 vn_open_vnode+0x234 vn_open_cred+0x33e kern_openat+0x26f amd64_syscall+0x357 Xfast_syscall+0xfb

One difference in setup is that I'm running an mfi RAID and not ZFS:

[root@backup-se ~]# mfiutil show volumes
mfi0 Volumes:
  Id     Size    Level   Stripe  State   Cache   Name
 mfid0 (   12T) RAID-5      64K OPTIMAL Disabled
 mfid1 (  931G) RAID-0      64K OPTIMAL Disabled

Filesystem      Size    Used   Avail Capacity  Mounted on
/dev/mfid1p2    898G    4.8G    821G     1%    /
/dev/mfid0p1     12T    8.9T    1.8T    83%    /backups
devfs           1.0K    1.0K      0B   100%    /dev
fdescfs         1.0K    1.0K      0B   100%    /dev/fd
(In reply to daniel from comment #1)
Recompile the kernel with INVARIANTS etc.; see the guide at
https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

When the situation occurs, execute 'kgdb <path>/kernel.debug /dev/mem'. After that, switch to the thread which hung, with the kgdb command 'thread <id>'. The backtrace 'bt' will show you the arguments; find the vnode address which caused the hang (most likely it is shown as vp), and do 'p *vp' and 'p *(vp->v_bufobj.bo_object)'.
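A minimal sketch of what that setup looks like in practice, assuming the debug options the linked handbook chapter recommends; the config name and paths below are illustrative defaults, not anything from this report:

    # appended to a copy of GENERIC, e.g. sys/amd64/conf/DEBUG
    options         INVARIANTS
    options         INVARIANT_SUPPORT
    options         WITNESS
    options         DEBUG_LOCKS
    options         DEBUG_VFS_LOCKS
    options         DIAGNOSTIC

    # after the hang, on the live machine (as root):
    kgdb /usr/obj/usr/src/sys/DEBUG/kernel.debug /dev/mem
    (kgdb) info threads          <- locate the thread sleeping on "vodead"
    (kgdb) thread <id>
    (kgdb) bt
    (kgdb) frame <n>             <- the frame showing vnode_create_vobject()
    (kgdb) p *vp
    (kgdb) p *(vp->v_bufobj.bo_object)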
Thanks for the pointers, I'll get back to you with the output as soon as it happens.
Okay! Got a proc stuck again. However, I'm having trouble understanding "find the vnode address which caused the hang"; how do I know what caused the hang?

Here's all the output I've gathered so far:

procstat -kk -a
=====================================
94614 100627 python2.7 - mi_switch+0x179 sleepq_switch+0x152 sleepq_wait+0x43 _sleep+0x366 vnode_create_vobject+0xe9 ufs_lookup_ino+0xa0 VOP_CACHEDLOOKUP_APV+0x10f vfs_cache_lookup+0xd6 VOP_LOOKUP_APV+0x10f lookup+0x5c2 namei+0x504 kern_statat_vnhook+0xae sys_lstat+0x30 amd64_syscall+0x278 Xfast_syscall+0xfb
=====================================

[root@hub-se /usr/obj/usr/src/sys/GENERIC]# kgdb kernel.debug /dev/mem
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Reading symbols from /boot/kernel/aio.ko.symbols...done.
Loaded symbols for /boot/kernel/aio.ko.symbols
Reading symbols from /boot/kernel/accf_data.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_data.ko.symbols
Reading symbols from /boot/kernel/accf_http.ko.symbols...done.
Loaded symbols for /boot/kernel/accf_http.ko.symbols
Reading symbols from /boot/kernel/fdescfs.ko.symbols...done.
Loaded symbols for /boot/kernel/fdescfs.ko.symbols
Reading symbols from /boot/kernel/nullfs.ko.symbols...done.
Loaded symbols for /boot/kernel/nullfs.ko.symbols
#0  sched_switch (td=0xffffffff81850720, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
1945            cpuid = PCPU_GET(cpuid);

(kgdb) info threads
  505 Thread 100627 (PID=94614: python2.7)  sched_switch (td=0xfffff80379b5e940, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945

(kgdb) thread 505
[Switching to thread 505 (Thread 100627)]#0  sched_switch (td=0xfffff80379b5e940, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
1945            cpuid = PCPU_GET(cpuid);

(kgdb) bt
#0  sched_switch (td=0xfffff80379b5e940, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
#1  0xffffffff8094e149 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:491
#2  0xffffffff8098bb02 in sleepq_switch (wchan=<value optimized out>, pri=<value optimized out>) at /usr/src/sys/kern/subr_sleepqueue.c:538
#3  0xffffffff8098b963 in sleepq_wait (wchan=0xfffff808cdd41e00, pri=84) at /usr/src/sys/kern/subr_sleepqueue.c:617
#4  0xffffffff8094da56 in _sleep (ident=<value optimized out>, lock=<value optimized out>, priority=<value optimized out>, wmesg=<value optimized out>, sbt=<value optimized out>, pr=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:255
#5  0xffffffff80be8759 in vnode_create_vobject (vp=0xfffff8036ee569c0, isize=512, td=0xfffff80379b5e940) at /usr/src/sys/vm/vnode_pager.c:120
#6  0xffffffff80bb1010 in ufs_lookup_ino (vdp=0xfffff8036ee569c0, vpp=0xfffffe0a5bdfd9a8, cnp=0xfffffe0a5bdfd9d0, dd_ino=0x0) at /usr/src/sys/ufs/ufs/ufs_lookup.c:259
#7  0xffffffff80e791ef in VOP_CACHEDLOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:197
#8  0xffffffff809dba26 in vfs_cache_lookup (ap=<value optimized out>) at vnode_if.h:80
#9  0xffffffff80e7902f in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:129
#10 0xffffffff809e40c2 in lookup (ndp=0xfffffe0a5bdfd948) at vnode_if.h:54
#11 0xffffffff809e3764 in namei (ndp=0xfffffe0a5bdfd948) at /usr/src/sys/kern/vfs_lookup.c:302
#12 0xffffffff809f911e in kern_statat_vnhook (td=0xfffff80379b5e940, flag=<value optimized out>, fd=-100, path=0x8067103d4 <Error reading address 0x8067103d4: Bad address>, pathseg=UIO_USERSPACE, sbp=0xfffffe0a5bdfda60, hook=0) at /usr/src/sys/kern/vfs_syscalls.c:2298
#13 0xffffffff809f92b0 in sys_lstat (td=0x0, uap=0xfffffe0a5bdfdb80) at /usr/src/sys/kern/vfs_syscalls.c:2278
#14 0xffffffff80d51b38 in amd64_syscall (td=0xfffff80379b5e940, traced=0) at subr_syscall.c:134
#15 0xffffffff80d359ab in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396
#16 0x000000080157f8aa in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

I'm letting the server stay stuck until I know how to get the info you need out of it :)
In the frame 5, the argument vp points to the vnode. Also do 'p *td' in that frame.
Sorry, but this is the first time I'm doing any kind of kernel debugging. So I'm supposed to run this after the backtrace?

(kgdb) p *0xfffff8036ee569c0
$1 = -2130911031

But then the second command fails:

(kgdb) p *(vp->v_bufobj.bo_object)
No symbol "vp" in current context.
(In reply to Daniel Ylitalo from comment #6)
frame 5
p *vp
p *td
etc.

You must use symbols to access data, otherwise gdb cannot deduce types.
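To illustrate the point about symbols (the address below is just the one from the earlier transcript): a bare address gives gdb no type information, so it dereferences it as an int and prints a meaningless number, while a cast or an in-scope symbol gives it the struct layout:

    (kgdb) p *0xfffff8036ee569c0                     <- treated as an int, result is useless
    (kgdb) p *(struct vnode *)0xfffff8036ee569c0     <- explicit cast works from any frame
    (kgdb) frame 5
    (kgdb) p *vp                                     <- in frame 5, vp is a typed argument
    (kgdb) p *td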
Created attachment 166244 [details] threads frame output
I've attached the output, let me know if you need something else or if I can reboot without the kernel debugging in place.
(In reply to Daniel Ylitalo from comment #9)
This is seemingly what I need to start with, but I probably will not be able to look closer for some time, most likely until tomorrow.
(In reply to Konstantin Belousov from comment #10)
Is there a procstat -kk output somewhere? I want to see the state of all threads when the problem occurs.
Unfortunately I rebooted the server already, but I still have debug compiled in so I'll get one for you the next time it happens.
Created attachment 166295 [details]
Debugging patch

I attached a debugging patch to try to learn something about the code which sets the OBJ_DEAD flag. In fact, I do not expect that this will catch the offender, but we will see.

Somebody who wants to help with the bug should apply the patch, reproduce the issue, and then provide the same information I already asked for; most important are the 'p *vp' and 'p *bo_object' printouts. If I see what I want, I will provide further instructions on how to proceed.
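For reporters applying the attachment, a rough sketch of the usual steps; the patch level and file name below are guesses, so check the paths in the diff itself:

    cd /usr/src
    patch -C < /path/to/debugging.patch     <- dry run first; add -p0 or -p1 to match the diff headers
    patch < /path/to/debugging.patch
    (then rebuild and install the debug kernel as in the handbook link from comment #2, and reboot)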
I've recompiled the kernel with the patch applied, I'll wait for it to happen again.
We've just ran into the same bug on one of our machines. # procstat -kk 52827 PID TID COMM TDNAME KSTACK 52827 100187 ruby21 - mi_switch+0xe1 thread_suspend_switch+0x170 thread_single+0x4e5 exit1+0xbe sigexit+0x925 postsig+0x286 ast+0x427 doreti_ast+0x1f 52827 100389 ruby21 - mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 zfs_freebsd_open+0xf5 VOP_OPEN_APV+0xa1 vn_open_vnode+0x234 vn_open_cred+0x33e kern_openat+0x26f amd64_syscall+0x357 Xfast_syscall+0xfb (kgdb) print *object $10 = { lock = { lock_object = { lo_name = 0xffffffff80ff8196 "vm object", lo_flags = 90374144, lo_data = 0, lo_witness = 0x0 }, rw_lock = 1 }, object_list = { tqe_next = 0xfffff80135aaa600, tqe_prev = 0xfffff80135aaa420 }, shadow_head = { lh_first = 0x0 }, shadow_list = { le_next = 0x0, le_prev = 0xfffff8019f85f030 }, memq = { tqh_first = 0x0, tqh_last = 0xfffff80135aaa548 }, rtree = { rt_root = 0, rt_flags = 0 '\0' }, size = 0, generation = 1, ref_count = 0, shadow_count = 0, memattr = 6 '\006', type = 2 '\002', flags = 24584, pg_color = 1569, paging_in_progress = 0, resident_page_count = 0, backing_object = 0x0, backing_object_offset = 0, pager_object_list = { tqe_next = 0x0, tqe_prev = 0x0 }, rvq = { lh_first = 0x0 }, cache = { rt_root = 0, rt_flags = 0 '\0' }, handle = 0xfffff8010bc08760, un_pager = { vnp = { vnp_size = 0, writemappings = 0 }, devp = { devp_pglist = { tqh_first = 0x0, tqh_last = 0x0 }, ops = 0x0, dev = 0x0 }, sgp = { sgp_pglist = { tqh_first = 0x0, tqh_last = 0x0 } }, swp = { swp_tmpfs = 0x0, swp_bcount = 0 } }, cred = 0x0, charge = 0 } (kgdb) p *vp $11 = { v_tag = 0xffffffff81acefb6 "zfs", v_op = 0xffffffff81ae12f0, v_data = 0xfffff80031109450, v_mount = 0xfffff8000801a330, v_nmntvnodes = { tqe_next = 0xfffff801c912a1d8, tqe_prev = 0xfffff8022bfa0780 }, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0 }, v_hashlist = { le_next = 0x0, le_prev = 0x0 }, v_cache_src = { lh_first = 0x0 }, v_cache_dst = { tqh_first = 0xfffff801a260f4d0, tqh_last = 0xfffff801a260f4f0 }, v_cache_dd = 0x0, v_lock = { lock_object = { lo_name = 0xffffffff81acefb6 "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0 }, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96 }, v_interlock = { lock_object = { lo_name = 0xffffffff80fd2f2d "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0 }, mtx_lock = 4 }, v_vnlock = 0xfffff8010bc087c8, v_actfreelist = { tqe_next = 0xfffff80079317000, tqe_prev = 0xfffff80092a9e820 }, v_bufobj = { bo_lock = { lock_object = { lo_name = 0xffffffff80fd2f3d "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0 }, rw_lock = 1 }, bo_ops = 0xffffffff8149bff0, bo_object = 0xfffff80135aaa500, bo_synclist = { le_next = 0x0, le_prev = 0x0 }, bo_private = 0xfffff8010bc08760, __bo_vnode = 0xfffff8010bc08760, bo_clean = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff8010bc08880 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_dirty = { bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff8010bc088a0 }, bv_root = { pt_root = 0 }, bv_cnt = 0 }, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072 }, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = { rl_waiters = { tqh_first = 0x0, tqh_last = 0xfffff8010bc088e8 }, rl_currdep = 0x0 }, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 2, v_usecount = 2, v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 17547399, v_type = VREG } (kgdb) print vp->v_cache_dst->tqh_first->nc_name $14 = 0xfffff801a260f512 "puppet20160226-52827-x6ea8y" The file where things go 
wrong for us is: /tmp/puppet20160226-52827-x6ea8y Performing a cat on this file also hangs on vodead. Any relevant waiting channels: # ps -o lwp -laxwwwSH | awk '{ if ($10 !~ "^(-|ttyin|lockf|wait|select|kqread|tx\->)") print; }' LWP UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND 100000 0 0 0 0 -16 0 0 285088 swapin DLs - 0:41.09 [kernel/swapper] 100048 0 0 0 0 -92 0 0 285088 vtbslp DLs - 0:00.00 [kernel/virtio_ballo] 100018 0 2 0 0 -16 0 0 16 crypto_w DL - 0:00.00 [crypto] 100019 0 3 0 0 -16 0 0 16 crypto_r DL - 0:00.00 [crypto returns] 100057 0 5 0 0 20 0 0 6176 arc_recl DL - 13:07.13 [zfskern/arc_reclaim] 100058 0 5 0 0 -8 0 0 6176 l2arc_fe DL - 0:40.44 [zfskern/l2arc_feed_] 102486 0 5 0 0 -8 0 0 6176 zio->io_ DL - 8:24.66 [zfskern/txg_thread_] 100062 0 7 0 0 -16 0 0 32 psleep DL - 17:00.73 [pagedaemon/pagedaem] 100068 0 7 0 0 -16 0 0 32 umarcl DL - 0:00.00 [pagedaemon/uma] 100063 0 8 0 0 -16 0 0 16 psleep DL - 0:00.00 [vmdaemon] 100064 0 9 0 0 155 0 0 16 pgzero DL - 0:00.17 [pagezero] 100001 0 10 0 0 -16 0 0 16 audit_wo DL - 14:36.35 [audit] 100065 0 16 0 0 -16 0 0 16 psleep DL - 0:51.50 [bufdaemon] 100066 0 17 0 0 -16 0 0 16 vlruwt DL - 1:27.70 [vnlru] 100067 0 18 0 0 16 0 0 16 syncer DL - 63:11.82 [syncer] 100406 65534 921 1 0 52 0 71104 33604 uwait Is - 0:00.00 /usr/local/bin/memcached -l 127.0.0.1 -m 1024 -p 11211 -C -d -u nobody -P /var/run/memcached/memcached.pid 100387 0 958 1 0 23 0 16612 2224 nanslp Is - 97:24.14 /usr/sbin/cron -s 100394 0 1194 0 0 -16 0 0 16 pftm DL - 30:48.46 [pf purge] 100388 0 1201 1 0 52 0 14700 2256 sbwait Is - 0:00.00 pflogd: [priv] (pflogd) 100375 64 1202 1201 0 20 0 14700 2312 bpf S - 5:36.59 pflogd: [running] -s 116 -i pflog0 -f /dev/null (pflogd) 100389 0 52827 1 0 52 0 213452 112968 vodead T - 0:08.69 ruby21: puppet agent: applying configuration (ruby21) 100400 0 55604 0 0 -16 0 0 16 ftcl DL - 0:00.00 [ftcleanup] 101076 0 58064 1 0 20 0 16588 2168 auditd Is - 0:07.02 /usr/sbin/auditd 101349 0 37651 37601 0 20 0 12356 2132 vodead D+ 0- 0:00.00 cat puppet20160226-52827-x6ea8y This is the first time we're seeing this.
I have to reboot the machine now, so I hope this gives enough extra information (though I doubt it). This was all the relevant information I could come up with.
We are seeing this issue at Limelight across a good number of boxes on stable/10 @ r285800. We have a newer 10 version in testing @ r296969, but have not used it in anger enough to know empirically whether it has been fixed. It should be getting deployed further in the next month or so. Unfortunately we don't yet have a way to reproduce, so we can't simply introduce INVARIANTS into a kernel to help capture some info. I'll report back if we find one, but in the meantime I did want to +1 this.
Hello, another +1 for us.
And now the debugging info. Unfortunately, we are new to this bug, so we have a kernel without INVARIANTS and without debugging.patch yet. We have a locked bacula-fd, which is a backup client with functionality similar to find & stat, and it occurred during an incremental backup operation.

procstat -kk (I can send the full procstat -kk -a personally; it is 429876 bytes):

1331 101013 bacula-fd - mi_switch+0xbe sleepq_catch_signals+0xab sleepq_wait_sig+0xc _cv_wait_sig+0x17a seltdwait+0xae kern_select+0x8fa sys_select+0x54 amd64_syscall+0x321 Xfast_syscall+0xfb
1331 101035 bacula-fd - mi_switch+0xbe sleepq_catch_signals+0xab sleepq_timedwait_sig+0xf _sleep+0x238 umtxq_sleep+0x125 do_wait+0x387 __umtx_op_wait_uint_private+0x83 amd64_syscall+0x321 Xfast_syscall+0xfb
1331 102066 bacula-fd - mi_switch+0xbe sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 ufs_lookup_ino+0xa0 VOP_CACHEDLOOKUP_APV+0x7a vfs_cache_lookup+0xd6 VOP_LOOKUP_APV+0x7a lookup+0x571 namei+0x344 kern_statat_vnhook+0xae sys_lstat+0x30 amd64_syscall+0x321 Xfast_syscall+0xfb
1331 103370 bacula-fd - mi_switch+0xbe sleepq_catch_signals+0xab sleepq_timedwait_sig+0xf _cv_timedwait_sig_sbt+0x19e seltdwait+0xa4 kern_select+0x8fa sys_select+0x54 amd64_syscall+0x321 Xfast_syscall+0xfb

where the important thread is TID 102066, with vnode_create_vobject.

(kgdb) info threads
...
  701 Thread 102066 (PID=1331: bacula-fd)  sched_switch (td=0xfffff803ec526960, newtd=0x0, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
...
(kgdb) thread 701
[Switching to thread 701 (Thread 102066)]#0  sched_switch (td=0xfffff803ec526960, newtd=0x0, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
1945            cpuid = PCPU_GET(cpuid);
(kgdb) bt
#0  sched_switch (td=0xfffff803ec526960, newtd=0x0, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945
#1  0xffffffff80580d4e in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:491
#2  0xffffffff805c155a in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:618
#3  0xffffffff80580837 in _sleep (ident=<value optimized out>, lock=<value optimized out>, priority=<value optimized out>, wmesg=<value optimized out>, sbt=<value optimized out>, pr=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:255
#4  0xffffffff80766e60 in vnode_create_vobject (vp=0xfffff816fe4f3588, isize=512, td=0xfffff803ec526960) at /usr/src/sys/vm/vnode_pager.c:120
#5  0xffffffff8072fe80 in ufs_lookup_ino (vdp=0xfffff816fe4f3588, vpp=0xfffffe17f2e63858, cnp=0xfffffe17f2e63880, dd_ino=0x0) at /usr/src/sys/ufs/ufs/ufs_lookup.c:259
#6  0xffffffff807f897a in VOP_CACHEDLOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:197
#7  0xffffffff806092e6 in vfs_cache_lookup (ap=<value optimized out>) at vnode_if.h:80
#8  0xffffffff807f883a in VOP_LOOKUP_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:129
#9  0xffffffff806111f1 in lookup (ndp=0xfffffe17f2e637f8) at vnode_if.h:54
#10 0xffffffff80610a04 in namei (ndp=0xfffffe17f2e637f8) at /usr/src/sys/kern/vfs_lookup.c:302
#11 0xffffffff8062631e in kern_statat_vnhook (td=0xfffff803ec526960, flag=<value optimized out>, fd=-100, path=0x43c18128 <Error reading address 0x43c18128: Bad address>, pathseg=UIO_USERSPACE, sbp=0xfffffe17f2e63910, hook=0) at /usr/src/sys/kern/vfs_syscalls.c:2300
#12 0xffffffff80626420 in sys_lstat (td=0x0, uap=0xfffffe17f2e63a40) at /usr/src/sys/kern/vfs_syscalls.c:2280
#13 0xffffffff8078acd1 in amd64_syscall (td=0xfffff803ec526960, traced=0) at
subr_syscall.c:141 #14 0xffffffff8076fffb in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 #15 0x00000000425a731a in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) frame 4 #4 0xffffffff80766e60 in vnode_create_vobject (vp=0xfffff816fe4f3588, isize=512, td=0xfffff803ec526960) at /usr/src/sys/vm/vnode_pager.c:120 120 VM_OBJECT_SLEEP(object, object, PDROP | PVM, "vodead", 0); (kgdb) p *vp $1 = {v_tag = 0xffffffff8085c651 "ufs", v_op = 0xffffffff80b0f950, v_data = 0xfffff81488311160, v_mount = 0xfffff8019a27d000, v_nmntvnodes = {tqe_next = 0xfffff816894dc3b0, tqe_prev = 0xfffff80b0d4991f8}, v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0xfffff80824c363b0, le_prev = 0xfffff81742127b48}, v_cache_src = {lh_first = 0x0}, v_cache_dst = {tqh_first = 0xfffff80e5f6a5af0, tqh_last = 0xfffff80e5f6a5b10}, v_cache_dd = 0xfffff80e5f6a5af0, v_lock = {lock_object = {lo_name = 0xffffffff8085c651 "ufs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 0xffffffff80870687 "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xfffff816fe4f35f0, v_actfreelist = { tqe_next = 0xfffff80b0d4991d8, tqe_prev = 0xfffff816894dc470}, v_bufobj = {bo_lock = {lock_object = {lo_name = 0xffffffff80870697 "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 0xffffffff80afc070, bo_object = 0xfffff802f8423d00, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xfffff816fe4f3588, __bo_vnode = 0xfffff816fe4f3588, bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff816fe4f36a8}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff816fe4f36c8}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 32768}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff816fe4f3710}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 8, v_usecount = 7, v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 76467851, v_type = VDIR} (kgdb) p *(vp->v_bufobj.bo_object) $2 = {lock = {lock_object = {lo_name = 0xffffffff8088aeb6 "vm object", lo_flags = 90374144, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, object_list = {tqe_next = 0xfffff802f8423e00, tqe_prev = 0xfffff802f8423c20}, shadow_head = { lh_first = 0x0}, shadow_list = {le_next = 0x0, le_prev = 0xfffff801744afa30}, memq = {tqh_first = 0xfffff8183ea423d0, tqh_last = 0xfffff8183ea423e0}, rtree = {rt_root = 18446735381746688977, rt_flags = 0 '\0'}, size = 1, generation = 1, ref_count = 0, shadow_count = 0, memattr = 6 '\006', type = 2 '\002', flags = 16392, pg_color = 1608, paging_in_progress = 0, resident_page_count = 1, backing_object = 0x0, backing_object_offset = 0, pager_object_list = {tqe_next = 0x0, tqe_prev = 0x0}, rvq = {lh_first = 0x0}, cache = {rt_root = 0, rt_flags = 0 '\0'}, handle = 0xfffff816fe4f3588, un_pager = {vnp = {vnp_size = 512, writemappings = 0}, devp = {devp_pglist = { tqh_first = 0x200, tqh_last = 0x0}, ops = 0x0, dev = 0x0}, sgp = {sgp_pglist = {tqh_first = 0x200, tqh_last = 0x0}}, swp = {swp_tmpfs = 0x200, swp_bcount = 0}}, cred = 0x0, charge = 0} (kgdb) p *td $3 = {td_lock = 0xffffffff80b9d4d0, td_proc = 0xfffff8025dd694f0, td_plist = {tqe_next = 0xfffff801744ac4b0, tqe_prev = 0xfffff805374dd970}, td_runq = {tqe_next = 0x0, tqe_prev 
= 0xffffffff80b728b0}, td_slpq = { tqe_next = 0xfffff80389d11960, tqe_prev = 0xfffff80223243e80}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xfffffe00003849b8}, td_hash = {le_next = 0x0, le_prev = 0xfffffe00007d7590}, td_cpuset = 0xfffff801100593a8, td_sel = 0xfffff806ce36eb00, td_sleepqueue = 0x0, td_turnstile = 0xfffff80174d07e40, td_rlqe = 0xfffff8017477ba28, td_umtxq = 0xfffff803a5a44f00, td_tid = 102066, td_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = { __bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffff803ec526a18}, sq_proc = 0xfffff8025dd694f0, sq_flags = 1}, td_lend_user_pri = 255 '˙', td_flags = 4, td_inhibitors = 2, td_pflags = 0, td_dupfd = 0, td_sqqueue = 0, td_wchan = 0xfffff802f8423d00, td_wmesg = 0xffffffff8088be0a "vodead", td_lastcpu = 13 '\r', td_oncpu = 255 '˙', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', td_locks = 26721, td_rw_rlocks = 0, td_lk_slocks = 0, td_stopsched = 0, td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0, td_ucred = 0xfffff8017438e800, td_estcpu = 0, td_slptick = 1266348642, td_blktick = 0, td_swvoltick = 1266348642, td_cow = 0, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 7952, ru_ixrss = 21233268, ru_idrss = 1299996, ru_isrss = 13866624, ru_minflt = 1, ru_majflt = 0, ru_nswap = 0, ru_inblock = 2105608, ru_oublock = 0, ru_msgsnd = 1531599, ru_msgrcv = 100, ru_nsignals = 0, ru_nvcsw = 1685809, ru_nivcsw = 9633}, td_rux = {rux_runtime = 2226241352945, rux_uticks = 46227, rux_sticks = 62106, rux_iticks = 0, rux_uu = 365363300, rux_su = 490865795, rux_tu = 856229096}, td_incruntime = 0, td_runtime = 2226241352945, td_pticks = 0, td_sticks = 0, td_iticks = 0, td_uticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_generation = 1695442, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0}, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_name = "bacula-fd\000\000\000\000\000\000\000\000\000\000", td_fpop = 0x0, td_dbgflags = 0, td_dbgksi = {ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0}}}}, ksi_flags = 0, ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, td_map_def_user = 0x0, td_dbg_forked = 0, td_vp_reserv = 0, td_no_sleeping = 0, td_dom_rr_idx = 0, td_sigmask = {__bits = {0, 0, 0, 0}}, td_rqindex = 48 '0', td_base_pri = 152 '\230', td_priority = 152 '\230', td_pri_class = 3 '\003', td_user_pri = 183 'ˇ', td_base_user_pri = 183 'ˇ', td_pcb = 0xfffffe17f2e63b80, td_state = TDS_INHIBITED, td_retval = {0, 1084620448}, td_slpcallout = {c_links = {le = {le_next = 0x0, le_prev = 0xfffffe0000ad9288}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xfffffe0000ad9288}}, c_time = 5430875852315327, c_precision = 26843, c_arg = 0xfffff803ec526960, c_func = 0xffffffff805c1370 <sleepq_timeout>, c_lock = 0x0, c_flags = 2, c_iflags = 272, c_cpu = 11}, td_frame = 0xfffffe17f2e63ac0, td_kstack_obj = 0xfffff8076ec32a00, td_kstack = 18446741977545703424, td_kstack_pages = 4, td_critnest = 1, td_md = {md_spinlock_count = 1, md_saved_flags = 
582, md_spurflt_addr = 1131679744}, td_sched = 0xfffff803ec526de0, td_ar = 0x0, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0x0, td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_intr_frame = 0x0, td_rfppwait_p = 0xfffff80616f0a000, td_ma = 0x0, td_ma_cnt = 0, td_su = 0x0, td_dbg_sc_code = 0, td_dbg_sc_narg = 0, td_emuldata = 0x0} (kgdb) p *object $4 = {lock = {lock_object = {lo_name = 0xffffffff8088aeb6 "vm object", lo_flags = 90374144, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, object_list = {tqe_next = 0xfffff802f8423e00, tqe_prev = 0xfffff802f8423c20}, shadow_head = { lh_first = 0x0}, shadow_list = {le_next = 0x0, le_prev = 0xfffff801744afa30}, memq = {tqh_first = 0xfffff8183ea423d0, tqh_last = 0xfffff8183ea423e0}, rtree = {rt_root = 18446735381746688977, rt_flags = 0 '\0'}, size = 1, generation = 1, ref_count = 0, shadow_count = 0, memattr = 6 '\006', type = 2 '\002', flags = 16392, pg_color = 1608, paging_in_progress = 0, resident_page_count = 1, backing_object = 0x0, backing_object_offset = 0, pager_object_list = {tqe_next = 0x0, tqe_prev = 0x0}, rvq = {lh_first = 0x0}, cache = {rt_root = 0, rt_flags = 0 '\0'}, handle = 0xfffff816fe4f3588, un_pager = {vnp = {vnp_size = 512, writemappings = 0}, devp = {devp_pglist = { tqh_first = 0x200, tqh_last = 0x0}, ops = 0x0, dev = 0x0}, sgp = {sgp_pglist = {tqh_first = 0x200, tqh_last = 0x0}}, swp = {swp_tmpfs = 0x200, swp_bcount = 0}}, cred = 0x0, charge = 0}
(In reply to Rudolf Čejka from comment #19)
From your report, I am almost certain (at the ~95% level) that this issue and the issue reported in PR 204426 are the same. I would follow up on that PR, since the reporter there seems to be able to reproduce quickly, and more involved debugging patches are already in flight there.
Just to clarify: After rebooting the machine once, the deadlock did not occur again. I'm currently unable to reproduce this issue but hope that the workloads of the other reporters could lead to a better diagnostic.
And another +1. Deadlock occurs on a raidz1 pool with 4 1.2TB members. This system is running on r294572. The issue started after approximately 59 days of production use with an average load of about 150 io transactions per second. The deadlock is on a directory, which is located on a +-600GB ZFS dataset and contains 7 files and two subdirs, a few kb total. procstat -kk for a stuck du process: PID TID COMM TDNAME KSTACK 97300 101745 du - mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 zfs_freebsd_open+0xf5 VOP_OPEN_APV+0xa1 vn_open_vnode+0x234 vn_open_cred+0x36a kern_openat+0x26f amd64_syscall+0x40f Xfast_syscall+0xfb And here is an rsync that is stuck: 25689 101762 rsync-backup - mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 zfs_freebsd_open+0xf5 VOP_OPEN_APV+0xa1 vn_open_vnode+0x234 vn_open_cred+0x36a kern_openat+0x26f amd64_syscall+0x40f Xfast_syscall+0xfb This filesystem is exported through nfs. We have a few nfsd threads stuck as well: root@nfs01:/home/roel # procstat -kk 857 PID TID COMM TDNAME KSTACK 857 100485 nfsd nfsd: master mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_timedwait_sig+0x10 _cv_timedwait_sig_sbt+0x18b svc_run_internal+0x46d svc_run+0x182 nfsrvd_nfsd+0x242 nfssvc_nfsd+0x107 sys_nfssvc+0x9c amd64_syscall+0x40f Xfast_syscall+0xfb 857 100505 nfsd nfsd: service mi_switch+0xe1 sleepq_catch_signals+0xab sleepq_wait_sig+0xf _cv_wait_sig+0x16a svc_run_internal+0x86e svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe ... lots of similar threads, except for: ... 857 100541 nfsd nfsd: service mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 zfs_fhtovp+0x45b nfsvno_fhtovp+0x7c nfsd_fhtovp+0xc8 nfsrvd_dorpc+0x12c nfssvc_program+0x54e svc_run_internal+0xd2b svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe 857 100588 nfsd nfsd: service mi_switch+0xe1 sleepq_wait+0x3a _sleep+0x287 vnode_create_vobject+0x100 zfs_fhtovp+0x45b nfsvno_fhtovp+0x7c nfsd_fhtovp+0xc8 nfsrvd_dorpc+0x12c nfssvc_program+0x54e svc_run_internal+0xd2b svc_thread_start+0xb fork_exit+0x9a fork_trampoline+0xe Some kgdb output for one of the stuck du threads: [Switching to thread 555 (Thread 101720)]#0 sched_switch (td=0xfffff8033a17e000, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945 1945 cpuid = PCPU_GET(cpuid); (kgdb) bt #0 sched_switch (td=0xfffff8033a17e000, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945 #1 0xffffffff80959ff1 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:491 #2 0xffffffff8099b31a in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:617 #3 0xffffffff80959a17 in _sleep (ident=<value optimized out>, lock=<value optimized out>, priority=<value optimized out>, wmesg=<value optimized out>, sbt=<value optimized out>, pr=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:255 #4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10, isize=16, td=0xfffff8033a17e000) at /usr/src/sys/vm/vnode_pager.c:120 #5 0xffffffff81a7d5f5 in zfs_freebsd_open (ap=0xfffffe0466890658) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6076 #6 0xffffffff80e7e651 in VOP_OPEN_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:469 #7 0xffffffff80a05694 in vn_open_vnode (vp=0xfffff801d3e11b10, fmode=1179653, cred=0xfffff803682bd400, td=0xfffff8033a17e000, fp=0xfffff80240d403c0) at vnode_if.h:196 #8 
0xffffffff80a0528a in vn_open_cred (ndp=0xfffffe0466890870, flagp=0xfffffe046689094c, cmode=<value optimized out>, vn_open_flags=<value optimized out>, cred=0x0, fp=0xfffff80240d403c0) at /usr/src/sys/kern/vfs_vnops.c:268 #9 0xffffffff809fe44f in kern_openat (td=0xfffff8033a17e000, fd=-100, path=0x801422d48 <Error reading address 0x801422d48: Bad address>, pathseg=UIO_USERSPACE, flags=1179653, mode=<value optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1093 #10 0xffffffff80d55aff in amd64_syscall (td=0xfffff8033a17e000, traced=0) at subr_syscall.c:141 #11 0xffffffff80d3ad7b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000800b7570a in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) frame 4 #4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10, isize=16, td=0xfffff8033a17e000) at /usr/src/sys/vm/vnode_pager.c:120 120 VM_OBJECT_SLEEP(object, object, PDROP | PVM, "vodead", 0); (kgdb) print vp $4 = (struct vnode *) 0xfffff801d3e11b10 (kgdb) print *vp $5 = {v_tag = 0xffffffff81aecb3b "zfs", v_op = 0xffffffff81aff790, v_data = 0xfffff801a2d1ca10, v_mount = 0xfffff8011a46e660, v_nmntvnodes = {tqe_next = 0xfffff801a53281d8, tqe_prev = 0xfffff801a6b8e5a8}, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0xfffff80019708850}, v_cache_dst = {tqh_first = 0xfffff80257063850, tqh_last = 0xfffff80257063870}, v_cache_dd = 0xfffff80257063850, v_lock = {lock_object = {lo_name = 0xffffffff81aecb3b "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 0xffffffff80fe1a6c "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xfffff801d3e11b78, v_actfreelist = {tqe_next = 0xfffff801a6b8e588, tqe_prev = 0xfffff801a5328298}, v_bufobj = {bo_lock = {lock_object = {lo_name = 0xffffffff80fe1a7c "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 0xffffffff814b3900, bo_object = 0xfffff801fd0ea100, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xfffff801d3e11b10, __bo_vnode = 0xfffff801d3e11b10, bo_clean = {bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff801d3e11c30}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff801d3e11c50}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff801d3e11c98}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 127, v_usecount = 126, v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 7955980, v_type = VDIR} (kgdb) print *td $6 = {td_lock = 0xffffffff81619670, td_proc = 0xfffff8006233f4f0, td_plist = {tqe_next = 0x0, tqe_prev = 0xfffff8006233f500}, td_runq = {tqe_next = 0x0, tqe_prev = 0xffffffff815e3998}, td_slpq = { tqe_next = 0xfffff803f7d36960, tqe_prev = 0xfffff802836ed030}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xfffffe03e44e7908}, td_hash = {le_next = 0x0, le_prev = 0xfffffe000087aac0}, td_cpuset = 0xfffff800083cd3a8, td_sel = 0xfffff80039695e00, td_sleepqueue = 0x0, td_turnstile = 0xfffff8015b0cacc0, td_rlqe = 0xfffff80117ffa4d8, td_umtxq = 0xfffff802d770a780, td_tid = 101720, td_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last 
= 0xfffff8033a17e0b8}, sq_proc = 0xfffff8006233f4f0, sq_flags = 1}, td_lend_user_pri = 255 'ÿ', td_flags = 133124, td_inhibitors = 2, td_pflags = 0, td_dupfd = -1, td_sqqueue = 0, td_wchan = 0xfffff801fd0ea100, td_wmesg = 0xffffffff81006e8d "vodead", td_lastcpu = 1 '\001', td_oncpu = 255 'ÿ', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', td_locks = 4348, td_rw_rlocks = 0, td_lk_slocks = 0, td_stopsched = 0, td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0, td_ucred = 0xfffff803682bd400, td_estcpu = 0, td_slptick = 444244584, td_blktick = 0, td_swvoltick = 444244584, td_cow = 39, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 1656, ru_ixrss = 192, ru_idrss = 64, ru_isrss = 2048, ru_minflt = 119, ru_majflt = 0, ru_nswap = 0, ru_inblock = 749, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 739, ru_nivcsw = 2}, td_rux = {rux_runtime = 514120625, rux_uticks = 1, rux_sticks = 15, rux_iticks = 0, rux_uu = 12852, rux_su = 192791, rux_tu = 205644}, td_incruntime = 0, td_runtime = 514120625, td_pticks = 0, td_sticks = 0, td_iticks = 0, td_uticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_generation = 741, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_name = "du", '\0' <repeats 17 times>, td_fpop = 0x0, td_dbgflags = 0, td_dbgksi = {ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0}}}}, ksi_flags = 0, ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, td_map_def_user = 0x0, td_dbg_forked = 0, td_vp_reserv = 0, td_no_sleeping = 0, td_dom_rr_idx = 0, td_sigmask = {__bits = {0, 0, 0, 0}}, td_rqindex = 31 '\037', td_base_pri = 120 'x', td_priority = 120 'x', td_pri_class = 3 '\003', td_user_pri = 126 '~', td_base_user_pri = 126 '~', td_pcb = 0xfffffe0466890b80, td_state = TDS_INHIBITED, td_retval = {0, 1}, td_slpcallout = {c_links = {le = { le_next = 0x0, le_prev = 0xfffffe00008d0a10}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xfffffe00008d0a10}}, c_time = 20354765951718809, c_precision = 268435437, c_arg = 0xfffff8033a17e000, c_func = 0xffffffff8099b0d0 <sleepq_timeout>, c_lock = 0x0, c_flags = 0, c_iflags = 272, c_cpu = 2}, td_frame = 0xfffffe0466890ac0, td_kstack_obj = 0xfffff803af9b7200, td_kstack = 18446741893586407424, td_kstack_pages = 4, td_critnest = 1, td_md = {md_spinlock_count = 1, md_saved_flags = 582, md_spurflt_addr = 0}, td_sched = 0xfffff8033a17e480, td_ar = 0x0, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0xfffff8037c56e700, td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_intr_frame = 0x0, td_rfppwait_p = 0xfffff802656e09e0, td_ma = 0x0, td_ma_cnt = 0, td_su = 0x0, td_dbg_sc_code = 0, td_dbg_sc_narg = 0, td_emuldata = 0x0} And another stuck du process: (kgdb) thread 795 [Switching to thread 795 (Thread 101745)]#0 sched_switch (td=0xfffff8006f2e84b0, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945 1945 cpuid = PCPU_GET(cpuid); (kgdb) bt 
#0 sched_switch (td=0xfffff8006f2e84b0, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945 #1 0xffffffff80959ff1 in mi_switch (flags=260, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:491 #2 0xffffffff8099b31a in sleepq_wait (wchan=0x0, pri=0) at /usr/src/sys/kern/subr_sleepqueue.c:617 #3 0xffffffff80959a17 in _sleep (ident=<value optimized out>, lock=<value optimized out>, priority=<value optimized out>, wmesg=<value optimized out>, sbt=<value optimized out>, pr=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/kern_synch.c:255 #4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10, isize=16, td=0xfffff8006f2e84b0) at /usr/src/sys/vm/vnode_pager.c:120 #5 0xffffffff81a7d5f5 in zfs_freebsd_open (ap=0xfffffe046690d658) at /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:6076 #6 0xffffffff80e7e651 in VOP_OPEN_APV (vop=<value optimized out>, a=<value optimized out>) at vnode_if.c:469 #7 0xffffffff80a05694 in vn_open_vnode (vp=0xfffff801d3e11b10, fmode=1179653, cred=0xfffff802f6e2f000, td=0xfffff8006f2e84b0, fp=0xfffff80104fcc5a0) at vnode_if.h:196 #8 0xffffffff80a0528a in vn_open_cred (ndp=0xfffffe046690d870, flagp=0xfffffe046690d94c, cmode=<value optimized out>, vn_open_flags=<value optimized out>, cred=0x0, fp=0xfffff80104fcc5a0) at /usr/src/sys/kern/vfs_vnops.c:268 #9 0xffffffff809fe44f in kern_openat (td=0xfffff8006f2e84b0, fd=-100, path=0x801422d48 <Error reading address 0x801422d48: Bad address>, pathseg=UIO_USERSPACE, flags=1179653, mode=<value optimized out>) at /usr/src/sys/kern/vfs_syscalls.c:1093 #10 0xffffffff80d55aff in amd64_syscall (td=0xfffff8006f2e84b0, traced=0) at subr_syscall.c:141 #11 0xffffffff80d3ad7b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:396 #12 0x0000000800b7570a in ?? () Previous frame inner to this frame (corrupt stack?) 
(kgdb) frame 4 #4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10, isize=16, td=0xfffff8006f2e84b0) at /usr/src/sys/vm/vnode_pager.c:120 120 VM_OBJECT_SLEEP(object, object, PDROP | PVM, "vodead", 0); (kgdb) p *vp $7 = {v_tag = 0xffffffff81aecb3b "zfs", v_op = 0xffffffff81aff790, v_data = 0xfffff801a2d1ca10, v_mount = 0xfffff8011a46e660, v_nmntvnodes = {tqe_next = 0xfffff801a53281d8, tqe_prev = 0xfffff801a6b8e5a8}, v_un = { vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0xfffff80019708850}, v_cache_dst = {tqh_first = 0xfffff80257063850, tqh_last = 0xfffff80257063870}, v_cache_dd = 0xfffff80257063850, v_lock = {lock_object = {lo_name = 0xffffffff81aecb3b "zfs", lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri = 96}, v_interlock = {lock_object = {lo_name = 0xffffffff80fe1a6c "vnode interlock", lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, v_vnlock = 0xfffff801d3e11b78, v_actfreelist = {tqe_next = 0xfffff801a6b8e588, tqe_prev = 0xfffff801a5328298}, v_bufobj = {bo_lock = {lock_object = {lo_name = 0xffffffff80fe1a7c "bufobj interlock", lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, bo_ops = 0xffffffff814b3900, bo_object = 0xfffff801fd0ea100, bo_synclist = {le_next = 0x0, le_prev = 0x0}, bo_private = 0xfffff801d3e11b10, __bo_vnode = 0xfffff801d3e11b10, bo_clean = {bv_hd = { tqh_first = 0x0, tqh_last = 0xfffff801d3e11c30}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff801d3e11c50}, bv_root = {pt_root = 0}, bv_cnt = 0}, bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0, v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff801d3e11c98}, rl_currdep = 0x0}, v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 127, v_usecount = 126, v_iflag = 512, v_vflag = 0, v_writecount = 0, v_hash = 7955980, v_type = VDIR} (kgdb) p *td $8 = {td_lock = 0xffffffff81619670, td_proc = 0xfffff80009eb59e0, td_plist = {tqe_next = 0x0, tqe_prev = 0xfffff80009eb59f0}, td_runq = {tqe_next = 0x0, tqe_prev = 0xffffffff815e87a8}, td_slpq = { tqe_next = 0xfffff803cfb27960, tqe_prev = 0xfffff8006f2e8990}, td_lockq = {tqe_next = 0x0, tqe_prev = 0xfffffe04662f90c8}, td_hash = {le_next = 0x0, le_prev = 0xfffffe000087ab88}, td_cpuset = 0xfffff800083cd3a8, td_sel = 0xfffff80290088500, td_sleepqueue = 0x0, td_turnstile = 0xfffff800b09c4840, td_rlqe = 0xfffff8015a73a910, td_umtxq = 0xfffff8012d91b280, td_tid = 101745, td_sigqueue = {sq_signals = {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffff8006f2e8568}, sq_proc = 0xfffff80009eb59e0, sq_flags = 1}, td_lend_user_pri = 255 'ÿ', td_flags = 133124, td_inhibitors = 2, td_pflags = 0, td_dupfd = -1, td_sqqueue = 0, td_wchan = 0xfffff801fd0ea100, td_wmesg = 0xffffffff81006e8d "vodead", td_lastcpu = 7 '\a', td_oncpu = 255 'ÿ', td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', td_locks = 3768, td_rw_rlocks = 0, td_lk_slocks = 0, td_stopsched = 0, td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0, td_ucred = 0xfffff802f6e2f000, td_estcpu = 0, td_slptick = 729400807, td_blktick = 0, td_swvoltick = 729400807, td_cow = 41, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 1696, ru_ixrss = 312, ru_idrss = 
104, ru_isrss = 3328, ru_minflt = 121, ru_majflt = 0, ru_nswap = 0, ru_inblock = 840, ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 795, ru_nivcsw = 0}, td_rux = { rux_runtime = 547891773, rux_uticks = 1, rux_sticks = 25, rux_iticks = 0, rux_uu = 8428, rux_su = 210723, rux_tu = 219152}, td_incruntime = 0, td_runtime = 547891773, td_pticks = 0, td_sticks = 0, td_iticks = 0, td_uticks = 0, td_intrval = 0, td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_generation = 795, td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 4}, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_name = "du", '\0' <repeats 17 times>, td_fpop = 0x0, td_dbgflags = 0, td_dbgksi = {ksi_link = {tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = { _band = 0}, __spare__ = {__spare1__ = 0, __spare2__ = {0, 0, 0, 0, 0, 0, 0}}}}, ksi_flags = 0, ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}}, td_map_def_user = 0x0, td_dbg_forked = 0, td_vp_reserv = 0, td_no_sleeping = 0, td_dom_rr_idx = 0, td_sigmask = {__bits = {0, 0, 0, 0}}, td_rqindex = 32 ' ', td_base_pri = 120 'x', td_priority = 120 'x', td_pri_class = 3 '\003', td_user_pri = 130 '\202', td_base_user_pri = 130 '\202', td_pcb = 0xfffffe046690db80, td_state = TDS_INHIBITED, td_retval = {0, 1}, td_slpcallout = {c_links = {le = { le_next = 0x0, le_prev = 0xfffffe0000ac29b0}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xfffffe0000ac29b0}}, c_time = 21579501454196634, c_precision = 268435437, c_arg = 0xfffff8006f2e84b0, c_func = 0xffffffff8099b0d0 <sleepq_timeout>, c_lock = 0x0, c_flags = 0, c_iflags = 272, c_cpu = 9}, td_frame = 0xfffffe046690dac0, td_kstack_obj = 0xfffff80198fc0e00, td_kstack = 18446741893586919424, td_kstack_pages = 4, td_critnest = 1, td_md = {md_spinlock_count = 1, md_saved_flags = 582, md_spurflt_addr = 0}, td_sched = 0xfffff8006f2e8930, td_ar = 0x0, td_lprof = {{lh_first = 0x0}, {lh_first = 0x0}}, td_dtrace = 0xfffff802a4f1fc00, td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_intr_frame = 0x0, td_rfppwait_p = 0xfffff803cb6dd4f0, td_ma = 0x0, td_ma_cnt = 0, td_su = 0x0, td_dbg_sc_code = 0, td_dbg_sc_narg = 0, td_emuldata = 0x0} We are replicating this filesystem using incremental ZFS snapshots onto a secondary host (using zrep). I can mount the replicated fs and read the file there without issue. I am currently compiling the latest -STABLE kernel and am scheduling a reboot tonight.
(In reply to roel from comment #22) It would be interesting to see *vp->v_bufobj.bo_object. Also, it's better to add a really long output as an attachment rather than pasting it into a comment.
Here it is, for both du processes listed above: (kgdb) thread 555 (kgdb) frame 4 #4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10, isize=16, td=0xfffff8033a17e000) at /usr/src/sys/vm/vnode_pager.c:120 120 VM_OBJECT_SLEEP(object, object, PDROP | PVM, "vodead", 0); (kgdb) p *vp->v_bufobj.bo_object $1 = {lock = {lock_object = {lo_name = 0xffffffff810062d6 "vm object", lo_flags = 90374144, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, object_list = {tqe_next = 0xfffff801fd0ea200, tqe_prev = 0xfffff801fd0ea020}, shadow_head = {lh_first = 0x0}, shadow_list = {le_next = 0x0, le_prev = 0xfffff8022d69bb38}, memq = {tqh_first = 0x0, tqh_last = 0xfffff801fd0ea148}, rtree = {rt_root = 0, rt_flags = 0 '\0'}, size = 1, generation = 1, ref_count = 0, shadow_count = 0, memattr = 6 '\006', type = 2 '\002', flags = 16392, pg_color = 21, paging_in_progress = 0, resident_page_count = 0, backing_object = 0x0, backing_object_offset = 0, pager_object_list = { tqe_next = 0x0, tqe_prev = 0x0}, rvq = {lh_first = 0x0}, cache = {rt_root = 0, rt_flags = 0 '\0'}, handle = 0xfffff801d3e11b10, un_pager = {vnp = {vnp_size = 16, writemappings = 0}, devp = {devp_pglist = { tqh_first = 0x10, tqh_last = 0x0}, ops = 0x0, dev = 0x0}, sgp = {sgp_pglist = {tqh_first = 0x10, tqh_last = 0x0}}, swp = {swp_tmpfs = 0x10, swp_bcount = 0}}, cred = 0x0, charge = 0} (kgdb) thread 795 [Switching to thread 795 (Thread 101745)]#0 sched_switch (td=0xfffff8006f2e84b0, newtd=<value optimized out>, flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1945 1945 cpuid = PCPU_GET(cpuid); (kgdb) frame 4 #4 0xffffffff80bed620 in vnode_create_vobject (vp=0xfffff801d3e11b10, isize=16, td=0xfffff8006f2e84b0) at /usr/src/sys/vm/vnode_pager.c:120 120 VM_OBJECT_SLEEP(object, object, PDROP | PVM, "vodead", 0); (kgdb) p *vp->v_bufobj.bo_object $2 = {lock = {lock_object = {lo_name = 0xffffffff810062d6 "vm object", lo_flags = 90374144, lo_data = 0, lo_witness = 0x0}, rw_lock = 1}, object_list = {tqe_next = 0xfffff801fd0ea200, tqe_prev = 0xfffff801fd0ea020}, shadow_head = {lh_first = 0x0}, shadow_list = {le_next = 0x0, le_prev = 0xfffff8022d69bb38}, memq = {tqh_first = 0x0, tqh_last = 0xfffff801fd0ea148}, rtree = {rt_root = 0, rt_flags = 0 '\0'}, size = 1, generation = 1, ref_count = 0, shadow_count = 0, memattr = 6 '\006', type = 2 '\002', flags = 16392, pg_color = 21, paging_in_progress = 0, resident_page_count = 0, backing_object = 0x0, backing_object_offset = 0, pager_object_list = { tqe_next = 0x0, tqe_prev = 0x0}, rvq = {lh_first = 0x0}, cache = {rt_root = 0, rt_flags = 0 '\0'}, handle = 0xfffff801d3e11b10, un_pager = {vnp = {vnp_size = 16, writemappings = 0}, devp = {devp_pglist = { tqh_first = 0x10, tqh_last = 0x0}, ops = 0x0, dev = 0x0}, sgp = {sgp_pglist = {tqh_first = 0x10, tqh_last = 0x0}}, swp = {swp_tmpfs = 0x10, swp_bcount = 0}}, cred = 0x0, charge = 0}
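A note on reading the object dumps above: the hung threads are sleeping on "vodead", which vnode_create_vobject() does while the vnode's existing vm object has OBJ_DEAD set. Assuming the stock 10.x definition in sys/vm/vm_object.h (OBJ_DEAD is 0x0008 there; worth re-checking against your own source tree), the reported flags values 16392 (0x4008) and 24584 (0x6008) both include that bit, and ref_count is 0, so the object looks permanently dead, which matches the endless sleep. A quick check from the vnode_create_vobject frame in kgdb, with the 0x0008 value being that assumption:

    (kgdb) p/x vp->v_bufobj.bo_object->flags
    (kgdb) p (vp->v_bufobj.bo_object->flags & 0x0008) != 0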
+1

Same/similar environment as the others:
- zfs mirrored pool with multiple ssd disks
- exported over NFS
- random directories are 'stuck' after a couple of days
- ls works fine, but accessing them results in the vodead state (and the newnfs state on the client)

This is on FreeBSD 10.3-RELEASE. I'll try to get the debug info later.
A commit references this bug:

Author: kib
Date: Tue Jun 21 17:49:33 UTC 2016
New revision: 302063
URL: https://svnweb.freebsd.org/changeset/base/302063

Log:
  The vmtotal sysctl handler marks active vm objects to calculate
  statistics.  Marking is done by setting the OBJ_ACTIVE flag.  The
  flags change is locked, but the problem is that many parts of system
  assume that vm object initialization ensures that no other code could
  change the object, and thus performed lockless.  The end result is
  corrupted flags in vm objects, most visible is spurious OBJ_DEAD flag,
  causing random hangs.

  Avoid the active object marking, instead provide equally inexact but
  immutable is_object_alive() definition for the object mapped state.

  Avoid iterating over the processes mappings altogether by using
  arguably improved definition of the paging thread as one which sleeps
  on the v_free_count.

  PR:           204764
  Diagnosed by: pho
  Tested by:    pho (previous version)
  Reviewed by:  alc
  Sponsored by: The FreeBSD Foundation
  MFC after:    1 week
  Approved by:  re (gjb)

Changes:
  head/sys/vm/vm_meter.c
  head/sys/vm/vm_object.h
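To make the mechanism in that log message concrete, here is a schematic illustration of the general hazard it describes: a lost update on a shared flags word when one updater holds a lock and the other assumes it has exclusive access. This is not the actual FreeBSD code path, and the flag names and values are placeholders.

/*
 * Schematic illustration only; not the FreeBSD code or flag values.
 * Two threads do a read-modify-write on the same flags word.  One
 * takes a lock, the other assumes the object is still private to it
 * and does not.  Depending on the interleaving, either update can be
 * lost, which is how a stale lifecycle bit can end up set (or cleared)
 * forever.
 */
#include <pthread.h>
#include <stdio.h>

#define FLAG_MARK 0x04   /* plays the role of the statistics marker */
#define FLAG_DEAD 0x08   /* plays the role of a lifecycle bit       */

static unsigned flags;   /* stands in for a vm object's flags word  */
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

/* Like the statistics scan: locked, but the lock does not help if the
 * other updater never takes it. */
static void *locked_marker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&mtx);
	flags |= FLAG_MARK;
	pthread_mutex_unlock(&mtx);
	return NULL;
}

/* Like object initialization: "nobody else can see this object yet",
 * so the update is done without the lock. */
static void *lockless_init(void *arg)
{
	unsigned f;

	(void)arg;
	f = flags;          /* unlocked read                          */
	f |= FLAG_DEAD;     /* a legitimate local flag change         */
	flags = f;          /* unlocked write-back may lose the mark  */
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, locked_marker, NULL);
	pthread_create(&b, NULL, lockless_init, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("flags = %#x\n", flags);  /* depends on the interleaving */
	return 0;
}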
Anyone aware if this will be released as a patch for 10.3 through freebsd-update, or if one should apply the diff by hand? :)
Created attachment 173037 [details]
patch to apply r302063 on FreeBSD-10

Due to the lack of r263260 ("Rename global cnt to vm_cnt to avoid shadowing.") the patch for r302063 cannot be applied directly to FreeBSD-10. I created a patch that does not rely on r263260, so users of FreeBSD-10 can test it, too.

With kind regards,
Robert Schulze
(In reply to Robert Schulze from comment #28)
kib MFCd r302063 into stable/10 a few weeks ago as r302243, so custom patching isn't needed if you bring your sources up to date. You are correct, though: cnt.v_free_count was renamed, and that should be the only difference between 10 and 11.
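For anyone on stable/10 wondering what "bring your sources up to date" amounts to in practice, a rough sketch using svnlite from base and the default paths/KERNCONF; adjust to your own setup, and any revision at or after r302243 will do:

    svnlite update /usr/src        (or a fresh checkout of base/stable/10)
    cd /usr/src
    make buildkernel KERNCONF=GENERIC
    make installkernel KERNCONF=GENERIC
    shutdown -r now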
https://lists.freebsd.org/pipermail/freebsd-stable/2016-August/085150.html
Is it possible to have this for 10.3 as an errata patch?
Upgrading to 10-STABLE solved this issue for me.
Yet another data point. Our environment: Dell R730 servers with ~2TB RAID10 disk arrays running FreeBSD 10.3-RELEASE and postgresql92-server, with UFS2 filesystems.

We applied Kostik's patch from https://lists.freebsd.org/pipermail/freebsd-stable/2016-August/085150.html

Before this we were seeing the vodead problem about 2-3 times per week. Since patching we've run for 16 days without problems.

A definite +1 for issuing this patch as an Errata Note.
Since the problem has been fixed, I think this PR can be closed now.