Bug 200992 - process won't die in thread_suspend_switch
Summary: process won't die in thread_suspend_switch
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: threads
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-threads (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2015-06-20 15:13 UTC by Steve Wills
Modified: 2017-04-26 14:33 UTC (History)
12 users

See Also:


Attachments
Return true from callout_stop() if not_on_a_list is true. (52.37 KB, patch)
2015-06-21 06:31 UTC, Konstantin Belousov
no flags
Return true from callout_stop() if not_on_a_list is true. (806 bytes, patch)
2015-06-21 06:34 UTC, Konstantin Belousov
no flags

Description Steve Wills freebsd_committer 2015-06-20 15:13:36 UTC
I'm running 11-CURRENT r283640 and was trying to restart consul running in a jail, but it won't die. ps shows:

  PID     JID USERNAME       THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 4992      23    469           2  20   15 69972K 11200K STOP    4  45:31   0.00% consul

procstat shows:

% sudo procstat -k 4992
  PID    TID COMM             TDNAME           KSTACK                       
 4992 101861 consul           -                mi_switch sleepq_timedwait_sig _sleep umtxq_sleep do_wait __umtx_op_wait_uint_private ia32_syscall Xint0x80_syscall 
 4992 101934 consul           -                mi_switch thread_suspend_switch thread_single exit1 sys_sys_exit ia32_syscall Xint0x80_syscall 

I can leave the process running for a while if debugging is needed. Is this a thread issue? If not, any ideas on what might be causing the issue would be appreciated.
Comment 1 Steve Wills freebsd_committer 2015-06-20 23:30:04 UTC
As another data point: I just remembered that this process was started with idle priority via idprio -t and niced via nice -n 15. Perhaps it's a scheduler issue?
Comment 2 Konstantin Belousov freebsd_committer 2015-06-21 05:10:26 UTC
(In reply to Steve Wills from comment #0)
This might be a callout issue.

Find the address of the process and thread structures:
ps axHS -o pid,lwp,paddr,tdaddr 4992
The output looks like this:
 PID    LWP            PADDR           TDADDR
1485 100311 fffff8015934d4e8 fffff8016d006000
1485 100313 fffff8015934d4e8 fffff8011d7dd4a0

The third column is the struct proc address, the fourth the struct thread address.

Attach to the live system with kgdb:
kgdb .../kernel.debug /dev/mem
Do
p *(struct proc *)0x<proc addr>
p *(struct thread *)0x<thread addr for lwp 101861>
Comment 3 Steve Wills freebsd_committer 2015-06-21 05:51:20 UTC
(In reply to Konstantin Belousov from comment #2)
% ps axHS -o pid,lwp,paddr,tdaddr 4992
PID    LWP            PADDR           TDADDR
4992 101861 fffff801262a5000 fffff804173704b0
4992 101934 fffff801262a5000 fffff801264a2000

(kgdb) p *(struct proc *)0xfffff801262a5000
$1 = {p_list = {le_next = 0xfffff80126273000, le_prev = 0xfffff80035c43538}, p_threads = {tqh_first = 0xfffff804173704b0, tqh_last = 0xfffff801264a2010}, p_slock = {lock_object = {lo_name = 0xffffffff810c5281 "process slock", lo_flags = 537067520, lo_data = 0, 
      lo_witness = 0x0}, mtx_lock = 4}, p_ucred = 0xfffff8002ca16000, p_fd = 0xfffff8012628c890, p_fdtol = 0x0, p_stats = 0xfffff80045451200, p_limit = 0xfffff8002ca94200, p_limco = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {sle_next = 0x0}, tqe = {
        tqe_next = 0x0, tqe_prev = 0x0}}, c_time = 0, c_precision = 0, c_arg = 0x0, c_func = 0, c_lock = 0xfffff801262a5120, c_flags = 0, c_iflags = 0, c_cpu = 0}, p_sigacts = 0xfffff804172b0000, p_flag = 268977281, p_flag2 = 0, p_state = PRS_NORMAL, p_pid = 4992, 
  p_hash = {le_next = 0x0, le_prev = 0xfffffe0000c9ac00}, p_pglist = {le_next = 0x0, le_prev = 0xfffff801262730d0}, p_pptr = 0xfffff80126273000, p_sibling = {le_next = 0x0, le_prev = 0xfffff801262730f8}, p_children = {lh_first = 0x0}, p_reaper = 0xfffff8000943e538, 
  p_reaplist = {lh_first = 0x0}, p_reapsibling = {le_next = 0xfffff80126273000, le_prev = 0xfffff80035c43648}, p_mtx = {lock_object = {lo_name = 0xffffffff810c5274 "process lock", lo_flags = 558039040, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_statmtx = {
    lock_object = {lo_name = 0xffffffff810c528f "pstatl", lo_flags = 537067520, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_itimmtx = {lock_object = {lo_name = 0xffffffff810c5296 "pitiml", lo_flags = 537067520, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, 
  p_profmtx = {lock_object = {lo_name = 0xffffffff810c529d "pprofl", lo_flags = 537067520, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4}, p_ksi = 0xfffff8002c6ee4d0, p_sigqueue = {sq_signals = {__bits = 0xfffff801262a51a8}, sq_kill = {__bits = 0xfffff801262a51b8}, 
    sq_list = {tqh_first = 0xfffff8013a3e4230, tqh_last = 0xfffff80076325070}, sq_proc = 0xfffff801262a5000, sq_flags = 1}, p_oppid = 0, p_vmspace = 0xfffff803ddbd5000, p_swtick = 2146978202, p_realtimer = {it_interval = {tv_sec = 0, tv_usec = 0}, it_value = {
      tv_sec = 0, tv_usec = 0}}, p_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 22640, ru_ixrss = 3584989440, ru_idrss = 80911220, ru_isrss = 39833216, ru_minflt = 43074, ru_majflt = 10724, ru_nswap = 0, 
    ru_inblock = 5065, ru_oublock = 173674, ru_msgsnd = 159, ru_msgrcv = 140, ru_nsignals = 1, ru_nvcsw = 87242807, ru_nivcsw = 347866}, p_rux = {rux_runtime = 7369363977561, rux_uticks = 235739, rux_sticks = 124672, rux_iticks = 0, rux_uu = 1786036539, 
    rux_su = 944556256, rux_tu = 2730592796}, p_crux = {rux_runtime = 0, rux_uticks = 0, rux_sticks = 0, rux_iticks = 0, rux_uu = 0, rux_su = 0, rux_tu = 0}, p_profthreads = 0, p_exitthreads = 0, p_traceflag = 0, p_tracevp = 0x0, p_tracecred = 0x0, 
  p_textvp = 0xfffff804172531d8, p_lock = 0, p_sigiolst = {slh_first = 0x0}, p_sigparent = 20, p_sig = 0, p_code = 0, p_stops = 0, p_stype = 0, p_step = 0 '\0', p_pfsflags = 0 '\0', p_nlminfo = 0x0, p_aioinfo = 0x0, p_singlethread = 0xfffff801264a2000, p_suspcount = 1, 
  p_xthread = 0x0, p_boundary_count = 0, p_pendingcnt = 3, p_itimers = 0x0, p_procdesc = 0x0, p_treeflag = 0, p_magic = 3203398350, p_osrel = 1001000, p_comm = 0xfffff801262a53c4 "consul", p_pgrp = 0xfffff80035a7a380, p_sysent = 0xffffffff815db2d8, 
  p_args = 0xfffff800455f2300, p_cpulimit = 9223372036854775807, p_nice = 0 '\0', p_fibnum = 0, p_reapsubtree = 18, p_xstat = 0, p_klist = {kl_list = {slh_first = 0x0}, kl_lock = 0xffffffff80985350 <knlist_mtx_lock>, kl_unlock = 0xffffffff809853b0 <knlist_mtx_unlock>, 
    kl_assert_locked = 0xffffffff80985420 <knlist_mtx_assert_locked>, kl_assert_unlocked = 0xffffffff80985430 <knlist_mtx_assert_unlocked>, kl_lockarg = 0xfffff801262a5120}, p_numthreads = 2, p_md = {md_ldt = 0x0, md_ldt_sd = {sd_lolimit = 0, sd_lobase = 0, 
      sd_type = 0, sd_dpl = 0, sd_p = 0, sd_hilimit = 0, sd_xx0 = 0, sd_gran = 0, sd_hibase = 0, sd_xx1 = 0, sd_mbz = 0, sd_xx2 = 0}}, p_itcallout = {c_links = {le = {le_next = 0x0, le_prev = 0x0}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0x0}}, 
    c_time = 0, c_precision = 0, c_arg = 0x0, c_func = 0, c_lock = 0xfffff801262a5120, c_flags = 0, c_iflags = 0, c_cpu = 0}, p_acflag = 0, p_peers = 0x0, p_leader = 0xfffff801262a5000, p_emuldata = 0x0, p_label = 0x0, p_sched = 0xfffff801262a5538, p_ktr = {
    stqh_first = 0x0, stqh_last = 0xfffff801262a54c8}, p_mqnotifier = {lh_first = 0x0}, p_dtrace = 0xfffff80035a58280, p_pwait = {cv_description = 0xffffffff810c5d30 "ppwait", cv_waiters = 0}, p_dbgwait = {cv_description = 0xffffffff810c5d37 "dbgwait", cv_waiters = 0}, 
  p_prev_runtime = 0, p_racct = 0x0, p_throttled = 0 '\0', p_orphan = {le_next = 0x0, le_prev = 0x0}, p_orphans = {lh_first = 0x0}}

(kgdb) p *(struct thread *)0xfffff804173704b0
$2 = {td_lock = 0xffffffff816dd680, td_proc = 0xfffff801262a5000, td_plist = {tqe_next = 0xfffff801264a2000, tqe_prev = 0xfffff801262a5010}, td_runq = {tqe_next = 0x0, tqe_prev = 0xffffffff816dd938}, td_slpq = {tqe_next = 0x0, tqe_prev = 0xfffff80126242d00}, 
  td_lockq = {tqe_next = 0x0, tqe_prev = 0xfffffe06683b2a28}, td_hash = {le_next = 0x0, le_prev = 0xfffffe0000cb1f28}, td_cpuset = 0xfffff8004501dc60, td_sel = 0x0, td_sleepqueue = 0xfffff80126242d00, td_turnstile = 0xfffff8004527e000, td_rlqe = 0x0, 
  td_umtxq = 0xfffff80045282000, td_tid = 101861, td_sigqueue = {sq_signals = {__bits = 0xfffff80417370548}, sq_kill = {__bits = 0xfffff80417370558}, sq_list = {tqh_first = 0x0, tqh_last = 0xfffff80417370568}, sq_proc = 0xfffff801262a5000, sq_flags = 1}, 
  td_lend_user_pri = 255 '�', td_flags = 165908, td_inhibitors = 2, td_pflags = 32, td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0, td_wmesg = 0x0, td_lastcpu = 4, td_oncpu = -1, td_owepreempt = 0 '\0', td_tsqueue = 0 '\0', td_locks = 3, td_rw_rlocks = 0, 
  td_lk_slocks = 0, td_stopsched = 0, td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0}, td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0, td_ucred = 0xfffff8002ca16000, td_estcpu = 0, td_slptick = 0, td_blktick = 0, 
  td_swvoltick = -1851235621, td_cow = 0, td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, ru_maxrss = 22640, ru_ixrss = 129738240, ru_idrss = 2928120, ru_isrss = 1441536, ru_minflt = 3, ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, 
    ru_oublock = 0, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 14928279, ru_nivcsw = 12043}, td_rux = {rux_runtime = 259991945848, rux_uticks = 4063, rux_sticks = 7199, rux_iticks = 0, rux_uu = 34755066, rux_su = 61580537, rux_tu = 96335604}, 
  td_incruntime = 0, td_runtime = 259991945848, td_pticks = 0, td_sticks = 0, td_iticks = 0, td_uticks = 0, td_intrval = 0, td_oldsigmask = {__bits = 0xfffff804173706fc}, td_generation = 14940322, td_sigstk = {
    ss_sp = 0xc208020000 <Error reading address 0xc208020000: Bad address>, ss_size = 32768, ss_flags = 0}, td_xsig = 0, td_profil_addr = 0, td_profil_ticks = 0, td_name = 0xfffff8041737073c "consul", td_fpop = 0x0, td_dbgflags = 0, td_dbgksi = {ksi_link = {
      tqe_next = 0x0, tqe_prev = 0x0}, ksi_info = {si_signo = 0, si_errno = 0, si_code = 0, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x0, si_value = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason = {_fault = {_trapno = 0}, _timer = {
          _timerid = 0, _overrun = 0}, _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 0, __spare2__ = 0xfffff804173707a0}}}, ksi_flags = 0, ksi_sigq = 0x0}, td_ng_outbound = 0, td_osd = {osd_nslots = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, 
      le_prev = 0x0}}, td_map_def_user = 0x0, td_dbg_forked = 0, td_vp_reserv = 0, td_no_sleeping = 0, td_dom_rr_idx = 0, td_su = 0x0, td_sigmask = {__bits = 0xfffff80417370818}, td_rqindex = 33 '!', td_base_pri = 120 'x', td_priority = 120 'x', 
  td_pri_class = 3 '\003', td_user_pri = 120 'x', td_base_user_pri = 120 'x', td_pcb = 0xfffffe066828bcc0, td_state = TDS_INHIBITED, td_uretoff = {tdu_retval = 0xfffff80417370840, tdu_off = 0}, td_slpcallout = {c_links = {le = {le_next = 0x0, 
        le_prev = 0xfffffe0000d57810}, sle = {sle_next = 0x0}, tqe = {tqe_next = 0x0, tqe_prev = 0xfffffe0000d57810}}, c_time = 1274956786070201, c_precision = 5368, c_arg = 0xfffff804173704b0, c_func = 0xffffffff80a25ff0 <sleepq_timeout>, c_lock = 0x0, c_flags = 0, 
    c_iflags = 272, c_cpu = 3}, td_frame = 0xfffffe066828bc00, td_kstack_obj = 0xfffff803dd2bed00, td_kstack = 18446741902203584512, td_kstack_pages = 4, td_critnest = 1, td_md = {md_spinlock_count = 1, md_saved_flags = 582, md_spurflt_addr = 0}, 
  td_sched = 0xfffff80417370930, td_ar = 0x0, td_lprof = 0xfffff804173708d8, td_dtrace = 0xfffff800455f2000, td_errno = 0, td_vnet = 0x0, td_vnet_lpush = 0x0, td_intr_frame = 0x0, td_rfppwait_p = 0x0, td_ma = 0x0, td_ma_cnt = 0, td_emuldata = 0x0}
Comment 4 Konstantin Belousov freebsd_committer 2015-06-21 06:31:11 UTC
The issue is that the callout_stop() call in sleepq_check_timeout() returned zero, which made the thread (in)voluntarily relinquish the CPU.  But apparently the thread's sleep callout was never actually run, which means there is nobody to make the thread runnable again.

This situation was already fixed once; it is the reason CALLOUT_DFRMIGRATION exists.

To me, it looks as if r278469 could be the cause.  The added block that calculates not_on_a_list means that a previously pending callout is sometimes no longer pending, but also that we return 0 in this case.  I think that when not_on_a_list == 1, we _must_ return 1 from callout_stop().

I am attaching an untested proof-of-concept patch to try.
Comment 5 Konstantin Belousov freebsd_committer 2015-06-21 06:31:56 UTC
Created attachment 157913 [details]
Return true from callout_stop() if not_on_a_list is true.
Comment 6 Konstantin Belousov freebsd_committer 2015-06-21 06:34:38 UTC
Created attachment 157915 [details]
Return true from callout_stop() if not_on_a_list is true.
Comment 7 Allan Jude freebsd_committer 2015-06-21 14:59:23 UTC
I think I may have a similar issue (not in a jail):
# uname -a
FreeBSD Nexus.HML3.ScaleEngine.net 11.0-CURRENT FreeBSD 11.0-CURRENT #0 r283896: Mon Jun  1 22:08:43 UTC 2015     root@Nexus.HML3.ScaleEngine.net:/usr/obj/usr/src/sys/GENERIC-NODEBUG  amd64


  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
92435 root          2  40   20  1792M  1600M STOP    0 172.2H  96.56% VBoxHeadl

# procstat -k 92435
  PID    TID COMM             TDNAME           KSTACK
92435 100784 VBoxHeadless     -                mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast
92435 100841 VBoxHeadless     -                <running>


# ps axHS -o pid,lwp,paddr,tdaddr 92435
  PID    LWP            PADDR           TDADDR
92435 100784 fffff800dab5ea70 fffff801f75194b0
92435 100841 fffff800dab5ea70 fffff8016bad3960


As soon as I started kgdb, it spat out:
#0  sched_switch (td=0xfffff8000b87b4b0, newtd=<value optimized out>,
    flags=<value optimized out>) at /usr/src/sys/kern/sched_ule.c:1956
1956                    cpuid = PCPU_GET(cpuid);


(kgdb) set print pretty on
(kgdb) p *(struct proc*)0xfffff800dab5ea70
http://slexy.org/view/s20fuKiBZz

(kgdb) p *(struct thread *)0xfffff801f75194b0
http://slexy.org/view/s21oIiroY8

(kgdb) p *(struct thread *)0xfffff8016bad3960
http://slexy.org/view/s20GWaINNw
Comment 8 Allan Jude freebsd_committer 2015-06-21 15:00:22 UTC
# ps auxwHS 92435
USER   PID  %CPU %MEM     VSZ     RSS TT  STAT STARTED        TIME COMMAND
root 92435 100.0 19.7 1835244 1638368  0- RN+  13Jun15 11706:05.98 /usr/local/lib/virtualbox/VBoxHeadless -s fudo2.0
root 92435   0.0 19.7 1835244 1638368  0- TN+  13Jun15     0:00.03 /usr/local/lib/virtualbox/VBoxHeadless -s fudo2.0
Comment 9 Allan Jude freebsd_committer 2015-06-21 15:01:01 UTC
UID   PID  PPID CPU PRI NI     VSZ     RSS MWCHAN STAT TT         TIME COMMAND
  0 92435 92393   0  40 20 1835244 1638368 -      TN+   0-     0:00.03 /usr/local/lib/virtualbox/VBoxHeadless -s fudo2.0
  0 92435 92393   0 122 20 1835244 1638368 -      RN+   0- 11706:30.58 /usr/local/lib/virtualbox/VBoxHeadless -s fudo2.0
Comment 10 Konstantin Belousov freebsd_committer 2015-06-21 15:08:40 UTC
(In reply to Allan Jude from comment #7)
This has absolutely nothing to do with the issue reported by swills.

Yours is most likely a genuine VirtualBox bug, which somehow managed to enter an infinite loop in head.
Comment 11 Konstantin Belousov freebsd_committer 2015-06-21 15:09:39 UTC
(In reply to Konstantin Belousov from comment #10)
s/in head/in kernel/.
Comment 12 Hans Petter Selasky freebsd_committer 2015-08-28 07:13:05 UTC
Please use a mutex-protected callout to handle this, and all the corner cases will go away. See projects/hps_head .

--HPS
Comment 13 Konstantin Belousov freebsd_committer 2015-08-28 07:30:29 UTC
(In reply to Hans Petter Selasky from comment #12)
This is a layer that makes the locks work; there are no mutexes which you can lock safely.  Even if the callout KPI were extended to support interlocking with a spinlock, it would still not help this case, because the only natural spinlock there is the thread lock, which is not a real spinlock and is useless for the interlocking pattern.

The case there is inherently racy, similar to how lock implementations are racy due to the nature of the code, and must be supported correctly.
Comment 14 Hans Petter Selasky freebsd_committer 2015-08-28 08:10:00 UTC
(In reply to Konstantin Belousov from comment #13)

Hi,

You can always use a spinlock there, though -current doesn't support passing spinlocks to callouts, while projects/hps_head does.

--HPS
Comment 15 johans 2015-11-11 14:50:12 UTC
I've been seeing almost identical problems on 10.2-RELEASE machines running under KVM virtualisation. As a workaround I've had to reboot roughly two machines per day, because processes like Apache, HAProxy and Postfix get stuck and block any new instances from starting, since the listening sockets remain trapped with the stuck processes.

Applying the patch of this PR fixed all our problems.

Are there any concrete reasons or problems blocking this patch? If not, I would suggest getting it committed with an MFC to stable/10 and nominating it for an errata notice. This problem makes running 10.2-RELEASE highly problematic in virtualised environments with a decent workload.
Comment 16 johans 2015-11-11 15:42:59 UTC
To give a little more context on how this patch has been tested: I'm now running it on multiple machines in production, each handling in excess of a few hundred Mbit/s, all functioning as load balancers. The first rollout of this patch has been running for a few days already without any issues.
Comment 17 Steve Wills freebsd_committer 2015-11-11 16:33:59 UTC
(In reply to johans from comment #15)
The delay may be partially my fault, as I hadn't responded to the request for feedback regarding the patch solving my original issue. I hadn't experienced the issue very frequently (not nearly as frequently as you) so even though I had been running my system with the patch for a while, I wasn't confident it had been resolved. That said, it has been a while now and I haven't seen the issue. Given how it has solved the issue for you as well, we should definitely look at getting it committed and merged, and perhaps an errata issued.
Comment 18 Mark Linimon freebsd_committer freebsd_triage 2015-11-12 07:50:55 UTC
Adding a note from a recent posting to freebsd-stable@:

From: Johan Schuijt-Li <johan@300.nl>

The patch attached [here] solved all our problems.
Comment 19 johans 2015-11-12 08:29:27 UTC
(In reply to Mark Linimon from comment #18)
Hey Mark, that would be me, I was already active in this PR ;)
Comment 20 Dmitry Sivachenko freebsd_committer 2015-11-18 08:13:19 UTC
I have also experienced similar issues (unkillable processes).

With that patch applied I cannot reproduce this.
Comment 21 johans 2015-11-18 10:35:18 UTC
Small update from our end, we're now trying a different patch which I received from kib, will keep updating this thread with any tests we do + results.
Comment 22 johans 2015-11-19 10:04:06 UTC
(In reply to johans from comment #21)
The mentioned second patch didn't solve the problem, we've reverted back to the patch attached to this PR.
Comment 23 johans 2015-12-04 10:00:21 UTC
I've been testing a third patch, which also solved the problem. I'll leave it to kib to draw conclusions from this about which patch should do what.
Comment 24 johans 2015-12-08 09:46:22 UTC
I'm suspecting that the patch (patch 1) attached here doesn't fully fix the problem, but only greatly reduces when it happens. I've encountered two issues since yesterday on different machines:

(1) This one is pretty clearly related: I had one puppet process get stuck in the STOP state; unfortunately my colleague didn't do a procstat to capture the exact trace, but rather rebooted the machine.

(2) This one I'm unsure about: I'm now debugging a hang in unmount which traces to a wait in ZFS on dp->dp_spaceavail_cv

Running dtrace on txg group syncing shows that there is no dirty data left, or at least that it's below the maximum:
2015 Dec  8 10:39:21 :    0KB of    8MB used
2015 Dec  8 10:39:26 :    0KB of    8MB used
2015 Dec  8 10:39:31 :    0KB of    8MB used


Wake-up should be done by 'dsl_pool_dirty_delta':

    if (dp->dp_dirty_total <= zfs_dirty_data_max)
        cv_signal(&dp->dp_spaceavail_cv);    

This condition has clearly been met. With this bug in the back of my head, it seemed the two might be related.
Comment 25 Antonio Nikishaev 2016-02-29 12:59:46 UTC
This bug is rather major.

I'm seeing the same issue with DigitalOcean instances (10.2-RELEASE(!)).
Riak2 nodes randomly hang every few days.


% uname -a
FreeBSD riak-freebsd-8gb-nyc3-01 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed Aug 12 15:26:37 UTC 2015     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

% sudo procstat -k 76975
  PID    TID COMM             TDNAME           KSTACK                       
76975 101201 beam.smp         -                mi_switch sleepq_timedwait _cv_timedwait_sbt amd64_syscall Xfast_syscall 
76975 101317 beam.smp         -                mi_switch thread_suspend_switch thread_single exit1 sigexit postsig ast doreti_ast 

% ps -auxw -p 76975
USER   PID %CPU %MEM     VSZ     RSS TT  STAT STARTED       TIME COMMAND
riak 76975  0.0 72.1 7524240 6030480  -  T<s  Tue08PM 1707:20.72 /usr/local/lib/riak/erts-5.10.3/bin/beam.smp -scl false -sfwi 500 -P 2560
Comment 26 commit-hook freebsd_committer 2016-03-02 18:46:30 UTC
A commit references this bug:

Author: kib
Date: Wed Mar  2 18:46:18 UTC 2016
New revision: 296320
URL: https://svnweb.freebsd.org/changeset/base/296320

Log:
  If callout_stop_safe() noted that the callout is currently executing,
  but next invocation is cancelled while migrating,
  sleepq_check_timeout() needs to be informed that the callout is
  stopped.  Otherwise the thread switches off CPU and never become
  runnable, since running callout could have already raced with us,
  while the migrating and cancelled callout could be one which is
  expected to set TDP_TIMOFAIL flag for us.  This contradicts with the
  expected behaviour of callout_stop() for other callers, which
  e.g. decrement references from the callout callbacks.

  Add a new flag CS_MIGRBLOCK requesting report of the situation as
  'successfully stopped'.

  Reviewed by:	jhb (previous version)
  Tested by:	cognet, pho
  PR:	200992
  Sponsored by:	The FreeBSD Foundation
  MFC after:	2 weeks
  Differential revision:	https://reviews.freebsd.org/D5221

Changes:
  head/sys/kern/kern_timeout.c
  head/sys/kern/subr_sleepqueue.c
  head/sys/sys/callout.h
Comment 27 Hans Petter Selasky freebsd_committer 2016-03-03 10:58:02 UTC
Hi,

I believe all of these callout quirks can be avoided by using a spinlock to protect the thread callout, as done in projects/hps_head. Has anyone tried to reproduce the issue with projects/hps_head before making this patch?

--HPS
Comment 28 me+freebsd 2016-03-11 17:06:38 UTC
Will the patch land in STABLE?
Comment 29 commit-hook freebsd_committer 2016-03-15 10:46:00 UTC
A commit references this bug:

Author: kib
Date: Tue Mar 15 10:45:56 UTC 2016
New revision: 296896
URL: https://svnweb.freebsd.org/changeset/base/296896

Log:
  MFC r296320:
  Adjust _callout_stop_safe() return value for the subr_sleepqueue.c needs
  when migrating callout was blocked, but running one was not.

  PR:	200992

Changes:
_U  stable/10/
  stable/10/sys/kern/kern_timeout.c
  stable/10/sys/kern/subr_sleepqueue.c
  stable/10/sys/sys/callout.h
Comment 30 commit-hook freebsd_committer 2016-03-15 17:10:13 UTC
A commit references this bug:

Author: kib
Date: Tue Mar 15 17:09:28 UTC 2016
New revision: 296913
URL: https://svnweb.freebsd.org/changeset/base/296913

Log:
  MFC r296320:
  Adjust _callout_stop_safe() return value for the subr_sleepqueue.c needs
  when migrating callout was blocked, but running one was not.

  PR:	200992
  Approved by:	re (marius)

Changes:
_U  releng/10.3/
  releng/10.3/sys/kern/kern_timeout.c
  releng/10.3/sys/kern/subr_sleepqueue.c
  releng/10.3/sys/sys/callout.h
Comment 31 Steve Wills freebsd_committer 2016-04-22 14:02:41 UTC
Can this PR be closed now?