Bug 209233 - [patch] pthread_suspend_all_np races with check_suspend
Summary: [patch] pthread_suspend_all_np races with check_suspend
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: threads (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-threads (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2016-05-03 13:14 UTC by Lawrence Esswood
Modified: 2016-10-27 19:37 UTC (History)
3 users (show)

See Also:


Attachments
Patch to fix race (769 bytes, patch)
2016-05-03 13:14 UTC, Lawrence Esswood
no flags Details | Diff
Clear THR_FLAGS_SUSPENDED in resume_common() (996 bytes, patch)
2016-05-04 15:19 UTC, Konstantin Belousov
no flags Details | Diff
Clear THR_FLAGS_SUSPENDED in resume_common() v2 (1.34 KB, patch)
2016-05-04 17:16 UTC, Konstantin Belousov
no flags Details | Diff
test case (1.30 KB, text/plain)
2016-05-04 17:57 UTC, Lawrence Esswood
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Lawrence Esswood 2016-05-03 13:14:09 UTC
Created attachment 169926 [details]
Patch to fix race

Currently the suspend all implementation in libthr only sets the NEED_SUSPEND flag if a thread does not have the SUSPENDED flag set. However, that thread may be in the process of resuming, and so its NEED_SUSPEND flag should always be set if we wish a thread to stay suspended, even if already has the SUSPENDED flag set.
Comment 1 Konstantin Belousov freebsd_committer 2016-05-04 10:52:04 UTC
(In reply to Lawrence Esswood from comment #0)

So the issue is that we might leave pthread_resume{_all}_np with the THR_FLAG_SUSPENDED still set.  IMO this is the real bug, and it cannot be worked around by trying to set NEED_SUSPEND in the next suspend cycle.  Instead, resume functions should wait until the resumed thread is scheduled and has a chance to run enough to clear the flag.

One more wait/wake point around thread->cycle would be needed to ensure of proper control pass between resume->resumed->resume threads.
Comment 2 Lawrence Esswood 2016-05-04 12:12:04 UTC
(In reply to Konstantin Belousov from comment #1)

This would end up doing the same thing in the suspend logic as always setting the NEED_SUSPEND flag, as the !(thread->flags & THR_FLAGS_SUSPENDED) check would always be true if THR_FLAGS_NEED_SUSPEND was clear and if we insisted that !THR_FLAGS_NEED_SUSPEND => !THR_FLAGS_SUSPENDED after a resume. The only (observable) difference would be extra blocking in resume which we probabably want to avoid.
Comment 3 Konstantin Belousov freebsd_committer 2016-05-04 15:19:06 UTC
(In reply to Lawrence Esswood from comment #2)
Yes, my intent is to guarantee that the THR_FLAGS_SUSPENDED is never left dandling, since we are not supposed to distinguish between previous and current suspension in suspend_common().

Then, why not simply clear THR_FLAGS_SUSPENDED in resume_common() ?  We apparently do not care how much thread moved ahead returning from check_suspend(), and the function unblocks SIGCANCEL only after it is ready to next suspend.

Could you provide a test case for the problem ?
Comment 4 Konstantin Belousov freebsd_committer 2016-05-04 15:19:44 UTC
Created attachment 169969 [details]
Clear THR_FLAGS_SUSPENDED in resume_common()
Comment 5 Lawrence Esswood 2016-05-04 16:56:31 UTC
(In reply to Konstantin Belousov from comment #3)

I had a very wonky test that very unreliably failed, now I know why it happens I can probably create a much better one to attach.

Your suggestion I think would work. You would also have to change the break condition on the check_suspend loop otherwise if the thread is woken for any reason it will break too early.

One question however, what happens if the thread in check_suspend has its NEED_SUSPEND flag set and is signalled somewhere between the loop exit and the _thr_signal_unblock? I think it won't do another check_suspend / check_deferred until it next hits a _thr_ast which might be never. We could maybe extend the thread lock to include the _thr_signal_unblock, or have the end of check_suspend make a recursive call.

I will try knock up a test case.
Comment 6 Konstantin Belousov freebsd_committer 2016-05-04 17:16:04 UTC
(In reply to Lawrence Esswood from comment #5)
Yes, it is a bug that I left the old test in the loop condition.  I left only the check for THR_FLAGS_NEED_SUSPEND there, since it is cleared only on resume.

WRT SIGCANCEL generated after loop exit and before thr_signal_unblock.  The blocked signal list includes SIGCANCEL, thus the SIGCANCEL signal should be only delivered after unblock. Then typically we would re-enter _thr_ast() with the different set of flags.
Comment 7 Konstantin Belousov freebsd_committer 2016-05-04 17:16:40 UTC
Created attachment 169971 [details]
Clear THR_FLAGS_SUSPENDED in resume_common() v2

Fix loop condition.
Comment 8 Lawrence Esswood 2016-05-04 17:57:50 UTC
Created attachment 169973 [details]
test case

(In reply to Konstantin Belousov from comment #3)

This fails pretty reliably / is the best I can do.
Comment 9 Konstantin Belousov freebsd_committer 2016-05-04 18:26:48 UTC
(In reply to Lawrence Esswood from comment #8)
Test is good.

It also fails reliably for me, and works on the patched libthr.

If you have no more comments, I will commit the fix.  Some time later I might add the test to our test suite.  Would be nice if you add copyright and license into the test source preamble, share/examples/etc/bsd-style-copyright provides text of the preferred license.

Thanks.
Comment 10 commit-hook freebsd_committer 2016-05-05 10:20:55 UTC
A commit references this bug:

Author: kib
Date: Thu May  5 10:20:23 UTC 2016
New revision: 299114
URL: https://svnweb.freebsd.org/changeset/base/299114

Log:
  Do not leak THR_FLAGS_SUSPENDED from the previous suspend/resume
  cycle.  The flag currently is cleared by the resumed thread.  If next
  suspend request comes before the thread was able to clean the flag, in
  which case suspender skip the thread.

  Instead, clear the THR_FLAGS_SUSPEND flag in resume_common(), we do
  not care how much code was executed in the resumed thread when the
  pthread_resume_*np(s) functions returned.

  PR:	209233
  Reported by:	Lawrence Esswood <le277@cam.ac.uk>
  MFC after:	1 week

Changes:
  head/lib/libthr/thread/thr_resume_np.c
  head/lib/libthr/thread/thr_sig.c
Comment 11 commit-hook freebsd_committer 2016-05-12 06:53:36 UTC
A commit references this bug:

Author: kib
Date: Thu May 12 06:53:23 UTC 2016
New revision: 299521
URL: https://svnweb.freebsd.org/changeset/base/299521

Log:
  MFC r299114:
  Do not leak THR_FLAGS_SUSPENDED from the previous suspend/resume
  cycle.

  PR:	209233

Changes:
_U  stable/10/
  stable/10/lib/libthr/thread/thr_resume_np.c
  stable/10/lib/libthr/thread/thr_sig.c
Comment 12 Eric van Gyzen freebsd_committer 2016-10-27 18:44:20 UTC
Closing as fixed.  Please re-open if there is still work to do.