Bug 254995 - pthread_cond_timedwait() returns EDEADLK
Summary: pthread_cond_timedwait() returns EDEADLK
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: threads (show other bugs)
Version: 12.1-RELEASE
Hardware: i386 Any
: --- Affects Some People
Assignee: freebsd-threads (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-04-12 06:58 UTC by nkoch
Modified: 2022-04-06 08:14 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description nkoch 2021-04-12 06:58:19 UTC
In an embedded environment I sometimes (once per month) see pthread_cond_timedwait() returning EDEADLK, which is not documented anywhere. Most of the time I expect to see ETIMEDOUT.
As my program is forced to core dump when something unexpected happens I can see that there is no 'obvious' program error.
I have the same program running under FreeBSD-10.3 and have never seen this before.

From core dump:

(gdb) x/20x cond_
0x6bacd80:	0x00000000	0x00000000	0x00000000	0x00000004
0x6bacd90:	0x00000000	0x00000000	0x00000000	0x00000000
0x6bacda0:	0x7665642f	0x6465642f	0x752f7665	0x34747261
0x6bacdb0:	0x006c7463	0x00000000	0x00000000	0x00000000
0x6bacdc0:	0x00000000	0x06bacdc0	0x00000000	0x00000000
(gdb) x/20x mutex_
0x6510c80:	0x000187f2	0x00000004	0x00000000	0x00000000
0x6510c90:	0x00000000	0x00000000	0x00000000	0x00000000
0x6510ca0:	0x00000001	0x00000000	0x00000000	0x00000000
0x6510cb0:	0x00000000	0x00000000	0x00000000	0x00000000
0x6510cc0:	0x00000000	0x00000000	0x00000000	0x00000000
Comment 1 Konstantin Belousov freebsd_committer 2021-04-14 14:56:17 UTC
This is PI mutex, right?
For PI, EDEADLK means that mutex was tried while the thread owns it.  Is 0x000187f2
the tid of the thread that got EDEADLK?

Could it be that you stop/continue the process that demonstrate this behavior?
Comment 2 nkoch 2021-04-19 07:24:39 UTC
Yes, that is the thread that dead locks.

I do not stop the process (to my knowledge).

I am having a simple producer-consumer queue with 2 conds not_empty and not_full and one mutex (process local, PTHREAD_MUTEX_ERRORCHECK, PTHREAD_PRIO_INHERIT).
EDEADLK comes from the consumer waiting for anything with timeout.
The timeout is 100ms only as there is a regular keep alive check of
this thread.
The producer never waits.

So its basically this:

consumer thread:
  for(;;) {
    pthread_mutex_lock()
    if queue_empty()
      phtread_cond_timedwait() // <-- EDEADLK once in a month
    if !queue_empty() {
      read from queue
      pthread_cond_signal(not_full)
    }
    pthread_mutex_unlock()
    signal_alive()
    do_something()
  }


producer thread:
  ...
  pthread_mutex_lock()
  if !queue_full() {
    write to queue
    pthread_cond_signal(not_empty)
  }
  pthread_mutex_unlock()
  ...

    
BTW, there are about 32 threads with different realtime priorities and the programs very often forks/execs other programs.

I have never seen this under FreeBSD 10.3 and 9.1 with the same software, but that may of course mean that there were other effects I did not see so far.
Comment 3 nkoch 2021-04-19 07:52:49 UTC
Do the cond und mutex memory regions look consistent?
Could it be that due to a sw bug I somehow corrupt these regions?
Comment 4 Konstantin Belousov freebsd_committer 2021-05-03 16:09:13 UTC
(In reply to nkoch from comment #3)
Of course I cannot exclude memory corruption, but both condvar and mutex looks
consistent.  And mutex seems to be owned by the errored thread, so the most likely
cause is that mutex was not unlocked properly before sleeping for condvar.

I read the code and do not see how could it happen.  The only possible case which
might cause some disturbance there is stop.  Hm, could it be that some of your
exec's failed?  Do you ever exec from mt process, or only exec after fork?
Comment 5 nkoch 2021-05-04 11:00:39 UTC
I do my execs after fork. Could it be possible that the threading subsystem stops threads when forking (atfork or something)?
Comment 6 Konstantin Belousov freebsd_committer 2021-05-05 17:36:03 UTC
After fork, in the child, there is only one thread, which is the copy of the thread
issued the fork syscall.

Do you issue any threading calls after fork but before exec?
Comment 7 nkoch 2021-05-07 06:37:00 UTC
After fork I do reset some environment variables, reset realtime priority (which is raised in the parent process) and call signal() and sigprocmask() before execve(). I also have two atfork-handlers running in child context to close critical file handles.
But could that influence the parent process?
Comment 8 nkoch 2021-08-20 08:37:56 UTC
As a temporary workaround I changed my code to handle EDEADLK from pthread_cond_timedwait() like ETIMEDOUT.
Now I've got a runtime exception from within libthr:

Fatal error 'mutex 0x6531800 own 0x1871e is not on list 0x0 0x0' at line 141 in file /usr/src/lib/libthr/thread/thr_mutex.c (errno = 0)

This comes from a different cond/mutex pair when calling pthread_mutex_unlock()
after a pthread_cond_timedwait().

#0  0x0613058b in thr_kill () from /lib/libc.so.7
#1  0x0612ea8b in raise () from /lib/libc.so.7
#2  0x060ad0cf in abort () from /lib/libc.so.7
#3  0x05fbfb0d in ?? () from /lib/libthr.so.3
#4  0x05fb9fb4 in ?? () from /lib/libthr.so.3
#5  0x05fb9a9e in ?? () from /lib/libthr.so.3
#6  0x05fb9993 in pthread_mutex_unlock () from /lib/libthr.so.3
Comment 9 nkoch 2021-10-07 13:34:49 UTC
After inspecting the source of libthread, I see that the temporary workaround (ignoring EDEADLK) cannot work.

The hexdump of the mutex after calling pthread_cond_timedwait() shows that it is owned by the thread according to m_lock.m_owner. But it is not in the threads mutex queue as it should be.

That would trigger mutex_assert_isowned() in dequeue_mutex() when calling pthread_mutex_unlock().

So I think the mutex is in an inconsistent state.
I will try to reproduce the problem with some added debug logging in libpthread.
Comment 10 Konstantin Belousov freebsd_committer 2021-10-09 19:52:04 UTC
(In reply to nkoch from comment #9)
Perhaps try this.  Still, since you are claiming that there is no stop signals,
it should not matter.


commit 98a2fa4a342983bee7f1700530fd721652ce27b5
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Sat Oct 9 22:46:08 2021 +0300

    umtx: Do not return spurious failures on unlock after suspend for normal or PI mutexes

diff --git a/sys/kern/kern_umtx.c b/sys/kern/kern_umtx.c
index ea87259161c8..359318b3849e 100644
--- a/sys/kern/kern_umtx.c
+++ b/sys/kern/kern_umtx.c
@@ -1486,7 +1486,7 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 		if (error == -1)
 			return (EFAULT);
 		if (error == 1) {
-			error = thread_check_susp(td, false);
+			error = thread_check_susp(td, true);
 			if (error != 0)
 				return (error);
 			goto again;
@@ -1523,7 +1523,7 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 	if (error == 1) {
 		if (old != owner)
 			return (EINVAL);
-		error = thread_check_susp(td, false);
+		error = thread_check_susp(td, true);
 		if (error != 0)
 			return (error);
 		goto again;
@@ -2428,7 +2428,7 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 again:
 	error = casueword32(&m->m_owner, owner, &old, new_owner);
 	if (error == 1) {
-		error = thread_check_susp(td, false);
+		error = thread_check_susp(td, true);
 		if (error == 0)
 			goto again;
 	}
Comment 11 Konstantin Belousov freebsd_committer 2021-10-09 20:07:38 UTC
Better this one, handling malicious userspace

commit dc64dc0a431faeded1ea294489c4584d537bb01a
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Sat Oct 9 22:46:08 2021 +0300

    umtx: Do not return spurious failures on unlock after suspend for normal or PI mutexes

diff --git a/sys/kern/kern_umtx.c b/sys/kern/kern_umtx.c
index ea87259161c8..b06e121e58ec 100644
--- a/sys/kern/kern_umtx.c
+++ b/sys/kern/kern_umtx.c
@@ -1486,7 +1486,7 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 		if (error == -1)
 			return (EFAULT);
 		if (error == 1) {
-			error = thread_check_susp(td, false);
+			error = thread_check_susp(td, true);
 			if (error != 0)
 				return (error);
 			goto again;
@@ -1523,7 +1523,7 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 	if (error == 1) {
 		if (old != owner)
 			return (EINVAL);
-		error = thread_check_susp(td, false);
+		error = thread_check_susp(td, true);
 		if (error != 0)
 			return (error);
 		goto again;
@@ -2427,8 +2427,8 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 		new_owner |= UMUTEX_CONTESTED;
 again:
 	error = casueword32(&m->m_owner, owner, &old, new_owner);
-	if (error == 1) {
-		error = thread_check_susp(td, false);
+	if (error == 1 && (old & ~UMUTEX_CONTESTED) == id) {
+		error = thread_check_susp(td, true);
 		if (error == 0)
 			goto again;
 	}
Comment 12 nkoch 2022-01-17 15:14:32 UTC
I modified the kernel code as suggested. I also put some
printf's into libthr.
After a longer time of testing I had the deadlock again:

mutex_self_lock(struct pthread_mutex *m, const struct timespec *abstime)
{
        struct timespec ts1, ts2;
        int ret;

        switch (PMUTEX_TYPE(m->m_flags)) {
        case PTHREAD_MUTEX_ERRORCHECK:
        case PTHREAD_MUTEX_ADAPTIVE_NP:
                if (abstime) {
                        if (abstime->tv_sec < 0 || abstime->tv_nsec < 0 ||
                            abstime->tv_nsec >= 1000000000) {
                                ret = EINVAL;
                        } else {
                                clock_gettime(CLOCK_REALTIME, &ts1);
                                TIMESPEC_SUB(&ts2, abstime, &ts1);
                                __sys_nanosleep(&ts2, NULL);
                                ret = ETIMEDOUT;
                        }
                } else {
                        /*
                         * POSIX specifies that mutexes should return
                         * EDEADLK if a recursive lock is detected.
                         */
#if !defined(_PTHREADS_INVARIANTS)
#error
#endif
                        ret = EDEADLK;
******** thread 0x6883500(0x1871a), mutex 0x653e800 own 0x1871a list 0x0 0x0 err=11
                }

******** called from mutex_lock_sleep():

mutex_lock_sleep(struct pthread *curthread, struct pthread_mutex *m,
    const struct timespec *abstime)
{
        uint32_t id, owner;
        int count, ret;

        id = TID(curthread);
        if (PMUTEX_OWNER_ID(m) == id)
>>>>>>>>        return (mutex_self_lock(m, abstime));

******** called from mutex_lock_common():

mutex_lock_common(struct pthread_mutex *m, const struct timespec *abstime,
    bool cvattach, bool rb_onlist)
{
        struct pthread *curthread;
        int ret, robust;

        robust = 0;  /* pacify gcc */
        curthread  = _get_curthread();
        if (!cvattach && m->m_flags & PMUTEX_FLAG_PRIVATE)
                THR_CRITICAL_ENTER(curthread);
        if (!rb_onlist)
                robust = _mutex_enter_robust(curthread, m);
        ret = _thr_umutex_trylock2(&m->m_lock, TID(curthread));
        if (ret == 0 || ret == EOWNERDEAD) {
                enqueue_mutex(curthread, m, ret);
                if (ret == EOWNERDEAD)
                        m->m_lock.m_flags |= UMUTEX_NONCONSISTENT;
        } else {
>>>>>>>>        ret = mutex_lock_sleep(curthread, m, abstime);
        }

******** called from _mutex_cv_lock():

_mutex_cv_lock(struct pthread_mutex *m, int count, bool rb_onlist)
{
        int error;
        struct pthread_mutex saved = *m;

        error = mutex_lock_common(m, NULL, true, rb_onlist);
        if (error == 0 || error == EOWNERDEAD)
                m->m_count = count;
        else if (error == EDEADLK)
******** thread 0x6883500(0x1871a), mutex 0x653e800 own 0x1871a(0x1871a) list 0x0(0x0) 0x0(0x0) err=11
        return (error);
}

********called from cond_wait_kernel():

cond_wait_kernel(struct pthread_cond *cvp, struct pthread_mutex *mp,
    const struct timespec *abstime, int cancel)
{
        struct pthread *curthread;
        int error, error2, recurse, robust;

        curthread = _get_curthread();
        robust = _mutex_enter_robust(curthread, mp);

        error = _mutex_cv_detach(mp, &recurse);
        if (error != 0) {
                if (robust)
                        _mutex_leave_robust(curthread, mp);
                return (error);
        }

        if (cancel)
                _thr_cancel_enter2(curthread, 0);
        error = _thr_ucond_wait(&cvp->kcond, &mp->m_lock, abstime,
            CVWAIT_ABSTIME | CVWAIT_CLOCKID);
        if (cancel)
                _thr_cancel_leave(curthread, 0);

        /*
         * Note that PP mutex and ROBUST mutex may return
         * interesting error codes.
         */
        if (error == 0) {
                error2 = _mutex_cv_lock(mp, recurse, true);
        } else if (error == EINTR || error == ETIMEDOUT) {
                error2 = _mutex_cv_lock(mp, recurse, true);
                /*
                 * Do not do cancellation on EOWNERDEAD there.  The
                 * cancellation cleanup handler will use the protected
                 * state and unlock the mutex without making the state
                 * consistent and the state will be unrecoverable.
                 */
                if (error2 == 0 && cancel) {
                        if (robust) {
                                _mutex_leave_robust(curthread, mp);
                                robust = false;
                        }
                        _thr_testcancel(curthread);
                } else
********  error 11: 0x6883500(0x1871a)error2, curthread, TID(curthread));
Comment 13 nkoch 2022-01-19 08:23:11 UTC
I should add that almost all threads except one run at realtime priority.
The one thread without realtime priority forks processes every few seconds.
It gathers statistics by exec'ing ps, pstat and fstat. Could that result in
some kind of "indirect" deadlock situation because those programs may lock some kernel structures? Just an idea.
Comment 14 Konstantin Belousov freebsd_committer 2022-01-19 14:52:15 UTC
(In reply to nkoch from comment #13)
What is the exact what version of the OS used for your reproduction?
Comment 15 nkoch 2022-01-19 14:55:22 UTC
12.1 RELEASE-p13
Architecture is i386
Comment 16 Konstantin Belousov freebsd_committer 2022-01-19 15:15:44 UTC
(In reply to nkoch from comment #15)
So it has at least some fork/CoW bugs.  Update to stable/13 and see if it
helps.
Comment 17 nkoch 2022-04-06 08:14:00 UTC
After some more long testing (still FreeBSD 12.1) the deadlock went away, after I eliminated some of my peridic forks from a different thread where I exec'd programs to inspect system data structures (e.g. "pstat -T"/"fstat -p"/"ps -p ...").