In an embedded environment I sometimes (about once per month) see pthread_cond_timedwait() return EDEADLK, which is not documented anywhere; most of the time I see the expected ETIMEDOUT. My program is forced to dump core when something unexpected happens, and from the core I can see that there is no 'obvious' program error. I have the same program running under FreeBSD 10.3 and have never seen this there.

From the core dump:

(gdb) x/20x cond_
0x6bacd80:  0x00000000  0x00000000  0x00000000  0x00000004
0x6bacd90:  0x00000000  0x00000000  0x00000000  0x00000000
0x6bacda0:  0x7665642f  0x6465642f  0x752f7665  0x34747261
0x6bacdb0:  0x006c7463  0x00000000  0x00000000  0x00000000
0x6bacdc0:  0x00000000  0x06bacdc0  0x00000000  0x00000000

(gdb) x/20x mutex_
0x6510c80:  0x000187f2  0x00000004  0x00000000  0x00000000
0x6510c90:  0x00000000  0x00000000  0x00000000  0x00000000
0x6510ca0:  0x00000001  0x00000000  0x00000000  0x00000000
0x6510cb0:  0x00000000  0x00000000  0x00000000  0x00000000
0x6510cc0:  0x00000000  0x00000000  0x00000000  0x00000000
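The check that forces the core dump is essentially this (a simplified sketch; checked_timedwait() is an illustrative wrapper, not the actual code):

#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Dump core on any return value we do not expect from the wait. */
static int
checked_timedwait(pthread_cond_t *cv, pthread_mutex_t *m,
    const struct timespec *abstime)
{
        int error;

        error = pthread_cond_timedwait(cv, m, abstime);
        if (error != 0 && error != ETIMEDOUT)
                abort();        /* fires with error == EDEADLK about once per month */
        return (error);
}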
This is a PI mutex, right? For PI, EDEADLK means that the lock was attempted by the thread that already owns the mutex. Is 0x000187f2 the tid of the thread that got EDEADLK? Could it be that you stop/continue the process that demonstrates this behavior?
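For illustration, this is the self-lock situation that legitimately produces EDEADLK with an error-checking PI mutex (a minimal standalone sketch, not the reporter's code):

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

int
main(void)
{
        pthread_mutexattr_t attr;
        pthread_mutex_t m;
        int r;

        /* Error-checking, priority-inheritance mutex, as in the report. */
        pthread_mutexattr_init(&attr);
        pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&m, &attr);

        pthread_mutex_lock(&m);
        /* Second lock by the same thread: the self-lock is detected. */
        r = pthread_mutex_lock(&m);
        printf("second lock: %d (%s)\n", r, r == EDEADLK ? "EDEADLK" : "?");

        pthread_mutex_unlock(&m);
        pthread_mutex_destroy(&m);
        pthread_mutexattr_destroy(&attr);
        return (0);
}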
Yes, that is the thread that deadlocks. I do not stop the process (to my knowledge). I am using a simple producer-consumer queue with two condition variables, not_empty and not_full, and one mutex (process local, PTHREAD_MUTEX_ERRORCHECK, PTHREAD_PRIO_INHERIT). EDEADLK comes from the consumer waiting with a timeout. The timeout is only 100ms because there is a regular keep-alive check in this thread. The producer never waits. So it is basically this:

consumer thread:

    for (;;) {
        pthread_mutex_lock()
        if queue_empty()
            pthread_cond_timedwait()   // <-- EDEADLK once a month
        if !queue_empty() {
            read from queue
            pthread_cond_signal(not_full)
        }
        pthread_mutex_unlock()
        signal_alive()
        do_something()
    }

producer thread:

    ...
    pthread_mutex_lock()
    if !queue_full() {
        write to queue
        pthread_cond_signal(not_empty)
    }
    pthread_mutex_unlock()
    ...

BTW, there are about 32 threads with different realtime priorities, and the program very often forks/execs other programs. I have never seen this under FreeBSD 10.3 and 9.1 with the same software, but that may of course mean there were other effects I have not noticed so far.
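For reference, this is roughly how the mutex and the condition variables are set up and how the consumer wait works (a simplified sketch; the queue helpers and the 100ms deadline handling are placeholders, not the real code):

#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t lock;
static pthread_cond_t not_empty, not_full;

/* Placeholder queue state; the real queue is application-specific. */
static int queue_items;
static int queue_empty(void) { return (queue_items == 0); }
static void queue_read(void) { queue_items--; }

static void
queue_init(void)
{
        pthread_mutexattr_t ma;

        pthread_mutexattr_init(&ma);
        pthread_mutexattr_settype(&ma, PTHREAD_MUTEX_ERRORCHECK);
        pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);
        pthread_mutex_init(&lock, &ma);
        pthread_mutexattr_destroy(&ma);

        pthread_cond_init(&not_empty, NULL);
        pthread_cond_init(&not_full, NULL);
}

/* Consumer side: wait at most 100ms so the keep-alive check still runs. */
static int
consumer_poll(void)
{
        struct timespec deadline;
        int error = 0;

        pthread_mutex_lock(&lock);
        if (queue_empty()) {
                clock_gettime(CLOCK_REALTIME, &deadline);
                deadline.tv_nsec += 100 * 1000 * 1000;
                if (deadline.tv_nsec >= 1000000000) {
                        deadline.tv_nsec -= 1000000000;
                        deadline.tv_sec++;
                }
                /* Expected results: 0 or ETIMEDOUT; EDEADLK is the anomaly. */
                error = pthread_cond_timedwait(&not_empty, &lock, &deadline);
        }
        if (!queue_empty()) {
                queue_read();
                pthread_cond_signal(&not_full);
        }
        pthread_mutex_unlock(&lock);
        return (error);
}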
Do the cond and mutex memory regions look consistent? Could it be that, due to a software bug, I somehow corrupt these regions?
(In reply to nkoch from comment #3)
Of course I cannot exclude memory corruption, but both the condvar and the mutex look consistent. And the mutex seems to be owned by the thread that got the error, so the most likely cause is that the mutex was not unlocked properly before sleeping on the condvar. I read the code and do not see how that could happen. The only case that might cause some disturbance there is a stop. Hm, could it be that some of your exec's failed? Do you ever exec from the multithreaded process directly, or only exec after fork?
I do my execs after fork. Could it be that the threading subsystem stops threads when forking (atfork or something)?
After fork, in the child, there is only one thread, which is a copy of the thread that issued the fork syscall. Do you issue any threading calls after fork but before exec?
After fork I reset some environment variables, reset the realtime priority (which is raised in the parent process) and call signal() and sigprocmask() before execve(). I also have two atfork handlers running in the child context to close critical file handles. But could that influence the parent process?
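Roughly, the fork/exec path looks like this (a heavily simplified sketch; the environment variable name, the handler body and the spawn() helper are illustrative only, not the real code):

#include <sys/types.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

extern char **environ;

/* atfork child handler, registered once at startup. */
static void
child_close_critical_fds(void)
{
        /* close descriptors that must not leak into the exec'd program */
}

static void
setup_atfork(void)
{
        pthread_atfork(NULL, NULL, child_close_critical_fds);
}

static pid_t
spawn(const char *path, char *const argv[])
{
        pid_t pid;

        pid = fork();
        if (pid == 0) {
                struct sched_param sp = { .sched_priority = 0 };
                sigset_t set;

                /* Drop the realtime priority inherited from the parent. */
                sched_setscheduler(0, SCHED_OTHER, &sp);

                /* Reset signal handling and the signal mask before exec. */
                signal(SIGPIPE, SIG_DFL);
                sigemptyset(&set);
                sigprocmask(SIG_SETMASK, &set, NULL);

                setenv("SOME_VAR", "value", 1); /* illustrative only */

                execve(path, argv, environ);
                _exit(127);
        }
        return (pid);
}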
As a temporary workaround I changed my code to handle EDEADLK from pthread_cond_timedwait() like ETIMEDOUT. Now I get a fatal error from within libthr:

Fatal error 'mutex 0x6531800 own 0x1871e is not on list 0x0 0x0' at line 141 in file /usr/src/lib/libthr/thread/thr_mutex.c (errno = 0)

This comes from a different cond/mutex pair when calling pthread_mutex_unlock() after a pthread_cond_timedwait().

#0  0x0613058b in thr_kill () from /lib/libc.so.7
#1  0x0612ea8b in raise () from /lib/libc.so.7
#2  0x060ad0cf in abort () from /lib/libc.so.7
#3  0x05fbfb0d in ?? () from /lib/libthr.so.3
#4  0x05fb9fb4 in ?? () from /lib/libthr.so.3
#5  0x05fb9a9e in ?? () from /lib/libthr.so.3
#6  0x05fb9993 in pthread_mutex_unlock () from /lib/libthr.so.3
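The workaround amounted to something like this (a sketch; as the next comment explains, it turned out to be unsound):

#include <errno.h>
#include <pthread.h>
#include <time.h>

/*
 * Temporary workaround: treat the undocumented EDEADLK like ETIMEDOUT so
 * the caller just retries.  timedwait_workaround() is an illustrative
 * wrapper, not the actual code.
 */
static int
timedwait_workaround(pthread_cond_t *cv, pthread_mutex_t *m,
    const struct timespec *abstime)
{
        int error;

        error = pthread_cond_timedwait(cv, m, abstime);
        if (error == EDEADLK)
                error = ETIMEDOUT;
        return (error);
}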
After inspecting the libthr source, I see that the temporary workaround (ignoring EDEADLK) cannot work. The hexdump of the mutex after calling pthread_cond_timedwait() shows that it is owned by the thread according to m_lock.m_owner, but it is not in the thread's mutex queue as it should be. That would trigger mutex_assert_isowned() in dequeue_mutex() when calling pthread_mutex_unlock(). So I think the mutex is in an inconsistent state. I will try to reproduce the problem with some added debug logging in libthr.
(In reply to nkoch from comment #9)
Perhaps try this. Still, since you are claiming that there are no stop signals, it should not matter.

commit 98a2fa4a342983bee7f1700530fd721652ce27b5
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Sat Oct 9 22:46:08 2021 +0300

    umtx: Do not return spurious failures on unlock after suspend for
    normal or PI mutexes

diff --git a/sys/kern/kern_umtx.c b/sys/kern/kern_umtx.c
index ea87259161c8..359318b3849e 100644
--- a/sys/kern/kern_umtx.c
+++ b/sys/kern/kern_umtx.c
@@ -1486,7 +1486,7 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 	if (error == -1)
 		return (EFAULT);
 	if (error == 1) {
-		error = thread_check_susp(td, false);
+		error = thread_check_susp(td, true);
 		if (error != 0)
 			return (error);
 		goto again;
@@ -1523,7 +1523,7 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 	if (error == 1) {
 		if (old != owner)
 			return (EINVAL);
-		error = thread_check_susp(td, false);
+		error = thread_check_susp(td, true);
 		if (error != 0)
 			return (error);
 		goto again;
@@ -2428,7 +2428,7 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 again:
 	error = casueword32(&m->m_owner, owner, &old, new_owner);
 	if (error == 1) {
-		error = thread_check_susp(td, false);
+		error = thread_check_susp(td, true);
 		if (error == 0)
 			goto again;
 	}
Better this one, handling malicious userspace:

commit dc64dc0a431faeded1ea294489c4584d537bb01a
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Sat Oct 9 22:46:08 2021 +0300

    umtx: Do not return spurious failures on unlock after suspend for
    normal or PI mutexes

diff --git a/sys/kern/kern_umtx.c b/sys/kern/kern_umtx.c
index ea87259161c8..b06e121e58ec 100644
--- a/sys/kern/kern_umtx.c
+++ b/sys/kern/kern_umtx.c
@@ -1486,7 +1486,7 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 	if (error == -1)
 		return (EFAULT);
 	if (error == 1) {
-		error = thread_check_susp(td, false);
+		error = thread_check_susp(td, true);
 		if (error != 0)
 			return (error);
 		goto again;
@@ -1523,7 +1523,7 @@ do_unlock_normal(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 	if (error == 1) {
 		if (old != owner)
 			return (EINVAL);
-		error = thread_check_susp(td, false);
+		error = thread_check_susp(td, true);
 		if (error != 0)
 			return (error);
 		goto again;
@@ -2427,8 +2427,8 @@ do_unlock_pi(struct thread *td, struct umutex *m, uint32_t flags, bool rb)
 		new_owner |= UMUTEX_CONTESTED;
 again:
 	error = casueword32(&m->m_owner, owner, &old, new_owner);
-	if (error == 1) {
-		error = thread_check_susp(td, false);
+	if (error == 1 && (old & ~UMUTEX_CONTESTED) == id) {
+		error = thread_check_susp(td, true);
 		if (error == 0)
 			goto again;
 	}
I modified the kernel code as suggested. I also put some printf's into libthr. After a longer time of testing I had the deadlock again:

mutex_self_lock(struct pthread_mutex *m, const struct timespec *abstime)
{
        struct timespec ts1, ts2;
        int ret;

        switch (PMUTEX_TYPE(m->m_flags)) {
        case PTHREAD_MUTEX_ERRORCHECK:
        case PTHREAD_MUTEX_ADAPTIVE_NP:
                if (abstime) {
                        if (abstime->tv_sec < 0 || abstime->tv_nsec < 0 ||
                            abstime->tv_nsec >= 1000000000) {
                                ret = EINVAL;
                        } else {
                                clock_gettime(CLOCK_REALTIME, &ts1);
                                TIMESPEC_SUB(&ts2, abstime, &ts1);
                                __sys_nanosleep(&ts2, NULL);
                                ret = ETIMEDOUT;
                        }
                } else {
                        /*
                         * POSIX specifies that mutexes should return
                         * EDEADLK if a recursive lock is detected.
                         */
#if !defined(_PTHREADS_INVARIANTS)
#error
#endif
                        ret = EDEADLK;

******** thread 0x6883500(0x1871a), mutex 0x653e800 own 0x1871a list 0x0 0x0 err=11

                }

******** called from mutex_lock_sleep():

mutex_lock_sleep(struct pthread *curthread, struct pthread_mutex *m,
    const struct timespec *abstime)
{
        uint32_t id, owner;
        int count, ret;

        id = TID(curthread);
        if (PMUTEX_OWNER_ID(m) == id)
>>>>>>>>        return (mutex_self_lock(m, abstime));

******** called from mutex_lock_common():

mutex_lock_common(struct pthread_mutex *m, const struct timespec *abstime,
    bool cvattach, bool rb_onlist)
{
        struct pthread *curthread;
        int ret, robust;

        robust = 0;  /* pacify gcc */
        curthread = _get_curthread();
        if (!cvattach && m->m_flags & PMUTEX_FLAG_PRIVATE)
                THR_CRITICAL_ENTER(curthread);
        if (!rb_onlist)
                robust = _mutex_enter_robust(curthread, m);
        ret = _thr_umutex_trylock2(&m->m_lock, TID(curthread));
        if (ret == 0 || ret == EOWNERDEAD) {
                enqueue_mutex(curthread, m, ret);
                if (ret == EOWNERDEAD)
                        m->m_lock.m_flags |= UMUTEX_NONCONSISTENT;
        } else {
>>>>>>>>        ret = mutex_lock_sleep(curthread, m, abstime);
        }

******** called from _mutex_cv_lock():

_mutex_cv_lock(struct pthread_mutex *m, int count, bool rb_onlist)
{
        int error;
        struct pthread_mutex saved = *m;

        error = mutex_lock_common(m, NULL, true, rb_onlist);
        if (error == 0 || error == EOWNERDEAD)
                m->m_count = count;
        else if (error == EDEADLK)
******** thread 0x6883500(0x1871a), mutex 0x653e800 own 0x1871a(0x1871a) list 0x0(0x0) 0x0(0x0) err=11
        return (error);
}

******** called from cond_wait_kernel():

cond_wait_kernel(struct pthread_cond *cvp, struct pthread_mutex *mp,
    const struct timespec *abstime, int cancel)
{
        struct pthread *curthread;
        int error, error2, recurse, robust;

        curthread = _get_curthread();
        robust = _mutex_enter_robust(curthread, mp);
        error = _mutex_cv_detach(mp, &recurse);
        if (error != 0) {
                if (robust)
                        _mutex_leave_robust(curthread, mp);
                return (error);
        }
        if (cancel)
                _thr_cancel_enter2(curthread, 0);
        error = _thr_ucond_wait(&cvp->kcond, &mp->m_lock, abstime,
            CVWAIT_ABSTIME | CVWAIT_CLOCKID);
        if (cancel)
                _thr_cancel_leave(curthread, 0);

        /*
         * Note that PP mutex and ROBUST mutex may return
         * interesting error codes.
         */
        if (error == 0) {
                error2 = _mutex_cv_lock(mp, recurse, true);
        } else if (error == EINTR || error == ETIMEDOUT) {
                error2 = _mutex_cv_lock(mp, recurse, true);
                /*
                 * Do not do cancellation on EOWNERDEAD there.  The
                 * cancellation cleanup handler will use the protected
                 * state and unlock the mutex without making the state
                 * consistent and the state will be unrecoverable.
                 */
                if (error2 == 0 && cancel) {
                        if (robust) {
                                _mutex_leave_robust(curthread, mp);
                                robust = false;
                        }
                        _thr_testcancel(curthread);
                } else
******** error 11: 0x6883500(0x1871a)error2, curthread, TID(curthread));
I should add that all threads except one run at realtime priority. The one thread without realtime priority forks processes every few seconds. It gathers statistics by exec'ing ps, pstat and fstat. Could that result in some kind of "indirect" deadlock because those programs may lock some kernel structures? Just an idea.
(In reply to nkoch from comment #13)
What is the exact version of the OS used for your reproduction?
12.1-RELEASE-p13. The architecture is i386.
(In reply to nkoch from comment #15) So it has at least some fork/CoW bugs. Update to stable/13 and see if it helps.
After some more long-term testing (still FreeBSD 12.1) the deadlock went away, after I eliminated some of my periodic forks from a different thread where I exec'd programs to inspect system data structures (e.g. "pstat -T", "fstat -p", "ps -p ...").