Bug 254995 - pthread_cond_timedwait() returns EDEADLK
Summary: pthread_cond_timedwait() returns EDEADLK
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: threads
Version: 12.1-RELEASE
Hardware: i386 Any
Importance: --- Affects Some People
Assignee: freebsd-threads (Nobody)
Depends on:
Reported: 2021-04-12 06:58 UTC by nkoch
Modified: 2021-05-07 06:37 UTC
Description nkoch 2021-04-12 06:58:19 UTC
In an embedded environment I occasionally (about once per month) see pthread_cond_timedwait() return EDEADLK, which is not documented anywhere. Normally I expect to see ETIMEDOUT.
As my program is forced to dump core whenever something unexpected happens, I can see that there is no 'obvious' program error.
I have the same program running under FreeBSD-10.3 and have never seen this there.

From core dump:

(gdb) x/20x cond_
0x6bacd80:	0x00000000	0x00000000	0x00000000	0x00000004
0x6bacd90:	0x00000000	0x00000000	0x00000000	0x00000000
0x6bacda0:	0x7665642f	0x6465642f	0x752f7665	0x34747261
0x6bacdb0:	0x006c7463	0x00000000	0x00000000	0x00000000
0x6bacdc0:	0x00000000	0x06bacdc0	0x00000000	0x00000000
(gdb) x/20x mutex_
0x6510c80:	0x000187f2	0x00000004	0x00000000	0x00000000
0x6510c90:	0x00000000	0x00000000	0x00000000	0x00000000
0x6510ca0:	0x00000001	0x00000000	0x00000000	0x00000000
0x6510cb0:	0x00000000	0x00000000	0x00000000	0x00000000
0x6510cc0:	0x00000000	0x00000000	0x00000000	0x00000000
Comment 1 Konstantin Belousov freebsd_committer 2021-04-14 14:56:17 UTC
This is a PI mutex, right?
For PI, EDEADLK means that a lock was attempted on the mutex while the thread already owns it.  Is 0x000187f2
the tid of the thread that got EDEADLK?

Could it be that you stop/continue the process that demonstrates this behavior?
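
A minimal sketch of the condition described above, assuming only what POSIX specifies for PTHREAD_MUTEX_ERRORCHECK: a thread that tries to lock a mutex it already owns gets EDEADLK. The helper name relock_owned_mutex() is hypothetical, and the priority-inheritance protocol is left out to keep the sketch portable. It is not the reporter's code and does not reproduce the bug.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical helper: lock an error-checking mutex twice from the same
 * thread and return the result of the second lock attempt.
 */
int relock_owned_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);        /* first lock succeeds */
    rc = pthread_mutex_lock(&m);   /* owner locks again: error is reported */

    pthread_mutex_unlock(&m);      /* release the single lock we hold */
    pthread_mutex_destroy(&m);
    return rc;
}
```

If pthread_cond_timedwait() fails to release the mutex before sleeping, the re-acquisition on wakeup would be this kind of relock by the owner.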
Comment 2 nkoch 2021-04-19 07:24:39 UTC
Yes, that is the thread that deadlocks.

I do not stop the process (to my knowledge).

I have a simple producer-consumer queue with two condition variables, not_empty and not_full, and one mutex (process-local, PTHREAD_MUTEX_ERRORCHECK, PTHREAD_PRIO_INHERIT).
EDEADLK comes from the consumer waiting for anything with timeout.
The timeout is 100ms only as there is a regular keep alive check of
this thread.
The producer never waits.

So it's basically this:

consumer thread:
  for (;;) {
    if (queue_empty())
      pthread_cond_timedwait()  // <-- EDEADLK once a month
    if (!queue_empty()) {
      read from queue
    }
  }

producer thread:
  if (!queue_full()) {
    write to queue
  }
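
The consumer side of the setup described above could be sketched in C roughly as follows. This is an illustration under assumptions, not the reporter's code: struct queue, queue_init(), queue_take() and the use_pi flag are hypothetical names, and the queue is reduced to an item counter. The mutex uses PTHREAD_MUTEX_ERRORCHECK and, when use_pi is non-zero, PTHREAD_PRIO_INHERIT, as in the report.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical queue: only the parts needed to show the wait. */
struct queue {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    int             count;      /* number of queued items */
};

/* Initialise the queue; returns 0 or an errno-style error code. */
int queue_init(struct queue *q, int use_pi)
{
    pthread_mutexattr_t attr;
    int rc;

    q->count = 0;
    if ((rc = pthread_mutexattr_init(&attr)) != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0 && use_pi)
        rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(&q->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&q->not_empty, NULL);
    if (rc != 0)
        pthread_mutex_destroy(&q->lock);
    return rc;
}

/*
 * Wait up to timeout_ms for an item. Returns 0 when an item was taken,
 * ETIMEDOUT when the queue stayed empty, or any other error code
 * (EDEADLK would show up here) so the caller can log it.
 */
int queue_take(struct queue *q, long timeout_ms)
{
    struct timespec deadline;
    int rc;

    /* Condition variables use CLOCK_REALTIME unless configured otherwise. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    if ((rc = pthread_mutex_lock(&q->lock)) != 0)
        return rc;
    /* Loop on the predicate: wakeups may be spurious. */
    while (q->count == 0 && rc == 0)
        rc = pthread_cond_timedwait(&q->not_empty, &q->lock, &deadline);
    if (q->count > 0) {
        q->count--;             /* "read from queue" */
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}
```

A consumer would call queue_take(&q, 100) in its loop and treat ETIMEDOUT as the normal keep-alive tick; any other non-zero value is the unexpected case discussed in this report.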

BTW, there are about 32 threads with different realtime priorities, and the program very often forks and execs other programs.

I have never seen this under FreeBSD 10.3 and 9.1 with the same software, but that may of course mean that there were other effects I did not see so far.
Comment 3 nkoch 2021-04-19 07:52:49 UTC
Do the cond and mutex memory regions look consistent?
Could it be that due to a sw bug I somehow corrupt these regions?
Comment 4 Konstantin Belousov freebsd_committer 2021-05-03 16:09:13 UTC
(In reply to nkoch from comment #3)
Of course I cannot exclude memory corruption, but both the condvar and the mutex look
consistent.  And the mutex seems to be owned by the thread that got the error, so the most likely
cause is that the mutex was not unlocked properly before sleeping on the condvar.

I read the code and do not see how it could happen.  The only possible case which
might cause some disturbance there is a stop.  Hm, could it be that some of your
execs failed?  Do you ever exec from a multithreaded process, or only exec after fork?
Comment 5 nkoch 2021-05-04 11:00:39 UTC
I do my execs after fork. Could it be possible that the threading subsystem stops threads when forking (atfork or something)?
Comment 6 Konstantin Belousov freebsd_committer 2021-05-05 17:36:03 UTC
After fork, in the child, there is only one thread, which is a copy of the thread
that issued the fork syscall.

Do you issue any threading calls after fork but before exec?
Comment 7 nkoch 2021-05-07 06:37:00 UTC
After fork I do reset some environment variables, reset realtime priority (which is raised in the parent process) and call signal() and sigprocmask() before execve(). I also have two atfork-handlers running in child context to close critical file handles.
But could that influence the parent process?
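
For reference, the fork path described above can be sketched as follows. The names child_after_fork(), install_child_handler() and fork_and_check() are hypothetical, the handler only sets a flag where the real one would close file handles, and the child calls _exit() where the real program calls execve(). The sketch only shows the generally known property that a child atfork handler runs in the child's copy of the address space; it does not settle whether the parent can be affected in the case reported here.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the child handler; each process has its own copy after fork. */
static volatile sig_atomic_t child_handler_ran;

/* Hypothetical child handler: the real one would close critical fds. */
static void child_after_fork(void)
{
    child_handler_ran = 1;
}

/* Register the handler for the child side of fork() only. */
int install_child_handler(void)
{
    return pthread_atfork(NULL, NULL, child_after_fork);
}

/*
 * Fork, run the usual pre-exec steps in the child, and return the child's
 * exit status: 0 if the child saw its handler run, 1 if not, -1 on error.
 */
int fork_and_check(void)
{
    sigset_t none;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: single-threaded copy of the forking thread. */
        signal(SIGPIPE, SIG_DFL);
        sigemptyset(&none);
        sigprocmask(SIG_SETMASK, &none, NULL);
        /* A real child would call execve() here. */
        _exit(child_handler_ran ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

After fork_and_check() returns 0, child_handler_ran is still 0 in the parent, because the handler modified only the child's copy of the variable.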