There are a lot more details here:
https://github.com/paulfloyd/freebsd_valgrind/issues/189
https://github.com/paulfloyd/freebsd_valgrind/issues/188
and discussion here:
https://lists.freebsd.org/archives/freebsd-hackers/2022-May/001154.html

It may be somewhat difficult to reproduce the error as it may be quite timing dependent.

Steps to reproduce:

Most importantly, you need a single-CPU i386 or amd64 system. That's possible with VirtualBox.

Clone and build the Valgrind source and regression tests:

$ git clone git://sourceware.org/git/valgrind.git

Back out a workaround for one of the issues:

$ git revert a4151207a28b506cdd1e7c095e4aebaaf289a384

Make sure that you have the ports for autoconf, automake, libtool and gmake installed, then run the following commands. If you have GCC installed you need to set the environment variables CC=clang CXX=clang++.

$ sh autogen.sh
$ ./configure
$ gmake
$ gmake check

Now you can try the two testcases that were causing problems:

$ perl tests/vg_regtest none/tests/tls

and

$ perl tests/vg_regtest none/tests/pth_2sig

If you want to run Valgrind without any wrapper scripts then do the following:

export VALGRIND_LAUNCHER=YOUR_PATH/valgrind/coregrind/valgrind
export VALGRIND_LIB=YOUR_PATH/valgrind/.in_place
export VALGRIND_LIB_INNER=YOUR_PATH/valgrind/.in_place

(where YOUR_PATH/valgrind is the path to the git repo cloned above)

cd to none/tests and run:

../../.in_place/none-amd64-freebsd --tool=none ./tls

or

../../.in_place/none-amd64-freebsd --tool=none ./pth_2sig
Thanks for the repro steps, I was able to trigger the problem locally. It's enough to pin all of the Valgrind threads to the same CPU:

/tmp/valgrind # cpuset -l 1 perl tests/vg_regtest none/tests/tls

Really there are two problems here.

First, it seems that one of the threads (the one switched out in ast()) is simply getting starved. There are several other always-runnable threads in the process that have a slightly higher scheduling priority, and they end up monopolizing the CPU. ULE has decided that the threads are "interactive", and in this case higher-priority threads are always scheduled first. With multiple CPUs available I suppose the problem may still exist, it just becomes harder to trigger since with more CPUs there's less chance that the starved thread will get stuck. I'm guessing that the stuck thread is the one which is supposed to be writing to the pipe.

The second problem is a livelock in the pipe code. It seems that pipelock()/pipeunlock() can cause reader threads to wake each other up in a loop even when there's nothing to do. This is because the wait channel used by readers and writers to signal each other is the same as the one used to serialize I/O operations among multiple concurrent readers. It's not too hard to write a standalone test program which triggers this, albeit unreliably.

I suspect the livelock causes the scheduler problem, since the reader threads keep yielding the CPU to sleep, and this boosts their interactivity score.
Created attachment 234493 [details]
standalone test case

Here's a test program. Run it with a single parameter, the number of reader threads to create. It'll create a named pipe, "fifo", in its current directory. The problem can be triggered by writing a character to it from a different terminal, while watching top(1) to see when CPU usage starts going up. Sometimes it takes quite a few tries before the problem occurs. It can help to force all threads onto a single CPU.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e8955bd643ee852d70a0b065f2a0d1bb3fa99df2

commit e8955bd643ee852d70a0b065f2a0d1bb3fa99df2
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-06-14 14:52:03 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-06-14 16:00:59 +0000

    pipe: Use a distinct wait channel for I/O serialization

    Suppose a thread tries to read from an empty pipe.  pipe_read() does
    the following:

    1. pipelock(), possibly sleeping
    2. check for buffered data
    3. pipeunlock()
    4. set PIPE_WANTR and sleep
    5. goto 1

    pipelock() is an open-coded mutex; if a thread blocks in pipelock(),
    it sleeps until the lock holder calls pipeunlock().  Both sleeps use
    the same wait channel.  So if there are multiple threads in
    pipe_read(), a thread T1 in step 3 can wake up a thread T2 sleeping
    in step 4.  Then T1 goes to sleep in step 4, and T2 acquires and
    releases the pipelock, waking up T1 again.  This can go on
    indefinitely, livelocking the process (and potentially starving a
    would-be writer).

    Fix the problem by using a separate wait channel for pipelock().

    Reported by:    Paul Floyd <paulf2718@gmail.com>
    Reviewed by:    mjg, kib
    PR:             264441
    MFC after:      1 month
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D35415

 sys/kern/sys_pipe.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a70c03b2d383011dece503716faf504598d5fe29

commit a70c03b2d383011dece503716faf504598d5fe29
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2022-06-14 14:52:03 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2022-07-14 13:50:10 +0000

    pipe: Use a distinct wait channel for I/O serialization

    Suppose a thread tries to read from an empty pipe.  pipe_read() does
    the following:

    1. pipelock(), possibly sleeping
    2. check for buffered data
    3. pipeunlock()
    4. set PIPE_WANTR and sleep
    5. goto 1

    pipelock() is an open-coded mutex; if a thread blocks in pipelock(),
    it sleeps until the lock holder calls pipeunlock().  Both sleeps use
    the same wait channel.  So if there are multiple threads in
    pipe_read(), a thread T1 in step 3 can wake up a thread T2 sleeping
    in step 4.  Then T1 goes to sleep in step 4, and T2 acquires and
    releases the pipelock, waking up T1 again.  This can go on
    indefinitely, livelocking the process (and potentially starving a
    would-be writer).

    Fix the problem by using a separate wait channel for pipelock().

    Reported by:    Paul Floyd <paulf2718@gmail.com>
    Reviewed by:    mjg, kib
    PR:             264441
    Sponsored by:   The FreeBSD Foundation

    (cherry picked from commit e8955bd643ee852d70a0b065f2a0d1bb3fa99df2)

 sys/kern/sys_pipe.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)