Created attachment 192211 [details]
The attached program repeatedly spawns a child process that creates a socketpair. One socket is passed back to the parent. One byte is written into the other end. The parent should be able to read this byte, but sometimes 0 (EOF) is returned.
It seems that the parent reuses stale socket/sockbuf memory from prior iterations.
To simplify the fd passing I've used fd_send/fd_recv from libnv. Those use SCM_RIGHTS under the hood. The program must be compiled with 'cc -O3 fdpass.c -lnv'.
Created attachment 192213 [details]
Dtrace script for debugging
Here is a Dtrace script that could be helpful in tracing down the issue.
Created attachment 192214 [details]
excerpt of Dtrace script output
Here is part of the Dtrace debug log.
When 'got err' is printed it means that the parent could not read the byte from the child. At this point the address of the receive sockbuf is 'fffff801071df148' and the SBS_CANTRCVMORE flag is set (CPU 1).
However, further above, this sockbuf looks fine in the child (CPU 3) and there aren't any calls to socantrcvmore_locked() in the meantime.
Even further above, this sockbuf was destroyed by the parent in a previous iteration (CPU 1) and therefore SBS_CANTRCVMORE is set. But this should not affect the current iteration. It looks like the memory from the child process on CPU 3 isn't made visible to CPU 1 properly.
Created attachment 192216 [details]
test program (fixed)
I've mistakenly uploaded the program with a workaround applied. Here is a fixed version with the workaround behind a ifdef.
To uncover the bug it helps to run multiple instances at the same time and to remove the debug output or to pipe it into /dev/null. After a few seconds, at least some instances should be dead.
I've tested only amd64 for now. I could reproduce it on stock FreeBSD 10/11 systems:
FreeBSD 10.3-RELEASE-p24 FreeBSD 10.3-RELEASE-p24 #0: Wed Nov 15 04:57:40 UTC 2017 firstname.lastname@example.org:/usr/obj/usr/src/sys/GENERIC amd64
hw.model: Intel(R) Celeron(R) CPU N3150 @ 1.60GHz
FreeBSD 11.1-RELEASE-p7 FreeBSD 11.1-RELEASE-p7 #0: Tue Mar 6 09:33:30 UTC 2018 email@example.com:/usr/obj/usr/src/sys/GENERIC amd64
hw.model: Intel(R) Celeron(R) CPU G530 @ 2.40GHz
Created attachment 192350 [details]
tar archive with test program, Dtrace script and log output
I've attached a program that reproduces the bug faster. I've also updated the Dtrace script and added some new log output. The Dtrace script now prints all socantrcvmore() calls, not only those from "a.out".
It turns out that the kernel sometimes closes the socket in the unp garbage collector (look for sockbuf fffff800adec3b50 in the debug log). So there probably is no cache issue after all.
Created attachment 192514 [details]
test program (faster reproduction)
Here is a better test program that should uncover the race in a few seconds at most. Compile with "cc fdpass-race-fast.c -pthread -lnv". I'm spawning another thread that just creates and closes socketpairs. This triggers the unp garbage collector constantly.
I believe the race is in soreceive_generic() in sys/kern/uipc_socket.c. The "in flight" domain socket is taken off the so_rcv sockbuf, but it is not externalized yet (dom_externalize() hasn't been called). This leads to the GC thinking that the socket is still "in flight" but it doesn't actually get marked because it is off the so_rcv sockbuf. Now the GC thinks this socket is unreachable and closes it.
Created attachment 192515 [details]
patch for a GC that detects cycles
I've tried to modify the GC so that it detects cycles in all "in flight" sockets and only destroys sockets that are in such a cycle. With this patch the test program doesn't exit anymore.
I wonder if there can be false negatives with a GC that just detects cycles. DragonFly has some interesting comments. DragonFly still suffers from this race, though.
It looks like Linux had the same race when they still used a mark & sweep GC instead of a cycle detecting one. Search for "rewrite AF_UNIX garbage collector".
The BUGS section in the man page for send(2) looks related to this problem. This seems to be a long standing issue.
A commit references this bug:
Date: Tue May 15 15:56:52 UTC 2018
New revision: 333634
Added a new socketpair() test scenario.
Obtained from: firstname.lastname@example.org
Triage at 2019 Waterloo Hackathon, CC markj@ and kib@ for investigation
Modified version of patch for review at https://reviews.freebsd.org/D23142
*** Bug 239803 has been marked as a duplicate of this bug. ***
A commit references this bug:
Date: Sat Jan 25 08:57:27 UTC 2020
New revision: 357110
Implement cycle-detecting garbage collector for AF_UNIX sockets
The existing AF_UNIX socket garbage collector destroys any socket
which may potentially be in a cycle, as indicated by its file reference
count being equal to its enqueue count. However, this can produce false
positives for in-flight sockets which aren't part of a cycle but are
part of one or more SCM_RIGHTS mssages and which have been closed
on the sending side. If the garbage collector happens to run at
exactly the wrong time, destruction of these sockets will render them
unusable on the receiving side, such that no previously-written data
may be read.
This change rewrites the garbage collector to precisely detect cycles:
1. The existing check of msgcount==f_count is still used to determine
whether the socket is potentially in a cycle.
2. The socket is now placed on a local "dead list", which is used to
reduce iteration time (and therefore contention on the global
3. The first pass through the dead list removes each potentially-dead
socket's outgoing references from the graph of potentially-dead
sockets, using a gc-specific copy of the original reference count.
4. The second series of passes through the dead list removes from the
list any socket whose remaining gc refcount is non-zero, as this
indicates the socket is actually accessible outside of any possible
cycle. Iteration is repeated until no further sockets are removed
from the dead list.
5. Sockets remaining in the dead list are destroyed as before.
Submitted by: email@example.com (prior version)
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D23142