Suppose I try to read from a SEQPACKET PF_LOCAL socket with MSG_PEEK set. We find the extent of the mbuf chain whose contents we're going to return: 1482 if (m->m_flags & M_EOR) { 1483 last = STAILQ_NEXT(m, m_stailq); 1484 lastlen = 0; 1485 flags |= MSG_EOR; 1486 break; 1487 } Later, we loop over mbufs in the socket buffer and stop once we see "last". The problem is that we drop the rx socket buffer buffer lock before that loop. So, if "last" is NULL, and the sender adds new a new record to the buffer before the loop (possible because the socket buffer lock is not held by the receiver), then we will return more than one record's worth of data: 1632 for (m = first; m != last; m = next) { 1633 next = STAILQ_NEXT(m, m_stailq); 1634 error = uiomove(mtod(m, char *), m->m_len, uio); 1635 if (__predict_false(error)) { 1636 SOCK_IO_RECV_UNLOCK(so); 1637 if (!peek) 1638 for (; m != last; m = next) { 1639 next = STAILQ_NEXT(m, m_stailq); 1640 m_free(m); 1641 } 1642 return (error); 1643 } 1644 if (!peek) 1645 m_free(m); 1646 } That is, the "m != last" check only works as intended if last != NULL or MSG_PEEK is not set. If last == NULL, then we will traverse the whole socket buffer. More generally, this pattern is sketchy: how do we know that some other routine isn't going to free the mbufs out from under us? The recv I/O lock synchronizes with other recv() calls, but there are other uipc functions like uipc_sendfile() which fiddle with the receive buffer without that lock held.
Gleb, could you please take a look?
(In reply to Mark Johnston from comment #1) Gleb, if you don't have time to work on this, please let me know soon and I will try to fix this for 15.0.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=69f61cee2efb1eec0640ca7de9b2d51599569a5d commit 69f61cee2efb1eec0640ca7de9b2d51599569a5d Author: Gleb Smirnoff <glebius@FreeBSD.org> AuthorDate: 2025-11-14 02:39:48 +0000 Commit: Gleb Smirnoff <glebius@FreeBSD.org> CommitDate: 2025-11-14 02:39:48 +0000 unix/stream: fix a race with MSG_PEEK on SOCK_SEQPACKET with MSG_EOR The pr_soreceive method first scans the buffer holding the both I/O sx(9) and socket buffer mutex(9) and after figuring out how much needs to be copied out drops the mutex. Since the other side may only append to the buffer, it is safe to continue the operation holding the sx(9) only. However, the code had a bug that it used pointer in the very last mbuf as marker of the place where to stop. This worked both in a case when we drain a buffer completely (marker points at NULL) and in a case when we wanted to stop at MSG_EOR (marker points at next mbuf after MSG_EOR). However, this pointer is not consistent after we dropped the socket buffer mutex. Rewrite the logic to use the data length as bounds for the copyout cycle. Provide a test case that reproduces the race. Note that the race is very hard to hit, thus test will pass on unmodified kernel as well. In a virtual machine I needed to add tsleep(9) for 10 nanoseconds into the middle of function to be able to reproduce. PR: 290658 Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D53632 Fixes: d15792780760ef94647af9b377b5f0a80e1826bc sys/kern/uipc_usrreq.c | 87 +++++++++++++++++------------------- tests/sys/kern/unix_seqpacket_test.c | 62 +++++++++++++++++++++++++ 2 files changed, 103 insertions(+), 46 deletions(-)
Should we be trying to get this into 15.0?
(In reply to Colin Percival from comment #4) I'm not sure. It's an annoying bug but not critical. On the other hand, the patch which fixes it is non-trivial and early iterations of it in phabricator introduced even worse bugs. So I'd be inclined to just omit this for 15.0.
Ok, if this turns out to be an issue for a lot of people we can do an EN for it, but otherwise it can wait for 15.1.