Bug 290658 - race condition in SOCK_SEQPACKET unix sockets
Summary: race condition in SOCK_SEQPACKET unix sockets
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 15.0-CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: Gleb Smirnoff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-10-29 15:23 UTC by Mark Johnston
Modified: 2025-11-21 16:15 UTC (History)
2 users (show)

See Also:
markj: mfc-stable15+
markj: mfc-stable14-
markj: mfc-stable13-


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Johnston freebsd_committer freebsd_triage 2025-10-29 15:23:09 UTC
Suppose I try to read from a SEQPACKET PF_LOCAL socket with MSG_PEEK set.  We find the extent of the mbuf chain whose contents we're going to return:

1482                         if (m->m_flags & M_EOR) {                                                                                                                                                                                                                                                                        
1483                                 last = STAILQ_NEXT(m, m_stailq);                                                                                                                                                                                                                                                         
1484                                 lastlen = 0;                                                                                                                                                                                                                                                                             
1485                                 flags |= MSG_EOR;                                                                                                                                                                                                                                                                        
1486                                 break;                                                                                                                                                                                                                                                                                   
1487                         }

Later, we loop over mbufs in the socket buffer and stop once we see "last".  The problem is that we drop the rx socket buffer buffer lock before that loop.  So, if "last" is NULL, and the sender adds new a new record to the buffer before the loop (possible because the socket buffer lock is not held by the receiver), then we will return more than one record's worth of data:

1632         for (m = first; m != last; m = next) {                                                                                                                                                                                                                                                                           
1633                 next = STAILQ_NEXT(m, m_stailq);                                                                                                                                                                                                                                                                         
1634                 error = uiomove(mtod(m, char *), m->m_len, uio);                                                                                                                                                                                                                                                         
1635                 if (__predict_false(error)) {                                                                                                                                                                                                                                                                            
1636                         SOCK_IO_RECV_UNLOCK(so);                                                                                                                                                                                                                                                                         
1637                         if (!peek)                                                                                                                                                                                                                                                                                       
1638                                 for (; m != last; m = next) {                                                                                                                                                                                                                                                            
1639                                         next = STAILQ_NEXT(m, m_stailq);                                                                                                                                                                                                                                                 
1640                                         m_free(m);                                                                                                                                                                                                                                                                       
1641                                 }                                                                                                                                                                                                                                                                                        
1642                         return (error);                                                                                                                                                                                                                                                                                  
1643                 }                                                                                                                                                                                                                                                                                                        
1644                 if (!peek)                                                                                                                                                                                                                                                                                               
1645                         m_free(m);                                                                                                                                                                                                                                                                                       
1646         }

That is, the "m != last" check only works as intended if last != NULL or MSG_PEEK is not set.  If last == NULL, then we will traverse the whole socket buffer.

More generally, this pattern is sketchy: how do we know that some other routine isn't going to free the mbufs out from under us?  The recv I/O lock synchronizes with other recv() calls, but there are other uipc functions like uipc_sendfile() which fiddle with the receive buffer without that lock held.
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2025-10-29 15:23:37 UTC
Gleb, could you please take a look?
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2025-11-04 16:37:43 UTC
(In reply to Mark Johnston from comment #1)
Gleb, if you don't have time to work on this, please let me know soon and I will try to fix this for 15.0.
Comment 3 commit-hook freebsd_committer freebsd_triage 2025-11-14 02:40:32 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=69f61cee2efb1eec0640ca7de9b2d51599569a5d

commit 69f61cee2efb1eec0640ca7de9b2d51599569a5d
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2025-11-14 02:39:48 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2025-11-14 02:39:48 +0000

    unix/stream: fix a race with MSG_PEEK on SOCK_SEQPACKET with MSG_EOR

    The pr_soreceive method first scans the buffer holding the both I/O sx(9)
    and socket buffer mutex(9) and after figuring out how much needs to be
    copied out drops the mutex.  Since the other side may only append to the
    buffer, it is safe to continue the operation holding the sx(9) only.
    However, the code had a bug that it used pointer in the very last mbuf as
    marker of the place where to stop.  This worked both in a case when we
    drain a buffer completely (marker points at NULL) and in a case when we
    wanted to stop at MSG_EOR (marker points at next mbuf after MSG_EOR).
    However, this pointer is not consistent after we dropped the socket buffer
    mutex.

    Rewrite the logic to use the data length as bounds for the copyout cycle.

    Provide a test case that reproduces the race.  Note that the race is very
    hard to hit, thus test will pass on unmodified kernel as well.  In a
    virtual machine I needed to add tsleep(9) for 10 nanoseconds into the
    middle of function to be able to reproduce.

    PR:                     290658
    Reviewed by:            markj
    Differential Revision:  https://reviews.freebsd.org/D53632
    Fixes:                  d15792780760ef94647af9b377b5f0a80e1826bc

 sys/kern/uipc_usrreq.c               | 87 +++++++++++++++++-------------------
 tests/sys/kern/unix_seqpacket_test.c | 62 +++++++++++++++++++++++++
 2 files changed, 103 insertions(+), 46 deletions(-)
Comment 4 Colin Percival freebsd_committer freebsd_triage 2025-11-15 17:19:43 UTC
Should we be trying to get this into 15.0?
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2025-11-18 16:53:23 UTC
(In reply to Colin Percival from comment #4)
I'm not sure.  It's an annoying bug but not critical.  On the other hand, the patch which fixes it is non-trivial and early iterations of it in phabricator introduced even worse bugs.  So I'd be inclined to just omit this for 15.0.
Comment 6 Colin Percival freebsd_committer freebsd_triage 2025-11-18 17:01:05 UTC
Ok, if this turns out to be an issue for a lot of people we can do an EN for it, but otherwise it can wait for 15.1.