Bug 212716

Summary: recv() with MSG_WAITALL doesn't always unblock on EOF
Product: Base System Reporter: Lewis Donzis <lew>
Component: kernAssignee: Mark Johnston <markj>
Status: Closed FIXED    
Severity: Affects Many People CC: greg, lew, markj
Priority: ---    
Version: 13.2-RELEASE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
echeck socket receive buffer for errors after dropping then regaining the lock in soreceive_generic().
none
sockets: check socket state after call to pr_rcvd() in soreceive_generic()
none
regression tests
none
sockets: re-check socket state after call to pr_rcvd()
none
ockets: re-check socket state after call to pr_rcvd() none

Description Lewis Donzis 2016-09-16 00:39:57 UTC
We have a simple test program to illustrate the problem.

Run "server" on a machine, and then run "client" on the same machine.

Server creates a listening socket and loops accepting connections, sending a little data, closing the socket, and then back to the top of the loop.

Client creates a socket, connects to the server, does a recv() with MSG_WAITALL, closes the socket, and loops.

This runs anywhere from one to a few dozen times and then hangs.  The server socket is in FIN_WAIT_2, and the client socket is in CLOSE_WAIT.  So the client side is waiting for the application to close the socket, but it's still stuck in the recv(), never being awakened by the EOF from the server closing the socket.

The same code runs on Linux and QNX without any problem.

This seems ridiculously simple and far reaching.  Seems like we must be overlooking something, but it's a really simple test case.

client.c:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <err.h>

int main (int argc, char *argv[])
{
   int c;
   bool verbose = false;
   char *addr = "127.0.0.1";

   signal(SIGPIPE, SIG_IGN);
   while ((c = getopt(argc, argv, "v")) != -1) {
      switch (c) {
         case 'v': ++verbose;    break;
         default:  return 1;     break;
         }
      }
   argc -= optind - 1;
   argv += optind - 1;

   if (argc > 1) addr = argv[1];

   for (int try = 1;; try++) {
      printf("Try %d\n", try);
      int s;
      if ((s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
         err(1, "socket");
   
      struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(79), .sin_addr.s_addr = inet_addr(addr) };
      if (connect(s, (struct sockaddr *)&sa, sizeof sa) < 0)
         err(1, "connect");

      // get the response
      char rbuf [4096];
      int rcvLen;
      if ((rcvLen = recv(s, rbuf, sizeof rbuf, MSG_WAITALL)) < 0)
         err(1, "recv");
      if (verbose) printf("Received: '%.*s'\n", rcvLen, rbuf);
      close(s);
      }

   return 0;
   }


server.c:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <ctype.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <err.h>

int main (int argc, char *argv[])
{
   int c;
   bool verbose = false;

   signal(SIGPIPE, SIG_IGN);
   while ((c = getopt(argc, argv, "v")) != -1) {
      switch (c) {
         case 'v': ++verbose;    break;
         default:  return 1;     break;
         }
      }
   argc -= optind - 1;
   argv += optind - 1;

   int s0;
   if ((s0 = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0)
      err(1, "socket");
   int on = true;
   if (setsockopt(s0, SOL_SOCKET, SO_REUSEPORT, &on, sizeof on) < 0)
      err(1, "setsockopt");
   struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(79) };

   if (bind(s0, (struct sockaddr *)&sa, sizeof sa) < 0)
      err(1, "bind");
   if (listen(s0, 1000) < 0)
      err(1, "listen");

   printf("Listening\n");

   for (int try = 1;; try++) {
      socklen_t sl;
      int s = accept(s0, (struct sockaddr *)&sa, &sl);
      if (s < 0) err(1, "accept");
      printf("Try %d\n", try);

      // send the response
      static char sbuf [] = "Hello from server\n";
      if (send(s, sbuf, sizeof sbuf - 1, 0) < 0)
         err(1, "send");
      if (verbose) printf("Sent:     %s", sbuf);
      shutdown(s, SHUT_WR);
      close(s);
      }

   return 0;
   }
Comment 1 Lewis Donzis 2016-09-16 01:26:56 UTC
btw, it makes no difference whether it's run in the same machine (to localhost or to the machine's own IP address) or across the network, or whether it's IPv4 or IPv6.  In all cases, the recv() never unblocks when the peer closes the socket.  Also, the shutdown() was added just to test, it makes no difference whether it's there or not.
Comment 2 Lewis Donzis 2016-09-17 12:11:24 UTC
The same thing happens if we change the test to use a UNIX domain socket.  So it's unrelated to TCP and appears to be a more general problem in the socket layer.
Comment 3 Lewis Donzis 2016-11-07 14:23:19 UTC
A little more info... the problem appears to occur only if exactly one segment has been sent from server to client when the connection is closed.  If no data has been sent, or if more than one segment has been sent, then it behaves properly.
Comment 4 Lewis Donzis 2023-08-29 22:52:51 UTC
This bug still occurs in FreeBSD 13.2, seven years since it was first reported.
Comment 5 Greg Becker 2023-08-30 15:37:28 UTC
I was able to reproduce this problem on a freshly installed 13.2-RELEASE with no custom configuration:

FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64
FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
VT(efifb): resolution 1024x768
CPU: Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz (2600.12-MHz K8-class CPU)


FWIW, the problem seems to go away if you add the following to /boot/loader.conf.local and reboot:

net.inet.tcp.soreceive_stream="1"

I haven't done any further investigation...
Comment 6 Greg Becker 2023-08-30 22:34:59 UTC
I am also able to repro with 15.0-CURRENT.

The problem seems to be that we are not rechecking the socket receive buffer for errors or to see if the peer closed after calling pr_rcvd() (around line 2430 in file uipc_socket.c), having dropped the socket receive buffer lock before the call and then reacquiring it after the call.  Which is to say, the socket state can change once the receive buffer lock is dropped, and hence previous assertions made while the lock was held cannot be relied upon...

I've attached an initial patch, but I need to spend more time reviewing the code, as it's not clear to me yet whether we should be exiting the MSG_WAITALL loop at line 2522 if the receive buffer contains valid data.  Perhaps someone more familiar with this code and chime in...
Comment 7 Greg Becker 2023-08-30 22:36:51 UTC
Created attachment 244510 [details]
echeck socket receive buffer for errors after dropping then regaining the lock in soreceive_generic().
Comment 8 Greg Becker 2023-08-31 14:57:46 UTC
Created attachment 244523 [details]
sockets: check socket state after call to pr_rcvd() in soreceive_generic()
Comment 9 Mark Johnston freebsd_committer freebsd_triage 2023-08-31 16:07:43 UTC
(In reply to Greg Becker from comment #6)
In addition, sbwait() should probably assert that SBS_CANTRECVMORE and SBS_CANTSENDMORE are not set for SO_RCV and SO_SND directions, respectively.

Regarding the patch, the new check should happen in the "pr->pr_flags & PR_WANTRCVD" block, since that's where the sockbuf lock is dropped.  Otherwise the existing check is effectively duplicated.

The general behaviour of soreceive_generic() appears to be to consume data as long as it's present, even in the face of SBS_CANTRECVMORE and socket errors; see the "goto dontblock" statements near the beginning of the function.  soreceive_stream() similarly drains the receive buffer before handling termination.  I'd argue for the same approach in the MSG_WAITALL loop.
Comment 10 Mark Johnston freebsd_committer freebsd_triage 2023-08-31 17:33:11 UTC
Created attachment 244533 [details]
regression tests

Here's a patch which adds regression tests to exercise the bug.  The seqpacket test always succeeds but I think that's to be expected; MSG_WAITALL semantics don't really make sense for that socket type.
Comment 11 Greg Becker 2023-09-01 00:32:24 UTC
Thanks Mark, agreed.  I've moved the check up to the pr_rcvd'd block and made it contingent upon sb_mb being NULL (which is what I was trying to leverage in the original patch), and that seems to be working well.

I've also added a KASSERT to sbwait() as suggested.  Haven't tripped it yet, but I expect we'll want to thoroughly test it.  I'll post the patch in a bit, but I want to spend some more time investigating soreceive_generic() for any holes where it might bail out having not delivered all available data first.
Comment 12 Greg Becker 2023-09-01 13:28:04 UTC
Created attachment 244557 [details]
sockets: re-check socket state after call to pr_rcvd()

I tested the KASSERT in sbwait without the fix to soreceive_generic() and it panics as expected.  I am now currently running kyua test to evaluate the before/after results.

I'm not too worried about the apparent double test of (sb_mb == NULL), as I suspect the optimizer will eliminate the redundant test.

I still haven't fully evaluated whether all data in the socket will be fully delivered before any socket errors or closed states are acted upon.
Comment 13 Greg Becker 2023-09-08 15:14:46 UTC
I went ahead and created a differential for this bug:

https://reviews.freebsd.org/D41783
Comment 14 Greg Becker 2023-09-09 13:33:25 UTC
Created attachment 244732 [details]
ockets: re-check socket state after call to pr_rcvd()

This patch reflects the differential at https://reviews.freebsd.org/D41783
Comment 15 commit-hook freebsd_committer freebsd_triage 2023-09-16 09:27:59 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=38426b32e106cf251520259b29779372eda7a5f8

commit 38426b32e106cf251520259b29779372eda7a5f8
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-08-31 17:29:29 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-09-15 16:59:46 +0000

    socket tests: Add a regression test for MSG_WAITALL

    PR:             212716
    MFC after:      1 week
    Sponsored by:   The FreeBSD Foundation

 tests/sys/kern/Makefile                   |   2 +
 tests/sys/kern/socket_msg_waitall.c (new) | 181 ++++++++++++++++++++++++++++++
 2 files changed, 183 insertions(+)
Comment 16 commit-hook freebsd_committer freebsd_triage 2023-09-16 09:28:00 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f44102411e1b16363e91a074c9c403069c550e4d

commit f44102411e1b16363e91a074c9c403069c550e4d
Author:     Greg Becker <becker.greg@att.net>
AuthorDate: 2023-09-09 23:38:24 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-09-15 16:59:46 +0000

    sockets: re-check socket state after call to pr_rcvd()

    Socket state may have changed after dropping the receive
    buffer lock in order to call pr_rcvd().  If the buffer is
    empty, re-check the state after reaquiring the lock and
    skip calling sbwait() if the socket is in error or the
    peer has closed.

    PR:             212716
    Reviewed by:    markj, glebius
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D41783

 sys/kern/uipc_socket.c | 4 ++++
 1 file changed, 4 insertions(+)
Comment 17 commit-hook freebsd_committer freebsd_triage 2023-09-27 12:53:14 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e4ed8e67e206c069955d541de41addcd0da05b82

commit e4ed8e67e206c069955d541de41addcd0da05b82
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-08-31 17:29:29 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-09-27 12:33:50 +0000

    socket tests: Add a regression test for MSG_WAITALL

    PR:             212716
    MFC after:      1 week
    Sponsored by:   The FreeBSD Foundation

    (cherry picked from commit 38426b32e106cf251520259b29779372eda7a5f8)

 tests/sys/kern/Makefile                   |   2 +
 tests/sys/kern/socket_msg_waitall.c (new) | 181 ++++++++++++++++++++++++++++++
 2 files changed, 183 insertions(+)
Comment 18 commit-hook freebsd_committer freebsd_triage 2023-09-27 12:53:15 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=862d2d514cb782353bce1af99d670db08b1e6c3c

commit 862d2d514cb782353bce1af99d670db08b1e6c3c
Author:     Greg Becker <becker.greg@att.net>
AuthorDate: 2023-09-09 23:38:24 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-09-27 12:33:08 +0000

    sockets: re-check socket state after call to pr_rcvd()

    Socket state may have changed after dropping the receive
    buffer lock in order to call pr_rcvd().  If the buffer is
    empty, re-check the state after reaquiring the lock and
    skip calling sbwait() if the socket is in error or the
    peer has closed.

    PR:             212716
    Reviewed by:    markj, glebius
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D41783

    (cherry picked from commit f44102411e1b16363e91a074c9c403069c550e4d)

 sys/kern/uipc_socket.c | 4 ++++
 1 file changed, 4 insertions(+)
Comment 19 commit-hook freebsd_committer freebsd_triage 2023-09-27 12:54:16 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=811b3d0f0fe5f1cf6b6fbe4a14103bcab9c400df

commit 811b3d0f0fe5f1cf6b6fbe4a14103bcab9c400df
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-08-31 17:29:29 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-09-27 12:33:24 +0000

    socket tests: Add a regression test for MSG_WAITALL

    PR:             212716
    MFC after:      1 week
    Sponsored by:   The FreeBSD Foundation

    (cherry picked from commit 38426b32e106cf251520259b29779372eda7a5f8)

 tests/sys/kern/Makefile                   |   2 +
 tests/sys/kern/socket_msg_waitall.c (new) | 181 ++++++++++++++++++++++++++++++
 2 files changed, 183 insertions(+)
Comment 20 commit-hook freebsd_committer freebsd_triage 2023-09-27 12:54:17 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=caaea455773898bee9a0c59466a84256192e0980

commit caaea455773898bee9a0c59466a84256192e0980
Author:     Greg Becker <becker.greg@att.net>
AuthorDate: 2023-09-09 23:38:24 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-09-27 12:33:05 +0000

    sockets: re-check socket state after call to pr_rcvd()

    Socket state may have changed after dropping the receive
    buffer lock in order to call pr_rcvd().  If the buffer is
    empty, re-check the state after reaquiring the lock and
    skip calling sbwait() if the socket is in error or the
    peer has closed.

    PR:             212716
    Reviewed by:    markj, glebius
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D41783

    (cherry picked from commit f44102411e1b16363e91a074c9c403069c550e4d)

 sys/kern/uipc_socket.c | 4 ++++
 1 file changed, 4 insertions(+)
Comment 21 Mark Johnston freebsd_committer freebsd_triage 2023-09-27 12:55:59 UTC
I'll get this into 14.0 as well.  Thanks to Greg for taking a look at this.
Comment 22 commit-hook freebsd_committer freebsd_triage 2023-09-28 20:42:21 UTC
A commit in branch releng/14.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d7ee24d0d5aed1180e9f65058c3a5ff45a52210e

commit d7ee24d0d5aed1180e9f65058c3a5ff45a52210e
Author:     Greg Becker <becker.greg@att.net>
AuthorDate: 2023-09-09 23:38:24 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-09-28 20:41:09 +0000

    sockets: re-check socket state after call to pr_rcvd()

    Socket state may have changed after dropping the receive
    buffer lock in order to call pr_rcvd().  If the buffer is
    empty, re-check the state after reaquiring the lock and
    skip calling sbwait() if the socket is in error or the
    peer has closed.

    Approved by:    re (gjb)
    PR:             212716
    Reviewed by:    markj, glebius
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D41783

    (cherry picked from commit f44102411e1b16363e91a074c9c403069c550e4d)
    (cherry picked from commit caaea455773898bee9a0c59466a84256192e0980)

 sys/kern/uipc_socket.c | 4 ++++
 1 file changed, 4 insertions(+)