We have a simple test program to illustrate the problem. Run "server" on a machine, and then run "client" on the same machine. Server creates a listening socket and loops accepting connections, sending a little data, closing the socket, and then back to the top of the loop. Client creates a socket, connects to the server, does a recv() with MSG_WAITALL, closes the socket, and loops. This runs anywhere from one to a few dozen times and then hangs. The server socket is in FIN_WAIT_2, and the client socket is in CLOSE_WAIT. So the client side is waiting for the application to close the socket, but it's still stuck in the recv(), never being awakened by the EOF from the server closing the socket. The same code runs on Linux and QNX without any problem. This seems ridiculously simple and far reaching. Seems like we must be overlooking something, but it's a really simple test case. client.c: #include <stdio.h> #include <stdlib.h> #include <stdbool.h> #include <signal.h> #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #include <unistd.h> #include <arpa/inet.h> #include <err.h> int main (int argc, char *argv[]) { int c; bool verbose = false; char *addr = "127.0.0.1"; signal(SIGPIPE, SIG_IGN); while ((c = getopt(argc, argv, "v")) != -1) { switch (c) { case 'v': ++verbose; break; default: return 1; break; } } argc -= optind - 1; argv += optind - 1; if (argc > 1) addr = argv[1]; for (int try = 1;; try++) { printf("Try %d\n", try); int s; if ((s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) err(1, "socket"); struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(79), .sin_addr.s_addr = inet_addr(addr) }; if (connect(s, (struct sockaddr *)&sa, sizeof sa) < 0) err(1, "connect"); // get the response char rbuf [4096]; int rcvLen; if ((rcvLen = recv(s, rbuf, sizeof rbuf, MSG_WAITALL)) < 0) err(1, "recv"); if (verbose) printf("Received: '%.*s'\n", rcvLen, rbuf); close(s); } return 0; } server.c: #include <stdio.h> #include <stdlib.h> #include <stdbool.h> #include <ctype.h> #include <signal.h> #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #include <unistd.h> #include <arpa/inet.h> #include <err.h> int main (int argc, char *argv[]) { int c; bool verbose = false; signal(SIGPIPE, SIG_IGN); while ((c = getopt(argc, argv, "v")) != -1) { switch (c) { case 'v': ++verbose; break; default: return 1; break; } } argc -= optind - 1; argv += optind - 1; int s0; if ((s0 = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) err(1, "socket"); int on = true; if (setsockopt(s0, SOL_SOCKET, SO_REUSEPORT, &on, sizeof on) < 0) err(1, "setsockopt"); struct sockaddr_in sa = { .sin_family = AF_INET, .sin_port = htons(79) }; if (bind(s0, (struct sockaddr *)&sa, sizeof sa) < 0) err(1, "bind"); if (listen(s0, 1000) < 0) err(1, "listen"); printf("Listening\n"); for (int try = 1;; try++) { socklen_t sl; int s = accept(s0, (struct sockaddr *)&sa, &sl); if (s < 0) err(1, "accept"); printf("Try %d\n", try); // send the response static char sbuf [] = "Hello from server\n"; if (send(s, sbuf, sizeof sbuf - 1, 0) < 0) err(1, "send"); if (verbose) printf("Sent: %s", sbuf); shutdown(s, SHUT_WR); close(s); } return 0; }
btw, it makes no difference whether it's run in the same machine (to localhost or to the machine's own IP address) or across the network, or whether it's IPv4 or IPv6. In all cases, the recv() never unblocks when the peer closes the socket. Also, the shutdown() was added just to test, it makes no difference whether it's there or not.
The same thing happens if we change the test to use a UNIX domain socket. So it's unrelated to TCP and appears to be a more general problem in the socket layer.
A little more info... the problem appears to occur only if exactly one segment has been sent from server to client when the connection is closed. If no data has been sent, or if more than one segment has been sent, then it behaves properly.
This bug still occurs in FreeBSD 13.2, seven years since it was first reported.
I was able to reproduce this problem on a freshly installed 13.2-RELEASE with no custom configuration: FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64 FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c) VT(efifb): resolution 1024x768 CPU: Intel(R) Xeon(R) CPU E5-2697A v4 @ 2.60GHz (2600.12-MHz K8-class CPU) FWIW, the problem seems to go away if you add the following to /boot/loader.conf.local and reboot: net.inet.tcp.soreceive_stream="1" I haven't done any further investigation...
I am also able to repro with 15.0-CURRENT. The problem seems to be that we are not rechecking the socket receive buffer for errors or to see if the peer closed after calling pr_rcvd() (around line 2430 in file uipc_socket.c), having dropped the socket receive buffer lock before the call and then reacquiring it after the call. Which is to say, the socket state can change once the receive buffer lock is dropped, and hence previous assertions made while the lock was held cannot be relied upon... I've attached an initial patch, but I need to spend more time reviewing the code, as it's not clear to me yet whether we should be exiting the MSG_WAITALL loop at line 2522 if the receive buffer contains valid data. Perhaps someone more familiar with this code and chime in...
Created attachment 244510 [details] echeck socket receive buffer for errors after dropping then regaining the lock in soreceive_generic().
Created attachment 244523 [details] sockets: check socket state after call to pr_rcvd() in soreceive_generic()
(In reply to Greg Becker from comment #6) In addition, sbwait() should probably assert that SBS_CANTRECVMORE and SBS_CANTSENDMORE are not set for SO_RCV and SO_SND directions, respectively. Regarding the patch, the new check should happen in the "pr->pr_flags & PR_WANTRCVD" block, since that's where the sockbuf lock is dropped. Otherwise the existing check is effectively duplicated. The general behaviour of soreceive_generic() appears to be to consume data as long as it's present, even in the face of SBS_CANTRECVMORE and socket errors; see the "goto dontblock" statements near the beginning of the function. soreceive_stream() similarly drains the receive buffer before handling termination. I'd argue for the same approach in the MSG_WAITALL loop.
Created attachment 244533 [details] regression tests Here's a patch which adds regression tests to exercise the bug. The seqpacket test always succeeds but I think that's to be expected; MSG_WAITALL semantics don't really make sense for that socket type.
Thanks Mark, agreed. I've moved the check up to the pr_rcvd'd block and made it contingent upon sb_mb being NULL (which is what I was trying to leverage in the original patch), and that seems to be working well. I've also added a KASSERT to sbwait() as suggested. Haven't tripped it yet, but I expect we'll want to thoroughly test it. I'll post the patch in a bit, but I want to spend some more time investigating soreceive_generic() for any holes where it might bail out having not delivered all available data first.
Created attachment 244557 [details] sockets: re-check socket state after call to pr_rcvd() I tested the KASSERT in sbwait without the fix to soreceive_generic() and it panics as expected. I am now currently running kyua test to evaluate the before/after results. I'm not too worried about the apparent double test of (sb_mb == NULL), as I suspect the optimizer will eliminate the redundant test. I still haven't fully evaluated whether all data in the socket will be fully delivered before any socket errors or closed states are acted upon.
I went ahead and created a differential for this bug: https://reviews.freebsd.org/D41783
Created attachment 244732 [details] ockets: re-check socket state after call to pr_rcvd() This patch reflects the differential at https://reviews.freebsd.org/D41783
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=38426b32e106cf251520259b29779372eda7a5f8 commit 38426b32e106cf251520259b29779372eda7a5f8 Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2023-08-31 17:29:29 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-09-15 16:59:46 +0000 socket tests: Add a regression test for MSG_WAITALL PR: 212716 MFC after: 1 week Sponsored by: The FreeBSD Foundation tests/sys/kern/Makefile | 2 + tests/sys/kern/socket_msg_waitall.c (new) | 181 ++++++++++++++++++++++++++++++ 2 files changed, 183 insertions(+)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=f44102411e1b16363e91a074c9c403069c550e4d commit f44102411e1b16363e91a074c9c403069c550e4d Author: Greg Becker <becker.greg@att.net> AuthorDate: 2023-09-09 23:38:24 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-09-15 16:59:46 +0000 sockets: re-check socket state after call to pr_rcvd() Socket state may have changed after dropping the receive buffer lock in order to call pr_rcvd(). If the buffer is empty, re-check the state after reaquiring the lock and skip calling sbwait() if the socket is in error or the peer has closed. PR: 212716 Reviewed by: markj, glebius MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D41783 sys/kern/uipc_socket.c | 4 ++++ 1 file changed, 4 insertions(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=e4ed8e67e206c069955d541de41addcd0da05b82 commit e4ed8e67e206c069955d541de41addcd0da05b82 Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2023-08-31 17:29:29 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-09-27 12:33:50 +0000 socket tests: Add a regression test for MSG_WAITALL PR: 212716 MFC after: 1 week Sponsored by: The FreeBSD Foundation (cherry picked from commit 38426b32e106cf251520259b29779372eda7a5f8) tests/sys/kern/Makefile | 2 + tests/sys/kern/socket_msg_waitall.c (new) | 181 ++++++++++++++++++++++++++++++ 2 files changed, 183 insertions(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=862d2d514cb782353bce1af99d670db08b1e6c3c commit 862d2d514cb782353bce1af99d670db08b1e6c3c Author: Greg Becker <becker.greg@att.net> AuthorDate: 2023-09-09 23:38:24 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-09-27 12:33:08 +0000 sockets: re-check socket state after call to pr_rcvd() Socket state may have changed after dropping the receive buffer lock in order to call pr_rcvd(). If the buffer is empty, re-check the state after reaquiring the lock and skip calling sbwait() if the socket is in error or the peer has closed. PR: 212716 Reviewed by: markj, glebius MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D41783 (cherry picked from commit f44102411e1b16363e91a074c9c403069c550e4d) sys/kern/uipc_socket.c | 4 ++++ 1 file changed, 4 insertions(+)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=811b3d0f0fe5f1cf6b6fbe4a14103bcab9c400df commit 811b3d0f0fe5f1cf6b6fbe4a14103bcab9c400df Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2023-08-31 17:29:29 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-09-27 12:33:24 +0000 socket tests: Add a regression test for MSG_WAITALL PR: 212716 MFC after: 1 week Sponsored by: The FreeBSD Foundation (cherry picked from commit 38426b32e106cf251520259b29779372eda7a5f8) tests/sys/kern/Makefile | 2 + tests/sys/kern/socket_msg_waitall.c (new) | 181 ++++++++++++++++++++++++++++++ 2 files changed, 183 insertions(+)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=caaea455773898bee9a0c59466a84256192e0980 commit caaea455773898bee9a0c59466a84256192e0980 Author: Greg Becker <becker.greg@att.net> AuthorDate: 2023-09-09 23:38:24 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-09-27 12:33:05 +0000 sockets: re-check socket state after call to pr_rcvd() Socket state may have changed after dropping the receive buffer lock in order to call pr_rcvd(). If the buffer is empty, re-check the state after reaquiring the lock and skip calling sbwait() if the socket is in error or the peer has closed. PR: 212716 Reviewed by: markj, glebius MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D41783 (cherry picked from commit f44102411e1b16363e91a074c9c403069c550e4d) sys/kern/uipc_socket.c | 4 ++++ 1 file changed, 4 insertions(+)
I'll get this into 14.0 as well. Thanks to Greg for taking a look at this.
A commit in branch releng/14.0 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=d7ee24d0d5aed1180e9f65058c3a5ff45a52210e commit d7ee24d0d5aed1180e9f65058c3a5ff45a52210e Author: Greg Becker <becker.greg@att.net> AuthorDate: 2023-09-09 23:38:24 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-09-28 20:41:09 +0000 sockets: re-check socket state after call to pr_rcvd() Socket state may have changed after dropping the receive buffer lock in order to call pr_rcvd(). If the buffer is empty, re-check the state after reaquiring the lock and skip calling sbwait() if the socket is in error or the peer has closed. Approved by: re (gjb) PR: 212716 Reviewed by: markj, glebius MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D41783 (cherry picked from commit f44102411e1b16363e91a074c9c403069c550e4d) (cherry picked from commit caaea455773898bee9a0c59466a84256192e0980) sys/kern/uipc_socket.c | 4 ++++ 1 file changed, 4 insertions(+)