Bug 181741 - Packet loss when 'control' messages are present with large data (sendmsg(2))
Summary: Packet loss when 'control' messages are present with large data (sendmsg(2))
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: unspecified
Hardware: Any Any
: Normal Affects Only Me
Assignee: Mark Johnston
URL:
Keywords: needs-qa
: 215933 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-09-02 05:50 UTC by Yuri Victorovich
Modified: 2019-04-20 15:04 UTC (History)
15 users (show)

See Also:
koobs: mfc-stable11+
koobs: mfc-stable10-


Attachments
file.diff (8.61 KB, patch)
2013-09-02 05:50 UTC, Yuri Victorovich
no flags Details | Diff
patch-10-net-control-loss-3.patch (8.69 KB, patch)
2013-09-03 07:03 UTC, Yuri Victorovich
no flags Details | Diff
unix_passfd.c.diff (4.67 KB, patch)
2013-10-02 23:26 UTC, Gleb Smirnoff
no flags Details | Diff
updated patch series, applies to -r297514 (30.62 KB, patch)
2016-04-03 21:14 UTC, chris.torek
no flags Details | Diff
Chris Torek's patch 168943 from comment #10 rebased to 11.2-STABLE r336040 (21.13 KB, patch)
2018-07-20 17:49 UTC, Don Lewis
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Yuri Victorovich freebsd_committer 2013-09-02 05:50:00 UTC
There is the case when sendmsg(2) silently loses packets for AF_LOCAL domain when large packets with control part in them are sent.

* Problem Description
There is the watermark limit on sockbuf determined by net.local.stream.sendspace, default is 8192 bytes (field sockbuf.sb_hiwat).
When sendmsg(2) sends large enough data (8K+ that hits this 8192 limit) with control message, sosend_generic will be cutting the message data into separate mbufs based on 'sbspace' (derived from the above-mentioned sb_hiwat limit) with adjustment for control message size as it sees it. This way it tries to make sure this sb_hiwat limit is enforced.

However, down on uipc level control message is being further modified in two ways: unp_internalize modifies it into some 'internal' form, also unp_addsockcred function adds another control message when LOCAL_CREDS are requested by client. Both functions only increase control message size beyond its original size (seen by sosend_generic). So that the first final mbuf sent (concatenation of control and data) will always be larger than 'sbspace' limit that sosend_generic was cutting data for.

There is also the function sbappendcontrol_locked. It checks the 'sbspace' limit again, and discards the packet when sbspace llimit is exceeded. Its result code is essentially ignored in uipc_send. I believe, sbappendcontrol_locked shouldn't be checking space at all, since packets are expected to be properly sized to begin with. But this won't be the right fix, since sizes would be exceeding the sbspace limit anyway.

sosend_default is one level up over uipc level, and it doesn't know what uipc will do with control message. Therefore it can't know what the real adjustment for control message is needed (to properly cut data). It wrongly takes the original control size and this makes the first packet too large and discarded by sbappendcontrol_locked.

* Patch synopsys
- Added the new function into struct pr_usrreqs:
int     (*pru_finalizecontrol)(struct socket *so, int flags, struct mbuf **pcontrol);
It is called by sosend_generic to update control message to its final form.

- Removed 'sbspace' check from sbappendcontrol_locked. The only context it is called from is uipc_send, and all packet sizes are already conforming.

- Fixed few wrong error codes relevant to this situation

Fix: Patch attached with submission follows:
Comment 1 Yuri Victorovich freebsd_committer 2013-09-02 09:56:33 UTC
Please hold off from checking this in.
I will submit the updated patch.

Yuri
Comment 2 Yuri Victorovich freebsd_committer 2013-09-03 07:03:41 UTC
Here is the updated patch.
Comment 3 Yuri Victorovich freebsd_committer 2013-09-03 07:45:45 UTC
I originally came across this issue while running tmux (terminal 
multiplexer) under linuxlator. For some reason, tmux under linuxlator 
sends data into the local socket in larger portions on Linux, control 
message was included, and this exposed this bug. FreeBSD version of tmux 
doesn't expose this problem.
Comment 4 Mark Linimon freebsd_committer freebsd_triage 2013-09-10 02:39:52 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 5 Gleb Smirnoff freebsd_committer 2013-10-02 23:26:49 UTC
  I've made a test case for this problem.
  Patch for unix_passfd test from our test suite.

-- 
Totus tuus, Glebius.
Comment 6 dfilter service freebsd_committer 2014-02-06 13:18:18 UTC
Author: glebius
Date: Thu Feb  6 13:18:10 2014
New Revision: 261550
URL: http://svnweb.freebsd.org/changeset/base/261550

Log:
  Add test case for kern/181741. Right now test fails.
  
  PR:		181741
  Sponsored by:	Nginx, Inc.

Modified:
  head/tools/regression/sockets/unix_passfd/unix_passfd.c

Modified: head/tools/regression/sockets/unix_passfd/unix_passfd.c
==============================================================================
--- head/tools/regression/sockets/unix_passfd/unix_passfd.c	Thu Feb  6 12:43:06 2014	(r261549)
+++ head/tools/regression/sockets/unix_passfd/unix_passfd.c	Thu Feb  6 13:18:10 2014	(r261550)
@@ -29,11 +29,14 @@
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <sys/stat.h>
+#include <sys/sysctl.h>
+#include <sys/un.h>
 
 #include <err.h>
 #include <fcntl.h>
 #include <limits.h>
 #include <stdio.h>
+#include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 
@@ -106,11 +109,10 @@ samefile(const char *test, struct stat *
 }
 
 static void
-sendfd(const char *test, int sockfd, int sendfd)
+sendfd_payload(const char *test, int sockfd, int sendfd,
+    void *payload, size_t paylen)
 {
 	struct iovec iovec;
-	char ch;
-
 	char message[CMSG_SPACE(sizeof(int))];
 	struct cmsghdr *cmsghdr;
 	struct msghdr msghdr;
@@ -118,13 +120,12 @@ sendfd(const char *test, int sockfd, int
 
 	bzero(&msghdr, sizeof(msghdr));
 	bzero(&message, sizeof(message));
-	ch = 0;
 
 	msghdr.msg_control = message;
 	msghdr.msg_controllen = sizeof(message);
 
-	iovec.iov_base = &ch;
-	iovec.iov_len = sizeof(ch);
+	iovec.iov_base = payload;
+	iovec.iov_len = paylen;
 
 	msghdr.msg_iov = &iovec;
 	msghdr.msg_iovlen = 1;
@@ -138,33 +139,35 @@ sendfd(const char *test, int sockfd, int
 	len = sendmsg(sockfd, &msghdr, 0);
 	if (len < 0)
 		err(-1, "%s: sendmsg", test);
-	if (len != sizeof(ch))
+	if (len != paylen)
 		errx(-1, "%s: sendmsg: %zd bytes sent", test, len);
 }
 
 static void
-recvfd(const char *test, int sockfd, int *recvfd)
+sendfd(const char *test, int sockfd, int sendfd)
+{
+	char ch;
+
+	return (sendfd_payload(test, sockfd, sendfd, &ch, sizeof(ch)));
+}
+
+static void
+recvfd_payload(const char *test, int sockfd, int *recvfd,
+    void *buf, size_t buflen)
 {
 	struct cmsghdr *cmsghdr;
-	char message[CMSG_SPACE(sizeof(int))];
+	char message[CMSG_SPACE(SOCKCREDSIZE(CMGROUP_MAX)) + sizeof(int)];
 	struct msghdr msghdr;
 	struct iovec iovec;
 	ssize_t len;
-	char ch;
 
 	bzero(&msghdr, sizeof(msghdr));
-	ch = 0;
 
 	msghdr.msg_control = message;
 	msghdr.msg_controllen = sizeof(message);
 
-	iovec.iov_base = &ch;
-	iovec.iov_len = sizeof(ch);
-
-	msghdr.msg_iov = &iovec;
-	msghdr.msg_iovlen = 1;
-
-	iovec.iov_len = sizeof(ch);
+	iovec.iov_base = buf;
+	iovec.iov_len = buflen;
 
 	msghdr.msg_iov = &iovec;
 	msghdr.msg_iovlen = 1;
@@ -172,19 +175,33 @@ recvfd(const char *test, int sockfd, int
 	len = recvmsg(sockfd, &msghdr, 0);
 	if (len < 0)
 		err(-1, "%s: recvmsg", test);
-	if (len != sizeof(ch))
+	if (len != buflen)
 		errx(-1, "%s: recvmsg: %zd bytes received", test, len);
+
 	cmsghdr = CMSG_FIRSTHDR(&msghdr);
 	if (cmsghdr == NULL)
 		errx(-1, "%s: recvmsg: did not receive control message", test);
-	if (cmsghdr->cmsg_len != CMSG_LEN(sizeof(int)) ||
-	    cmsghdr->cmsg_level != SOL_SOCKET ||
-	    cmsghdr->cmsg_type != SCM_RIGHTS)
+	*recvfd = -1;
+	for (; cmsghdr != NULL; cmsghdr = CMSG_NXTHDR(&msghdr, cmsghdr)) {
+		if (cmsghdr->cmsg_level == SOL_SOCKET &&
+		    cmsghdr->cmsg_type == SCM_RIGHTS &&
+		    cmsghdr->cmsg_len == CMSG_LEN(sizeof(int))) {
+			*recvfd = *(int *)CMSG_DATA(cmsghdr);
+			if (*recvfd == -1)
+				errx(-1, "%s: recvmsg: received fd -1", test);
+		}
+	}
+	if (*recvfd == -1)
 		errx(-1, "%s: recvmsg: did not receive single-fd message",
 		    test);
-	*recvfd = *(int *)CMSG_DATA(cmsghdr);
-	if (*recvfd == -1)
-		errx(-1, "%s: recvmsg: received fd -1", test);
+}
+
+static void
+recvfd(const char *test, int sockfd, int *recvfd)
+{
+	char ch;
+
+	return (recvfd_payload(test, sockfd, recvfd, &ch, sizeof(ch)));
 }
 
 int
@@ -330,6 +347,43 @@ main(int argc, char *argv[])
 	closesocketpair(fd);
 
 	printf("%s passed\n", test);
+
+	/*
+	 * Test for PR 181741. Receiver sets LOCAL_CREDS, and kernel
+	 * prepends a control message to the data. Sender sends large
+	 * payload. Payload + SCM_RIGHTS + LOCAL_CREDS hit socket buffer
+	 * limit, and receiver receives truncated data.
+	 */
+	test = "test8-rigths+creds+payload";
+	printf("beginning %s\n", test);
+
+	{
+		const int on = 1;
+		u_long sendspace;
+		size_t len;
+		void *buf;
+
+		len = sizeof(sendspace);
+		if (sysctlbyname("net.local.stream.sendspace", &sendspace,
+		    &len, NULL, 0) < 0)
+			err(-1, "%s: sysctlbyname(net.local.stream.sendspace)",
+			    test);
+
+		if ((buf = malloc(sendspace)) == NULL)
+			err(-1, "%s: malloc", test);
+
+		domainsocketpair(test, fd);
+		if (setsockopt(fd[1], 0, LOCAL_CREDS, &on, sizeof(on)) < 0)
+			err(-1, "%s: setsockopt(LOCAL_CREDS)", test);
+		tempfile(test, &putfd_1);
+		sendfd_payload(test, fd[0], putfd_1, buf, sendspace);
+		recvfd_payload(test, fd[1], &getfd_1, buf, sendspace);
+		close(putfd_1);
+		close(getfd_1);
+		closesocketpair(fd);
+	}
+
+	printf("%s passed\n", test);
 	
 	return (0);
 }
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 7 Alan Somers freebsd_committer 2014-02-20 17:29:27 UTC
I've been working on kern/185813, which is closely related.  My
comments on your patch:

1) In uipc_socket.c, you twice do "if (clen > space) error =
EMSGSIZE".  Should the comparison be with sp->so_snd->sb_hiwat instead
of space?  Space shrinks when the sockbuf is partially full, but
sb_hiwat is constant.  (Actually, sb_hiwat also shrinks for Unix
domain sockets, but I am going to fix that as part of kern/185812).

2) In uipc_finalizecontrol(), I think that you need to grab
UNP_LINK_RLOCK to protect the linkage between unp and unp2.

3) It would be fantastic if you could convert the testcase to ATF
format.  ATF is the new format that all testcases should use going
forward.  It's easily automatable, unlike the stuff in
tools/regression, and it's very flexible too.
https://wiki.freebsd.org/TestingFreeBSD

4) I think there are some tab/space issues with the patch, but I'm not
positive because I'm reading it in Firefox.

-Alan
Comment 8 Harald Schmalzbauer 2014-08-25 17:21:16 UTC
Is there anybody who can make that patch happen for 10.1?

Thanks a lot,

-Harry
Comment 9 commit-hook freebsd_committer 2015-04-25 05:32:37 UTC
A commit references this bug:

Author: ngie
Date: Sat Apr 25 05:32:03 UTC 2015
New revision: 281974
URL: https://svnweb.freebsd.org/changeset/base/281974

Log:
  MFC r261550,r281354,r281355,r281356,r281358,r281359,r281360,r281361,r281362,r281391,r281392,r281393,r281394,r281395,r281397,r281398,r281399,r281400,r281401,r281402,r281403,r281404,r281407,r281408,r281409,r281410,r281411:

  r261550 (by glebius):

  Add test case for kern/181741. Right now test fails.

  PR:		181741
  Sponsored by:	Nginx, Inc.

  r281354:

  Fix warnings, fix a typo in a testcase description, bump WARNS to 3

  - Remove argc/argv (-Wunused)
  - Cast len in comparison to size_t (-Wsign-compare)

  Sponsored by: EMC / Isilon Storage Division

  r281355:

  Fix -Wunused warnings, bump WARNS to 6

  The testcase fails today on subtest # 9

  The output is still broken if prove -rv is run and the testcase aborts
  prematurely (the testcase doesn't really conform to TAP protocol properly,
  except when it completes fully)

  Sponsored by: EMC / Isilon Storage Division

  r281356:

  Fix -Wunused warnings, bump WARNS to 6

  The output is still broken if prove -rv is run and the testcase aborts
  prematurely with fail_assertion (the testcase doesn't really conform to TAP
  protocol properly, except when it completes fully)

  Sponsored by: EMC / Isilon Storage Division

  r281358:

  - Parameterize out the number of accept/connect attempts
  - Randomize the bind port to allow 2+ consecutive calls in < 10 minutes, and
    to also not fail if (for instance) there's a server already listening on port
    8080
  - Don't leak the listening socket / fds into the child process
  - Fix warnings:
  -- Remove argc/argv (-Wunused)
  -- Mark sig __unused (-Wunused)
  -- Mark quit static (-Wmissing-variable-declarations)

  Sponsored by: EMC / Isilon Storage Division

  r281359:

  Remove argc/argv (-Wunused)

  Sponsored by: EMC / Isilon Storage Division

  r281360:

  Fix warnings

  - Remove argc/argv (-Wunused)
  - Mark some parameters to socket_listen_update __unused (-Wunused)

  Sponsored by: EMC / Isilon Storage Division

  r281361:

  Remove argc/argv (-Wunused)

  Sponsored by: EMC / Isilon Storage Division

  r281362:

  Use _exit, not exit in forked process

  Sponsored by: EMC / Isilon Storage Division

  r281391:

  - Use static buffers for temporary file paths instead of strdup of constant strings
  - Don't use /tmp because it's outside ATF's prescribed sandbox
  - Use mkstemp instead of mktemp to eliminate warning

  Sponsored by: EMC / Isilon Storage Division

  r281392:

  - Garbage collect argc/argv (-Wunused)
  - Bump WARNS to 6

  Sponsored by: EMC / Isilon Storage Division

  r281393:

  Fix warnings and bump WARNS to 6
  - Garbage collect argc/argv (-Wunused)
  - sleep(3) will always return an unsigned int; don't check for return codes <0
    (-Wsign-compare)

  Sponsored by: EMC / Isilon Storage Division

  r281394:

  - Don't use /tmp because it's outside ATF's prescribed sandbox
  - Replace a hardcoded PATH_MAX value with sizeof(path)
  - Use path like an array, not a pointer, and always try to unlink it in cleanup

  Sponsored by: EMC / Isilon Storage Division

  r281395:

  Fix a -Wuninitialized warning by setting the socket to -1 and bump WARNS to 6

  Sponsored by: EMC / Isilon Storage Division

  r281397:

  Mark signum unused in signal_handler; bump WARNS to 6

  Sponsored by: EMC / Isilon Storage Division

  r281398:

  Garbage collect argc/argv and bump WARNS to 6

  Sponsored by: EMC / Isilon Storage Division

  r281399:

  Fix warnings and bump WARNS to 6
  - Staticize variables as needed
  - Garbage collect argc/argv
  - Fix -Wsign-compare warnings by casting small sizeof to (int)

  Sponsored by: EMC / Isilon Storage Division

  r281400:

  - Garbage collect argc/argv; bump WARNS to 6
  - Make the socket path random and move it out of /tmp as that's outside ATF's
    prescribed path

  Sponsored by: EMC / Isilon Storage Division

  r281401:

  - Garbage collect argc/argv
  - Use random paths instead of one in /tmp

  Sponsored by: EMC / Isilon Storage Division

  r281402:

  Garbage collect argc/argv and bump WARNS to 6

  Sponsored by: EMC / Isilon Storage Division

  r281403:

  Garbage collect argc/argv and bump WARNS to 6

  Sponsored by: EMC / Isilon Storage Division

  r281404:

  Generate temporary files with mkstemp instead of mktemp

  Sponsored by: EMC / Isilon Storage Division

  r281407:

  Fix the knob twiddling to work properly per src.opts.mk

  Sponsored by: EMC / Isilon Storage Division

  r281408:

  - Remove the .t wrapper and put the "magic" of determining the number of
    testcases into the .c file
  - Require root for now because it fails with SOCK_RAW without root privileges
  - Increment the test count properly on socket create failure

  Sponsored by: EMC / Isilon Storage Division

  r281409:

  Fix warnings, bump WARNS to 6, and use a temporary socket instead of one in /tmp

  Sponsored by: EMC / Isilon Storage Division

  r281410:

  Fix more warnings I didn't catch in the first go-around

  Sponsored by: EMC / Isilon Storage Division

  r281411:

  Fix even more warnings..

  Sponsored by: EMC / Isilon Storage Division

Changes:
_U  stable/10/
  stable/10/tools/regression/sockets/accept_fd_leak/Makefile
  stable/10/tools/regression/sockets/accept_fd_leak/accept_fd_leak.c
  stable/10/tools/regression/sockets/accf_data_attach/Makefile
  stable/10/tools/regression/sockets/accf_data_attach/accf_data_attach.c
  stable/10/tools/regression/sockets/fstat/Makefile
  stable/10/tools/regression/sockets/fstat/fstat.c
  stable/10/tools/regression/sockets/kqueue/Makefile
  stable/10/tools/regression/sockets/kqueue/kqueue.c
  stable/10/tools/regression/sockets/listen_backlog/Makefile
  stable/10/tools/regression/sockets/listen_backlog/listen_backlog.c
  stable/10/tools/regression/sockets/listenclose/Makefile
  stable/10/tools/regression/sockets/listenclose/listenclose.c
  stable/10/tools/regression/sockets/pr_atomic/Makefile
  stable/10/tools/regression/sockets/pr_atomic/pr_atomic.c
  stable/10/tools/regression/sockets/reconnect/Makefile
  stable/10/tools/regression/sockets/reconnect/reconnect.c
  stable/10/tools/regression/sockets/rtsocket/Makefile
  stable/10/tools/regression/sockets/rtsocket/rtsocket.c
  stable/10/tools/regression/sockets/sblock/Makefile
  stable/10/tools/regression/sockets/sblock/sblock.c
  stable/10/tools/regression/sockets/sendfile/sendfile.c
  stable/10/tools/regression/sockets/shutdown/Makefile
  stable/10/tools/regression/sockets/shutdown/shutdown.c
  stable/10/tools/regression/sockets/sigpipe/Makefile
  stable/10/tools/regression/sockets/sigpipe/sigpipe.c
  stable/10/tools/regression/sockets/so_setfib/Makefile
  stable/10/tools/regression/sockets/so_setfib/so_setfib.c
  stable/10/tools/regression/sockets/so_setfib/so_setfib.t
  stable/10/tools/regression/sockets/socketpair/Makefile
  stable/10/tools/regression/sockets/socketpair/socketpair.c
  stable/10/tools/regression/sockets/unix_bindconnect/Makefile
  stable/10/tools/regression/sockets/unix_bindconnect/unix_bindconnect.c
  stable/10/tools/regression/sockets/unix_close_race/Makefile
  stable/10/tools/regression/sockets/unix_close_race/unix_close_race.c
  stable/10/tools/regression/sockets/unix_passfd/Makefile
  stable/10/tools/regression/sockets/unix_passfd/unix_passfd.c
  stable/10/tools/regression/sockets/unix_sendtorace/Makefile
  stable/10/tools/regression/sockets/unix_sendtorace/unix_sendtorace.c
  stable/10/tools/regression/sockets/unix_socket/Makefile
  stable/10/tools/regression/sockets/unix_socket/unix_socket.c
  stable/10/tools/regression/sockets/unix_sorflush/Makefile
  stable/10/tools/regression/sockets/unix_sorflush/unix_sorflush.c
  stable/10/tools/regression/sockets/zerosend/zerosend.c
Comment 10 chris.torek 2016-04-03 21:14:10 UTC
Created attachment 168943 [details]
updated patch series, applies to -r297514

We were running into this same bug at iX, so I ported the original patch to the HEAD kernel (which required a bit of tweaking) and then added my own changes (see commit text for descriptions).  Then I back-ported it back to the iX kernel which is a bit behind HEAD (and fixed typos in both versions, etc), where it is now at least lightly tested.

I'll attach the updated patches as a series of git commits suitable for "git am" (on the github version of the FreeBSD tree); feel free to redo them in some other form, including perhaps combining Yuri's original patch and the small subsequent fixes.  The last (and large) "rewrite for clarity" section probably should stay a separate patch though.
Comment 11 Allan Jude freebsd_committer 2017-01-08 22:13:50 UTC
(In reply to chris.torek from comment #10)
Users are reporting failures on FreeNAS 9.10 that were traced back to your patch offered here.

This simple test program:

http://pastebin.ca/3755236

sends an FD between a child and a parent

In the process, the SCM_RIGHTS is lost, and recvmsg returns a message with SCM_CREDS instead.
Comment 12 chris.torek 2017-01-09 00:37:13 UTC
The test code demonstrating the new bug is slightly wrong (or insufficient at least).  What's happening is that one of the attached patches also changes the code in uipc_finalizecontrol() to always prepend SCM_CREDS to DGRAM sockets.  The result is that you need a larger control-message buffer here so that you can get both the credentials (i.e., who sent you the fd) *and* the rights (the fd itself).  With the small control-message buffer you have room only for the prepended credentials.

Apparently UNP_WANTCREDS is not set on the pre-connected SOCK_DGRAM socketpair.  So, dropping or altering that particular commit (so that it prepends SCM_CREDS only if UNP_WANTCREDS is set, whether or not this is a one-shot stream-ish socket or repeating datagram socket) will make the test program run.  Or, making its control message receive buffer bigger will also make the test program run...
Comment 13 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:42:26 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 14 Don Lewis freebsd_committer 2018-07-20 17:49:43 UTC
Created attachment 195304 [details]
Chris Torek's patch 168943 from comment #10 rebased to 11.2-STABLE r336040

This patch fixes a severe Firefox 61 stability problem that I have been experiencing.

I don't know if comments 11 and 12 are considered a blocker.
Comment 15 Jilles Tjoelker freebsd_committer 2018-07-20 21:26:00 UTC
I consider the issue brought up in comments 11 and 12 a blocker.

The patch "[PATCH 3/4] uipc_finalizecontrol: read-lock the unp link" introduces a change not mentioned in the commit message: datagram sockets always behave as if the LOCAL_CREDS socket option (UNP_WANTCRED flag in the kernel) is set on them.

That change seems a bad idea to me. Apart from the general problems with sending unsolicited control messages (full buffers, code that examines only the first control message if only one is expected), the specifics of SCM_CREDS make this even more dangerous. A strange thing about SCM_CREDS is that there are two flavours of it: one containing a struct cmsgcred, generated when the sender explicitly attaches an SCM_CREDS message, and one containing a struct sockcred, generated under certain conditions when the receiver has enabled the LOCAL_CREDS socket option. If a struct sockcred is attached, a struct cmsgcred (if any) is removed. The two flavours have incompatible fields, but are not reliably distinguishable in the general case (if the application programmer even knows about this problem).

Therefore, behaving as if LOCAL_CREDS is set when it is not may cause applications to receive a struct sockcred and interpret it as a struct cmsgcred. For example, an application trying to read cmcred_uid for the real UID will get the effective UID from sc_euid. This is likely to allow an easy attack since a plain write(2) from a setuid root program (such as an error message) will have the struct sockcred attached to it.

The patchset does seem to fix the bug that adding a struct sockcred for LOCAL_CREDS silently does nothing if m_get() or m_getcl() with M_NOWAIT fails, possibly leaving a struct cmsgcred from the sender.

The patch "[PATCH 3/4] uipc_finalizecontrol: read-lock the unp link" also adds a comment about a bug that UNP_WANTCRED may be cleared by a failing send, so that no struct sockcred is ever sent on the connection. Since this only affects stream and seqpacket sockets, I think it is best fixed on the application side, by using getpeereid(3) or LOCAL_PEERCRED instead of LOCAL_CREDS; this also simplifies the application code and makes the credentials more reliable (since they are from connect(2)/listen(2) time instead of from the write).

Related, if UNP_WANTCRED is cleared then struct cmsgcred from the sender will get through, which might lead to a struct cmsgcred being interpreted as a struct sockcred.
Comment 16 chris.torek 2018-07-20 23:52:00 UTC
For what it's worth, I agree with Jilles Tjoelker - the confusion around SCM_CREDS is a big problem here.  I'm not really sure what is the Right Thing to do about it.  Certainly patch 3 does too much, though.
Comment 17 Mark Johnston freebsd_committer 2018-07-25 20:07:32 UTC
I'm not sure I see much value in internalizing ("finalizing") control messages
in sosend_generic().  In general, the socket layer does not treat the sockbuf limits (sb_mbmax and sb_hiwat) as strict limits; see the comment about SB_STOP in uipc_rcvd(), for example.  Furthermore, the unix socket code never actually puts anything in the send buffer.  The purpose of the limits in this context is to enable a somewhat rudimentary backpressure system, and since sosend_generic() already puts a hard bound on the size of control messages, I don't see why it needs to go through the trouble of performing an exact check.

Assuming that argument is reasonable, I believe it's sufficient to just omit the space checks in sbappendcontrol() like we already do for SEQPACKET unix sockets.
Comment 18 Mark Johnston freebsd_committer 2018-07-25 20:49:57 UTC
(In reply to Mark Johnston from comment #17)
> since sosend_generic() already puts a hard bound on the size of control messages, I don't see why it needs to go through the trouble of performing an exact check.

To be clear, I'm saying that given a control message of size S, the amount of occupied buffer space is some linear function of S plus a constant (the size of a sockcred, if LOCAL_CREDS is configured).  Given that we have a bound on S imposed by sosend_generic(), I don't think it's important to compute the exact amount of buffer space required for the internalized form of the messages.
Comment 20 Mark Johnston freebsd_committer 2018-08-03 14:19:56 UTC
*** Bug 215933 has been marked as a duplicate of this bug. ***
Comment 21 commit-hook freebsd_committer 2018-08-04 20:27:18 UTC
A commit references this bug:

Author: markj
Date: Sat Aug  4 20:26:55 UTC 2018
New revision: 337328
URL: https://svnweb.freebsd.org/changeset/base/337328

Log:
  Don't check rcv sockbuf limits when sending on a unix stream socket.

  sosend_generic() performs an initial comparison of the amount of data
  (including control messages) to be transmitted with the send buffer
  size. When transmitting on a unix socket, we then compare the amount
  of data being sent with the amount of space in the receive buffer size;
  if insufficient space is available, sbappendcontrol() returns an error
  and the data is lost.  This is easily triggered by sending control
  messages together with an amount of data roughly equal to the send
  buffer size, since the control message size may change in uipc_send()
  as file descriptors are internalized.

  Fix the problem by removing the space check in sbappendcontrol(),
  whose only consumer is the unix sockets code.  The stream sockets code
  uses the SB_STOP mechanism to ensure that senders will block if the
  receive buffer fills up.

  PR:		181741
  MFC after:	1 month
  Sponsored by:	The FreeBSD Foundation
  Differential Revision:	https://reviews.freebsd.org/D16515

Changes:
  head/sys/kern/uipc_sockbuf.c
  head/sys/kern/uipc_usrreq.c
  head/sys/sys/sockbuf.h
Comment 22 commit-hook freebsd_committer 2018-08-04 20:30:27 UTC
A commit references this bug:

Author: markj
Date: Sat Aug  4 20:29:58 UTC 2018
New revision: 337329
URL: https://svnweb.freebsd.org/changeset/base/337329

Log:
  Fix the regression test for PR 181741.

  With r337328, the test hangs becase the sendmsg() call will block until
  the receive buffer is at least partially drained.  Fix the problem by
  using a non-blocking socket and allowing short writes.  Also assert
  that a SCM_CREDS message was received if one was expected.

  PR:		181741
  MFC after:	1 month
  Sponsored by:	The FreeBSD Foundation
  Differential Revision:	https://reviews.freebsd.org/D16516

Changes:
  head/tests/sys/kern/unix_passfd_test.c
Comment 23 commit-hook freebsd_committer 2018-08-17 16:04:55 UTC
A commit references this bug:

Author: markj
Date: Fri Aug 17 16:04:21 UTC 2018
New revision: 337975
URL: https://svnweb.freebsd.org/changeset/base/337975

Log:
  MFC r337328:
  Don't check rcv sockbuf limits when sending on a unix stream socket.

  PR:	181741, 212812

Changes:
_U  stable/11/
  stable/11/sys/kern/uipc_sockbuf.c
  stable/11/sys/kern/uipc_usrreq.c
  stable/11/sys/sys/sockbuf.h