When recvmsg receives SCM_RIGHTS control message, it allocates file descriptors and report them in the control message buffer given by the application. If the buffer is too small to record the FDs, recvmsg allocates FDs but doesn't record them. This means the application cannot close those FDs which is not reported. i.e. FDs leaks. How-To-Repeat: % cat tst.c #include <stdlib.h> #include <stdio.h> #include <sys/types.h> #include <sys/socket.h> #include <sys/uio.h> #include <unistd.h> #define MAX_FDS 10 #define SEND_FDS 10 #define RECV_FDS 3 int main(int argc, char **argv) { int ret; int sv[2]; struct msghdr msg; struct iovec iov; union { struct cmsghdr header; char bytes[CMSG_SPACE(sizeof(int)*MAX_FDS)]; } cmsg; struct cmsghdr *cmh = &cmsg.header, *c; int *fds; int i; char buf[1024]; char cmdline[1024]; snprintf(cmdline, sizeof(cmdline), "fstat -p %u", (unsigned)getpid()); ret = socketpair(PF_UNIX, SOCK_STREAM, 0, sv); if (ret == -1) { perror("socketpair"); exit(1); } iov.iov_base = "a"; iov.iov_len = 1; cmh->cmsg_len = CMSG_LEN(sizeof(int)*SEND_FDS); cmh->cmsg_level = SOL_SOCKET; cmh->cmsg_type = SCM_RIGHTS; fds = (int *)CMSG_DATA(cmh); for (i = 0; i < SEND_FDS; i++) { fds[i] = 0; /* stdin */ } msg.msg_name = NULL; msg.msg_namelen = 0; msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_control = cmh; msg.msg_controllen = CMSG_SPACE(sizeof(int)*SEND_FDS); msg.msg_flags = 0; ret = sendmsg(sv[0], &msg, 0); if (ret == -1) { perror("sendmsg"); exit(1); } system(cmdline); /* fstat -p $$ before recvmsg */ iov.iov_base = buf; iov.iov_len = sizeof(buf); msg.msg_name = NULL; msg.msg_namelen = 0; msg.msg_iov = &iov; msg.msg_iovlen = 1; msg.msg_control = cmh; msg.msg_controllen = CMSG_SPACE(sizeof(int)*RECV_FDS); msg.msg_flags = 0; printf("before recvmsg: msg_controllen=%d\n", msg.msg_controllen); ret = recvmsg(sv[1], &msg, 0); if (ret == -1) { perror("sendmsg"); exit(1); } printf("after recvmsg: msg_controllen=%d\n", msg.msg_controllen); for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) { if (c->cmsg_len == 0) { printf("cmsg_len is zero\n"); exit(1); } if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_RIGHTS) { int *fdp, *end; printf("cmsg_len=%d\n", c->cmsg_len); fdp = (int *)CMSG_DATA(c); end = (int *)((char *)c + c->cmsg_len); for (i = 0; fdp+i < end; i++) { printf("fd[%d]=%d\n", i, fdp[i]); } } } system(cmdline); /* fstat -p $$ after recvmsg */ return 0; } % gcc -Wall tst.c % ./a.out USER CMD PID FD MOUNT INUM MODE SZ|DV R/W akr a.out 9822 root / 2 drwxr-xr-x 512 r akr a.out 9822 wd / 32142 drwxr-xr-x 512 r akr a.out 9822 text / 32138 -rwxr-xr-x 9286 r akr a.out 9822 0 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 1 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 2 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 3* local stream ffffff00613da320 <-> ffffff00613da258 akr a.out 9822 4* local stream ffffff00613da258 <-> ffffff00613da320 before recvmsg: msg_controllen=32 after recvmsg: msg_controllen=32 cmsg_len=56 fd[0]=5 fd[1]=6 fd[2]=7 fd[3]=8 fd[4]=0 fd[5]=0 fd[6]=0 fd[7]=0 fd[8]=0 fd[9]=0 USER CMD PID FD MOUNT INUM MODE SZ|DV R/W akr a.out 9822 root / 2 drwxr-xr-x 512 r akr a.out 9822 wd / 32142 drwxr-xr-x 512 r akr a.out 9822 text / 32138 -rwxr-xr-x 9286 r akr a.out 9822 0 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 1 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 2 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 3* local stream ffffff00613da320 <-> ffffff00613da258 akr a.out 9822 4* local stream ffffff00613da258 <-> ffffff00613da320 akr a.out 9822 5 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 6 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 7 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 8 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 9 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 10 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 11 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 12 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 13 /dev 97 crw--w---- ttyp0 rw akr a.out 9822 14 /dev 97 crw--w---- ttyp0 rw This program sends 10 file descriptors via UNIX domain socket pair and receives some of them using a small buffer. This result shows * recvmsg allocates 10 FDs (5-14) * 4 of them (5-8) are reported to the application * 6 of them (9-14) are not reported to the application So the application cannot close the 6 file descriptors.
Responsible Changed From-To: freebsd-bugs->rwatson Take ownership of this as I've been spending some time with UNIX domain socket sockets.
I just ran into this very bug on FreeBSD 11.0p7.
It looks to me as though the loop in kern_recvit() which sets MSG_CTRUNC needs to use dom_dispose to ensure that the remaining rights are closed before the mbuf is freed. Unfortunately, dom_dispose wants a socket rather than an mbuf - this was changed in r285522 to fix a race between unp_dispose() and unp_gc(). However, in this case the truncated mbuf will have already been removed from the socket buffer, so we don't need to worry about the race. (If MSG_PEEK is set, the rights will still be present in the sockbuf and we needn't do anything.) So, to fix this we'd either need a way to invoke dom_dispose on a plain mbuf (i.e., the KPI prior to r285522), or we need to fake up a socket like sorflush() does.
A commit references this bug: Author: markj Date: Sun Jul 22 18:07:09 UTC 2018 New revision: 336614 URL: https://svnweb.freebsd.org/changeset/base/336614 Log: Add a regression test for PR 131876. PR: 131876 MFC after: 1 week Changes: head/tests/sys/kern/unix_passfd_test.c
A commit references this bug: Author: markj Date: Sun Jul 29 19:12:06 UTC 2018 New revision: 336873 URL: https://svnweb.freebsd.org/changeset/base/336873 Log: MFC r336614: Add a regression test for PR 131876. PR: 131876 Changes: _U stable/11/ stable/11/tests/sys/kern/unix_passfd_test.c
A commit references this bug: Author: markj Date: Tue Jul 31 00:48:08 UTC 2018 New revision: 336957 URL: https://svnweb.freebsd.org/changeset/base/336957 Log: Add a regression test related to PR 131876. If an error occurs while copying a SCM_RIGHTS message to userspace, we free the mbuf containing externalized rights, leaking them. PR: 131876 MFC after: 1 week Sponsored by: The FreeBSD Foundation Changes: head/tests/sys/kern/unix_passfd_test.c
https://reviews.freebsd.org/D16561 https://reviews.freebsd.org/D16562
A commit references this bug: Author: markj Date: Tue Aug 7 15:04:53 UTC 2018 New revision: 337421 URL: https://svnweb.freebsd.org/changeset/base/337421 Log: MFC r336957: Add a regression test related to PR 131876. PR: 131876 Changes: _U stable/11/ stable/11/tests/sys/kern/unix_passfd_test.c
A commit references this bug: Author: markj Date: Tue Aug 7 16:36:50 UTC 2018 New revision: 337423 URL: https://svnweb.freebsd.org/changeset/base/337423 Log: Improve handling of control message truncation. If a recvmsg(2) or recvmmsg(2) caller doesn't provide sufficient space for all control messages, the kernel sets MSG_CTRUNC in the message flags to indicate truncation of the control messages. In the case of SCM_RIGHTS messages, however, we were failing to dispose of the rights that had already been externalized into the recipient's file descriptor table. Add a new function and mbuf type to handle this cleanup task, and use it any time we fail to copy control messages out to the recipient. To simplify cleanup, control message truncation is now only performed at control message boundaries. The change also fixes a few related bugs: - Rights could be leaked to the recipient process if an error occurred while copying out a message's contents. - We failed to set MSG_CTRUNC if the truncation occurred on a control message boundary, e.g., if the caller received two control messages and provided only the exact amount of buffer space needed for the first. PR: 131876 Reviewed by: ed (previous version) MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D16561 Changes: head/sys/compat/cloudabi/cloudabi_sock.c head/sys/compat/freebsd32/freebsd32_misc.c head/sys/compat/linux/linux_socket.c head/sys/kern/uipc_syscalls.c head/sys/kern/uipc_usrreq.c head/sys/sys/mbuf.h
A commit references this bug: Author: markj Date: Tue Aug 7 16:39:08 UTC 2018 New revision: 337424 URL: https://svnweb.freebsd.org/changeset/base/337424 Log: Update PR 131876 regression tests after r337423. - Add some more cases to the truncation test. - Remove the "expect fail" annotations. PR: 131876 MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D16562 Changes: head/tests/sys/kern/unix_passfd_test.c
A commit references this bug: Author: markj Date: Wed Sep 12 19:13:34 UTC 2018 New revision: 338618 URL: https://svnweb.freebsd.org/changeset/base/338618 Log: MFC r337423: Improve handling of control message truncation. PR: 131876 Changes: _U stable/11/ stable/11/sys/compat/cloudabi/cloudabi_sock.c stable/11/sys/compat/freebsd32/freebsd32_misc.c stable/11/sys/compat/linux/linux_socket.c stable/11/sys/kern/uipc_syscalls.c stable/11/sys/kern/uipc_usrreq.c stable/11/sys/sys/mbuf.h
A commit references this bug: Author: markj Date: Wed Sep 12 19:15:58 UTC 2018 New revision: 338619 URL: https://svnweb.freebsd.org/changeset/base/338619 Log: MFC r337329: Fix the regression test for PR 181741. MFC r337424: Update PR 131876 regression tests after r337423. PR: 131876 Changes: _U stable/11/ stable/11/tests/sys/kern/unix_passfd_test.c
Correct version (latest reported in) and MFC status
I still see a leak with this test scenario after running it multiple times: https://people.freebsd.org/~pho/stress/log/sendmsg2.txt on FreeBSD 13.0-CURRENT #0 r343687
(In reply to Peter Holm from comment #14) Hmm, this doesn't appear to be an fd leak but rather a refcount leak on the unp pcb.
(In reply to Peter Holm from comment #14) This was fixed by r343784, which has been merged to the affected branches.