Bug 215933

Summary: SCM_RIGHTS messages being lost, socket data being lost as well (with example code)
Product: Base System
Reporter: ian
Component: kern
Assignee: freebsd-net (Nobody) <net>
Status: Closed DUPLICATE
Severity: Affects Many People
CC: markj, sepherosa
Priority: ---
Version: 10.3-STABLE
Hardware: Any
OS: Any
Attachments:
  Minimal example code to demonstrate the bug (flags: none)

Description ian 2017-01-10 11:48:01 UTC
Created attachment 178701
Minimal example code to demonstrate the bug

I have a reproducible situation where an entire message, including its SCM_RIGHTS control data, is lost when transmitted over a unix domain socket.

This occurs when the total transmitted data aligns with the size of the socket buffer. The attached code reproduces it on many platforms, including FreeBSD 8.4 and 10.3.

The attached code sends a variable-size message without an attached fd, followed by a small fixed-size message carrying an SCM_RIGHTS control message. Some of these messages go missing in the kernel.
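
For readers unfamiliar with SCM_RIGHTS: the descriptor travels as ancillary (control) data alongside the regular payload of a single sendmsg() call. The sketch below is not the attachment's send_test_message(), whose source is not reproduced here; send_fd_message() is a hypothetical helper that attaches one descriptor, or none when fd is -1.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not the attachment's code): send `len` bytes
 * from `buf` over the unix domain socket `sock`, attaching descriptor
 * `fd` as SCM_RIGHTS ancillary data when fd >= 0.
 * Returns the sendmsg() result: bytes of regular data sent, or -1.
 */
ssize_t
send_fd_message(int sock, const void *buf, size_t len, int fd)
{
	struct iovec iov;
	struct msghdr msg;
	/* The union guarantees correct alignment for struct cmsghdr. */
	union {
		struct cmsghdr hdr;
		char space[CMSG_SPACE(sizeof(int))];
	} ctl;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	iov.iov_base = (void *)buf;
	iov.iov_len = len;
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;

	if (fd >= 0) {
		struct cmsghdr *cm;

		/* One control message holding exactly one descriptor. */
		msg.msg_control = ctl.space;
		msg.msg_controllen = sizeof(ctl.space);
		cm = CMSG_FIRSTHDR(&msg);
		cm->cmsg_level = SOL_SOCKET;
		cm->cmsg_type = SCM_RIGHTS;
		cm->cmsg_len = CMSG_LEN(sizeof(int));
		memcpy(CMSG_DATA(cm), &fd, sizeof(int));
	}
	/* The return value counts only data bytes, never control bytes. */
	return sendmsg(sock, &msg, 0);
}
```

Because sendmsg() reports only the regular data bytes, a byte total summed on the sending side never includes the control data, which is why the totals in this report reflect only the data portion of each message.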

Using DTrace to sum the return values of the sendmsg syscalls and comparing that total against the sum for the recvmsg syscalls confirms this.
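
The same accounting can also be done inside the receiving process. This is a minimal sketch, assuming a hypothetical recv_and_count() helper and struct rx_totals (neither comes from the attachment). It sums recvmsg() return values, counts descriptors delivered via SCM_RIGHTS, and records any call where the kernel set MSG_CTRUNC because the control buffer was too small.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Running totals kept by the hypothetical receiver below. */
struct rx_totals {
	size_t bytes;		/* sum of positive recvmsg() return values */
	size_t fds;		/* descriptors received via SCM_RIGHTS */
	size_t truncated;	/* calls that reported MSG_CTRUNC */
};

/*
 * One recvmsg() call on `sock`: add the data byte count and any
 * SCM_RIGHTS descriptors to `t`. Received descriptors are closed
 * immediately. Returns the recvmsg() result.
 */
ssize_t
recv_and_count(int sock, void *buf, size_t len, struct rx_totals *t)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;
	union {
		struct cmsghdr hdr;
		char space[CMSG_SPACE(4 * sizeof(int))];
	} ctl;
	struct cmsghdr *cm;
	ssize_t n;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.space;
	msg.msg_controllen = sizeof(ctl.space);

	n = recvmsg(sock, &msg, 0);
	if (n <= 0)
		return n;
	t->bytes += (size_t)n;
	if (msg.msg_flags & MSG_CTRUNC)
		t->truncated++;
	for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
		if (cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != SCM_RIGHTS)
			continue;
		/* Number of ints carried in this control message. */
		size_t nfd = (cm->cmsg_len - CMSG_LEN(0)) / sizeof(int);
		for (size_t i = 0; i < nfd; i++) {
			int fd;

			memcpy(&fd, CMSG_DATA(cm) + i * sizeof(int), sizeof(int));
			close(fd);
			t->fds++;
		}
	}
	return n;
}
```

Comparing t.fds with the number of descriptor-carrying messages sent separates the two failure modes: a too-small control buffer shows up in t.truncated, while a message lost outright, as reported here, shows up only as a shortfall in both t.bytes and t.fds.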

Typical output from the attached example is as follows:

Master sent a total of 18203750 bytes
Slave done received a total of 18203190 bytes, dropped 35 frames (Guessed original based on fdesc frame only frame drops 18203750)..

The output from the DTrace script, which sums the raw syscall return values of sendmsg and recvmsg, is as follows:

Sent=18203750 rcvd=18203190

This indicates that 35 16-byte messages with an attached file descriptor were lost while being transmitted over a unix domain socket (18203750 - 18203190 = 560 bytes = 35 x 16 bytes). No error was returned to the sending end.

My wild guess is that when the data portion of a message carrying SCM_RIGHTS fits in the socket buffer but the extra control data for SCM_RIGHTS does not, sendmsg reports success (the byte count matches the requested length), yet the message is dropped because the SCM_RIGHTS control data overflows the buffer.
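
The guess above can be stated as a toy predicate. This is only a model of the reporter's hypothesis, not the kernel's actual socket-buffer accounting; silent_drop() and its ctl_len parameter are hypothetical, and the size the kernel charges internally for a passed descriptor need not match the user-space CMSG_SPACE(sizeof(int)).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the hypothesis, NOT the kernel's real logic:
 * sendmsg() is accepted when the regular data fits in the free
 * buffer space, but the message is only delivered when the data
 * plus its control bytes fit.
 */
static bool
admitted(size_t space_left, size_t data_len)
{
	return data_len <= space_left;
}

static bool
delivered(size_t space_left, size_t data_len, size_t ctl_len)
{
	return data_len + ctl_len <= space_left;
}

/* A silent drop: sendmsg() reported success, but nothing arrived. */
bool
silent_drop(size_t space_left, size_t data_len, size_t ctl_len)
{
	return admitted(space_left, data_len) &&
	    !delivered(space_left, data_len, ctl_len);
}
```

Under this model a drop can occur only when the free space lies in the window from data_len up to, but not including, data_len + ctl_len. That narrow window would be consistent with every drop coinciding with a nearly full buffer.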

For every drop, the example program's output shows a combined received size very close to the socket buffer size.

The following DTrace script was used to verify the behaviour at the syscall level:

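#pragma D option quiet

/* Running byte totals for the whole trace. */
BEGIN
{
  totalsent=0;
  totalrcvd=0;
}

/* On a return probe, arg1 holds the syscall's return value: bytes of data sent. */
syscall::sendmsg:return
/execname == "scm_rights_thrash"/
{
  totalsent+=arg1;
}

/* Likewise for recvmsg: bytes of data received. */
syscall::recvmsg:return
/execname == "scm_rights_thrash"/
{
  totalrcvd+=arg1;
}

/* Print both totals when tracing stops (e.g. on Ctrl-C). */
END
{
  printf("Sent=%d rcvd=%d\n",totalsent,totalrcvd);
}
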
Comment 1 ian 2017-01-10 12:27:13 UTC
Adding a small delay to the example code makes the problem far more
consistently reproducible:

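#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sysexits.h>
#include <unistd.h>

/* send_test_message() and run_consumer() are defined in attachment 178701. */

int
main(int argc, char *argv[]){
        int fds[2];
        size_t total=0;
        if(socketpair(AF_UNIX,SOCK_STREAM,0,fds)==0){
                size_t sequence=0;
                pid_t newpid=fork();
                if(newpid<0) exit(EX_OSERR);
                /* Child process consumes from fds[0]. */
                if(newpid==0){ close(fds[1]); run_consumer(fds[0]); }
                close(fds[0]);
                printf("Master ready..\n");
                for(size_t i=6000;i<8500;i++){
                        int tfd=open("/dev/null",O_WRONLY);
                        /* Variable-size message, no descriptor attached. */
                        total+=send_test_message(fds[1],sequence++,i,-1);
                        /* Small fixed-size message carrying tfd via SCM_RIGHTS. */
                        total+=send_test_message(fds[1],sequence++,0,tfd);
                        /* The in-flight message holds its own reference, so closing is safe. */
                        close(tfd);
                        usleep(100);    /* <-- newly added delay */
                }
        }
        printf("Master sent a total of %zu bytes\n",total);
        usleep(50000);
        exit(0);
}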

With this change, data is lost 100% of the time for large frame sizes from
8154 to 8192 bytes when the local stream socket buffer size is 8192 bytes.
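
To confirm which buffer size a given system actually uses before trying the 8154 to 8192 range, the sizes can be read back with getsockopt(). This is a minimal sketch; sock_buf_size() is a hypothetical helper name. On FreeBSD, the defaults for local stream sockets come from the net.local.stream.sendspace and net.local.stream.recvspace sysctls.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: return the SO_SNDBUF or SO_RCVBUF size of
 * `sock` in bytes (pass the option as `which`), or -1 on error.
 */
int
sock_buf_size(int sock, int which)
{
	int size = 0;
	socklen_t len = sizeof(size);

	if (getsockopt(sock, SOL_SOCKET, which, &size, &len) != 0)
		return -1;
	return size;
}
```

For example, after socketpair(AF_UNIX, SOCK_STREAM, 0, fds), calling sock_buf_size(fds[1], SO_SNDBUF) and sock_buf_size(fds[0], SO_RCVBUF) reports the limits the example program is running against.
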
Comment 2 Sepherosa Ziehau 2017-01-16 03:01:38 UTC
I believe this is exactly the same as:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=181741

Comment 3 Mark Johnston freebsd_committer freebsd_triage 2018-08-03 14:19:56 UTC

*** This bug has been marked as a duplicate of bug 181741 ***