Created attachment 178701 [details]
Minimal example code to demonstrate the bug

I have a reproducible situation where an entire message, including its SCM_RIGHTS control data, is lost when transmitted over a Unix domain socket. This occurs when the total transmitted data aligns with the size of the socket buffer. The attached code reproduces this on several platforms, including FreeBSD 8.4 and 10.3.

The attached code sends a variable-size message without an attached fd, followed by a fixed, small message carrying an SCM_RIGHTS control message. Some of these messages go missing in the kernel. Summing the totals returned by the sendmsg syscalls against the recvmsg syscalls with dtrace confirms this. Typical output from the attached example is as follows:

Master sent a total of 18203750 bytes
Slave done received a total of 18203190 bytes, dropped 35 frames (Guessed original based on fdesc frame only frame drops 18203750)..

The output from the dtrace script, which counts the raw syscall return values for sendmsg and recvmsg, is as follows:

Sent=18203750 rcvd=18203190

This indicates that 35 16-byte messages, each with an attached file descriptor, were lost while being transmitted over a Unix domain socket. No error was returned to the sending end.

My wild guess is that when the 'data' portion of the message with SCM_RIGHTS fits in the socket buffer, but the 'extra' data for the SCM_RIGHTS does not, the return value indicates success (the byte total matches the requested amount), yet the message is dropped because the SCM_RIGHTS extra data overflows. The output from the example program shows a combined receive very close to the socket buffer size for every drop.

The following dtrace script was used to verify the behaviour at the syscall level:
#pragma D option quiet

BEGIN
{
	totalsent = 0;
	totalrcvd = 0;
}

syscall::sendmsg:return
/execname == "scm_rights_thrash"/
{
	totalsent += arg1;
}

syscall::recvmsg:return
/execname == "scm_rights_thrash"/
{
	totalrcvd += arg1;
}

END
{
	printf("Sent=%d rcvd=%d\n", totalsent, totalrcvd);
}
By adding a small delay to the example code the problem becomes far more consistently repeatable:

int
main(int argc, char *argv[])
{
	int fds[2];
	size_t total = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == 0) {
		size_t sequence = 0;
		int newpid = fork();

		if (newpid < 0)
			exit(EX_OSERR);
		if (newpid == 0) {
			close(fds[1]);
			run_consumer(fds[0]);
		}
		close(fds[0]);
		printf("Master ready..\n");
		for (size_t i = 6000; i < 8500; i++) {
			int tfd = open("/dev/null", O_WRONLY);

			total += send_test_message(fds[1], sequence++, i, -1);
			total += send_test_message(fds[1], sequence++, 0, tfd);
			close(tfd);
			usleep(100);	/* <----- added delay */
		}
	}
	printf("Master sent a total of %zu bytes\n", total);
	usleep(50000);
	exit(0);
}

With this change I see 100% consistent loss of data for large frame sizes of 8154 to 8192 bytes when using a local stream socket buffer size of 8192 bytes.
I believe this is the exact same issue as: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=181741
*** This bug has been marked as a duplicate of bug 181741 ***