A socket's sb_cc field includes the length of control data (like the source address, for PR_ADDR protocols, and any control info like if SO_TIMESTAMP, IP_RECVIF, etc. are set). Problems with this include: - FIONREAD just returns sb_cc, but not all these bytes are readable - This count is used to determine the high/low water marks on the socket. Advantages are: - sb_cc is really the length of the attached mbufs, so it's simple to compute. Fix: Make sb_cc reflect the number of data bytes in the socket buffer, instead of the amount of data+control. This may be as simple as fixing sballoc() and sbfree() in sys/socketvar.h to only count MT_HEADER, MT_DATA, MT_OOBDATA mbufs, or may be much more involved. How-To-Repeat: Adel Abushaev <adel@ksu.ru> reported the problem and included this pair of test programs. /* client.c */ #include <stdio.h> #include <sys/types.h> #include <sys/time.h> #include <sys/socket.h> #include <netinet/in.h> #include <arpa/inet.h> void main() { int sockfd; struct sockaddr_in servaddr_in; char buffer[512]; memset(&servaddr_in, 0, sizeof(struct sockaddr_in)); servaddr_in.sin_family = AF_INET; servaddr_in.sin_addr.s_addr = inet_addr("127.0.0.1"); servaddr_in.sin_port = htons(4000); sockfd = socket(AF_INET, SOCK_DGRAM, 0); if (sockfd <= 0) { perror("socket"); return; } sendto(sockfd, buffer, 512, 0, (struct sockaddr *)&servaddr_in, sizeof(struct sockaddr_in)); close(sockfd); } /* server.c */ #include <stdio.h> #include <sys/types.h> #include <sys/time.h> #include <sys/socket.h> #include <netinet/in.h> #include <arpa/inet.h> #include <sys/stat.h> #include <sys/fcntl.h> #include <sys/ioctl.h> #ifdef SOLARIS #include <sys/filio.h> #endif void main() { int sockfd; struct sockaddr_in myaddr_in, peeraddr_in; unsigned addrlen; char buffer[512]; struct timeval tim; int optval; fd_set fdsread, fdswrite, fdsexcept; memset(&myaddr_in, 0, sizeof(struct sockaddr_in)); memset(&peeraddr_in, 0, sizeof(struct sockaddr_in)); myaddr_in.sin_family = AF_INET; myaddr_in.sin_addr.s_addr = INADDR_ANY; myaddr_in.sin_port = htons(4000); sockfd = socket(AF_INET, SOCK_DGRAM, 0); if (sockfd <= 0) { perror("socket"); return; } if (bind(sockfd, (struct sockaddr *)&myaddr_in, sizeof(struct sockaddr_in)) < 0) { perror("bind"); return; } FD_ZERO(&fdsread); FD_ZERO(&fdswrite); FD_ZERO(&fdsexcept); FD_SET(sockfd, &fdsread); tim.tv_sec=60; tim.tv_usec=0; select(sockfd+1, &fdsread, &fdswrite, &fdsexcept, &tim); if (FD_ISSET(sockfd, &fdsread)) { ioctl(sockfd, FIONREAD, &optval); printf("ioctl: %d\n",optval); optval=recvfrom(sockfd, buffer, 512, 0, (struct sockaddr *)&peeraddr_in, &addrlen); printf("recv: %d\n",optval); ioctl(sockfd, FIONREAD, &optval); printf("ioctl: %d\n",optval); } close(sockfd); }
Wouldn't you agree that the better solution would be to just fix the computation/adjusting of the high and low watermarks to not include control data, as sb_cc's count is also used for other purposes which may (or may not? [*]) involve the counting of the control data mbufs? This is feasible with a new variable which would hold (sb_cc - control data) which need only be adjusted when control type mbufs are added to the sockbuf. This way, low and high watermarks can be adjusted as per that new value. [*] Of course, "may or may not" is ambiguous. I see sb_cc being a, as commented in socketvar.h, "_actual_ chars in buffer" count. Perhaps some other bit in the code also relies on this interpretation and assumes that, indeed, there are no mbufs attached to sb_mb when sb_cc is zero. Comments, anyone? --Bosko -- Bosko Milekic * pages.infinit.net/bmilekic/index.html * www.technokratis.com bmilekic@dsuper.net * bmilekic@technokratis.com * b.milekic@marianopolis.edu
Responsible Changed From-To: freebsd-bugs->bmilekic Over to Bosko Milekic <bmilekic@FreeBSD.org>. He seems to have a good grasp of the problem and solution.
Responsible Changed From-To: bmilekic->kbyanc Over to kbyanc because this has been sitting around in my basket for way too long and I haven't done anything about it and I don't want to do anything about it. Recently, kbyanc has done some related work, and it was recommended to me to forward this to him.
State Changed From-To: open->feedback Is this still a problem with modern versions of FreeBSD?
Responsible Changed From-To: kbyanc->linimon
State Changed From-To: feedback->suspended Submitter notes that this problem still recurs on 6.1. Mark suspended awaiting someone to take an interest in this antique PR.
Responsible Changed From-To: linimon->freebsd-bugs
State Changed From-To: suspended->open Doesn't seem like this would be that hard to fix, at least in HEAD where I have the luxury of adding fields to the socket.
Responsible Changed From-To: freebsd-bugs->kmacy Doesn't seem like this would be that hard to fix, at least in HEAD where I have the luxury of adding fields to the socket.
Responsible Changed From-To: kmacy->freebsd-net kmacy has asked for all of his PRs to be reassigned back to the pool.
For bugs matching the following criteria: Status: In Progress Changed: (is less than) 2014-06-01 Reset to default assignee and clear in-progress tags. Mail being skipped