| Summary: | failed connect(2) seems to cause problems with socket | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Alejandro Forero Cuervo <bachue> |
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 4.4-RELEASE | ||
| Hardware: | Any | ||
| OS: | Any | ||
I've hacked away at this for a bit, and it seems that during the first call, something NULLs out so_pcb. This makes the COMMON_START macro (see netinet/tcp_usrreq.c) fail out with EINVAL, and since tcp_connect() uses the COMMON_START macro, any subsequent connect() will fail with EINVAL instead of ECONNREFUSED.

The in_pcbdetach() and in_pcbdisconnect() routines seem to be likely culprits, and I'm currently tracking down where one of these functions may be called in error.

-- Matt Emmerton

This still happens in today's 6-stable.

State Changed From-To: open->closed

This is a POSIX compliant behavior. Quoting http://www.opengroup.org/onlinepubs/000095399/functions/connect.html

"If connect() fails, the state of the socket is unspecified. Conforming applications should close the file descriptor and create a new socket before attempting to reconnect."

When calls to connect(2) fail with ECONNREFUSED (and perhaps with other errors; I have not been able to test that), the file descriptor passed as the first parameter seems to get "damaged". Further calls to connect with the same file descriptor will fail and set errno to EINVAL.

EINVAL is not documented in the list of error codes for connect(2). It is different than EBADF, which is what one gets when calling connect with an obviously wrong file descriptor (such as -1 or ... well, an invalid file descriptor). This is confusing.

The only alternative the programmer is left with is to call close(2) on the file descriptor and then socket(2) to obtain it again. When this is done, things will work.

I noticed this problem while checking some mysterious error that one of my apps had on FreeBSD. It makes porting software to FreeBSD hard. This is the sort of problem that will not be easily spotted when applications get ported to FreeBSD.

I checked this on GNU/Linux and it doesn't have this bug. I don't have access to other Unix boxes on which to test things at this moment.

How-To-Repeat:

The following is some C code that illustrates the error. It's as simple as I could get it (just ~40 lines... the original version is more complex and does more things, such as iterate on the h_addr_list array). What it does is:

1. Lookup some host.
2. Initialize a sockaddr struct.
3. Call socket(2) to get a file descriptor.
4. Iterate trying to get the socket connected, sleeping between failures.

The problem is that the output it produces is

    localhost:2004: Connection refused
    localhost:2004: Invalid argument
    localhost:2004: Invalid argument
    [...]

rather than

    localhost:2004: Connection refused
    localhost:2004: Connection refused
    localhost:2004: Connection refused
    [...]

So there it goes:

    #include <stdlib.h>
    #include <string.h>
    #include <stdio.h>
    #include <errno.h>
    #include <netdb.h>
    #include <unistd.h>        /* sleep() */
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    /* Change HOST and PORT to something that gets you ECONNREFUSED */
    #define HOST "localhost"
    #define PORT 2004

    int
    main ()
    {
      struct sockaddr_in in;
      struct hostent *hp;
      int fd;

      /* 1. Look up the host. */
      hp = gethostbyname(HOST);
      if (!hp)
        fprintf(stderr, "%s: gethostbyname fail\n", HOST), exit(-1);

      /* 2. Initialize the sockaddr struct. */
      memset(&in, 0, sizeof(struct sockaddr_in));
      in.sin_port = htons(PORT);
      in.sin_family = hp->h_addrtype;
      memmove((caddr_t)&in.sin_addr, hp->h_addr_list[0], hp->h_length);

      /* 3. Get a file descriptor (only once; this is what triggers the problem). */
      fd = socket(AF_INET, SOCK_STREAM, 0);
      if (fd < 0)
        fprintf(stderr, "socket: %s\n", strerror(errno)), exit(-1);

      /* 4. Retry connect(2) on the same descriptor, sleeping between failures. */
      for (;;)
        {
          if (connect(fd, (struct sockaddr *)&in, sizeof(struct sockaddr_in)) != -1)
            exit(0);
          fprintf(stderr, "%s:%d: %s\n", HOST, PORT, strerror(errno));
          sleep(1);
        }
    }
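
For comparison, here is a minimal sketch of the retry pattern that the POSIX text quoted in the closing comment asks for, and that the report describes as the only workaround: close the descriptor after a failed connect(2) and get a new one from socket(2) before trying again. The helper name `connect_with_retry` and its `max_attempts` and `delay_seconds` parameters are made up for illustration and do not come from the report. The sketch also assumes an IPv4 TCP socket, as in the reproducer.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the original report): try to connect
 * to `addr` up to `max_attempts` times, creating a brand-new socket for
 * every attempt.  Returns a connected descriptor on success, or -1 with
 * errno set by the last call that failed.
 */
int
connect_with_retry(const struct sockaddr_in *addr, int max_attempts,
    unsigned int delay_seconds)
{
    int attempt, fd, saved_errno;

    if (max_attempts <= 0) {
        errno = EINVAL;
        return -1;
    }

    saved_errno = 0;
    for (attempt = 0; attempt < max_attempts; attempt++) {
        /* A fresh socket per attempt: never reuse one that failed. */
        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;          /* errno was set by socket() */

        if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) == 0)
            return fd;          /* connected; the caller owns fd */

        /* Save errno before close() or fprintf() can overwrite it. */
        saved_errno = errno;
        fprintf(stderr, "attempt %d: %s\n", attempt + 1,
            strerror(saved_errno));

        /* Socket state is unspecified after the failure: discard it. */
        close(fd);

        if (attempt + 1 < max_attempts)
            sleep(delay_seconds);
    }

    errno = saved_errno;
    return -1;
}
```

With the `in` structure from the reproducer, a call such as `fd = connect_with_retry(&in, 5, 1);` should report "Connection refused" on every failed attempt, because no descriptor is reused after a failure.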