Bug 31746

Summary: failed connect(2) seems to cause problems with socket
Product: Base System
Reporter: Alejandro Forero Cuervo <bachue>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.4-RELEASE   
Hardware: Any   
OS: Any   

Description Alejandro Forero Cuervo 2001-11-04 06:20:00 UTC
When calls to connect(2) fail with ECONNREFUSED (and perhaps with other errors; I have not been able to test that), the file descriptor passed as the first parameter seems to get "damaged".  Further calls to connect with the same file descriptor will fail and set errno to EINVAL.

EINVAL is not documented in the list of error codes for connect(2).  It is different from EBADF, which is what one gets when calling connect with an obviously wrong file descriptor (such as -1 or ... well, an invalid file descriptor).  This is confusing, because the descriptor is still open and still refers to a socket.
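To make the EBADF case concrete, here is a minimal sketch. connect_errno() is a hypothetical helper written for this illustration, and the loopback address and port 2004 are only placeholders matching the test program in this report.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: call connect(2) on fd and return the errno it
 * leaves behind, or 0 if the connection succeeded.
 */
int
connect_errno(int fd)
{
  struct sockaddr_in in;

  memset(&in, 0, sizeof(in));
  in.sin_family = AF_INET;
  in.sin_port = htons(2004);                   /* placeholder port */
  in.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1 */

  if (connect(fd, (struct sockaddr *)&in, sizeof(in)) == 0)
    return 0;
  return errno;
}
```

Calling connect_errno(-1) yields EBADF, since -1 is never a valid descriptor.  The EINVAL described in this report arises differently: the descriptor is valid, but an earlier connect(2) on it has already failed.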

The only alternative the programmer is left with is to call close(2) on the file descriptor and then socket(2) to obtain it again.  When this is done, things will work.

I noticed this problem while checking some mysterious error that one of my apps had on FreeBSD.  It makes porting software to FreeBSD hard.  This is the sort of problem that will not be easily spotted when applications get ported to FreeBSD.

I checked this on GNU/Linux and it doesn't have this bug.  I don't have access to other Unix boxes on which to test things at this moment.

How-To-Repeat: The following is some C code that illustrates the error.  It's as simple as I could get it (just ~40 lines... the original version is more complex and does more things, such as iterating over the h_addr_list array).

What it does is:

1. Lookup some host.
2. Initialize a sockaddr struct.
3. Call socket(2) to get a file descriptor.
4. Iterate trying to get the socket connected, sleeping between failures.

The problem is that the output it produces is

    localhost:2004: Connection refused
    localhost:2004: Invalid argument
    localhost:2004: Invalid argument
    [...]

rather than

    localhost:2004: Connection refused
    localhost:2004: Connection refused
    localhost:2004: Connection refused
    [...] .

So there it goes:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Change HOST and PORT to something that gets you ECONNREFUSED */
#define HOST "localhost"
#define PORT 2004

int
main ()
{
  struct sockaddr_in in;
  struct hostent *hp;
  int fd;

  hp = gethostbyname(HOST);
  if (!hp)
    fprintf(stderr, "%s: gethostbyname fail\n", HOST), exit(-1);

  memset(&in, 0, sizeof(struct sockaddr_in));
  in.sin_port = htons(PORT);
  in.sin_family = hp->h_addrtype;
  memmove((caddr_t)&in.sin_addr, hp->h_addr_list[0], hp->h_length);

  fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0)
    fprintf(stderr, "socket: %s\n", strerror(errno)), exit(-1);

  for (;;)
    {
      if (connect(fd, (struct sockaddr *)&in, sizeof(struct sockaddr_in)) != -1)
        exit(0);

      fprintf(stderr, "%s:%d: %s\n", HOST, PORT, strerror(errno));
      sleep(1);
    }
}
Comment 1 matt 2001-11-17 02:42:12 UTC
I've hacked away at this for a bit, and it seems that during the first call,
something NULLs out so_pcb.  This makes the COMMON_START macro (see
netinet/tcp_usrreq.c) fail out with EINVAL, and since tcp_connect() uses the
COMMON_START macro, any subsequent connect() will fail with EINVAL instead
of ECONNREFUSED.

The in_pcbdetach() and in_pcbdisconnect() routines seem to be likely
culprits, and I'm currently tracking down where one of these functions may
be called in error.

--
Matt Emmerton
Comment 2 John E. Hein 2006-01-12 10:30:04 UTC
This still happens in today's 6-stable.
Comment 3 Gleb Smirnoff freebsd_committer freebsd_triage 2006-01-12 14:05:59 UTC
State Changed
From-To: open->closed

This is POSIX-compliant behavior. Quoting:

http://www.opengroup.org/onlinepubs/000095399/functions/connect.html 

"If connect() fails, the state of the socket is unspecified. Conforming 
applications should close the file descriptor and create a new socket 
before attempting to reconnect."