Bug 22190

Summary: A threaded read(2) from a socketpair(2) fd can sometimes fail with errno 19 (ENODEV)
Product: Base System
Reporter: grubba <grubba>
Component: kern
Assignee: freebsd-threads (Nobody) <threads>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.0-RELEASE   
Hardware: Any   
OS: Any   

Description grubba 2000-10-21 17:10:01 UTC
In the testsuite for a threaded application, a process-spawning test
that spawns 1000 /bin/cat /dev/null processes and waits for them sometimes fails
because read(2) returns -1 with errno set to 19 (ENODEV). ENODEV is
not a documented error code for read(2).

Stripped-down code that triggers the bug:
  {
    pid_t pid=-2;
    int control_pipe[2];	/* Used for communication with the child. */
    char buf[4];
    int e;	/* Return value of the read()/write() calls below. */

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, control_pipe) < 0) {
      error("Failed to create child communication pipe.\n");
    }

    {
      int loop_cnt = 0;
      sigset_t new_sig, old_sig;
      sigfillset(&new_sig);
      while(sigprocmask(SIG_BLOCK, &new_sig, &old_sig))
	;

      do {

	pid=fork();
	if (pid == -1) {
	  if (errno == EAGAIN) {
	    /* Process table full or similar.
	     * Try sleeping for a bit.
	     */
	    if (loop_cnt++ < 60) {
	      /* Don't sleep for too long... */
	      poll(NULL, 0, 100);

	      /* Try again */
	      continue;
	    }
	  } else if (errno == EINTR) {
	    /* Try again */
	    continue;
	  }
	}
	break;
      } while(1);

      while(sigprocmask(SIG_SETMASK, &old_sig, 0))
	;
    }

    if(pid == -1) {
      int e = errno;
      /*
       * fork() failed
       */

      while(close(control_pipe[0]) < 0 && errno==EINTR);
      while(close(control_pipe[1]) < 0 && errno==EINTR);

      error("Process.create_process(): fork() failed. errno:%d\n",
	    e);
    } else if(pid) {
      int olderrno;

      /*
       * The parent process
       */

      /* Close our child's end of the pipe. */
      while(close(control_pipe[1]) < 0 && errno==EINTR);

      /* Wake up the child. */
      buf[0] = 0;

      while (((e = write(control_pipe[0], buf, 1)) < 0) && (errno == EINTR))
	;
      if(e!=1) {
	/* Paranoia in case close() sets errno. */
	olderrno = errno;
	while(close(control_pipe[0]) < 0 && errno==EINTR)
          ;
	error("Child process died prematurely. (e=%d errno=%d)\n",
	      e, olderrno);
      }

      /* Wait for exec or error */
      while (((e = read(control_pipe[0], buf, 3)) < 0) && (errno == EINTR))
	;
      /* Paranoia in case close() sets errno. */
      olderrno = errno;

      while(close(control_pipe[0]) < 0 && errno==EINTR)
        ;

      if (!e) {
	/* OK! */
	pop_n_elems(args);
	push_int(0);
	return;
      } else {
	/* Something went wrong. */
	switch(buf[0]) {
	  /* ... */
	case 0:
	  /* read() probably failed. */
	default:
	  /******************************************************************
           * This point is reached with buf = {0, 4, 0}, e = -1, olderrno=19.
           *****************************************************************/
	  error("Process.create_process(): "
		"Child failed: %d, %d, %d, %d, %d!\n",
		buf[0], buf[1], buf[2], e, olderrno);
	  break;
	}
      }
    }else{
      /*
       * The child process
       */
      /* Close our parent's end of the pipe. */
      while(close(control_pipe[0]) < 0 && errno==EINTR);
      /* Ensure that the pipe will be closed when the child starts. */
      if(set_close_on_exec(control_pipe[1], 1) < 0)
	PROCERROR(PROCE_CLOEXEC, 0);

      /* Wait for parent to get ready... */
      while ((( e = read(control_pipe[1], buf, 1)) < 0) && (errno == EINTR))
	;

      /* ... */
      execvp(argv[0], argv);
      PROCERROR(PROCE_EXEC, 0);
      exit(99);
    }
  }

For the full source, please check src/signal_handler.c:f_create_process() in a Pike distribution.

Testsuite report:

testsuite: Test 9406 (shift 0) (CRNL) failed.
  1: mixed a() {  for(int x=0;x<10;x++) { for(int e=0;e<100;e++) if(Process.create_process(({"/bin/cat","/dev/null"}))->wait()) return e; __signal_watchdog(); } return -1;; }
  2: mixed b() { return -1; }
Error: Process.create_process(): Child failed: 0, 4, 0, -1, 19!
__builtin.create_process: create(({"/bin/cat","/dev/null"}))
__builtin: create_process()
testsuite: Test 9406 (shift 0) (CRNL):1: a()
/tmp/autobuild/pike7.1-20001021082826.tar/bin/test_pike.pike:572: main(3,({"/tmp/autobuild/pike7.1-20001021082826.tar/bin/test_pike.pike","modules/CommonLog/module_testsuite","modules/Gdbm/module_testsuite","modules/Gettext/module_testsuite","modules/Gmp/module_testsuite",,,34}))

How-To-Repeat: Unfortunately, the problem is intermittent.
It may be triggered by resource exhaustion.
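
A standalone approximation of the handshake above may help when trying to
reproduce this outside Pike. It is only a sketch: handshake_once() and
handshake_loop() are hypothetical names that do not appear in the Pike
source, the Pike-specific error reporting is replaced by return codes, and
the code is single-threaded. The original failure was seen in a threaded
process, so the loop would presumably have to be called from a thread in a
program linked against the thread library to have a chance of hitting it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: one fork/exec handshake over a socketpair.
 * Returns 0 if the parent saw EOF (exec succeeded), a positive errno
 * value if a system call failed, or -1 if the child sent an error byte. */
int handshake_once(const char *path, char *const argv[])
{
    int sv[2];
    char buf[4] = {0};
    ssize_t n;
    pid_t pid;
    int rc;

    assert(path != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return errno;

    pid = fork();
    if (pid == -1) {
        rc = errno;
        close(sv[0]);
        close(sv[1]);
        return rc;
    }

    if (pid == 0) {
        /* Child: keep sv[1]; it is closed by a successful exec. */
        close(sv[0]);
        fcntl(sv[1], F_SETFD, FD_CLOEXEC);
        /* Wait for the parent to get ready. */
        while ((n = read(sv[1], buf, 1)) < 0 && errno == EINTR)
            ;
        execv(path, argv);
        /* exec failed: send one error byte to the parent. */
        buf[0] = 1;
        if (write(sv[1], buf, 1) != 1)
            _exit(98);
        _exit(99);
    }

    /* Parent: keep sv[0] and wake up the child. */
    close(sv[1]);
    buf[0] = 0;
    while ((n = write(sv[0], buf, 1)) < 0 && errno == EINTR)
        ;
    if (n != 1) {
        rc = (n < 0) ? errno : EIO;
    } else {
        /* Wait for exec (EOF) or an error byte. */
        while ((n = read(sv[0], buf, 3)) < 0 && errno == EINTR)
            ;
        if (n < 0)
            rc = errno;   /* the report saw ENODEV (19) at this point */
        else if (n > 0)
            rc = -1;      /* child reported an exec failure */
        else
            rc = 0;
    }
    close(sv[0]);
    while (waitpid(pid, NULL, 0) < 0 && errno == EINTR)
        ;
    return rc;
}

/* Repeat the handshake; return the first nonzero result, or 0. */
int handshake_loop(int count)
{
    char *const argv[] = { "cat", "/dev/null", NULL };
    int i, rc;

    for (i = 0; i < count; i++) {
        rc = handshake_once("/bin/cat", argv);
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

Calling handshake_loop(1000) mirrors the 1000 spawns of the test; a return
value of 19 would correspond to the ENODEV failure described here.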
Comment 1 Jason Evans freebsd_committer freebsd_triage 2000-10-23 22:39:44 UTC
Responsible Changed
From-To: freebsd-bugs->jasone

Over to maintainer. 
Comment 2 Jason Evans freebsd_committer freebsd_triage 2002-05-11 23:23:08 UTC
Responsible Changed
From-To: jasone->freebsd-bugs
Comment 3 iedowse freebsd_committer freebsd_triage 2002-08-11 20:56:44 UTC
State Changed
From-To: open->feedback


Does this problem still occur on more recent releases?
Comment 4 Dan Nelson 2002-08-30 16:13:48 UTC
Yes, it does.  The pike developers have a build farm, similar to 
tinderbox, and my -current machine just failed the testsuite with the 
error "read(2) failed with ENODEV!".  It seems to be very infrequent; 
it's probably run a couple dozen builds with no problem.

I'm going to add a PTHREAD_ASSERT in uthread_read.c to see if other 
programs are also getting ENODEV but ignoring it.  I haven't been able 
to get crashdumps working on my -current box, so I can't put a panic in 
the kernel's read().
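
For programs that cannot be relinked against a patched libc_r, the same
idea can be approximated in user space. The wrapper below is only a sketch:
checked_read() is a hypothetical name, it uses the standard assert() rather
than PTHREAD_ASSERT, and it is not the change to uthread_read.c itself.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for an assertion inside the thread library's
 * read(): abort as soon as read(2) fails with ENODEV, so that a core
 * dump is left at the point of failure instead of the error being
 * silently ignored by the caller. */
ssize_t checked_read(int fd, void *buf, size_t nbytes)
{
    ssize_t n = read(fd, buf, nbytes);

    /* ENODEV is not a documented error for read(2). */
    assert(!(n < 0 && errno == ENODEV) && "read(2) returned ENODEV");
    return n;
}
```

Other errors, such as EBADF or EINTR, are passed through unchanged, so
callers keep their existing retry and error handling.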
Comment 5 Ceri Davies freebsd_committer freebsd_triage 2003-06-08 18:57:08 UTC
State Changed
From-To: feedback->open

Feedback has been requested and received; throw this PR back open.
Comment 6 Ceri Davies freebsd_committer freebsd_triage 2003-06-09 11:49:36 UTC
Adding to audit trail:


Date: Mon, 9 Jun 2003 12:41:32 +0200 (MET DST)
From: Henrik Grubbström <grubba@roxen.com>
Message-ID: <Pine.GSO.4.21.0306091233430.13083-100000@jms.roxen.com>

Well, since the last followup was from August last year, I can inform you
that the bug was last triggered on Dan's FreeBSD 5.1-BETA machine
yesterday:

Fatal error 'read(2) may not return ENODEV' at line 98 in file /usr/src/lib/libc_r/uthread/uthread_read.c (errno = 19)
Abort trap (core dumped)

Core was generated by `pike'.
Program terminated with signal 6, Aborted.
#0  0x2826239f in kill () at {standard input}:15
	in {standard input}

Active threads
Current language:  auto; currently asm
* 1 process 33497  0x2826239f in kill () at {standard input}:15

Backtrace
#0  0x2826239f in kill () at {standard input}:15
#1  0x282c219a in abort () at /usr/src/lib/libc/stdlib/abort.c:72
#2  0x2820f443 in _thread_exit ()
    at /usr/src/lib/libc_r/uthread/uthread_exit.c:99
#3  0x28209d65 in _read (fd=12, buf=0xbf966fe8, nbytes=3)
    at /usr/src/lib/libc_r/uthread/uthread_read.c:98
#4  0x28209d9b in __read (fd=12, buf=0xbf966fe8, nbytes=3)
    at /usr/src/lib/libc_r/uthread/uthread_read.c:108
#5  0x080b725a in f_create_process (args=1)
    at /usr/tmp/xenofarm/pike-7.5/dan.emsphone.com/buildtmp/Pike7.5-20030608-215603/src/signal_handler.c:3512
#6  0x0807073d in low_mega_apply (type=APPLY_LOW, args=1, arg1=0x85dced8, 
    arg2=0x6) at apply_low.h:195
#7  0x08071734 in mega_apply (type=APPLY_LOW, args=1, arg1=0x85dced8, arg2=0x6)
    at /usr/tmp/xenofarm/pike-7.5/dan.emsphone.com/buildtmp/Pike7.5-20030608-215603/src/interpret.c:1702
#8  0x080cd7a5 in call_pike_initializers (o=0x85dced8, args=1)
    at /usr/tmp/xenofarm/pike-7.5/dan.emsphone.com/buildtmp/Pike7.5-20030608-215603/src/object.c:326
#9  0x080cd894 in debug_clone_object (p=0x5, args=1)
    at /usr/tmp/xenofarm/pike-7.5/dan.emsphone.com/buildtmp/Pike7.5-20030608-215603/src/object.c:352
#10 0x080711fe in low_mega_apply (type=APPLY_SVALUE_STRICT, args=1, 
    arg1=0x8533554, arg2=0x0)
    at /usr/tmp/xenofarm/pike-7.5/dan.emsphone.com/buildtmp/Pike7.5-20030608-215603/src/interpret.c:1500
#11 0x0806e77d in opcode_F_APPLY (arg1=33496) at interpret_functions.h:1873
#12 0x08533166 in ?? ()
#13 0x08071750 in mega_apply (type=APPLY_STACK, args=1, arg1=0x0, arg2=0x0)
    at /usr/tmp/xenofarm/pike-7.5/dan.emsphone.com/buildtmp/Pike7.5-20030608-215603/src/interpret.c:1704
#14 0x08071874 in f_call_function (args=1)
    at /usr/tmp/xenofarm/pike-7.5/dan.emsphone.com/buildtmp/Pike7.5-20030608-215603/src/interpret.c:1769
#15 0x080f6fed in new_thread_func (data=0xbfbff404)
    at /usr/tmp/xenofarm/pike-7.5/dan.emsphone.com/buildtmp/Pike7.5-20030608-215603/src/threads.c:788
#16 0x28204e6d in _thread_start ()
    at /usr/src/lib/libc_r/uthread/uthread_create.c:275
#17 0xbf91c000 in ?? ()

sysname: FreeBSD
release: 5.1-BETA
version: FreeBSD 5.1-BETA #271: Thu May 29 16:33:28 CDT 2003
dan@dan.emsphone.com:/usr/src/sys/i386/compile/DANSMP 
machine: i386
nodename: dan.emsphone.com
testname: default
command: make xenofarm
clientversion: $Id: client.sh,v 1.73 2003/05/20 12:48:33 mani Exp $
putversion: $Id: put.c,v 1.14 2003/01/12 21:14:16 ceder Exp $
contact: dnelson@allantgroup.com

Thanks,

--
Henrik Grubbström					grubba@roxen.com
Roxen Internet Software AB
Comment 7 Kris Kennaway freebsd_committer freebsd_triage 2003-07-13 02:40:15 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-threads

Assign to threads mailing list
Comment 8 Maxim Konovalov freebsd_committer freebsd_triage 2006-04-24 20:33:14 UTC
State Changed
From-To: open->suspended

In RELENG_5, RELENG_6 and HEAD, libc_r is deprecated in favour of 
libpthread and libthr.  Nobody is working on libc_r bugs 
so mark this PR as suspended.
Comment 9 K. Macy freebsd_committer freebsd_triage 2007-11-18 08:42:06 UTC
State Changed
From-To: suspended->closed


libc_r is no longer supported