Bug 87208 - [patch] [regression] /dev/cuad[0/1] bad file descriptor error during mgetty read
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Christian S.J. Peron
Reported: 2005-10-10 14:10 UTC by Norbert P. Copones
Modified: 2006-03-23 16:26 UTC
Description Norbert P. Copones 2005-10-10 14:10:14 UTC
The device /dev/cuad[0/1] cannot be accessed by mgetty during startup (mgetty entry in /etc/ttys). The mgetty log shows this output:

10/10 18:32:46 ad1  mgetty: interim release 1.1.33-Apr10
10/10 18:32:46 ad1  check for lockfiles
10/10 18:32:46 ad1  locking the line
10/10 18:32:47 ad1  mod: cannot make /dev/cuad1 stdin: Bad file descriptor
10/10 18:32:47 ad1  open device /dev/cuad1 failed: Bad file descriptor
10/10 18:32:47 ad1  cannot get terminal line dev=cuad1, exiting: Bad file descriptor

Fix: 

I tried changing cuad1 to ttyd1 in the /etc/ttys entry and sending a HUP signal to init. The first attempt always fails (the usual "Bad file descriptor" error); the second attempt sometimes succeeds. Switching ttyd1 back to cuad1 in /etc/ttys and sending HUP again then fails with "cuad1: Device busy". After I killed mgetty and sent HUP to init once more, it succeeded.
How-To-Repeat: Add the line 'cuad1 "/usr/local/sbin/mgetty" unknown on insecure' to /etc/ttys and reboot the system.
Comment 1 HASHI Hiroaki 2005-10-18 10:30:58 UTC
In this case, mgetty opens /dev/cuad? and dup(2)s it onto stdin:

    int fd;
    
    fd = open(devname, O_RDWR | O_NDELAY | O_NOCTTY );

    /* make new fd == stdin if it isn't already */

    if (fd > 0)
    {
        (void) close(0);
--->    if (dup(fd) != 0)
        {
            lprintf( L_FATAL, "mod: cannot make %s stdin", devname );
            return ERROR;
        }
    }

But dup() did not return descriptor 0.

Is this a bug in dup(2)?
(Or an incompatible change?)

Workaround:
  Have mgetty use dup2(2) instead of dup(2).

  dup2(fd, 0)
  .
  .
  dup2(0, 1)
  .
  .
  dup2(0, 2)
  .
  .
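
The workaround above can be sketched as a small helper. This is only an illustration, not mgetty's actual code: the function name attach_stdio is hypothetical, and O_NONBLOCK is used as the POSIX spelling of O_NDELAY. Unlike dup(2), dup2(2) names the target descriptor explicitly, so it does not depend on which descriptor the kernel considers "lowest free".

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from mgetty): open a device and make it
 * stdin, stdout and stderr with dup2(2).
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
int
attach_stdio(const char *devname)
{
    int fd = open(devname, O_RDWR | O_NONBLOCK | O_NOCTTY);

    if (fd < 0)                     /* 0 is a valid descriptor, so test < 0 */
        return -1;

    /* Make fd become descriptor 0 unless it already is. */
    if (fd != 0 && dup2(fd, 0) < 0) {
        if (fd > 2)
            close(fd);
        return -1;
    }

    /* Copy stdin onto stdout and stderr. */
    if (dup2(0, 1) < 0 || dup2(0, 2) < 0)
        return -1;

    if (fd > 2)
        close(fd);                  /* the extra descriptor is no longer needed */
    return 0;
}
```

A caller would use it as attach_stdio("/dev/cuad1") and treat -1 as fatal, in the same place where the quoted code logs "cannot make %s stdin".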
Comment 2 Dmitry Pryanishnikov 2005-11-10 23:39:44 UTC
Hello!

  I'm CCing this follow-up to freebsd-stable because this problem can
prevent the use of RELENG_6 machines in production (mgetty is a quite common
example of such use). This bug is a regression relative to RELENG_5/4.

  My analysis shows that it isn't only a dup() problem. File descriptor 0
somehow gets "reserved" in RELENG_6, but only if the process has been started
by init via /etc/ttys! Look at this simple program:

#include <unistd.h>
#include <syslog.h>
#include <fcntl.h>

#include <stdio.h>
#include <string.h>
#include <stdarg.h>

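int
main(void)
{
     int res;

     /* Open /dev/null until open() hands out descriptor 3 or higher. */
     while((res=open("/dev/null",O_RDONLY)) < 3)
         if (res == -1) syslog(LOG_ERR,"open(): %m");
     /* First observation point. */
     syslog(LOG_ERR,"Started"); sleep(10);
     if (close(0) == -1) syslog(LOG_ERR,"close(0): %m");
     if (close(2) == -1) syslog(LOG_ERR,"close(2): %m");
     /* dup() should return the lowest free descriptor, i.e. 0. */
     if ((res=dup(1)) == -1) syslog(LOG_ERR,"dup(1): %m");
     syslog(LOG_ERR,"dup() gave %d\n",res);
     /* Second observation point. */
     sleep(10);
     return 0;
}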

One can watch the file descriptor usage at the two points where the program is
sleeping: first after the program has opened enough files to reach descriptor
#3, and second after it has closed descriptors #0 and #2 and duplicated
descriptor #1. When I start this program under 6.0-RELEASE in the usual way
(./a.out), lsof shows the following at the first point (I show only plain
descriptors and omit the cwd/rtd/txt information):

At first sleep:

a.out   837 root    0u  VCHR       0,70  0t77713     70 /dev/ttyv1
a.out   837 root    1u  VCHR       0,70  0t77713     70 /dev/ttyv1
a.out   837 root    2u  VCHR       0,70  0t77713     70 /dev/ttyv1
a.out   837 root    3r  VCHR       0,13      0t0     13 /dev/null
a.out   837 root    4u  unix 0xc1c7b9bc      0t0        ->0xc1bf7de8

(Descriptor #4 was created by syslog().) The program logged the following:

a.out: dup() gave 0

At the second sleep:

a.out   837 root    0u  VCHR       0,70  0t77713     70 /dev/ttyv1
a.out   837 root    1u  VCHR       0,70  0t77713     70 /dev/ttyv1
a.out   837 root    3r  VCHR       0,13      0t0     13 /dev/null
a.out   837 root    4u  unix 0xc1c7b9bc      0t0        ->0xc1bf7de8

So all is OK in this mode: there were 3 standard files open at the beginning
(descr. 0-2), the program opened descr. 3 (and 4), closed 0 and 2
successfully, and duplicated 1 to 0. Now let's start this program from
/etc/ttys:

cuad0  "/root/tmp/a.out"       unknown on insecure

Now we have the following at the first sleep():

a.out   817 root    1r  VCHR       0,13      0t0     13 /dev/null
a.out   817 root    2r  VCHR       0,13      0t0     13 /dev/null
a.out   817 root    3r  VCHR       0,13      0t0     13 /dev/null
a.out   817 root    4u  unix 0xc1c7bde8      0t0        ->0xc1bf7de8

Note that open() has also skipped descr. 0! Then the program tries to close it,
which gives an error:

close(0): Bad file descriptor
dup() gave 2

Note that descriptor 0 isn't open: close() refuses to close it. But dup()
doesn't "see" the free slot and returns descr. 2 instead. At the second sleep,
we have exactly the same open file table: descr. 0 is not in use, 1-3 point
at /dev/null. So it seems to me that open() suffers from the same problem
here as dup(): descriptor 0 somehow becomes "reserved".
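
The observation above can be turned into a check that does not depend on lsof. The following is a minimal sketch, not part of the original report: the function names and the SCAN_LIMIT bound are made up for illustration. It finds the lowest unused descriptor by probing with fcntl(F_GETFD), which does not allocate anything, and then compares that with what dup() actually returns. POSIX requires the two to match.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define SCAN_LIMIT 1024     /* arbitrary bound for this sketch */

/* Lowest descriptor that is not open, found without allocating one. */
int
lowest_free_fd(void)
{
    int fd;

    for (fd = 0; fd < SCAN_LIMIT; fd++)
        if (fcntl(fd, F_GETFD) == -1 && errno == EBADF)
            return fd;
    return -1;
}

/*
 * Returns 1 if dup(src) yields the lowest free descriptor, 0 if it skips
 * a free one (the behaviour described above), -1 if the check cannot run.
 */
int
dup_returns_lowest(int src)
{
    int expected = lowest_free_fd();
    int got = dup(src);

    if (got < 0)
        return -1;
    close(got);             /* leave the descriptor table as we found it */
    if (expected < 0)
        return -1;
    return got == expected;
}
```

Run from /etc/ttys on an affected kernel, one would expect dup_returns_lowest(1) to report 0 once descriptors 1 and up are open, since descriptor 0 is free but is never handed out.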


Sincerely, Dmitry
-- 
Atlantis ISP, System Administrator
e-mail:  dmitry@atlantis.dp.ua
nic-hdl: LYNX-RIPE
Comment 3 Gleb Smirnoff freebsd_committer freebsd_triage 2005-11-15 14:55:49 UTC
Responsible Changed
From-To: freebsd-i386->freebsd-bugs

Probably not i386 specific.
Comment 4 hk 2005-11-22 11:24:27 UTC
Hello,

this problem is preventing production use here. Currently I
can use /dev/cuad0 if I have the entry

cuad0   "/usr/local/sbin/mgetty"        unknown on insecure

twice(*) in "/etc/ttys" and issue "kill -HUP 1" after booting
to multi-user. With only the first entry, sending SIGHUP
to init doesn't help, but with both entries the first
SIGHUP to init has so far gotten everything working.

Maybe this is helpful in finding the culprit. This is on an
ASRock CPU EX Upgrade Board (K7UPGRADE-880/A/ASR) with
AMD Athlon(tm) XP 2800+ and 512MB Memory, running
FreeBSD 6.0-STABLE #3: Sun Nov 20 19:50:43 CET 2005

(*) one entry comes before pseudo terminal entries, the other
    afterwards.

Regards,
Holger Kipp
Comment 5 pblok 2005-12-01 21:30:04 UTC
Hi,

 

The problem is caused by sys/kern/kern_descrip.c 1.279.2.1. When those changes
are undone, mgetty works. I am still figuring out whether the kernel patch is
wrong or mgetty is doing something iffy.

 

Peter
Comment 6 Kostik Belousov 2005-12-19 16:37:42 UTC
Ok,
it seems I have found the problem. Please test the patch below:

Index: sys/kern/kern_descrip.c
===================================================================
RCS file: /usr/local/arch/ncvs/src/sys/kern/kern_descrip.c,v
retrieving revision 1.289
diff -u -r1.289 kern_descrip.c
--- sys/kern/kern_descrip.c	30 Nov 2005 05:12:03 -0000	1.289
+++ sys/kern/kern_descrip.c	19 Dec 2005 16:36:44 -0000
@@ -1512,6 +1512,8 @@
 				newfdp->fd_freefile = i;
 		}
 	}
+	if (newfdp->fd_freefile == -1)
+		newfdp->fd_freefile = i;
 	FILEDESC_UNLOCK_FAST(fdp);
 	FILEDESC_LOCK(newfdp);
 	for (i = 0; i <= newfdp->fd_lastfile; ++i)
@@ -1519,9 +1521,9 @@
 			fdused(newfdp, i);
 	FILEDESC_UNLOCK(newfdp);
 	FILEDESC_LOCK_FAST(fdp);
-	if (newfdp->fd_freefile == -1)
-		newfdp->fd_freefile = i;
 	newfdp->fd_cmask = fdp->fd_cmask;
+	KASSERT(fd_first_free(newfdp, 0, newfdp->fd_nfiles) == newfdp->fd_freefile,
+		("fd_first_free != fd_freefile fdp %p newfdp %p p %p", fdp, newfdp, curproc));
 	FILEDESC_UNLOCK_FAST(fdp);
 	return (newfdp);
 }
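
The KASSERT added by the patch states an invariant: fd_freefile must equal the first free slot in the table. Why that matters can be illustrated with a toy userland model. This is a deliberately simplified sketch and not the kernel's code: the struct, the function names and NFILES are all invented here, and the model does not claim to reproduce how the kernel arrived at a wrong hint. It only shows that an allocator which starts searching at a stale hint never looks below it, so a free descriptor 0 is skipped.

```c
#include <stdbool.h>

#define NFILES 8            /* toy table size, not a kernel constant */

struct toy_fdtable {
    bool used[NFILES];      /* true if the slot holds an open file */
    int  freefile;          /* hint: supposed to be the lowest free slot */
};

/* Ground truth: first free slot at or above 'low', or NFILES if none. */
int
toy_first_free(const struct toy_fdtable *t, int low)
{
    int i;

    for (i = low; i < NFILES; i++)
        if (!t->used[i])
            return i;
    return NFILES;
}

/* The invariant the KASSERT expresses, restated for the toy table. */
bool
toy_hint_ok(const struct toy_fdtable *t)
{
    return toy_first_free(t, 0) == t->freefile;
}

/* Allocate a slot, trusting the hint as the place to start searching. */
int
toy_alloc(struct toy_fdtable *t)
{
    int low = t->freefile < 0 ? 0 : t->freefile;
    int fd = toy_first_free(t, low);

    if (fd == NFILES)
        return -1;          /* table full from the hint upward */
    t->used[fd] = true;
    t->freefile = toy_first_free(t, fd);
    return fd;
}
```

With an empty table and freefile set to 0, toy_alloc() returns 0. With the same empty table but freefile set to 1, toy_hint_ok() is false and toy_alloc() returns 1 while slot 0 stays free, which mirrors the symptom reported in this PR.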
Comment 7 Norbert P. Copones 2005-12-19 17:43:10 UTC
It seems the workaround has already been committed to the ports tree. It makes
mgetty use dup2(2) instead of dup(2). mgetty works fine now.
Comment 8 Kostik Belousov 2005-12-20 09:42:05 UTC
Yes,
the workaround just hides the real kernel bug, which I'm trying to fix with the
submitted patch.
Comment 9 Gleb Smirnoff freebsd_committer freebsd_triage 2005-12-23 10:15:22 UTC
Responsible Changed
From-To: freebsd-bugs->des

Dag-Erling, please handle this. Looks like you have introduced the problem.
Comment 10 Christian S.J. Peron freebsd_committer freebsd_triage 2006-03-19 21:47:30 UTC
Responsible Changed
From-To: des->csjp

I will take ownership of this PR as I am working on a fix.
Comment 11 Christian S.J. Peron freebsd_committer freebsd_triage 2006-03-20 05:48:02 UTC
State Changed
From-To: open->patched

An experimental fix has been committed to -CURRENT; once its testing
period expires, we will merge it into RELENG_6.
Comment 12 Christian S.J. Peron freebsd_committer freebsd_triage 2006-03-23 16:26:25 UTC
State Changed
From-To: patched->closed

Merged to RELENG_6