Bug 27230

Summary: Users after NIS lines in /etc/passwd
Product: Base System
Reporter: quinot <quinot>
Component: bin
Assignee: Jacques Vidrine <nectar>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 5.0-CURRENT   
Hardware: Any   
OS: Any   

Description quinot 2001-05-09 17:10:01 UTC
	Consider a /etc/master.passwd with the following structure:
root:...
user1:...
+:...
user2:...

	i.e. using NIS (the '+' line) and with a local user declared
	after the '+' line.

	When both ypbind and rpcbind are running, user2 is seen correctly.

	When neither of them is running, running 'id user2' hangs for
	75 seconds in getpwnam(), then returns 'no such user'.

	When only rpcbind is running, it does not hang but returns
	'no such user' immediately.

	There is a similar problem with /etc/group, which had
	the unfortunate consequence on my system that the
	  chown root:wheel /dev/tty[pqrsPQRS]*
	in /etc/rc took ages, because the '+' line came before 'wheel'
	in my /etc/group.

Fix: unknown.

        Work-around: Move all the '+' lines in /etc/master.passwd and /etc/group
        to the end of the file. Document the problem. Possibly modify
        vipw to do the rearrangement automatically when generating /etc/passwd
        from master.passwd.
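As a sketch of the suggested work-around, the hypothetical helper below (`report_entries_after_nis()` is not part of FreeBSD or vipw) scans a file in master.passwd or group format and reports every local entry that appears after the first '+' line. It assumes one entry per line, each shorter than the 1024-byte buffer, skips blank lines and '#' comments, and treats '-' lines as NIS exclusions rather than local entries.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: report local entries that follow the first '+'
 * (NIS inclusion) line of a passwd- or group-format file.
 * Returns the number of misplaced entries, or -1 on a read error.
 * Assumes each entry fits on one line shorter than the buffer.
 */
static int report_entries_after_nis(FILE *fp, const char *path)
{
    char line[1024];
    long lineno = 0;
    int seen_plus = 0;
    int misplaced = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        lineno++;
        /* Blank lines and '#' comments are skipped (an assumption). */
        if (line[0] == '\n' || line[0] == '#')
            continue;
        if (line[0] == '+') {
            /* '+', '+name' and '+@netgroup' all pull in NIS data. */
            seen_plus = 1;
            continue;
        }
        /* '-' lines are NIS exclusions, not local entries. */
        if (seen_plus && line[0] != '-') {
            /* Keep only the name field for the diagnostic. */
            line[strcspn(line, ":\n")] = '\0';
            fprintf(stderr, "%s:%ld: local entry '%s' follows a '+' line\n",
                    path, lineno, line);
            misplaced++;
        }
    }
    return ferror(fp) ? -1 : misplaced;
}
```

Run against the layout from the description, it flags user2 only; an empty report means the '+' lines are already at the end of the file.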
How-To-Repeat:
        * killall rpcbind ypbind
        * id user1 (where user1 is a username before the '+'): OK
        * id user2 (where user2 is a username after the '+'):
           hangs for 75 seconds, then returns 'unknown user'
        * launch rpcbind
        * id user2 now returns 'unknown user' immediately
        * launch ypbind
        * id user2 works OK
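To put numbers on the steps above, a small C program can time the getpwnam() lookup that the report identifies as the place where 'id user2' hangs. This is an illustrative sketch, not part of the original report: the helper name `time_getpwnam()` is hypothetical, and it assumes a POSIX system providing CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <pwd.h>
#include <time.h>

/*
 * Time one getpwnam() call.  Stores 1 in *found if the user exists,
 * 0 otherwise, and returns the elapsed wall-clock time in seconds.
 * CLOCK_MONOTONIC is used so clock adjustments do not skew the result.
 */
static double time_getpwnam(const char *name, int *found)
{
    struct timespec start, end;
    struct passwd *pw;

    clock_gettime(CLOCK_MONOTONIC, &start);
    pw = getpwnam(name);
    clock_gettime(CLOCK_MONOTONIC, &end);

    *found = (pw != NULL);
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it for user1 and user2 with rpcbind and ypbind stopped would, going by the report, show user1 resolved at once and user2 failing only after the long delay.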
Comment 1 dima 2001-05-10 00:38:23 UTC
quinot@inf.enst.fr writes:
> 
> >Number:         27230
> >Category:       bin
> >Synopsis:       Users after NIS lines in /etc/passwd
> 	
> >Description:
> 	Consider a /etc/master.passwd with the following structure:
> root:...
> user1:...
> +:...
> user2:...
> 
> 	ie using NIS ('+' line) AND with a local user declared
> 	after the '+' line.
> 
> 	When both ypbind and rpcbind are running, user2 is seen correctly.
> 
> 	When neither of them is running, running 'id user2' hangs for
> 	75 seconds in getpwnam(), then returns 'no such user'.
> 
> 	When only rpcbind is running, it does not hang but returns
> 	'no such user' immediately.

This is an artifact of the introduction of nsswitch.  Basically, when
the database-specific lookup routines return NS_UNAVAIL, the search is
short-circuited.  This is wrong because, as you show, there may be
entries later on for which the routine won't return NS_UNAVAIL.

The attached patch seems to fix this for me.  Please try it and see if
it works for you, too.  If you're not confident with being able to
rebuild libc and the appropriate programs correctly, you'll probably
want to do a full make world.

> 	There is a similar problem with /etc/groups, which had
> 	the unfortunate on my system that the
> 	  chown root:wheel /dev/tty[pqrsPQRS]*
> 	in /etc/rc took ages, because '+' was before 'wheel' in
> 	my /etc/groups.

This problem doesn't seem to affect groups.  I can't reproduce it, and
a quick look at the code supports this observation.  Perhaps there
were other causes for the delay you describe above.

					Dima Dorfman
					dima@unixfreak.org

Index: getpwent.c
===================================================================
RCS file: /st/src/FreeBSD/src/lib/libc/gen/getpwent.c,v
retrieving revision 1.59
diff -u -r1.59 getpwent.c
--- getpwent.c	2001/01/24 12:59:22	1.59
+++ getpwent.c	2001/05/09 23:27:55
@@ -910,7 +910,7 @@
 				r = __getpwcompat(_PW_KEYBYNAME, 0, user);
 
 				if (r == NS_UNAVAIL)
-					return r;
+					break;
 				if (r == NS_NOTFOUND) {
 					/*
 					 * just because this user is bad
@@ -924,7 +924,7 @@
 				r = __getpwcompat(_PW_KEYBYNAME, 0, user);
 
 				if (r == NS_UNAVAIL)
-					return r;
+					break;
 				if (r == NS_NOTFOUND)
 					continue;
 				break;
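The disagreement in this thread is about what should happen when the NIS backend reports NS_UNAVAIL while a '+' entry is being processed. The following is a deliberately simplified model, not the real getpwent.c: entries are bare usernames, `nis_lookup()` is a hypothetical stand-in for __getpwcompat() that always fails, as it does when no NIS domain is bound, and the status values are local to the example rather than taken from <nsswitch.h>. The `stop_on_unavail` flag contrasts the rev. 1.59 `return r` with the patched behaviour, which the model assumes amounts to skipping the '+' entry and carrying on with later local entries.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative status codes, modelled on (not copied from) <nsswitch.h>. */
enum ns_status { NS_SUCCESS, NS_UNAVAIL, NS_NOTFOUND };

/* Hypothetical NIS backend: NIS is down, so every lookup is unavailable. */
static enum ns_status nis_lookup(const char *name)
{
    (void)name;
    return NS_UNAVAIL;
}

/*
 * Simplified "compat" scan over the entries of master.passwd.
 * stop_on_unavail != 0: an unavailable NIS map aborts the whole scan
 *                       (the rev. 1.59 "return r").
 * stop_on_unavail == 0: the '+' entry is skipped and later local
 *                       entries are still examined (the patch, as modelled).
 */
static enum ns_status compat_lookup(const char *const entries[], size_t n,
                                    const char *name, int stop_on_unavail)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i][0] == '+') {
            enum ns_status r = nis_lookup(name);
            if (r == NS_SUCCESS)
                return r;
            if (r == NS_UNAVAIL && stop_on_unavail)
                return r;       /* short-circuit: later entries never seen */
            continue;           /* move on to the next entry */
        }
        if (strcmp(entries[i], name) == 0)
            return NS_SUCCESS;
    }
    return NS_NOTFOUND;
}
```

With the four-entry layout from the description, user2 comes back as NS_UNAVAIL in the first mode and is found in the second, matching the behaviour reported before and after the patch.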
Comment 2 Jacques Vidrine 2001-05-10 03:19:02 UTC
On Wed, May 09, 2001 at 04:38:23PM -0700, Dima Dorfman wrote:
> This is an artifact of the introduction of nsswitch.  Basically, when
> the database-specific lookup routines return NS_UNAVAIL, the search is
> short-circuited.  This is wrong because, as you show, there may be
> entries later on for which the routine won't return NS_UNAVAIL.

No, NS_UNAVAIL _should_ short-circuit like this.  I'll look for a bug
in __getpwcompat that returns NS_UNAVAIL inappropriately.
-- 
Jacques Vidrine / n@nectar.com / jvidrine@verio.net / nectar@FreeBSD.org
Comment 3 dima 2001-05-10 03:40:48 UTC
"Jacques A. Vidrine" <n@nectar.com> writes:
> On Wed, May 09, 2001 at 04:38:23PM -0700, Dima Dorfman wrote:
> 
> No, NS_UNAVAIL _should_ short-circuit like  this.  I'll look for a bug
> in __getpwcompat that returns NS_UNAVAIL inappropriately. 

In this case, it gets returned here:

        if(__ypdomain == NULL) {
                if(_yp_check(&__ypdomain) == 0)
                        return NS_UNAVAIL;
        }

line 512, rev. 1.59 of getpwent.c.

Comment 4 quinot 2001-05-10 08:46:04 UTC
On 2001-05-10, Jacques A. Vidrine wrote:

> No, NS_UNAVAIL _should_ short-circuit like  this.  I'll look for a bug
> in __getpwcompat that returns NS_UNAVAIL inappropriately. 

BTW, in case this information is needed, the contents of my
/etc/nsswitch.conf are:

hosts: files [NOTFOUND=continue] dns

-- 
Thomas Quinot ** Département Informatique & Réseaux ** quinot@inf.enst.fr
              ENST   //   46 rue Barrault   //   75634 PARIS CEDEX 13
Comment 5 quinot 2001-05-10 11:58:23 UTC
On 2001-05-10, Dima Dorfman wrote:

> > No, NS_UNAVAIL _should_ short-circuit like  this.  I'll look for a bug
> > in __getpwcompat that returns NS_UNAVAIL inappropriately. 
> In this case, it gets returned here:
> 
>         if(__ypdomain == NULL) {
>                 if(_yp_check(&__ypdomain) == 0)
>                         return NS_UNAVAIL;
>         }
> 
> line 512, rev. 1.59 of getpwent.c.

As I understand it, your patch and/or changing the returned value would
resolve the faulty 'no such user' error, but not the 75-second hang
that is experienced when rpcbind is not running.

Comment 6 dima 2001-05-11 04:27:32 UTC
Thomas Quinot <quinot@inf.enst.fr> writes:
> On 2001-05-10, Dima Dorfman wrote:
> 
> > > No, NS_UNAVAIL _should_ short-circuit like  this.  I'll look for a bug
> > > in __getpwcompat that returns NS_UNAVAIL inappropriately. 
> > In this case, it gets returned here:
> > 
> >         if(__ypdomain == NULL) {
> >                 if(_yp_check(&__ypdomain) == 0)
> >                         return NS_UNAVAIL;
> >         }
> > 
> > line 512, rev. 1.59 of getpwent.c.
> 
> As I understand it, your patch and/or changing the returned value would
> resolve the faulty 'no such user' error, but not the 75-second hang
> that is experienced when rpcbind is not running.

I don't think that's a bug.  It's the nature of NIS; it should wait in
hopes of the server responding.  Perhaps the bug is that it doesn't
wait when rpcbind is running but ypbind isn't.

Comment 7 quinot 2001-05-11 15:50:51 UTC
On 2001-05-11, Dima Dorfman wrote:

> > As I understand it, your patch and/or changing the returned value would
> > resolve the faulty 'no such user' error, but not the 75-second hang
> > that is experienced when rpcbind is not running.

> I don't think that's a bug.  It's the nature of NIS; it should wait in
> hopes of the server responding.  Perhaps the bug is that it doesn't
> wait when rpcbind is running but ypbind isn't.

In that case perhaps we could change the code to use TCP when trying to
connect to the local portmapper, so we can get a 'connection refused'
immediately rather than timing out when there is no portmapper running.
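Thomas's suggestion rests on the fact that a TCP connection to a port with no listener is normally refused straight away, whereas a UDP request to an absent service is typically retried until a timeout expires. A minimal sketch of such a probe follows; `probe_local_tcp()` is a hypothetical helper, not part of the RPC library, and whether libc's NIS binding code should use anything like it is exactly what is being debated here. Probing port 111 (the portmapper/rpcbind port) on the loopback interface would typically return 0 at once when rpcbind is not running.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Probe a TCP port on the loopback interface.
 * Returns 1 if the connection was accepted, 0 if it was refused
 * (nothing listening), and -1 on any other error.
 */
static int probe_local_tcp(unsigned short port)
{
    struct sockaddr_in sin = {0};
    int fd, rc;

    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&sin, sizeof sin) == 0)
        rc = 1;
    else
        rc = (errno == ECONNREFUSED) ? 0 : -1;  /* refused means no listener */
    close(fd);
    return rc;
}
```

The design choice here is to fail fast on a definite "nothing is there" answer; as Jacques notes later, this only helps in the case where rpcbind is not running at all.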

If this is not possible, then IMO the requirement that all '+' lines in
/etc/passwd and /etc/group be at the end of the file should be documented.

Thomas.

Comment 8 Jacques Vidrine freebsd_committer freebsd_triage 2001-05-11 20:46:33 UTC
Responsible Changed
From-To: freebsd-bugs->nectar

Make this PR bug me so I can look at it again in a 
couple of weeks.  I'm not convinced there is a problem 
in the lookup code, but I'm hitting the road and won't 
be able to set up a decent test environment until then.
Comment 9 Jacques Vidrine freebsd_committer freebsd_triage 2003-02-05 03:19:50 UTC
State Changed
From-To: open->closed

If memory serves...  the behavior seen is the expected and 
correct behavior.
Comment 10 quinot 2003-02-05 11:17:14 UTC
On 2003-02-05, Jacques Vidrine wrote:

> State-Changed-From-To: open->closed
> State-Changed-By: nectar
> State-Changed-When: Tue Feb 4 19:19:50 PST 2003
> State-Changed-Why: 
> If memory serves...  the behavior seen is the expected and
> correct behavior.

What about using TCP instead of UDP for communication between ypbind
and rpcbind? This would suppress the 75 second timeout when rpcbind
is not running.

Comment 11 Jacques Vidrine freebsd_committer freebsd_triage 2003-02-05 12:40:48 UTC
On Wed, Feb 05, 2003 at 12:17:14PM +0100, Thomas Quinot wrote:
> On 2003-02-05, Jacques Vidrine wrote:
> 
> > State-Changed-From-To: open->closed
> > State-Changed-By: nectar
> > State-Changed-When: Tue Feb 4 19:19:50 PST 2003
> > State-Changed-Why: 
> > If memory serves...  the behavior seen is the expected and
> > correct behavior.
> 
> What about using TCP instead of UDP for communication between ypbind
> and rpcbind? This would suppress the 75 second timeout when rpcbind
> is not running.

That would be optimizing for the unusual case.

Cheers,
-- 
Jacques A. Vidrine <nectar@celabo.org>          http://www.celabo.org/
NTT/Verio SME          .     FreeBSD UNIX     .       Heimdal Kerberos
jvidrine@verio.net     .  nectar@FreeBSD.org  .          nectar@kth.se