Bug 31479

Summary: Solaris NFS client times out in getacl
Product: Base System
Reporter: quinot <quinot>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 5.0-CURRENT   
Hardware: Any   
OS: Any   

Description quinot 2001-10-24 18:10:00 UTC
With Solaris 2.[68] as clients, and -CURRENT of Oct. 17 as server:
ls on client works, but ls -l waits for a timeout once for each file
in the directory, and issues a 'NFS getacl failed' message.

The server is not multi-homed, and a packet capture shows no trace of
address mismatch problems. One interesting thing is that the client
first does GETATTR on the file (and apparently gets a reply), and
then sends some other RPC, to which the server never replies.
Could this be the getacl request mentioned in the client error message?
I see no mention of getacl whatsoever in the -CURRENT server code. If
no such function is implemented, shouldn't we reject the request?

A packet capture is available at
  http://www.infres.enst.fr/~quinot/nfs.cap

Client is 137.194.192.1, server is 137.194.162.11. The test consists
of first performing an 'ls' on one file, then an 'ls -l' on the same
file. Result:

ls photos-ta; ls -l photos-ta
photos-ta
NFS getacl failed for server shalmaneser.enst.fr: error 5 (RPC: Timed out)
-rw-------   1 quinot   astre        474 Oct 18 14:17 photos-ta

This issue was brought up on current@freebsd.org, but no reply was made.

Fix: 

None known so far.
	Note: Trying to launch mountd with '-2' and to mount the FS with
	'-o vers=2' on the client results in mount hanging on the client
	(although mountd logs a 'mount request succeeded').
How-To-Repeat: 
Export an NFS file system from a FreeBSD 5-CURRENT server to a Solaris 2.8
client. Do an 'ls' on an existing file on that FS (works correctly),
then do an 'ls -l' (hangs, times out, and displays an error message
before returning the correct data).
Comment 1 quinot 2001-10-25 09:37:16 UTC
The following feedback has been received on -CURRENT for this PR.

----- Forwarded message from BSD User <bsder@allcaps.org> -----

Delivered-To: quinot@inf.enst.fr
Date: Wed, 24 Oct 2001 22:18:57 -0700 (PDT)
From: BSD User <bsder@allcaps.org>
To: Paul van der Zwan <paulz@trantor.xs4all.nl>
Cc: Thomas Quinot <quinot@inf.enst.fr>, <current@freebsd.org>
Subject: Re: Multiple NFS server problems with Solaris 8 clients 
In-Reply-To: <200110241729.f9OHTLx21951@trantor.xs4all.nl>

On Wed, 24 Oct 2001, Paul van der Zwan wrote:

> I have looked at a trace I made using snoop and it shows an NFS_ACL call which
> is not supported by FreeBSD. It should have sent a reply that it does not
> know the NFS_ACL protocol but apparently it does not.
> The only return traffic I see is an empty packet with the tcp ACK.
> It looks like an implementation error in the -current NFS server.
>
> 	Paul

I have been digging at traces of 4.4-RELEASE (which works) and -current
(which doesn't).

Both versions get it wrong.  I have no idea why 4.4-RELEASE worked.

-current responds with a blank TCP packet (which it emphatically should
*not* do) to the GETACL3 call.  It *could* conceivably be received as an
RPC packet with the "Last Fragment" flag not set and a length of 0.  Who
knows what the Solaris 8 client is doing when it encounters this (probably
getting stuck waiting for more data which never comes).

4.4-RELEASE responds with an RPC packet indicating "success" (which is
*also* wrong if the NFS server doesn't support ACLs) and then puts what
looks to be garbage in the response.  However, it is a valid RPC response
with the "Last Fragment" flag set.  Presumably the Solaris client gets the
message, sees the last fragment, throws away the packet as an error and
continues on with life.

I presume that the "correct" response is to send back an RPC reply (with
the "Last Fragment" set) which indicates that the RPC message was accepted
but that the procedure was unavailable (PROC_UNAVAIL).  Hopefully this
matches what an older Solaris server would do when faced with a Solaris 8
client and everything will proceed normally from there.

If anybody wants ethereal traces, I can send them.  Just ask.

Andy L.



----- End forwarded message -----
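
To make the "Last Fragment" discussion in the forwarded message concrete,
here is a minimal sketch in C of how the 4-byte record mark that precedes
each RPC fragment on a TCP stream is decoded: the top bit is the
last-fragment flag and the remaining 31 bits are the fragment length. The
struct and function names are hypothetical and are not taken from the
FreeBSD or Solaris sources; this only illustrates the framing, not what
either implementation actually does with the packet in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded form of a 4-byte ONC RPC record mark (hypothetical type). */
struct record_mark {
	int      last_fragment;	/* 1 if this fragment ends the record */
	uint32_t length;	/* fragment payload length in bytes */
};

/*
 * Decode a record mark from the 4 header bytes as they appear on the
 * wire (big-endian).  Hypothetical helper for illustration only.
 */
static struct record_mark
decode_record_mark(const unsigned char hdr[4])
{
	struct record_mark rm;
	uint32_t word;

	assert(hdr != NULL);
	word = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
	    ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
	rm.last_fragment = (word & 0x80000000u) != 0;	/* top bit */
	rm.length = word & 0x7fffffffu;			/* low 31 bits */
	return (rm);
}
```

With this framing, a header of 80 00 00 1c describes a final fragment of
28 bytes, while an all-zero header describes an empty fragment with more
fragments still expected, which is the case the message above speculates
a client might wait on.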

-- 
Thomas Quinot ** Département Informatique & Réseaux ** quinot@inf.enst.fr
              ENST   //   46 rue Barrault   //   75634 PARIS CEDEX 13
Comment 2 quinot 2001-10-25 18:00:03 UTC
This patch fixes the problem for me. Thanks Ian!

----- Forwarded message from Ian Dowse <iedowse@maths.tcd.ie> -----

To: BSD User <bsder@allcaps.org>
Cc: Thomas Quinot <quinot@inf.enst.fr>,
	Paul van der Zwan <paulz@trantor.xs4all.nl>, current@freebsd.org,
	fs@freebsd.org
Subject: Re: Multiple NFS server problems with Solaris 8 clients 
In-Reply-To: Your message of "Thu, 25 Oct 2001 03:16:08 PDT."
             <20011025030312.J8642-100000@mail.allcaps.org> 
Date: Thu, 25 Oct 2001 17:32:47 +0100
From: Ian Dowse <iedowse@maths.tcd.ie>
Message-ID:  <200110251732.aa69596@salmon.maths.tcd.ie>

In message <20011025030312.J8642-100000@mail.allcaps.org>, BSD User writes:
>Actually, upon instrumenting some code, it looks like RELEASE-4.4 gets it
>mostly right.  It ejects a PROG_UNAVAIL call which causes the Solaris 8
>client to back off.  The correct message would seem to be PROC_UNAVAIL,
>but I would take PROG_UNAVAIL if I could get -current to eject it.

I think PROG_UNAVAIL is correct; the packet trace that Thomas
provided shows an RPC request with a program ID of 100227 which is
not the NFS program ID.
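
For reference, the distinction between the two replies can be sketched in
C as follows. The accept_stat values are those defined by the ONC RPC
specification, and 100003 is the NFS program number; the function
classify_call() and the NFS3_NPROCS constant are hypothetical names used
only for this illustration, and version checking (PROG_MISMATCH) is
deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* accept_stat values from the ONC RPC specification. */
enum accept_stat {
	SUCCESS       = 0,	/* call executed */
	PROG_UNAVAIL  = 1,	/* program number not exported here */
	PROG_MISMATCH = 2,	/* program known, version unsupported */
	PROC_UNAVAIL  = 3,	/* program known, procedure unsupported */
	GARBAGE_ARGS  = 4,	/* arguments could not be decoded */
	SYSTEM_ERR    = 5	/* e.g. memory allocation failure */
};

#define NFS_PROGRAM	100003u	/* NFS program number */
#define NFS3_NPROCS	22u	/* NFSv3 procedures are numbered 0..21 */

/*
 * Hypothetical classifier: decide which accept_stat a server that only
 * implements the NFS program would put in its reply.
 */
static enum accept_stat
classify_call(uint32_t prog, uint32_t proc)
{
	if (prog != NFS_PROGRAM)
		return (PROG_UNAVAIL);	/* e.g. program 100227 */
	if (proc >= NFS3_NPROCS)
		return (PROC_UNAVAIL);
	return (SUCCESS);
}

/* Usage example: the request seen in the trace carries program 100227. */
void
classify_call_example(void)
{
	assert(classify_call(100227u, 1u) == PROG_UNAVAIL);
	assert(classify_call(NFS_PROGRAM, 1u) == SUCCESS);
}
```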

Try the patch below. Peter's NFS revamp changed the semantics of
the nfsm_reply() macro, and nfsrv_noop() was not updated to match.
Previously nfsm_reply would set 'error' to 0 when nd->nd_flag did
not have ND_NFSV3 set, and much of the code that uses nfsrv_noop
to generate errors ensured that nd->nd_flag was zero. Now nfsm_reply
never sets 'error' to 0, so it needs to be done explicitly. Server
op functions must return 0 in order for a reply to be sent to the
client.

Ian

Index: nfs_serv.c
===================================================================
RCS file: /home/iedowse/CVS/src/sys/nfsserver/nfs_serv.c,v
retrieving revision 1.107
diff -u -r1.107 nfs_serv.c
--- nfs_serv.c	2001/09/28 04:37:08	1.107
+++ nfs_serv.c	2001/10/25 16:19:33
@@ -4000,6 +4000,7 @@
 	else
 		error = EPROCUNAVAIL;
 	nfsm_reply(0);
+	error = 0;
 nfsmout:
 	return (error);
 }

----- End forwarded message -----
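
The effect of the one-line patch can be illustrated with a small
stand-alone model in C. It is only a sketch of the behaviour Ian
describes (a reply is sent only when the op function returns 0), not the
kernel code: struct model_reply, MODEL_ERR_UNAVAIL, dispatch() and both
noop_* functions are hypothetical names, and the two assignments to *rep
stand in for what nfsm_reply(0) does in the real nfsrv_noop().

```c
#include <assert.h>
#include <stddef.h>

/* Arbitrary nonzero stand-in for EPROCUNAVAIL (hypothetical value). */
enum { MODEL_ERR_UNAVAIL = 1 };

/* Hypothetical model of a reply under construction. */
struct model_reply {
	int built;	/* nonzero once a reply has been built */
	int status;	/* error status encoded into the reply */
};

/* Before the fix: the reply is built, but the error leaks to the caller. */
static int
noop_before_fix(struct model_reply *rep)
{
	int error = MODEL_ERR_UNAVAIL;

	rep->built = 1;		/* stands in for nfsm_reply(0) */
	rep->status = error;
	return (error);
}

/* After the fix: same reply, but the function reports success. */
static int
noop_after_fix(struct model_reply *rep)
{
	int error = MODEL_ERR_UNAVAIL;

	rep->built = 1;		/* stands in for nfsm_reply(0) */
	rep->status = error;
	error = 0;		/* the line added by the patch */
	return (error);
}

/*
 * Model of the caller: the reply goes out only if the op returned 0.
 * Returns 1 if a reply would be sent to the client, 0 otherwise.
 */
static int
dispatch(int (*op)(struct model_reply *), struct model_reply *rep)
{
	assert(op != NULL && rep != NULL);
	rep->built = 0;
	rep->status = 0;
	if (op(rep) != 0)
		return (0);	/* reply dropped; the client times out */
	return (rep->built);
}
```

In this model, dispatch(noop_before_fix, &rep) sends nothing, while
dispatch(noop_after_fix, &rep) sends a reply that still carries the
error status, which is what lets the client stop waiting.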

-- 
Thomas Quinot ** Département Informatique & Réseaux ** quinot@inf.enst.fr
              ENST   //   46 rue Barrault   //   75634 PARIS CEDEX 13
Comment 3 iedowse freebsd_committer freebsd_triage 2001-10-25 20:08:12 UTC
State Changed
From-To: open->closed


Fixed in nfs_serv.c revision 1.108. Thanks for the bug report!