Bug 16028

Summary: Change 1.10 in libc/xdr/xdr_rec.c breaks some RPC
Product: Base System
Reporter: sue <sue>
Component: misc
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 3.3-RELEASE   
Hardware: Any   
OS: Any   

Description sue 2000-01-10 16:00:01 UTC
Change 1.10 to xdr/xdr_rec.c causes some RPC programs to fail when
they read incorrect data.  A transfer size of 19947 bytes appears to be
significant: if I truncate the calls in my server program to that size,
the failure occurs every time.  This is likely because, at that
particular size, the LAST_FRAG bytes are sent in their own separate
write call.

ktrace shows that the server writes the last 4 bytes, which are the
LAST_FRAG, but the client is "moving on" before reading them.  Then, when
the client attempts to read after issuing the next request, it reads the
(previous but unread) LAST_FRAG and dies (set_input_fragment() returns
FALSE), returning 0 to the caller.
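
For reference, RPC record marking on a stream transport prefixes each
fragment with a 4-byte big-endian header: the top bit marks the last
fragment of a record and the low 31 bits give the fragment length.  A
minimal sketch of decoding such a header follows.  The struct and function
names are hypothetical and are not taken from xdr_rec.c.

```c
#include <stdbool.h>
#include <stdint.h>

#define LAST_FRAG_BIT 0x80000000u   /* top bit of the 4-byte header */

/* Hypothetical helper type: one decoded fragment header. */
struct frag_header {
    bool     last_frag;   /* true if this fragment ends the record */
    uint32_t length;      /* payload bytes that follow the header */
};

/* Decode a header from its 4 wire bytes (network byte order). */
static struct frag_header decode_frag_header(const unsigned char wire[4])
{
    uint32_t h = ((uint32_t)wire[0] << 24) | ((uint32_t)wire[1] << 16) |
                 ((uint32_t)wire[2] << 8)  |  (uint32_t)wire[3];
    struct frag_header out;

    out.last_frag = (h & LAST_FRAG_BIT) != 0;
    out.length    = h & ~LAST_FRAG_BIT;
    return out;
}
```

Under this layout, a 4-byte write containing only 80 00 00 00 is an empty
fragment with the last-fragment bit set, which matches the separate 4-byte
write seen in the server trace.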

Fix: 

The only fix I know of (workaround really) is to remove the test added in rev 1.10
from xdr_rec.c - lines 561 and 562 (553-562 if you count the comment added
at the time also).

I discovered this problem because I was getting "random" failures on a test
RPC program.  These failures never occurred on a BSDI system.  I had
already been debugging in the XDR routines and comparing them against
the BSDI version showed this single, key difference.

It is possible that the check/change added in 1.10 is invalid and incorrect.
However, it is also possible (perhaps likely) that the check is exposing
a different, latent XDR bug.  It appears that with or without the change,
XDR is "completing" before reading the LAST_FRAG and in the original code,
it likely just skips it the next pass through (when it would read test/file2
from the server).  Perhaps this change just indicates that there is a bug
in the coordination of rstrm->fbtbc and rstrm->last_frag that has been there
forever, but masked.  I am not sure which may be the case.
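
To illustrate the suspected interaction, here is a toy model of a reader
that stops as soon as it has the payload bytes it asked for, leaving a
trailing empty last-fragment marker unread.  It is a deliberately
simplified sketch, not the libc logic: the toy_* names are hypothetical,
payload bytes are not modeled, and the reject_empty flag is only my
assumption about the effect of the check added in 1.10.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LAST_FRAG_BIT 0x80000000u

/* Toy input stream: a list of already-decoded fragment headers. */
struct toy_stream {
    const uint32_t *headers;
    size_t          count;
    size_t          next;       /* index of the next unread header */
    uint32_t        fbtbc;      /* fragment bytes still to be consumed */
    bool            last_frag;  /* last header read had the top bit set */
};

/* Read the next header.  With reject_empty set, a zero-length fragment
 * is treated as an error (assumed behaviour of the added check). */
static bool toy_next_fragment(struct toy_stream *s, bool reject_empty)
{
    uint32_t h;

    if (s->next >= s->count)
        return false;
    h = s->headers[s->next++];
    s->last_frag = (h & LAST_FRAG_BIT) != 0;
    if (reject_empty && (h & ~LAST_FRAG_BIT) == 0)
        return false;
    s->fbtbc = h & ~LAST_FRAG_BIT;
    return true;
}

/* Consume `want` payload bytes, pulling new fragments as needed.
 * Returns as soon as `want` reaches zero, even if a trailing
 * marker has not been read yet. */
static bool toy_get_bytes(struct toy_stream *s, uint32_t want,
                          bool reject_empty)
{
    while (want > 0) {
        uint32_t n;

        if (s->fbtbc == 0) {
            if (!toy_next_fragment(s, reject_empty))
                return false;
            continue;
        }
        n = want < s->fbtbc ? want : s->fbtbc;
        s->fbtbc -= n;
        want -= n;
    }
    return true;
}
```

With headers { 100, 0x80000000, 0x80000000 | 50 }, the first read of 100
bytes succeeds either way.  The second read then meets the stale empty
marker first: with reject_empty it fails, and without it the marker is
silently skipped and the read succeeds.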

How-To-Repeat:

A demonstration program can be retrieved from:
ftp.sleepycat.com://pub/rpcbug.tar.gz

After unpacking it (objs and executables are in there, rebuild if necessary)
see the problem by:
% cd file
% ./file_svc &
% ./rls localhost test/*

You will see output like:
Sent test/file1 Got 19947
Bad file test/file2

When it works (rebuild libc with 1.10's check removed) you'd see:
Sent test/file1 Got 19947
Sent test/file2 Got 19947
Sent test/file3 Got 19947
Sent test/file4 Got 19947

In the tar there is also the output of a ktrace/kdump on both
the client and server processes, in cl.kd and svc.kd, respectively.

In svc.kd, at line 587, you can see the server writing the LAST_FRAG
separately from the data it just sent.  Then on the next line it reads the
next RPC request from the client, to read test/file2.

In cl.kd, at line 485, you can see the client reading the last of the data 
from the server.  Then on 487, it writes the output to stdout from
the program itself.  On 490 it writes the next RPC request.
Finally, on 494, it reads the LAST_FRAG that the server wrote for the
previous RPC.

Comment 1 Bill Paul freebsd_committer freebsd_triage 2000-01-19 06:15:15 UTC
State Changed
From-To: open->closed

Apparently it's legal for client RPC programs to receive zero-length
records with the LAST_FRAG marker bit set. So the test that works on
the server side breaks on the client side. I changed the test to look
for a header value of 0, since that is actually not legal. This fixes
the client side while still maintaining the test for the server side.
With this fix, the sample application works correctly.
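
The difference between the two tests can be sketched roughly as below.
This is an illustration of the description above, not the committed
code: the function names are hypothetical, and the "length only" form is
an assumption about what the rev 1.10 check effectively did.

```c
#include <stdbool.h>
#include <stdint.h>

#define LAST_FRAG_BIT 0x80000000u

/* Assumed shape of the earlier test: reject any fragment whose length
 * field is zero, regardless of the last-fragment bit. */
static bool header_ok_length_only(uint32_t header)
{
    return (header & ~LAST_FRAG_BIT) != 0;
}

/* Test as described in this comment: reject only a header whose whole
 * value is 0 (zero length and no last-fragment bit). */
static bool header_ok_whole_value(uint32_t header)
{
    return header != 0;
}
```

A header of 0x80000000 (empty fragment, last-fragment bit set) is rejected
by the first form and accepted by the second, while a header of 0 is
rejected by both.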

I updated xdr_rec.c in both the -current and -stable branches. The fix 
will be in 4.0-RELEASE when it comes out.