Bug 57696

Summary: [nfs] NFS client readdir terminates prematurely if renaming files
Product: Base System           Reporter: Brian Candler <B.Candler>
Component: kern                Assignee: freebsd-bugs (Nobody) <bugs>
Status: Open ---
Severity: Affects Only Me      CC: bdrewery, grahamperrin, j.schripsema, murray, rmacklem
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Brian Candler 2003-10-07 13:40:14 UTC
If you have an opendir/readdir loop which rename()s files out of that
directory while the directory is being read, and the directory is mounted
over NFS, the readdir terminates prematurely - i.e. not all files are seen.

Problem verified with the following combinations:

   FreeBSD client - NetApp server:         problem [1]
   FreeBSD client - Solaris 2.8 server:    problem [1]
   FreeBSD client - FreeBSD server:        problem [2]
   FreeBSD client - Linux server:          problem [3]
   Linux client - Linux server:            no problem [3]
   Solaris 2.8 client - Netapp server:     no problem [4]

That seems to nail it as a FreeBSD NFS client issue. References:
[1] http://www.mail-archive.com/sqwebmail%40inter7.com/msg06643.html
[2] http://www.mail-archive.com/sqwebmail%40inter7.com/msg06644.html
[3] not yet in the archive; message from Stefan Kaltenbrunner
[4] http://www.mail-archive.com/sqwebmail%40inter7.com/msg06657.html

This is a particular problem for Maildir, where messages are moved from
Maildir/new/* to Maildir/cur/*.

Fix: 

No idea. The workaround implemented in courier-imap is to opendir, readdir
up to 20 items into an array, closedir, process those items, rinse and repeat.
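A minimal C sketch of that batching workaround, assuming only POSIX opendir()/readdir()/rename(). The function name move_in_batches() is hypothetical and this is not courier-imap's actual code. It moves every non-dot entry of one directory into another (as with Maildir/new to Maildir/cur) but never renames anything while the source directory is open, so each pass starts from a fresh opendir() instead of a stale readdir position.

```c
#define _POSIX_C_SOURCE 200809L   /* POSIX.1-2008: opendir/readdir, mkdtemp */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BATCH 20  /* batch size quoted for the courier-imap workaround */

/* Copy a d_name out of the dirent buffer, which closedir() invalidates. */
static char *copy_name(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/* Hypothetical helper: move every non-dot entry of `src` into `dst`.
 * Read up to BATCH names, closedir(), rename them, reopen and repeat.
 * Returns the number of entries moved, or -1 if `src` cannot be opened. */
long move_in_batches(const char *src, const char *dst)
{
    long moved = 0;

    for (;;) {
        char *names[BATCH];
        char from[1024], to[1024];
        struct dirent *de;
        int n = 0, progress = 0;
        DIR *dp = opendir(src);

        if (dp == NULL)
            return -1;
        while (n < BATCH && (de = readdir(dp)) != NULL) {
            if (de->d_name[0] == '.')
                continue;                  /* skip ".", ".." and dotfiles */
            if ((names[n] = copy_name(de->d_name)) != NULL)
                n++;
        }
        closedir(dp);                      /* src was not modified while open */

        for (int i = 0; i < n; i++) {
            snprintf(from, sizeof from, "%s/%s", src, names[i]);
            snprintf(to, sizeof to, "%s/%s", dst, names[i]);
            if (rename(from, to) == 0) {
                moved++;
                progress = 1;
            } else {
                perror("rename");
            }
            free(names[i]);
        }
        if (!progress)
            break;  /* src is empty, or every rename in this batch failed */
    }
    return moved;
}
```

The stop-when-a-batch-makes-no-progress rule is an added assumption: without it, a file that can never be renamed would be read back on every pass and the loop would never end.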
How-To-Repeat: 
Run the following program with an NFS-mounted directory as the command-line
argument. The failure mode is:

    bash-2.05a# ./testnfs /na0/testdir
    Transferred 169 out of 200 files

This program is also posted at reference [1] above.

/* Demonstrate problem with NFS readdir */

#include <stdio.h>
#include <stdlib.h>    /* for exit */
#include <unistd.h>
#include <sys/types.h> /* for mkdir */
#include <sys/stat.h>  /* for mkdir */
#include <dirent.h>    /* opendir/readdir etc */

#define TESTSIZE 200

int main(int argc, char *argv[])
{
  int i;
  char fnbuf[1024], fnbuf2[1024];
  /* Default to the current directory if no (or an empty) argument is given. */
  const char *dir = (argc >= 2 && argv[1][0]) ? argv[1] : ".";
  int count;
  DIR *dp;
  struct dirent *de;

  /* Create Maildir-style new/ and cur/ subdirectories; errors are ignored
     because they may already exist from a previous run. */
  snprintf(fnbuf, sizeof fnbuf, "%s/new", dir);
  mkdir(fnbuf, 0777);
  snprintf(fnbuf, sizeof fnbuf, "%s/cur", dir);
  mkdir(fnbuf, 0777);

  /* Populate new/ with TESTSIZE small files. */
  for (i = 0; i < TESTSIZE; i++) {
    FILE *f;
    snprintf(fnbuf, sizeof fnbuf, "%s/new/MYTESTFILE%d", dir, i);
    f = fopen(fnbuf, "w");
    if (!f) { perror("fopen"); exit(1); }
    fprintf(f, "Some dummy content\n");
    fclose(f);
  }

  /* Rename each entry from new/ to cur/ while new/ is still open for
     reading: this is what makes the readdir loop end early over NFS. */
  snprintf(fnbuf, sizeof fnbuf, "%s/new", dir);
  dp = opendir(fnbuf);
  if (!dp) { perror("opendir"); exit(1); }
  count = 0;
  while ((de = readdir(dp))) {
    if (de->d_name[0] == '.') continue;
    snprintf(fnbuf, sizeof fnbuf, "%s/new/%s", dir, de->d_name);
    snprintf(fnbuf2, sizeof fnbuf2, "%s/cur/%s:2,S", dir, de->d_name);
    if (rename(fnbuf, fnbuf2) < 0) {
      perror("rename");
      fprintf(stderr, "(from %s to %s)\n", fnbuf, fnbuf2);
      continue;
    }
    count++;
  }
  closedir(dp);

  fprintf(stderr, "Transferred %d out of %d files\n", count, TESTSIZE);
  return count != TESTSIZE;
}
Comment 1 Brian Candler 2003-10-07 16:45:38 UTC
The other reference is:
[3] http://sourceforge.net/mailarchive/message.php?msg_id=6224555
Comment 2 Chris.Shenton 2003-10-07 22:43:35 UTC
Same problem confirmed on FreeBSD-4.9-PRERELEASE.

Solaris 2.9 does NOT exhibit this problem.
Comment 3 chris 2003-10-08 01:33:25 UTC
I hope I'm doing the follow-up right; I don't know how to see
follow-ups in GNATS.

At work I tested a FreeBSD-4.9-PRERELEASE client against our netapp
and found the same failure with the "nfstest" program Brian posted. A
Solaris-2.9 client did not exhibit this problem.

When I got home, I ran the test with a FreeBSD-5.1 client to NetApp,
then with FreeBSD client to FreeBSD server.  Both failed the same
way.  This would tend to indicate an NFS client problem in 4.x and
5.x.


NFS from FreeBSD-5.1-CURRENT client to NetApp-6.3.2 fails:

  NetApp> version
  NetApp Release 6.3.2: Tue Mar 18 14:54:05 PST 2003

  chris@PECTOPAH<111> time ./nfstest /netapp/chris2/nfstestdir
  Transferred 169 out of 200 files
  0.016u 0.177s 0:00.66 27.2%	5+188k 0+200io 0pf+0w

  chris@PECTOPAH<112> uname -a
  FreeBSD PECTOPAH.shenton.org 5.1-CURRENT FreeBSD 5.1-CURRENT #10: Wed Sep 17 11:57:38 EDT 2003     root@PECTOPAH.shenton.org:/usr/obj/usr/src/sys/PECTOPAH  i386


NFS from (diskless) FreeBSD-5.1-CURRENT client to FreeBSD-5.1-CURRENT
NFS server also fails:

  chris@Kitchen<107> time ./nfstest kitchen
  Transferred 169 out of 200 files
  0.016u 0.937s 0:01.99 47.2%     4+174k 0+200io 0pf+0w

  chris@Kitchen<108> uname -a
  FreeBSD Kitchen.shenton.org 5.1-CURRENT FreeBSD 5.1-CURRENT #13: Sat Oct  4 14:21:23 EDT 2003     chris@PECTOPAH.shenton.org:/usr/obj/usr/src/sys/PECTOPAH  i386
Comment 4 Brian Candler 2003-10-08 17:29:42 UTC
On Tue, Oct 07, 2003 at 08:33:25PM -0400, Chris Shenton wrote:
> At work I tested a FreeBSD-4.9-PRERELEASE client against our netapp
> and found the same failure with the "nfstest" program Brian posted. A
> Solaris-2.9 client did not exhibit this problem.

Apparently a "bad cookie" message is logged for each failure event, and the
following related thread has been pointed out to me:
http://lists.freebsd.org/pipermail/freebsd-current/2003-August/008402.html

The implication of what is written there is that readdir() is entitled to
fail if the directory has been changed underneath it.

I don't know how Linux and Solaris cope with this: do they take an in-memory
snapshot of the whole directory at the client side, which readdir() then
traverses? Or do they just blindly continue to traverse a directory which
they know has changed?

Regards,

Brian.
Comment 5 Kris Kennaway freebsd_committer freebsd_triage 2003-11-15 21:49:22 UTC
State Changed
From-To: open->analyzed

See also kern/26142.  This is a known problem in the nfs code; 
the ufs code contains code to deal with this situation.  Bug peter 
for more details :)
Comment 6 cel freebsd_committer freebsd_triage 2006-05-24 20:08:06 UTC
Responsible Changed
From-To: freebsd-bugs->cel

Linux and Solaris watch the mtime of the directory while it's being 
read.  If the mtime changes, those clients know that the directory 
itself has changed, and effectively re-read the directory contents 
from the beginning.
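A userland C sketch of the same idea, assuming the POSIX.1-2008 st_mtim field of struct stat. The names read_dir_consistent(), struct snapshot and the max_tries limit are hypothetical. Real clients do this inside the kernel's directory cache using the attributes returned by the server, not with stat() from user space; over NFS, client attribute caching can also hide an mtime change from stat(), so this only illustrates the logic.

```c
#define _POSIX_C_SOURCE 200809L   /* opendir/readdir, st_mtim, mkdtemp */
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical snapshot type: a heap array of entry names. */
struct snapshot {
    char **names;
    size_t count;
};

static void snapshot_free(struct snapshot *s)
{
    for (size_t i = 0; i < s->count; i++)
        free(s->names[i]);
    free(s->names);
    s->names = NULL;
    s->count = 0;
}

static int same_mtime(const struct stat *a, const struct stat *b)
{
    return a->st_mtim.tv_sec == b->st_mtim.tv_sec &&
           a->st_mtim.tv_nsec == b->st_mtim.tv_nsec;
}

/* One pass: append every non-dot name in `dir` to `s`. Returns 0 or -1. */
static int read_all(const char *dir, struct snapshot *s)
{
    DIR *dp = opendir(dir);
    struct dirent *de;
    size_t cap = 0;

    if (dp == NULL)
        return -1;
    while ((de = readdir(dp)) != NULL) {
        if (de->d_name[0] == '.')
            continue;
        if (s->count == cap) {
            size_t ncap = cap ? 2 * cap : 16;
            char **p = realloc(s->names, ncap * sizeof *p);
            if (p == NULL) { closedir(dp); return -1; }
            s->names = p;
            cap = ncap;
        }
        size_t len = strlen(de->d_name) + 1;
        if ((s->names[s->count] = malloc(len)) == NULL) { closedir(dp); return -1; }
        memcpy(s->names[s->count++], de->d_name, len);
    }
    closedir(dp);
    return 0;
}

/* Read `dir`; if its mtime changed while we were reading, discard the
 * result and re-read from the beginning (at most `max_tries` passes).
 * Returns 0 with a snapshot, or -1 on error or if it never held still. */
int read_dir_consistent(const char *dir, struct snapshot *out, int max_tries)
{
    for (int t = 0; t < max_tries; t++) {
        struct stat before, after;

        out->names = NULL;
        out->count = 0;
        if (stat(dir, &before) != 0)
            return -1;
        if (read_all(dir, out) != 0) {
            snapshot_free(out);
            return -1;
        }
        if (stat(dir, &after) != 0) {
            snapshot_free(out);
            return -1;
        }
        if (same_mtime(&before, &after))
            return 0;            /* no change observed during the read */
        snapshot_free(out);      /* directory changed: start over */
    }
    return -1;
}
```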
Comment 7 cel freebsd_committer freebsd_triage 2007-03-12 15:21:55 UTC
Responsible Changed
From-To: cel->freebsd-bugs

Back to the public pool.
Comment 8 renchap 2009-12-10 14:17:43 UTC
Hi,

I can reproduce this bug with FreeBSD 8.0:
% uname -a
FreeBSD vty-testhp1 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009 root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64

Steps to reproduce: install the benchmarks/bonnie++ port, and launch bonnie++ inside an NFS mount.

Are there any plans to fix this?
Comment 9 Eitan Adler freebsd_committer freebsd_triage 2012-07-01 17:10:38 UTC
State Changed
From-To: analyzed->open

unowned PRs must not be in analyzed state
Comment 10 Bryan Drewery freebsd_committer freebsd_triage 2016-05-16 22:47:01 UTC
*** Bug 26142 has been marked as a duplicate of this bug. ***
Comment 11 Bryan Drewery freebsd_committer freebsd_triage 2016-05-16 22:47:18 UTC
More discussion: https://lists.freebsd.org/pipermail/freebsd-fs/2014-October/020155.html
Comment 12 murray 2016-10-15 04:51:58 UTC
FYI, 13 years later, this is still an issue with FreeBSD 11.0-RELEASE. If I run bonnie++ from a FreeBSD NFSv3 client to a Synology NAS NFSv3 server, I still see the error.

$ bonnie++ -u murray
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...Bonnie: drastic I/O error (rmdir): Directory not empty
Cleaning up test directory after error.

Is there any chance this works with the NFSv4 client in FreeBSD, or is it still the same codepath?
Comment 13 Eitan Adler freebsd_committer freebsd_triage 2018-05-20 23:51:37 UTC
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"
Comment 14 Rick Macklem freebsd_committer freebsd_triage 2018-05-23 22:22:15 UTC
Still a problem and will affect NFSv4 as well.
The short version is that NFS is not a POSIX compliant file
system.
The only way to reliably read a directory and remove its contents
is to repeatedly read the first entry in the directory (cookie offset 0)
and unlink() that entry until the directory is empty.
(If you modify bonnie++ to do this, it will work correctly over NFS.)
For other cases like rename, there is no fix.
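A minimal C sketch of that remove loop, assuming POSIX opendir()/readdir()/unlink(); empty_dir_from_front() is a hypothetical name. Each pass reopens the directory so the read starts at offset 0, takes the first entry other than "." and "..", closes the directory, and only then unlinks that entry, so no readdir position is ever held across a removal.

```c
#define _POSIX_C_SOURCE 200809L   /* opendir/readdir, mkdtemp */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: unlink the first entry of `dir` until none remain.
 * Returns the number of entries removed, or -1 on error. */
long empty_dir_from_front(const char *dir)
{
    long removed = 0;

    for (;;) {
        DIR *dp = opendir(dir);
        struct dirent *de;
        char *path = NULL;
        int found = 0;

        if (dp == NULL)
            return -1;
        /* A freshly opened directory is read from its start (offset 0). */
        while ((de = readdir(dp)) != NULL) {
            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;
            size_t len = strlen(dir) + strlen(de->d_name) + 2;  /* '/' + NUL */
            if ((path = malloc(len)) != NULL)
                snprintf(path, len, "%s/%s", dir, de->d_name);
            found = 1;
            break;                   /* only the first real entry is used */
        }
        closedir(dp);                /* nothing is unlinked while dp is open */

        if (!found)
            return removed;          /* no entries left: directory is empty */
        if (path == NULL)
            return -1;               /* out of memory */
        if (unlink(path) != 0) {
            perror("unlink");        /* e.g. a subdirectory: stop, don't spin */
            free(path);
            return -1;
        }
        free(path);
        removed++;
    }
}
```

This rereads the start of the directory once per file removed, so it is slow for very large directories; a caller such as bonnie++ would then rmdir() the emptied directory.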

What I believe other clients do to work around the problem is have
opendir() read the entire directory and then the readdir()s return
entries from that and never do getdirentries(2). This gives you
the entire directory in the form it had before the remove/rename...
(I have posted suggesting doing this for FreeBSD, but I've never had
 the collective say "yes, you should do this", so I haven't done it.)
It's actually pretty easy to do, since the code in libc already does
this for "union" mounts, so extending it to NFS mounts would be
straightforward.
The problem with doing this is that libc's opendir() is going to use a
lot of address space for large directories and might break in the extreme
case.
I suggested an upper limit on directory size for the above but, again,
since no one said this was a good idea, I didn't pursue it.
(Sorry, it has been a while and I don't remember which email list
 I posted the "should I do this?" to.)
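A userland C sketch of that proposal: snap_opendir() reads the whole directory into memory up front and refuses (with errno set to EFBIG) when it holds more than a caller-supplied number of entries, and snap_readdir() then serves names from that copy, so later renames or unlinks cannot disturb the iteration. SNAPDIR, the three snap_* functions and the choice of EFBIG are hypothetical; this is not FreeBSD libc's API or its union-mount code.

```c
#define _POSIX_C_SOURCE 200809L   /* opendir/readdir, mkdtemp */
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory directory handle. */
typedef struct {
    char **names;    /* every entry, including "." and "..", in read order */
    size_t count;
    size_t next;     /* index of the next name to return */
} SNAPDIR;

void snap_closedir(SNAPDIR *sd)
{
    if (sd == NULL)
        return;
    for (size_t i = 0; i < sd->count; i++)
        free(sd->names[i]);
    free(sd->names);
    free(sd);
}

/* Read all of `path` at open time; fail with errno = EFBIG if it holds
 * more than `max_entries` entries (the proposed upper limit). */
SNAPDIR *snap_opendir(const char *path, size_t max_entries)
{
    SNAPDIR *sd = calloc(1, sizeof *sd);
    size_t cap = 0;
    struct dirent *de;
    DIR *dp;
    int saved;

    if (sd == NULL)
        return NULL;
    if ((dp = opendir(path)) == NULL) {
        free(sd);
        return NULL;
    }
    while ((de = readdir(dp)) != NULL) {
        if (sd->count == max_entries) {
            errno = EFBIG;               /* over the limit: refuse to snapshot */
            goto fail;
        }
        if (sd->count == cap) {
            size_t ncap = cap ? 2 * cap : 64;
            char **p = realloc(sd->names, ncap * sizeof *p);
            if (p == NULL)
                goto fail;               /* POSIX realloc sets errno = ENOMEM */
            sd->names = p;
            cap = ncap;
        }
        size_t len = strlen(de->d_name) + 1;
        if ((sd->names[sd->count] = malloc(len)) == NULL)
            goto fail;
        memcpy(sd->names[sd->count++], de->d_name, len);
    }
    closedir(dp);
    return sd;

fail:
    saved = errno;                       /* keep errno across cleanup */
    closedir(dp);
    snap_closedir(sd);
    errno = saved;
    return NULL;
}

/* Next name from the snapshot, or NULL at the end. */
const char *snap_readdir(SNAPDIR *sd)
{
    return sd->next < sd->count ? sd->names[sd->next++] : NULL;
}
```

One possible design is for a caller to fall back to the plain opendir()/readdir() path when snap_opendir() fails with EFBIG, which keeps the address-space cost bounded at the price of the old behaviour for very large directories.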