After investigating some issues with some programs here, I've figured out what the cause is. It seems there is a bug with libc's telldir/seekdir. I was able to track a fix down that originated in NetBSD, so I've taken the time to port it. I've tested it with some regression tools which I will attach to this PR. If you chose to commit (any) parts of this diff you will probably need to bump the OS version because the size of dirdesc will change. I didn't notice any issues with this at all, so you probably will want to avoid doing that. Fix: Apply libc.diff to RELENG_7/RELENG_6 and rerun regression test tools. Everything appears to pass and I've been using these on a few machines for the past couple of months. Patch attached with submission follows: How-To-Repeat: Here's some of the regression test program that show the problem. If you compile seekdir-twice.c with a simple "#define DEBUG" you'll get more useful info. Busted telldir/seekdir machine (RELENG_7) first found: a pos returned by telldir: 1 second found: b third found: c should be == second: b pos given to seekdir: 1 pos returned by telldir: 2 should be == second: c should be == second: c Fixed seekdir/telldir with included patch: first found: a pos returned by telldir: 673198160 second found: b third found: c should be == second: b pos given to seekdir: 673198160 pos returned by telldir: 673198160 should be == second: b should be == second: b
It should also be noted that this fix apparently corrects some samba issues users were having on netbsd and are currently having on freebsd. The original NetBSD PR is http://mail-index.netbsd.org/netbsd-bugs/2004/02/05/0008.html christos committed the fix for tell/seekdir etc using a pointer for each dir instead of a hash table, which is what this patch does.
Responsible Changed From-To: freebsd-i386->freebsd-bugs This does not sound i386-specific.
I can reproduce the bug in FreeBSD 10. Unfortunately the patch doesn't apply anymore :(
Much of this patch is not needed. The existing dd_td list is already per-directory instead of being a global cache table. Also, the problem description is not at all clear. I think one of the root bugs is that if you call 'seekdir() followed by telldir()', POSIX requires the return value of telldir() to match the value passed to seekdir(). Fixing this means removing the (I think dubious) SINGLEUSE code and fixing telldir() to look for an existing loc structure for the current position instead of always allocating a new one. That should be a much smaller patch. The only additional complication is if the dd_loc list should instead be changed so it is easier to locate an existing loc for the current position (i.e. either sorting the list or using a hash table)
The POSIX requirement for telldir() after seekdir() can be found here: http://pubs.opengroup.org/onlinepubs/009695399/functions/telldir.html A candidate fix is up for review at: https://phabric.freebsd.org/D490
(In reply to John Baldwin from comment #5) > Much of this patch is not needed. The existing dd_td list is already > per-directory instead of being a global cache table. Also, the problem > description is not at all clear. I think one of the root bugs is that if > you call 'seekdir() followed by telldir()', POSIX requires the return value > of telldir() to match the value passed to seekdir(). Fixing this means > removing the (I think dubious) SINGLEUSE code and fixing telldir() to look > for an existing loc structure for the current position instead of always > allocating a new one. That should be a much smaller patch. Thank you for taking this over! Indeed I have checked that the other BSDs have already removed the SINGLEUSE code. NetBSD and OpenBSD did it around 2006 to workaround Samba issues. DragonFlyBSD did in in 2008: "Remove the SINGLEUSE feature for telldir(), it does not conform to the Open Group specification."
A commit references this bug: Author: jhb Date: Tue Jul 29 00:16:34 UTC 2014 New revision: 269204 URL: http://svnweb.freebsd.org/changeset/base/269204 Log: If telldir() is called immediately after a call to seekdir(), POSIX requires the return value of telldir() to equal the value passed to seekdir(). The current seekdir code with SINGLEUSE enabled breaks this case as each call to telldir() allocates a new cookie. Instead, remove the SINGLEUSE code and change telldir() to look for an existing cookie for the directory's current location rather than always creating a new cookie. CR: https://phabric.freebsd.org/D490 PR: 121656 Reviewed by: jilles MFC after: 1 week Changes: head/lib/libc/gen/directory.3 head/lib/libc/gen/telldir.c
Fix committed to HEAD.
Fix merged to stable and will be in 10.1.
A commit references this bug: Author: jhb Date: Thu Aug 14 20:20:23 UTC 2014 New revision: 270002 URL: http://svnweb.freebsd.org/changeset/base/270002 Log: MFC 268531,269079,269204: Fix various edge cases with rewinddir(), seekdir(), and telldir(): - In the unionfs case, opendir() and fdopendir() read the directory's full contents and cache it. This cache is not refreshed when rewinddir() is called, so rewinddir() will not notice updates to a directory. Fix this by splitting the code to fetch a directory's contents out of __opendir_common() into a new _filldir() function and call this from rewinddir() when operating on a unionfs directory. - If rewinddir() is called on a directory opened with fdopendir() before any directory entries are fetched, rewinddir() will not adjust the seek location of the backing file descriptor. If the file descriptor passed to fdopendir() had a non-zero offset, the rewinddir() will not rewind to the beginning. Fix this by always seeking back to 0 in rewinddir(). This means the dd_rewind hack can also be removed. - Add missing locking to rewinddir() - POSIX says that passing a location returned by telldir() to seekdir() after an intervening call to rewinddir() is undefined, so reclaim any pending telldir() cookies in the directory when rewinddir() is called. - If telldir() is called immediately after a call to seekdir(), POSIX requires the return value of telldir() to equal the value passed to seekdir(). The current seekdir code with SINGLEUSE enabled breaks this case as each call to telldir() allocates a new cookie. Instead, remove the SINGLEUSE code and change telldir() to look for an existing cookie for the directory's current location rather than always creating a new cookie. PR: 121656 Changes: _U stable/10/ stable/10/include/dirent.h stable/10/lib/libc/gen/directory.3 stable/10/lib/libc/gen/gen-private.h stable/10/lib/libc/gen/opendir.c stable/10/lib/libc/gen/readdir.c stable/10/lib/libc/gen/rewinddir.c stable/10/lib/libc/gen/telldir.c stable/10/lib/libc/gen/telldir.h