Bug 165392 - [ufs] [patch] Multiple mkdir/rmdir fails with errno 31
Summary: [ufs] [patch] Multiple mkdir/rmdir fails with errno 31
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 9.0-RELEASE
Hardware: Any Any
: Normal Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-02-22 11:00 UTC by Vsevolod Volkov
Modified: 2022-07-13 09:24 UTC (History)
7 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Vsevolod Volkov 2012-02-22 11:00:22 UTC
Multiple sequence of mkdir and rmdir causes mkdir failure with errno 31. Usualy it happens on 32765 iteration.

How-To-Repeat: Compile and execute the following program:

#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>

int main (void)
{
  int i;
  char dir[100];
  for (i = 0; i < 50000; i++)
  {
    snprintf (dir, sizeof(dir), "empty_dir/%d", i);
    printf ("%s\n", dir);
    if (mkdir (dir, 0700) == -1)
    {
      printf ("mkdir %s: (errno %d)\n", dir, errno);
      break;
    }
    if (rmdir (dir) == -1)
    {
      printf ("rmdir %s: (errno %d)\n", dir, errno);
      break;
    }
  }
  return 0;
}

gcc -o test1 test1.c
mkdir empty_dir
./test1
Comment 1 Andriy Gapon freebsd_committer freebsd_triage 2012-02-22 21:23:15 UTC
Could you please provide full details of the tested filesystem?
-- 
Andriy Gapon
Comment 2 Eitan Adler freebsd_committer freebsd_triage 2012-02-23 05:05:19 UTC
Responsible Changed
From-To: freebsd-bugs->eadler

I'll take it.
Comment 3 Vsevolod Volkov 2012-02-23 07:59:42 UTC
I've tested 2 computers with 9.0-RELEASE (amd64 and i386). Filesystems
are UFS2 with soft updates:

tunefs: POSIX.1e ACLs: (-a)                                disabled
tunefs: NFSv4 ACLs: (-N)                                   disabled
tunefs: MAC multilabel: (-l)                               disabled
tunefs: soft updates: (-n)                                 enabled
tunefs: soft update journaling: (-j)                       disabled
tunefs: gjournal: (-J)                                     disabled
tunefs: trim: (-t)                                         disabled
tunefs: maximum blocks per file in a cylinder group: (-e)  2048
tunefs: average file size: (-f)                            16384
tunefs: average number of files in a directory: (-s)       64
tunefs: minimum percentage of free space: (-m)             8%
tunefs: optimization preference: (-o)                      time
tunefs: volume label: (-L)
Comment 4 Vsevolod Volkov 2012-02-23 08:06:03 UTC
There is no problem with zfs. Test passed with 200000 iterations.
Comment 5 Eitan Adler freebsd_committer freebsd_triage 2012-02-25 15:24:40 UTC
Responsible Changed
From-To: eadler->freebsd-fs

I'm not going to have time to look into this soon enough
Comment 6 Jilles Tjoelker freebsd_committer freebsd_triage 2012-02-25 18:27:02 UTC
> [mkdir fails with [EMLINK], but link count < LINK_MAX]

I can reproduce this problem with UFS with soft updates (with or without
journaling).

A reproduction without C programs is:

cd empty_dir
mkdir `jot 32766 1`     # the last one will fail (correctly)
rmdir 1
mkdir a                 # will erroneously fail

The problem appears to be because the previous rmdir has not yet been
fully completed. It is still holding onto the link count until the
directory is written, which may take up to two minutes.

The same problem can occur with other calls that increase the link count
such as link() and rename().

A workaround is to call fsync() on the directory that contained the
deleted entries. It will then release its hold on the link count and
allow mkdir or other calls. If fsync() is only called when [EMLINK] is
returned, the performance impact should not be very bad, although it
still causes more I/O than necessary.

The book "The Design and Implementation of the FreeBSD Operating System"
contains a detailed description of soft updates in section 8.6 Soft
Updates. The subsection "File Removal Requirements for Soft Updates"
appears particularly relevant to this problem.

A possible solution is to check for the problematic situation
(i_effnlink < LINK_MAX && i_nlink >= LINK_MAX) and if so synchronously
write one or more deleted directory entries that pointed to the inode
with the link count problem. After that, i_nlink should be less than
LINK_MAX and the link count can be checked again (depending on whether
locks need to be dropped to do the write, it may or may not be possible
for another thread to use up the last link first).

For mkdir() and rename(), the directory that contains the deleted
entries is obvious (the directory that will contain the new directory)
while for link() it can (in the general case) only be found in soft
updates data structures. Soft updates must track this because (if the
link count became 0) it will not clear the inode before all directory
entries that pointed to it have been written.

Simply replacing the i_nlink < LINK_MAX check with i_effnlink < LINK_MAX
is unsafe because it will lead to overflow of the 16-bit signed i_nlink
field. If the field is made larger, I don't see how it is prevented that
the code commits such a set of changes that an inode on disk has more
than LINK_MAX links for some time (for example if a file in the new
directory is fsynced while the old directory entries are still on the
disk).

-- 
Jilles Tjoelker
Comment 7 Jaakko Heinonen freebsd_committer freebsd_triage 2013-05-20 20:21:34 UTC
Hi!

>  A workaround is to call fsync() on the directory that contained the
>  deleted entries. It will then release its hold on the link count and
>  allow mkdir or other calls. If fsync() is only called when [EMLINK] is
>  returned, the performance impact should not be very bad, although it
>  still causes more I/O than necessary.

I tried to implement this with the following patch:

	http://people.freebsd.org/~jh/patches/ufs-check_linkcnt.diff

However, VOP_FSYNC(9) with the MNT_WAIT flag seems not to update the
i_nlink count for a reason unknown to me. I can verify that also by
taking your reproduction recipe above and adding "fsync ." between
"rmdir 1" and "mkdir a".

Does this mean that fsync(2) is broken for directories on softdep
enabled UFS?

I have cc'd Kirk in hope he could shed some light on this.

-- 
Jaakko
Comment 8 Jilles Tjoelker freebsd_committer freebsd_triage 2013-05-27 17:53:28 UTC
fsync certainly helps but not as effectively as you'd want. Some
combination of sleeps, fsyncs and mkdir attempts appears to be needed. A
shell loop like
  rmdir 8; fsync .; \
  until mkdir h 2>/dev/null; do printf .; fsync .; sleep 1; done
takes two seconds.

However, in
  rmdir 13; mkdir m; fsync .; \
  until mkdir m 2>/dev/null; do printf .; sleep 1; done
the fsync is of no benefit. It is just as slow as omitting it (about
half a minute).

I must have taken long enough to type/recall the commands when I tried
this earlier. In my earlier experiments I gave the commands separately.

> Does this mean that fsync(2) is broken for directories on softdep
> enabled UFS?

I don't think fsync(2) has to sync the exact link count to disk, since
fsck will take care of that. However, it has to sync the timestamps,
permissions and directory entries.

> I have cc'd Kirk in hope he could shed some light on this.

I'm also interested in whether it is safe to call VOP_FSYNC at that
point, especially in the case of a rename where a lock on the source
directory vnode may be held at the same time.

-- 
Jilles Tjoelker
Comment 9 Jaakko Heinonen freebsd_committer freebsd_triage 2013-05-29 17:53:11 UTC
On 2013-05-27, Jilles Tjoelker wrote:
> > However, VOP_FSYNC(9) with the MNT_WAIT flag seems not to update the
> > i_nlink count for a reason unknown to me. I can verify that also by
> > taking your reproduction recipe above and adding "fsync ." between
> > "rmdir 1" and "mkdir a".
> 
> fsync certainly helps but not as effectively as you'd want. Some
> combination of sleeps, fsyncs and mkdir attempts appears to be needed.

I have revised the patch and the following version _appears_ to work.

	http://people.freebsd.org/~jh/patches/ufs-check_linkcnt.2.diff

It's still experimental and doesn't handle link(2) or rename(2) at all.

In my testing debug.softdep.linkcnt_retries is increased by one with
your original reproduction recipe.

> I'm also interested in whether it is safe to call VOP_FSYNC at that
> point, especially in the case of a rename where a lock on the source
> directory vnode may be held at the same time.

I think your concern is valid because softdep_fsync() needs to lock
parent directories. Possibly you can work around the problem by
unlocking the vnodes, doing fsync and then restarting rename.
Unfortunately this makes rename even more complex.

-- 
Jaakko
Comment 10 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:58:24 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 11 vvv 2022-06-17 17:29:49 UTC
The PR was submitted 10 years ago. But it's actual for 13.1.
Comment 12 Konstantin Belousov freebsd_committer freebsd_triage 2022-06-18 11:07:51 UTC
I do not think it is enough to fsync() only the directory, as jh' patch
cleanly illustrates.  What it does is significant portion of VFS_SYNC()
anyway, so it is cleaner to just do VFS_SYNC() and pay the cost of syncing
data even if not strictly necessary.

OTOH it is not safe to do any of that stuff while then vnode is locked.
We recently got a way to correctly restart this kind of VOPs if they need
to relock the directory, which implies the need to do re-lookup of the entry.

I put the patch at
https://reviews.freebsd.org/D35514
to handle just mkdir() for now.  Other syscalls like link() and rename would
need similar treatment if my patch works.
Comment 13 commit-hook freebsd_committer freebsd_triage 2022-06-22 12:36:47 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8db679af66b023802139d41e275e41a77da1c515

commit 8db679af66b023802139d41e275e41a77da1c515
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2022-06-18 10:59:31 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2022-06-22 12:35:47 +0000

    UFS: make mkdir() and link() reliable when using SU and reaching nlink limit

    i_nlink overflow might be transient, i_effnlink indicates the final
    value of the link count after all dependencies would be resolved. So if
    i_nlink reached the maximum but i_efflink did not, we should be able to
    make the link by syncing.

    We must sync the whole filesystem to resolve dependencies,
    which requires unlocking vnodes locked for VOPs.  Use existing
    ERELOOKUP/VOP_UNLOCK_PAIR() mechanism to restart the VOP if sync with
    unlock was done.

    PR:     165392
    Reported by:    Vsevolod Volkov <vvv@colocall.net>
    Reviewed by:    mckusick
    Tested by:      pho
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D35514

 sys/ufs/ufs/ufs_vnops.c | 42 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)
Comment 14 commit-hook freebsd_committer freebsd_triage 2022-06-24 14:49:06 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=bc6d0d72f4f4e96b24d0ad558b271cb6f483801e

commit bc6d0d72f4f4e96b24d0ad558b271cb6f483801e
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2022-06-22 13:54:01 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2022-06-24 14:46:26 +0000

    UFS rename: make it reliable when using SU and reaching nlink limit

    PR:     165392
    Reviewed by:    mckusick
    Tested by:      pho
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D35577

 sys/ufs/ufs/ufs_vnops.c | 95 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 84 insertions(+), 11 deletions(-)
Comment 15 commit-hook freebsd_committer freebsd_triage 2022-06-25 06:22:35 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=25a1b1f30c85670b6d970df426b97c07d4e0e61d

commit 25a1b1f30c85670b6d970df426b97c07d4e0e61d
Author:     Peter Holm <pho@FreeBSD.org>
AuthorDate: 2022-06-25 06:21:30 +0000
Commit:     Peter Holm <pho@FreeBSD.org>
CommitDate: 2022-06-25 06:21:30 +0000

    stress2: Added reagression tests
    PR:     165392

 tools/test/stress2/misc/nlink.sh (new +x)  |  96 +++++++++++++++++++++++
 tools/test/stress2/misc/nlink2.sh (new +x) |  71 +++++++++++++++++
 tools/test/stress2/misc/nlink3.sh (new +x) | 106 +++++++++++++++++++++++++
 tools/test/stress2/misc/nlink4.sh (new +x) | 105 +++++++++++++++++++++++++
 tools/test/stress2/misc/nlink5.sh (new +x) | 120 +++++++++++++++++++++++++++++
 5 files changed, 498 insertions(+)
Comment 16 commit-hook freebsd_committer freebsd_triage 2022-06-29 09:40:13 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=65d8e97c4c23a0b27d9e0bc73e4150ebbb00b844

commit 65d8e97c4c23a0b27d9e0bc73e4150ebbb00b844
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2022-06-18 10:59:31 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2022-06-29 09:38:26 +0000

    UFS: make mkdir() and link() reliable when using SU and reaching nlink limit

    PR:     165392

    (cherry picked from commit 8db679af66b023802139d41e275e41a77da1c515)

 sys/ufs/ufs/ufs_vnops.c | 42 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 35 insertions(+), 7 deletions(-)
Comment 17 commit-hook freebsd_committer freebsd_triage 2022-07-13 08:49:17 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0aa68d5fbedbd191bd00f07f3ce9dd5ec4679ff6

commit 0aa68d5fbedbd191bd00f07f3ce9dd5ec4679ff6
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2022-06-22 13:54:01 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2022-07-13 08:38:01 +0000

    UFS rename: make it reliable when using SU and reaching nlink limit

    PR:     165392

    (cherry picked from commit bc6d0d72f4f4e96b24d0ad558b271cb6f483801e)

 sys/ufs/ufs/ufs_vnops.c | 95 +++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 84 insertions(+), 11 deletions(-)