A repeated sequence of mkdir and rmdir calls causes mkdir to fail with errno 31 (EMLINK). It usually happens on iteration 32765.
How-To-Repeat: Compile and execute the following program:
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>
int main(void)
{
	char dir[64];
	mkdir("empty_dir", 0700);	/* parent for the test entries */
	for (int i = 0; i < 50000; i++) {
		snprintf(dir, sizeof(dir), "empty_dir/%d", i);
		printf("%s\n", dir);
		if (mkdir(dir, 0700) == -1)
			printf("mkdir %s: (errno %d)\n", dir, errno);
		if (rmdir(dir) == -1)
			printf("rmdir %s: (errno %d)\n", dir, errno);
	}
	return 0;
}
gcc -o test1 test1.c
Could you please provide full details of the tested filesystem?
I'll take it.
I've tested two machines running 9.0-RELEASE (amd64 and i386). The
filesystems are UFS2 with soft updates:
tunefs: POSIX.1e ACLs: (-a) disabled
tunefs: NFSv4 ACLs: (-N) disabled
tunefs: MAC multilabel: (-l) disabled
tunefs: soft updates: (-n) enabled
tunefs: soft update journaling: (-j) disabled
tunefs: gjournal: (-J) disabled
tunefs: trim: (-t) disabled
tunefs: maximum blocks per file in a cylinder group: (-e) 2048
tunefs: average file size: (-f) 16384
tunefs: average number of files in a directory: (-s) 64
tunefs: minimum percentage of free space: (-m) 8%
tunefs: optimization preference: (-o) time
tunefs: volume label: (-L)
There is no problem with ZFS. The test passed with 200000 iterations.
I'm not going to have time to look into this soon enough.
> [mkdir fails with [EMLINK], but link count < LINK_MAX]
I can reproduce this problem with UFS with soft updates (with or without journaling).
A reproduction without C programs is:
mkdir `jot 32766 1` # the last one will fail (correctly)
mkdir a # will erroneously fail
The problem appears to be that the previous rmdir has not yet fully
completed. It is still holding onto the link count until the
directory is written, which may take up to two minutes.
The same problem can occur with other calls that increase the link count
such as link() and rename().
A workaround is to call fsync() on the directory that contained the
deleted entries. It will then release its hold on the link count and
allow mkdir or other calls. If fsync() is only called when [EMLINK] is
returned, the performance impact should not be very bad, although it
still causes more I/O than necessary.
The book "The Design and Implementation of the FreeBSD Operating System"
contains a detailed description of soft updates in section 8.6 Soft
Updates. The subsection "File Removal Requirements for Soft Updates"
appears particularly relevant to this problem.
A possible solution is to check for the problematic situation
(i_effnlink < LINK_MAX && i_nlink >= LINK_MAX) and if so synchronously
write one or more deleted directory entries that pointed to the inode
with the link count problem. After that, i_nlink should be less than
LINK_MAX and the link count can be checked again (depending on whether
locks need to be dropped to do the write, it may or may not be possible
for another thread to use up the last link first).
For mkdir() and rename(), the directory that contains the deleted
entries is obvious (the directory that will contain the new directory)
while for link() it can (in the general case) only be found in soft
updates data structures. Soft updates must track this because (if the
link count became 0) it will not clear the inode before all directory
entries that pointed to it have been written.
Simply replacing the i_nlink < LINK_MAX check with i_effnlink < LINK_MAX
is unsafe because it can overflow the 16-bit signed i_nlink field. If
the field is made larger, I don't see how to prevent the code from
committing a set of changes such that an inode on disk has more than
LINK_MAX links for some time (for example, if a file in the new
directory is fsynced while the old directory entries are still on the disk).
> A workaround is to call fsync() on the directory that contained the
> deleted entries. It will then release its hold on the link count and
> allow mkdir or other calls. If fsync() is only called when [EMLINK] is
> returned, the performance impact should not be very bad, although it
> still causes more I/O than necessary.
I tried to implement this with the following patch:
However, VOP_FSYNC(9) with the MNT_WAIT flag seems not to update the
i_nlink count for a reason unknown to me. I can verify that also by
taking your reproduction recipe above and adding "fsync ." between
"rmdir 1" and "mkdir a".
Does this mean that fsync(2) is broken for directories on softdep
enabled UFS?
I have cc'd Kirk in hope he could shed some light on this.
fsync certainly helps but not as effectively as you'd want. Some
combination of sleeps, fsyncs and mkdir attempts appears to be needed. A
shell loop like

    rmdir 8; fsync .; \
    until mkdir h 2>/dev/null; do printf .; fsync .; sleep 1; done

takes two seconds. But with

    rmdir 13; mkdir m; fsync .; \
    until mkdir m 2>/dev/null; do printf .; sleep 1; done

the fsync is of no benefit. It is just as slow as omitting it (about
half a minute).
I must have taken long enough to type/recall the commands when I tried
this earlier. In my earlier experiments I gave the commands separately.
> Does this mean that fsync(2) is broken for directories on softdep
> enabled UFS?
I don't think fsync(2) has to sync the exact link count to disk, since
fsck will take care of that. However, it has to sync the timestamps,
permissions and directory entries.
> I have cc'd Kirk in hope he could shed some light on this.
I'm also interested in whether it is safe to call VOP_FSYNC at that
point, especially in the case of a rename where a lock on the source
directory vnode may be held at the same time.
On 2013-05-27, Jilles Tjoelker wrote:
> > However, VOP_FSYNC(9) with the MNT_WAIT flag seems not to update the
> > i_nlink count for a reason unknown to me. I can verify that also by
> > taking your reproduction recipe above and adding "fsync ." between
> > "rmdir 1" and "mkdir a".
> fsync certainly helps but not as effectively as you'd want. Some
> combination of sleeps, fsyncs and mkdir attempts appears to be needed.
I have revised the patch and the following version _appears_ to work.
It's still experimental and doesn't handle link(2) or rename(2) at all.
In my testing debug.softdep.linkcnt_retries is increased by one with
your original reproduction recipe.
> I'm also interested in whether it is safe to call VOP_FSYNC at that
> point, especially in the case of a rename where a lock on the source
> directory vnode may be held at the same time.
I think your concern is valid because softdep_fsync() needs to lock
parent directories. Possibly you can work around the problem by
unlocking the vnodes, doing fsync and then restarting rename.
Unfortunately this makes rename even more complex.
For bugs matching the following criteria:
Status: In Progress Changed: (is less than) 2014-06-01
Reset to default assignee and clear in-progress tags.