Multiple sequence of mkdir and rmdir causes mkdir failure with errno 31. Usualy it happens on 32765 iteration. How-To-Repeat: Compile and execute the following program: #include <sys/stat.h> #include <unistd.h> #include <stdio.h> #include <errno.h> int main (void) { int i; char dir[100]; for (i = 0; i < 50000; i++) { snprintf (dir, sizeof(dir), "empty_dir/%d", i); printf ("%s\n", dir); if (mkdir (dir, 0700) == -1) { printf ("mkdir %s: (errno %d)\n", dir, errno); break; } if (rmdir (dir) == -1) { printf ("rmdir %s: (errno %d)\n", dir, errno); break; } } return 0; } gcc -o test1 test1.c mkdir empty_dir ./test1
Could you please provide full details of the tested filesystem? -- Andriy Gapon
Responsible Changed From-To: freebsd-bugs->eadler I'll take it.
I've tested 2 computers with 9.0-RELEASE (amd64 and i386). Filesystems are UFS2 with soft updates: tunefs: POSIX.1e ACLs: (-a) disabled tunefs: NFSv4 ACLs: (-N) disabled tunefs: MAC multilabel: (-l) disabled tunefs: soft updates: (-n) enabled tunefs: soft update journaling: (-j) disabled tunefs: gjournal: (-J) disabled tunefs: trim: (-t) disabled tunefs: maximum blocks per file in a cylinder group: (-e) 2048 tunefs: average file size: (-f) 16384 tunefs: average number of files in a directory: (-s) 64 tunefs: minimum percentage of free space: (-m) 8% tunefs: optimization preference: (-o) time tunefs: volume label: (-L)
There is no problem with zfs. Test passed with 200000 iterations.
Responsible Changed From-To: eadler->freebsd-fs I'm not going to have time to look into this soon enough
> [mkdir fails with [EMLINK], but link count < LINK_MAX] I can reproduce this problem with UFS with soft updates (with or without journaling). A reproduction without C programs is: cd empty_dir mkdir `jot 32766 1` # the last one will fail (correctly) rmdir 1 mkdir a # will erroneously fail The problem appears to be because the previous rmdir has not yet been fully completed. It is still holding onto the link count until the directory is written, which may take up to two minutes. The same problem can occur with other calls that increase the link count such as link() and rename(). A workaround is to call fsync() on the directory that contained the deleted entries. It will then release its hold on the link count and allow mkdir or other calls. If fsync() is only called when [EMLINK] is returned, the performance impact should not be very bad, although it still causes more I/O than necessary. The book "The Design and Implementation of the FreeBSD Operating System" contains a detailed description of soft updates in section 8.6 Soft Updates. The subsection "File Removal Requirements for Soft Updates" appears particularly relevant to this problem. A possible solution is to check for the problematic situation (i_effnlink < LINK_MAX && i_nlink >= LINK_MAX) and if so synchronously write one or more deleted directory entries that pointed to the inode with the link count problem. After that, i_nlink should be less than LINK_MAX and the link count can be checked again (depending on whether locks need to be dropped to do the write, it may or may not be possible for another thread to use up the last link first). For mkdir() and rename(), the directory that contains the deleted entries is obvious (the directory that will contain the new directory) while for link() it can (in the general case) only be found in soft updates data structures. Soft updates must track this because (if the link count became 0) it will not clear the inode before all directory entries that pointed to it have been written. Simply replacing the i_nlink < LINK_MAX check with i_effnlink < LINK_MAX is unsafe because it will lead to overflow of the 16-bit signed i_nlink field. If the field is made larger, I don't see how it is prevented that the code commits such a set of changes that an inode on disk has more than LINK_MAX links for some time (for example if a file in the new directory is fsynced while the old directory entries are still on the disk). -- Jilles Tjoelker
Hi! > A workaround is to call fsync() on the directory that contained the > deleted entries. It will then release its hold on the link count and > allow mkdir or other calls. If fsync() is only called when [EMLINK] is > returned, the performance impact should not be very bad, although it > still causes more I/O than necessary. I tried to implement this with the following patch: http://people.freebsd.org/~jh/patches/ufs-check_linkcnt.diff However, VOP_FSYNC(9) with the MNT_WAIT flag seems not to update the i_nlink count for a reason unknown to me. I can verify that also by taking your reproduction recipe above and adding "fsync ." between "rmdir 1" and "mkdir a". Does this mean that fsync(2) is broken for directories on softdep enabled UFS? I have cc'd Kirk in hope he could shed some light on this. -- Jaakko
fsync certainly helps but not as effectively as you'd want. Some combination of sleeps, fsyncs and mkdir attempts appears to be needed. A shell loop like rmdir 8; fsync .; \ until mkdir h 2>/dev/null; do printf .; fsync .; sleep 1; done takes two seconds. However, in rmdir 13; mkdir m; fsync .; \ until mkdir m 2>/dev/null; do printf .; sleep 1; done the fsync is of no benefit. It is just as slow as omitting it (about half a minute). I must have taken long enough to type/recall the commands when I tried this earlier. In my earlier experiments I gave the commands separately. > Does this mean that fsync(2) is broken for directories on softdep > enabled UFS? I don't think fsync(2) has to sync the exact link count to disk, since fsck will take care of that. However, it has to sync the timestamps, permissions and directory entries. > I have cc'd Kirk in hope he could shed some light on this. I'm also interested in whether it is safe to call VOP_FSYNC at that point, especially in the case of a rename where a lock on the source directory vnode may be held at the same time. -- Jilles Tjoelker
On 2013-05-27, Jilles Tjoelker wrote: > > However, VOP_FSYNC(9) with the MNT_WAIT flag seems not to update the > > i_nlink count for a reason unknown to me. I can verify that also by > > taking your reproduction recipe above and adding "fsync ." between > > "rmdir 1" and "mkdir a". > > fsync certainly helps but not as effectively as you'd want. Some > combination of sleeps, fsyncs and mkdir attempts appears to be needed. I have revised the patch and the following version _appears_ to work. http://people.freebsd.org/~jh/patches/ufs-check_linkcnt.2.diff It's still experimental and doesn't handle link(2) or rename(2) at all. In my testing debug.softdep.linkcnt_retries is increased by one with your original reproduction recipe. > I'm also interested in whether it is safe to call VOP_FSYNC at that > point, especially in the case of a rename where a lock on the source > directory vnode may be held at the same time. I think your concern is valid because softdep_fsync() needs to lock parent directories. Possibly you can work around the problem by unlocking the vnodes, doing fsync and then restarting rename. Unfortunately this makes rename even more complex. -- Jaakko
For bugs matching the following criteria: Status: In Progress Changed: (is less than) 2014-06-01 Reset to default assignee and clear in-progress tags. Mail being skipped
The PR was submitted 10 years ago. But it's actual for 13.1.
I do not think it is enough to fsync() only the directory, as jh' patch cleanly illustrates. What it does is significant portion of VFS_SYNC() anyway, so it is cleaner to just do VFS_SYNC() and pay the cost of syncing data even if not strictly necessary. OTOH it is not safe to do any of that stuff while then vnode is locked. We recently got a way to correctly restart this kind of VOPs if they need to relock the directory, which implies the need to do re-lookup of the entry. I put the patch at https://reviews.freebsd.org/D35514 to handle just mkdir() for now. Other syscalls like link() and rename would need similar treatment if my patch works.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=8db679af66b023802139d41e275e41a77da1c515 commit 8db679af66b023802139d41e275e41a77da1c515 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2022-06-18 10:59:31 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2022-06-22 12:35:47 +0000 UFS: make mkdir() and link() reliable when using SU and reaching nlink limit i_nlink overflow might be transient, i_effnlink indicates the final value of the link count after all dependencies would be resolved. So if i_nlink reached the maximum but i_efflink did not, we should be able to make the link by syncing. We must sync the whole filesystem to resolve dependencies, which requires unlocking vnodes locked for VOPs. Use existing ERELOOKUP/VOP_UNLOCK_PAIR() mechanism to restart the VOP if sync with unlock was done. PR: 165392 Reported by: Vsevolod Volkov <vvv@colocall.net> Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35514 sys/ufs/ufs/ufs_vnops.c | 42 +++++++++++++++++++++++++++++++++++------- 1 file changed, 35 insertions(+), 7 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=bc6d0d72f4f4e96b24d0ad558b271cb6f483801e commit bc6d0d72f4f4e96b24d0ad558b271cb6f483801e Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2022-06-22 13:54:01 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2022-06-24 14:46:26 +0000 UFS rename: make it reliable when using SU and reaching nlink limit PR: 165392 Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35577 sys/ufs/ufs/ufs_vnops.c | 95 +++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 84 insertions(+), 11 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=25a1b1f30c85670b6d970df426b97c07d4e0e61d commit 25a1b1f30c85670b6d970df426b97c07d4e0e61d Author: Peter Holm <pho@FreeBSD.org> AuthorDate: 2022-06-25 06:21:30 +0000 Commit: Peter Holm <pho@FreeBSD.org> CommitDate: 2022-06-25 06:21:30 +0000 stress2: Added reagression tests PR: 165392 tools/test/stress2/misc/nlink.sh (new +x) | 96 +++++++++++++++++++++++ tools/test/stress2/misc/nlink2.sh (new +x) | 71 +++++++++++++++++ tools/test/stress2/misc/nlink3.sh (new +x) | 106 +++++++++++++++++++++++++ tools/test/stress2/misc/nlink4.sh (new +x) | 105 +++++++++++++++++++++++++ tools/test/stress2/misc/nlink5.sh (new +x) | 120 +++++++++++++++++++++++++++++ 5 files changed, 498 insertions(+)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=65d8e97c4c23a0b27d9e0bc73e4150ebbb00b844 commit 65d8e97c4c23a0b27d9e0bc73e4150ebbb00b844 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2022-06-18 10:59:31 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2022-06-29 09:38:26 +0000 UFS: make mkdir() and link() reliable when using SU and reaching nlink limit PR: 165392 (cherry picked from commit 8db679af66b023802139d41e275e41a77da1c515) sys/ufs/ufs/ufs_vnops.c | 42 +++++++++++++++++++++++++++++++++++------- 1 file changed, 35 insertions(+), 7 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=0aa68d5fbedbd191bd00f07f3ce9dd5ec4679ff6 commit 0aa68d5fbedbd191bd00f07f3ce9dd5ec4679ff6 Author: Konstantin Belousov <kib@FreeBSD.org> AuthorDate: 2022-06-22 13:54:01 +0000 Commit: Konstantin Belousov <kib@FreeBSD.org> CommitDate: 2022-07-13 08:38:01 +0000 UFS rename: make it reliable when using SU and reaching nlink limit PR: 165392 (cherry picked from commit bc6d0d72f4f4e96b24d0ad558b271cb6f483801e) sys/ufs/ufs/ufs_vnops.c | 95 +++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 84 insertions(+), 11 deletions(-)