This is a problem a user of MooseFS reported. Under some circumstances, creating a new file system entry (of any type: directory, regular file, special file) on a MooseFS mount fails with the message "Resource temporarily unavailable", and any subsequent operation on this inode (ls -al, rm or rmdir) shows the same message. MooseFS also cannot be unmounted; the system reports "Device busy". Only a restart of the whole machine helps.

Since this was a bit similar to a problem some versions of the Linux kernel had, when a process on one machine deleted the CWD of a process on a different machine, we at first thought it had to do with CWDs only and introduced some safeguards in the MooseFS client for FreeBSD. But recent findings show this is much more serious and lies on the FreeBSD side.

A simple test: we take two machines and mount MooseFS on both. On the FreeBSD machine (13.0-RELEASE-p3) we use these mount options:

mfsmount -o mfsattrcacheto=0 -o mfsxattrcacheto=0 -o mfsentrycacheto=0 -o mfsdirentrycacheto=0 -o mfssymlinkcacheto=0 -o mfsgroupscacheto=0 /mnt/mfs

All the -o options are there to disable any attribute caches that may exist (any lookup, access, mkdir etc. operation will return 0 seconds as cache time). The second machine mounts the same MooseFS instance; its operating system is irrelevant.

Then we perform these steps, exactly in the order shown below:

***FreeBSD machine***
~# cd /mnt/mfs/testdir
/mnt/mfs/testdir# ls -al
total 2932
drwxr-xr-x 2 root wheel 1 Aug 23 12:41 .
drwxrwxrwx 43 root wheel 3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
(FreeBSD "sees" that "testdir" is empty)

***OTHER machine***
~# cd /mnt/mfs/testdir
/mnt/mfs/testdir# mkdir dir
/mnt/mfs/testdir#
******
(Other machine creates a directory named "dir" inside "testdir")

***FreeBSD machine***
/mnt/mfs/testdir# ls -al
total 2933
drwxr-xr-x 3 root wheel 1 Aug 23 12:59 .
drwxrwxrwx 43 root wheel 3001433 Aug 23 12:28 ..
drwxr-xr-x 2 root wheel 1 Aug 23 12:59 dir
/mnt/mfs/testdir#
******
(FreeBSD "sees" that there is now "dir" inside "testdir")

***OTHER machine***
/mnt/mfs/testdir# ls -ali
total 2933
8 drwxr-xr-x 3 root wheel 1 Aug 23 12:59 .
1 drwxrwxrwx 43 root wheel 3001433 Aug 23 12:28 ..
9 drwxr-xr-x 2 root wheel 1 Aug 23 12:59 dir
/mnt/mfs/testdir# rmdir dir
/mnt/mfs/testdir# ls -al
total 2932
drwxr-xr-x 2 root wheel 1 Aug 23 13:00 .
drwxrwxrwx 43 root wheel 3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
(We check the inode number of "dir" on the other machine and delete "dir")

***FreeBSD machine***
/mnt/mfs/testdir# ls -al
total 2932
drwxr-xr-x 2 root wheel 1 Aug 23 13:00 .
drwxrwxrwx 43 root wheel 3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
(FreeBSD "sees" again that "testdir" is empty)

Now we wait for at least 5 minutes; the timing will be explained below.

***FreeBSD machine***
/mnt/mfs/testdir# echo "foo" > file.txt
-bash: file.txt: Resource temporarily unavailable
/mnt/mfs/testdir# ls -al
ls: file.txt: Resource temporarily unavailable
total 2932
drwxr-xr-x 2 root wheel 1 Aug 23 13:17 .
drwxrwxrwx 43 root wheel 3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
Ooops?!

***OTHER machine***
/mnt/mfs/testdir# ls -ali
total 2932
8 drwxr-xr-x 2 root wheel 1 Aug 23 13:17 .
1 drwxrwxrwx 43 root wheel 3001433 Aug 23 12:28 ..
9 -rw-r--r-- 1 root wheel 0 Aug 23 13:17 file.txt
/mnt/mfs/testdir#
******
The newly created file got the same inode number as the recently deleted directory "dir"...

Notes:

1) The effect is not exclusive to former directory inode numbers becoming file inode numbers. It happens whenever the new object is of a different type than the old one (so an ex-directory inode number is reused for a file, ex-file for a fifo, ex-fifo for a device or directory etc.).
The "ls -al" scenario is not the only one; the same will happen if objects are created on the FreeBSD machine and then deleted from another machine, which is of course a normal occurrence in a network file system.

2) The default inode reuse time in MooseFS is 24 hours. It was set to 5 minutes for testing purposes only. The person who first reported the problem (there were others after) used the default 24 hours. Only inodes that are truly "free" are reused, which means that neither CWDs (active on any MooseFS client connected to the instance) nor sustained (deleted but still open) files are reused. The 24 hour delay is counted from the moment the inode is considered free, so if a file is in a sustained state for, let's say, 24 hours after deletion (and then whatever process had a hold on it finally finishes), its inode number is still not reused for another 24 hours.

3) Default cache times in MooseFS:
- file attributes cache timeout: 1 second
- extended attributes (xattr) cache timeout: 30 seconds
- directory entry cache timeout: 1 second
- negative entry cache timeout: 0 seconds (no negative cache by default)
- symbolic link cache timeout: 300 seconds
- supplementary groups cache timeout: 300 seconds

4) Caches in the above experiment were ALL set to 0.

5) The problem was first reported on FreeBSD 12.1.

So, to sum it up: we say "don't cache anything at all" (in the test) or "nothing longer than 300 seconds" (with the defaults), yet FreeBSD caches inode attributes (we don't know which ones, but at least the type) for longer than 24 hours. This causes a serious problem, because a new inode with a reused inode number is basically unusable in the file system.
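The reuse rule from note 2 can be written down as a small calculation. This is only a sketch of the rule as described in this report, not MooseFS code: the function name earliest_reuse, its parameters and the dates in the usage note are made up for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default reuse delay according to note 2 (the test above used 5 minutes).
REUSE_DELAY = timedelta(hours=24)


def earliest_reuse(deleted_at: datetime,
                   released_at: Optional[datetime] = None,
                   delay: timedelta = REUSE_DELAY) -> datetime:
    """Earliest moment an inode number may be handed out again.

    deleted_at  -- when the object was unlinked
    released_at -- when the last holder (open handle or CWD) let go of it,
                   or None if nothing held it at deletion time
    delay       -- hypothetical stand-in for the configured reuse time

    The delay is counted from the moment the inode is truly free,
    not from the moment it was deleted.
    """
    freed_at = deleted_at if released_at is None else max(deleted_at, released_at)
    return freed_at + delay
```

With a hypothetical deletion at 13:00 on one day and a process holding the file open until 13:00 the next day, earliest_reuse returns 13:00 on the day after that, which is the "another 24 hours" case from note 2. With delay=timedelta(minutes=5) and no holder it gives the 5 minute wait used in the test.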
I'll look into this when I have some time. But could you please tell me what FUSE protocol version MooseFS is mounting with? Earlier protocol versions had very limited ability to specify cache retention times.
This is directly from the test machine I did this particular test on:

compiled_with_fuse: 3.2
kernel_fuse_protocol: 7.28

I don't know about our users, but I assume they mount with the latest they get with the system.
As a user, here are my settings:

compiled_with_fuse: 31.0
kernel_fuse_protocol: 7.28
PKG: fusefs-libs3-3.10.4
PKG: moosefs3-client-3.0.116
Patch in review. Note that this bug actually didn't have anything to do with inode attributes. The file type isn't considered an attribute, because it must remain the same throughout a file's lifetime. So the entry cache is more relevant than the attribute cache. But as it turns out, the best way to handle this situation is the same regardless of whether the entry cache has expired or not.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=25927e068fcbcac0a5111a881de723bd984b04b3

commit 25927e068fcbcac0a5111a881de723bd984b04b3
Author: Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-12-06 05:43:17 +0000
Commit: Alan Somers <asomers@FreeBSD.org>
CommitDate: 2021-12-07 04:36:46 +0000

    fusefs: correctly handle an inode that changes file types

    Correctly handle the situation where a FUSE server unlinks a file, then
    creates a new file of a different type but with the same inode number.
    Previously fuse_vnop_lookup in this situation would return EAGAIN. But
    since it didn't call vgone(), the vnode couldn't be reused right away.
    Fix this by immediately calling vgone() and reallocating a new vnode.

    This problem can occur in three code paths, during VOP_LOOKUP,
    VOP_SETATTR, or following FUSE_GETATTR, which usually happens during
    VOP_GETATTR but can occur during other vops, too. Note that the correct
    response actually doesn't depend on whether the entry cache has expired.
    In fact, during VOP_LOOKUP, we can't even tell. Either it has expired
    already, or else the vnode got reclaimed by vnlru.

    Also, correct the error code during the VOP_SETATTR path.

    PR: 258022
    Reported by: chogata@moosefs.pro
    MFC after: 2 weeks
    Reviewed by: pfg
    Differential Revision: https://reviews.freebsd.org/D33283

 sys/fs/fuse/fuse_internal.c    |  9 +++++---
 sys/fs/fuse/fuse_node.c        | 25 +++++++++++----------
 tests/sys/fs/fusefs/getattr.cc | 50 ++++++++++++++++++++++++++++++++++++++++++
 tests/sys/fs/fusefs/lookup.cc  | 32 +++++++++++++++++++++------
 tests/sys/fs/fusefs/setattr.cc | 47 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 142 insertions(+), 21 deletions(-)
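To illustrate the behavior change described in the commit message, here is a toy model in Python. It is not the kernel code (the real change is in C, in sys/fs/fuse/fuse_node.c and fuse_internal.c). The names Vnode, VnodeCache and the fixed flag are invented for this sketch, and setting doomed stands in for what vgone() does to the old vnode.

```python
import errno


class Vnode:
    """Toy stand-in for a kernel vnode: an inode number plus a fixed type."""

    def __init__(self, ino, vtype):
        self.ino = ino
        self.vtype = vtype
        self.doomed = False  # set when the vnode has been "vgone()'d"


class VnodeCache:
    """Toy per-mount vnode table keyed by inode number (hypothetical)."""

    def __init__(self, fixed):
        self.fixed = fixed   # False: behavior before the commit, True: after
        self.by_ino = {}

    def lookup(self, ino, vtype):
        """The server reports that a name resolves to inode ino of type vtype."""
        vp = self.by_ino.get(ino)
        if vp is None:
            vp = self.by_ino[ino] = Vnode(ino, vtype)
            return vp
        if vp.vtype == vtype:
            return vp
        if not self.fixed:
            # Old behavior: fail, but leave the stale vnode in place,
            # so every later lookup of this inode fails the same way.
            raise OSError(errno.EAGAIN, "Resource temporarily unavailable")
        # New behavior: retire the stale vnode and allocate a fresh one.
        vp.doomed = True
        vp = self.by_ino[ino] = Vnode(ino, vtype)
        return vp
```

Replaying the report with inode 9: after lookup(9, "VDIR"), a cache created with fixed=False raises EAGAIN on every lookup(9, "VREG"), which matches the repeated "Resource temporarily unavailable" messages in the transcript. A cache created with fixed=True returns a new regular-file vnode instead and marks the old directory vnode as doomed.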
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=139764c4613cde14c97ff8dd8007eb0f27f5fb9f

commit 139764c4613cde14c97ff8dd8007eb0f27f5fb9f
Author: Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-12-06 05:43:17 +0000
Commit: Alan Somers <asomers@FreeBSD.org>
CommitDate: 2022-01-03 02:36:38 +0000

    fusefs: correctly handle an inode that changes file types

    Correctly handle the situation where a FUSE server unlinks a file, then
    creates a new file of a different type but with the same inode number.
    Previously fuse_vnop_lookup in this situation would return EAGAIN. But
    since it didn't call vgone(), the vnode couldn't be reused right away.
    Fix this by immediately calling vgone() and reallocating a new vnode.

    This problem can occur in three code paths, during VOP_LOOKUP,
    VOP_SETATTR, or following FUSE_GETATTR, which usually happens during
    VOP_GETATTR but can occur during other vops, too. Note that the correct
    response actually doesn't depend on whether the entry cache has expired.
    In fact, during VOP_LOOKUP, we can't even tell. Either it has expired
    already, or else the vnode got reclaimed by vnlru.

    Also, correct the error code during the VOP_SETATTR path.

    PR: 258022
    Reported by: chogata@moosefs.pro
    Reviewed by: pfg
    Differential Revision: https://reviews.freebsd.org/D33283

    (cherry picked from commit 25927e068fcbcac0a5111a881de723bd984b04b3)

 sys/fs/fuse/fuse_internal.c    |  9 +++++---
 sys/fs/fuse/fuse_node.c        | 25 +++++++++++----------
 tests/sys/fs/fusefs/getattr.cc | 50 ++++++++++++++++++++++++++++++++++++++++++
 tests/sys/fs/fusefs/lookup.cc  | 32 +++++++++++++++++++++------
 tests/sys/fs/fusefs/setattr.cc | 47 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 142 insertions(+), 21 deletions(-)
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f09df4b05cf4b9f065e3db642666355a95c036e4

commit f09df4b05cf4b9f065e3db642666355a95c036e4
Author: Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-12-06 05:43:17 +0000
Commit: Alan Somers <asomers@FreeBSD.org>
CommitDate: 2022-01-03 05:15:38 +0000

    fusefs: correctly handle an inode that changes file types

    Correctly handle the situation where a FUSE server unlinks a file, then
    creates a new file of a different type but with the same inode number.
    Previously fuse_vnop_lookup in this situation would return EAGAIN. But
    since it didn't call vgone(), the vnode couldn't be reused right away.
    Fix this by immediately calling vgone() and reallocating a new vnode.

    This problem can occur in three code paths, during VOP_LOOKUP,
    VOP_SETATTR, or following FUSE_GETATTR, which usually happens during
    VOP_GETATTR but can occur during other vops, too. Note that the correct
    response actually doesn't depend on whether the entry cache has expired.
    In fact, during VOP_LOOKUP, we can't even tell. Either it has expired
    already, or else the vnode got reclaimed by vnlru.

    Also, correct the error code during the VOP_SETATTR path.

    PR: 258022
    Reported by: chogata@moosefs.pro
    Reviewed by: pfg
    Differential Revision: https://reviews.freebsd.org/D33283

    (cherry picked from commit 25927e068fcbcac0a5111a881de723bd984b04b3)

 sys/fs/fuse/fuse_internal.c    |  9 +++++---
 sys/fs/fuse/fuse_node.c        | 25 +++++++++++----------
 tests/sys/fs/fusefs/getattr.cc | 50 ++++++++++++++++++++++++++++++++++++++++++
 tests/sys/fs/fusefs/lookup.cc  | 32 +++++++++++++++++++++------
 tests/sys/fs/fusefs/setattr.cc | 47 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 142 insertions(+), 21 deletions(-)