Bug 258022 - [FUSEFS] Inode attributes are cached unnecessarily/for too long
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-fs (Nobody)
Reported: 2021-08-24 11:30 UTC by Agata
Modified: 2021-08-25 13:34 UTC

Description Agata 2021-08-24 11:30:38 UTC
This is a problem a user of MooseFS reported. Under some circumstances, creating a new file system entry (of any type: directory, regular file, special file) on a MooseFS mount fails with the message "Resource temporarily unavailable", and any subsequent operation on this inode (ls -al, rm or rmdir) fails with the same message. MooseFS also cannot be unmounted; the system reports "Device busy". Only a restart of the whole machine helps.

Since this looked somewhat similar to a problem some versions of the Linux kernel had, where a process on one machine deleted the CWD of a process on a different machine, we at first thought it involved CWDs only and introduced some safeguards in the MooseFS client for FreeBSD. But recent findings show that the problem is much more serious and lies on the FreeBSD side.

A simple test: we take two machines and mount MooseFS on both.
On the FreeBSD machine (13.0-RELEASE-p3) we use these mount options:

mfsmount -o mfsattrcacheto=0 -o mfsxattrcacheto=0 -o mfsentrycacheto=0 -o mfsdirentrycacheto=0 -o mfssymlinkcacheto=0 -o mfsgroupscacheto=0 /mnt/mfs

All the -o options are there to disable any attribute caches that may exist (every lookup, access, mkdir, etc. operation will return 0 seconds as the cache time).

We also have a second machine with the same MooseFS instance mounted. The operating system on the second machine is irrelevant.

Then we perform these steps, in exactly the order shown below:

***FreeBSD machine***
~# cd /mnt/mfs/testdir
/mnt/mfs/testdir# ls -al
total 2932
drwxr-xr-x   2 root  wheel        1 Aug 23 12:41 .
drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
(FreeBSD "sees" that "testdir" is empty)

***OTHER machine***
~# cd /mnt/mfs/testdir
/mnt/mfs/testdir# mkdir dir
/mnt/mfs/testdir#
******
(Other machine creates a directory named "dir" inside "testdir")

***FreeBSD machine***
/mnt/mfs/testdir# ls -al
total 2933
drwxr-xr-x   3 root  wheel        1 Aug 23 12:59 .
drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
drwxr-xr-x   2 root  wheel        1 Aug 23 12:59 dir
/mnt/mfs/testdir#
******
(FreeBSD "sees" that there is now "dir" inside "testdir")

***OTHER machine***
/mnt/mfs/testdir# ls -ali
total 2933
8 drwxr-xr-x   3 root  wheel        1 Aug 23 12:59 .
1 drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
9 drwxr-xr-x   2 root  wheel        1 Aug 23 12:59 dir
/mnt/mfs/testdir# rmdir dir
/mnt/mfs/testdir# ls -al
total 2932
drwxr-xr-x   2 root  wheel        1 Aug 23 13:00 .
drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
(We check the inode number of "dir" on the other machine and delete "dir")

***FreeBSD machine***
/mnt/mfs/testdir# ls -al
total 2932
drwxr-xr-x   2 root  wheel        1 Aug 23 13:00 .
drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
(FreeBSD again "sees" that "testdir" is empty)

Now we wait for at least 5 minutes; the timing is explained in note 2 below.

***FreeBSD machine***
/mnt/mfs/testdir# echo "foo" > file.txt
-bash: file.txt: Resource temporarily unavailable
/mnt/mfs/testdir# ls -al
ls: file.txt: Resource temporarily unavailable
total 2932
drwxr-xr-x   2 root  wheel        1 Aug 23 13:17 .
drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
/mnt/mfs/testdir#
******
Ooops?!

***OTHER machine***
/mnt/mfs/testdir# ls -ali
total 2932
8 drwxr-xr-x   2 root  wheel        1 Aug 23 13:17 .
1 drwxrwxrwx  43 root  wheel  3001433 Aug 23 12:28 ..
9 -rw-r--r--   1 root  wheel        0 Aug 23 13:17 file.txt
/mnt/mfs/testdir#
******
The newly created file got the same inode number as the recently deleted directory "dir"...
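
One way to confirm such a reuse from userspace is to record the inode number and file type of an entry before it is deleted and compare them with the entry that appears later. The following is a minimal Python sketch, not part of MooseFS or FreeBSD; the names InodeRecord, kind_of, snapshot and reused_with_new_type are made up for this illustration.

```python
import os
import stat
from typing import NamedTuple


class InodeRecord(NamedTuple):
    ino: int   # inode number as reported by lstat()
    kind: str  # coarse file type, e.g. "directory"


# (predicate, label) pairs used to turn st_mode into a readable type
_KINDS = [
    (stat.S_ISDIR, "directory"),
    (stat.S_ISREG, "regular file"),
    (stat.S_ISLNK, "symlink"),
    (stat.S_ISFIFO, "fifo"),
    (stat.S_ISCHR, "character device"),
    (stat.S_ISBLK, "block device"),
    (stat.S_ISSOCK, "socket"),
]


def kind_of(mode: int) -> str:
    """Return a readable file type for an st_mode value."""
    for predicate, label in _KINDS:
        if predicate(mode):
            return label
    return "unknown"


def snapshot(path: str) -> InodeRecord:
    """Record inode number and type without following symlinks."""
    st = os.lstat(path)
    return InodeRecord(st.st_ino, kind_of(st.st_mode))


def reused_with_new_type(old: InodeRecord, new: InodeRecord) -> bool:
    """True when the same inode number now belongs to a different file type."""
    return old.ino == new.ino and old.kind != new.kind
```

In the transcript above the records would have to be taken on the other machine, where "ls -ali" still works: inode 9 as a directory before the rmdir, and inode 9 as a regular file once file.txt exists.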

Notes:
1) The effect is not limited to former directory inode numbers becoming file inode numbers. It happens whenever the new object is of a different type than the old one (an ex-directory inode number reused for a file, an ex-file for a fifo, an ex-fifo for a device or directory, etc.). The "ls -al" scenario is not the only trigger either: the same happens if objects are created on the FreeBSD machine and then deleted from another machine, which is of course a normal occurrence in a network file system.
2) The default inode reuse time in MooseFS is 24 hours. It was set to 5 minutes for testing purposes only. The person who first reported the problem (others followed) used the default 24 hours. Only inodes that are truly "free" are reused: inodes that are a CWD (active on any MooseFS client connected to the instance) or that belong to sustained (deleted but still open) files are not. The 24-hour delay is counted from the moment an inode is considered free, so if a file stays in a sustained state for, let's say, 24 hours after deletion (until whatever process had a hold on it finally finishes), its inode number is still not reused for another 24 hours.
3) Default cache timeouts in MooseFS: file attributes, 1 second; extended attributes (xattr), 30 seconds; directory entries, 1 second; negative entries, 0 seconds (no negative cache by default); symbolic links, 300 seconds; supplementary groups, 300 seconds.
4) Caches in the above experiment were ALL set to 0.
5) The problem was first reported on FreeBSD 12.1.
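
The timing rule from note 2 can be written down as a small calculation. This is a sketch of the rule as described in this report, not MooseFS code; earliest_reuse and its parameter names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Default from note 2; the test in this report used 5 minutes instead.
DEFAULT_REUSE_DELAY = timedelta(hours=24)


def earliest_reuse(
    deleted_at: datetime,
    released_at: Optional[datetime] = None,
    reuse_delay: timedelta = DEFAULT_REUSE_DELAY,
) -> datetime:
    """Earliest moment an inode number may be handed out again.

    released_at is when the last holder (an open handle or a CWD) let go
    of the deleted object; None means nothing held it at deletion time.
    The delay starts when the inode is truly free, not when it is deleted.
    """
    freed_at = deleted_at if released_at is None else max(deleted_at, released_at)
    return freed_at + reuse_delay


# The test setup: "dir" removed around 13:00 with a 5 minute reuse time.
deleted = datetime(2021, 8, 23, 13, 0)
print(earliest_reuse(deleted, reuse_delay=timedelta(minutes=5)))  # 2021-08-23 13:05:00
```

With the default delay, a file that stays open for 24 hours after deletion gives `earliest_reuse(deleted, deleted + timedelta(hours=24))`, which is 48 hours after the deletion.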

So, to sum up: we say "don't cache anything at all" (or, with the default settings, "don't cache anything for longer than 300 seconds"), yet FreeBSD caches inode attributes (we don't know which ones, but at least the type) for longer than 24 hours. This causes a serious problem, because a new inode with a reused inode number is basically unusable in the file system.
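
To make the suspected failure mode concrete, here is a toy model in Python. It is an assumption about the mechanism, not FreeBSD's fusefs code: a table keyed by inode number remembers the file type regardless of any cache timeout, and a lookup whose type disagrees with the remembered one fails with EAGAIN ("Resource temporarily unavailable"). The class StickyTypeTable and its methods are invented for this illustration.

```python
import errno
import os


class StickyTypeTable:
    """Toy model: remembers the type seen for each inode number, with no expiry."""

    def __init__(self):
        self._kind_by_ino = {}  # inode number -> file type first seen

    def lookup(self, ino: int, kind: str) -> str:
        # The first sighting is stored; later sightings must agree with it.
        remembered = self._kind_by_ino.setdefault(ino, kind)
        if remembered != kind:
            raise OSError(errno.EAGAIN, os.strerror(errno.EAGAIN))
        return remembered

    def forget(self, ino: int) -> None:
        # Hypothetical remedy: drop the stale entry for this inode number.
        self._kind_by_ino.pop(ino, None)


table = StickyTypeTable()
table.lookup(9, "directory")          # "dir" as first seen on the FreeBSD machine
try:
    table.lookup(9, "regular file")   # file.txt reuses inode number 9
except OSError as exc:
    print(exc.errno == errno.EAGAIN)  # True
```

In this model the second lookup succeeds only after `table.forget(9)`. Whether and when the kernel drops its equivalent of that entry is the open question of this report.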
Comment 1 Alan Somers freebsd_committer 2021-08-24 13:44:51 UTC
I'll look into this when I have some time.  But could you please tell me what FUSE protocol version MooseFS is mounting with?  Earlier protocol versions had very limited ability to specify cache retention times.
Comment 2 Agata 2021-08-25 08:19:07 UTC
This is directly from the test machine I did this particular test on:

compiled_with_fuse: 3.2
kernel_fuse_protocol: 7.28

I don't know about our users, but I assume they mount with the latest they get with the system.
Comment 3 jSML4ThWwBID69YC 2021-08-25 13:34:36 UTC
As a user, here are my settings.

compiled_with_fuse: 31.0
kernel_fuse_protocol: 7.28
PKG: fusefs-libs3-3.10.4
PKG: moosefs3-client-3.0.116