If a FUSE daemon responds to a FUSE_OPEN request with the FOPEN_DIRECT_IO flag, then it wants the kernel to bypass the cache for that file. And fusefs(4) will honor that. However, the fuse protocol allows a file to be opened multiple times, possibly with different FOPEN_ flags. In that case, the kernel should consider that those FOPEN_ flags reply to one file descriptor only. So whether we bypass the cache should depend on what FUSE_OPEN returned for that particular file descriptor. It's a dumb feature, IMHO, but that's the way it is. The problem is that our current code interprets FOPEN_DIRECT_IO by setting the FN_DIRECTIO flag within the vnode's private data. But there can be multiple file descriptors per vnode. So we can't possibly respect that flag as the protocol intends. To solve this problem, we should move the FN_DIRECTIO flag from the private vnode data to the fuse_filehandle structure. For the record, this behavior was added by 2015, by commit ead063e0a245b902fbda1a9ced2f3b963afd39ed (svn r279536), and gluster relies on its correct implementation.
(In reply to Alan Somers from comment #0) can we just remove FN_DIRECTIO flag and set the io_flag to IO_DIRECT after checking if the fuse_open_flag was FOPEN_DIRECT_IO
(In reply to Raghav from comment #1) Yes, that's basically what I plan to do.
(In reply to Alan Somers from comment #2) I can implement this change if it's okay with you
(In reply to Raghav from comment #3) I certainly won't say no to that. If you post it, I'll review it.
Created attachment 268026 [details] move direct I/O flag from vnode to filehandle 1. fuse_file.c: removed all the code where vnode flag was getting set/unset to FN_DIRECTIO 2. fuse_vnops.c: replaced all the checks of VTOFUD(vp)->flag with checks of fufh->fuse_open_flags & FOPEN_DIRECTIO. After which I guess fuse_vnop_write and fuse_vnop_read now correctly set IO_DIRECT based on the filehandle not the vnode.
just following up on this patch. No rush just checking if you had a chance to review it yet.
(In reply to raghav narayan from comment #6) This change is exactly what I had in mind, raghav. The only thing that it's missing is a test case. Would you like to try writing some? I suggest one test case for reads and one for writes. They belong in tests/sys/fs/fusefs/read.cc and tests/sys/fs/fusefs/write.cc respectively. Each test should open two file handles for the same file. The daemon should return FOPEN_DIRECT_IO for one, but not for the other. And then the test should verify that the cache gets bypassed for the first file descriptor, but not for the other. Note that opening two file handles is trickier than it sounds, because the kernel will reuse file handles when possible. So it's not as simple as just opening two file descriptors. The easiest way is to open two file descriptors using different PIDs, GIDs or UIDs.
(In reply to Alan Somers from comment #7) yeah, sure. I will try writing these tests and will get back to you when I have something.
Hello, I hope you're doing well. I tried writing the test for reads. Do we have to do the reads after we open both the fds? expect_open(ino, FOPEN_DIRECT_IO, 1); fd_direct = open(FULLPATH, O_RDONLY); ASSERT_LE(0, fd_direct) << strerror(errno); expect_open(ino, 0, 1); ASSERT_EQ(0, seteuid(65534)) << strerror(errno); fd_cached = open(FULLPATH, O_RDONLY); ASSERT_EQ(0, seteuid(0)) << strerror(errno); ASSERT_EQ(0, seteuid(0)) << strerror(errno); ASSERT_LE(0, fd_cached) << strerror(errno); and then the reads 2 with direct_io fd and 2 with no flag fd EXPECTED behaviour: read1 using fd_direct bypasses cache read2 using fd_direct bypasses cache read3 using fd_cached bypasses cache read4 using fd_cached hits cache but with the patch I wrote this is the behaviour: read1 using fd_direct bypasses cache read2 using fd_direct hits cache read3 using fd_cached hits cache read4 using fd_cached hits cache It passes the tests when I when I open one fd with directio and do 2 reads with directio and then open fd with no flags and do 2 reads with cached(1st read pouplates cache second hits). From the first test I can make out that the fix is incomplete. It would be very helpful if you could tell me where should I look. Thank you
(In reply to raghav narayan from comment #9) That looks like a good start, raghav. I can take a closer look on Monday. But I have two questions now: why do you do seteuid(0) once, let alone twice? Could you please upload the patch as an attachment, a correctly formatted patch file?
(In reply to Alan Somers from comment #10) >Could you please upload the patch as an attachment, a correctly formatted patch file? Yes surely. >why do you do seteuid(0) once, let alone twice? Uhh the second time was a mistake but I did it once to continue with the privelage that we started with.
Created attachment 268811 [details] test for per-filehandle-directio-reads
I just reviewed your test case: * You should use consistent white space: indent with hard tabs, don't put spaces before the tabs, and indent the comments too. * Link to this PR in the comment block above the test case name * The seteuid lines are going to fail when executed by a non-root user. The best solution, IMHO, is to create a separate test class named something like ReadPrivileged and in its setup block, skip the test if it isn't being run by root. See forget.cc for an example. * You shouldn't hardcode uid 65534 . If this were a atf-c test, the correct way to get that UID would be to call atf_tc_get_config_var(tc, "unprivileged_user") . Unfortunately, there isn't such an API for googletest. We should probably add one. But until we do, you can use get_unprivileged_id from utils.cc . * It's risky to switch euid in the middle of a test case and then try to switch it back. If the test case fails before you can switch back, then every other test case in that test program will fail, too. Best to fork a child process and only allow the child to drop privileges. See allow_other.cc for an example of how to do that. Otherwise, the test is pretty much exactly what I had in mind. Good job.
(In reply to Alan Somers from comment #13) Hi Alan, Thank you for the review. I have now created a new class ReadPrivileged which skips the test if the user is not root. this time I am using fork(), which internally calls get_unprivileged_id, and everything works fine. But now there is also a FUSE_SETATTR request, is this expected?
Created attachment 269262 [details] test for per-filehandle-directio-reads created ReadPrivileged test class that skips if not root. test forks a child process and sets up expectations for including FUSE_SETATTR in the parent and then opens a new fd in the child and performs 2 reads for directio and 2 for cached. Test fails with 2 FUSE_READ expectations unsatisfied.
Created attachment 269263 [details] test for per-filehandle-directio-reads
Created attachment 269283 [details] test for per-filehandle-directio-reads
This looks much better. But I still have a few more * The latest patch still has a lot of mixed spaces and tabs that you need to clean up. * Please combine the test and the kernel change into a single patch file * The SETATTR is simply updating the file's atime. I suggest you suppress that by setting m_noatime in your test's SetUp method, since it isn't important to this test case. * On failure, the error message would be a little more clear if you only set one expectation for FUSE_READ, but used .Times(3) . To do that, you would need to bypass the expect_read helper in utils.cc and manually set the expectation with EXPECT_CALL... . * Since we're actually sending multiple OPENs to the daemon, it would be nice to also validate that we track the fuse file handle correctly. To do that, you could add a "fh" argument to expect_open in utils.hh, with a default value of FH.
Created attachment 269360 [details] fuse-per-fh-direct-io test + fix Hi Alan, I have added fh argument to expect_open in utils.hh. In this patch Im using two separate EXPECT_CALL blocks for FUSE_READ, one expecting fh_direct and other fh_cached. But now I am seeing unexpected behavior: the EXPECT_CALL for read with fh_direct never gets fulfilled even though pread( with fd_direct) is called twice in the child before any reads on fd_cached. Instead the fh_cached expectation gets satisfied.
Created attachment 269361 [details] fuse-per-fh-direct-io test + fix
I'm still seeing mixed tabs and spaces in the latest patch. Please indent consistently with hard tabs, 8 spaces per tab. Also, an extra level of indentation in the fork() call would make the parent func and child func easier to distinguish. See how it's done in allow_other.cc for example. The reason that the test isn't behaving as expected is because all four reads come from the child process. So fuse_vnop_read chooses the same fuse filehandle for both. In order to ensure that both fuse filehandles get used, some reads should come from the parent process and some from the child. Also, pro tip: the tests will be easier to debug if you add this small diff: diff --git a/tests/sys/fs/fusefs/mockfs.cc b/tests/sys/fs/fusefs/mockfs.cc index b9e4afbbea0b..fb19ec7561d7 100644 --- a/tests/sys/fs/fusefs/mockfs.cc +++ b/tests/sys/fs/fusefs/mockfs.cc @@ -296,7 +296,8 @@ void MockFS::debug_request(const mockfs_buf_in &in, ssize_t buflen) printf(" flags=%#x", in.body.opendir.flags); break; case FUSE_READ: - printf(" offset=%" PRIu64 " size=%u", + printf(" fh=%#" PRIx64 " offset=%" PRIu64 " size=%u", + in.body.read.fh, in.body.read.offset, in.body.read.size); if (verbosity > 1) And if you invoke the test like this: mkdir mountpoint; ./read --gtest_filter=ReadPrivileged.direct_io_per_filehandle -v
Created attachment 269513 [details] fuse-per-fh-direct-io test + fix
Created attachment 269514 [details] fuse-per-fh-direct-io test + fix
(In reply to Alan Somers from comment #21) >I'm still seeing mixed tabs and spaces in the latest patch. Please indent >consistently with hard tabs, 8 spaces per tab. Also, an extra level of >indentation in the fork() call would make the parent func and child func easier >to distinguish. See how it's done in allow_other.cc for example. I think this patch is now consistent with tabs and the parent and child funcs are distinguishable. >The reason that the test isn't behaving as expected is because all four reads >come from the child process. So fuse_vnop_read chooses the same fuse >filehandle for both. In order to ensure that both fuse filehandles get used, >some reads should come from the parent process and some from the child. This time I opened fd_direct in the child and fd_cached in the parent and the reads for fd_cached is done after the fork completes. And from this output we can see that fd_cached is issuing two FUSE_READ requests(I did .Times(2) just to demonstrate the read): Note: Google Test filter = ReadPrivileged.direct_io_per_filehandle [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from ReadPrivileged [ RUN ] ReadPrivileged.direct_io_per_filehandle INIT ino= 0 ACCESS ino= 1 mask=0x1 LOOKUP ino= 1 some_file.txt OPEN ino=42 flags=0 LOOKUP ino= 1 some_file.txt OPEN ino=42 flags=0 READ ino=42 fh=0x64 offset=0 size=8 READ ino=42 fh=0x64 offset=0 size=8 FLUSH ino=42 fh=0x64 lock_owner=8622 READ ino=42 fh=0xc8 offset=0 size=8 READ ino=42 fh=0xc8 offset=0 size=8 [ OK ] ReadPrivileged.direct_io_per_filehandle (2 ms) [----------] 1 test from ReadPrivileged (2 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test suite ran. (2 ms total) [ PASSED ] 1 test. If this test is correct then i can start writing one for writes. And the diff you gave helped a lot, thank you.
(In reply to raghav narayan from comment #24) Ironically, now the test _isn't_ working for me. The first three reads all use the fh_direct file handle. That violates expectations, and fails the tests. The reason why the read from fd_cached ends up using the fh_direct file handle is because it goes through the buffer cache, which calls fuse_vnop_strategy, which calls fuse_vnop_read again. But this time, it doesn't have a cred pointer, so fuse_filehandle_getrw doesn't know which file handle to use, and it grabs the wrong one by mistake. I don't know why the test is passing for you.
(In reply to Alan Somers from comment #25) If I open fd_cached, do reads then open fd_direct and do direct I/O reads everything behaves as expected. but if I open fd_cached in the parent then open fd_direct in the child and perform the reads after both filehandles are open, I can see that reads on fd_cached also result in multiple FUSE_READ requests with fh_cached. This suggests that the FN_DIRECTIO set on the vnode by fd_direct is affecting fd_cached as well and bypassing the cache. Is this not the behavior we expect from the current vnode-based implementation?
(In reply to Alan Somers from comment #25) Thanks for the explanantion however I can't observe that in my setup and the output I gave was from the current unfixed module if that helps. How should I reproduce what you're seeing.