Summary: | Incorrect file handle in fsync on FUSE-based filesystem. | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | Agata <chogata> | ||||
Component: | kern | Assignee: | Alan Somers <asomers> | ||||
Status: | Closed FIXED | ||||||
Severity: | Affects Some People | CC: | asomers, cem, emaste, freebsd, piotr.konopelko | ||||
Priority: | --- | ||||||
Version: | 12.1-STABLE | ||||||
Hardware: | amd64 | ||||||
OS: | Any | ||||||
Attachments: |
|
Description
Agata
2020-02-26 12:02:43 UTC
Thanks for the great bug report! I can't reproduce this problem using a simple passthrough file system, and I'm not setup to use moosefs. So can you please try again, collect some more information with dtrace? Run the following command while you reproduce the bug, and show me the output: sudo dtrace -i 'fbt:fusefs:fdisp_wait_answ:entry /args[0]->finh->opcode == 20/ {printf("nodeid=%d", args[0]->finh->nodeid);}' -i 'fbt:fusefs:fuse_vnop_fsync:entry' I tried, but I got this message: dtrace: invalid probe specifier fbt:fusefs:fdisp_wait_answ:entry /args[0]->finh->opcode == 20/ {printf("nodeid=%d", args[0]->finh->nodeid);}: probe description fbt:fusefs:fdisp_wait_answ:entry does not match any probes That error message suggests that you haven't loaded fusefs.ko yet. Have you? And is this a vanilla release, or did you build your own kernel? I had it loaded for sure, because MooseFS worked :) But I restarted my FreeBSD, loaded everything again and now it works. The only difference is I typed 'kldload fusefs.ko' instead of 'kldload fuse.ko', like I did previously. This is a symbolic link, but maybe that makes a difference somehow? Anyway, output: dtrace: description 'fbt:fusefs:fdisp_wait_answ:entry ' matched 1 probe dtrace: description 'fbt:fusefs:fuse_vnop_fsync:entry' matched 1 probe CPU ID FUNCTION:NAME 3 61484 fuse_vnop_fsync:entry 3 61540 fdisp_wait_answ:entry nodeid=152281 2 61540 fdisp_wait_answ:entry nodeid=152281 And the problem still exists, of course. And just for clarity: this is a vanilla installation, generic kernel. (In reply to Agata from comment #4) Yep, you're absolutely right that the name of the fuse module (fuse.ko vs fusefs.ko) makes a difference from dtrace's perspective. But I realize now that I need more information still. Could you please repeat the experiment, while running this new dtrace script: sudo dtrace -i 'fbt:fusefs:fdisp_wait_answ:entry /args[0]->finh->opcode == 20/ { ffsi = (struct fuse_fsync_in*)(args[0]->indata); printf("nodeid=%d fh=%lu", args[0]->finh->nodeid, ffsi->fh); } fbt:fusefs:fuse_filehandle_init:entry { fvdat = (struct fuse_vnode_data*)(args[0]->v_data); printf("fufh_type=%d nodeid=%lu fh=%lu", args[1], fvdat->nid, args[5]->fh); }' Output from dtrace: dtrace: description 'fbt:fusefs:fdisp_wait_answ:entry ' matched 2 probes CPU ID FUNCTION:NAME 0 61627 fuse_filehandle_init:entry fufh_type=0 nodeid=4033 fh=16777217 2 61627 fuse_filehandle_init:entry fufh_type=1 nodeid=4033 fh=16777218 3 61540 fdisp_wait_answ:entry nodeid=4033 fh=16777218 3 61540 fdisp_wait_answ:entry nodeid=4033 fh=16777217 FUSE debug info from MooseFS client (fragment with file write attempt): unique: 18, opcode: OPEN (14), nodeid: 4033, insize: 48, pid: 2862 unique: 18, success, outsize: 32 unique: 19, opcode: SETATTR (4), nodeid: 4033, insize: 128, pid: 2862 unique: 19, success, outsize: 120 unique: 20, opcode: WRITE (16), nodeid: 4033, insize: 241, pid: 2862 unique: 20, success, outsize: 24 unique: 21, opcode: FSYNC (20), nodeid: 4033, insize: 56, pid: 2862 unique: 21, success, outsize: 16 unique: 21, opcode: FSYNC (20), nodeid: 4033, insize: 40, pid: 2862 unique: 21, error: -9 (Bad file descriptor), outsize: 16 unique: 22, opcode: FLUSH (25), nodeid: 4033, insize: 64, pid: 2862 unique: 22, success, outsize: 16 TRUSS debug fragments from MooseFS client: 2851 100237: 62.544737948 0.002355606 read(10,"8\0\0\0\^T\0\0\0\^U\0\0\0\0\0\0\0\M-A\^O\0\0\0\0\0\0\0\0\0\0\0\0\0\0.\v\0\0\0\0\0\0\^B\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0",135168) = 56 (0x38) 2851 100237: 62.545068158 0.000117054 write(2,"unique: 21, opcode: FSYNC (20), nodeid: 4033, insize: 56, pid: 2862\n",68) = 68 (0x44) 2851 100237: 62.597535720 0.000047771 write(2," unique: 21, success, outsize: 16\n",36) = 36 (0x24) 2851 100237: 62.597661434 0.000044977 writev(10,[{"\^P\0\0\0\0\0\0\0\^U\0\0\0\0\0\0\0",16}],1) = 16 (0x10) 2851 100238: 62.597824584 0.058742103 read(10,"(\0\0\0\^T\0\0\0\^U\0\0\0\0\0\0\0\M-A\^O\0\0\0\0\0\0\0\0\0\0\0\0\0\0.\v\0\0\0\0\0\0",135168) = 40 (0x28) 2851 100238: 62.597959796 0.000051682 write(2,"unique: 21, opcode: FSYNC (20), nodeid: 4033, insize: 40, pid: 2862\n",68) = 68 (0x44) 2851 100238: 62.599129498 0.000052241 write(2," unique: 21, error: -9 (Bad file descriptor), outsize: 16\n",60) = 60 (0x3c) 2851 100238: 62.599243200 0.000040508 writev(10,[{"\^P\0\0\0\M-w\M^?\M^?\M^?\^U\0\0\0\0\0\0\0",16}],1) = 16 (0x10) I'm leaning toward calling this a MooseFS bug, because as you can see the file handles that fusefs fsyncs are both file handles that were returned by the server. However, in the last experiment, you didn't post the moosefs operation log, so we can't see the relationship between the FUSE file handle and whatever it is that MooseFS logs. Perhaps you could try again, and show both the dtrace output, moosefs operation log, and libfuse debug output? Created attachment 212270 [details]
Reinitialize io size for FUSE_FSYNC for each file handle
Actually, I think this is a kernel bug now. I just can't understand why you can reproduce it and I can't. The key is in the return value of read in your truss output. For the first FUSE_FSYNC operation, it's 56. For the second, it's 40. It should be 56 always. The proximal cause of the bug is a failure to reinitialize a read/write variable in a loop within the kernel. I just don't understand what's special about MooseFS; this should apply to all file systems.
There is also a bug in either libfuse or MooseFS. For that second operation, the fact that read returned 40 means that the file handle field isn't valid. The server should treat the file handle as nonexistent in that case.
Can you please apply the attached patch and try again?
A commit references this bug: Author: asomers Date: Mon Mar 9 01:57:22 UTC 2020 New revision: 358798 URL: https://svnweb.freebsd.org/changeset/base/358798 Log: fusefs: fix fsync for files with multiple open handles We were reusing a structure for multiple operations, but failing to reinitialize one member. The result is that a server that cares about FUSE file handle IDs would see one correct FUSE_FSYNC operation, and one with the FHID unset. PR: 244431 Reported by: Agata <chogata@gmail.com> MFC after: 2 weeks Changes: head/sys/fs/fuse/fuse_internal.c head/tests/sys/fs/fusefs/fsync.cc Ok, I was able to reproduce the bug at last with my mock file system, even though I can't do it with a real file system. I'm pretty confident about the cause so I committed it already. But please let me know if it isn't working. And thanks for your great bug report! Yes, it helped, I see correct file handles for FSYNC now and of course vi is behaving good too. Thanks so much for the patch :) A commit references this bug: Author: asomers Date: Sun Mar 22 15:17:00 UTC 2020 New revision: 359213 URL: https://svnweb.freebsd.org/changeset/base/359213 Log: MFC r358798: fusefs: fix fsync for files with multiple open handles We were reusing a structure for multiple operations, but failing to reinitialize one member. The result is that a server that cares about FUSE file handle IDs would see one correct FUSE_FSYNC operation, and one with the FHID unset. PR: 244431 Reported by: Agata <chogata@gmail.com> Changes: _U stable/12/ stable/12/sys/fs/fuse/fuse_internal.c stable/12/tests/sys/fs/fusefs/fsync.cc |