Bug 244431 - Incorrect file handle in fsync on FUSE-based filesystem.
Summary: Incorrect file handle in fsync on FUSE-based filesystem.
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.1-STABLE
Hardware: amd64 Any
: --- Affects Some People
Assignee: Alan Somers
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-02-26 12:02 UTC by Agata
Modified: 2020-03-22 15:21 UTC (History)
5 users (show)

See Also:


Attachments
Reinitialize io size for FUSE_FSYNC for each file handle (433 bytes, patch)
2020-03-09 01:10 UTC, Alan Somers
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Agata 2020-02-26 12:02:43 UTC
OS version: FreeBSD 12.1-p2
Installed: /usr/ports/sysutils/fusefs-libs3/, /usr/ports/sysutils/moosefs3-client/ (and dependencies). Port tree is upgraded, so newest versions are installed (MooseFS is in version 3.0.111).

On mounted MooseFS share I opened a file with vi (shipped with system, NOT vim!), added something to it (anything, even 1 character), then I closed it with :wq and I got this error:

Error: /mnt/mfs/jakisplik: Bad file descriptor; /mnt/mfs/jakisplik: WARNING: FILE TRUNCATED.

I then closed it with :q!, opened it again and the characters I added were there, nothing got truncated, changes got written. But the message was annoying, so I debugged:

This is trace from vi:

 77244 vi       CALL  fstatat(AT_FDCWD,0x8007991c0,0x7fffffffc180,0)
 77244 vi       NAMI  "/mnt/mfs_f3/jakisplik"
 77244 vi       STRU  struct stat {dev=18446744073390849796, ino=152281, mode=0100644, nlink=1, uid=0, gid=0, rdev=0, atime=1582713427, mtime=1582713243, ctime=1582713243, birthtime=-1.-00000001, size=100, blksize=65536, blocks=1, flags=0x0 }
 77244 vi       RET   fstatat 0
 77244 vi       CALL  openat(AT_FDCWD,0x8007991c0,0x601<O_WRONLY|O_CREAT|O_TRUNC>,0666<S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH>)
 77244 vi       NAMI  "/mnt/mfs_f3/jakisplik"
 77244 vi       RET   openat 10/0xa
 77244 vi       CALL  fcntl(0xa,F_GETFL)
 77244 vi       RET   fcntl 1
 77244 vi       CALL  fstat(0xa,0x7fffffffbd20)
 77244 vi       STRU  struct stat {dev=18446744073390849796, ino=152281, mode=0100644, nlink=1, uid=0, gid=0, rdev=0, atime=1582713427, mtime=1582713433, ctime=1582713433, birthtime=-1.-00000001, size=0, blksize=65536, blocks=0, flags=0x0 }
 77244 vi       RET   fstat 0
 77244 vi       CALL  mmap(0,0x11000,0x3<PROT_READ|PROT_WRITE>,0x1002<MAP_PRIVATE|MAP_ANON>,0xffffffff,0)
 77244 vi       RET   mmap 34377715712/0x801125000
 77244 vi       CALL  write(0xa,0x801125680,0x5b)
 77244 vi       GIO   fd 10 wrote 91 bytes
       "sdlkjgldfskjglksdfjg
	dsf;gjsdf;gjsdlkfg
	dsgsdkjglsdfkjgsdflkg
	dfklgjsdlgjsdf

	34535345

	1

       "
 77244 vi       RET   write 91/0x5b
 77244 vi       CALL  fstat(0xa,0x7fffffffbfa0)
 77244 vi       STRU  struct stat {dev=18446744073390849796, ino=152281, mode=0100644, nlink=1, uid=0, gid=0, rdev=0, atime=1582713427, mtime=1582713433, ctime=1582713433, birthtime=-1.-00000001, size=91, blksize=65536, blocks=0, flags=0x0 }
 77244 vi       RET   fstat 0
 77244 vi       CALL  fsync(0xa)
 77244 vi       RET   fsync -1 errno 9 Bad file descriptor
 77244 vi       CALL  fstatat(AT_FDCWD,0x7fffffffb800,0x7fffffffbc00,0)
 77244 vi       NAMI  "/usr/share/nls/C/libc.cat"
 77244 vi       RET   fstatat -1 errno 2 No such file or directory
 77244 vi       CALL  fstatat(AT_FDCWD,0x7fffffffb800,0x7fffffffbc00,0)
 77244 vi       NAMI  "/usr/share/nls/libc/C"
 77244 vi       RET   fstatat -1 errno 2 No such file or directory
 77244 vi       CALL  fstatat(AT_FDCWD,0x7fffffffb800,0x7fffffffbc00,0)
 77244 vi       NAMI  "/usr/local/share/nls/C/libc.cat"
 77244 vi       RET   fstatat -1 errno 2 No such file or directory
 77244 vi       CALL  fstatat(AT_FDCWD,0x7fffffffb800,0x7fffffffbc00,0)
 77244 vi       NAMI  "/usr/local/share/nls/libc/C"
 77244 vi       RET   fstatat -1 errno 2 No such file or directory
 77244 vi       CALL  sigaction(SIGTSTP,0x80032c618,0x80032c638)
 77244 vi       RET   sigaction 0

And this is operation log from MooseFS:

02.26 11:21:49.525090: uid:0 gid:0 pid:77159 cmd:getattr (1): OK (1.0,[drwxrwxrwx:0040777,22,0,0,1582712303,1582712269,1582712269,3085103])
02.26 11:22:01.594382: uid:0 gid:0 pid:70840 cmd:getattr (1): OK (1.0,[drwxrwxrwx:0040777,22,0,0,1582712303,1582712269,1582712269,3085103])
02.26 11:22:02.634610: uid:0 gid:0 pid:70840 cmd:access (1,0x4): OK
02.26 11:22:02.635052: uid:0 gid:0 pid:70840 cmd:opendir (1): OK [handle:00000001]
02.26 11:22:02.635365: uid:0 gid:0 pid:70840 cmd:statfs (1): OK (17739638960128,16586417303552,16586417303552,4175528346,0,20406533)
02.26 11:22:02.635857: uid:0 gid:0 pid:70840 cmd:readdir (1,4096,1373): OK (1008)
02.26 11:22:02.635947: uid:0 gid:0 pid:70840 cmd:readdir (1,4096,1373): OK (no data)
02.26 11:22:02.636295: uid:0 gid:0 pid:70840 cmd:access (1,0x1): OK
02.26 11:22:02.636394: uid:0 gid:0 pid:70840 cmd:lookup (1,jakisplik) (using open dir cache): OK (0.0,152281,1.0,[-rw-r--r--:0100644,1,0,0,1582712269,1582712290,1582712290,96])
02.26 11:22:05.875419: uid:0 gid:0 pid:77175 cmd:access (1,0x1): OK
02.26 11:22:05.875878: uid:0 gid:0 pid:77175 cmd:lookup (1,jakisplik): OK (0.0,152281,1.0,[-rw-r--r--:0100644,1,0,0,1582712269,1582712290,1582712290,96])
02.26 11:22:05.876326: uid:0 gid:0 pid:77175 cmd:access (1,0x1): OK
02.26 11:22:05.876734: uid:0 gid:0 pid:77175 cmd:lookup (1,jakisplik): OK (0.0,152281,1.0,[-rw-r--r--:0100644,1,0,0,1582712269,1582712290,1582712290,96])
02.26 11:22:05.876811: uid:0 gid:0 pid:77175 cmd:access (152281,0x4): OK
02.26 11:22:05.876975: uid:0 gid:0 pid:77175 cmd:open (152281) (using cached data from lookup): OK (direct_io:0,keep_cache:0) [handle:01000001]
02.26 11:22:05.879620: uid:0 gid:0 pid:77175 cmd:read (152281,96,0): OK (96)
02.26 11:22:05.880246: uid:0 gid:0 pid:77175 cmd:access (1,0x1): OK
02.26 11:22:05.880646: uid:0 gid:0 pid:77175 cmd:lookup (1,jakisplik): OK (0.0,152281,1.0,[-rw-r--r--:0100644,1,0,0,1582712526,1582712290,1582712290,96])
02.26 11:22:05.880731: uid:0 gid:0 pid:77175 cmd:access (152281,0x2): OK
02.26 11:22:09.611350: uid:0 gid:0 pid:77175 cmd:access (1,0x1): OK
02.26 11:22:09.611732: uid:0 gid:0 pid:77175 cmd:lookup (1,jakisplik): OK (0.0,152281,1.0,[-rw-r--r--:0100644,1,0,0,1582712526,1582712290,1582712290,96])
02.26 11:22:09.612169: uid:0 gid:0 pid:77175 cmd:access (1,0x1): OK
02.26 11:22:09.612504: uid:0 gid:0 pid:77175 cmd:lookup (1,jakisplik): OK (0.0,152281,1.0,[-rw-r--r--:0100644,1,0,0,1582712526,1582712290,1582712290,96])
02.26 11:22:09.612581: uid:0 gid:0 pid:77175 cmd:access (152281,0x2): OK
02.26 11:22:09.612686: uid:0 gid:0 pid:77175 cmd:open (152281) (using cached data from lookup): OK (direct_io:0,keep_cache:0) [handle:01000002]
02.26 11:22:09.613255: uid:0 gid:0 pid:77175 cmd:setattr (152281,0x8,[size=0]): OK (1.0,[-rw-r--r--:0100644,1,0,0,1582712526,1582712529,1582712529,0])
02.26 11:22:09.613444: uid:0 gid:0 pid:77175 cmd:write (152281,89,0): OK (89)
02.26 11:22:09.615475: uid:0 gid:0 pid:77175 cmd:fsync (152281,0) [handle:01000002]: OK
02.26 11:22:09.615538: uid:0 gid:0 pid:77175 cmd:fsync (152281,0) [handle:00000008]: EBADF (Bad file descriptor)
02.26 11:22:09.615686: uid:0 gid:0 pid:77175 cmd:flush (152281) [handle:01000002,uselocks:0,lock_owner:0000000000012D77]: OK
02.26 11:22:09.616051: uid:0 gid:0 pid:77175 cmd:access (1,0x1): OK
02.26 11:22:09.616482: uid:0 gid:0 pid:77175 cmd:lookup (1,jakisplik): OK (0.0,152281,1.0,[-rw-r--r--:0100644,1,0,0,1582712526,1582712529,1582712529,89])
02.26 11:22:12.002663: uid:0 gid:0 pid:77175 cmd:release (152281) [handle:01000002,uselocks:0,lock_owner:0000000000012D77]: OK
02.26 11:22:12.002758: uid:0 gid:0 pid:77175 cmd:release (152281) [handle:01000001,uselocks:0,lock_owner:0000000000012D77]: OK
02.26 11:22:15.050756: uid:0 gid:0 pid:70840 cmd:getattr (1) (using open dir cache): OK (1.0,[drwxrwxrwx:0040777,22,0,0,1582712522,1582712269,1582712269,3085103])

So, you can see that after a succesful write, there is an fsync that returns an error:

 77244 vi       RET   write 91/0x5b
 77244 vi       CALL  fstat(0xa,0x7fffffffbfa0)
 77244 vi       STRU  struct stat {dev=18446744073390849796, ino=152281, mode=0100644, nlink=1, uid=0, gid=0, rdev=0, atime=1582713427, mtime=1582713433, ctime=1582713433, birthtime=-1.-00000001, size=91, blksize=65536, blocks=0, flags=0x0 }
 77244 vi       RET   fstat 0
 77244 vi       CALL  fsync(0xa)
 77244 vi       RET   fsync -1 errno 9 Bad file descriptor

But, *one* fsync from vi generates *two* fsyncs sent to the filesystem, and the second one is with a completely wrong handle, because "00000008" was never returned from any open operation.

02.26 11:22:09.613444: uid:0 gid:0 pid:77175 cmd:write (152281,89,0): OK (89)
02.26 11:22:09.615475: uid:0 gid:0 pid:77175 cmd:fsync (152281,0) [handle:01000002]: OK
02.26 11:22:09.615538: uid:0 gid:0 pid:77175 cmd:fsync (152281,0) [handle:00000008]: EBADF (Bad file descriptor)

I repeated the whole vi operation several times, and a curious thing: it's not always "00000008", but it's that value most of the time. I once got "00000001" and once or twice "00000004", but all the other tries (10+) yielded "00000008".

Vi used on a local filesystem (ufs) doesn't write any weird messages when saving an edited file with ":wq", so it's clearly a FUSE-related problem.
Comment 1 Alan Somers freebsd_committer freebsd_triage 2020-02-27 03:03:12 UTC
Thanks for the great bug report!  I can't reproduce this problem using a simple passthrough file system, and I'm not setup to use moosefs.  So can you please try again, collect some more information with dtrace?  Run the following command while you reproduce the bug, and show me the output:

sudo dtrace -i 'fbt:fusefs:fdisp_wait_answ:entry /args[0]->finh->opcode == 20/ {printf("nodeid=%d", args[0]->finh->nodeid);}' -i 'fbt:fusefs:fuse_vnop_fsync:entry'
Comment 2 Agata 2020-02-27 06:56:03 UTC
I tried, but I got this message:

dtrace: invalid probe specifier fbt:fusefs:fdisp_wait_answ:entry /args[0]->finh->opcode == 20/ {printf("nodeid=%d", args[0]->finh->nodeid);}: probe description fbt:fusefs:fdisp_wait_answ:entry does not match any probes
Comment 3 Alan Somers freebsd_committer freebsd_triage 2020-02-27 14:04:02 UTC
That error message suggests that you haven't loaded fusefs.ko yet.  Have you?  And is this a vanilla release, or did you build your own kernel?
Comment 4 Agata 2020-02-27 14:39:51 UTC
I had it loaded for sure, because MooseFS worked :)

But I restarted my FreeBSD, loaded everything again and now it works. The only difference is I typed 'kldload fusefs.ko' instead of 'kldload fuse.ko', like I did previously. This is a symbolic link, but maybe that makes a difference somehow?

Anyway, output:

dtrace: description 'fbt:fusefs:fdisp_wait_answ:entry ' matched 1 probe
dtrace: description 'fbt:fusefs:fuse_vnop_fsync:entry' matched 1 probe
CPU     ID                    FUNCTION:NAME
  3  61484            fuse_vnop_fsync:entry
  3  61540            fdisp_wait_answ:entry nodeid=152281
  2  61540            fdisp_wait_answ:entry nodeid=152281

And the problem still exists, of course.
Comment 5 Agata 2020-02-27 14:40:41 UTC
And just for clarity: this is a vanilla installation, generic kernel.
Comment 6 Alan Somers freebsd_committer freebsd_triage 2020-02-27 15:12:02 UTC
(In reply to Agata from comment #4)
Yep, you're absolutely right that the name of the fuse module (fuse.ko vs fusefs.ko) makes a difference from dtrace's perspective.

But I realize now that I need more information still.  Could you please repeat the experiment, while running this new dtrace script:

sudo dtrace -i 'fbt:fusefs:fdisp_wait_answ:entry /args[0]->finh->opcode == 20/ { ffsi = (struct fuse_fsync_in*)(args[0]->indata); printf("nodeid=%d fh=%lu", args[0]->finh->nodeid, ffsi->fh); } fbt:fusefs:fuse_filehandle_init:entry { fvdat = (struct fuse_vnode_data*)(args[0]->v_data); printf("fufh_type=%d nodeid=%lu fh=%lu", args[1], fvdat->nid, args[5]->fh); }'
Comment 7 Agata 2020-02-28 07:01:28 UTC
Output from dtrace:

dtrace: description 'fbt:fusefs:fdisp_wait_answ:entry ' matched 2 probes
CPU     ID                    FUNCTION:NAME
  0  61627       fuse_filehandle_init:entry fufh_type=0 nodeid=4033 fh=16777217
  2  61627       fuse_filehandle_init:entry fufh_type=1 nodeid=4033 fh=16777218
  3  61540            fdisp_wait_answ:entry nodeid=4033 fh=16777218
  3  61540            fdisp_wait_answ:entry nodeid=4033 fh=16777217

FUSE debug info from MooseFS client (fragment with file write attempt):

unique: 18, opcode: OPEN (14), nodeid: 4033, insize: 48, pid: 2862
   unique: 18, success, outsize: 32
unique: 19, opcode: SETATTR (4), nodeid: 4033, insize: 128, pid: 2862
   unique: 19, success, outsize: 120
unique: 20, opcode: WRITE (16), nodeid: 4033, insize: 241, pid: 2862
   unique: 20, success, outsize: 24
unique: 21, opcode: FSYNC (20), nodeid: 4033, insize: 56, pid: 2862
   unique: 21, success, outsize: 16
unique: 21, opcode: FSYNC (20), nodeid: 4033, insize: 40, pid: 2862
   unique: 21, error: -9 (Bad file descriptor), outsize: 16
unique: 22, opcode: FLUSH (25), nodeid: 4033, insize: 64, pid: 2862
   unique: 22, success, outsize: 16

TRUSS debug fragments from MooseFS client:

 2851 100237: 62.544737948 0.002355606 read(10,"8\0\0\0\^T\0\0\0\^U\0\0\0\0\0\0\0\M-A\^O\0\0\0\0\0\0\0\0\0\0\0\0\0\0.\v\0\0\0\0\0\0\^B\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0",135168) = 56 (0x38)
 2851 100237: 62.545068158 0.000117054 write(2,"unique: 21, opcode: FSYNC (20), nodeid: 4033, insize: 56, pid: 2862\n",68) = 68 (0x44)
 2851 100237: 62.597535720 0.000047771 write(2,"   unique: 21, success, outsize: 16\n",36) = 36 (0x24)
 2851 100237: 62.597661434 0.000044977 writev(10,[{"\^P\0\0\0\0\0\0\0\^U\0\0\0\0\0\0\0",16}],1) = 16 (0x10)
 2851 100238: 62.597824584 0.058742103 read(10,"(\0\0\0\^T\0\0\0\^U\0\0\0\0\0\0\0\M-A\^O\0\0\0\0\0\0\0\0\0\0\0\0\0\0.\v\0\0\0\0\0\0",135168) = 40 (0x28)
 2851 100238: 62.597959796 0.000051682 write(2,"unique: 21, opcode: FSYNC (20), nodeid: 4033, insize: 40, pid: 2862\n",68) = 68 (0x44)
 2851 100238: 62.599129498 0.000052241 write(2,"   unique: 21, error: -9 (Bad file descriptor), outsize: 16\n",60) = 60 (0x3c)
 2851 100238: 62.599243200 0.000040508 writev(10,[{"\^P\0\0\0\M-w\M^?\M^?\M^?\^U\0\0\0\0\0\0\0",16}],1) = 16 (0x10)
Comment 8 Alan Somers freebsd_committer freebsd_triage 2020-03-09 00:35:41 UTC
I'm leaning toward calling this a MooseFS bug, because as you can see the file handles that fusefs fsyncs are both file handles that were returned by the server.  However, in the last experiment, you didn't post the moosefs operation log, so we can't see the relationship between the FUSE file handle and whatever it is that MooseFS logs.  Perhaps you could try again, and show both the dtrace output, moosefs operation log, and libfuse debug output?
Comment 9 Alan Somers freebsd_committer freebsd_triage 2020-03-09 01:10:06 UTC
Created attachment 212270 [details]
Reinitialize io size for FUSE_FSYNC for each file handle

Actually, I think this is a kernel bug now.  I just can't understand why you can reproduce it and I can't.  The key is in the return value of read in your truss output.  For the first FUSE_FSYNC operation, it's 56.  For the second, it's 40.  It should be 56 always.  The proximal cause of the bug is a failure to reinitialize a read/write variable in a loop within the kernel.  I just don't understand what's special about MooseFS; this should apply to all file systems.

There is also a bug in either libfuse or MooseFS.  For that second operation, the fact that read returned 40 means that the file handle field isn't valid.  The server should treat the file handle as nonexistent in that case.

Can you please apply the attached patch and try again?
Comment 10 commit-hook freebsd_committer freebsd_triage 2020-03-09 01:57:36 UTC
A commit references this bug:

Author: asomers
Date: Mon Mar  9 01:57:22 UTC 2020
New revision: 358798
URL: https://svnweb.freebsd.org/changeset/base/358798

Log:
  fusefs: fix fsync for files with multiple open handles

  We were reusing a structure for multiple operations, but failing to
  reinitialize one member.  The result is that a server that cares about FUSE
  file handle IDs would see one correct FUSE_FSYNC operation, and one with the
  FHID unset.

  PR:		244431
  Reported by:	Agata <chogata@gmail.com>
  MFC after:	2 weeks

Changes:
  head/sys/fs/fuse/fuse_internal.c
  head/tests/sys/fs/fusefs/fsync.cc
Comment 11 Alan Somers freebsd_committer freebsd_triage 2020-03-09 01:58:37 UTC
Ok, I was able to reproduce the bug at last with my mock file system, even though I can't do it with a real file system.  I'm pretty confident about the cause so I committed it already.  But please let me know if it isn't working.  And thanks for your great bug report!
Comment 12 Agata 2020-03-10 08:59:47 UTC
Yes, it helped, I see correct file handles for FSYNC now and of course vi is behaving good too.

Thanks so much for the patch :)
Comment 13 commit-hook freebsd_committer freebsd_triage 2020-03-22 15:17:14 UTC
A commit references this bug:

Author: asomers
Date: Sun Mar 22 15:17:00 UTC 2020
New revision: 359213
URL: https://svnweb.freebsd.org/changeset/base/359213

Log:
  MFC r358798:

  fusefs: fix fsync for files with multiple open handles

  We were reusing a structure for multiple operations, but failing to
  reinitialize one member.  The result is that a server that cares about FUSE
  file handle IDs would see one correct FUSE_FSYNC operation, and one with the
  FHID unset.

  PR:		244431
  Reported by:	Agata <chogata@gmail.com>

Changes:
_U  stable/12/
  stable/12/sys/fs/fuse/fuse_internal.c
  stable/12/tests/sys/fs/fusefs/fsync.cc