Bug 194293 - FUSE program freezes when seeking pos > file size
Summary: FUSE program freezes when seeking pos > file size
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.1-STABLE
Hardware: amd64 Any
: --- Affects Many People
Assignee: Rick Macklem
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-10-10 20:30 UTC by Hiroshi Nishida
Modified: 2016-05-28 22:49 UTC (History)
4 users (show)

See Also:


Attachments
fseek test (505 bytes, text/x-csrc)
2014-10-10 21:46 UTC, Hiroshi Nishida
no flags Details
patch fuse so that a WRONLY open becomes a RDWR open (1.30 KB, patch)
2015-12-28 01:51 UTC, rmacklem
no flags Details | Diff
patch fuse so it forces DIRECT_IO for WRONLY opens (1.76 KB, patch)
2015-12-29 23:18 UTC, rmacklem
no flags Details | Diff
invalidate buffer cache blocks when switching to DIRECT_IO (701 bytes, patch)
2015-12-31 01:07 UTC, rmacklem
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Hiroshi Nishida 2014-10-10 20:30:25 UTC
This happens with both 10.1-RC1 and 9.3-Release.

I wrote a simple and straight forward file system program with FUSE.
It just does like:

open: open a file with the same name in a different directory
read: read a file with the same name in a different directory
write: write a file with the same name in a different directory
etc.

However, the program always freezes when saving a xcf file (Gimp's native format) with Gimp.
Once it freezes, it cannot be killed even by shutdown.
Therefore, I always have to reset my PC after shutdown.

After investigating Gimp's source code, I found that this happened when Gimp tried to fseek() a position greater than the file size.

In /usr/ports/graphics/gimp-app/work/gimp-2.8.10/app/xcf/xcf-seek.c, there is xcf_seek_pos().

Adding

fstat(fd, &sb);
if (pos > sb.st_size) {
    ftruncate(fd, pos);
}

to it solved the problem.

This is not only a FUSE's problem, but I guess it will be better to deal with it.

Sincerely,
Comment 1 Hiroshi Nishida 2014-10-10 21:46:33 UTC
Created attachment 148181 [details]
fseek test

Actually, I have noticed that the attached simple program also freezes my FUSE program.
The program just fopen -> fseek(100) -> fwrite().
Please compile it and use for an EXISTING tiny file less than 100bytes like:

% test aaa

Interestingly, it does not freeze when aaa does not exist, in other words, create() is called by FUSE.
It is only affected when an existing small file is specified.
Comment 2 Hiroshi Nishida 2014-10-21 18:16:55 UTC
According to https://bugzilla.gnome.org/show_bug.cgi?id=738329, this seems to happen also with Linux.
So, it's not a FreeBSD specific bug but a FUSE's bug.
Comment 3 Marcus von Appen freebsd_committer freebsd_triage 2015-02-18 11:54:21 UTC
Updated 10.1-BETA and 10.1-RC versioned bugs to 10.1-STABLE.
Comment 4 Rick Macklem freebsd_committer 2015-12-25 04:03:13 UTC
I think I know what causes the hang for the small write
(test.c).
Could you try the program, but change "w" to "rw" in the
fopen().

- I think the problem happens when a file is opened "write only"
  and then a partial block is written. The write of the partial
  buffer cache block requires that the block first be read from
  the file, but "write only" doesn't allow that to happen and
  it gets stuck.

If test.c works ok when opening "rw", then I think this is what
is happening.

--> I am thinking that fuse_vnop_open(..O_WRONLY..) should actually
    do a O_RDWR open. It means that files that only give the user
    write access won't be able to be opened, but that seems to be
    preferred to a hang?

Please let us know if you get to try this, rick.
Comment 5 rmacklem 2015-12-28 01:51:22 UTC
Created attachment 164744 [details]
patch fuse so that a WRONLY open becomes a RDWR open

This patch modifies the FreeBSD Fuse fs (in sys/fs/fuse) so that it
always does a RDWR open when a WRONLY open is requested.
I believe this bug happens when a write of a partial block occurs
when not doing DIRECT_IO. When this happens the buffer cache first
reads the entire block in. Without a RDWR open, this read fails and
leaves the block "stuck", making umount etc. fail.

The problem I see w.r.t. this patch is that a WRONLY open will fail
if the process doesn't have Read access to the file.

An alternate way to patch this would be to force DIRECT_IO for WRONLY
opens. Comments/opinions?

Hopefully nishida@ can test this patch to determine if it fixes the problem?
Comment 6 Hiroshi Nishida 2015-12-29 19:29:32 UTC
Sorry for my late reply.

I have just tested with fopen("rw") and have confirmed that "rw" did not freeze my program and FUSE.
Actually it returned "Operation not permitted" which is a correct response because the file size is < 100 and fseek(100) is 'nonseekable'.

I will try the patch later.

Thanks

Hiroshi Nishida
Comment 7 rmacklem 2015-12-29 23:18:51 UTC
Created attachment 164831 [details]
patch fuse so it forces DIRECT_IO for WRONLY opens

This patch forces fuse to use DIRECT_IO for files opened
WRONLY, so it won't try and read a block in before writing
a partial block.

This fix may be preferable to the other patch, since it shouldn't
make a WRONLY open fail because read/write isn't allowed.

To put this in -head, fuse also needs to be patched to invalidate
buffers when DIRECT_IO is enabled, because otherwise reads may
get stale cached data.
Comment 8 Hiroshi Nishida 2015-12-30 19:25:35 UTC
(In reply to rmacklem from comment #7)

I tested the patch on 10.2-RELEASE-p8 and confirmed it worked.
However, the behavior of fseek() beyond EOF changed to "fseek() extends the original file and writes data at the seek point."
Since fseek() beyond EOF is probably undefined, it will be OK and is much better than freeze.

Thanks.

Hiroshi Nishida
Comment 9 rmacklem 2015-12-31 01:07:48 UTC
Created attachment 164881 [details]
invalidate buffer cache blocks when switching to DIRECT_IO

This patch needs to be used along with 164831 so that stale
data in the buffer cache isn't accessed after switching to
doing direct I/O.

I marked 164744 obsolete since I believe 164831 plus this patch
is the preferred fix. Also, 164744 wasn't complete and wouldn't
have compiled.
Comment 10 Konstantin Belousov freebsd_committer 2015-12-31 11:37:47 UTC
(In reply to rmacklem from comment #9)
You might want to compare fuse_io_invalbuf() and ffs_rawread_sync().  From the very quick inspection, they are mostly equivalent WRT write ops done.
Comment 11 Rick Macklem freebsd_committer 2016-02-05 01:07:23 UTC
Since the patches seem to fix the problem, I plan
on committing them once I have checked the code
against the ffs code. Probably in April 2016.
Comment 12 commit-hook freebsd_committer 2016-05-14 20:04:08 UTC
A commit references this bug:

Author: rmacklem
Date: Sat May 14 20:03:22 UTC 2016
New revision: 299753
URL: https://svnweb.freebsd.org/changeset/base/299753

Log:
  Fix fuse to use DIRECT_IO when required.

  When a file is opened write-only and a partial block was written,
  buffered I/O would try and read the whole block in. This would
  result in a hung thread, since there was no open (fuse filehandle)
  that allowed reading. This patch avoids the problem by forcing
  DIRECT_IO for this case.
  It also sets DIRECT_IO when the file system specifies the FN_DIRECTIO
  flag in its reply to the open.

  Tested by:	nishida@asusa.net, freebsd@moosefs.com
  PR:		194293, 206238
  MFC after:	2 weeks

Changes:
  head/sys/fs/fuse/fuse_file.c
  head/sys/fs/fuse/fuse_vnops.c
Comment 13 commit-hook freebsd_committer 2016-05-15 00:45:31 UTC
A commit references this bug:

Author: rmacklem
Date: Sun May 15 00:45:17 UTC 2016
New revision: 299816
URL: https://svnweb.freebsd.org/changeset/base/299816

Log:
  Fix fuse so that stale buffer cache data isn't read.

  When I/O on a file under fuse is switched from buffered to DIRECT_IO,
  it was possible to read stale (before a recent modification) data from
  the buffer cache. This patch invalidates the buffer cache for the
  file to fix this.

  PR:		194293
  MFC after:	2 weeks

Changes:
  head/sys/fs/fuse/fuse_node.c
Comment 14 Rick Macklem freebsd_committer 2016-05-28 22:49:27 UTC
The patch that fixed this has been committed and MFC'd to stable/10.