This happens with both 10.1-RC1 and 9.3-Release. I wrote a simple and straight forward file system program with FUSE. It just does like: open: open a file with the same name in a different directory read: read a file with the same name in a different directory write: write a file with the same name in a different directory etc. However, the program always freezes when saving a xcf file (Gimp's native format) with Gimp. Once it freezes, it cannot be killed even by shutdown. Therefore, I always have to reset my PC after shutdown. After investigating Gimp's source code, I found that this happened when Gimp tried to fseek() a position greater than the file size. In /usr/ports/graphics/gimp-app/work/gimp-2.8.10/app/xcf/xcf-seek.c, there is xcf_seek_pos(). Adding fstat(fd, &sb); if (pos > sb.st_size) { ftruncate(fd, pos); } to it solved the problem. This is not only a FUSE's problem, but I guess it will be better to deal with it. Sincerely,
Created attachment 148181 [details] fseek test Actually, I have noticed that the attached simple program also freezes my FUSE program. The program just fopen -> fseek(100) -> fwrite(). Please compile it and use for an EXISTING tiny file less than 100bytes like: % test aaa Interestingly, it does not freeze when aaa does not exist, in other words, create() is called by FUSE. It is only affected when an existing small file is specified.
According to https://bugzilla.gnome.org/show_bug.cgi?id=738329, this seems to happen also with Linux. So, it's not a FreeBSD specific bug but a FUSE's bug.
Updated 10.1-BETA and 10.1-RC versioned bugs to 10.1-STABLE.
I think I know what causes the hang for the small write (test.c). Could you try the program, but change "w" to "rw" in the fopen(). - I think the problem happens when a file is opened "write only" and then a partial block is written. The write of the partial buffer cache block requires that the block first be read from the file, but "write only" doesn't allow that to happen and it gets stuck. If test.c works ok when opening "rw", then I think this is what is happening. --> I am thinking that fuse_vnop_open(..O_WRONLY..) should actually do a O_RDWR open. It means that files that only give the user write access won't be able to be opened, but that seems to be preferred to a hang? Please let us know if you get to try this, rick.
Created attachment 164744 [details] patch fuse so that a WRONLY open becomes a RDWR open This patch modifies the FreeBSD Fuse fs (in sys/fs/fuse) so that it always does a RDWR open when a WRONLY open is requested. I believe this bug happens when a write of a partial block occurs when not doing DIRECT_IO. When this happens the buffer cache first reads the entire block in. Without a RDWR open, this read fails and leaves the block "stuck", making umount etc. fail. The problem I see w.r.t. this patch is that a WRONLY open will fail if the process doesn't have Read access to the file. An alternate way to patch this would be to force DIRECT_IO for WRONLY opens. Comments/opinions? Hopefully nishida@ can test this patch to determine if it fixes the problem?
Sorry for my late reply. I have just tested with fopen("rw") and have confirmed that "rw" did not freeze my program and FUSE. Actually it returned "Operation not permitted" which is a correct response because the file size is < 100 and fseek(100) is 'nonseekable'. I will try the patch later. Thanks Hiroshi Nishida
Created attachment 164831 [details] patch fuse so it forces DIRECT_IO for WRONLY opens This patch forces fuse to use DIRECT_IO for files opened WRONLY, so it won't try and read a block in before writing a partial block. This fix may be preferable to the other patch, since it shouldn't make a WRONLY open fail because read/write isn't allowed. To put this in -head, fuse also needs to be patched to invalidate buffers when DIRECT_IO is enabled, because otherwise reads may get stale cached data.
(In reply to rmacklem from comment #7) I tested the patch on 10.2-RELEASE-p8 and confirmed it worked. However, the behavior of fseek() beyond EOF changed to "fseek() extends the original file and writes data at the seek point." Since fseek() beyond EOF is probably undefined, it will be OK and is much better than freeze. Thanks. Hiroshi Nishida
Created attachment 164881 [details] invalidate buffer cache blocks when switching to DIRECT_IO This patch needs to be used along with 164831 so that stale data in the buffer cache isn't accessed after switching to doing direct I/O. I marked 164744 obsolete since I believe 164831 plus this patch is the preferred fix. Also, 164744 wasn't complete and wouldn't have compiled.
(In reply to rmacklem from comment #9) You might want to compare fuse_io_invalbuf() and ffs_rawread_sync(). From the very quick inspection, they are mostly equivalent WRT write ops done.
Since the patches seem to fix the problem, I plan on committing them once I have checked the code against the ffs code. Probably in April 2016.
A commit references this bug: Author: rmacklem Date: Sat May 14 20:03:22 UTC 2016 New revision: 299753 URL: https://svnweb.freebsd.org/changeset/base/299753 Log: Fix fuse to use DIRECT_IO when required. When a file is opened write-only and a partial block was written, buffered I/O would try and read the whole block in. This would result in a hung thread, since there was no open (fuse filehandle) that allowed reading. This patch avoids the problem by forcing DIRECT_IO for this case. It also sets DIRECT_IO when the file system specifies the FN_DIRECTIO flag in its reply to the open. Tested by: nishida@asusa.net, freebsd@moosefs.com PR: 194293, 206238 MFC after: 2 weeks Changes: head/sys/fs/fuse/fuse_file.c head/sys/fs/fuse/fuse_vnops.c
A commit references this bug: Author: rmacklem Date: Sun May 15 00:45:17 UTC 2016 New revision: 299816 URL: https://svnweb.freebsd.org/changeset/base/299816 Log: Fix fuse so that stale buffer cache data isn't read. When I/O on a file under fuse is switched from buffered to DIRECT_IO, it was possible to read stale (before a recent modification) data from the buffer cache. This patch invalidates the buffer cache for the file to fix this. PR: 194293 MFC after: 2 weeks Changes: head/sys/fs/fuse/fuse_node.c
The patch that fixed this has been committed and MFC'd to stable/10.