We're allowed to cache clean data, but overwrites should invalidate or replace cached ranges.

+++ This bug was initially created as a clone of Bug #230258 +++
(Again, I'll take if no one else beats me to it.)
As stated in bug 230258, there are two workarounds for this:
- use the FUSE direct_io mount option
- sysctl vfs.fuse.data_cache_mode=0
(In reply to Ben RUBSON from comment #2)
> As stated in bug 230258, there are 2 workarounds to this :
> - use FUSE direct_io mount option
> - sysctl vfs.fuse.data_cache_mode = 0

Makes sense to me: both have the effect of preventing data caching :-). There will be a performance impact, depending on your workload.
I think fuse's IO_DIRECT path is a mess. Really, all IO should go through the buffer cache, with B_DIRECT and ~B_CACHE acting only as flags that control the buffer's lifetime once the operation is complete. Removing the "direct" backends entirely (except as implementation details of strategy()) would simplify and correct the caching logic.

Looking at UFS, it really only has a non-bufcache "rawread" path that uses pbufs (and flushes all dirty bufs on the vnode first!). There is no equivalent for O_DIRECT writes. And ffs_rawread basically duplicates the ordinary read path for extremely limited cases (single iov, must be sector sized/aligned, etc.); it's unclear to me why it exists. ffs_write() just uses the ordinary buf cache, paying attention to ioflag & IO_DIRECT and using vfs_bio_set_flags(, ioflag) to propagate it to b_flags & B_DIRECT. (B_DIRECT causes the buffer to be released immediately when it is freed, instead of being cached.)

I think we should probably learn from UFS for FUSE's IO modes:

1. Keep and enable the direct_io option, for users who truly want to bypass the buf cache entirely. Preferably this would be a per-mountpoint option rather than a global, but that is an orthogonal enhancement. Confusingly, this is distinct from opening a file O_DIRECT. Maybe the sysctl/option can be renamed. "raw io?"
2. Do not actually use the "direct" paths in FUSE outside of global direct_io mode (or a future MP-specific always-direct mode).
3. A caveat here: FUSE filesystems (?) don't have a native sector/block size, but the buf cache works in block units. And we translate O_WRONLY opens into FUSE FUFH_WRONLY opens. So there will be some trickiness in partial-block writes with an O_WRONLY handle when the block is not in cache. Today that is sidestepped by invoking direct mode, but it shouldn't be.

Anyway, these are all ideas for future cleanup in this area. For the more limited scope of fixing just this PR, we can probably draw inspiration from ffs_rawread_sync().
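The O_WRONLY caveat in point 3 comes down to simple arithmetic. The sketch below assumes a fixed cache block size (FUSE itself does not define one) and uses a hypothetical helper, write_needs_rmw, which is not part of fusefs. It reports whether a write only partly covers its first or last block; in that case a block-based cache would have to read the block before merging in the new bytes, which a write-only (FUFH_WRONLY) handle cannot do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of fusefs): returns true if a write of
 * `len` bytes at offset `off` covers only part of its first or last block,
 * so a block-based cache would need to read that block first
 * (read-modify-write).  End-of-file is ignored for simplicity.
 */
static bool write_needs_rmw(uint64_t off, uint64_t len, uint64_t bsize)
{
	assert(bsize > 0 && len > 0);
	bool head_partial = (off % bsize) != 0;         /* starts mid-block */
	bool tail_partial = ((off + len) % bsize) != 0; /* ends mid-block */
	return head_partial || tail_partial;
}
```

With a 4096-byte block, write_needs_rmw(0, 8192, 4096) is false, while write_needs_rmw(100, 10, 4096) is true. A real implementation would also consult the file size, since a block that starts at or beyond the old end of file has nothing to read.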
A commit references this bug:

Author: asomers
Date: Fri Apr 12 19:05:08 UTC 2019
New revision: 346162
URL: https://svnweb.freebsd.org/changeset/base/346162

Log:
  fusefs: evict invalidated cache contents during write-through

  fusefs's default cache mode is "writethrough", although it currently works more like "write-around"; writes bypass the cache completely. Since writes bypass the cache, they were leaving stale previously-read data in the cache. This commit invalidates that stale data. It also adds a new global v_inval_buf_range method, like vtruncbuf but for a range of a file.

  PR: 235774
  Reported by: cem
  Sponsored by: The FreeBSD Foundation

Changes:
  projects/fuse2/sys/fs/fuse/fuse_io.c
  projects/fuse2/sys/kern/vfs_subr.c
  projects/fuse2/sys/sys/vnode.h
  projects/fuse2/tests/sys/fs/fusefs/write.cc
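The effect of r346162 can be modelled in user space. In this sketch, all names are hypothetical; the kernel's real mechanism is the new v_inval_buf_range routine acting on buffers and VM pages. Reads populate a cache of clean pages, while writes go straight to the "server" copy and then evict every cached page they overlap. Tiny 4-byte pages keep the example readable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4	/* tiny pages keep the example easy to follow */
#define NPAGES  4

/* Hypothetical model: authoritative server copy plus a clean-page cache. */
struct model_file {
	char server[PAGE_SZ * NPAGES];	/* what the FUSE daemon holds */
	char cache[PAGE_SZ * NPAGES];	/* cached clean pages */
	bool valid[NPAGES];		/* which cached pages may be used */
};

/* Read one byte, filling its page from the server on a cache miss. */
static char model_read(struct model_file *f, size_t off)
{
	size_t pg = off / PAGE_SZ;

	assert(off < sizeof(f->server));
	if (!f->valid[pg]) {
		memcpy(&f->cache[pg * PAGE_SZ], &f->server[pg * PAGE_SZ], PAGE_SZ);
		f->valid[pg] = true;
	}
	return f->cache[off];
}

/* Write-around: send data straight to the server, then evict every cached
 * page the write overlaps so later reads cannot return stale data. */
static void model_write(struct model_file *f, size_t off, const char *buf,
    size_t len)
{
	assert(len > 0 && off + len <= sizeof(f->server));
	memcpy(&f->server[off], buf, len);
	for (size_t pg = off / PAGE_SZ; pg <= (off + len - 1) / PAGE_SZ; pg++)
		f->valid[pg] = false;
}
```

Without the eviction loop in model_write, a read of offset 5 after writing "X" there would still return the stale, previously cached byte.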
Awesome, thanks Alan!
Perfect, many thanks, Alan! The upcoming fusefs version really is promising :)
This is complete on the fuse2 branch.
A commit references this bug:

Author: asomers
Date: Fri Apr 26 17:09:27 UTC 2019
New revision: 346756
URL: https://svnweb.freebsd.org/changeset/base/346756

Log:
  fusefs: fix cache invalidation error from r346162

  An off-by-one error led to the last page of a write not being removed from its object, even though that page's buffer was marked as invalid.

  PR: 235774
  Sponsored by: The FreeBSD Foundation

Changes:
  projects/fuse2/sys/kern/vfs_subr.c
  projects/fuse2/tests/sys/fs/fusefs/write.cc
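The log doesn't spell out the exact arithmetic, but a classic way to lose the last page of a range is to compute the exclusive end page with truncating division. The sketch below assumes a 4096-byte page and uses hypothetical helper names; it contrasts that form with one that rounds the end up, which a byte range must do to cover a trailing partial page.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size */

/* First page touched by a byte range starting at `off`. */
static uint64_t start_page(uint64_t off)
{
	return off / MODEL_PAGE_SIZE;
}

/* Buggy: truncating division drops a trailing partial page. */
static uint64_t end_page_truncating(uint64_t off, uint64_t len)
{
	return (off + len) / MODEL_PAGE_SIZE;
}

/* Correct: round the exclusive end up to the next page boundary. */
static uint64_t end_page_rounded(uint64_t off, uint64_t len)
{
	assert(len > 0);
	return (off + len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
}
```

For a 100-byte write at offset 4096, the truncating form yields the empty page range [1, 1), so page 1 is never removed; the rounded form yields [1, 2). Page-aligned writes produce the same answer either way, so tests that only write whole pages can miss this kind of bug.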
A commit references this bug:

Author: asomers
Date: Fri Apr 26 19:47:43 UTC 2019
New revision: 346763
URL: https://svnweb.freebsd.org/changeset/base/346763

Log:
  fusefs: fix a deadlock in VOP_PUTPAGES

  As of r346162 fuse now invalidates the cache during writes. But it can't do that when writing from VOP_PUTPAGES, because the write is coming _from_ the cache. Trying to invalidate the cache in that situation causes a deadlock in vm_object_page_remove, because the pages in question have already been busied by the same thread.

  PR: 235774
  Sponsored by: The FreeBSD Foundation

Changes:
  projects/fuse2/sys/fs/fuse/fuse_io.c
  projects/fuse2/sys/fs/fuse/fuse_io.h
  projects/fuse2/sys/fs/fuse/fuse_vnops.c
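The shape of this fix can be sketched with a toy model. Here a page's "busy" bit stands in for the VM busy state, and invalidate_pages returns -1 where the kernel would instead sleep forever waiting for a page that the current thread itself holds busy. All names are hypothetical; the actual change touches fuse_io.c, fuse_io.h, and fuse_vnops.c, and this sketch does not reproduce its details.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8

/* Hypothetical cache model: each page may be valid and/or busy. */
struct model_cache {
	bool valid[NPAGES];
	bool busy[NPAGES];	/* held busy by the current thread, e.g. a pager */
};

/*
 * Evict pages [first, last].  Waiting on a page this thread itself holds
 * busy would never finish, so the model returns -1 instead of blocking.
 */
static int invalidate_pages(struct model_cache *c, size_t first, size_t last)
{
	assert(first <= last && last < NPAGES);
	for (size_t pg = first; pg <= last; pg++)
		if (c->busy[pg])
			return -1;	/* would self-deadlock */
	for (size_t pg = first; pg <= last; pg++)
		c->valid[pg] = false;
	return 0;
}

/*
 * Write pages [first, last] through to the server (not modelled here).
 * When the data comes from the cache itself (the putpages path), skip the
 * invalidation: the caller already holds those pages busy, and they hold
 * exactly the data being written.
 */
static int model_write_pages(struct model_cache *c, size_t first, size_t last,
    bool from_pager)
{
	if (from_pager)
		return 0;
	return invalidate_pages(c, first, last);
}
```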
My fix in r346162 was really just a hack to deal with the fact that we didn't truly support writethrough caching. But as of r349038, we do (see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237588). However, the invalidation that I added in r346162 still applies to files opened with O_DIRECT, so I won't revert it.
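The difference between the two write paths can be sketched in a toy model: with true writethrough, a write updates the server copy and also any cached copy of the affected pages in place, so the cache stays coherent; a write that bypasses the cache (as an O_DIRECT write does) must instead invalidate the overlapping pages. The names and the 4-byte page size are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4
#define NPAGES  4
#define FILE_SZ (PAGE_SZ * NPAGES)

/* Hypothetical model: authoritative server copy plus a page cache. */
struct model_file {
	char server[FILE_SZ];
	char cache[FILE_SZ];
	bool valid[NPAGES];
};

/* Cached read: fill the page from the server on a miss. */
static char cached_read(struct model_file *f, size_t off)
{
	size_t pg = off / PAGE_SZ;

	assert(off < FILE_SZ);
	if (!f->valid[pg]) {
		memcpy(&f->cache[pg * PAGE_SZ], &f->server[pg * PAGE_SZ], PAGE_SZ);
		f->valid[pg] = true;
	}
	return f->cache[off];
}

/* Write to the server; then either update or evict the cached pages. */
static void model_write(struct model_file *f, size_t off, const char *buf,
    size_t len, bool o_direct)
{
	assert(len > 0 && off + len <= FILE_SZ);
	memcpy(&f->server[off], buf, len);
	for (size_t i = off; i < off + len; i++) {
		size_t pg = i / PAGE_SZ;
		if (o_direct)
			f->valid[pg] = false;		/* cache bypassed: evict */
		else if (f->valid[pg])
			f->cache[i] = buf[i - off];	/* writethrough: update */
	}
}
```

Updating in place keeps a warm cache after ordinary writes, while the O_DIRECT branch trades that for never serving a page the write skipped.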
A commit references this bug:

Author: asomers
Date: Wed Aug 7 00:38:28 UTC 2019
New revision: 350665
URL: https://svnweb.freebsd.org/changeset/base/350665

Log:
  fusefs: merge from projects/fuse2

  This commit imports the new fusefs driver. It raises the protocol level from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and adds many new features. New features include:
  * Optional kernel-side permissions checks (-o default_permissions)
  * Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
  * Allow interrupting FUSE operations
  * Support named pipes and unix-domain sockets in fusefs file systems
  * Forward UTIME_NOW during utimensat(2) to the daemon
  * kqueue support for /dev/fuse
  * Allow updating mounts with "mount -u"
  * Allow exporting fusefs file systems over NFS
  * Server-initiated invalidation of the name cache or data cache
  * Respect RLIMIT_FSIZE
  * Try to support servers as old as protocol 7.4

  Performance enhancements include:
  * Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
  * Cache file attributes
  * Cache lookup entries, both positive and negative
  * Server-selectable cache modes: writethrough, writeback, or uncached
  * Write clustering
  * Readahead
  * Use counter(9) for statistical reporting

  PR: 199934 216391 233783 234581 235773 235774 235775
  PR: 236226 236231 236236 236291 236329 236381 236405
  PR: 236327 236466 236472 236473 236474 236530 236557
  PR: 236560 236844 237052 237181 237588 238565
  Reviewed by: bcr (man pages)
  Reviewed by: cem, ngie, rpokala, glebius, kib, bde, emaste (post-commit review on project branch)
  MFC after: 3 weeks
  Relnotes: yes
  Sponsored by: The FreeBSD Foundation
  Pull Request: https://reviews.freebsd.org/D21110

Changes:
  _U head/
  head/MAINTAINERS
  head/UPDATING
  head/etc/mtree/BSD.tests.dist
  head/sbin/mount_fusefs/mount_fusefs.8
  head/sbin/mount_fusefs/mount_fusefs.c
  head/share/man/man5/fusefs.5
  head/sys/fs/fuse/fuse.h
  head/sys/fs/fuse/fuse_device.c
  head/sys/fs/fuse/fuse_file.c
  head/sys/fs/fuse/fuse_file.h
  head/sys/fs/fuse/fuse_internal.c
  head/sys/fs/fuse/fuse_internal.h
  head/sys/fs/fuse/fuse_io.c
  head/sys/fs/fuse/fuse_io.h
  head/sys/fs/fuse/fuse_ipc.c
  head/sys/fs/fuse/fuse_ipc.h
  head/sys/fs/fuse/fuse_kernel.h
  head/sys/fs/fuse/fuse_main.c
  head/sys/fs/fuse/fuse_node.c
  head/sys/fs/fuse/fuse_node.h
  head/sys/fs/fuse/fuse_param.h
  head/sys/fs/fuse/fuse_vfsops.c
  head/sys/fs/fuse/fuse_vnops.c
  head/sys/sys/param.h
  head/tests/sys/fs/Makefile
  head/tests/sys/fs/fusefs/
A commit references this bug:

Author: asomers
Date: Sun Sep 15 04:14:34 UTC 2019
New revision: 352351
URL: https://svnweb.freebsd.org/changeset/base/352351

Log:
  MFC the new fusefs driver

  MFC r350665, r350990, r350992, r351039, r351042, r351061, r351066, r351113, r351560, r351961, r351963, r352021, r352025, r352230

  r350665:
  fusefs: merge from projects/fuse2

  This commit imports the new fusefs driver. It raises the protocol level from 7.8 to 7.23, fixes many bugs, adds a test suite for the driver, and adds many new features. New features include:
  * Optional kernel-side permissions checks (-o default_permissions)
  * Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
  * Allow interrupting FUSE operations
  * Support named pipes and unix-domain sockets in fusefs file systems
  * Forward UTIME_NOW during utimensat(2) to the daemon
  * kqueue support for /dev/fuse
  * Allow updating mounts with "mount -u"
  * Allow exporting fusefs file systems over NFS
  * Server-initiated invalidation of the name cache or data cache
  * Respect RLIMIT_FSIZE
  * Try to support servers as old as protocol 7.4

  Performance enhancements include:
  * Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
  * Cache file attributes
  * Cache lookup entries, both positive and negative
  * Server-selectable cache modes: writethrough, writeback, or uncached
  * Write clustering
  * Readahead
  * Use counter(9) for statistical reporting

  PR: 199934 216391 233783 234581 235773 235774 235775
  PR: 236226 236231 236236 236291 236329 236381 236405
  PR: 236327 236466 236472 236473 236474 236530 236557
  PR: 236560 236844 237052 237181 237588 238565
  Reviewed by: bcr (man pages)
  Reviewed by: cem, ngie, rpokala, glebius, kib, bde, emaste (post-commit review on project branch)
  Relnotes: yes
  Sponsored by: The FreeBSD Foundation
  Pull Request: https://reviews.freebsd.org/D21110

  r350990:
  fusefs: add SVN Keywords to the test files

  Reported by: SVN pre-commit hooks
  MFC-With: r350665
  Sponsored by: The FreeBSD Foundation

  r350992:
  fusefs: skip some tests when unsafe aio is disabled

  MFC-With: r350665
  Sponsored by: The FreeBSD Foundation

  r351039:
  fusefs: fix intermittency in the default_permissions.Unlink.ok test

  The test needs to expect a FUSE_FORGET operation. Most of the time the test would pass anyway, because by chance FUSE_FORGET would arrive after the unmount.

  MFC-With: 350665
  Sponsored by: The FreeBSD Foundation

  r351042:
  fusefs: Fix the size of fuse_getattr_in

  In FUSE protocol 7.9, the size of the FUSE_GETATTR request has increased. However, the fusefs driver is currently not sending the additional fields. In our implementation, the additional fields are always zero, so there haven't been any test failures until now. But fusefs-lkl requires the request's length to be correct.

  Fix this bug, and also enhance the test suite to catch similar bugs.

  PR: 239830
  MFC-With: 350665
  Sponsored by: The FreeBSD Foundation

  r351061:
  fusefs: fix the 32-bit build after 351042

  Reported by: jhb
  MFC-With: 351042
  Sponsored by: The FreeBSD Foundation

  r351066:
  fusefs: fix conditional from r351061

  The entirety of r351061 was a copy/paste error. I'm sorry I've been committing so hastily.

  Reported by: rpokala
  Reviewed by: rpokala
  MFC-With: 351061
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21265

  r351113:
  fusefs: don't send the namespace during listextattr

  The FUSE_LISTXATTR operation always returns the full list of a file's extended attributes, in all namespaces. There's no way to filter the list server-side. However, currently FreeBSD's fusefs driver sends a namespace string with the FUSE_LISTXATTR request. That behavior was probably copied from fuse_vnop_getextattr, which has an attribute name argument. It's been there ever since extended attribute support was added in r324620. This commit removes it.

  Reviewed by: cem
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21280

  r351560:
  fusefs: Fix some bugs regarding the size of the LISTXATTR list

  * A small error in r338152 led to the returned size always being exactly eight bytes too large.
  * The FUSE_LISTXATTR operation works like Linux's listxattr(2): if the caller does not provide enough space, then the server should return ERANGE rather than return a truncated list. That's true even though in FUSE's case the kernel doesn't provide space to the client at all; it simply requests a maximum size for the list. We previously weren't handling the case where the server returns ERANGE even though the kernel requested as much size as the server had told us it needs; that can happen due to a race.
  * We also need to ensure that a pathological server that always returns ERANGE no matter what size we request in FUSE_LISTXATTR won't cause an infinite loop in the kernel. As of this commit, it will instead cause an infinite loop that exits and enters the kernel on each iteration, allowing signals to be processed.

  Reviewed by: cem
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21287

  r351961:
  Coverity fixes in fusefs(5)

  CID 1404532 fixes a signed vs unsigned comparison error in fuse_vnop_bmap. It could potentially have resulted in VOP_BMAP reporting too many consecutive blocks.

  CID 1404364 is much worse. It was an array access by an untrusted, user-provided variable. It could potentially have resulted in a malicious file system crashing the kernel or worse.

  Reported by: Coverity
  Reviewed by: emaste
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21466

  r351963:
  fusefs: coverity cleanup in the tests

  Address the following defects reported by Coverity:

  * Structurally dead code (CID 1404366): set m_quit before FAIL, not after
  * Unchecked return value of sysctlbyname (CID 1404321)
  * Unchecked return value of stat(2) (CID 1404471)
  * Unchecked return value of open(2) (CID 1404402, 1404529)
  * Unchecked return value of dup(2) (CID 1404478)
  * Buffer overflows. These are all false positives caused by the fact that Coverity thinks I'm using a buffer to store strings, when in fact I'm really just using it to store a byte array that happens to be initialized with a string. I'm changing the type from char to uint8_t in the hopes that it will placate Coverity. (CID 1404338, 1404350, 1404367, 1404376, 1404379, 1404381, 1404388, 1404403, 1404425, 1404433, 1404434, 1404474, 1404480, 1404484, 1404503, 1404505)
  * False positive file descriptor leak. I'm going to try to fix this with Coverity modeling, but I'll also change an EXPECT to ASSERT so we don't perform meaningless assertions after the failure. (CID 1404320, 1404324, 1404440, 1404445).
  * Unannotated file descriptor leak. This will be followed up by a Coverity modeling change. (CID 1404326, 1404334, 1404336, 1404357, 1404361, 1404372, 1404391, 1404395, 1404409, 1404430, 1404448, 1404451, 1404455, 1404457, 1404458, 1404460)
  * Uninitialized variables in C++ constructors (CID 1404327, 1404346). In the case of m_maxphys, this actually led to part of the FUSE_INIT's response being set to stack garbage during the WriteCluster::clustering test.
  * Uninitialized sun_len field in struct sockaddr_un (CID 1404330, 1404371, 1404429).

  Reported by: Coverity
  Reviewed by: emaste
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21457

  r352021:
  fusefs: suppress some Coverity resource leak CIDs in the tests

  The fusefs tests deliberately leak file descriptors. To do otherwise would add extra complications to the tests' mock FUSE server. This annotation should hopefully convince Coverity to shut up about the leaks.

  Reviewed by: uqs
  Sponsored by: The FreeBSD Foundation

  r352025:
  mount_fusefs: fix a segfault on memory allocation failure

  Reported by: Coverity
  Coverity CID: 1354188
  Sponsored by: The FreeBSD Foundation

  r352230:
  fusefs: Fix iosize for FUSE_WRITE in 7.8 compat mode

  When communicating with a FUSE server that implements version 7.8 (or older) of the FUSE protocol, the FUSE_WRITE request structure is 16 bytes shorter than normal. The protocol version check wasn't applied universally, leading to an extra 16 bytes being sent to such servers. The extra bytes were allocated and bzero()d, so there was no information disclosure.

  Reviewed by: emaste
  MFC-With: r350665
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D21557

Changes:
  _U stable/12/
  stable/12/MAINTAINERS
  stable/12/UPDATING
  stable/12/etc/mtree/BSD.tests.dist
  stable/12/sbin/mount_fusefs/mount_fusefs.8
  stable/12/sbin/mount_fusefs/mount_fusefs.c
  stable/12/share/man/man5/fusefs.5
  stable/12/sys/fs/fuse/fuse.h
  stable/12/sys/fs/fuse/fuse_device.c
  stable/12/sys/fs/fuse/fuse_file.c
  stable/12/sys/fs/fuse/fuse_file.h
  stable/12/sys/fs/fuse/fuse_internal.c
  stable/12/sys/fs/fuse/fuse_internal.h
  stable/12/sys/fs/fuse/fuse_io.c
  stable/12/sys/fs/fuse/fuse_io.h
  stable/12/sys/fs/fuse/fuse_ipc.c
  stable/12/sys/fs/fuse/fuse_ipc.h
  stable/12/sys/fs/fuse/fuse_kernel.h
  stable/12/sys/fs/fuse/fuse_main.c
  stable/12/sys/fs/fuse/fuse_node.c
  stable/12/sys/fs/fuse/fuse_node.h
  stable/12/sys/fs/fuse/fuse_param.h
  stable/12/sys/fs/fuse/fuse_vfsops.c
  stable/12/sys/fs/fuse/fuse_vnops.c
  stable/12/sys/sys/param.h
  stable/12/tests/sys/fs/Makefile
  stable/12/tests/sys/fs/fusefs/
  stable/12/tests/sys/fs/fusefs/access.cc
  stable/12/tests/sys/fs/fusefs/allow_other.cc
  stable/12/tests/sys/fs/fusefs/bmap.cc
  stable/12/tests/sys/fs/fusefs/create.cc
  stable/12/tests/sys/fs/fusefs/default_permissions.cc
  stable/12/tests/sys/fs/fusefs/default_permissions_privileged.cc
  stable/12/tests/sys/fs/fusefs/destroy.cc
  stable/12/tests/sys/fs/fusefs/dev_fuse_poll.cc
  stable/12/tests/sys/fs/fusefs/fifo.cc
  stable/12/tests/sys/fs/fusefs/flush.cc
  stable/12/tests/sys/fs/fusefs/forget.cc
  stable/12/tests/sys/fs/fusefs/fsync.cc
  stable/12/tests/sys/fs/fusefs/fsyncdir.cc
  stable/12/tests/sys/fs/fusefs/getattr.cc
  stable/12/tests/sys/fs/fusefs/interrupt.cc
  stable/12/tests/sys/fs/fusefs/io.cc
  stable/12/tests/sys/fs/fusefs/link.cc
  stable/12/tests/sys/fs/fusefs/locks.cc
  stable/12/tests/sys/fs/fusefs/lookup.cc
  stable/12/tests/sys/fs/fusefs/mkdir.cc
  stable/12/tests/sys/fs/fusefs/mknod.cc
  stable/12/tests/sys/fs/fusefs/mockfs.cc
  stable/12/tests/sys/fs/fusefs/mockfs.hh
  stable/12/tests/sys/fs/fusefs/mount.cc
  stable/12/tests/sys/fs/fusefs/nfs.cc
  stable/12/tests/sys/fs/fusefs/notify.cc
  stable/12/tests/sys/fs/fusefs/open.cc
  stable/12/tests/sys/fs/fusefs/opendir.cc
  stable/12/tests/sys/fs/fusefs/read.cc
  stable/12/tests/sys/fs/fusefs/readdir.cc
  stable/12/tests/sys/fs/fusefs/readlink.cc
  stable/12/tests/sys/fs/fusefs/release.cc
  stable/12/tests/sys/fs/fusefs/releasedir.cc
  stable/12/tests/sys/fs/fusefs/rename.cc
  stable/12/tests/sys/fs/fusefs/rmdir.cc
  stable/12/tests/sys/fs/fusefs/setattr.cc
  stable/12/tests/sys/fs/fusefs/statfs.cc
  stable/12/tests/sys/fs/fusefs/symlink.cc
  stable/12/tests/sys/fs/fusefs/unlink.cc
  stable/12/tests/sys/fs/fusefs/utils.cc
  stable/12/tests/sys/fs/fusefs/utils.hh
  stable/12/tests/sys/fs/fusefs/write.cc
  stable/12/tests/sys/fs/fusefs/xattr.cc
Alan, just to be sure:

# uname -a
FreeBSD fbsd 12.1-RELEASE-p2 FreeBSD 12.1-RELEASE-p2 GENERIC amd64

Even with entry_timeout=0 and attr_timeout=0, I still need to use one of these two workarounds:
- use the FUSE direct_io mount option
- sysctl vfs.fusefs.data_cache_mode=0

(so it sounds like entry_timeout=0 attr_timeout=0 has no effect)

There (https://github.com/vgough/encfs/issues/315) you said: "the entry_timeout and attr_timeout are fully supported in the projects/fuse2 branch, and should be available in the (not yet released) 13.0 and 12.1 releases". Is that the case? If so, did I perhaps miss something?

Thank you very much!
Ben, I think you're conflating two issues. The entry_timeout and attr_timeout fields concern the lookup cache and attribute cache, respectively. They're separate from the data cache, which is what this bug concerns. Furthermore, vfs.fusefs.data_cache_mode should be ignored when mounting a file system that uses libfuse and was compiled on FreeBSD 12.1 or later. Could you please open a new bug and fully describe the problem you're having?
Alan, many thanks for your prompt reply. I've opened a new bug to track this remaining issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244178