We're allowed to cache clean data, but overwrites should invalidate or replace cached ranges.
+++ This bug was initially created as a clone of Bug #230258 +++
(Again, I'll take it if no one else beats me to it.)
As stated in bug 230258, there are two workarounds for this:
- use FUSE direct_io mount option
- sysctl vfs.fuse.data_cache_mode = 0
(In reply to Ben RUBSON from comment #2)
> As stated in bug 230258, there are 2 workarounds to this :
> - use FUSE direct_io mount option
> - sysctl vfs.fuse.data_cache_mode = 0
Makes sense to me — both have the effect of preventing data caching :-). There will be a performance impact depending on your workload.
I think fuse's IO_DIRECT path is a mess. Really all IO should go through the buffer cache, and B_DIRECT and ~B_CACHE are just flags that control the buffer's lifetime once the operation is complete. Removing the "direct" backends entirely (except as implementation details of strategy()) would simplify and correct the caching logic.
Looking at UFS, it really only has a non-bufcache "rawread" path that uses pbufs (and flushes all dirty bufs on the vnode first!). There is no equivalent for O_DIRECT writes. And ffs_rawread basically duplicates the ordinary read path for extremely limited cases (single iov, must be sector sized/aligned, etc.); it's unclear to me why it exists.
ffs_write() just uses the ordinary buf cache, paying attention to ioflag & IO_DIRECT and using vfs_bio_set_flags(, ioflag) to propagate it to b_flags & B_DIRECT. (B_DIRECT causes the buffer to be released immediately when it is freed, instead of being cached.)
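To make the flag propagation concrete, here is a small userspace model of it. This is only a sketch: every name and value below (the MODEL_* constants, struct model_buf, model_write) is a hypothetical stand-in, not the kernel's definition, and the real release logic has many more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's flag values. */
enum { MODEL_IO_DIRECT = 0x1 };
enum { MODEL_B_DIRECT = 0x1, MODEL_B_CACHE = 0x2 };

struct model_buf {
    int  b_flags;
    bool resident;   /* still held in the modelled buffer cache */
};

/* Models the ioflag -> b_flags propagation described above. */
void
model_bio_set_flags(struct model_buf *bp, int ioflag)
{
    assert(bp != NULL);
    if ((ioflag & MODEL_IO_DIRECT) != 0)
        bp->b_flags |= MODEL_B_DIRECT;
}

/* On release, a "direct" buffer is dropped; any other buffer stays cached. */
void
model_release(struct model_buf *bp)
{
    assert(bp != NULL);
    if ((bp->b_flags & MODEL_B_DIRECT) != 0) {
        bp->b_flags &= ~MODEL_B_CACHE;
        bp->resident = false;
    } else {
        bp->b_flags |= MODEL_B_CACHE;
        bp->resident = true;
    }
}

/* Every write goes through the cache; the flag only decides the lifetime. */
bool
model_write(struct model_buf *bp, int ioflag)
{
    assert(bp != NULL);
    bp->resident = true;
    model_bio_set_flags(bp, ioflag);
    model_release(bp);
    return (bp->resident);
}
```

The point of the model is that there is a single I/O path, and "direct" only changes what happens to the buffer afterwards.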
I think we should probably learn from UFS for FUSE's IO modes:
1. Keep and enable the direct_io option, for users who truly want to bypass the buf cache entirely. Preferably this is a per-mountpoint option rather than a global, but that is an orthogonal enhancement. Confusingly, this is distinct from opening a file O_DIRECT. Maybe the sysctl/option can be renamed. "raw io?"
2. Do not actually use the "direct" paths in FUSE outside of global direct_io mode (or a future MP-specific always-direct mode).
3. A caveat here is: FUSE filesystems (?) don't have a native sector/block size, but the buf cache is in block units. And, we translate O_WRONLY opens into FUSE FUFH_WRONLY opens. So there will be some trickiness in partial block writes with an O_WRONLY handle when the block is not in cache. Today that is sidestepped by invoking direct mode, but it shouldn't be.
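The partial-block caveat in item 3 can be sketched as a decision function. This is an illustrative userspace model only; the enum, the function name, and the idea of a fixed block size are assumptions for the sake of the example, and the write is assumed to fall within a single block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for one cached write that stays inside one block. */
enum fill_action {
    FILL_NONE,        /* no read needed before writing into the cache */
    FILL_READ_FIRST,  /* must read the rest of the block first */
    FILL_IMPOSSIBLE   /* read needed, but the handle is write-only */
};

enum fill_action
partial_write_action(size_t blksize, size_t off, size_t len,
    bool block_cached, bool handle_can_read)
{
    size_t in_block;

    assert(blksize > 0 && len > 0);
    in_block = off % blksize;
    assert(len <= blksize - in_block);   /* single-block writes only */

    /* A write covering the whole block needs no old contents. */
    if (in_block == 0 && len == blksize)
        return (FILL_NONE);
    /* A partial write into a block that is already cached is fine too. */
    if (block_cached)
        return (FILL_NONE);
    /* Otherwise the untouched bytes have to come from somewhere. */
    return (handle_can_read ? FILL_READ_FIRST : FILL_IMPOSSIBLE);
}
```

The FILL_IMPOSSIBLE case is the one that direct mode sidesteps today: a 50-byte write at offset 100 through a write-only handle, with the block not in cache, has no way to fetch the other bytes of the block.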
Anyway, these are all future cleanup ideas for this area. For the more limited scope of fixing just this PR, we can probably draw inspiration from ffs_rawread_sync().
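Roughly, the idea would be to sync before going direct. Below is a toy userspace model of that, not a proposed patch: the struct, the function names, and the single 8-byte "block" are all hypothetical. The invalidate step is my assumption for the write case (a flush alone would leave a stale cached copy behind).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_BLKSIZE 8

struct model_file {
    char backing[MODEL_BLKSIZE];  /* what the FUSE daemon holds */
    char cached[MODEL_BLKSIZE];   /* modelled buffer-cache copy */
    bool valid;                   /* cached[] reflects the file */
    bool dirty;                   /* cached[] has unwritten changes */
};

/* Flush dirty cached data, then drop the cached copy. */
void
model_sync_and_invalidate(struct model_file *f)
{
    if (f->valid && f->dirty)
        memcpy(f->backing, f->cached, MODEL_BLKSIZE);
    f->valid = false;
    f->dirty = false;
}

/* Direct write: bypasses the cache, but only after syncing it. */
void
model_direct_write(struct model_file *f, size_t off, const char *src,
    size_t len)
{
    assert(off <= MODEL_BLKSIZE && len <= MODEL_BLKSIZE - off);
    model_sync_and_invalidate(f);
    memcpy(f->backing + off, src, len);
}

/* Cached read: refills the cache from the backing store on a miss. */
char
model_cached_read(struct model_file *f, size_t off)
{
    assert(off < MODEL_BLKSIZE);
    if (!f->valid) {
        memcpy(f->cached, f->backing, MODEL_BLKSIZE);
        f->valid = true;
        f->dirty = false;
    }
    return (f->cached[off]);
}
```

Without the call to model_sync_and_invalidate() in model_direct_write(), a cached read after the direct write would keep returning the old byte, which is the behaviour this PR is about.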