This is one of three issues we detected in FreeBSD FUSE while developing our distributed file system. All four issues can be replicated using this simple test script: https://robo.moosefs.com/support/fuse_helloworld.tgz FreeBSD FUSE is keeping attributes it should not keep. Example: when kernel invokes lookup, one can say how long the result is kept in cache. We return 0 (0.0) timeout, but we still get results that suggest file length is kept in cache. What happens: we have two clients using our distributed file system. Each client has access to the same file. For convienience sake, let's say those files are named /mnt/llfuse/hello1 on client one and /mnt/llfuse/hello2 on client two. But both files point to the SAME OBJECT in the filesystem. When client one reads /mnt/llfuse/hello1, a minute later client two writes something to /mnt/llfuse/hello2, and yet a minute later client one reads /mnt/llfuse/hello1, client one does NOT see the data appended by client two. After we perform this set of operations: truncate -s 0 /mnt/llfuse/hello1 truncate -s 0 /mnt/llfuse/hello2 sleep 0.1 echo "t1" >> /mnt/llfuse/hello1 sleep 0.1 echo "t2" >> /mnt/llfuse/hello2 sleep 0.1 echo "t3" >> /mnt/llfuse/hello1 sleep 0.1 echo "t4" >> /mnt/llfuse/hello2 sleep 0.1 echo "t5" >> /mnt/llfuse/hello1 sleep 0.1 echo "t6" >> /mnt/llfuse/hello2 sleep 0.1 We get in result: cat /mnt/llfuse/hello1 t1 t3 t5 cat /mnt/llfuse/hello2 t1 t3 t5 t6 Which is, of course, not the correct content of the file we just wrote to. We tried also to use fuse_lowlevel_notify_inval_entry function to get rid of the problem, but it returns EINVAL and writes "fuse: writing device: Invalid argument" to screen. Best regards, Peter / MooseFS Team
FreeBSD's bufcache caches file contents by default. This can be worked around by having userspace open files with the O_DIRECT option, or by having your FUSE filesystem's FUSE_OPEN handler return FOPEN_DIRECT_IO in open_flags (i.e. set the 'direct_io' flag in fuse_file_info). Your test 'hello' filesystem does not appear set the direct_io flag on hello1 and hello2 (and of course 'echo >>' is unlikely to), so that may explain corrupt results. Given FreeBSD's local cache cannot possibly be consistent with your distributed filesystem (at least without sophisticated cache invalidation that FUSE probably lacks), you likely want to always operate in DIRECT mode.
Re: lowlevel_notify_inval_entry, note that this was added in protocol 7.12 and FreeBSD only supports 7.8: https://libfuse.github.io/doxygen/fuse__lowlevel_8h.html#ab14032b74b0a57a2b3155dd6ba8d6095
Here's a changelog of protocol additions we don't have: https://github.com/libfuse/libfuse/blob/master/include/fuse_kernel.h
I've checked it using "DIRECT" flag - result is almost the same: hello1: t1 t3 t5 t6 hello2: t1 t3 t5 t6 Still no 't2' and no 't4'. Our problem is not in data caching - this can be fixed using 'direct_io' (not vary good solution, but it works). The problem is connected with dentry cache - i.e. file length. Because of this cache operation "echo 't3' >> /mnt/llfuse/hello1" uses old file length and overwrites 't2'. In Linux and OS X this works fine. On FreeBSD it looks like 'timeout' parameter returned with attributes (getattr) and entry data (lookup) is totally ignored. For example - output from OS X (original version - without direct): hello1: t1 t2 t3 t4 t5 hello2: t1 t2 t3 t4 t5 t6 Still no 't6' in 'hello1' - probably they had similar issue and 'fixed' this by invoking 'getattr' whenever file is opened with O_APPEND, so data are appended properly, but then when 'hello1' is read without 'O_APPEND' it also uses "old" file length. On Linux there is 't6' in 'hello1' (proper timeouting attributes). As for now there is no work-around for this issue. We've mentioned 'fuse_lowlevel_notify_inval_entry' only because it could be used as a work-around. We do not use this function in our current implementation.
(In reply to Jakub Kruszona-Zawadzki from comment #4) > We've mentioned 'fuse_lowlevel_notify_inval_entry' only because it could be > used as a work-around. We do not use this function in our current > implementation. Understood, just wanted to clarify why you saw the error message. It's interesting you mention O_APPEND. I forgot about it, but that was totally broken (even on individual files) in 11.0 and I fixed it in CURRENT (r320451) June 2017. It looks like MarkJ MFC'd the commit to 11 July 2017 (r320684) and also to 11.1 (r320689). So if you are actually running 11.0 and not 11.1 per the PR Version field, I'd encourage you to try 11.1 :-). I can't do it now, but I'll look into the timeout value and our file size caching a little more later (maybe tonight). Thanks for the detailed report with repro, your help with investigating, and of course, patience.
Ok, dirent timeouts (entry_timeout*), as well as attr timeouts (attr_timeout) are encoded in fuse_entry_out via e.g., fuse_reply_entry(). Attribute timeout data is also encoded in fuse_attr_out (attr_valid) via e.g. fuse_reply_attr(). There also seems to be some fuse_config entry, attr, and negative timeouts? I'm not sure how all of these are supposed to interact; are implementations supposed to cache the attr_timeout value from lookup? Any documentation of the protocol expectations here would be very welcome. Timeout values don't look optional. I.e., technically we aren't allowed to cache indefinitely. But it's probably safe enough to treat any ULONG_MAX timeout value as an infinity sentinel (if I did the math right, it represents about 600 billion years). It's easy to say what reasonable behavior should be at the limits: 1. Use our normal caching if attr_valid is the maximal value. 2. Don't cache if attr_valid is zero. (Meets your needs.) But that leaves undefined the entire (0, ULONG_MAX) range. I'm confident that actually implementing some sort of timer based cache invalidation would be fairly complicated code-wise, and seems like it would have limited utility vs on-demand cache invalidation. The safe thing is to not cache, but I'm not sure if real-world filesystems actually pass the ULONG_MAX "infinity" timeout value. I guess I can try to dig into the ones I'm familiar with (probably ntfs-3g being most popular) and see what they do.
Some libfuse example filesystems define a magical "NO_TIMEOUT" of 500000, which is neither ULONG_MAX nor 0. In fact, it's just shy of six days. I don't think we should special case that magical value unless real FUSE FS use it.
(In reply to Conrad Meyer from comment #6) > There also seems to be some fuse_config entry, attr, and negative timeouts? I see. This is configured as a global default by high level API consumers, and the global value is passed into the LL routines. We (kernel) can ignore that since we only interact with the LL interface. It seems the highlevel defaults are 1 second each for entry/attr, and 0 for negative entry. I'm leaning towards keeping our current indefinite caching for any non-zero timeout value, and not caching when given a zero timeout/valid duration. That would at least allow for correct operation of your filesystem. (Performance may suffer due to excessive IPC until the more sophisticated notify_inval_entry is supported.)
ntfs-3g appears to use the highlevel API only. It seems like attribute caching is enabled or disabled with a compile-time option ("CACHEING" macro) based on some platform-specific definitions, and nothing else explicitly configures it. The values used are either 0.0, 1.0, or 10.0. So, I'm still leaning towards 0 = do not cache, non-zero = cache indefinitely.
From the FreeBSD side of things, here's the status quo (to the best of my understanding): 1. On create (open + O_CREAT), we: a. cache_enter() (add an entry to the generic namecache which sits between generic namei()/lookup and VFS specific VOP_LOOKUP operations) — ignoring any entry_valid value (bogus). b. Arbitrarily nuke all negative cache entries below the parent directory, for unclear reasons (maybe pessimistic, but orthogonal to this issue). c. Totally ignore any returned attribute and attribute timeout (safe, I think, but maybe pessimistic). 2. In lookup, we: a. Skip right past fuse_entry_out or fuse_attr_out and go directly to the attributes (ignoring cache timeout value). b. And of course cache them indefinitely (cache_attrs()) (bogus). 3. For link, we check link count validity with our cached va_nlink attribute, assuming our attribute cache is valid (bogus). a. Then we ignore the entry_out result entirely (but don't cache the newly created link vnode, which is safe, if pessimistic).
Ah, also anything using fuse_internal_newentry{,_core}: mkdir and symlink.
Please try this patch for the attribute caching issue: https://people.freebsd.org/~cem/pr230258.1of2.1.patch It compiles (on CURRENT) but I have not had a chance to actually test it out yet. Notably, it does *not* fix the lookup/entry caching.
(In reply to Jakub Kruszona-Zawadzki from comment #4) Please try https://people.freebsd.org/~cem/pr230258.2of2.1.patch for entry data (lookup) caching.
I've just checked it and it doesn't work. (...) hello1: t1 t3 t5 hello2: t1 t3 t5 t6 (...) I applied both patches. BTW. Timeouts other than 0.0 and "INFINITY" are very useful - especially in case of directories. In files we have one crucial attribute - size - which shouldn't be cached in any circumstances, but in case of directories attributes are less important and accessing files using long paths leads to lot of lookups and caching them (even for a second) can significantly improve efficiency. In MooseFS this is the default strategy - for files timeout is 0.0, but for directories default is 1.0.
(In reply to Jakub Kruszona-Zawadzki from comment #14) Yeah, we do some special size attribute handling now. Anyway, I am not very familiar with our FUSE implementation and must be missing something! Thanks for testing it and reporting back. I will have to test and debug further at some point, but I don't know when that will be. (Any other FreeBSD contributor: consider this PR poachable.)
More info. 1. I've checked 12.0 and result is the same 2. I've found sysctl option "vfs.fuse.refresh_size" which is set by default to 0. Setting it to 1 resolves the problem partially (but probably at the cost of efficiency). Output: hello1: t1 t2 t3 t4 t5 hello2: t1 t2 t3 t4 t5 t6 3. I've made simple patch myself and it seems that it resolves the issue. In file "fuse_vnops.c" I've copied filesize handling from GETATTR to LOOKUP. In function "fuse_vnop_lookup" just after: if (op == FUSE_GETATTR) { cache_attrs(*vpp, (struct fuse_attr_out *)fdi.answ); } else { cache_attrs(*vpp, (struct fuse_entry_out *)fdi.answ); } I've added this code: { struct fuse_vnode_data *fvdat = VTOFUD(*vpp); if (vnode_isreg(*vpp) && (fvdat->flag & FN_SIZECHANGE) == 0) { /* * This is for those cases when the file size changed without us * knowing, and we want to catch up. */ off_t new_filesize; if (op == FUSE_GETATTR) { new_filesize = ((struct fuse_attr_out *)fdi.answ)->attr.size; } else { new_filesize = ((struct fuse_entry_out *)fdi.answ)->attr.size; } if (fvdat->filesize != new_filesize) { fuse_vnode_setsize(*vpp, cred, new_filesize); } } } And now result is correct: hello1: t1 t2 t3 t4 t5 t6 hello2: t1 t2 t3 t4 t5 t6 I do not know how to make nice patch - this is why I'am sending it like that. Later I'll try to create proper patch from this.
Created attachment 196048 [details] adding filesize handling to lookup (fusefs) This is patch I've promised
*** Bug 218101 has been marked as a duplicate of this bug. ***
(In reply to Jakub Kruszona-Zawadzki from comment #17) Thanks! I think you can simplify somewhat using the existing 'fattr' pointer. new_filesize = fattr->size; Other than that, it looks ok to me. I can't say I totally understand the FN_SIZECHANGE / fuse_vnode_setsize mechanism, but this copy seems to match GETATTR. It might also be nice to refactor it out into a subroutine, but it's fairly small anyway.
What about adding support for attr_timeout and entry_timeout options as proposed in #218101 ?
(In reply to Ben RUBSON from comment #20) Please scroll up to comments #12-13.
Sorry Conrad, yes your patch handles these options (at least the entry_timeout if I red correctly). But has Jakub did not succeed testing it, and proposed a lighter version which does not seem to take care of the options value, I wondered :) Anyway, I'll be glad to test something about to be merged (as I also reported this bug in #218101).
Ok. As I see there are two different problems. One is entry/attr cache timeout and the other is file size handling. In Linux file size is strictly connected with attribute cache and therefore proper handling of attribute timeouts is important there. In FreeBSD it looks that it works differently. My small patch only fixed file size handling (which is very important - much more important than handling other attributes). I've checked the code and what I see is that entry cache is now totally disabled (which is good, because it is better not to cache at all than cache without timeout). Attribute cache is there only in theory ("cached_attrs" field in "struct fuse_vnode_data"). In theory, because in the code attributes are set but ... never used. Only field va_nlink is used in "link" operation and only to check if it reaches FUSE_LINK_MAX. This is why Conrad's patch didn't change anything. To add full attr_timeout support we should add "struct timespec attrtimeout" field to "struct fuse_vnode_data", fill it correctly in "cache_attrs" and use it - mainly in "fuse_vnop_getattr" - check "attrtimeout" before dispatching FUSE_GETATTR and fill "vap" field using cached attributes instead. To add full entry_timeout support maybe we can use "cache_enter_time" instead of "cache_enter" and store timeout in entry cache and then use it in lookup. I might try to create a patch at least for attr_timeout. The only problem is that I'm unfamiliar with writing code on the kernel level. In this case I don't know how to obtain "monotonic" clock in the kernel.
Maybe using "ticks" for timeouts should be enough? What do you think?
Thank you for your explanation Jakub. I did not understand that there were 2 issues. I thought that the wrong returned size was due to the fact that attributes / entries were cached, as on Linux when we do not use attr_timeout=0 / entry_timeout=0. Good news then to know that cache is for now disabled on FreeBSD, and that your patch solves the size issue (I'll give it a try). Not sure attr_timeout / entry_timeout support is then really needed, it has now a more low priority, as you stated. Perhaps it could help with performance in some cases (I can personally live without).
Jakub I applied your patch and unfortunately it does not correct the bug I opened https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218101 for : if the underlying file size is changed, it's not reflected at fuse side. This is why I thought caching may be enabled somewhere in the path and pointed to attr_timeout/entry_timeout options. There may then be more than one bug.
(In reply to Ben RUBSON from comment #26) Yes. I've just checked it. My patch works fine with "APPEND" to file, but simple "stat" shows "old" file size. I'll try to find better solution.
Created attachment 196191 [details] changed filesize handling in fuse file system Ok. I've created new patch. 1. I've removed "FN_SIZECHANGED" flag which in my opinion is not necessary. 2. I've removed "fuse_vnode_savesize" function - this caused unnecessary SETATTR to userspace - fuse filesystems have to update filesize correctly themselves - this SETATTR is not needed and even it can lead to small problems (unexpected change of mtime). 3. I've removed "fuse_vnode_refreshsize" function. Because we set correctly filesize on GETATTR and LOOKUP it seems unnecessary (I'm not sure here, but everything seems to work very well without it). 4. I've removed using cached filesize On GETATTR (fuse filesystems have to track filesize correctly themselves, so size returned by GETATTR should be always correct). 5. I've fixed previous patch (update filesize on LOOKUP)
This new patch helps yes ; now, when underlying file size has changed, stat() returns correct size at fuse side. Perfect, thank you Jakub ! However I found another use case where we still face an issue. Let's assume raw file is the underlying file. And fuse file the corresponding fuse file. 1. create raw file with some data 2. open fuse file 3. verify fuse data : correct 4. close fuse file 5. add some data to raw file 6. open fuse file 7. verify fuse data : garbage I've found 2 tricks which may help to understand the bug : - add a 5b step which stat() fuse file. - use fuse direct_io mount option. We worked on this last year with Conrad : https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218636 But does not seem to help here.
(In reply to Ben RUBSON from comment #29) I see that still file size is sometimes not refreshed correctly: root@fbsdcurr:/mnt/llfuse # truncate -s0 hello1 root@fbsdcurr:/mnt/llfuse # cat hello1 root@fbsdcurr:/mnt/llfuse # cat hello2 root@fbsdcurr:/mnt/llfuse # echo "ala" > hello1 root@fbsdcurr:/mnt/llfuse # cat hello2 ala root@fbsdcurr:/mnt/llfuse # echo "ola" >> hello1 root@fbsdcurr:/mnt/llfuse # cat hello2 ala ola root@fbsdcurr:/mnt/llfuse # echo "ela" >> hello1 root@fbsdcurr:/mnt/llfuse # cat hello2 ala ola root@fbsdcurr:/mnt/llfuse # stat hello2 18446744073390849794 3 -rw-rw-rw- 1 root wheel 0 12 "Aug 14 13:17:38 2018" "Aug 14 13:17:37 2018" "Aug 14 13:17:37 2018" "Jan 1 00:59:59 1970" 4096 0 0 hello2 root@fbsdcurr:/mnt/llfuse # cat hello2 ala ola ela root@fbsdcurr:/mnt/llfuse # after 'echo "ela" >> hello1' first read (cat hello2) reads file using old length, then after 'stat' it seems that file size is correctly refreshed and then it reads correct data with correct size. I'll check it. I don't see any "garbage" - only size is wrong.
Yes, sorry about the "garbage" shortcut. I also think file may be red with the wrong size. Glad you managed to reproduce !
(In reply to Ben RUBSON from comment #31) Ok. It seems that lookup cache is on by default and it causes troubles (because timeout is not implemented). Try to turn off lookup cache using command: sysctl -w vfs.fuse.lookup_cache_enable=0 In my opinion, for now it should be turned off by default. It should be turned on only after proper timeouts implementation. I'll add this change to my patch. In the meantime please check if the above command helps.
Created attachment 196192 [details] changed filesize handling in fuse file system Ok. I've created another patch. 1. I've removed "FN_SIZECHANGED" flag which in my opinion is not necessary. 2. I've removed "fuse_vnode_savesize" function - this caused unnecessary SETATTR to userspace - fuse filesystems have to update filesize correctly themselves - this SETATTR is not needed and even it can lead to small problems (unexpected change of mtime). 3. I've removed "fuse_vnode_refreshsize" function. Because we set correctly filesize on GETATTR and LOOKUP it seems unnecessary (I'm not sure here, but everything seems to work very well without it). 4. I've removed using cached filesize on GETATTR (fuse filesystems have to track filesize correctly themselves, so size returned by GETATTR should be always correct). 5. I've fixed previous patch (update filesize on LOOKUP) 6. I've turned off lookup cache by default (it should be turned off as long as timeouts are not taken into account)
I'm going on long vacation now, so I won't be able to contribute for the next three weeks.
Yes, turning "vfs.fuse.lookup_cache_enable" off helps, perfect. There's still one side effect of caching I think though. If you keep the fuse file opened, while the underlying one is modified : - you read the fuse file up to the expected size, perfect ; - but you read garbage (this time :). Mount option -o direct_io helps as a workaround.
Proposed patch looks like it fixes at least one issue, but there is a lot going on in our FUSE impl. that looks kind of dubious :-(. (In reply to Jakub Kruszona-Zawadzki from comment #28) If FUSE filesystems must track filesize themselves, I'm really not sure what the goal of our FN_SIZECHANGE flag is. :-( Pretty much all of this code dates to the initial import in 2012. I guess more history may be discoverable from the old project branch (I have not looked yet, but will sometime): https://svnweb.freebsd.org/base/projects/fuse/sys/fs/fuse Are there any sort of unit tests associated with libfuse or something I could try to run against our implementation to verify large-scale changes? Thanks again Piotr, Jakub, and Ben for your investigation and discussion.
Heh: https://svnweb.freebsd.org/base?view=revision&revision=242727
Conrad, I'm not sure about a test suite, but perhaps Nikolaus Rath (libfuse maintainer) who worked on libfuse3 FreeBSD support a few months ago could have some info ? EncFS also has some integration tests which widely validate cache etc... and which fail for now (but with current patch here + direct_io workaround). If you want to give them a try and need assistance, feel free to contact me !
(In reply to Ben RUBSON from comment #38) Thanks Ben, these are good resources. I think there are some nice cleanups we could do to our fuse but they are not especially targeted, which makes them risky. I am hesitant to do that without some test coverage. As discussed on this bug it seems like our internal generic attribute cache is entirely unused, except for VOP_LINK and VOP_REMOVE/VOP_RMDIR. In VOP_LINK it is used to protect (?)naive userspace filesystems from a new link exceeded the protocol maximum). However I don't know why kernel fuse cares at all if the link limit is exceeded. And in REMOVE/RMDIR it is dead code! Ugh. While I would like to remove it as misleading and unused, it may make sense for the kernel to cache results on some local-only filesystems for improved performance. So this may be worth fixing. I think I understand our size caching protocol now (FN_SIZECHANGE, fvdat->filesize, etc). When data caching is enabled (vfs.fuse.data_cache_enable), we use it to maintain our idea of cached data size when e.g. non-DIRECT writes extend the file. I think the idea here is that we maintain dirty cached data locally that the upper filesystem does not know about. So we override GETATTR because we may have local file extensions that have not yet been pushed to the user filesystem. I agree that SETATTR (fuse_vnode_savesize()) appears to be the wrong way to push that information into the userspace filesystem. We should instead just flush dirty buffers and the write path will extend the userspace file size by itself. Maybe we need an additional flag on top of data_cache_enable for dirty_cache_enable -- it may be safe (or acceptable) to cache clean data, but not dirty data? That would fix the append-to-two-files example without totally disabling data cache. At the very least we should add documentation strings/comments to the various sysctls, and perhaps rename internal variables for clarity. Maybe they should be mountpoint options eventually but for now that is not our biggest concern. I still don't really like the cache timeout mechanism -- it is a poor substitute for real invalidation, which seems to have been added in a later FUSE protocol anyway. But we can definitely agree on fixing the ignorance-of-zero-TTL value, and I am willing to be persuaded that we should match Linux on observing the actual timeout.
Hm, seems like at least a subset of the sysctl options are already available per-mountpoint, but their use is not documented.
Hm, and actually I don't believe we cache dirty data, so now I really do not understand the function of savesize.
Does the patch from comment 33, along with direct_io resolve the issue? If so, what needs to be done to have the patch pushed in FreeBSD 12? Is there anything a non-programmer can do to help with this? Funding perhaps?
(In reply to jSML4ThWwBID69YC from comment #42) It's not directly commitable as-is, sorry. Obviously I haven't had time to investigate this recently -- sorry about that. 12.0 timeframe is very very close, I would not expect anything to land by then.
Hello Conrad, I understand, there is never enough time. Is this something that can be resolved in a patch release (11.2-p NUM), or would it have to wait for FreeBSD 13? As it stands, this bug prevents using MooseFS on FreeBSD in many work loads. Either way, what can be done to help? Thanks
EDIT: I mean 12-Patch number, or 12.1, etc.
Hello all involved in this thread, Thank you for all your contributions done so far, we all appreciate them. Do you think it would be possible to incorporate the patch from Jakub's comment #33 or modify it in order to make it compliant with everything (Conrad mentioned it cannot be committed "as is")? The key thing is the patch actually solves the issue (maybe not in a very beautiful way, but it does indeed) so it would make FreeBSD better. Any help here is appreciated. Maybe you know somebody from kernel/fs Team who could contribute some of their precious time in order to make it happen? The issue does not affect MooseFS only, but also any other file system which uses FUSE library. We love FreeBSD and support it and its Users. We see that they have major data-consistency issues related to this issue and we would like to help them. Many thanks for all your input! Best, Piotr
I had some work in progress patches I didn't post to phabricator for some reason. I don't recall if they just didn't fix the symptom mentioned here or if I didn't feel satisfied with my testing of them. I'll go ahead and put those up now, anyway.
https://reviews.freebsd.org/D19088 , https://reviews.freebsd.org/D19089
(In reply to Conrad Meyer from comment #48) (If anyone is willing to review or test this, please feel free to add yourself to the phabricator review and leave a note. Or comment on this bug.)
(In reply to Conrad Meyer from comment #47) Ah, they were already posted in comments #12 and #13, and don't fix the issue (noted in #14) because we still cache file *size* incorrectly.
(In reply to Piotr Robert Konopelko (MooseFS) from comment #46) Yes patch from Jakub in comment #33 works, however there's still one issue according to comment #35.
Is the patch in comment 33 the right one to test, or the patches in comment 48?
(In reply to Ben RUBSON from comment #51) Thanks Ben. Ok, I think I have refreshed my memory somewhat. This bug has focused on file size, but I think there's really two lower-level fundamental issues for distributed filesystems or filesystems like hello_ll that present the same file as two vnodes. First: a. FUSE expects write-through caching, i.e., no cached dirty data. That makes a lot of sense, IMO — you still get the benefits of a clean read cache but don't have the consistency problems of eventual flush. Wrong file size is just a side effect of the dirty data cache. Removing it (or disabling it by default) solves one problem, but not other data consistency issues. Second: b. FreeBSD tries to speed up local FUSE filesystems by putting a buf data cache in front of FUSE, to avoid some upcalls into userspace for every IO operation. The cache's lifetime isn't tied to open fds. c. FUSE filesystems don't have a way to invalidate FreeBSD's cache. What is protecting the integrity of the data? FUSE read operation doesn't set some cache timeout, right? It seems the answer is that FUSE kernel implementations are allowed to cache for performance reasons (and Linux does), but the FUSE *open* operation has a keep_cache flag that tells consumers to throw away the cached data. Actually, it seems we (attempt to) honor this flag. So why is the data still cached? Maybe we aren't sending new opens all the way into FUSE? Maybe the flag has the wrong bit representation? I found this thread, which discusses similar issues for a time-traveling filesystem, very helpful: https://sourceforge.net/p/fuse/mailman/message/8902254/ Plan of action: 1. Disable dangerous dirty cache by default (but leave it around, in case it is useful for someone) 2. Investigate why data (incl. size) is cached in spite of keep_cache=0 in hello_ll.
(In reply to jSML4ThWwBID69YC from comment #52) I think both have known deficiencies; no need to test either at this time.
(In reply to Piotr Robert Konopelko (MooseFS) from comment #0) With vfs.fuse.refresh_size=1 (and with my two existing patches), it seems like this produces the correct result. I don't think it indicates we have achieved perfect cache coherency, but it's a datapoint.
(In reply to Conrad Meyer from comment #55) It seems to work here on a single server. At least I've not been able to produce any errors while running those patches and having vfs.fuse.refresh_size=1 set. Testing was done on FreeBSD 12p3. I should be able to do multi client testing in a couple of weeks.
Hi all, many thanks in such a positive and dynamic feedback! @Jakub – would you have a moment to take a look at recent progress on this issue and give your valuable input / comment on this from MooseFS perspective? Do you think we would be able to test it sometime in coming weeks (I mean Conrad's patches from comment #48 with 'vfs.fuse.refresh_size=1')? Thanks, Piotr
(In reply to Piotr Robert Konopelko (MooseFS) from comment #57) No need for that; #48 is incomplete. I have some other patches in progress.
(In reply to Conrad Meyer from comment #58) Sure, thanks for the update! All the best, Piotr
Ok, there were a few confusing things to tease apart. 1. The buf cache in front of FUSE can easily be made a write-around/write-through (clean-only cache, which versions of FUSE prior to 7.23 expect). I've got a patch to add this option and make it the default. Note that this kind of already happens in the dueling append case, as bash opens '>>' redirects O_WRONLY|O_APPEND. 2. I agree fuse_vnode_savesize() is basically harmful, and fuse_vnode_refreshsize() basically unnecessary. Both should be removed, but are not the cause of this bug. I don't have these patches prepared yet. 3. Setsize and cached filesize is needed to maintain even a clean-only cache. FN_SIZECHANGE is only used for dirty data caching and specifically handling VOP_GETATTR with dirty append data. The scope it is set can be reduced somewhat and I have a patch to do this. 4. Dueling appends were observing the correct new size in LOOKUP (attrs returned with successful LOOKUP) but ignored that stat information (other than to cache it in the unused fvdat->cached_attrs, if the attr_valid time was non-zero). In particular, for the dueling append test code, attr_valid was zero and the stats were entirely discarded. By the time the appends got to OPEN, they observed !KEEP_CACHE and (appropriately) destroyed the cache contents, but (!!!) this does not reset or refresh the cache size from FUSE. So the fvdat->filesize remained stale and the next IO_APPEND write simply overwrote the prior append. (FreeBSD performs the FUSE_LOOKUP, FUSE_OPEN, and FUSE_WRITE in separate Vnode operations ("VOPs") from the core kernel, and it wasn't entirely clear to me what should happen where.) I'll have a small patch for (4) to try shortly. It works in my testing. I can't promise it applies cleanly without the rest of the series, but I'll provide all of the patches.
Here is the series: 1. https://reviews.freebsd.org/D19088 (add binary attr_valid cache, no timeouts yet) 2. https://reviews.freebsd.org/D19089 (add binary dentry valid cache, no timeouts yet) 3. https://reviews.freebsd.org/D19161 (reduce scope of FN_SIZECHANGED) 4. https://reviews.freebsd.org/D19162 (add clean-only cache mode and default to it) 5. https://reviews.freebsd.org/D19163 (refresh fvdat->filesize in LOOKUP, very nearly the same thing as Jakub's patch from comment #16 or 17) With #5 in particular, the dueling appends on hello1/hello2 "do the right thing." I think some of the other issues brought up in this bug with various patch incarnations are resolved by the series. There may be remaining issues. When testing, please remember to remove/disable any vfs.fuse sysctl.conf knobs you might have enabled to work issues in the past, e.g., vfs.fuse.data_cache_enable, vfs.fuse.data_cache_invalidate, vfs.fuse.refresh_size, and vfs.fuse.sync_resize should be left default. Thanks!
(Actual time-based cache timeout is a known TODO. Removing some of these hacky workarounds for WB caching and some stylistic cleanups are other known TODOs. What else?)
A commit references this bug: Author: cem Date: Fri Feb 15 22:49:16 UTC 2019 New revision: 344183 URL: https://svnweb.freebsd.org/changeset/base/344183 Log: FUSE: Respect userspace FS "do-not-cache" of file attributes The FUSE protocol demands that kernel implementations cache user filesystem file attributes (vattr data) for a maximum period of time in the range of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10 seconds; or "a long time" to represent indefinite caching. Historically, FreeBSD FUSE has ignored this client directive entirely. This works fine for local-only filesystems, but causes consistency issues with multi-writer network filesystems. For now, respect 0 second cache TTLs and do not cache such metadata. Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds are still cached indefinitely, because it is unclear how a userspace filesystem could do anything sensible with those semantics even if implemented. In the future, as an optimization, we should implement notify_inval_entry, etc, which provide userspace filesystems a way of evicting the kernel cache. One potentially bogus access to invalid cached attribute data was left in fuse_io_strategy. It is restricted behind the undocumented and non-default "vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are deadcode and can be eliminated? Some minor APIs changed to facilitate this: 1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data"). 2. cache_attrs() respects the provided TTL and only caches in the FUSE inode if TTL > 0. It also grows an "out" argument, which, if non-NULL, stores the translated fuse_attr (even if not suitable for caching). 3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help avoid programming mistakes. 4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE link op was weakened (only performed when we have a valid attr cache). The check is racy in a multi-writer network filesystem anyway -- classic TOCTOU. We have to trust any userspace filesystem that rejects local caching to account for it correctly. PR: 230258 (inspired by; does not fix) Changes: head/sys/fs/fuse/fuse_internal.c head/sys/fs/fuse/fuse_internal.h head/sys/fs/fuse/fuse_io.c head/sys/fs/fuse/fuse_node.c head/sys/fs/fuse/fuse_node.h head/sys/fs/fuse/fuse_vnops.c
A commit references this bug: Author: cem Date: Fri Feb 15 22:50:32 UTC 2019 New revision: 344184 URL: https://svnweb.freebsd.org/changeset/base/344184 Log: FUSE: Respect userspace FS "do-not-cache" of path components The FUSE protocol demands that kernel implementations cache user filesystem path components (lookup/cnp data) for a maximum period of time in the range of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10 seconds; or "a long time" to represent indefinite caching. Historically, FreeBSD FUSE has ignored this client directive entirely. This works fine for local-only filesystems, but causes consistency issues with multi-writer network filesystems. For now, respect 0 second cache TTLs and do not cache such metadata. Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds are still cached indefinitely, because it is unclear how a userspace filesystem could do anything sensible with those semantics even if implemented. Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup if the user filesystem did not set a zero second TTL. PR: 230258 (inspired by; does not fix) Changes: head/sys/fs/fuse/fuse_internal.c head/sys/fs/fuse/fuse_node.c head/sys/fs/fuse/fuse_node.h head/sys/fs/fuse/fuse_vfsops.c head/sys/fs/fuse/fuse_vnops.c
A commit references this bug: Author: cem Date: Fri Feb 15 22:51:10 UTC 2019 New revision: 344185 URL: https://svnweb.freebsd.org/changeset/base/344185 Log: FUSE: Only "dirty" cached file size when data is dirty Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update the buf cache bounds as a result of either a read from the underlying FUSE filesystem, or as part of a write-through type operation (like truncate => VOP_SETATTR). In these cases, do not set the FN_SIZECHANGE flag, which indicates that an inode's data is dirty (in particular, that the local buf cache and fvdat->filesize have dirty extended data). PR: 230258 (related) Changes: head/sys/fs/fuse/fuse_io.c head/sys/fs/fuse/fuse_node.c head/sys/fs/fuse/fuse_vnops.c
A commit references this bug: Author: cem Date: Fri Feb 15 22:52:50 UTC 2019 New revision: 344186 URL: https://svnweb.freebsd.org/changeset/base/344186 Log: FUSE: The FUSE design expects writethrough caching At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol specifies only clean data to be cached. Prior to this change, we implement and default to writeback caching. This is ok enough for local only filesystems without hardlinks, but violates the general design contract with FUSE and breaks distributed filesystems or concurrent access to hardlinks of the same inode. In this change, add cache mode as an extension of cache enable/disable. The new modes are UC (was: cache disabled), WT (default), and WB (was: cache enabled). For now, WT caching is implemented as write-around, which meets the goal of only caching clean data. WT can be better than WA for workloads that frequently read data that was recently written, but WA is trivial to implement. Note that this has no effect on O_WRONLY-opened files, which were already coerced to write-around. Refs: * https://sourceforge.net/p/fuse/mailman/message/8902254/ * https://github.com/vgough/encfs/issues/315 PR: 230258 (inspired by) Changes: head/sys/fs/fuse/fuse_io.c head/sys/fs/fuse/fuse_ipc.h head/sys/fs/fuse/fuse_node.c
A commit references this bug: Author: cem Date: Fri Feb 15 22:55:13 UTC 2019 New revision: 344187 URL: https://svnweb.freebsd.org/changeset/base/344187 Log: FUSE: Refresh cached file size when it changes (lookup) The cached fvdat->filesize is indepedent of the (mostly unused) cached_attrs, and we failed to update it when a cached (but perhaps inactive) vnode was found during VOP_LOOKUP to have a different size than cached. As noted in the code comment, this can occur in distributed filesystems or with other kinds of irregular file behavior (anything is possible in FUSE). We do something similar in fuse_vnop_getattr already. PR: 230258 (as reported in description; other issues explored in comments are not all resolved) Reported by: MooseFS FreeBSD Team <freebsd AT moosefs.com> Submitted by: Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version) Changes: head/sys/fs/fuse/fuse_vnops.c
I'm going to mark this bug as closed, and then clone off related bugs for the additional known issues identified in comments on this bug. We want to continue to track issues, but I think this PR is "long enough" and we can track other outstanding issues more clearly in new bugs. I hope that's ok.
(In reply to Conrad Meyer from comment #68) Thanks, Conrad! We unfortunately couldn't test it in past few days, but I hope it will be possible soon. Also adding corresponding MooseFS GitHub issue for reference: https://github.com/moosefs/moosefs/issues/89 We appreciate your work and time spent on this, thanks again! Best, Piotr
(In reply to Piotr Robert Konopelko (MooseFS) from comment #69) No worries. When you get a chance, please try it and let us know your results. Thanks!
(In reply to Ben RUBSON from comment #35) So, I tested your whole 5 patches Conrad, and good news is that they do work :) Many thanks ! I'm however able to reproduce the bug from comment 35, until I use FUSE direct_io mount option or sysctl vfs.fuse.data_cache_mode = 0. FUSE options -oattr_timeout=0 -oentry_timeout=0 have no effect, they must not be mapped to vfs.fuse.data_cache_mode yet.
(In reply to Ben RUBSON from comment #71) > (In reply to Ben RUBSON from comment #35) > > So, I tested your whole 5 patches Conrad, and good news is that they do work > :) Many thanks ! Great! > I'm however able to reproduce the bug from comment 35, until I use FUSE > direct_io mount option or sysctl vfs.fuse.data_cache_mode = 0. I think this is the known issue tracked in bug 235774. > FUSE options -oattr_timeout=0 -oentry_timeout=0 have no effect, they must > not be mapped to vfs.fuse.data_cache_mode yet. How do you observe -oattr_timeout=0 -oentry_timeout=0 having no effect? Or just that they have no effect on the comment #35 issue? Data cache and attr/entry cache are unrelated, I think. If the options are somehow being lost before they make it to the kernel, let's track that in a new PR. Thanks!
I agree Conrad, issue #235774 seems to be the one I face. Let's continue there. Thx !
A commit references this bug: Author: asomers Date: Fri Sep 6 17:56:26 UTC 2019 New revision: 351943 URL: https://svnweb.freebsd.org/changeset/base/351943 Log: MFC r344183-r344187, r344333-r344334, r344407, r344857, r344865 (by cem) r344183: FUSE: Respect userspace FS "do-not-cache" of file attributes The FUSE protocol demands that kernel implementations cache user filesystem file attributes (vattr data) for a maximum period of time in the range of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10 seconds; or "a long time" to represent indefinite caching. Historically, FreeBSD FUSE has ignored this client directive entirely. This works fine for local-only filesystems, but causes consistency issues with multi-writer network filesystems. For now, respect 0 second cache TTLs and do not cache such metadata. Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds are still cached indefinitely, because it is unclear how a userspace filesystem could do anything sensible with those semantics even if implemented. In the future, as an optimization, we should implement notify_inval_entry, etc, which provide userspace filesystems a way of evicting the kernel cache. One potentially bogus access to invalid cached attribute data was left in fuse_io_strategy. It is restricted behind the undocumented and non-default "vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are deadcode and can be eliminated? Some minor APIs changed to facilitate this: 1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data"). 2. cache_attrs() respects the provided TTL and only caches in the FUSE inode if TTL > 0. It also grows an "out" argument, which, if non-NULL, stores the translated fuse_attr (even if not suitable for caching). 3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help avoid programming mistakes. 4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE link op was weakened (only performed when we have a valid attr cache). The check is racy in a multi-writer network filesystem anyway -- classic TOCTOU. We have to trust any userspace filesystem that rejects local caching to account for it correctly. PR: 230258 (inspired by; does not fix) r344184: FUSE: Respect userspace FS "do-not-cache" of path components The FUSE protocol demands that kernel implementations cache user filesystem path components (lookup/cnp data) for a maximum period of time in the range of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10 seconds; or "a long time" to represent indefinite caching. Historically, FreeBSD FUSE has ignored this client directive entirely. This works fine for local-only filesystems, but causes consistency issues with multi-writer network filesystems. For now, respect 0 second cache TTLs and do not cache such metadata. Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds are still cached indefinitely, because it is unclear how a userspace filesystem could do anything sensible with those semantics even if implemented. Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup if the user filesystem did not set a zero second TTL. PR: 230258 (inspired by; does not fix) r344185: FUSE: Only "dirty" cached file size when data is dirty Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update the buf cache bounds as a result of either a read from the underlying FUSE filesystem, or as part of a write-through type operation (like truncate => VOP_SETATTR). In these cases, do not set the FN_SIZECHANGE flag, which indicates that an inode's data is dirty (in particular, that the local buf cache and fvdat->filesize have dirty extended data). PR: 230258 (related) r344186: FUSE: The FUSE design expects writethrough caching At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol specifies only clean data to be cached. Prior to this change, we implement and default to writeback caching. This is ok enough for local only filesystems without hardlinks, but violates the general design contract with FUSE and breaks distributed filesystems or concurrent access to hardlinks of the same inode. In this change, add cache mode as an extension of cache enable/disable. The new modes are UC (was: cache disabled), WT (default), and WB (was: cache enabled). For now, WT caching is implemented as write-around, which meets the goal of only caching clean data. WT can be better than WA for workloads that frequently read data that was recently written, but WA is trivial to implement. Note that this has no effect on O_WRONLY-opened files, which were already coerced to write-around. Refs: * https://sourceforge.net/p/fuse/mailman/message/8902254/ * https://github.com/vgough/encfs/issues/315 PR: 230258 (inspired by) r344187: FUSE: Refresh cached file size when it changes (lookup) The cached fvdat->filesize is indepedent of the (mostly unused) cached_attrs, and we failed to update it when a cached (but perhaps inactive) vnode was found during VOP_LOOKUP to have a different size than cached. As noted in the code comment, this can occur in distributed filesystems or with other kinds of irregular file behavior (anything is possible in FUSE). We do something similar in fuse_vnop_getattr already. PR: 230258 (as reported in description; other issues explored in comments are not all resolved) Reported by: MooseFS FreeBSD Team <freebsd AT moosefs.com> Submitted by: Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version) r344333: fuse: add descriptions for remaining sysctls (Except reclaim revoked; I don't know what that goal of that one is.) r344334: Fuse: whitespace and style(9) cleanup Take a pass through fixing some of the most egregious whitespace issues in fs/fuse. Also fix some style(9) warts while here. Not 100% cleaned up, but somewhat less painful to look at and edit. No functional change. r344407: fuse: Fix a regression introduced in r337165 On systems with non-default DFLTPHYS and/or MAXBSIZE, FUSE would attempt to use a buf cache block size in excess of permitted size. This did not affect most configurations, since DFLTPHYS and MAXBSIZE both default to 64kB. The issue was discovered and reported using a custom kernel with a DFLTPHYS of 512kB. PR: 230260 (comment #9) Reported by: ken@ r344857: FUSE: Prevent trivial panic When open(2) was invoked against a FUSE filesystem with an unexpected flags value (no O_RDONLY / O_RDWR / O_WRONLY), an assertion fired, causing panic. For now, prevent the panic by rejecting such VOP_OPENs with EINVAL. This is not considered the correct long term fix, but does prevent an unprivileged denial-of-service. PR: 236329 Reported by: asomers Reviewed by: asomers Sponsored by: Dell EMC Isilon r344865: fuse: switch from DFLTPHYS/MAXBSIZE to maxcachebuf On GENERIC kernels with empty loader.conf, there is no functional change. DFLTPHYS and MAXBSIZE are both 64kB at the moment. This change allows larger bufcache block sizes to be used when either MAXBSIZE (custom kernel) or the loader.conf tunable vfs.maxbcachebuf (GENERIC) is adjusted higher than the default. Suggested by: ken@ Changes: _U stable/12/ stable/12/sys/fs/fuse/fuse.h stable/12/sys/fs/fuse/fuse_device.c stable/12/sys/fs/fuse/fuse_file.c stable/12/sys/fs/fuse/fuse_file.h stable/12/sys/fs/fuse/fuse_internal.c stable/12/sys/fs/fuse/fuse_internal.h stable/12/sys/fs/fuse/fuse_io.c stable/12/sys/fs/fuse/fuse_ipc.c stable/12/sys/fs/fuse/fuse_ipc.h stable/12/sys/fs/fuse/fuse_node.c stable/12/sys/fs/fuse/fuse_node.h stable/12/sys/fs/fuse/fuse_vfsops.c stable/12/sys/fs/fuse/fuse_vnops.c
A commit references this bug: Author: asomers Date: Fri Sep 6 17:56:27 UTC 2019 New revision: 351943 URL: https://svnweb.freebsd.org/changeset/base/351943 Log: MFC r344183-r344187, r344333-r344334, r344407, r344857, r344865 (by cem) r344183: FUSE: Respect userspace FS "do-not-cache" of file attributes The FUSE protocol demands that kernel implementations cache user filesystem file attributes (vattr data) for a maximum period of time in the range of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10 seconds; or "a long time" to represent indefinite caching. Historically, FreeBSD FUSE has ignored this client directive entirely. This works fine for local-only filesystems, but causes consistency issues with multi-writer network filesystems. For now, respect 0 second cache TTLs and do not cache such metadata. Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds are still cached indefinitely, because it is unclear how a userspace filesystem could do anything sensible with those semantics even if implemented. In the future, as an optimization, we should implement notify_inval_entry, etc, which provide userspace filesystems a way of evicting the kernel cache. One potentially bogus access to invalid cached attribute data was left in fuse_io_strategy. It is restricted behind the undocumented and non-default "vfs.fuse.fix_broken_io" sysctl or "brokenio" mount option; maybe these are deadcode and can be eliminated? Some minor APIs changed to facilitate this: 1. Attribute cache validity is tracked in FUSE inodes ("fuse_vnode_data"). 2. cache_attrs() respects the provided TTL and only caches in the FUSE inode if TTL > 0. It also grows an "out" argument, which, if non-NULL, stores the translated fuse_attr (even if not suitable for caching). 3. FUSE VTOVA(vp) returns NULL if the vnode's cache is invalid, to help avoid programming mistakes. 4. A VOP_LINK check for potential nlink overflow prior to invoking the FUSE link op was weakened (only performed when we have a valid attr cache). The check is racy in a multi-writer network filesystem anyway -- classic TOCTOU. We have to trust any userspace filesystem that rejects local caching to account for it correctly. PR: 230258 (inspired by; does not fix) r344184: FUSE: Respect userspace FS "do-not-cache" of path components The FUSE protocol demands that kernel implementations cache user filesystem path components (lookup/cnp data) for a maximum period of time in the range of [0, ULONG_MAX] seconds. In practice, typical requests are for 0, 1, or 10 seconds; or "a long time" to represent indefinite caching. Historically, FreeBSD FUSE has ignored this client directive entirely. This works fine for local-only filesystems, but causes consistency issues with multi-writer network filesystems. For now, respect 0 second cache TTLs and do not cache such metadata. Non-zero metadata caching TTLs in the range [0.000000001, ULONG_MAX] seconds are still cached indefinitely, because it is unclear how a userspace filesystem could do anything sensible with those semantics even if implemented. Pass fuse_entry_out to fuse_vnode_get when available and only cache lookup if the user filesystem did not set a zero second TTL. PR: 230258 (inspired by; does not fix) r344185: FUSE: Only "dirty" cached file size when data is dirty Most users of fuse_vnode_setsize() set the cached fvdat->filesize and update the buf cache bounds as a result of either a read from the underlying FUSE filesystem, or as part of a write-through type operation (like truncate => VOP_SETATTR). In these cases, do not set the FN_SIZECHANGE flag, which indicates that an inode's data is dirty (in particular, that the local buf cache and fvdat->filesize have dirty extended data). PR: 230258 (related) r344186: FUSE: The FUSE design expects writethrough caching At least prior to 7.23 (which adds FUSE_WRITEBACK_CACHE), the FUSE protocol specifies only clean data to be cached. Prior to this change, we implement and default to writeback caching. This is ok enough for local only filesystems without hardlinks, but violates the general design contract with FUSE and breaks distributed filesystems or concurrent access to hardlinks of the same inode. In this change, add cache mode as an extension of cache enable/disable. The new modes are UC (was: cache disabled), WT (default), and WB (was: cache enabled). For now, WT caching is implemented as write-around, which meets the goal of only caching clean data. WT can be better than WA for workloads that frequently read data that was recently written, but WA is trivial to implement. Note that this has no effect on O_WRONLY-opened files, which were already coerced to write-around. Refs: * https://sourceforge.net/p/fuse/mailman/message/8902254/ * https://github.com/vgough/encfs/issues/315 PR: 230258 (inspired by) r344187: FUSE: Refresh cached file size when it changes (lookup) The cached fvdat->filesize is indepedent of the (mostly unused) cached_attrs, and we failed to update it when a cached (but perhaps inactive) vnode was found during VOP_LOOKUP to have a different size than cached. As noted in the code comment, this can occur in distributed filesystems or with other kinds of irregular file behavior (anything is possible in FUSE). We do something similar in fuse_vnop_getattr already. PR: 230258 (as reported in description; other issues explored in comments are not all resolved) Reported by: MooseFS FreeBSD Team <freebsd AT moosefs.com> Submitted by: Jakub Kruszona-Zawadzki <acid AT moosefs.com> (earlier version) r344333: fuse: add descriptions for remaining sysctls (Except reclaim revoked; I don't know what that goal of that one is.) r344334: Fuse: whitespace and style(9) cleanup Take a pass through fixing some of the most egregious whitespace issues in fs/fuse. Also fix some style(9) warts while here. Not 100% cleaned up, but somewhat less painful to look at and edit. No functional change. r344407: fuse: Fix a regression introduced in r337165 On systems with non-default DFLTPHYS and/or MAXBSIZE, FUSE would attempt to use a buf cache block size in excess of permitted size. This did not affect most configurations, since DFLTPHYS and MAXBSIZE both default to 64kB. The issue was discovered and reported using a custom kernel with a DFLTPHYS of 512kB. PR: 230260 (comment #9) Reported by: ken@ r344857: FUSE: Prevent trivial panic When open(2) was invoked against a FUSE filesystem with an unexpected flags value (no O_RDONLY / O_RDWR / O_WRONLY), an assertion fired, causing panic. For now, prevent the panic by rejecting such VOP_OPENs with EINVAL. This is not considered the correct long term fix, but does prevent an unprivileged denial-of-service. PR: 236329 Reported by: asomers Reviewed by: asomers Sponsored by: Dell EMC Isilon r344865: fuse: switch from DFLTPHYS/MAXBSIZE to maxcachebuf On GENERIC kernels with empty loader.conf, there is no functional change. DFLTPHYS and MAXBSIZE are both 64kB at the moment. This change allows larger bufcache block sizes to be used when either MAXBSIZE (custom kernel) or the loader.conf tunable vfs.maxbcachebuf (GENERIC) is adjusted higher than the default. Suggested by: ken@ Changes: _U stable/12/ stable/12/sys/fs/fuse/fuse.h stable/12/sys/fs/fuse/fuse_device.c stable/12/sys/fs/fuse/fuse_file.c stable/12/sys/fs/fuse/fuse_file.h stable/12/sys/fs/fuse/fuse_internal.c stable/12/sys/fs/fuse/fuse_internal.h stable/12/sys/fs/fuse/fuse_io.c stable/12/sys/fs/fuse/fuse_ipc.c stable/12/sys/fs/fuse/fuse_ipc.h stable/12/sys/fs/fuse/fuse_node.c stable/12/sys/fs/fuse/fuse_node.h stable/12/sys/fs/fuse/fuse_vfsops.c stable/12/sys/fs/fuse/fuse_vnops.c
^Triage: Track merges (apologies for the noise, question from user on IRC)