Summary:     nullfs: Doesn't send release for a file read with 'cat' command (while mounted over a FUSE fs)
Product:     Base System
Component:   kern
Version:     12.1-RELEASE
Hardware:    amd64
OS:          Any
Status:      Closed FIXED
Severity:    Affects Some People
Priority:    ---
Keywords:    needs-patch, needs-qa
Reporter:    MooseFS FreeBSD Team <freebsd>
Assignee:    Alan Somers <asomers>
CC:          asomers, chris, jSML4ThWwBID69YC, kib, mjg
Flags:       koobs: maintainer-feedback?(mjg), maintainer-feedback?(asomers), maintainer-feedback?(kib), mfc-stable12?
URL:         https://github.com/moosefs/moosefs/issues/347#issuecomment-604460874
Attachments: 213600 (How to install a minimal test instance of MooseFS on FreeBSD), 214296 (Disable nullfs cacheing on top of fusefs)
Description
MooseFS FreeBSD Team
2020-04-17 10:12:50 UTC
It's hard to say where the bug really lies. But I bet dtrace would reveal it. Could you please post full steps to reproduce, including how to install MooseFS and the exact mount commands used? I might be able to look at this sometime next week.

Caveat up front: nullfs is kinda crap. A userspace close (cat exiting) of an ordinary file translates to vn_close1() -> VOP_CLOSE(9), then vput(9). This is all generic and not specific to nullfs. vput() calls vinactive(), then vdropl(). Nullfs does not register a CLOSE vnop, but it does have a vop_inactive:

/*
 * Do not allow the VOP_INACTIVE to be passed to the lower layer,
 * since the reference count on the lower vnode is not related to
 * ours.
 */
static int
null_want_recycle(struct vnode *vp)
{
        struct vnode *lvp;
        struct null_node *xp;
        struct mount *mp;
        struct null_mount *xmp;

        xp = VTONULL(vp);
        lvp = NULLVPTOLOWERVP(vp);
        mp = vp->v_mount;
        xmp = MOUNTTONULLMOUNT(mp);
        if ((xmp->nullm_flags & NULLM_CACHE) == 0 ||
            (xp->null_flags & NULLV_DROP) != 0 ||
            (lvp->v_vflag & VV_NOSYNC) != 0) {
                /*
                 * If this is the last reference and caching of the
                 * nullfs vnodes is not enabled, or the lower vnode is
                 * deleted, then free up the vnode so as not to tie up
                 * the lower vnodes.
                 */
                return (1);
        }
        return (0);
}

static int
null_inactive(struct vop_inactive_args *ap)
{
        struct vnode *vp;

        vp = ap->a_vp;
        if (null_want_recycle(vp)) {
                vp->v_object = NULL;
                vrecycle(vp);
        }
        return (0);
}

So I guess a quick question: is nullfs caching disabled? It is enabled by default and disabled with the mount option 'nocache'. (This option is not documented in any of nullfs(5), mount_nullfs(8), or mount(8), which is... bad. But it exists in the kernel.)

Hello, I've tested both ways. Setting -o nocache seems to allow cmd:release to happen. Here are the tests, run twice each way. The '/storage/chunk/' directory is a MooseFS mount.
Test 1: nocache

mount -t nullfs -o nocache /storage/chunk /root/test
cat /root/test/test/test.txt

Log output:

04.17 17:40:57.903154: uid:0 gid:0 pid:18295 cmd:access (811230,0x1): OK
04.17 17:40:57.903371: uid:0 gid:0 pid:18295 cmd:lookup (811230,test.txt): OK (0.0,23078,1.0,[-r--r--r--:0100444,1,1001,1001,1586303169,1586303169,1586303175,0])
04.17 17:40:57.903434: uid:0 gid:0 pid:18295 cmd:access (23078,0x4): OK
04.17 17:40:57.903561: uid:0 gid:0 pid:18295 cmd:open (23078) (using cached data from lookup): OK (direct_io:0,keep_cache:0) [handle:01000001]
04.17 17:40:57.903843: uid:0 gid:0 pid:18295 cmd:flush (23078) [handle:01000001,uselocks:0,lock_owner:0000000000004777]: OK
04.17 17:40:57.903885: uid:0 gid:0 pid:18295 cmd:release (23078) [handle:01000001,uselocks:0,lock_owner:0000000000004777]: OK

Test 2: default

umount /root/test
mount -t nullfs /storage/chunk /root/test
cat /root/test/test/test.txt

Log output:

04.17 17:42:21.779677: uid:0 gid:0 pid:18302 cmd:access (811230,0x1): OK
04.17 17:42:21.779848: uid:0 gid:0 pid:18302 cmd:lookup (811230,test.txt): OK (0.0,23078,1.0,[-r--r--r--:0100444,1,1001,1001,1586303169,1586303169,1586303175,0])
04.17 17:42:21.779893: uid:0 gid:0 pid:18302 cmd:access (23078,0x4): OK
04.17 17:42:21.779920: uid:0 gid:0 pid:18302 cmd:open (23078) (using cached data from lookup): OK (direct_io:0,keep_cache:0) [handle:02000001]
04.17 17:42:21.780024: uid:0 gid:0 pid:18302 cmd:flush (23078) [handle:02000001,uselocks:0,lock_owner:000000000000477E]: OK

Test 3: nocache

umount /root/test
mount -t nullfs -o nocache /storage/chunk /root/test
cat /root/test/test/test.txt

Log output:

04.17 17:45:25.955375: uid:0 gid:0 pid:18337 cmd:access (811230,0x1): OK
04.17 17:45:25.955582: uid:0 gid:0 pid:18337 cmd:lookup (811230,test.txt): OK (0.0,23078,1.0,[-r--r--r--:0100444,1,1001,1001,1586303169,1586303169,1586303175,0])
04.17 17:45:25.955626: uid:0 gid:0 pid:18337 cmd:access (23078,0x4): OK
04.17 17:45:25.955655: uid:0 gid:0 pid:18337 cmd:open (23078) (using cached data from lookup): OK (direct_io:0,keep_cache:0) [handle:03000001]
04.17 17:45:25.955762: uid:0 gid:0 pid:18337 cmd:flush (23078) [handle:03000001,uselocks:0,lock_owner:00000000000047A1]: OK
04.17 17:45:25.955800: uid:0 gid:0 pid:18337 cmd:release (23078) [handle:03000001,uselocks:0,lock_owner:00000000000047A1]: OK

Test 4: default

umount /root/test
mount -t nullfs /storage/chunk /root/test
cat /root/test/test/test.txt

Log output:

04.17 17:48:40.735042: uid:0 gid:0 pid:18343 cmd:access (811230,0x1): OK
04.17 17:48:40.735266: uid:0 gid:0 pid:18343 cmd:lookup (811230,test.txt): OK (0.0,23078,1.0,[-r--r--r--:0100444,1,1001,1001,1586303169,1586303169,1586303175,0])
04.17 17:48:40.735312: uid:0 gid:0 pid:18343 cmd:access (23078,0x4): OK
04.17 17:48:40.735343: uid:0 gid:0 pid:18343 cmd:open (23078) (using cached data from lookup): OK (direct_io:0,keep_cache:0) [handle:04000001]
04.17 17:48:40.735473: uid:0 gid:0 pid:18343 cmd:flush (23078) [handle:04000001,uselocks:0,lock_owner:00000000000047A7]: OK

From what I can tell, cmd:release does not happen unless the nullfs mount includes -o nocache.
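That observation matches how nullfs decides whether to cache at mount time. The following is an abridged, lightly paraphrased sketch of that decision as it appears in the 12.x-era nullfs_mount() (sys/fs/nullfs/null_vfsops.c); it is not a verbatim excerpt:

        /*
         * Sketch: caching defaults to on, and is turned off either by the
         * (undocumented) "nocache" mount option or when the lower
         * filesystem tags itself with MNTK_NULL_NOCACHE.
         */
        xmp->nullm_flags |= NULLM_CACHE;
        if (vfs_getopt(mp->mnt_optnew, "nocache", NULL, NULL) == 0 ||
            (xmp->nullm_vfs->mnt_kern_flag & MNTK_NULL_NOCACHE) != 0)
                xmp->nullm_flags &= ~NULLM_CACHE;

With NULLM_CACHE cleared, null_want_recycle() above returns 1, so the last vput() recycles the nullfs vnode and drops its hold on the lower fuse vnode; fuse's inactive path can then run and, as the logs in tests 1 and 3 show, a cmd:release follows.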
The 'release' call seems to happen from the fuse VOP_INACTIVE(). If this is true, there are two things broken:

1. With nullfs caching, a cached nullfs vnode keeps a use reference on the lower vnode, which prevents inactivation of the lower vnode. This can be easily worked around, e.g. by adding a flag to the fuse mount that instructs nullfs not to enable caching over it.

2. There is a second issue, which is potentially more serious and which, if resolved, would make the first issue moot: VFS does not guarantee that VOP_INACTIVE() is called in a timely manner. The call can be missed due to races that make the locking incompatible with the requirements of VOP_INACTIVE(), in which case it does not happen until the next time the use count goes to zero or the vnode is reclaimed. So if 'release' must be issued promptly, VOP_INACTIVE() does not guarantee it, even without nullfs.

WRT VOP_CLOSE(), it is automatically bypassed to the lower vnode; no special code is needed for that to happen (see the sketch below).
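That bypass is a structural property of nullfs rather than anything close-specific: its vnode operation vector has no vop_close entry, so VOP_CLOSE falls through to the generic bypass. A heavily abridged sketch of null_vnodeops from sys/fs/nullfs/null_vnops.c (not a verbatim excerpt):

        /*
         * Sketch: operations not listed here (vop_close among them) fall
         * through to null_bypass(), which re-issues the operation on the
         * lower vnode. vop_inactive is one of the few ops nullfs
         * intercepts, which is why 'release' hinges on it.
         */
        struct vop_vector null_vnodeops = {
                .vop_bypass =           null_bypass,
                .vop_inactive =         null_inactive,
                .vop_reclaim =          null_reclaim,
                /* ... */
        };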
Created attachment 213600 [details]
How to install a minimal test instance of MooseFS on FreeBSD
How to reproduce: install a test environment as per the attached instructions, then do the following.

1) Create test directories and files:

   # cd /mnt/mfs
   # mkdir nulltest
   # mkdir /mnt/nullfs
   # echo "foo bar" > nulltest/test.txt

2) Open the operation log on another console:

   # cat /mnt/mfs/.oplog

3) Perform these four sets of operations, checking the operation log on the other console in between.

First test:

   # cat /mnt/mfs/nulltest/test.txt

(There should be a release operation in the log.)

Second test:

   # mount_nullfs /mnt/mfs/nulltest /mnt/nullfs
   # cat /mnt/mfs/nulltest/test.txt

(So the nullfs is already mounted, but we read via MooseFS and the release is still there.)

Third test:

   # cat /mnt/nullfs/test.txt

(The file was read via nullfs and there is no release in the operation log.)

Fourth test:

   # cat /mnt/mfs/nulltest/test.txt

(The file was once again read directly via MooseFS, but the release operation is still missing.)

Created attachment 214296 [details]
Disable nullfs cacheing on top of fusefs
Could you please retest after applying this patch?
It works with the patch, thank you!

A commit references this bug:

Author: asomers
Date: Fri May 22 18:03:15 UTC 2020
New revision: 361399
URL: https://svnweb.freebsd.org/changeset/base/361399

Log:
  Disable nullfs cacheing on top of fusefs

  Nullfs cacheing can keep a large number of vnodes active. That results
  in more active FUSE file handles, causing some FUSE servers to use
  extra resources. Disable nullfs cacheing for fusefs, just like we
  already do for NFSv4.

  PR:           245688
  Reported by:  MooseFS FreeBSD Team <freebsd@moosefs.pro>
  MFC after:    2 weeks

Changes:
  head/sys/fs/fuse/fuse_vfsops.c

A commit references this bug:

Author: asomers
Date: Fri Jun 12 20:27:38 UTC 2020
New revision: 362115
URL: https://svnweb.freebsd.org/changeset/base/362115

Log:
  MFC r361399:

  Disable nullfs cacheing on top of fusefs

  Nullfs cacheing can keep a large number of vnodes active. That results
  in more active FUSE file handles, causing some FUSE servers to use
  extra resources. Disable nullfs cacheing for fusefs, just like we
  already do for NFSv4.

  PR:           245688
  Reported by:  MooseFS FreeBSD Team <freebsd@moosefs.pro>

Changes:
  _U  stable/12/
  stable/12/sys/fs/fuse/fuse_vfsops.c
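The committed diff is not reproduced above, but given the log's reference to the NFSv4 precedent, the core of the change is presumably the fuse mount tagging itself with MNTK_NULL_NOCACHE, the flag nullfs consults in the mount-time sketch earlier. A minimal sketch of the idea, assuming it lands in fuse_vfsop_mount() in sys/fs/fuse/fuse_vfsops.c:

        /*
         * Sketch: tag the fuse mount so any nullfs stacked on top of it
         * behaves as if mounted with -o nocache; cached nullfs vnodes
         * would otherwise pin fuse vnodes and their FUSE file handles.
         */
        mp->mnt_kern_flag |= MNTK_NULL_NOCACHE;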