Hi, At MooseFS test lab we run some tests to see if issues mentioned in bug #255945 and bug #253500 were solved in 14-CURRENT and we encountered another issue with posix locks handled via FUSEFS. I executed this simple script: --------- #!/usr/bin/env python import posix import time fd = posix.open("lock.file",posix.O_WRONLY|posix.O_CREAT,0o666) print(fd) print(posix.lockf(fd,posix.F_LOCK,0)) time.sleep(2) print(posix.lockf(fd,posix.F_TLOCK,0)) time.sleep(2) print(posix.lockf(fd,posix.F_ULOCK,0)) --------- When this is run on a local filesystem, the output is as expected, that is: --------- 3 None None None --------- However, when I run it on MooseFS, I get: --------- 3 None Traceback (most recent call last): File "/mnt/mfs/f13/test.py", line 14, in <module> print(posix.lockf(fd,posix.F_TLOCK,0)) BlockingIOError: [Errno 35] Resource temporarily unavailable --------- and that is not correct, because a process, that owns a posix lock on a file, is supposed to be able to lock this file again. So I checked the operation history and this is what I got (I cut out everything that happened before the open/create operation of file "lock.file" as it is not relevant): --------- 05.19 16:38:13.441623: uid:0 gid:0 pid:766 cmd:lookup (1,lock.file): ENOENT (No such file or directory) 05.19 16:38:13.442336: uid:0 gid:0 pid:766 cmd:create (1,lock.file,O_RDWR|O_CREAT,-rw-r--r--:0100644): OK (0.0,233781,1.0,[-rw-r--r--:0100644,1,0,0,1621442293,1621442293,1621442293,0]) (direct_io:0,keep_cache:0,append_mode:0) [handle:01000004] 05.19 16:38:13.443110: uid:0 gid:0 pid:766 cmd:setlkw (inode:233781 owner:0000000000018723 start:0 end:18446744073709551615 type:W) [handle:01000004]: OK 05.19 16:38:15.449501: uid:0 gid:0 pid:766 cmd:setlk (inode:233781 owner:0000000000E3D000 start:0 end:18446744073709551615 type:W) [handle:01000004]: EAGAIN (Resource temporarily unavailable) 05.19 16:38:15.450339: uid:0 gid:0 pid:766 cmd:access (1,0x1): OK 05.19 16:38:15.451038: uid:0 gid:0 pid:766 cmd:lookup (1,f13): OK (1.0,8243,1.0,[drwxr-xr-x:0040755,2,0,0,1621442079,1621442071,1621442071,1001824]) 05.19 16:38:15.451110: uid:0 gid:0 pid:766 cmd:access (8243,0x1): OK 05.19 16:38:15.451743: uid:0 gid:0 pid:766 cmd:lookup (8243,test.py): OK (0.0,233775,1.0,[-rwxr-xr-x:0100755,1,0,0,1621442269,1621442071,1621442071,559]) 05.19 16:38:15.451801: uid:0 gid:0 pid:766 cmd:access (233775,0x4): OK (forced - kernel bug workaround) 05.19 16:38:15.451868: uid:0 gid:0 pid:766 cmd:open (233775,O_RDONLY) (using cached data from lookup): OK (direct_io:0,keep_cache:1,append_mode:0) [handle:01000005] 05.19 16:38:15.452915: uid:0 gid:0 pid:766 cmd:setlk (inode:233775 owner:FFFFFFFF8EB01020 start:0 end:18446744073709551615 type:U) [handle:01000005]: OK 05.19 16:38:15.453583: uid:0 gid:0 pid:766 cmd:setlk (inode:233775 owner:FFFFFFFF8EB01020 start:0 end:18446744073709551615 type:U) [handle:01000005]: OK 05.19 16:38:15.454143: uid:0 gid:0 pid:766 cmd:flush (233775) - releasing all POSIX-type locks for 00000000000002FE (received from kernel): OK 05.19 16:38:15.454151: uid:0 gid:0 pid:766 cmd:flush (233775) [handle:01000005,uselocks:2,lock_owner:00000000000002FE]: OK 05.19 16:38:15.454204: uid:0 gid:0 pid:766 cmd:release (233775) [handle:01000005,uselocks:2,lock_owner:00000000000002FE]: OK 05.19 16:38:15.456512: uid:0 gid:0 pid:766 cmd:setlk (inode:233781 owner:FFFFFFFFCDEFB9F0 start:0 end:18446744073709551615 type:U) [handle:01000004]: OK 05.19 16:38:15.457012: uid:0 gid:0 pid:766 cmd:flush (233781) - releasing all POSIX-type locks for 00000000000002FE (received from kernel): OK 05.19 16:38:15.457022: uid:0 gid:0 pid:766 cmd:flush (233781) [handle:01000004,uselocks:2,lock_owner:00000000000002FE]: OK 05.19 16:38:15.457478: uid:0 gid:0 pid:766 cmd:release (233781) - releasing all POSIX-type locks for 0000000000E3D000 (left by kernel): OK 05.19 16:38:15.457850: uid:0 gid:0 pid:766 cmd:release (233781) - releasing all POSIX-type locks for 0000000000018723 (left by kernel): OK 05.19 16:38:15.457861: uid:0 gid:0 pid:766 cmd:release (233781) [handle:01000004,uselocks:2,lock_owner:00000000000002FE]: OK --------- Just to be sure, I ran it on Linux on MooseFS, it behaved like FreeBSD on local filesystem: displayed the file descriptor number and 3 "None"s. Operation log (from the same moment as the one above, so opening/creating the lockfile) looks like that: --------- 05.19 19:04:33.608503: uid:0 gid:0 pid:1703 cmd:lookup (233782,lock.file): ENOENT (No such file or directory) 05.19 19:04:33.609052: uid:0 gid:0 pid:1703 cmd:create (233782,lock.file,O_WRONLY|O_CREAT,-rw-rw-rw-:0100666): OK (0.0,233784,1.0,[-rw-r--r--:0100644,1,0,0,1621443873,1621443873,1621443873,0]) (direct_io:0,keep_cache:0,append_mode:0) [handle:0A000001] 05.19 19:04:33.609475: uid:0 gid:0 pid:1703 cmd:setlkw (inode:233784 owner:53C3FEC6E42D7114 start:0 end:18446744073709551615 type:W) [handle:0A000001]: OK 05.19 19:04:35.124774: uid:0 gid:0 pid:1667 cmd:getattr (233782) [no handle]: OK (1.0,[drwxr-xr-x:0040755,2,0,0,1621443864,1621443873,1621443873,1]) 05.19 19:04:35.610893: uid:0 gid:0 pid:1703 cmd:setlk (inode:233784 owner:53C3FEC6E42D7114 start:0 end:18446744073709551615 type:W) [handle:0A000001]: OK 05.19 19:04:37.613568: uid:0 gid:0 pid:1703 cmd:setlk (inode:233784 owner:53C3FEC6E42D7114 start:0 end:18446744073709551615 type:U) [handle:0A000001]: OK 05.19 19:04:37.616895: uid:0 gid:0 pid:1703 cmd:flush (233784) - releasing all POSIX-type locks for 53C3FEC6E42D7114 (received from kernel): OK 05.19 19:04:37.616898: uid:0 gid:0 pid:1703 cmd:flush (233784) [handle:0A000001,uselocks:2,lock_owner:53C3FEC6E42D7114]: OK --------- So, clearly, on FreeBSD MooseFS gets some weird owner lock values from FUSE, while on Linux it all looks consistent. As a result, the second lock fails, because MooseFS thinks some other process is trying to obtain the lock.
Thanks for reporting this. I see the bug now. libfuse doesn't really define what the owner field is supposed to be, but the pid seems like a good choice. That's what FreeBSD uses for local locks. Linux does something more complicated, but I don't know that there's a good reason for it. I'll fix it.
Created attachment 225109 [details] Fix the lock owner field during FUSE_SETLK Agata, could you please test MooseFS on FreeBSD 14.0-CURRENT with this patch?
Hi, I tested it with the patch, the script works now and the lock owner value is consistent in all calls.
In code review now.
Just a note, the provided patch applies cleanly to 13.0-RELEASE and resolves the issue.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=18b19f8c6e04935a63a951afe0e540674bc94455 commit 18b19f8c6e04935a63a951afe0e540674bc94455 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2021-05-20 01:10:15 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2021-06-26 02:40:08 +0000 fusefs: correctly set lock owner during FUSE_SETLK During FUSE_SETLK, the owner field should uniquely identify the calling process. The fusefs module now sets it to the process's pid. Previously, it expected the calling process to set it directly, which was wrong. libfuse also apparently expects the owner field to be set during FUSE_GETLK, though I'm not sure why. PR: 256005 Reported by: Agata <chogata@moosefs.pro> MFC after: 2 weeks Reviewed by: pfg Differential Revision: https://reviews.freebsd.org/D30622 sys/fs/fuse/fuse_vnops.c | 4 ++-- tests/sys/fs/fusefs/locks.cc | 50 ++++++++++++++++++++++++++------------------ 2 files changed, 32 insertions(+), 22 deletions(-)
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=7d9771228715bf2a32c7db323accecfb7f2f861c commit 7d9771228715bf2a32c7db323accecfb7f2f861c Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2021-05-20 01:10:15 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2021-07-27 17:34:08 +0000 fusefs: correctly set lock owner during FUSE_SETLK During FUSE_SETLK, the owner field should uniquely identify the calling process. The fusefs module now sets it to the process's pid. Previously, it expected the calling process to set it directly, which was wrong. libfuse also apparently expects the owner field to be set during FUSE_GETLK, though I'm not sure why. PR: 256005 Reported by: Agata <chogata@moosefs.pro> Reviewed by: pfg Differential Revision: https://reviews.freebsd.org/D30622 (cherry picked from commit 18b19f8c6e04935a63a951afe0e540674bc94455) sys/fs/fuse/fuse_vnops.c | 4 ++-- tests/sys/fs/fusefs/locks.cc | 50 ++++++++++++++++++++++++++------------------ 2 files changed, 32 insertions(+), 22 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=c7fe00ae403f08b17e8376cc98195171234bf3d9 commit c7fe00ae403f08b17e8376cc98195171234bf3d9 Author: Alan Somers <asomers@FreeBSD.org> AuthorDate: 2021-05-20 01:10:15 +0000 Commit: Alan Somers <asomers@FreeBSD.org> CommitDate: 2021-07-27 17:44:28 +0000 fusefs: correctly set lock owner during FUSE_SETLK During FUSE_SETLK, the owner field should uniquely identify the calling process. The fusefs module now sets it to the process's pid. Previously, it expected the calling process to set it directly, which was wrong. libfuse also apparently expects the owner field to be set during FUSE_GETLK, though I'm not sure why. PR: 256005 Reported by: Agata <chogata@moosefs.pro> Reviewed by: pfg Differential Revision: https://reviews.freebsd.org/D30622 (cherry picked from commit 18b19f8c6e04935a63a951afe0e540674bc94455) sys/fs/fuse/fuse_vnops.c | 4 ++-- tests/sys/fs/fusefs/locks.cc | 50 ++++++++++++++++++++++++++------------------ 2 files changed, 32 insertions(+), 22 deletions(-)