Bug 256005 - [FUSEFS] Fuse does not correctly set the owner field during FUSE_SETLK and FUSE_SETLKW
Summary: [FUSEFS] Fuse does not correctly set the owner field during FUSE_SETLK and FU...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: Alan Somers
URL: https://reviews.freebsd.org/D30622
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-19 17:30 UTC by Agata
Modified: 2021-07-27 17:58 UTC (History)
5 users (show)

See Also:


Attachments
Fix the lock owner field during FUSE_SETLK (6.20 KB, patch)
2021-05-20 01:14 UTC, Alan Somers
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Agata 2021-05-19 17:30:48 UTC
Hi,

At MooseFS test lab we run some tests to see if issues mentioned in bug #255945 and bug #253500 were solved in 14-CURRENT and we encountered another issue with posix locks handled via FUSEFS.

I executed this simple script:

---------
#!/usr/bin/env python

import posix
import time

fd = posix.open("lock.file",posix.O_WRONLY|posix.O_CREAT,0o666)
print(fd)
print(posix.lockf(fd,posix.F_LOCK,0))
time.sleep(2)
print(posix.lockf(fd,posix.F_TLOCK,0))
time.sleep(2)
print(posix.lockf(fd,posix.F_ULOCK,0))
---------

When this is run on a local filesystem, the output is as expected, that is:

---------
3
None
None
None
---------

However, when I run it on MooseFS, I get:

---------
3
None
Traceback (most recent call last):
  File "/mnt/mfs/f13/test.py", line 14, in <module>
    print(posix.lockf(fd,posix.F_TLOCK,0))
BlockingIOError: [Errno 35] Resource temporarily unavailable
---------

and that is not correct, because a process, that owns a posix lock on a file, is supposed to be able to lock this file again.
So I checked the operation history and this is what I got (I cut out everything that happened before the open/create operation of file "lock.file" as it is not relevant):

---------
05.19 16:38:13.441623: uid:0 gid:0 pid:766 cmd:lookup (1,lock.file): ENOENT (No such file or directory)
05.19 16:38:13.442336: uid:0 gid:0 pid:766 cmd:create (1,lock.file,O_RDWR|O_CREAT,-rw-r--r--:0100644): OK (0.0,233781,1.0,[-rw-r--r--:0100644,1,0,0,1621442293,1621442293,1621442293,0]) (direct_io:0,keep_cache:0,append_mode:0) [handle:01000004]
05.19 16:38:13.443110: uid:0 gid:0 pid:766 cmd:setlkw (inode:233781 owner:0000000000018723 start:0 end:18446744073709551615 type:W) [handle:01000004]: OK
05.19 16:38:15.449501: uid:0 gid:0 pid:766 cmd:setlk (inode:233781 owner:0000000000E3D000 start:0 end:18446744073709551615 type:W) [handle:01000004]: EAGAIN (Resource temporarily unavailable)
05.19 16:38:15.450339: uid:0 gid:0 pid:766 cmd:access (1,0x1): OK
05.19 16:38:15.451038: uid:0 gid:0 pid:766 cmd:lookup (1,f13): OK (1.0,8243,1.0,[drwxr-xr-x:0040755,2,0,0,1621442079,1621442071,1621442071,1001824])
05.19 16:38:15.451110: uid:0 gid:0 pid:766 cmd:access (8243,0x1): OK
05.19 16:38:15.451743: uid:0 gid:0 pid:766 cmd:lookup (8243,test.py): OK (0.0,233775,1.0,[-rwxr-xr-x:0100755,1,0,0,1621442269,1621442071,1621442071,559])
05.19 16:38:15.451801: uid:0 gid:0 pid:766 cmd:access (233775,0x4): OK (forced - kernel bug workaround)
05.19 16:38:15.451868: uid:0 gid:0 pid:766 cmd:open (233775,O_RDONLY) (using cached data from lookup): OK (direct_io:0,keep_cache:1,append_mode:0) [handle:01000005]
05.19 16:38:15.452915: uid:0 gid:0 pid:766 cmd:setlk (inode:233775 owner:FFFFFFFF8EB01020 start:0 end:18446744073709551615 type:U) [handle:01000005]: OK
05.19 16:38:15.453583: uid:0 gid:0 pid:766 cmd:setlk (inode:233775 owner:FFFFFFFF8EB01020 start:0 end:18446744073709551615 type:U) [handle:01000005]: OK
05.19 16:38:15.454143: uid:0 gid:0 pid:766 cmd:flush (233775) - releasing all POSIX-type locks for 00000000000002FE (received from kernel): OK
05.19 16:38:15.454151: uid:0 gid:0 pid:766 cmd:flush (233775) [handle:01000005,uselocks:2,lock_owner:00000000000002FE]: OK
05.19 16:38:15.454204: uid:0 gid:0 pid:766 cmd:release (233775) [handle:01000005,uselocks:2,lock_owner:00000000000002FE]: OK
05.19 16:38:15.456512: uid:0 gid:0 pid:766 cmd:setlk (inode:233781 owner:FFFFFFFFCDEFB9F0 start:0 end:18446744073709551615 type:U) [handle:01000004]: OK
05.19 16:38:15.457012: uid:0 gid:0 pid:766 cmd:flush (233781) - releasing all POSIX-type locks for 00000000000002FE (received from kernel): OK
05.19 16:38:15.457022: uid:0 gid:0 pid:766 cmd:flush (233781) [handle:01000004,uselocks:2,lock_owner:00000000000002FE]: OK
05.19 16:38:15.457478: uid:0 gid:0 pid:766 cmd:release (233781) - releasing all POSIX-type locks for 0000000000E3D000 (left by kernel): OK
05.19 16:38:15.457850: uid:0 gid:0 pid:766 cmd:release (233781) - releasing all POSIX-type locks for 0000000000018723 (left by kernel): OK
05.19 16:38:15.457861: uid:0 gid:0 pid:766 cmd:release (233781) [handle:01000004,uselocks:2,lock_owner:00000000000002FE]: OK
---------

Just to be sure, I ran it on Linux on MooseFS, it behaved like FreeBSD on local filesystem: displayed the file descriptor number and 3 "None"s. Operation log (from the same moment as the one above, so opening/creating the lockfile) looks like that:

---------
05.19 19:04:33.608503: uid:0 gid:0 pid:1703 cmd:lookup (233782,lock.file): ENOENT (No such file or directory)
05.19 19:04:33.609052: uid:0 gid:0 pid:1703 cmd:create (233782,lock.file,O_WRONLY|O_CREAT,-rw-rw-rw-:0100666): OK (0.0,233784,1.0,[-rw-r--r--:0100644,1,0,0,1621443873,1621443873,1621443873,0]) (direct_io:0,keep_cache:0,append_mode:0) [handle:0A000001]
05.19 19:04:33.609475: uid:0 gid:0 pid:1703 cmd:setlkw (inode:233784 owner:53C3FEC6E42D7114 start:0 end:18446744073709551615 type:W) [handle:0A000001]: OK
05.19 19:04:35.124774: uid:0 gid:0 pid:1667 cmd:getattr (233782) [no handle]: OK (1.0,[drwxr-xr-x:0040755,2,0,0,1621443864,1621443873,1621443873,1])
05.19 19:04:35.610893: uid:0 gid:0 pid:1703 cmd:setlk (inode:233784 owner:53C3FEC6E42D7114 start:0 end:18446744073709551615 type:W) [handle:0A000001]: OK
05.19 19:04:37.613568: uid:0 gid:0 pid:1703 cmd:setlk (inode:233784 owner:53C3FEC6E42D7114 start:0 end:18446744073709551615 type:U) [handle:0A000001]: OK
05.19 19:04:37.616895: uid:0 gid:0 pid:1703 cmd:flush (233784) - releasing all POSIX-type locks for 53C3FEC6E42D7114 (received from kernel): OK
05.19 19:04:37.616898: uid:0 gid:0 pid:1703 cmd:flush (233784) [handle:0A000001,uselocks:2,lock_owner:53C3FEC6E42D7114]: OK
---------

So, clearly, on FreeBSD MooseFS gets some weird owner lock values from FUSE, while on Linux it all looks consistent. As a result, the second lock fails, because MooseFS thinks some other process is trying to obtain the lock.
Comment 1 Alan Somers freebsd_committer freebsd_triage 2021-05-19 18:43:43 UTC
Thanks for reporting this.  I see the bug now.  libfuse doesn't really define what the owner field is supposed to be, but the pid seems like a good choice.  That's what FreeBSD uses for local locks.  Linux does something more complicated, but I don't know that there's a good reason for it.  I'll fix it.
Comment 2 Alan Somers freebsd_committer freebsd_triage 2021-05-20 01:14:13 UTC
Created attachment 225109 [details]
Fix the lock owner field during FUSE_SETLK

Agata, could you please test MooseFS on FreeBSD 14.0-CURRENT with this patch?
Comment 3 Agata 2021-05-21 12:50:39 UTC
Hi,

I tested it with the patch, the script works now and the lock owner value is consistent in all calls.
Comment 4 Alan Somers freebsd_committer freebsd_triage 2021-06-02 21:45:20 UTC
In code review now.
Comment 5 jSML4ThWwBID69YC 2021-06-22 20:14:32 UTC
Just a note, the provided patch applies cleanly to 13.0-RELEASE and resolves the issue.
Comment 6 commit-hook freebsd_committer freebsd_triage 2021-06-26 02:41:12 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=18b19f8c6e04935a63a951afe0e540674bc94455

commit 18b19f8c6e04935a63a951afe0e540674bc94455
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-05-20 01:10:15 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2021-06-26 02:40:08 +0000

    fusefs: correctly set lock owner during FUSE_SETLK

    During FUSE_SETLK, the owner field should uniquely identify the calling
    process.  The fusefs module now sets it to the process's pid.
    Previously, it expected the calling process to set it directly, which
    was wrong.

    libfuse also apparently expects the owner field to be set during
    FUSE_GETLK, though I'm not sure why.

    PR:             256005
    Reported by:    Agata <chogata@moosefs.pro>
    MFC after:      2 weeks
    Reviewed by:    pfg
    Differential Revision: https://reviews.freebsd.org/D30622

 sys/fs/fuse/fuse_vnops.c     |  4 ++--
 tests/sys/fs/fusefs/locks.cc | 50 ++++++++++++++++++++++++++------------------
 2 files changed, 32 insertions(+), 22 deletions(-)
Comment 7 commit-hook freebsd_committer freebsd_triage 2021-07-27 17:40:34 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7d9771228715bf2a32c7db323accecfb7f2f861c

commit 7d9771228715bf2a32c7db323accecfb7f2f861c
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-05-20 01:10:15 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2021-07-27 17:34:08 +0000

    fusefs: correctly set lock owner during FUSE_SETLK

    During FUSE_SETLK, the owner field should uniquely identify the calling
    process.  The fusefs module now sets it to the process's pid.
    Previously, it expected the calling process to set it directly, which
    was wrong.

    libfuse also apparently expects the owner field to be set during
    FUSE_GETLK, though I'm not sure why.

    PR:             256005
    Reported by:    Agata <chogata@moosefs.pro>
    Reviewed by:    pfg
    Differential Revision: https://reviews.freebsd.org/D30622

    (cherry picked from commit 18b19f8c6e04935a63a951afe0e540674bc94455)

 sys/fs/fuse/fuse_vnops.c     |  4 ++--
 tests/sys/fs/fusefs/locks.cc | 50 ++++++++++++++++++++++++++------------------
 2 files changed, 32 insertions(+), 22 deletions(-)
Comment 8 commit-hook freebsd_committer freebsd_triage 2021-07-27 17:57:38 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c7fe00ae403f08b17e8376cc98195171234bf3d9

commit c7fe00ae403f08b17e8376cc98195171234bf3d9
Author:     Alan Somers <asomers@FreeBSD.org>
AuthorDate: 2021-05-20 01:10:15 +0000
Commit:     Alan Somers <asomers@FreeBSD.org>
CommitDate: 2021-07-27 17:44:28 +0000

    fusefs: correctly set lock owner during FUSE_SETLK

    During FUSE_SETLK, the owner field should uniquely identify the calling
    process.  The fusefs module now sets it to the process's pid.
    Previously, it expected the calling process to set it directly, which
    was wrong.

    libfuse also apparently expects the owner field to be set during
    FUSE_GETLK, though I'm not sure why.

    PR:             256005
    Reported by:    Agata <chogata@moosefs.pro>
    Reviewed by:    pfg
    Differential Revision: https://reviews.freebsd.org/D30622

    (cherry picked from commit 18b19f8c6e04935a63a951afe0e540674bc94455)

 sys/fs/fuse/fuse_vnops.c     |  4 ++--
 tests/sys/fs/fusefs/locks.cc | 50 ++++++++++++++++++++++++++------------------
 2 files changed, 32 insertions(+), 22 deletions(-)