Hello,

Flock no longer works on FreeBSD 13p0 when using FuseFS. This has been tested with two different fusefs clients.

Tested: MooseFS 3.0.115 from the ports tree.
Tested: fusefs-sshfs-3.7.1 from packages.

Here's a simple test to demonstrate the issue using sshfs.

1: Install fusefs-sshfs, flock, and bash.

pkg install fusefs-sshfs flock bash

2: Create a sshfs mount. You can do this locally by allowing root login over ssh, or any other user. I'll use root to avoid any permission type issues.

service sshd onestart
mkdir /mnt/test
sshfs root@127.0.0.1:/root /mnt/test

The contents of /root should now be showing at /mnt/test

3: Create the test script at /mnt/test/flock-test.sh

-------------------
#!/usr/bin/env bash

exec 200> ./lock.file
echo "Acquire lock"
flock -n 200 || exit 1
echo "Sleep 5 secs."
sleep 5
echo "Release lock"
flock -u 200 || exit 2
echo "Lock released"
exit
---------------

4: Make the script executable

chmod u+x flock-test.sh

5: Run the script twice.

./flock-test.sh

There seem to be different errors depending on whether the 'lock.file' already exists.

Note, the same issue happens on MooseFS, but it's more complicated to set up. Here are MooseFS setup instructions, previously written for bug #245689:
https://bz-attachments.freebsd.org/attachment.cgi?id=213598

The flock-test.sh script works as expected on FreeBSD 12.2.
Here's the Flock test using PHP instead of sh.

---------- flock-test.php ---------------
<?php
echo "Hello";
print " World \n";
$fp = fopen("lock.file", "w");

if (flock($fp, LOCK_EX)) {
    print "Got lock!\n";
    print "Sleep five secs.\n";
    sleep(5);
    flock($fp, LOCK_UN);
    print "Released lock!\n";
}
?>
--------------------------------------------

Be sure to run it on a FuseFS mount, and run it twice. For some reason the PHP version succeeds the first time so long as the 'lock.file' file does not exist.
Thanks for the good reproduction case. I'll try to dig into this soon.
Ok, you're going to have to help me out here. What am I looking for? How do I tell if the test passes or fails? What is the difference in behavior on FreeBSD 13.0 vs 12.2?
On FreeBSD 13, the flock function appears to either not create the lock, or not release it. In cases of the lock file already existing, the entire process just hangs when tested with PHP.

For example, run the flock-test.sh on FreeBSD 12.2 twice.

--
root@freebsd12:/storage/chunk/test # ./flock-test.sh
Acquire lock
Sleep 5 secs.
Release lock
Lock released
root@freebsd12:/storage/chunk/test # ./flock-test.sh
Acquire lock
Sleep 5 secs.
Release lock
Lock released
--

It works both times. Now run it on FreeBSD 13.

--
root@freebsd13:/storage/chunk/test # ./flock-test.sh
Acquire lock
Sleep 5 secs.
Release lock
flock: 200: Invalid argument
root@freebsd13:/storage/chunk/test # ./flock-test.sh
Acquire lock
--

Both runs show errors, when it should just work.

Using PHP to test instead of the .sh script shows different errors, but still ends up in locking/unlocking not working. FreeBSD 12.2 is working. Here's the test run twice.

--
root@freebsd12:/storage/chunk/test # php flock-test.php
Hello World
Got lock!
Sleep five secs.
Released lock!
root@freebsd12:/storage/chunk/test # php flock-test.php
Hello World
Got lock!
Sleep five secs.
Released lock!
--

Now test it on FreeBSD 13. This will actually work if the lock file does not exist. It hangs if the lock file does exist. For example.

--
root@freebsd13:/storage/chunk/test # php flock-test.php
Hello World
Got lock!
Sleep five secs.
Released lock!
--

Now 'ls' the current directory and see that the lock.file file exists. Then run the test again.

--
root@freebsd13:/storage/chunk/test # php flock-test.php
Hello World
...
--

The process hangs. You need to <ctrl><c> to get out of it.
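To make pass/fail explicit, the lock/unlock cycle from flock-test.sh can be wrapped so that each run prints a single verdict. This is only a sketch: the function name check_flock and the exit codes 10/11/3 are made up for illustration, and it assumes timeout(1) and the flock utility from the test setup are installed.

```shell
#!/usr/bin/env bash
# check_flock DIR [SECONDS]: run one acquire/release cycle on DIR/lock.file
# and print PASS or a FAIL reason. Hypothetical helper, not part of any package.
check_flock() {
    (
        cd "$1" || exit 3
        exec 200> ./lock.file || exit 3
        # -n fails instead of blocking; timeout is an extra guard against a hang.
        timeout "${2:-10}" flock -n 200 || exit 10
        timeout "${2:-10}" flock -u 200 || exit 11
    )
    case $? in
        0)  echo "PASS" ;;
        10) echo "FAIL: could not acquire lock" ;;
        11) echo "FAIL: could not release lock" ;;
        *)  echo "FAIL: setup error" ;;
    esac
}
```

Calling check_flock /mnt/test twice in a row mirrors the "run it twice" step, since the second call finds an existing lock.file. Going by the transcripts above, PASS would be expected both times on 12.2, while 13.0 would be expected to report a release failure on the first call and an acquire failure on the second.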
With the newer instructions I can reproduce this on 13.0-RELEASE. However, I can't reproduce it on stable/13 or 14.0-CURRENT. That means it's probably a dupe of PR 253500, which is fixed in stable/13 but not in 13.0-RELEASE.

*** This bug has been marked as a duplicate of bug 253500 ***
Confirmed. Rebuilding the kernel with the patch at https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253500 resolves the issue. Any idea when this is going to be in 13-RELEASE?
It will be included in 13.1, which will probably be released in late 2021 or early 2022. I doubt the security officer would approve an errata for this bug; it isn't serious enough.

Are you sure the bug doesn't affect 12.2-RELEASE too? It looks to me like it should. I wonder if your test file systems are requesting local locks in your setup. That would be one way to work around the bug.
Hello,

I've tested this across multiple 12.2 servers, and two 13.0 servers. None of the 12.2 ones show the issue, and both the 13.0 ones do.

I ran the test three more times on a 12.2 server just now. Here are the results.

-----
root@compute02:/storage/chunk/test # uname -a
FreeBSD compute02 12.2-RELEASE-p6 FreeBSD 12.2-RELEASE-p6 GENERIC amd64
root@compute02:/storage/chunk/test # ./flock-test.sh
Acquire lock
Sleep 5 secs.
Release lock
Lock released
root@compute02:/storage/chunk/test # ./flock-test.sh
Acquire lock
Sleep 5 secs.
Release lock
Lock released
root@compute02:/storage/chunk/test # ./flock-test.sh
Acquire lock
Sleep 5 secs.
Release lock
Lock released
---

I did test turning off fuse based locking when using MooseFS. This had no effect on FreeBSD 13. Flock fails either way. For reference, the fuse mount option is:

-o mfsnobsdlocks -o mfsnoposixlocks

This issue first came up because it breaks any application utilizing flock while using FuseFS. For example, Nextcloud, Horde, phpMyAdmin all break. I suspect anybody using a distributed storage system such as MooseFS, LizardFS, GlusterFS, or CephFS to run application based workloads will end up triggering this.

Is breaking user space applications not serious enough to approve an errata?

Please let me know if I can help in any other way.
If the change is not too large / risky an errata is not a big hurdle. If someone identifies a commit(s) to cherry-pick from stable/13 to releng/13.0, and fills out as much of https://www.freebsd.org/security/errata-template.txt as possible we can queue it up to go along with other advisories and errata in the future.
Before we consider an errata, I want to know why your test passes on 12.2-RELEASE. The buggy code that was fixed by 253500 was already present in that release. Also, I want to know why -o mfsnoposixlocks didn't fix the problem on 13.0-RELEASE. It should've.
(In reply to Alan Somers from comment #10) Yes, we definitely need to understand the issue and be sure that any change is both necessary and sufficient.
(In reply to Alan Somers from comment #10)

I have over a dozen FuseFS mount points with MooseFS on FreeBSD 12.2. They all pass the tests. I have two FreeBSD 13 servers. One patched, and one not. The tests pass on the patched one just like they do on 12.2. The non-patched one fails the tests as shown above.

I have hardware and time, so I'm open to any suggestions that will provide better information. I've also reached out to the MooseFS developers to see if they have any ideas.

I'll fill out the errata form once there is more confidence in a resolution. Thank you for looking into this.
Please repeat your tests while running this dtrace command:

sudo dtrace -i 'fbt:fusefs:fuse_vnop_advlock:entry {printf("dataflags=%#x, flags=%#x", ((struct fuse_data*)(args[0]->a_vp->v_mount->mnt_data))->dataflags, args[0]->a_flags);}'

In the output, look for bit 0x2000 (FSESS_POSIX_LOCKS) in the dataflags field. If it's not set, then the locks are being handled locally rather than in the FUSE server. Also, look for bit 0x20 (F_FLOCK) in the flags field. If it _is_ set, then the locks are also being handled locally rather than in the FUSE server.

When I run your test script with sshfs, I see that FSESS_POSIX_LOCKS is unset and F_FLOCK is set, on both 13.0-RELEASE and 14.0-CURRENT.
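For anyone not used to reading hex bitmasks, the two checks described above can be done with shell arithmetic. This is a rough sketch: the function name lock_path is made up, the bit values are the ones quoted in this comment, and it only encodes the rule as stated here rather than the kernel's actual code path.

```shell
#!/usr/bin/env bash
# Bit values as quoted in this thread.
FSESS_POSIX_LOCKS=0x2000   # bit in the dataflags field
F_FLOCK=0x20               # bit in the flags field

# lock_path DATAFLAGS FLAGS: print where the lock request would be handled,
# following the two rules given above. Hypothetical helper for illustration.
lock_path() {
    reasons=""
    if [ $(( $1 & $FSESS_POSIX_LOCKS )) -eq 0 ]; then
        reasons="FSESS_POSIX_LOCKS unset"
    fi
    if [ $(( $2 & $F_FLOCK )) -ne 0 ]; then
        reasons="${reasons:+$reasons, }F_FLOCK set"
    fi
    if [ -n "$reasons" ]; then
        echo "local ($reasons)"
    else
        echo "FUSE server"
    fi
}

lock_path 0x11004 0x20   # prints: local (FSESS_POSIX_LOCKS unset, F_FLOCK set)
```

The 0x11004/0x20 pair is just a sample input in which bit 0x2000 is clear and bit 0x20 is set, which is the combination described for sshfs above.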
(In reply to Alan Somers from comment #13)

It looks like FuseFS locks are not being used on FreeBSD 12.2, 13.0 or 13.0 with the patch from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253500. That begs the question as to why FLOCK works on 12.2, but not on 13 without the patch?

FreeBSD 13 using MooseFS and flock-test.sh with the patch from https://bz-attachments.freebsd.org/attachment.cgi?id=222867. Test ran twice.

dtrace: description 'fbt:fusefs:fuse_vnop_advlock:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  8  74373          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20
 22  74373          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20
  3  74373          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20
  8  74373          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20

FreeBSD 13 using MooseFS and flock-test.sh. No patch. Test ran twice.

dtrace: description 'fbt:fusefs:fuse_vnop_advlock:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  3  74415          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20
  6  74415          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20
 10  74415          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20
  9  74415          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20

FreeBSD 12.2 using MooseFS and flock-test.sh. Test ran twice.

dtrace: description 'fbt:fuse:fuse_vnop_advlock:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  2  69829          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20
 14  69829          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20
 17  69829          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20
 10  69829          fuse_vnop_advlock:entry dataflags=0x13014,flags=0x20

Note: I had to alter the dtrace command to reflect fuse instead of fusefs for FreeBSD 12. I hope this is correct:

dtrace -i 'fbt:fuse:fuse_vnop_advlock:entry {printf("dataflags=%#x,flags=%#x", ((struct fuse_data*)(args[0]->a_vp->v_mount->mnt_data))->dataflags,args[0]->a_flags);}'

Just for fun, here's the test run using SSHFS instead of MooseFS. This corresponds to your test.

FreeBSD 13 using sshfs and flock-test.sh. No patch. Test ran twice.

dtrace: description 'fbt:fusefs:fuse_vnop_advlock:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0  82246          fuse_vnop_advlock:entry dataflags=0x11004,flags=0x20
  4  82246          fuse_vnop_advlock:entry dataflags=0x11004,flags=0x20
  0  82246          fuse_vnop_advlock:entry dataflags=0x11004,flags=0x20
  7  82246          fuse_vnop_advlock:entry dataflags=0x11004,flags=0x20
Adding to it, here's testing SSHFS on FreeBSD 12.2. Using the flock-test.sh script and running it twice.

dtrace: description 'fbt:fuse:fuse_vnop_advlock:entry ' matched 1 probe
CPU     ID                    FUNCTION:NAME
 14  69790          fuse_vnop_advlock:entry dataflags=0x11004,flags=0x20
 15  69790          fuse_vnop_advlock:entry dataflags=0x11004,flags=0x20
 21  69790          fuse_vnop_advlock:entry dataflags=0x11004,flags=0x20
 23  69790          fuse_vnop_advlock:entry dataflags=0x11004,flags=0x20
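Since every probe firing in these traces carries the same pair of values, it may be easier to compare runs by collapsing the dtrace output into one line per distinct pair. A small sketch, assuming the printf format used in this thread; the function name summarize_advlock is made up for illustration.

```shell
#!/usr/bin/env bash
# summarize_advlock: read dtrace output on stdin and print each distinct
# dataflags/flags pair followed by how many times it appeared.
summarize_advlock() {
    # Accepts both "dataflags=...,flags=..." and "dataflags=..., flags=...".
    grep -o 'dataflags=[0-9a-fx]*, *flags=[0-9a-fx]*' |
        tr -d ' ' | sort | uniq -c | awk '{print $2 " x" $1}'
}
```

For example, saving a trace to a file and running summarize_advlock < trace.txt (trace.txt being a placeholder name) on the SSHFS run above would be expected to give the single line dataflags=0x11004,flags=0x20 x4.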
So bug 253500 seems like a red herring. We need to find the real cause of the error. Are you proficient with dtrace?
(In reply to Alan Somers from comment #16) Sorry, no. Never used it before today. I'm willing to try though. What am I looking for?
Ok, I see the problem now. There are two relevant changes:

1) In FreeBSD 12.2, the bug would never be triggered because fuse_vnop_advlock would immediately fall back to vop_stdadvlock. It would be triggered by posix file locks, if the FUSE server enables them, but yours does not. But revision 542711e52079f65647ac1daadf0c9e74cb221f3e, which fixed an unrelated bug, reordered some code in fuse_vnop_advlock. So in 13.0, it checks the lock parameters before falling back to vop_stdadvlock.

2) Revision 929acdb19acb67cc0e6ee5439df98e28a84d4772 fixed bug 253500.

So to summarize the bug's presentation in different versions:

* In FreeBSD 12.2 (and probably 12.1), local posix locks work fine, flocks only work locally, and FUSE POSIX locks suffer from the 253500 bug.
* In FreeBSD 13.0, both local and FUSE, posix and flock locks suffer from the 253500 bug.
* In FreeBSD 14.0-CURRENT and stable/13, local locks and FUSE POSIX locks work fine, and FUSE FLOCK locks are handled locally.
Thank you for the explanation. If I understand correctly, I need FreeBSD 13 with the patch from https://bz-attachments.freebsd.org/attachment.cgi?id=222867 to avoid breaking userland applications. Is that correct?

If so, I'll work on that errata form with my use case as an example for why this is worth pushing through.
Yes. Except that it would be better to use the as-committed change:

https://github.com/freebsd/freebsd-src/commit/929acdb19acb67cc0e6ee5439df98e28a84d4772#diff-ada8cc502a0a4b7832a0178dae49b9e6bdad8ba16a415a212603817f51bc4c84

And the errata should also include this follow-up commit:

https://github.com/freebsd/freebsd-src/commit/9c5aac8f2e84ca4bbdf82514302c08c0453ec59b#diff-ada8cc502a0a4b7832a0178dae49b9e6bdad8ba16a415a212603817f51bc4c84
Hi,

We've been asked at MooseFS to look into it, but before we could offer any insight, you guys seem to have found the problem :)

I upgraded one of our lab machines running FreeBSD to 14-CURRENT and I can confirm that flock once again "works", as in: it is handled locally in a way that does not cause any problems to the applications using it when run on a FUSEFS filesystem. Hopefully soon flocks will also be forwarded to FUSEFS, but for now the userland problem is at least gone and the situation is again like in 12.2-RELEASE.

However, we also tested posix locks while we were at it, because they are forwarded to FUSE, so MooseFS gets to handle them and they were also affected by the bug and subsequent patch. But since it looks like there is a separate (if maybe related) problem, I will open a new issue with a detailed description.
The new issue: bug #256005