Tried on 10.4, 11.1-RELEASE, 11.1-STABLE, and 11.2-PRERELEASE as both client and server. Currently client and server are 11.2-PRERELEASE.

ktrace shows the following:

    66181 mysqld CALL close(0x30)
    66181 mysqld RET openat 48/0x30
    66181 mysqld CALL fcntl(0x30,F_SETLK,0x7fffdd3e5cc0)
    66181 mysqld RET close 0
    66181 mysqld RET fcntl -1 errno 13 Permission denied

Examining a full trace, the files being locked are never locked twice by MySQL or locked by another process. The file closed in the first line is a different file from the one opened in the second line. MySQL does this same operation tens or hundreds of thousands of times successfully, then fails on one.

From all of the trace data that I've been able to gather, the fcntl succeeds 100% of the time if the close returns before another thread calls open and F_SETLK, and fails 100% of the time when the F_SETLK completes before the close has returned in the other thread.

Observation affects the results. Failure occurs tens to hundreds of times more rapidly when not tracing the process.

The higher the network latency, the more likely it is to happen. With a latency of 200 us, it happens in seconds on a loaded server. With a latency of 100 us, it happens in tens of seconds. With a latency of 20 us it happens rarely, and below 15 us I have yet to see this failure.

No kernel messages are logged.

I have duplicated the problem on a variety of hardware, from 28-core Supermicro motherboards with ECC memory and E5-2XXX v4 CPUs to laptops with i3, i5, or i7 CPUs.

The filesystem setup is as follows:

server: ZFS on 11.2-PRERELEASE configured for very low latency (optimized SSDs and persistent write caches, or sync=disabled). The filesystem is either a base ZFS filesystem or a clone of a snapshot (for easy testing; it happens on either).

client: mounts the server filesystem via NFSv4 and also runs 11.2-PRERELEASE.

Tested with 100 Mb, gigabit, 50 gigabit, and 100 gigabit NICs.
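For anyone trying to reproduce this without MySQL, the access pattern described above can be approximated with a small script. This is only a sketch under stated assumptions, not MySQL's actual code: the file names (`lock-N.dat`), the thread and iteration counts, and the helper names `lock_cycle` and `run` are all made up for illustration. Each thread repeatedly opens, locks, and closes its own file, so one thread's close can overlap another thread's open and F_SETLK. Whether this is enough to trigger the failure on an NFSv4 mount is untested.

```python
import errno
import fcntl
import os
import sys
import tempfile
import threading


def lock_cycle(path, iterations, failures):
    """Repeatedly open, lock without blocking, and close one file.

    Every thread gets its own file (an assumption of this sketch),
    so no lock request should ever conflict with another.
    """
    for _ in range(iterations):
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            # With LOCK_NB, lockf() issues fcntl(fd, F_SETLK, ...).
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EACCES, errno.EAGAIN):
                # The failure from the trace: errno 13 on F_SETLK.
                failures.append((path, exc.errno))
            else:
                raise
        finally:
            # Closing the descriptor also releases the POSIX lock.
            os.close(fd)


def run(directory, threads=8, iterations=2000):
    """Run the open/lock/close loop in several threads; return failures."""
    failures = []
    workers = [
        threading.Thread(
            target=lock_cycle,
            args=(os.path.join(directory, f"lock-{i}.dat"), iterations, failures),
        )
        for i in range(threads)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return failures


if __name__ == "__main__":
    # Pass a directory on the NFSv4 mount; defaults to a local temp dir.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.mkdtemp()
    result = run(target)
    print(f"{len(result)} unexpected lock failures in {target}")
```

Run it with a directory on the NFSv4 mount as the only argument. On a local filesystem no failures are expected, which gives a baseline to compare against.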
I don't recall seeing this PR before, so I'm afraid it is "better late than never".

As it happens, a recently fixed bug might be what caused this. It was a use-after-free of open/lock owners in the housekeeping function, which might have resulted in it processing the wrong open/lock owner. The fix is committed to main as 1cedb4ea1a79 and will be in 13.1 when it is released.

If it is possible to redo your tests with FreeBSD 13.1 once it is released and then report back here, that would be appreciated.

(A code inspection did not find anything in the sources that might cause this.)