Bug 228087

Summary: F_SETLK randomly fails on NFS4 in threaded operation in MySQL
Product: Base System
Reporter: barry.boes <barry.boes>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: New ---    
Severity: Affects Many People    
Priority: ---    
Version: 11.1-STABLE   
Hardware: Any   
OS: Any   

Description barry.boes@acciodata.com 2018-05-09 04:25:36 UTC
Tried in 10.4, 11.1-RELEASE, 11.1-STABLE, and 11.2-PRERELEASE client and server.  Currently client and server are 11.2-PRERELEASE.

ktrace shows the following:

 66181 mysqld   CALL  close(0x30)
 66181 mysqld   RET   openat 48/0x30
 66181 mysqld   CALL  fcntl(0x30,F_SETLK,0x7fffdd3e5cc0)
 66181 mysqld   RET   close 0
 66181 mysqld   RET   fcntl -1 errno 13 Permission denied

Examining a full trace, the files being locked are never locked twice by MySQL or locked by another process.  The file closed in the first line is a different file from the one opened in the second line.  MySQL performs this same operation tens or hundreds of thousands of times successfully, then fails on one.  From all of the trace data that I've been able to gather, the fcntl works 100% of the time if the close returns before another thread calls open and F_SETLK, and fails 100% of the time when the F_SETLK completes before the close returns in another thread.

Observation affects the results: failure occurs tens to hundreds of times more rapidly when the process is not being traced.
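
The access pattern described above can be approximated outside MySQL with a small stress program. This is only a sketch under stated assumptions, not a confirmed reproducer: each thread repeatedly opens its own file, takes a whole-file write lock with F_SETLK, and closes it, so that one thread's close can overlap another thread's open and F_SETLK. The names run_lock_stress and lock_loop, the locktest.* file names, and the LOCK_STRESS_MAIN macro are hypothetical and do not come from MySQL or from the trace.

```c
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MAX_THREADS 64

struct worker {
    const char *dir;  /* directory to test in, e.g. an NFS4 mount (assumed) */
    int id;           /* thread index, used to give each thread its own file */
    int iters;        /* number of open/lock/close cycles */
    int failures;     /* count of failed open or F_SETLK calls */
};

/* Each thread locks only its own file, so no lock should ever conflict. */
static void *lock_loop(void *arg)
{
    struct worker *w = arg;
    char path[1024];

    snprintf(path, sizeof(path), "%s/locktest.%ld.%d",
        w->dir, (long)getpid(), w->id);
    for (int i = 0; i < w->iters; i++) {
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0) {
            w->failures++;
            continue;
        }
        struct flock fl;
        memset(&fl, 0, sizeof(fl));
        fl.l_type = F_WRLCK;     /* exclusive lock */
        fl.l_whence = SEEK_SET;
        fl.l_start = 0;
        fl.l_len = 0;            /* 0 means "to end of file" */
        if (fcntl(fd, F_SETLK, &fl) == -1) {
            fprintf(stderr, "thread %d iter %d: F_SETLK errno %d\n",
                w->id, i, errno);
            w->failures++;
        }
        close(fd);               /* closing the descriptor drops the lock */
    }
    unlink(path);
    return NULL;
}

/* Returns the total number of failures, or -1 on bad arguments. */
int run_lock_stress(const char *dir, int nthreads, int iters)
{
    pthread_t tids[MAX_THREADS];
    struct worker workers[MAX_THREADS];
    int started = 0, total = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (int i = 0; i < nthreads; i++) {
        workers[i] = (struct worker){ dir, i, iters, 0 };
        if (pthread_create(&tids[i], NULL, lock_loop, &workers[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += workers[i].failures;
    }
    /* Threads that could not be started are counted as failures. */
    return total + (nthreads - started);
}

#ifdef LOCK_STRESS_MAIN
int main(int argc, char **argv)
{
    const char *dir = argc > 1 ? argv[1] : ".";
    int failures = run_lock_stress(dir, 16, 100000);

    printf("failures: %d\n", failures);
    return failures == 0 ? 0 : 1;
}
#endif
```

Built with something like cc -pthread -DLOCK_STRESS_MAIN and pointed at a directory on the NFS4 mount, a nonzero failure count with errno 13 would match the trace; on a local filesystem the count is expected to stay at zero. Whether this loop is tight enough to hit the same window as MySQL is an open question.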

The higher the network latency, the more likely it is to happen.  With a latency of 200 us, it happens in seconds on a loaded server.  With a latency of 100 us, it happens in tens of seconds.  With a latency of 20 us it happens rarely, and below 15 us I have yet to see this failure.

No kernel messages are logged.  I have duplicated the problem on a variety of hardware, from 28-core Supermicro motherboards with ECC memory and E5-2XXX v4 CPUs to laptops with i3, i5, or i7 CPUs.

The filesystem setup is as follows:

Server: ZFS on 11.2-PRERELEASE configured for very low latency (optimized SSDs and persistent write caches or sync=disabled).

The filesystem is either a base ZFS filesystem or a clone of a snapshot (for easy testing, it happens on either).

The client mounts the server filesystem via NFS4 and also runs 11.2-PRERELEASE.  Tested with 100 Mb, 1 Gb, 50 Gb, and 100 Gb NICs.