Created attachment 178896
Don't free unrecovered opens when recovery of them fails

When the NFSv4.1 client is recovering from a server crash/reboot and the recovery of opens fails, the code in current frees the unrecovered opens. Byte range locks cannot be kept, since a conflicting lock might have been acquired by a different client while the lock was not recovered. The server is not supposed to allow recovery of state that was not recovered during a previous recovery. However, since all opens for POSIX clients are Share_Deny_None, it seems safe to attempt to recover them again, in case the server allows this.

This patch changes the client recovery code so that it retains unrecovered opens, so that a subsequent recovery can attempt to recover them. Normally, this should not happen, but the AmazonEFS server crashes/reboots frequently, including while recovery is still being done.
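The behaviour described above can be illustrated with a minimal sketch. This is not the code in nfs_clstate.c: the `sketch_*` names, the structure layout and the reclaim callbacks are all hypothetical and only model the idea that a failed open is kept for a later recovery pass while its byte range locks are dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a client open; not the real structure. */
struct sketch_open {
	bool	needs_recovery;	/* open has not been reclaimed on the server yet */
	size_t	nlocks;		/* byte range locks held under this open */
};

/* Hypothetical outcome of one reclaim attempt against the server. */
enum sketch_result { SKETCH_OK, SKETCH_FAILED };

typedef enum sketch_result (*sketch_reclaim_fn)(const struct sketch_open *);

/* Stand-in for a server that goes away again while recovery is in progress. */
enum sketch_result
sketch_reclaim_fails(const struct sketch_open *op)
{
	(void)op;
	return (SKETCH_FAILED);
}

/* Stand-in for a server that accepts the reclaim. */
enum sketch_result
sketch_reclaim_ok(const struct sketch_open *op)
{
	(void)op;
	return (SKETCH_OK);
}

/*
 * Run one recovery pass over the opens that still need recovery.
 * Returns the number of opens left unrecovered after the pass.
 */
size_t
sketch_recover_opens(struct sketch_open *opens, size_t nopens,
    sketch_reclaim_fn reclaim)
{
	size_t i, left = 0;

	assert(opens != NULL || nopens == 0);
	for (i = 0; i < nopens; i++) {
		if (!opens[i].needs_recovery)
			continue;
		if (reclaim(&opens[i]) == SKETCH_OK) {
			opens[i].needs_recovery = false;
		} else {
			/*
			 * Keep the open so a later pass can retry it, but
			 * drop its byte range locks, since another client
			 * may have acquired a conflicting lock meanwhile.
			 */
			opens[i].nlocks = 0;
			left++;
		}
	}
	return (left);
}
```

In this sketch a caller would simply call `sketch_recover_opens()` again once the server is reachable; opens that failed in the first pass are still present and are retried, whereas their locks are gone for good.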
A commit references this bug:

Author: rmacklem
Date: Tue Apr 11 20:28:15 UTC 2017
New revision: 316717
URL: https://svnweb.freebsd.org/changeset/base/316717

Log:
  During a server crash recovery, fix the NFSv4.1 client for a NFSERR_BADSESSION
  during recovery.

  If the NFSv4.1 client gets a NFSv4.1 NFSERR_BADSESSION reply to an Open/Lock
  operation while recovering from the server crash/reboot, allow the opens
  to be retained for a subsequent recovery attempt. Since NFSv4.1 servers
  should only reply NFSERR_BADSESSION after a crash/reboot that has lost
  state, this case should almost never happen. However, for the AmazonEFS
  file service, this has been observed when the client does a fresh TCP
  connection for RPCs.

  Reported by:	cperciva
  Tested by:	cperciva
  PR:		216088
  MFC after:	2 weeks

Changes:
  head/sys/fs/nfsclient/nfs_clstate.c
Patch has been committed and MFC'd.