Bug 216088

Summary: Make the NFSv4.1 client keep unrecovered opens when recovery fails
Product: Base System
Component: kern
Reporter: Rick Macklem <rmacklem>
Assignee: Rick Macklem <rmacklem>
Status: Closed FIXED
Severity: Affects Some People
Keywords: patch
Priority: ---
Flags: rmacklem: mfc-stable11+
       rmacklem: mfc-stable10+
Version: CURRENT
Hardware: Any
OS: Any
Attachments:
  Description: Don't free unrecovered opens when recovery of them fails
  Flags: none

Description (Rick Macklem, 2017-01-14 23:07:44 UTC)
Created attachment 178896
Don't free unrecovered opens when recovery of them fails

When the NFSv4.1 client is recovering from a server crash/reboot and the
recovery of opens fails, the code in -current frees the unrecovered opens.

Byte-range locks cannot be kept, since a different client might have
acquired a conflicting lock while they were unrecovered. The server is not
supposed to allow recovery of a lock that was not recovered the previous time.

However, since all opens for POSIX clients are Share_Deny_None, it seems
safe to attempt to recover them again, in case the server allows this.

This patch changes the client recovery code so that it retains unrecovered
opens, so a subsequent recovery can attempt to recover them.

Normally, this should not happen, but the AmazonEFS server crashes/reboots
frequently, including while recovery is still being done.
Comment 1 (commit-hook, 2017-04-11 20:28:28 UTC)
A commit references this bug:

Author: rmacklem
Date: Tue Apr 11 20:28:15 UTC 2017
New revision: 316717
URL: https://svnweb.freebsd.org/changeset/base/316717

Log:
  Fix the NFSv4.1 client's handling of an NFSERR_BADSESSION reply during
  server crash recovery.

  If the NFSv4.1 client gets an NFSv4.1 NFSERR_BADSESSION reply to an Open/Lock
  operation while recovering from the server crash/reboot, allow the opens
  to be retained for a subsequent recovery attempt. Since NFSv4.1 servers
  should only reply NFSERR_BADSESSION after a crash/reboot that has lost
  state, this case should almost never happen.
  However, for the AmazonEFS file service, this has been observed when
  the client does a fresh TCP connection for RPCs.

  Reported by:	cperciva
  Tested by:	cperciva
  PR:		216088
  MFC after:	2 weeks

Changes:
  head/sys/fs/nfsclient/nfs_clstate.c
Comment 2 (Rick Macklem, 2017-04-26 23:37:02 UTC)
Patch has been committed and MFC'd (merged to stable/11 and stable/10).