Bug 260011 - NFS: Unresponsive mount on AWS EFS
Summary: NFS: Unresponsive mount on AWS EFS
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.0-RELEASE
Hardware: Any Any
: --- Affects Many People
Assignee: Rick Macklem
URL:
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2021-11-24 09:08 UTC by Alex Dupre
Modified: 2022-09-23 21:08 UTC (History)
3 users (show)

See Also:
rmacklem: mfc-stable13+
rmacklem: mfc-stable12+


Attachments
ps axHL (92.04 KB, text/plain)
2022-05-20 10:44 UTC, Alex Dupre
no flags Details
procstat -kk (16.37 KB, text/plain)
2022-05-20 10:45 UTC, Alex Dupre
no flags Details
netstat -a (14.06 KB, text/plain)
2022-05-20 10:45 UTC, Alex Dupre
no flags Details
nfsstat -E -c (2.29 KB, text/plain)
2022-05-20 10:46 UTC, Alex Dupre
no flags Details
mark session slots bad and set session defunct when all slots bad (5.93 KB, patch)
2022-05-22 00:39 UTC, Rick Macklem
no flags Details | Diff
mark session slots bad and set session defunct when all slots bad (5.94 KB, patch)
2022-05-23 02:01 UTC, Rick Macklem
no flags Details | Diff
do not mark recovery needed after nfsrpc_renew() for BADSESSION (1.12 KB, patch)
2022-05-23 21:27 UTC, Rick Macklem
no flags Details | Diff
handle bogus slot# replies for the Sequence op (44.43 KB, patch)
2022-05-27 00:04 UTC, Rick Macklem
no flags Details | Diff
handle bogus slot# replies for the Sequence op (43.54 KB, patch)
2022-05-27 01:36 UTC, Rick Macklem
no flags Details | Diff
Do not mark recovery for NFSERR_BADSESSION (1.48 KB, patch)
2022-07-11 01:08 UTC, Rick Macklem
no flags Details | Diff
Mark bad slots and DestroySession when all bad (stable/13) (48.59 KB, patch)
2022-07-11 01:13 UTC, Rick Macklem
no flags Details | Diff
Mark bad slots and DestroySession when all bad (stable/13) (48.99 KB, patch)
2022-07-12 01:25 UTC, Rick Macklem
no flags Details | Diff
fix nd_slotid during recovery from a NFSERR_BADSESSION server reply (896 bytes, patch)
2022-08-25 23:49 UTC, Rick Macklem
no flags Details | Diff
fix nd_slotid during recovery from a NFSERR_BADSESSION server reply (874 bytes, patch)
2022-08-27 22:49 UTC, Rick Macklem
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Dupre freebsd_committer freebsd_triage 2021-11-24 09:08:04 UTC
I'm experiencing annoying issues with an AWS EFS mountpoint on FreeBSD 13 EC2 instances. The filesystem is mounted by 3 instances (2 with the same access patterns, 1 with a different one)

Initially I had the /etc/fstab entry configured with: 

`rw,nosuid,noatime,bg,nfsv4,minorversion=1,rsize=1048576,wsize=1048576,timeo=600,oneopenown`

and this after a few days led my java application to have all threads blocked on never returning `stat64` kernel calls, without the ability to even kill -9 the process.

After digging it up it seems the normal behavior for hard mount points, even if I fail to understand why one should prefer to have the system completely freezed when the NFS mount point is not responding.

So I later changed the configuration with:

`rw,nosuid,noatime,bg,nfsv4,minorversion=1,intr,soft,retrans=2,rsize=1048576,wsize=1048576,timeo=600,oneopenown`

by adding `intr,soft,retrans=2`.

Btw, I think there is a typo in mount_nfs(8), it says to set `retrycnt` instead of `retrans` for the `soft` option, can you confirm?

After the change `nfsstat -m` reports:
`nfsv4,minorversion=1,oneopenown,tcp,resvport,soft,intr,cto,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=16777216,timeout=120,retrans=2`

I wonder why it seems that the timeo,rsize,wsize have been ignored, but this is irrelevant to the issue.

After a few days the application on the two similar EC2 instances stopped working again, though. Any command accessing the mounted efs filesystem didn't complete in reasonable time (ls, df, umount, etc.), but I could kill the processes. The only way to recover the situation was to reboot the instances, though.

On one of them I've seen the following kernel messages, but they have been generated only when I tried to debug the issue hours later, and only on one EC2 instance, so I'm not sure if they are relevant or helpful:

```
kernel: newnfs: server 'fs-xxx.efs.us-east-1.amazonaws.com' error: fileid changed. fsid 0:0: expected fileid 0x4d2369b89a58a920, got 0x2. (BROKEN NFS SERVER OR MIDDLEWARE)
kernel: nfs server fs-xxx.efs.us-east-1.amazonaws.com:/: not responding
```

The third EC2 instance survived and was still able to access the filesystem, but I think it wasn't accessing the filesystem when there have been the network/nfs issue  that affected the two others.
Comment 1 Rick Macklem freebsd_committer freebsd_triage 2021-11-24 15:44:42 UTC
Assorted comments:
- FreeBSD 13.0 shipped with a bug in the TCP stack which could
  result in a missed socket receive upcall.
  --> This normally causes problems for the server, but could
      cause problems for the client side as well.
  When you do a "netstat -a" on the hung system, the TCP connection
  for the mount is shown as ESTABLISHED with Recv-Q non-zero.
  --> The fix is tp upgrade to stable/13.
  There is more info on this in PR#254590.

- AWS uses a small, fixed number of open_owners, which is why
  "oneopenown" is needed. If it also uses a small, fixed number
  of lock_owners (for byte range locking instead of Windows Open locks),
  then you could run out of these.
  --> If so, you are SOL (Shit Outa Luck) and my only suggestion
      would be to try an nfsv3 mount with the "nolockd" mount option.
  "netstat -E -c" should show you how many lock_owners are allocated
  at the time of the hang.

Other than that, if AWS has now added support for delegations, there
could be assorted breakage. Not running the nfscbd(8) daemon should
avoid isuuing of delegations, if that is the case.

If a soft mount fails a syscall, then the session slot is screwed up
and this makes the mount fail in weird ways.
"umount -N <mnt_path>" is a much better way to deal with hung mounts.

As a starting point, posting the output of:
ps axHl
procstat -kk
netstat -a
nfsstat -E -c
on the client when hung will give us more information.

Good luck with it, rick
Comment 2 Colin Percival freebsd_committer freebsd_triage 2021-11-24 16:46:17 UTC
No ideas about the hanging, unless it's networking related, but...

> expected fileid 0x4d2369b89a58a920, got 0x2

This looks suspiciously like ENOENT is getting turned into a fileid value somewhere.
Comment 3 Rick Macklem freebsd_committer freebsd_triage 2021-11-24 22:45:10 UTC
Or the server has followed the *unix* tradition of
using fileno == 2 for the root directory of a file
system, but did not indicate that a server file system
boundary was crossed by changing the FSID (file system ID),
which is an attribute that defines which server file
system is which.

I have no idea if Amazon's EFSI uses multiple file systems?
Comment 4 Alex Dupre freebsd_committer freebsd_triage 2021-11-25 10:15:20 UTC
Thanks for the debug suggestions, I'll run those commands next time it happens and report here.

For the records, I'm not running any additional NFS daemon and haven't anything NFS-specific in rc.conf, it's just a plain mount, and it's not a heavily accessed file system either.

I see you recommend to use `hard` mounts. I tried to use the `soft` mounts to avoid the infinite hanging and inability to kill, hoping that would help in recovering, but from what I've understood now even the hard mount point should recover when the NFS server comes back, so the problem was really a different one and is affecting both mount types.

Any idea why a few arguments seems to have been ignored? Does it make sense to set higher rsize/wsize on tcp endpoints? I see that the recent efs automounter doesn't use any of them, so probably it is not worth.
Comment 5 Rick Macklem freebsd_committer freebsd_triage 2021-11-25 16:41:04 UTC
Mount options are "negotiated" with the NFS server and
other tunables in the system.
For example, to increase rsize/wsize to 128K, you must
set vfs.maxbcachebuf=131072 in /boot/loader.conf.

To increase rsize/wsize to 1Mbyte, you must
set vfs,maxbcachebuf=1048576 in /boot/loader.conf
and set kern.ipc.maxsockbuf=4737024 (or larger)
in /etc/sysctl.conf.
--> This assumes you have at least 4Gbytes of ram on the
    system.  The further you move away from defaults,
    the less widely tested your configuration is.
Also, in the case rszie/wsize, the system will use the
largest size that is "negotiable" given other tuning.
The use of the rsize/wsize options is mainly to reduce
the size below the maximum negotiable.
--> From my limited testing, sizes above 256K do not
    perform better, but what works best for EFS?
    I have no idea.

If a server restarts, clients should recover. If a client
is hung like you describe, either due to an unresponsive server,
a broken server (that generates bogus replies or no replies to
certain RPCs) or a client bug:
# umount -N <mnt_path>
is your best bet at getting rid of the mount.
Comment 6 Rick Macklem freebsd_committer freebsd_triage 2021-11-25 17:13:59 UTC
Oh, and my comment 2.r.t. don't run the nfscbd may
not be good advice. Here's a more complete answer:

The nfscbd(8) provides callback handling when it
is running.  Callbacks need to be working for the
server to issue delegations or layouts (the latter is
pNFS only).

When the nfscbd(8) is not running, the server should
work fine and never issue delegations or layouts.
It should set a flag in the reply to the Sequence
operation (the first one in each compound RPC)
called SEQ4_STATUS_CB_PATH_DOWN. This is an FYI
for the client.

I found another round of bugs related to delegations
during a recent IETF NFSv4 testing event. These are
fixed in stable/13, but not 13.0.
--> As such, delegations are problematic and you don't
    want them being issued.
    --> Don't run the nfscbd(8) daemon.
Unfortunately Amazon does not attend these testing
events, so what their server does is ??? for me.

However, if it is known that the Amazon EFS never
issues delegations or layouts (I believe cpercival@
said that was the case three years ago), then the
server might be broken and get "perterbed" by the
callbacks not working.
--> In this case, you should run the nfscbd daemon
    by setting nfscbd_enable="YES" in your /etc/rc.conf
    or start it manually.

And, given the above, I think you can see why my initial
advice was just "don't run it".
Comment 7 Colin Percival freebsd_committer freebsd_triage 2021-11-25 18:24:04 UTC
FWIW, EFS is still documented as "does not support... Client delegation or callbacks of any type".
Comment 8 Alex Dupre freebsd_committer freebsd_triage 2022-05-20 10:44:34 UTC
Created attachment 234050 [details]
ps axHL
Comment 9 Alex Dupre freebsd_committer freebsd_triage 2022-05-20 10:45:09 UTC
Created attachment 234051 [details]
procstat -kk
Comment 10 Alex Dupre freebsd_committer freebsd_triage 2022-05-20 10:45:43 UTC
Created attachment 234052 [details]
netstat -a
Comment 11 Alex Dupre freebsd_committer freebsd_triage 2022-05-20 10:46:10 UTC
Created attachment 234053 [details]
nfsstat -E -c
Comment 12 Alex Dupre freebsd_committer freebsd_triage 2022-05-20 10:49:04 UTC
I happened again, I've attached the outputs of the requested programs.

While locked it was continuing to emit the following messages:

nfs server fs-xxx.efs.us-east-1.amazonaws.com:/: is alive again
nfs server fs-xxx.efs.us-east-1.amazonaws.com:/: is alive again
nfs server fs-xxx.efs.us-east-1.amazonaws.com:/: is alive again

I had to unmount and re-mount the EFS share to make it working again.
Comment 13 Rick Macklem freebsd_committer freebsd_triage 2022-05-22 00:39:57 UTC
Created attachment 234101 [details]
mark session slots bad and set session defunct when all slots bad

Ok, I looked at the attachments...
- There were 154 ExchangeIDs. Thats about 153 more than there
  should be. These should only occur once when doing the mount,
  plus after a server reboot.
  - Unfortunately the Amazon EFS has a fundamental design flaw,
    where a new TCP connection may connect to a different "cluster"
    (whatever Amazon considers a cluster to be) and this different
    cluster does not know any of the open/lock state and acts like
    a rebooted NFSv4 server.
    There may be other things that cause this.
To use it reliably, you need to avoid these ExchangeIDs (recovery cycles).
If you monitor the TCP connection to the server via repeated "netstat -a"
calls and you see the connection change (different client port#), then
that is causing the problem (because of the Amazon EFS design flaw).

Using "soft,intr" is asking for trouble, because an interrupted syscall
leaves the file system in a non-deterministic state. Something called
sessions maintains "exactly once" RPC semantics and this breaks the
session slot. Once all the session slots are broken, the client must
do one of these recoveries.
--> It is much better to use hard mounts and "umount -N" if/when a mount
    point is hung.

For this case, it is stuck partially through one of these recoveries,
because the "nfscl" thread that does the recovery is stuck waiting for
a session slot. I'm not sure how that can happen, since the session
would normally be marked "defunct" so that "nfscl" would not be waiting
for a slot in the session to become available.

I have attached this patch, which might help?
It does two things differently...
- It scans through all sessions looking for a match to mark defunct
  instead of just doing the first/current one. I cannot think how a
  new session would be created without the previous one being marked
  defunct, but since your "ps axHl" suggests that happens, this might
  fix the problem
- It keeps track of bad slots (caused by a soft,intr RPC failing without
  completing) and marks the session defunct when all slots are bad.
  This might make "soft,intr" mounts work better.

If you can try the patch and it improves the situation, it could be
considered for a FreeBSD commit. I doubt it will ever be committed
otherwise, because I have no way of reproducing what you are getting.
Comment 14 Rick Macklem freebsd_committer freebsd_triage 2022-05-23 02:01:26 UTC
Created attachment 234129 [details]
mark session slots bad and set session defunct when all slots bad

Fixed version of patch.
Comment 15 Rick Macklem freebsd_committer freebsd_triage 2022-05-23 21:27:35 UTC
Created attachment 234164 [details]
do not mark recovery needed after nfsrpc_renew() for BADSESSION

This is a much simpler patch that might fix the problem you saw.
I am not sure, but without this patch, the "NFSCLFLAGS_RECOVER"
would be set twice when nfsrpc_renew() gets a NFSERR_BADSESSION
error reply.  I still can't see how that would cause two recovery
cycles, but if it did, that would explain how a broken session
might not be marked "defunct".

The patch also adds a diagnostic printf() for the case where the
broken session is not marked "defunct".

The previous patch would not have been acceptable for a commit to
FreeBSD.

I will once more mention that "soft" mounts will never work correctly
for NFSv4.1.
Comment 16 Alex Dupre freebsd_committer freebsd_triage 2022-05-24 09:28:25 UTC
I've just built the custom kernel and removed "soft,intr" on one of the machines accessing the EFS filesystem. ExchangeId and CreateSess are both 1 now, let's see how it goes in the next days, thank you for the patch.

For the records, these are the values on the other machines with stock kernel:

Uptime 181 days:
   ExchangeId   CreateSess
          109          221

Uptime 4 days:
   ExchangeId   CreateSess
            4            7
Comment 17 Rick Macklem freebsd_committer freebsd_triage 2022-05-27 00:04:23 UTC
Created attachment 234241 [details]
handle bogus slot# replies for the Sequence op

cpercival@ emailed with some diagnostics (that I did not
realize were not in 13.0) which indicates that the Amazon
EFS server is pretty badly broken.
It sometimes (I don't know how frequently) returns the wrong
slotid for a session. (It is required by the RFC to be the same
as the request.)

Once this happens, there is no way to know which slot# the server
actually used.

This patch (which is rather large and, unfortunately, will not apply
to 13.0, but should apply to stable/13 and 13.1, I think?) marks both
of the slots (the one in the request and the one in the reply) bad,
so they will no longer be used.

When all slots get marked "bad", it does a DestroySession operation,
which should make subsequent uses of the session fail with
NFSERR_BADSESSION.
An NFSERR_BADSESSION reply should, in turn, start a recovery cycle
which should create a new session that can be used.

This patch has been tested against a hacked FreeBSD nfsd that replies
with a bogus slot# once every 100 RPCs and seems to work ok.

I have no idea if the Amazon EFS server will behave the same way, but
I am hoping cpercival@ will be able to test it.

I believe this serious bug in the Amazon EFS server would explain
your hangs.
Comment 18 Rick Macklem freebsd_committer freebsd_triage 2022-05-27 01:36:35 UTC
Created attachment 234243 [details]
handle bogus slot# replies for the Sequence op

Update the patch to one that applies to stable/13.
I will try and come up with a version that applies
to releng/13.0.

I can also provide a version that applies to main/FreeBSD14,
if anyone wants it.
Comment 19 Alex Dupre freebsd_committer freebsd_triage 2022-05-27 05:40:12 UTC
The machine where I've applied the patch has been already updated to 13.1-RELEASE, and I was going to do the same for the others, so a patch for 13.0 is not needed by me.

In these 3 days of testing of the previous patch nothing happened, the ExchangeIDs count remained the same on all (patched and not patched) machines and no kernel message appeared in /var/log/messages, I'll build a new kernel asap with the updated patch.
Comment 20 Rick Macklem freebsd_committer freebsd_triage 2022-05-27 13:49:46 UTC
Sounds good. cperciva@ is trying to contact Amazon,
to see if they can fix the bug that the 13.1 diagnostics
found.
Comment 21 commit-hook freebsd_committer freebsd_triage 2022-05-27 21:34:48 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a7bb120f8b8735bc4c417b3c8fc2308c3d2964aa

commit a7bb120f8b8735bc4c417b3c8fc2308c3d2964aa
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-05-27 21:32:46 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-05-27 21:32:46 +0000

    nfscl: Add a diagnostic printf() for a "should never happen" case

    When a NFSv4.1/4.2 session to the NFS server (not a pNFS DS) is
    replaced, the old session should always be marked defunct by
    nfsess_defunct being set non-zero.

    However, the hang reported by the PR suggests that this might
    be the case.

    This patch adds a printf() to indicate this has somehow happened.

    PR:     260011
    MFC after:      2 weeks

 sys/fs/nfsclient/nfs_clrpcops.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
Comment 22 Alex Dupre freebsd_committer freebsd_triage 2022-06-03 07:04:38 UTC
Today on the patched machine I got:

   ExchangeId   CreateSess
            3            5

The nfs client changed tcp port and I couldn't find any new kernel message in /var/log/messages
Comment 23 Rick Macklem freebsd_committer freebsd_triage 2022-06-03 23:47:24 UTC
Nothing surprising. Some TCP reconnects are happening
and I cannot guess why.

It is a known (I call it a design flaw, but they might
consider it a feature) that a new TCP connection can be
made to a different "cluster" (their term) resulting in
a recovery being triggered by the server's NFSERR_BADSESSION
reply.

The fact that there are no logged kernel printf()s indicates
that you are not observing the problem that cperciva@ emailed
me about and is exploring with Amazon.

Btw, during recovery there will be a:
create_session that fails
exchange_id that works
create_session that works
which is why the createsession count more than exchange_id.

Until it logs messages or hangs, we probably will not know more
about this.
Comment 24 commit-hook freebsd_committer freebsd_triage 2022-06-10 22:14:21 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=70efcaffa180d5b00c7b8eaca7a271693933a38c

commit 70efcaffa180d5b00c7b8eaca7a271693933a38c
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-05-27 21:32:46 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-06-10 22:12:50 +0000

    nfscl: Add a diagnostic printf() for a "should never happen" case

    When a NFSv4.1/4.2 session to the NFS server (not a pNFS DS) is
    replaced, the old session should always be marked defunct by
    nfsess_defunct being set non-zero.

    However, the hang reported by the PR suggests that this might
    be the case.

    This patch adds a printf() to indicate this has somehow happened.

    PR:     260011

    (cherry picked from commit a7bb120f8b8735bc4c417b3c8fc2308c3d2964aa)

 sys/fs/nfsclient/nfs_clrpcops.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
Comment 25 commit-hook freebsd_committer freebsd_triage 2022-06-10 22:24:23 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d40f62f89d6a5ab9fc9c1911c783beef9c80188d

commit d40f62f89d6a5ab9fc9c1911c783beef9c80188d
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-05-27 21:32:46 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-06-10 22:22:57 +0000

    nfscl: Add a diagnostic printf() for a "should never happen" case

    When a NFSv4.1/4.2 session to the NFS server (not a pNFS DS) is
    replaced, the old session should always be marked defunct by
    nfsess_defunct being set non-zero.

    However, the hang reported by the PR suggests that this might
    be the case.

    This patch adds a printf() to indicate this has somehow happened.

    PR:     260011

    (cherry picked from commit a7bb120f8b8735bc4c417b3c8fc2308c3d2964aa)

 sys/fs/nfsclient/nfs_clrpcops.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
Comment 26 commit-hook freebsd_committer freebsd_triage 2022-07-01 21:44:59 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c0d14b0220ae22d25462cef191f20e9f04c5e87e

commit c0d14b0220ae22d25462cef191f20e9f04c5e87e
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-01 21:43:17 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-01 21:43:17 +0000

    mount_nfs: Warn that intr, soft are not safe for NFSv4

    If the "intr" and/or "soft" mount options are used for
    NFSv4 mounts, the protocol can be broken when the
    operation returns without waiting for the RPC reply.
    The likelyhood of failure increases for NFSv4.1/4.2
    mounts, since the session slot will be broken when
    an RPC reply is not processed.

    This is mentioned in the BUGS section of "man mount_nfs",
    but more needs to be done.  This patch adds code that
    generates a warning message when the mount is done.

    PR: 260011
    Reviewed by:    emaste
    MFC after:      2 weeks
    Differential Revision:  https://reviews.freebsd.org/D35407

 sbin/mount_nfs/mount_nfs.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
Comment 27 commit-hook freebsd_committer freebsd_triage 2022-07-03 20:38:50 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=aba7a81ab71ebef1bb20404634bc3c58ba615310

commit aba7a81ab71ebef1bb20404634bc3c58ba615310
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-03 20:37:23 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-03 20:37:23 +0000

    mount_nfs.8: Update BUGS section for NFSv4.1/4.2

    If the "intr" and/or "soft" mount options are used for
    NFSv4 mounts, the protocol can be broken when the
    operation returns without waiting for the RPC reply.
    The likelyhood of failure increases for NFSv4.1/4.2
    mounts, since the session slot will be broken when
    an RPC reply is not processed.

    This is mentioned in the BUGS section of "man mount_nfs",
    but there was no specific mention of the session slot
    problem.  This patch adds a sentence for this case.

    PR: 260011
    Reviewed by:    gbe
    MFC after:      2 weeks
    Differential Revision:  https://reviews.freebsd.org/D35693

 sbin/mount_nfs/mount_nfs.8 | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
Comment 28 commit-hook freebsd_committer freebsd_triage 2022-07-08 14:39:35 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=be7b87de16ffbabb81989e13a4b19a178e3ab8ee

commit be7b87de16ffbabb81989e13a4b19a178e3ab8ee
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-08 14:37:36 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-08 14:37:36 +0000

    nfscl: Fix setting of nfsess_defunct for nfscl_hasexpired()

    Commit a7bb120f8b87 added a printf for the case where recovery
    has not marked the session defunct by setting nfsess_defunct
    to 1.  It turns out that nfscl_hasexpired() calls
    nfsrpc_setclient() directly, without setting nfsess_defunct.
    This patch replaces the printf with code that sets
    nfsess_defunct to 1 to handle this case.

    If SIGTERM is issued to a process when it is doing I/O on
    an "intr" mount, the NFSv4 server may reply NFSERR_BADSTATEID,
    due to the Open being prematurely closed.
    This can result in a call to nfscl_hasexpired() to do a
    recovery.

    This would explain at least one hang described in the PR.

    PR: 260011
    MFC after:      2 weeks

 sys/fs/nfsclient/nfs_clrpcops.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
Comment 29 commit-hook freebsd_committer freebsd_triage 2022-07-09 00:04:18 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=326bcf9394c74ac19afda7c741b3521d91b46b10

commit 326bcf9394c74ac19afda7c741b3521d91b46b10
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-08 23:58:06 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-08 23:58:06 +0000

    nfscl: Add a cred argument to nfscl_reqstart()

    To deal with broken session slots caused by the use of the
    "soft" and/or "intr" mount options, nfsv4_sequencelookup()
    will be modified to track the potentially broken session
    slots.  Then, when all session slots are potentially
    broken, do a DeleteSession operation, so that the NFSv4
    server will reply NFSERR_BADSESSION to uses of the session.
    These changes will be done in future commits.  However,
    to do the DeleteSession RPC, a "cred" argument is needed
    for nfscl_reqstart().  This patch adds this argument,
    which is unused at this time.  If the argument is NULL,
    it indicates that DeleteSession should not be done
    (usually because the RPC does not use sessions).

    This patch should not cause any semantics change.

    PR: 260011
    MFC after:      2 weeks

 sys/fs/nfs/nfs_commonsubs.c     |   2 +-
 sys/fs/nfs/nfs_var.h            |   3 +-
 sys/fs/nfs/nfscl.h              |   4 +-
 sys/fs/nfsclient/nfs_clrpcops.c | 178 +++++++++++++++++++++-------------------
 sys/fs/nfsserver/nfs_nfsdport.c |  16 ++--
 5 files changed, 108 insertions(+), 95 deletions(-)
Comment 30 commit-hook freebsd_committer freebsd_triage 2022-07-09 00:29:24 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2b766d5e5a73fce7120131d56bffc1715f71e7c9

commit 2b766d5e5a73fce7120131d56bffc1715f71e7c9
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-09 00:27:23 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-09 00:27:23 +0000

    nfscl: Change the cred argument to non-NULL for pNFS proxies

    Commit 326bcf9394c7 added a "cred" argument to nfscl_reqstart().
    For the pNFS proxy calls on the server, the argument
    should be "cred" instead of NULL.
    This patch fixes this.

    Since the argument is not yet used, this patch
    should not result in a semantics change.

    PR: 260011
    MFC after:      2 weeks

 sys/fs/nfsserver/nfs_nfsdport.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
Comment 31 commit-hook freebsd_committer freebsd_triage 2022-07-09 15:03:56 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=dff31ae1c59cab9437e88bfd0f2abd35ddaa98f1

commit dff31ae1c59cab9437e88bfd0f2abd35ddaa98f1
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-09 15:02:14 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-09 15:02:14 +0000

    nfscl: Move nfsrpc_destroysession into nfscommon

    This patch moves nfsrpc_destroysession() into nfscommon.ko
    and also modifies its arguments slightly.  This will allow
    the function to be called from nfsv4_sequencelookup() in
    a future commit.

    This patch should not result in a semantics change.

    PR: 260011
    MFC after:      2 weeks

 sys/fs/nfs/nfs_commonsubs.c     | 28 ++++++++++++++++++++++++++++
 sys/fs/nfs/nfs_var.h            |  4 ++--
 sys/fs/nfsclient/nfs_clrpcops.c | 28 ----------------------------
 sys/fs/nfsclient/nfs_clstate.c  |  6 +++---
 4 files changed, 33 insertions(+), 33 deletions(-)
Comment 32 commit-hook freebsd_committer freebsd_triage 2022-07-09 21:47:05 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=40ada74ee1dade637675cc22d50da8254046a197

commit 40ada74ee1dade637675cc22d50da8254046a197
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-09 21:43:16 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-09 21:43:16 +0000

    nfscl: Add optional support for slots marked bad

    This patch adds support for session slots marked bad
    to nfsv4_sequencelookup().  An additional boolean
    argument indicates if the check for slots marked bad
    should be done.

    The "cred" argument added to nfscl_reqstart() by
    commit 326bcf9394c7 is now passed into nfsv4_setquence()
    so that it can optionally set the boolean argument
    for nfsv4_sequencelookup().  When optionally enabled,
    nfsv4_setsequence() will do a DestroySession when all
    slots are marked bad.

    Since the code that marks slots bad is not yet committed,
    this patch should not result in a semantics change.

    PR: 260011
    MFC after:      2 weeks

 sys/fs/nfs/nfs_commonkrpc.c      |  2 +-
 sys/fs/nfs/nfs_commonsubs.c      | 69 ++++++++++++++++++++++++++++++----------
 sys/fs/nfs/nfs_var.h             |  4 +--
 sys/fs/nfs/nfsclstate.h          |  1 +
 sys/fs/nfsserver/nfs_nfsdstate.c |  4 +--
 5 files changed, 59 insertions(+), 21 deletions(-)
Comment 33 commit-hook freebsd_committer freebsd_triage 2022-07-09 23:14:22 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=627f1555f571b5328637dbfbe441ed89c84db20c

commit 627f1555f571b5328637dbfbe441ed89c84db20c
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-09 23:12:31 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-09 23:12:31 +0000

    nfscl: Initialize nfsess_badslots to zero

    Commit 40ada74ee1da added a field to mark bad session slots.
    This patch ensures that the field is initialized to 0.

    PR: 260011
    MFC after:      2 weeks

 sys/fs/nfsclient/nfs_clrpcops.c | 1 +
 1 file changed, 1 insertion(+)
Comment 34 commit-hook freebsd_committer freebsd_triage 2022-07-10 20:35:02 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=981ef32230b2fe46969c90b53864bdca4f1c3ae5

commit 981ef32230b2fe46969c90b53864bdca4f1c3ae5
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-10 20:33:19 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-10 20:33:19 +0000

    nfscl: Enable detection of bad session slots

    To deal with broken session slots caused by the use of the
    "soft" and/or "intr" mount options, nfsv4_sequencelookup()
    has been modified to track the potentially broken session
    slots (commit 40ada74ee1da).  Then, when all session slots
    are potentially broken, nfsv4_sequencelookup() does a
    DeleteSession operation, so that the NFSv4.1/4.2 server will
    reply NFSERR_BADSESSION to uses of the session.
    The client will then recover by doing a CreateSession to
    acquire a new session.

    This patch adds the code that marks potentially bad
    slots, so that the above semantics become functional.
    It has been successfully tested against a FreeBSD
    NFSv4.1/4.2 server, but does not work against a Linux 5.15
    NFSv4.1/4.2 server. (The Linux 5.15 server creates
    a new session with the same sessionid as the destroyed
    one and, as such, keeps returning NFSERR_BADSESSION.
    I believe this is a bug in the Linux server.)

    However, this should not cause a regression and will
    make "intr" mounts fairly usable against the NFSv4.1/4.2
    servers where it works.

    PR: 260011
    MFC after:      2 weeks

 sys/fs/nfs/nfs_commonkrpc.c | 53 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 44 insertions(+), 9 deletions(-)
Comment 35 commit-hook freebsd_committer freebsd_triage 2022-07-10 21:00:08 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8f4a5fc6bc6c4e9e0739326199d73ee4401ebd58

commit 8f4a5fc6bc6c4e9e0739326199d73ee4401ebd58
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-10 20:56:38 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-10 20:56:38 +0000

    nfscl: Do not call nfscl_hasexpired() for NFSv4.1/4.2

    Commit 981ef32230b2 enabled marking of potentially bad
    session slots when an RPC is interrupted if the "intr"
    mount option is used.  As such, it no longer makes
    sense to call nfscl_hasexpired() for I/O operations that
    reply NFSERR_BADSTATEID for NFSv4.1/4.2, which does a full
    recovery of NFSv4 open state, destroying all byte range locks.
    Recovery of open state should not be usually needed, since
    the session slot has been marked potentially bad and,
    although opens for the process that has been terminated via
    a signal may be broken, locks for other processes will still
    be valid.

    This patch disables calls to nfscl_hasexpired for NFSv4.1/4.2
    mounts, when I/O RPCs receive NFSERR_BADSTATEID replies.
    It does not affect the behaviour of NFSv4.0 mounts nor
    hard (non "intr") mounts.

    PR: 260011
    MFC after:      2 weeks

 sys/fs/nfsclient/nfs_clrpcops.c | 42 ++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)
Comment 36 Rick Macklem freebsd_committer freebsd_triage 2022-07-11 01:08:37 UTC
Created attachment 235179 [details]
Do not mark recovery for NFSERR_BADSESSION

Since NFSERR_BADSESSION triggers recovery elsewhere, do not do
it in these two cases, since that could cause a duplicate recovery.
Comment 37 Rick Macklem freebsd_committer freebsd_triage 2022-07-11 01:13:11 UTC
Created attachment 235180 [details]
Mark bad slots and DestroySession when all bad (stable/13)

When an RPC exits without processing an RPC reply
for NFSv4.1/4.2, the session slot may be broken,
since the sequence_number has not been maintained.

This patch marks those slots bad and, if all slot are
marked bad, it does a DestroySession, so that a new
session will be acquired in response to a NFSERR_BADSESSION
error reply from the server.

This does not work correctly against a Linux 5.15 knfsd server,
due to what I believe is a bug in the server, where the
new sessionid is the same as the old one.
Comment 38 Rick Macklem freebsd_committer freebsd_triage 2022-07-11 01:20:51 UTC
The two patches are in main and will be MFC'd.

They only affect mounts that use the "soft" and/or
"intr" mount options.

The second patch does not work against a Linux 5.15
knfsd server, due to what I believe is a bug.
The Linux server replies with the same sessionid
for a new session as the old one.

The second patch should apply to stable/13 now,
but will not be needed once the MFC happens in
one month (early Aug. 2022).

I will check to see if it applies to releng13.1.
Comment 39 commit-hook freebsd_committer freebsd_triage 2022-07-11 22:59:27 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2adb30740b12d4b80b8a1eb04b58ce0f6eb51de1

commit 2adb30740b12d4b80b8a1eb04b58ce0f6eb51de1
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-11 22:51:27 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-11 22:58:07 +0000

    nfscl: Replace "cred" with NULL to cleanup code

    Commit 326bcf9394c7 added a new "cred" argument to nfscl_reqstart().
    Fsinfo is a NFSv3 RPC and since the "cred" argument is not
    used for NFSv3, it does not matter what is passed in.
    However, to be consistent with the rest of the patch, change the
    argument to NULL.

    This patch should not result in a semantics change.

    PR: 260011
    MFC after:      2 weeks

 sys/fs/nfsclient/nfs_clrpcops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 40 commit-hook freebsd_committer freebsd_triage 2022-07-11 23:53:36 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d4a11b3e3bddb1fa3bdd101e12aea6f5937356fa

commit d4a11b3e3bddb1fa3bdd101e12aea6f5937356fa
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-11 23:50:34 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-11 23:50:34 +0000

    nfscl: Fix CreateSession for an established ClientID

    Commit 981ef32230b2 added optional use of the session
    slots marked bad to recover a new session when all
    slots are marked bad.  The recovery worked against
    a FreeBSD NFSv4.1/4.2 server, but not a Linux one.
    It turns out that it was a bug in the FreeBSD client
    and not the Linux server.

    This patch fixes the client so that DeleteSession
    followed by CreateSession after receiving a
    NFSERR_BADSESSION error reply works against the
    Linux server (and conforms to the RFC).

    This also implies that the FreeBSD NFSv4.1/4.2
    server needs to be fixed in a future commit.
    Without the fix, the FreeBSD server does a full
    recovery, including creation of a new ClientID,
    but since "intr" mounts were broken, this does
    not result in a regression.

    This patch only affects the case where a CreateSession
    is done for an already confirmed ClientID, which was
    not being done prior to commit 981ef32230b2.

    PR: 260011
    MFC after:      2 weeks

 sys/fs/nfsclient/nfs_clrpcops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 41 Rick Macklem freebsd_committer freebsd_triage 2022-07-12 01:25:09 UTC
Created attachment 235200 [details]
Mark bad slots and DestroySession when all bad (stable/13)

This version has a 1 line fix which corrects handling
of the CreateSession sequenceid# so that it works
correctly against a Linux 5.15 knfsd NFSv4.1/4.2 server.
Comment 42 commit-hook freebsd_committer freebsd_triage 2022-07-13 23:30:46 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=088ba4356a114416b1d68bf5ae3b3e0accf6d0df

commit 088ba4356a114416b1d68bf5ae3b3e0accf6d0df
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-13 23:28:56 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-13 23:28:56 +0000

    nfsd: Fix CreateSession for an established ClientID

    I mis-read the RFC w.r.t. handling of the sequenceid
    when a CreateSession is done after the initial one
    that confirms the ClientID.  Fortunately this does
    not affect most extant NFSv4.1/4.2 clients, since
    they only acquire a single session for TCP for a
    ClientID (Solaris might be an exception?).

    This patch fixes the server to handle this case,
    where the RFC requires the sequenceid be incremented
    for each CreateSession and is required to reply to
    a retried CreateSession with a cached reply.
    It adds a field to nfsclient called lc_prevsess,
    which caches the sessionid, which is the only field
    in a CreateSession reply that will change for a
    retry, to implement this reply cache.

    The recent commits up to d4a11b3e3bdd that mark
    session slots bad when "intr" and/or "soft" mounts
    are used by the client needs this server patch.
    Without this patch, the client will do a full
    recovery, including a new ClientID, losing all
    byte range locks.  However, prior to the recent
    client commits, the client would hang when all
    session slots were bad, so even without this
    patch it is not a regression.

    PR: 260011
    MFC after:      2 weeks

 sys/fs/nfs/nfsrvstate.h          |  1 +
 sys/fs/nfsserver/nfs_nfsdstate.c | 63 ++++++++++++++++++++++++++--------------
 2 files changed, 43 insertions(+), 21 deletions(-)
Comment 43 commit-hook freebsd_committer freebsd_triage 2022-07-15 01:50:14 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=02915e5ff91f45bf7454c35a24a54bcda83d47ac

commit 02915e5ff91f45bf7454c35a24a54bcda83d47ac
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-01 21:43:17 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-15 01:45:43 +0000

    mount_nfs: Warn that intr, soft are not safe for NFSv4

    If the "intr" and/or "soft" mount options are used for
    NFSv4 mounts, the protocol can be broken when the
    operation returns without waiting for the RPC reply.
    The likelyhood of failure increases for NFSv4.1/4.2
    mounts, since the session slot will be broken when
    an RPC reply is not processed.

    This is mentioned in the BUGS section of "man mount_nfs",
    but more needs to be done.  This patch adds code that
    generates a warning message when the mount is done.

    PR: 260011
    (cherry picked from commit c0d14b0220ae22d25462cef191f20e9f04c5e87e)

 sbin/mount_nfs/mount_nfs.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)
Comment 44 commit-hook freebsd_committer freebsd_triage 2022-07-18 00:07:49 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9fe17b520a4a0dfd79d31a5d271705b214f474fc

commit 9fe17b520a4a0dfd79d31a5d271705b214f474fc
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-03 20:37:23 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-18 00:04:33 +0000

    mount_nfs.8: Update BUGS section for NFSv4.1/4.2

    If the "intr" and/or "soft" mount options are used for
    NFSv4 mounts, the protocol can be broken when the
    operation returns without waiting for the RPC reply.
    The likelyhood of failure increases for NFSv4.1/4.2
    mounts, since the session slot will be broken when
    an RPC reply is not processed.

    This is mentioned in the BUGS section of "man mount_nfs",
    but there was no specific mention of the session slot
    problem.  This patch adds a sentence for this case.

    PR: 260011

    (cherry picked from commit aba7a81ab71ebef1bb20404634bc3c58ba615310)

 sbin/mount_nfs/mount_nfs.8 | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
Comment 45 commit-hook freebsd_committer freebsd_triage 2022-07-18 00:12:52 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=775904f5b4ec82af9f8b536eaebe6876d2e1f412

commit 775904f5b4ec82af9f8b536eaebe6876d2e1f412
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-03 20:37:23 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-18 00:08:54 +0000

    mount_nfs.8: Update BUGS section for NFSv4.1/4.2

    If the "intr" and/or "soft" mount options are used for
    NFSv4 mounts, the protocol can be broken when the
    operation returns without waiting for the RPC reply.
    The likelyhood of failure increases for NFSv4.1/4.2
    mounts, since the session slot will be broken when
    an RPC reply is not processed.

    This is mentioned in the BUGS section of "man mount_nfs",
    but there was no specific mention of the session slot
    problem.  This patch adds a sentence for this case.

    PR: 260011

    (cherry picked from commit aba7a81ab71ebef1bb20404634bc3c58ba615310)

 sbin/mount_nfs/mount_nfs.8 | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
Comment 46 commit-hook freebsd_committer freebsd_triage 2022-07-23 20:42:34 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f115e0e9e3198da0f21a0cb32bee966a27181c8d

commit f115e0e9e3198da0f21a0cb32bee966a27181c8d
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-08 14:37:36 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-23 20:38:21 +0000

    nfscl: Fix setting of nfsess_defunct for nfscl_hasexpired()

    Commit a7bb120f8b87 added a printf for the case where recovery
    has not marked the session defunct by setting nfsess_defunct
    to 1.  It turns out that nfscl_hasexpired() calls
    nfsrpc_setclient() directly, without setting nfsess_defunct.
    This patch replaces the printf with code that sets
    nfsess_defunct to 1 to handle this case.

    If SIGTERM is issued to a process when it is doing I/O on
    an "intr" mount, the NFSv4 server may reply NFSERR_BADSTATEID,
    due to the Open being prematurely closed.
    This can result in a call to nfscl_hasexpired() to do a
    recovery.

    This would explain at least one hang described in the PR.

    PR: 260011
    MFC after:      2 weeks

    (cherry picked from commit be7b87de16ffbabb81989e13a4b19a178e3ab8ee)

 sys/fs/nfsclient/nfs_clrpcops.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
Comment 47 commit-hook freebsd_committer freebsd_triage 2022-07-23 20:54:37 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c6e56b2a89ea54b4861a622772b86d04076499f3

commit c6e56b2a89ea54b4861a622772b86d04076499f3
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-08 14:37:36 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-23 20:50:38 +0000

    nfscl: Fix setting of nfsess_defunct for nfscl_hasexpired()

    Commit a7bb120f8b87 added a printf for the case where recovery
    has not marked the session defunct by setting nfsess_defunct
    to 1.  It turns out that nfscl_hasexpired() calls
    nfsrpc_setclient() directly, without setting nfsess_defunct.
    This patch replaces the printf with code that sets
    nfsess_defunct to 1 to handle this case.

    If SIGTERM is issued to a process when it is doing I/O on
    an "intr" mount, the NFSv4 server may reply NFSERR_BADSTATEID,
    due to the Open being prematurely closed.
    This can result in a call to nfscl_hasexpired() to do a
    recovery.

    This would explain at least one hang described in the PR.

    PR: 260011
    MFC after:      2 weeks

    (cherry picked from commit be7b87de16ffbabb81989e13a4b19a178e3ab8ee)

 sys/fs/nfsclient/nfs_clrpcops.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
Comment 48 commit-hook freebsd_committer freebsd_triage 2022-07-24 19:51:31 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d0ab875c5f663d0beaa262bf3349d4d2238f505a

commit d0ab875c5f663d0beaa262bf3349d4d2238f505a
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-08 23:58:06 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-24 19:41:57 +0000

    nfscl: Add a cred argument to nfscl_reqstart()

    To deal with broken session slots caused by the use of the
    "soft" and/or "intr" mount options, nfsv4_sequencelookup()
    will be modified to track the potentially broken session
    slots.  Then, when all session slots are potentially
    broken, do a DeleteSession operation, so that the NFSv4
    server will reply NFSERR_BADSESSION to uses of the session.
    These changes will be done in future commits.  However,
    to do the DeleteSession RPC, a "cred" argument is needed
    for nfscl_reqstart().  This patch adds this argument,
    which is unused at this time.  If the argument is NULL,
    it indicates that DeleteSession should not be done
    (usually because the RPC does not use sessions).

    This patch should not cause any semantics change.

    PR: 260011
    MFC after:      2 weeks

    (cherry picked from commit 326bcf9394c74ac19afda7c741b3521d91b46b10)

 sys/fs/nfs/nfs_commonsubs.c     |   2 +-
 sys/fs/nfs/nfs_var.h            |   3 +-
 sys/fs/nfs/nfscl.h              |   4 +-
 sys/fs/nfsclient/nfs_clrpcops.c | 174 +++++++++++++++++++++-------------------
 sys/fs/nfsserver/nfs_nfsdport.c |  14 ++--
 5 files changed, 105 insertions(+), 92 deletions(-)
Comment 49 commit-hook freebsd_committer freebsd_triage 2022-07-25 21:03:30 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=516998709ade4cf551660fa572ed7c5aa0a7fcff

commit 516998709ade4cf551660fa572ed7c5aa0a7fcff
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-09 00:27:23 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-25 20:57:54 +0000

    nfscl: Change the cred argument to non-NULL for pNFS proxies

    Commit 326bcf9394c7 added a "cred" argument to nfscl_reqstart().
    For the pNFS proxy calls on the server, the argument
    should be "cred" instead of NULL.
    This patch fixes this.

    Since the argument is not yet used, this patch
    should not result in a semantics change.

    PR: 260011
    (cherry picked from commit 2b766d5e5a73fce7120131d56bffc1715f71e7c9)

 sys/fs/nfsserver/nfs_nfsdport.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)
Comment 50 commit-hook freebsd_committer freebsd_triage 2022-07-25 21:09:32 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=488f9d852787dd03126d7fac8d76316ecb86da84

commit 488f9d852787dd03126d7fac8d76316ecb86da84
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-09 15:02:14 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-25 21:05:45 +0000

    nfscl: Move nfsrpc_destroysession into nfscommon

    This patch moves nfsrpc_destroysession() into nfscommon.ko
    and also modifies its arguments slightly.  This will allow
    the function to be called from nfsv4_sequencelookup() in
    a future commit.

    This patch should not result in a semantics change.

    PR: 260011
    (cherry picked from commit dff31ae1c59cab9437e88bfd0f2abd35ddaa98f1)

 sys/fs/nfs/nfs_commonsubs.c     | 28 ++++++++++++++++++++++++++++
 sys/fs/nfs/nfs_var.h            |  4 ++--
 sys/fs/nfsclient/nfs_clrpcops.c | 28 ----------------------------
 sys/fs/nfsclient/nfs_clstate.c  |  6 +++---
 4 files changed, 33 insertions(+), 33 deletions(-)
Comment 51 commit-hook freebsd_committer freebsd_triage 2022-07-27 20:15:19 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=39f07893a5f6e12d6b3484de842d450ef6934904

commit 39f07893a5f6e12d6b3484de842d450ef6934904
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-09 21:43:16 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-27 20:10:02 +0000

    nfscl: Add optional support for slots marked bad

    This patch adds support for session slots marked bad
    to nfsv4_sequencelookup().  An additional boolean
    argument indicates if the check for slots marked bad
    should be done.

    The "cred" argument added to nfscl_reqstart() by
    commit 326bcf9394c7 is now passed into nfsv4_setquence()
    so that it can optionally set the boolean argument
    for nfsv4_sequencelookup().  When optionally enabled,
    nfsv4_setsequence() will do a DestroySession when all
    slots are marked bad.

    Since the code that marks slots bad is not yet committed,
    this patch should not result in a semantics change.

    PR: 260011
    (cherry picked from commit 40ada74ee1dade637675cc22d50da8254046a197)

 sys/fs/nfs/nfs_commonkrpc.c      |  2 +-
 sys/fs/nfs/nfs_commonsubs.c      | 69 ++++++++++++++++++++++++++++++----------
 sys/fs/nfs/nfs_var.h             |  4 +--
 sys/fs/nfs/nfsclstate.h          |  1 +
 sys/fs/nfsserver/nfs_nfsdstate.c |  4 +--
 5 files changed, 59 insertions(+), 21 deletions(-)
Comment 52 commit-hook freebsd_committer freebsd_triage 2022-07-27 20:24:21 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=99bdbd808e3234e57709950cf0fd88d39d1edf22

commit 99bdbd808e3234e57709950cf0fd88d39d1edf22
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-09 23:12:31 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-27 20:20:17 +0000

    nfscl: Initialize nfsess_badslots to zero

    Commit 40ada74ee1da added a field to mark bad session slots.
    This patch ensures that the field is initialized to 0.

    PR: 260011
    (cherry picked from commit 627f1555f571b5328637dbfbe441ed89c84db20c)

 sys/fs/nfsclient/nfs_clrpcops.c | 1 +
 1 file changed, 1 insertion(+)
Comment 53 commit-hook freebsd_committer freebsd_triage 2022-07-27 20:29:23 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=fb8e858c69fe1773ce1e28efda17fa23304f58ad

commit fb8e858c69fe1773ce1e28efda17fa23304f58ad
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-11 22:51:27 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-27 20:25:54 +0000

    nfscl: Replace "cred" with NULL to cleanup code

    Commit 326bcf9394c7 added a new "cred" argument to nfscl_reqstart().
    Fsinfo is a NFSv3 RPC and since the "cred" argument is not
    used for NFSv3, it does not matter what is passed in.
    However, to be consistent with the rest of the patch, change the
    argument to NULL.

    This patch should not result in a semantics change.

    PR: 260011

    (cherry picked from commit 2adb30740b12d4b80b8a1eb04b58ce0f6eb51de1)

 sys/fs/nfsclient/nfs_clrpcops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 54 commit-hook freebsd_committer freebsd_triage 2022-07-28 20:16:26 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f5fb5e07df5bd7a6df170474a3df724ee3538785

commit f5fb5e07df5bd7a6df170474a3df724ee3538785
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-10 20:33:19 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-28 20:11:07 +0000

    nfscl: Enable detection of bad session slots

    To deal with broken session slots caused by the use of the
    "soft" and/or "intr" mount options, nfsv4_sequencelookup()
    has been modified to track the potentially broken session
    slots (commit 40ada74ee1da).  Then, when all session slots
    are potentially broken, nfsv4_sequencelookup() does a
    DeleteSession operation, so that the NFSv4.1/4.2 server will
    reply NFSERR_BADSESSION to uses of the session.
    The client will then recover by doing a CreateSession to
    acquire a new session.

    This patch adds the code that marks potentially bad
    slots, so that the above semantics become functional.
    It has been successfully tested against a FreeBSD
    NFSv4.1/4.2 server, but does not work against a Linux 5.15
    NFSv4.1/4.2 server. (The Linux 5.15 server creates
    a new session with the same sessionid as the destroyed
    one and, as such, keeps returning NFSERR_BADSESSION.
    I believe this is a bug in the Linux server.)

    However, this should not cause a regression and will
    make "intr" mounts fairly usable against the NFSv4.1/4.2
    servers where it works.

    PR: 260011
    (cherry picked from commit 981ef32230b2fe46969c90b53864bdca4f1c3ae5)

 sys/fs/nfs/nfs_commonkrpc.c | 53 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 44 insertions(+), 9 deletions(-)
Comment 55 commit-hook freebsd_committer freebsd_triage 2022-07-28 20:21:28 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=99ff1bd55733d8fd20f6b70922639fa25b434ce9

commit 99ff1bd55733d8fd20f6b70922639fa25b434ce9
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-10 20:56:38 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-28 20:16:11 +0000

    nfscl: Do not call nfscl_hasexpired() for NFSv4.1/4.2

    Commit 981ef32230b2 enabled marking of potentially bad
    session slots when an RPC is interrupted if the "intr"
    mount option is used.  As such, it no longer makes
    sense to call nfscl_hasexpired() for I/O operations that
    reply NFSERR_BADSTATEID for NFSv4.1/4.2, which does a full
    recovery of NFSv4 open state, destroying all byte range locks.
    Recovery of open state should not be usually needed, since
    the session slot has been marked potentially bad and,
    although opens for the process that has been terminated via
    a signal may be broken, locks for other processes will still
    be valid.

    This patch disables calls to nfscl_hasexpired for NFSv4.1/4.2
    mounts, when I/O RPCs receive NFSERR_BADSTATEID replies.
    It does not affect the behaviour of NFSv4.0 mounts nor
    hard (non "intr") mounts.

    PR: 260011
    (cherry picked from commit 8f4a5fc6bc6c4e9e0739326199d73ee4401ebd58)

 sys/fs/nfsclient/nfs_clrpcops.c | 36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)
Comment 56 commit-hook freebsd_committer freebsd_triage 2022-07-28 20:22:30 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9bbf57a2ad4666371d8de3cde0a9b8879e826590

commit 9bbf57a2ad4666371d8de3cde0a9b8879e826590
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-11 23:50:34 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-28 20:18:49 +0000

    nfscl: Fix CreateSession for an established ClientID

    Commit 981ef32230b2 added optional use of the session
    slots marked bad to recover a new session when all
    slots are marked bad.  The recovery worked against
    a FreeBSD NFSv4.1/4.2 server, but not a Linux one.
    It turns out that it was a bug in the FreeBSD client
    and not the Linux server.

    This patch fixes the client so that DeleteSession
    followed by CreateSession after receiving a
    NFSERR_BADSESSION error reply works against the
    Linux server (and conforms to the RFC).

    This also implies that the FreeBSD NFSv4.1/4.2
    server needs to be fixed in a future commit.
    Without the fix, the FreeBSD server does a full
    recovery, including creation of a new ClientID,
    but since "intr" mounts were broken, this does
    not result in a regression.

    This patch only affects the case where a CreateSession
    is done for an already confirmed ClientID, which was
    not being done prior to commit 981ef32230b2.

    PR: 260011
    (cherry picked from commit d4a11b3e3bddb1fa3bdd101e12aea6f5937356fa)

 sys/fs/nfsclient/nfs_clrpcops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 57 commit-hook freebsd_committer freebsd_triage 2022-07-28 20:27:31 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=db2eff1cff8c374f04030bb116c07e3112ecc516

commit db2eff1cff8c374f04030bb116c07e3112ecc516
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-07-13 23:28:56 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-07-28 20:20:33 +0000

    nfsd: Fix CreateSession for an established ClientID

    I mis-read the RFC w.r.t. handling of the sequenceid
    when a CreateSession is done after the initial one
    that confirms the ClientID.  Fortunately this does
    not affect most extant NFSv4.1/4.2 clients, since
    they only acquire a single session for TCP for a
    ClientID (Solaris might be an exception?).

    This patch fixes the server to handle this case,
    where the RFC requires the sequenceid be incremented
    for each CreateSession and is required to reply to
    a retried CreateSession with a cached reply.
    It adds a field to nfsclient called lc_prevsess,
    which caches the sessionid, which is the only field
    in a CreateSession reply that will change for a
    retry, to implement this reply cache.

    The recent commits up to d4a11b3e3bdd that mark
    session slots bad when "intr" and/or "soft" mounts
    are used by the client needs this server patch.
    Without this patch, the client will do a full
    recovery, including a new ClientID, losing all
    byte range locks.  However, prior to the recent
    client commits, the client would hang when all
    session slots were bad, so even without this
    patch it is not a regression.

    PR: 260011
    (cherry picked from commit 088ba4356a114416b1d68bf5ae3b3e0accf6d0df)

 sys/fs/nfs/nfsrvstate.h          |  1 +
 sys/fs/nfsserver/nfs_nfsdstate.c | 63 ++++++++++++++++++++++++++--------------
 2 files changed, 43 insertions(+), 21 deletions(-)
Comment 58 Rick Macklem freebsd_committer freebsd_triage 2022-07-28 20:34:42 UTC
The patches that help (but do not completely fix)
problems with using "soft" and/or "intr" mount
options for NFSv4 mounts have now been MFC'd to
stable/13.

As such, I am closing this PR.
If hangs occur for NFSv4 mounts using neither the
"soft" nor "intr" mount options occur, they should
be done as a separate PR, to avoid confusion.
Comment 59 Rick Macklem freebsd_committer freebsd_triage 2022-08-25 23:49:32 UTC
Created attachment 236118 [details]
fix nd_slotid during recovery from a NFSERR_BADSESSION server reply

This small patch fixes a problem where nd_slotid was
not being updated when the client did an RPC retry
after the server reported NFSERR_BADSESSION.
It might be related to this problem.

This patch will soon be in main, etc.
Comment 60 commit-hook freebsd_committer freebsd_triage 2022-08-26 03:36:32 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=2b612c9d3bb528551de9eaabbdbadae89a36ba8b

commit 2b612c9d3bb528551de9eaabbdbadae89a36ba8b
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-08-26 03:33:31 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-08-26 03:33:31 +0000

    nfscl: Fix handling of a bad session slot (NFSv4.1/4.2)

    When a session has been marked defunct by the server
    sending a NFSERR_BADSESSION reply to the NFSv4.1/4.2
    client, nfsv4_sequencelookup() returns NFSERR_BADSESSION
    without actually assigning a session slot.
    Without this patch, newnfs_request() would erroneously
    free slot 0.

    This could result in the slot being reused prematurely,
    but most likely just generated a "freeing free slot!!"
    console message.

    This patch fixes the code to not do the erroneous
    freeing of the slot for this case.

    PR:     260011
    MFC after:      1 week

 sys/fs/nfs/nfs_commonkrpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 61 commit-hook freebsd_committer freebsd_triage 2022-08-26 03:49:36 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=8e59ec29e47f6ec64f54ddd88cab388ae536f0ff

commit 8e59ec29e47f6ec64f54ddd88cab388ae536f0ff
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-08-26 03:48:04 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-08-26 03:48:04 +0000

    nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION

    When the NFSv4.1/4.2 client is handling a server error
    of NFSERR_BADSESSION, it retries RPCs with a new session.
    Without this patch, the nd_slotid was not being updated
    for the new session.

    This would result in a bogus console message like
    "Wrong session srvslot=X slot=Y" and then it would
    free the incorrect slot, often generating a
    "freeing free slot!!" console message as well.

    This patch fixes the problem.

    Note that FreeBSD NFSv4.1/4.2 servers only
    generate a NFSERR_BADSESSION error after a reboot
    or after a client does a DestroySession operation.

    PR:     260011
    MFC after:      1 week

 sys/fs/nfs/nfs_commonkrpc.c | 3 +++
 1 file changed, 3 insertions(+)
Comment 62 Rick Macklem freebsd_committer freebsd_triage 2022-08-27 22:49:41 UTC
Created attachment 236170 [details]
fix nd_slotid during recovery from a NFSERR_BADSESSION server reply

Replaces patch with one I believe is more correct.
Comment 63 Rick Macklem freebsd_committer freebsd_triage 2022-08-27 22:52:52 UTC
I have reopened the PR, since I now believe that
attachment#236170 [details] (lats in above list)
fixes the problem. It will be committed to
main and then MFC'd.

I will close the PR again after it is MFC'd.
Comment 64 commit-hook freebsd_committer freebsd_triage 2022-08-27 23:04:50 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=fb29f817586972444d65b1548287a51f27891639

commit fb29f817586972444d65b1548287a51f27891639
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-08-27 23:03:18 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-08-27 23:03:18 +0000

    nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION

    When the NFSv4.1/4.2 client is handling a server error
    of NFSERR_BADSESSION, it retries RPCs with a new session.
    Without this patch, the nd_slotid was not being updated
    for the new session.

    This would result in a bogus console message like
    "Wrong session srvslot=X slot=Y" and then it would
    free the incorrect slot, often generating a
    "freeing free slot!!" console message as well.

    This patch fixes the problem.

    Note that FreeBSD NFSv4.1/4.2 servers only
    generate a NFSERR_BADSESSION error after a reboot
    or after a client does a DestroySession operation.

    PR:     260011
    MFC after:      1 week

 sys/fs/nfs/nfs_commonkrpc.c | 2 ++
 1 file changed, 2 insertions(+)
Comment 65 commit-hook freebsd_committer freebsd_triage 2022-09-02 01:57:08 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a6f3979ad1923b7e742a6b4464917c9baea5b058

commit a6f3979ad1923b7e742a6b4464917c9baea5b058
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-08-27 23:03:18 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-09-02 01:52:59 +0000

    nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION

    When the NFSv4.1/4.2 client is handling a server error
    of NFSERR_BADSESSION, it retries RPCs with a new session.
    Without this patch, the nd_slotid was not being updated
    for the new session.

    This would result in a bogus console message like
    "Wrong session srvslot=X slot=Y" and then it would
    free the incorrect slot, often generating a
    "freeing free slot!!" console message as well.

    This patch fixes the problem.

    Note that FreeBSD NFSv4.1/4.2 servers only
    generate a NFSERR_BADSESSION error after a reboot
    or after a client does a DestroySession operation.

    PR:     260011
    (cherry picked from commit fb29f817586972444d65b1548287a51f27891639)

 sys/fs/nfs/nfs_commonkrpc.c | 2 ++
 1 file changed, 2 insertions(+)
Comment 66 commit-hook freebsd_committer freebsd_triage 2022-09-02 02:10:13 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=761033d3aace36eea1d82bbce86e06206b4cb17c

commit 761033d3aace36eea1d82bbce86e06206b4cb17c
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-08-26 03:33:31 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-09-02 02:07:05 +0000

    nfscl: Fix handling of a bad session slot (NFSv4.1/4.2)

    When a session has been marked defunct by the server
    sending a NFSERR_BADSESSION reply to the NFSv4.1/4.2
    client, nfsv4_sequencelookup() returns NFSERR_BADSESSION
    without actually assigning a session slot.
    Without this patch, newnfs_request() would erroneously
    free slot 0.

    This could result in the slot being reused prematurely,
    but most likely just generated a "freeing free slot!!"
    console message.

    This patch fixes the code to not do the erroneous
    freeing of the slot for this case.

    PR:     260011
    (cherry picked from commit 2b612c9d3bb528551de9eaabbdbadae89a36ba8b)

 sys/fs/nfs/nfs_commonkrpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 67 commit-hook freebsd_committer freebsd_triage 2022-09-02 02:16:14 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e1dd3e56878907f4321fa5456eb23914673dc930

commit e1dd3e56878907f4321fa5456eb23914673dc930
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-08-26 03:33:31 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-09-02 02:12:21 +0000

    nfscl: Fix handling of a bad session slot (NFSv4.1/4.2)

    When a session has been marked defunct by the server
    sending a NFSERR_BADSESSION reply to the NFSv4.1/4.2
    client, nfsv4_sequencelookup() returns NFSERR_BADSESSION
    without actually assigning a session slot.
    Without this patch, newnfs_request() would erroneously
    free slot 0.

    This could result in the slot being reused prematurely,
    but most likely just generated a "freeing free slot!!"
    console message.

    This patch fixes the code to not do the erroneous
    freeing of the slot for this case.

    PR:     260011
    (cherry picked from commit 2b612c9d3bb528551de9eaabbdbadae89a36ba8b)

 sys/fs/nfs/nfs_commonkrpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 68 commit-hook freebsd_committer freebsd_triage 2022-09-02 02:17:15 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=1dab52045cb26b945bd1737e2b40a52c6d4325cf

commit 1dab52045cb26b945bd1737e2b40a52c6d4325cf
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2022-08-27 23:03:18 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2022-09-02 02:13:52 +0000

    nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION

    When the NFSv4.1/4.2 client is handling a server error
    of NFSERR_BADSESSION, it retries RPCs with a new session.
    Without this patch, the nd_slotid was not being updated
    for the new session.

    This would result in a bogus console message like
    "Wrong session srvslot=X slot=Y" and then it would
    free the incorrect slot, often generating a
    "freeing free slot!!" console message as well.

    This patch fixes the problem.

    Note that FreeBSD NFSv4.1/4.2 servers only
    generate a NFSERR_BADSESSION error after a reboot
    or after a client does a DestroySession operation.

    PR:     260011
    (cherry picked from commit fb29f817586972444d65b1548287a51f27891639)

 sys/fs/nfs/nfs_commonkrpc.c | 2 ++
 1 file changed, 2 insertions(+)
Comment 69 Rick Macklem freebsd_committer freebsd_triage 2022-09-23 21:08:47 UTC
Since the patches have been MFC'd to stable/13
and cperciva@ reports that problems he was observing
related to NFSv4.1 sessions are no longer occurring
for a stable/13 client, I am closing this PR.

If anyone observes similar problems with stable/13
or 13.2 (when released) it can be re-opened.