During recent testing of a Linux client NFSv4.1 mount to a FreeBSD server, breakage of both client and server was observed after a network partitioning between them. The FreeBSD server did not reply to a retried RPC using the session's cached reply as it should. The Linux client sometimes advances the sequence# for the session slot by 2 instead of 1. The attached patches alleviate the above problems and should be applied to all NFS servers handling NFSv4 mounts. Fortunately, network partitioning should be a rare event and the patches are only needed when that happens.
Created attachment 223859 [details] fix NFSv4.1/4.2 server session for RPC retries This patch fixes the NFSv4 server so that it correctly sends a reply from the one cached in the session's slot when an RPC retry occurs. (RPC retries are rare for NFSv4, but can occur after a new TCP connection has been established for an NFv4.1/4.2 mount by the client.) Two things needed to be fixed: - don't set nd_repstat to NFSERR_IO when the pseudo error NFSERR_REPLYFROMCACHE is returned. - actually use the reply in "m".
Created attachment 223860 [details] cut the Linux client some slack w.r.t. session sequence# After a network partitioning is healed, some versions of Linux client advance the sequende# for the session slot by 2 instead of 1. This patch allows these cases to work. Although technically a violation of RFC5661, it seems harmless to do, since the NFS4ERR_SEQ_MISORDERED will still be generated if an "out of order" RPC is subsequently received, since it will have a sequence# less than what the server expects. When this goes into main, etc, I will enable it based on a sysctl, so that the server can optionally be RFC5661 conformant.
Created attachment 223861 [details] make the session's cached reply work for multiple retries of an RPC Having multiple retries of the same RPC should be extremely rare, since a correctly functioning client will create a new TCP connection for each of them. As such, the unpatched code assumed it would *never* happen. However it seems prudent to handle that case as far as possible. This patch adds m_copym(..M_NOWAIT) calls so that the session slot will retain the cached reply for a subsequent retry unless the m_copym() fails.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=9edaceca8165e2864267547311daf145bb520270 commit 9edaceca8165e2864267547311daf145bb520270 Author: Rick Macklem <rmacklem@FreeBSD.org> AuthorDate: 2021-04-11 23:51:25 +0000 Commit: Rick Macklem <rmacklem@FreeBSD.org> CommitDate: 2021-04-11 23:51:25 +0000 nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661 Recent testing of network partitioning a FreeBSD NFSv4.1 server from a Linux NFSv4.1 client identified problems with both the FreeBSD server and Linux client. Sometimes, after some Linux NFSv4.1/4.2 clients establish a new TCP connection, they will advance the sequence number for a session slot by 2 instead of 1. RFC5661 specifies that a server should reply NFS4ERR_SEQ_MISORDERED for this case. This might result in a system call error in the client and seems to disable future use of the slot by the client. Since advancing the sequence number by 2 seems harmless, allow this case if vfs.nfs.linuxseqsesshack is non-zero. Note that, if the order of RPCs is actually reversed, a subsequent RPC with a smaller sequence number value for the slot will be received. This will result in a NFS4ERR_SEQ_MISORDERED reply. This has not been observed during testing. Setting vfs.nfs.linuxseqsesshack to 0 will provide RFC5661 compliant behaviour. This fix affects the fairly rare case where a NFSv4 Linux client does a TCP reconnect and then apparently erroneously increments the sequence number for the session slot twice during the reconnect cycle. PR: 254816 MFC after: 2 weeks sys/fs/nfs/nfs_commonsubs.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)
Comment on attachment 223860 [details] cut the Linux client some slack w.r.t. session sequence# It turns out that the Linux client intentionally does an RPC of just Sequence with the seqid advanced by 2, to test the session slot for correct sequence#. As such the server should conform to RFC5661 and this patch is not recommended.
A commit in branch stable/12 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=14fc640c2e97885a82a7651b5ef912710e283d13 commit 14fc640c2e97885a82a7651b5ef912710e283d13 Author: Rick Macklem <rmacklem@FreeBSD.org> AuthorDate: 2021-04-10 22:50:25 +0000 Commit: Rick Macklem <rmacklem@FreeBSD.org> CommitDate: 2021-04-24 22:52:21 +0000 nfsd: fix replies from session cache for multiple retries Recent testing of network partitioning a FreeBSD NFSv4.1 server from a Linux NFSv4.1 client identified problems with both the FreeBSD server and Linux client. Commit 05a39c2c1c18 fixed replying with the cached reply in in the session slot if same session slot sequence#. However, the code uses the reply and, as such, will fail for a subsequent retry of the RPC. A subsequent retry would be an extremely rare event, but this patch fixes this, so long as m_copym(..M_NOWAIT) does not fail, which should also be a rare event. This fix affects the exceedingly rare case where a NFSv4 client retries a non-idempotent RPC, such as a lock operation, multiple times. Note that retries only occur after the client has needed to create a new TCP connection, with a new TCP connection for each retry. PR: 254816 (cherry picked from commit 22cefe3d8378f58adcdbb2c7589b9f30c2a38315) sys/fs/nfs/nfs_commonsubs.c | 34 ++++++++++++++++++++++++++++++---- 1 file changed, 30 insertions(+), 4 deletions(-)
The first and third patches have been MFC'd. The second one is not needed and has been abandoned. I am not going to close this PR until releases have the patches, so that it is easier to find.
^Triage: committed back in 2021.