Bug 254816

Summary: after network partitioning, handling retries of RPCs is broken
Product: Base System Reporter: Rick Macklem <rmacklem>
Component: kernAssignee: Rick Macklem <rmacklem>
Status: Closed FIXED    
Severity: Affects Some People CC: freqlabs
Priority: --- Flags: rmacklem: mfc-stable13+
rmacklem: mfc-stable12+
Version: Unspecified   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
fix NFSv4.1/4.2 server session for RPC retries
none
cut the Linux client some slack w.r.t. session sequence#
rmacklem: maintainer-approval-
make the session's cached reply work for multiple retries of an RPC none

Description Rick Macklem freebsd_committer freebsd_triage 2021-04-06 13:35:12 UTC

    
Comment 1 Rick Macklem freebsd_committer freebsd_triage 2021-04-06 13:43:01 UTC
During recent testing of a Linux client NFSv4.1
mount to a FreeBSD server, breakage of both
client and server was observed after a network
partitioning between them.

The FreeBSD server did not reply to a retried
RPC using the session's cached reply as it should.

The Linux client sometimes advances the sequence#
for the session slot by 2 instead of 1.

The attached patches alleviate the above problems
and should be applied to all NFS servers handling
NFSv4 mounts. Fortunately, network partitioning
should be a rare event and the patches are only
needed when that happens.
Comment 2 Rick Macklem freebsd_committer freebsd_triage 2021-04-06 13:50:31 UTC
Created attachment 223859 [details]
fix NFSv4.1/4.2 server session for RPC retries

This patch fixes the NFSv4 server so that it
correctly sends a reply from the one cached in the
session's slot when an RPC retry occurs.
(RPC retries are rare for NFSv4, but can
 occur after a new TCP connection has been established
 for an NFv4.1/4.2 mount by the client.)

Two things needed to be fixed:
- don't set nd_repstat to NFSERR_IO when the pseudo
  error NFSERR_REPLYFROMCACHE is returned.
- actually use the reply in "m".
Comment 3 Rick Macklem freebsd_committer freebsd_triage 2021-04-06 13:58:34 UTC
Created attachment 223860 [details]
cut the Linux client some slack w.r.t. session sequence#

After a network partitioning is healed, some
versions of Linux client advance the sequende#
for the session slot by 2 instead of 1.

This patch allows these cases to work.
Although technically a violation of RFC5661,
it seems harmless to do, since the
NFS4ERR_SEQ_MISORDERED will still be generated
if an "out of order" RPC is subsequently received,
since it will have a sequence# less than what
the server expects.

When this goes into main, etc, I will enable
it based on a sysctl, so that the server can
optionally be RFC5661 conformant.
Comment 4 Rick Macklem freebsd_committer freebsd_triage 2021-04-06 14:04:49 UTC
Created attachment 223861 [details]
make the session's cached reply work for multiple retries of an RPC

Having multiple retries of the same RPC should be
extremely rare, since a correctly functioning
client will create a new TCP connection for each
of them.
As such, the unpatched code assumed it would
*never* happen.

However it seems prudent to handle that case
as far as possible.

This patch adds m_copym(..M_NOWAIT) calls
so that the session slot will retain the
cached reply for a subsequent retry unless
the m_copym() fails.
Comment 5 commit-hook freebsd_committer freebsd_triage 2021-04-11 23:54:59 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9edaceca8165e2864267547311daf145bb520270

commit 9edaceca8165e2864267547311daf145bb520270
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2021-04-11 23:51:25 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2021-04-11 23:51:25 +0000

    nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661

    Recent testing of network partitioning a FreeBSD NFSv4.1
    server from a Linux NFSv4.1 client identified problems
    with both the FreeBSD server and Linux client.

    Sometimes, after some Linux NFSv4.1/4.2 clients establish
    a new TCP connection, they will advance the sequence number
    for a session slot by 2 instead of 1.
    RFC5661 specifies that a server should reply
    NFS4ERR_SEQ_MISORDERED for this case.
    This might result in a system call error in the client and
    seems to disable future use of the slot by the client.
    Since advancing the sequence number by 2 seems harmless,
    allow this case if vfs.nfs.linuxseqsesshack is non-zero.

    Note that, if the order of RPCs is actually reversed,
    a subsequent RPC with a smaller sequence number value
    for the slot will be received.  This will result in
    a NFS4ERR_SEQ_MISORDERED reply.
    This has not been observed during testing.
    Setting vfs.nfs.linuxseqsesshack to 0 will provide
    RFC5661 compliant behaviour.

    This fix affects the fairly rare case where a NFSv4
    Linux client does a TCP reconnect and then apparently
    erroneously increments the sequence number for the
    session slot twice during the reconnect cycle.

    PR:     254816
    MFC after:      2 weeks

 sys/fs/nfs/nfs_commonsubs.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
Comment 6 Rick Macklem freebsd_committer freebsd_triage 2021-04-14 20:27:02 UTC
Comment on attachment 223860 [details]
cut the Linux client some slack w.r.t. session sequence#

It turns out that the Linux client
intentionally does an RPC of just
Sequence with the seqid advanced by
2, to test the session slot for
correct sequence#.

As such the server should conform to
RFC5661 and this patch is not
recommended.
Comment 7 commit-hook freebsd_committer freebsd_triage 2021-04-24 22:57:19 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=14fc640c2e97885a82a7651b5ef912710e283d13

commit 14fc640c2e97885a82a7651b5ef912710e283d13
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2021-04-10 22:50:25 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2021-04-24 22:52:21 +0000

    nfsd: fix replies from session cache for multiple retries

    Recent testing of network partitioning a FreeBSD NFSv4.1
    server from a Linux NFSv4.1 client identified problems
    with both the FreeBSD server and Linux client.

    Commit 05a39c2c1c18 fixed replying with the cached reply in
    in the session slot if same session slot sequence#.
    However, the code uses the reply and, as such,
    will fail for a subsequent retry of the RPC.
    A subsequent retry would be an extremely rare event,
    but this patch fixes this, so long as m_copym(..M_NOWAIT)
    does not fail, which should also be a rare event.

    This fix affects the exceedingly rare case where a NFSv4
    client retries a non-idempotent RPC, such as a lock
    operation, multiple times.  Note that retries only occur
    after the client has needed to create a new TCP connection,
    with a new TCP connection for each retry.

    PR:     254816

    (cherry picked from commit 22cefe3d8378f58adcdbb2c7589b9f30c2a38315)

 sys/fs/nfs/nfs_commonsubs.c | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)
Comment 8 Rick Macklem freebsd_committer freebsd_triage 2021-04-24 23:01:22 UTC
The first and third patches have been MFC'd.
The second one is not needed and has been
abandoned.

I am not going to close this PR until releases
have the patches, so that it is easier to find.
Comment 9 Mark Linimon freebsd_committer freebsd_triage 2024-01-10 02:52:45 UTC
^Triage: committed back in 2021.