Bug 254816 - after network partitioning, handling retries of RPCs is broken
Summary: after network partitioning, handling retries of RPCs is broken
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Rick Macklem
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-04-06 13:35 UTC by Rick Macklem
Modified: 2021-04-14 20:27 UTC
CC List: 1 user

See Also:


Attachments
fix NFSv4.1/4.2 server session for RPC retries (1.15 KB, patch)
2021-04-06 13:50 UTC, Rick Macklem
no flags
cut the Linux client some slack w.r.t. session sequence# (741 bytes, patch)
2021-04-06 13:58 UTC, Rick Macklem
rmacklem: maintainer-approval-
make the session's cached reply work for multiple retries of an RPC (2.29 KB, patch)
2021-04-06 14:04 UTC, Rick Macklem
no flags

Description Rick Macklem freebsd_committer 2021-04-06 13:35:12 UTC

    
Comment 1 Rick Macklem freebsd_committer 2021-04-06 13:43:01 UTC
During recent testing of an NFSv4.1 mount
from a Linux client to a FreeBSD server,
both the client and the server were observed
to break after a network partition between them.

The FreeBSD server did not reply to a retried
RPC using the session's cached reply as it should.

The Linux client sometimes advances the sequence#
for the session slot by 2 instead of 1.

The attached patches alleviate the above problems
and should be applied to all NFS servers handling
NFSv4 mounts. Fortunately, network partitioning
should be a rare event and the patches are only
needed when that happens.
Comment 2 Rick Macklem freebsd_committer 2021-04-06 13:50:31 UTC
Created attachment 223859 [details]
fix NFSv4.1/4.2 server session for RPC retries

This patch fixes the NFSv4 server so that it
correctly sends the reply cached in the
session's slot when an RPC retry occurs.
(RPC retries are rare for NFSv4, but can
 occur after the client has established a new
 TCP connection for an NFSv4.1/4.2 mount.)

Two things needed to be fixed:
- don't set nd_repstat to NFSERR_IO when the pseudo
  error NFSERR_REPLYFROMCACHE is returned.
- actually use the reply in "m".
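The two fixes listed above can be illustrated with a small user-space sketch. It is not the FreeBSD code: the structures, the `sketch_*` names and the numeric error values are hypothetical stand-ins for `nd_repstat`, the reply in "m" and the real `NFSERR_IO`/`NFSERR_REPLYFROMCACHE` constants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical placeholder values, NOT the real NFSERR_* constants. */
enum {
    SKETCH_OK = 0,
    SKETCH_ERR_IO = 1,
    SKETCH_REPLYFROMCACHE = 2   /* pseudo error: resend the cached reply */
};

/* Hypothetical session slot holding the reply from the first execution. */
struct sketch_slot {
    const char *cached_reply;
};

/* Hypothetical request state. */
struct sketch_request {
    int repstat;                /* plays the role of nd_repstat */
    const char *reply;          /* plays the role of the reply in "m" */
};

/* Behaviour described as broken: the pseudo error becomes an I/O error. */
void sketch_seq_result_broken(struct sketch_request *rq, int seqerr,
    const struct sketch_slot *slot)
{
    (void)slot;                 /* the cached reply is never used */
    if (seqerr != SKETCH_OK)
        rq->repstat = SKETCH_ERR_IO;
}

/* Behaviour after the fix: a retry gets the cached reply, not an error. */
void sketch_seq_result_fixed(struct sketch_request *rq, int seqerr,
    const struct sketch_slot *slot)
{
    assert(rq != NULL && slot != NULL);
    if (seqerr == SKETCH_REPLYFROMCACHE) {
        rq->repstat = SKETCH_OK;          /* not a real error: no I/O error */
        rq->reply = slot->cached_reply;   /* actually send the cached reply */
    } else if (seqerr != SKETCH_OK) {
        rq->repstat = seqerr;   /* other errors reported as-is (assumption) */
    }
}
```

With the broken handler, a retry whose slot check returns the pseudo error ends up with an I/O error; with the fixed handler it gets the slot's cached reply.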
Comment 3 Rick Macklem freebsd_committer 2021-04-06 13:58:34 UTC
Created attachment 223860 [details]
cut the Linux client some slack w.r.t. session sequence#

After a network partition is healed, some
versions of the Linux client advance the sequence#
for the session slot by 2 instead of 1.

This patch allows these cases to work.
Although technically a violation of RFC5661,
it seems harmless to do, since
NFS4ERR_SEQ_MISORDERED will still be generated
if an "out of order" RPC is subsequently received:
that RPC will have a sequence# lower than the one
the server expects.

When this goes into main, etc., I will make
it conditional on a sysctl, so that the server can
optionally be RFC5661 conformant.
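The slot check being relaxed here can be sketched as follows, using the RFC5661 rules for a slot (same sequence ID: a retry; one higher: a new request; anything else: NFS4ERR_SEQ_MISORDERED). This is not the FreeBSD code: `check_slot_seqid()`, `enum seq_result` and the `allow_plus_two` flag are hypothetical names, and the sketch assumes the sequence ID wraps modulo 2^32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch only. */
enum seq_result {
    SEQ_NEW_REQUEST,   /* execute the RPC and cache its reply */
    SEQ_RETRY,         /* reply from the slot's cache */
    SEQ_MISORDERED     /* reply NFS4ERR_SEQ_MISORDERED */
};

/*
 * slot_seqid:     last sequence ID the server accepted for this slot.
 * req_seqid:      sequence ID in the client's Sequence operation.
 * allow_plus_two: models the optional relaxation (hypothetical flag);
 *                 false gives RFC5661 behaviour.
 * The casts keep the additions in uint32_t, so 0xffffffff + 1 wraps to 0.
 */
enum seq_result check_slot_seqid(uint32_t slot_seqid, uint32_t req_seqid,
    bool allow_plus_two)
{
    if (req_seqid == slot_seqid)
        return SEQ_RETRY;
    if (req_seqid == (uint32_t)(slot_seqid + 1))
        return SEQ_NEW_REQUEST;
    if (allow_plus_two && req_seqid == (uint32_t)(slot_seqid + 2))
        return SEQ_NEW_REQUEST;
    return SEQ_MISORDERED;
}
```

For example, with `allow_plus_two` set, a slot at 5 accepts sequence# 7. Once the slot has advanced to 7, a late RPC with sequence# 6 matches none of the accepted values and is still rejected as misordered, which is the argument made above.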
Comment 4 Rick Macklem freebsd_committer 2021-04-06 14:04:49 UTC
Created attachment 223861 [details]
make the session's cached reply work for multiple retries of an RPC

Having multiple retries of the same RPC should be
extremely rare, since a correctly functioning
client will create a new TCP connection for each
of them.
As such, the unpatched code assumed it would
*never* happen.

However, it seems prudent to handle that case
as far as possible.

This patch adds m_copym(..., M_NOWAIT) calls
so that the session slot will retain the
cached reply for a subsequent retry unless
the m_copym() fails.
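The retain-or-hand-over logic can be sketched in user space as below. A heap buffer stands in for the mbuf chain and `copy_nowait()` stands in for `m_copym(..., M_NOWAIT)`; `struct reply_buf`, `struct cache_slot`, `copy_nowait()` and `reply_for_retry()` are all hypothetical names, and the `copy_fails` argument exists only so the sketch can exercise the failure path.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf chain holding a cached reply. */
struct reply_buf {
    size_t len;
    char data[];
};

/* Hypothetical session slot. */
struct cache_slot {
    struct reply_buf *cached;   /* NULL once the cached reply is given away */
};

/*
 * Copy a reply the way a non-sleeping allocation might: it can fail.
 * 'simulate_failure' forces the failure path for demonstration.
 */
static struct reply_buf *copy_nowait(const struct reply_buf *src,
    bool simulate_failure)
{
    struct reply_buf *dst;

    if (simulate_failure)
        return NULL;
    dst = malloc(sizeof(*dst) + src->len);
    if (dst == NULL)
        return NULL;
    dst->len = src->len;
    memcpy(dst->data, src->data, src->len);
    return dst;
}

/*
 * Return the reply to send for a retried RPC; the caller frees it.
 * If the copy succeeds, the slot keeps its cached reply for a later
 * retry.  If the copy fails, the cached reply itself is handed out and
 * the slot can no longer answer another retry from its cache.
 */
struct reply_buf *reply_for_retry(struct cache_slot *slot, bool copy_fails)
{
    struct reply_buf *out;

    if (slot->cached == NULL)
        return NULL;            /* nothing cached; caller must handle it */
    out = copy_nowait(slot->cached, copy_fails);
    if (out == NULL) {
        out = slot->cached;     /* fall back to giving the original away */
        slot->cached = NULL;
    }
    return out;
}
```

In this sketch a failed copy is not fatal: the current retry is still answered from the cache, and only a later retry of the same RPC loses the cached reply, matching the "unless the m_copym() fails" caveat above.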
Comment 5 commit-hook freebsd_committer 2021-04-11 23:54:59 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9edaceca8165e2864267547311daf145bb520270

commit 9edaceca8165e2864267547311daf145bb520270
Author:     Rick Macklem <rmacklem@FreeBSD.org>
AuthorDate: 2021-04-11 23:51:25 +0000
Commit:     Rick Macklem <rmacklem@FreeBSD.org>
CommitDate: 2021-04-11 23:51:25 +0000

    nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661

    Recent testing of network partitioning a FreeBSD NFSv4.1
    server from a Linux NFSv4.1 client identified problems
    with both the FreeBSD server and Linux client.

    Sometimes, after some Linux NFSv4.1/4.2 clients establish
    a new TCP connection, they will advance the sequence number
    for a session slot by 2 instead of 1.
    RFC5661 specifies that a server should reply
    NFS4ERR_SEQ_MISORDERED for this case.
    This might result in a system call error in the client and
    seems to disable future use of the slot by the client.
    Since advancing the sequence number by 2 seems harmless,
    allow this case if vfs.nfs.linuxseqsesshack is non-zero.

    Note that, if the order of RPCs is actually reversed,
    a subsequent RPC with a smaller sequence number value
    for the slot will be received.  This will result in
    a NFS4ERR_SEQ_MISORDERED reply.
    This has not been observed during testing.
    Setting vfs.nfs.linuxseqsesshack to 0 will provide
    RFC5661 compliant behaviour.

    This fix affects the fairly rare case where a NFSv4
    Linux client does a TCP reconnect and then apparently
    erroneously increments the sequence number for the
    session slot twice during the reconnect cycle.

    PR:     254816
    MFC after:      2 weeks

 sys/fs/nfs/nfs_commonsubs.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
Comment 6 Rick Macklem freebsd_committer 2021-04-14 20:27:02 UTC
Comment on attachment 223860 [details]
cut the Linux client some slack w.r.t. session sequence#

It turns out that the Linux client
intentionally sends an RPC consisting
of only a Sequence operation with the
seqid advanced by 2, to check that the
server enforces the correct sequence#
for the session slot.

As such, the server should conform to
RFC5661, and this patch is not
recommended.