Bug 245022

Summary: Problem with FreeBSD NLM interoperating with Netapp Filer after software upgrade
Product: Base System
Component: kern
Status: Closed FIXED
Severity: Affects Some People
Priority: ---
Version: 11.2-RELEASE
Hardware: Any
OS: Any
Reporter: Rick Macklem <rmacklem>
Assignee: Rick Macklem <rmacklem>
CC: danny
Flags: rmacklem: mfc-stable12+, rmacklem: mfc-stable11+
Attachments:
  Description                                          Flags
  create a tunable to set the NLM client to use TCP    none
  modify kernel UDP client to use global xid           none

Description Rick Macklem freebsd_committer freebsd_triage 2020-03-24 00:29:05 UTC
Created attachment 212663
create a tunable to set the NLM client to use TCP

After upgrading the software on a Netapp Filer, serious interoperability
problems were observed between the FreeBSD NLM client and Netapp server.
(The NLM protocol is a separate protocol from NFSv3 that provides file
 locking. It is implemented by rpc.lockd.)
Reported via email by Daniel Braniss (danny@cs.huji.ac.il).

Although the problems were not completely diagnosed, they appeared to
be related to reuse of the xid in the RPC over UDP header.

Adam McDougall (mcdouga9@egr.msu.edu) mentioned resolving a similar
issue with the FreeBSD NLM->Netapp Filer by switching the NLM to use TCP
instead of UDP.

I have created this bug report to track the problem and have attached
two simple patches that might resolve it (or make it easier to deal with).
Comment 1 Rick Macklem freebsd_committer freebsd_triage 2020-03-24 00:35:00 UTC
Created attachment 212664
modify kernel UDP client to use global xid

This patch modifies the kernel RPC UDP client so that it uses a
single global xid instead of one "per connection".
I couldn't see exactly how the "per connection" xid could end
up reusing the same value, but since a "connection" is a sketchy
concept anyhow and a global xid will not repeat for 4 billion RPCs,
this should avoid any reuse of the same xid value.
(I suspect the "per connection xid" code was inherited from userland
 RPC library code, where a global value is not practical.)
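The difference between the two schemes can be sketched in a few lines of C.
This is only an illustration under stated assumptions, not the code in
sys/rpc/clnt_dg.c: struct conn, conn_init(), conn_next_xid() and
global_next_xid() are hypothetical names, and seeding the per-connection xid
directly from a clock value is an assumption about how reuse might occur.
A 32-bit counter wraps only after 2^32 (about 4.29 billion) increments.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-"connection" state: each handle carries its own xid. */
struct conn {
	uint32_t xid;
};

/*
 * Per-connection scheme (sketch): the xid is seeded when the handle is
 * created. ASSUMPTION: the seed comes from a clock value, so two handles
 * created with the same clock reading start from the same xid.
 */
static void
conn_init(struct conn *c, uint32_t clock_seed)
{
	c->xid = clock_seed;
}

/* Return the next xid for this handle only. */
static uint32_t
conn_next_xid(struct conn *c)
{
	return (++c->xid);
}

/*
 * Global scheme (sketch): one counter shared by every handle. The atomic
 * increment hands each caller a distinct value until the 32-bit counter
 * wraps around.
 */
static _Atomic uint32_t global_xid;

static uint32_t
global_next_xid(void)
{
	return (atomic_fetch_add(&global_xid, 1) + 1);
}
```

With the per-connection sketch, two handles initialized from the same seed
return the same first xid, whereas successive calls to global_next_xid()
return different values regardless of which handle makes the request.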
Comment 2 Rick Macklem freebsd_committer freebsd_triage 2020-04-05 21:12:23 UTC
A variant of the second patch has been committed to head and a variant
of the first one will be committed to head later today.
Comment 3 commit-hook freebsd_committer freebsd_triage 2020-04-05 21:27:02 UTC
A commit references this bug:

Author: rmacklem
Date: Sun Apr  5 21:08:17 UTC 2020
New revision: 359643
URL: https://svnweb.freebsd.org/changeset/base/359643

Log:
  Change the xid for client side krpc over UDP to a global value.

  Without this patch, the xid used for the client side krpc requests over
  UDP was initialized for each "connection". A "connection" for UDP is
  rather sketchy and for the kernel NLM a new one is created every 2minutes.
  A problem with client side interoperability with a Netapp server for the NLM
  was reported and it is believed to be caused by reuse of the same xid.
  Although this was never completely diagnosed by the reporter, I could see
  how the same xid might get reused, since it is initialized to a value
  based on the TOD clock every two minutes.
  I suspect initializing the value for every "connection" was inherited from
  userland library code, where having a global xid was not practical.
  However, implementing a global "xid" for the kernel rpc is straightforward
  and will ensure that an xid value is not reused for a long time. This
  patch does that and is hoped it will fix the Netapp interoperability
  problem.

  PR:		245022
  Reported by:	danny@cs.huji.ac.il
  MFC after:	2 weeks

Changes:
  head/sys/rpc/clnt_dg.c
Comment 4 Rick Macklem freebsd_committer freebsd_triage 2020-04-05 22:27:40 UTC
I won't be committing the first patch for now.
Although it sets the client side of the NLM to use TCP and someone
reported that helped for them when dealing with a Netapp server,
the FreeBSD server does not handle TCP when the client tries to
use it.

I am not sure if I will investigate the NLM over TCP problem,
since I think the NLM should be deprecated in favour of using
NFSv4.1 (or NFSv4.2 when available) for cases where distributed
locking for NFS is required.
Comment 5 Rick Macklem freebsd_committer freebsd_triage 2020-04-08 01:33:59 UTC
I experimented with the NLM tunable that sets use of TCP.
It appears that rpcbind always replies with the UDP port#,
so the TCP setting doesn't work.
I think setting a fixed port# via "-p" for both rpc.statd and
rpc.lockd might make it work.

I am hoping that patch#2 will resolve the problem, so I
don't need to bother trying to fix the rpcbind problem.
Comment 6 commit-hook freebsd_committer freebsd_triage 2020-04-20 01:17:26 UTC
A commit references this bug:

Author: rmacklem
Date: Mon Apr 20 01:17:00 UTC 2020
New revision: 360109
URL: https://svnweb.freebsd.org/changeset/base/360109

Log:
  MFC: r359643
  Change the xid for client side krpc over UDP to a global value.

  Without this patch, the xid used for the client side krpc requests over
  UDP was initialized for each "connection". A "connection" for UDP is
  rather sketchy and for the kernel NLM a new one is created every 2minutes.
  A problem with client side interoperability with a Netapp server for the NLM
  was reported and it is believed to be caused by reuse of the same xid.
  Although this was never completely diagnosed by the reporter, I could see
  how the same xid might get reused, since it is initialized to a value
  based on the TOD clock every two minutes.
  I suspect initializing the value for every "connection" was inherited from
  userland library code, where having a global xid was not practical.
  However, implementing a global "xid" for the kernel rpc is straightforward
  and will ensure that an xid value is not reused for a long time. This
  patch does that and is hoped it will fix the Netapp interoperability
  problem.

  PR:		245022

Changes:
_U  stable/12/
  stable/12/sys/rpc/clnt_dg.c
Comment 7 commit-hook freebsd_committer freebsd_triage 2020-04-20 01:26:29 UTC
A commit references this bug:

Author: rmacklem
Date: Mon Apr 20 01:26:18 UTC 2020
New revision: 360110
URL: https://svnweb.freebsd.org/changeset/base/360110

Log:
  MFC: r359643
  Change the xid for client side krpc over UDP to a global value.

  Without this patch, the xid used for the client side krpc requests over
  UDP was initialized for each "connection". A "connection" for UDP is
  rather sketchy and for the kernel NLM a new one is created every 2minutes.
  A problem with client side interoperability with a Netapp server for the NLM
  was reported and it is believed to be caused by reuse of the same xid.
  Although this was never completely diagnosed by the reporter, I could see
  how the same xid might get reused, since it is initialized to a value
  based on the TOD clock every two minutes.
  I suspect initializing the value for every "connection" was inherited from
  userland library code, where having a global xid was not practical.
  However, implementing a global "xid" for the kernel rpc is straightforward
  and will ensure that an xid value is not reused for a long time. This
  patch does that and is hoped it will fix the Netapp interoperability
  problem.

  PR:		245022

Changes:
_U  stable/11/
  stable/11/sys/rpc/clnt_dg.c
Comment 8 Rick Macklem freebsd_committer freebsd_triage 2020-04-20 02:44:29 UTC
Second patch has been committed and MFC'd.
Enabling TCP using the first patch doesn't work correctly,
because the call to rpcbind returns the UDP port# instead
of the TCP port#.

If the second patch does not resolve the interoperability
problem with the Netapp filer, this PR can be reopened
and I will work on fixing the first patch.