Bug 242890

Summary: vmxnet3: Problem when RSS option is configured
Product: Base System
Reporter: Andriy Gapon <avg>
Component: kern
Assignee: Andriy Gapon <avg>
Status: Closed FIXED
Severity: Affects Some People
CC: me, ncrogers, net, pkelsey, shurd
Priority: ---
Keywords: regression
Version: CURRENT
Flags: koobs: mfc-stable12+, koobs: mfc-stable11-
Hardware: Any
OS: Any
URL: https://reviews.freebsd.org/D23147

Description Andriy Gapon freebsd_committer 2019-12-26 09:52:32 UTC
We are observing a strange problem on VMware when a kernel is compiled with the RSS option.
The problem appears as the libc resolver not being able to resolve DNS names.
E.g., "host google.com" would resolve the name, but "ping google.com" would fail with "Host name lookup failure".
A packet capture shows that both commands send exactly the same requests and get the same replies, but in the case of ping the host replies with an ICMP port unreachable message.
For example:
10.180.106.180.29707 > 10.180.106.5.53: 50740+ A? s3.us-east-2.amazonaws.com. (55)
10.180.106.5.53 > 10.180.106.180.29707: 50740 1/0/0 A 52.219.80.194 (71)
10.180.106.180 > 10.180.106.5: ICMP 10.180.106.180 udp port 29707 unreachable, length 107

One difference between host(1)'s code and libc resolver's code might be that the latter performs connect(2) on the datagram / UDP socket used for DNS queries.
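
To illustrate the suspected difference, here is a minimal userland sketch of a connect(2)-ed UDP socket. It assumes the libc resolver behaves as described above; the helper name udp_connected_socket() is hypothetical and is not taken from libc or host(1).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a UDP socket and connect(2) it to ip:port.
 * Returns the descriptor, or -1 on failure.
 */
static int
udp_connected_socket(const char *ip, uint16_t port)
{
	struct sockaddr_in sin;
	int s;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &sin.sin_addr) != 1)
		return (-1);

	s = socket(AF_INET, SOCK_DGRAM, 0);
	if (s == -1)
		return (-1);
	/* No packet is sent; connect(2) only records the peer address. */
	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}
```

After connect(2) the socket accepts datagrams only from the recorded peer, so a reply has to be matched against the full address/port pair rather than just the local port. An unconnected socket that uses sendto(2) and recvfrom(2) does not have that restriction.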

On the kernel side, I see that in_pcblookup_mbuf() fails to find the PCB matching the reply packet.
Potentially interesting fields from the mbuf are:
                uint32_t flowid = 0xc48d76be
                uint8_t rsstype = 0x81 // M_HASHTYPE_RSS_IPV4
Comment 1 Andriy Gapon freebsd_committer 2019-12-26 10:01:29 UTC
I see that base r343291, in addition to converting vmx to iflib, enabled previously ifdef-ed out code that sets a packet's rsstype based on the hardware-reported rss_type.  Before that commit, rsstype was always set to M_HASHTYPE_OPAQUE_HASH.
Comment 2 Andriy Gapon freebsd_committer 2019-12-26 10:08:44 UTC
I see that vmxnet3_reinit_rss_shared_data() uses an RSS key that's different from the system RSS key defined in sys/net/rss_config.c. I think that the different keys can result in in_pcblookup_mbuf() failure because of mismatching hash values.
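
The effect of a key mismatch can be sketched in userland. The following is a textbook Toeplitz hash, not the kernel's or the driver's code, and rss_hash_ipv4() is a hypothetical helper that assumes a 2-tuple (source address, destination address) input in network byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Textbook Toeplitz hash: for every set bit of the input, XOR in the
 * 32-bit window of the key that starts at that bit position.
 * Assumes keylen >= 4.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0, window;
	size_t i;
	int b;

	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | key[3];
	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window one bit further along the key. */
			window <<= 1;
			if (i + 4 < keylen && (key[i + 4] & (1u << b)))
				window |= 1;
		}
	}
	return (hash);
}

/*
 * Hypothetical helper: hash an IPv4 source/destination address pair
 * (network byte order) as a 2-tuple RSS input.
 */
static uint32_t
rss_hash_ipv4(const uint8_t *key, size_t keylen,
    const uint8_t src[4], const uint8_t dst[4])
{
	uint8_t input[8];

	memcpy(input, src, 4);
	memcpy(input + 4, dst, 4);
	return (toeplitz_hash(key, keylen, input, sizeof(input)));
}
```

Calling rss_hash_ipv4() on the same address pair with two different keys will in general return two different values. If one side computes the hash with the system key and the other with a driver-private key, a lookup keyed on that hash has no reason to land on the same entry.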
Comment 3 Patrick Kelsey freebsd_committer 2020-01-12 20:40:13 UTC
(In reply to Andriy Gapon from comment #2)

When I converted the vmxnet3 driver to iflib, I enabled the RSS code based on iflib internals and looking sideways at the bnxt driver, and not so much by thinking through the RSS code's fundamental requirements.

What I saw in the bnxt driver was that it was setting the RSS key using arc4rand() in bnxt_attach_pre(), and that it is always using the hash value for the flowid in bnxt_pkt_get_l2().  That led me to believe that the RSS key value did not have to be anything specific, and is why the way the vmxnet3 code behaves with respect to this issue is functionally the same as what bnxt does.

If I am not missing something further, perhaps this same issue exists for the bnxt driver as well.
Comment 4 commit-hook freebsd_committer 2020-01-23 11:05:11 UTC
A commit references this bug:

Author: avg
Date: Thu Jan 23 11:05:03 UTC 2020
New revision: 357042
URL: https://svnweb.freebsd.org/changeset/base/357042

Log:
  vmxnet3: add support for RSS kernel option

  We observe at least one problem: if a UDP socket is connect(2)-ed, then a
  received packet that matches the connection cannot be matched to the
  corresponding PCB because of an incorrect flow ID.  That was observed for DNS
  requests from the libc resolver.  We got this problem because FreeBSD
  r343291 enabled code that can set rsstype of received packets to values
  other than M_HASHTYPE_OPAQUE_HASH.  Earlier that code was under 'ifdef
  notyet'.

  The essence of this change is to use the system-wide RSS key instead of
  some historic hardcoded key when the software RSS is enabled and it is
  configured to use Toeplitz algorithm (the default).
  In all other cases, the driver reports the opaque hash type for received
  packets while still using Toeplitz algorithm with the internal key.

  PR:		242890
  Reviewed by:	pkelsey
  Sponsored by:	Panzura
  Differential Revision: https://reviews.freebsd.org/D23147

Changes:
  head/sys/dev/vmware/vmxnet3/if_vmx.c
  head/sys/dev/vmware/vmxnet3/if_vmxvar.h
  head/sys/modules/vmware/vmxnet3/Makefile
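
The decision rule described in the r357042 log can be sketched as below. This is a self-contained model under stated assumptions, not the code from if_vmx.c: every sketch_* and SKETCH_* name is hypothetical, and the two keys are passed in by the caller rather than read from the kernel or the driver.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_KEY_SIZE	40	/* assumed key length for this sketch */

enum sketch_hash_algo { SKETCH_ALGO_TOEPLITZ, SKETCH_ALGO_OTHER };
enum sketch_hash_type { SKETCH_TYPE_OPAQUE, SKETCH_TYPE_RSS };

struct sketch_rss_choice {
	uint8_t key[SKETCH_KEY_SIZE];		/* key given to the device */
	enum sketch_hash_type report_as;	/* hash type set on mbufs */
};

/*
 * Use the system-wide key and report real RSS hash types only when the
 * RSS option is enabled and the software RSS is configured for Toeplitz.
 * In all other cases keep the driver's internal key and report the hash
 * as opaque, so that nothing tries to reproduce it in software.
 */
static void
sketch_choose_rss(bool rss_option, enum sketch_hash_algo algo,
    const uint8_t *system_key, const uint8_t *driver_key,
    struct sketch_rss_choice *out)
{
	if (rss_option && algo == SKETCH_ALGO_TOEPLITZ) {
		memcpy(out->key, system_key, SKETCH_KEY_SIZE);
		out->report_as = SKETCH_TYPE_RSS;
	} else {
		memcpy(out->key, driver_key, SKETCH_KEY_SIZE);
		out->report_as = SKETCH_TYPE_OPAQUE;
	}
}
```

The point of the opaque fallback is that the flow ID can still be used to spread packets across queues, but it is no longer labelled as a hash that the stack could recompute with its own key.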
Comment 5 Kubilay Kocak freebsd_committer freebsd_triage 2020-01-23 11:27:10 UTC
^Triage: Assign to committer resolving

Assume the iflib changes didn't end up in stable/11, so mark this not for merging there. If they did and the plan is to also MFC base r357042 there, please set the mfc-stable11 flag to ? accordingly

@Andriy/Patrick Would we want a new issue for bnxt if it turns out it's affected too, or will we track that here, and close when both drivers' fixes are committed/merged?
Comment 6 commit-hook freebsd_committer 2020-02-27 15:09:30 UTC
A commit references this bug:

Author: avg
Date: Thu Feb 27 15:08:44 UTC 2020
New revision: 358386
URL: https://svnweb.freebsd.org/changeset/base/358386

Log:
  MFC r357042: vmxnet3: add support for RSS kernel option

  We observe at least one problem: if a UDP socket is connect(2)-ed, then a
  received packet that matches the connection cannot be matched to the
  corresponding PCB because of an incorrect flow ID.  That was observed for DNS
  requests from the libc resolver.  We got this problem because FreeBSD
  r343291 enabled code that can set rsstype of received packets to values
  other than M_HASHTYPE_OPAQUE_HASH.  Earlier that code was under 'ifdef
  notyet'.

  The essence of this change is to use the system-wide RSS key instead of
  some historic hardcoded key when the software RSS is enabled and it is
  configured to use Toeplitz algorithm (the default).
  In all other cases, the driver reports the opaque hash type for received
  packets while still using Toeplitz algorithm with the internal key.

  PR:		242890
  Sponsored by:	Panzura

Changes:
_U  stable/12/
  stable/12/sys/dev/vmware/vmxnet3/if_vmx.c
  stable/12/sys/dev/vmware/vmxnet3/if_vmxvar.h
  stable/12/sys/modules/vmware/vmxnet3/Makefile