We are observing a strange problem on VMware when a kernel is compiled with the RSS option.
The problem appears as the libc resolver not being able to resolve DNS names.
E.g., "host google.com" would resolve the name, but "ping google.com" would fail with "Host name lookup failure".
A packet capture shows that both commands send exactly the same requests and get the same replies, but in the case of ping the host answers the DNS reply with an ICMP port unreachable message.
10.180.106.180.29707 > 10.180.106.5.53: 50740+ A? s3.us-east-2.amazonaws.com. (55)
10.180.106.5.53 > 10.180.106.180.29707: 50740 1/0/0 A 22.214.171.124 (71)
10.180.106.180 > 10.180.106.5: ICMP 10.180.106.180 udp port 29707 unreachable, length 107
One difference between host(1)'s code and libc resolver's code might be that the latter performs connect(2) on the datagram / UDP socket used for DNS queries.
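To make the connect(2) point concrete, here is a minimal userland sketch of a connected UDP socket. It is not taken from the libc resolver; udp_connected_socket() is a hypothetical helper name, and the address and port are whatever the caller passes in.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a UDP socket and connect(2) it to the given
 * IPv4 address and port.  Returns the descriptor, or -1 on failure.
 */
static int
udp_connected_socket(const char *addr, uint16_t port)
{
	struct sockaddr_in sin;
	int s;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1)
		return (-1);

	s = socket(AF_INET, SOCK_DGRAM, 0);
	if (s == -1)
		return (-1);

	/* After this call the socket has a fixed remote address and port. */
	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}
```

A socket set up this way is used with send(2) and recv(2) rather than sendto(2) and recvfrom(2), and it only receives datagrams from the peer it is connected to.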
On the kernel side, I see that in_pcblookup_mbuf() fails to find the PCB matching the reply packet.
Potentially interesting fields from the mbuf are:
uint32_t flowid = 0xc48d76be
uint8_t rsstype = 0x81 // M_HASHTYPE_RSS_IPV4
I see that base r343291, in addition to converting vmx to iflib, enabled previously ifdef-ed out code that sets the packet's rsstype based on the hardware-reported rss_type. Before that commit rsstype was always set to M_HASHTYPE_OPAQUE_HASH.
I see that vmxnet3_reinit_rss_shared_data() uses an RSS key that's different from the system RSS key defined in sys/net/rss_config.c. I think that the different keys can result in in_pcblookup_mbuf() failure because of mismatching hash values.
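To illustrate why the key matters, here is a standalone sketch of a Toeplitz hash over an IPv4 2-tuple. It is a userland illustration, not the kernel's code: the function names are made up for this example, and the source-then-destination input order is my assumption about what the stack hashes for M_HASHTYPE_RSS_IPV4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Toeplitz hash over `data`, processed most significant bit first.
 * The key must be at least datalen + 4 bytes long so that the 32-bit
 * key window never runs past its end.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0, window;
	size_t i;
	int b;

	assert(keylen >= datalen + 4);
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | key[3];

	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			/* XOR in the current key window for every set input bit. */
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window forward by one key bit. */
			window <<= 1;
			if (key[i + 4] & (1u << b))
				window |= 1;
		}
	}
	return (hash);
}

/*
 * Hash an IPv4 2-tuple.  Addresses are given as wire-order bytes,
 * source first, then destination (assumed ordering, see above).
 */
static uint32_t
rss_ipv4_2tuple(const uint8_t *key, size_t keylen,
    const uint8_t src[4], const uint8_t dst[4])
{
	uint8_t data[8];

	memcpy(data, src, 4);
	memcpy(data + 4, dst, 4);
	return (toeplitz_hash(key, keylen, data, sizeof(data)));
}
```

Calling rss_ipv4_2tuple() for the same address pair with two different keys will in general return two different values. So if the NIC hashes with one key and the stack expects the value produced with another, the flowid in the mbuf need not match the value the stack computes for the same addresses.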
(In reply to Andriy Gapon from comment #2)
When I converted the vmxnet3 driver to iflib, I enabled the RSS code based on iflib internals and looking sideways at the bnxt driver, and not so much by thinking through the RSS code's fundamental requirements.
What I saw in the bnxt driver was that it was setting the RSS key using arc4rand() in bnxt_attach_pre(), and that it always uses the hash value for the flowid in bnxt_pkt_get_l2(). That led me to believe that the RSS key value did not have to be anything specific, which is why the vmxnet3 code is functionally the same as bnxt with respect to this issue.
If I am not missing something further, perhaps this same issue exists for the bnxt driver as well.