Bug 237321 - vmx(4) iflib driver fails when number of CPU cores is not a power of two
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Patrick Kelsey
Reported: 2019-04-16 20:59 UTC by ncrogers
Modified: 2020-03-17 03:32 UTC
Description ncrogers 2019-04-16 20:59:10 UTC
This is regarding the MFC of r343291 to 12-STABLE in r344027 (Convert vmx(4) to being an iflib driver.)

If you configure a VMware ESXi guest with vmxnet3 interfaces and an atypical number of CPU cores (e.g. 6), the iflib vmx(4) driver reports the error "device enable command failed" at boot and periodically whenever the interface is used. For reasons I do not understand, the driver fails when the number of TX/RX queues is not a power of two. The driver defaults to 8 queues, but when a guest boots with 6 cores, iflib automatically reduces the number of queues to match the number of CPU cores, leaving 6 queues.

This is resolved by setting the number of TX/RX queues to 1, 2, or 4, choosing a value that is less than or equal to the number of CPU cores.

For example, on a 6-core system the following works:
dev.vmx.0.iflib.override_ntxqs=4
dev.vmx.0.iflib.override_nrxqs=4

I believe either iflib or the vmx driver specifically needs to be corrected, either to handle an irregular number of queues or to limit the queue count to a power of two when there are fewer than 8 cores.
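
The second option above (limiting the queue count to a power of two) can be sketched as follows. This is only an illustration of the arithmetic, not code from if_vmx.c or iflib: the helper names pow2_floor and clamp_queues are hypothetical, and the actual driver may enforce the constraint differently.

```c
/*
 * Hypothetical helper (not taken from if_vmx.c or iflib): return the
 * largest power of two that is <= n, or 0 when n is 0.
 */
static unsigned int
pow2_floor(unsigned int n)
{
	unsigned int p = 1;

	if (n == 0)
		return (0);
	/* Double p while the next power of two still fits in n. */
	while (p <= n / 2)
		p *= 2;
	return (p);
}

/*
 * Hypothetical helper: limit a requested queue count to the number of
 * CPU cores, then round the result down to a power of two.
 */
static unsigned int
clamp_queues(unsigned int requested, unsigned int ncpus)
{
	unsigned int n = requested < ncpus ? requested : ncpus;

	return (pow2_floor(n));
}
```

With the default of 8 queues on a 6-core guest, clamp_queues(8, 6) yields 4, which matches the override values shown above.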

Note that, for whatever reason, this does NOT happen in VMware Fusion under macOS, but it seems to always happen in an ESXi environment.

In all cases, hw.pci.honor_msi_blacklist is set to 0.
Comment 1 Patrick Kelsey freebsd_committer 2019-04-17 01:32:47 UTC
It looks like some of the vmxnet3 device implementations require power-of-two queue configurations, so the driver should enforce this constraint.

The non-iflib version of the driver enforced this constraint:
https://svnweb.freebsd.org/base/head/sys/dev/vmware/vmxnet3/if_vmx.c?revision=333813&view=markup#l531

The Linux version of the driver also enforces this constraint:
https://github.com/torvalds/linux/blob/750afb08ca71310fcf0c4e2cb1565c63b8235b60/drivers/net/vmxnet3/vmxnet3_drv.c#L3280

Apparently I lost track of this detail when converting the driver to iflib. I missed it in testing because I initially tested on VMware Workstation, which, like Fusion, works fine with non-power-of-two queue configurations, and I did not retest this aspect when moving to ESXi.

I will develop a patch for this.
Comment 2 ncrogers 2019-04-18 18:50:23 UTC
I stumbled upon this differential revision. Might it resolve the issue at the iflib level?

https://reviews.freebsd.org/D19880
Comment 3 commit-hook freebsd_committer 2020-03-17 03:32:57 UTC
A commit references this bug:

Author: pkelsey
Date: Tue Mar 17 03:32:13 UTC 2020
New revision: 359029
URL: https://svnweb.freebsd.org/changeset/base/359029

Log:
  Restore power-of-2 queue count constraint from r290948

  When vmx(4) was converted to an iflib driver in r343291, the
  power-of-2 queue count constraint was removed as it appeared that
  current implementations of the VMXNET3 virtual device no longer
  required that constraint.  It turns out that some of the
  implementations still do, and on such systems, the device will fail to
  initialize when configured with a non-power-of-2 RX or TX queue count.

  PR:		237321
  Reported by:	ncrogers@gmail.com
  MFC after:	1 week

Changes:
  head/sys/dev/vmware/vmxnet3/if_vmx.c