This is regarding the MFC of r343291 to 12-STABLE in r344027 (Convert vmx(4) to being an iflib driver.)
If you configure a VMware ESXi guest with vmxnet3 interfaces and an atypical number of CPU cores (e.g. 6), the iflib/vmx(4) driver throws the error "device enable command failed" at bootup and periodically when trying to use the interface. This is because, for reasons I do not understand, the driver is unhappy when the number of TX/RX queues is not a power of two. The driver defaults to 8 queues, but iflib automatically reduces the number of queues to match the number of CPU cores, so a guest that boots with 6 cores ends up with 6 queues.
This is resolved by setting the number of TX/RX queues to a power of two (1, 2, or 4) that is less than or equal to the number of CPU cores.
For example, on a 6 core system the following works:
I believe either iflib or the vmx driver specifically needs to be corrected, either to handle an irregular number of queues or to limit the queue count to powers of two when there are fewer than 8 cores.
Note that, for whatever reason, this does NOT happen in VMware Fusion under macOS, but it seems to always happen in an ESXi environment.
In all cases, hw.pci.honor_msi_blacklist is set to 0.
It looks like some of the vmxnet3 device implementations require power-of-two queue configurations, so the driver should enforce this constraint.
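To illustrate the constraint, here is a minimal sketch of rounding a queue count down to a power of two. The function name queue_count_pow2 and its parameters are hypothetical and are not taken from the vmx(4) or iflib sources; the actual driver code may compute this differently.

```c
/*
 * Hypothetical helper (not from the vmx(4) source): return the largest
 * power of two that does not exceed either the number of CPUs or the
 * maximum queue count supported by the device. Returns 1 if the limit
 * is less than 1, since at least one queue is always needed.
 */
static int
queue_count_pow2(int ncpus, int max_queues)
{
	int limit = ncpus < max_queues ? ncpus : max_queues;
	int n = 1;

	if (limit < 1)
		return (1);
	/* Compare against limit / 2 so that doubling n cannot overflow. */
	while (n <= limit / 2)
		n *= 2;
	return (n);
}
```

With 6 cores and a device maximum of 8 queues, this sketch yields 4, which matches the workaround described above.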
The non-iflib version of the driver enforced this constraint:
The Linux version of the driver also enforces this constraint:
Apparently I lost track of this detail when converting the driver to iflib. I missed it in testing because I initially tested on VMware Workstation, which, like Fusion, works fine with non-power-of-two queue configurations, and I did not retest this aspect when moving to ESXi.
I will develop a patch for this.
I stumbled upon this differential revision which may resolve the issue at the iflib level?
A commit references this bug:
Date: Tue Mar 17 03:32:13 UTC 2020
New revision: 359029
Restore power-of-2 queue count constraint from r290948
When vmx(4) was converted to an iflib driver in r343291, the
power-of-2 queue count constraint was removed as it appeared that
current implementations of the VMXNET3 virtual device no longer
required that constraint. It turns out that some of the
implementations still do, and on such systems, the device will fail to
initialize when configured with a non-power-of-2 RX or TX queue count.
Reported by: firstname.lastname@example.org
MFC after: 1 week