Bug 198406

Summary: More than 4 esxi vmxnet3 interfaces cause vlans attached to vmxnet3 interfaces to stop working
Product: Base System Reporter: Nick Hilliard <nick>
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: New ---    
Severity: Affects Some People CC: avv314, bryanv, c.lilwah, emaste, gahope5, luca, ncrogers, rgrimes, rsaanon, timothyyl, tomasz.lutelmowski
Priority: ---    
Version: 10.1-RELEASE   
Hardware: amd64   
OS: Any   
Attachments:
Description Flags
More than 3 interfaces, VMXNET3 & E1000 detaches none

Description Nick Hilliard 2015-03-08 00:29:30 UTC
environment:
- vmware esxi 5.5.0 build 1892794
- freebsd 10.1-RELEASE amd64 VMs with vmxnet3 virtual ethernet interfaces
- vlans defined on vmx1 to vmx3

If a freebsd VM has 4 or fewer vmxnet3 interfaces, all vlans attached to any of the vmx interfaces will work fine, as expected.

If the freebsd VM has 5 more vmxnet3 interfaces, all vlans on all of the vmx interfaces stop working.  tcpdump shows that no traffic is received, and that any traffic transmitted is not received by the destination host.  If the VM is shut down and the number of vmxnet3 interfaces is reduced to 4 or fewer, all the vlan interfaces start working again.

This is repeatable.
Comment 1 Nick Hilliard 2015-03-08 00:30:45 UTC
oops, typo.  that should read:

"If the freebsd VM has 5 or more vmxnet3 interfaces..."
Comment 2 timothyyl 2015-05-06 21:05:04 UTC
I believe I am seeing something very similar. In my case, everything works fine until I add a 5th interface, at which point nothing on that interface works. The other interfaces continue to pass traffic, however. This essentially means that pfSense in VMware is useless if you plan on using one interface per port group and letting VMware do the tagging, since many places have more than 5 VLANs in use. Any suggestions?
Comment 3 Nick Hilliard 2015-05-07 12:18:38 UTC
you can work around the problem by having 4 or fewer vmxnet3 interfaces, and then using vlan interfaces on freebsd.
Comment 4 timothyyl 2015-05-07 12:20:45 UTC
(In reply to nick from comment #3)

Thanks. I've switched to e1000 interfaces and the problem disappeared.
Comment 5 Nick Hilliard 2015-05-07 12:26:54 UTC
i wish you well with that :-)  if you're running anything later than esxi build 799733, the em driver will periodically hang with watchdog timeout kernel errors.  This was catastrophic in freebsd 8/9 but has improved in freebsd 10.1 to the point that it now only causes a small amount of packet loss.  ymmv.
Comment 6 timothyyl 2015-05-07 12:39:31 UTC
(In reply to nick from comment #5)

Awesome. Thanks for the heads up. Hopefully this bug will get some attention soon, as I would think that using port groups with FreeBSD/pfsense would be fairly common.
Comment 7 Andrew 2017-05-16 20:06:03 UTC
Just reproduced this bug on ESXi 6.5 with FreeBSD 11.0-RELEASE-p9 with four vmxnet3 interfaces 

Evidently the problem comes from the FreeBSD vmx interface numbering not matching the order of the VM's ethernet interfaces.

In my case the mapping was as follows:
ethernet0 -> vmx1
ethernet1 -> vmx2
ethernet2 -> vmx3
ethernet3 -> vmx0
 
I think something goes wrong at the PCI probing/mapping stage.
I've tried changing the PCI bus numbers for the ethernet interfaces in the vmx config (not all options), but without success.
Comment 8 Bryan Venteicher freebsd_committer freebsd_triage 2017-06-24 17:34:31 UTC
There is no limit in the vmx(4) driver; it sounds like interrupts are broken when the fifth interface is added.

If the driver negotiates multiqueue (only if MSIX is available) perhaps we are exhausting some MSIX limit but an error is not bubbled up. You could try setting the `hw.vmx.mq_disable` tunable to disable multiqueue.

Unfortunately, I don't have access to a VMware ESXi environment to do any investigation.
Comment 9 Shawn 2017-09-29 06:58:47 UTC
for those of you who are still running into this problem:

The issue seems to be on the VMware side, with the ethernet pciSlotNumber (which is generated only when the VM is booted with the new vmxnet3 interfaces).

The interfaces, as they are presented to the VM, are out of order.
You can see this because the MAC addresses shown in VMware do not match those in the VM's ifconfig output.

The PCI slot mapping is explained here:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2047927

We worked around this problem by editing the VM.vmx file and rearranging the pciSlotNumber values.

Lab environment: ESXi 5.5.0 2068190, FreeBSD 10.1, FreeBSD 10.2 (sorry haven't tested FreeBSD 11 as yet)

After adding all your interfaces to the VM (>4), boot up the VM then shut it down.

SSH to the VMware host (you could theoretically download the .vmx file from the datastore, edit it and upload it again, but we didn't do it that way).

cd /vmfs/volumes/DATASTORE/VM

Open VM.vmx in vi and remove all ethernetX.pciSlotNumber lines.

At the end of the file, add this mapping (for as many interfaces as you need):

ethernet0.pciSlotNumber = "160"
ethernet1.pciSlotNumber = "1184"
ethernet2.pciSlotNumber = "192"
ethernet3.pciSlotNumber = "1216"
ethernet4.pciSlotNumber = "224"
ethernet5.pciSlotNumber = "1248"
ethernet6.pciSlotNumber = "256"

save file and exit
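The vi edit above can also be scripted. Below is a minimal Python sketch, assuming you run it against a copy of VM.vmx while the VM is powered off and keep a backup of the original. `rewrite_vmx` and `MAPPING` are hypothetical names; `MAPPING` is simply the seven-interface mapping listed above.

```python
import re

# Matches lines such as: ethernet3.pciSlotNumber = "1216"
SLOT_LINE = re.compile(r'^\s*ethernet\d+\.pciSlotNumber\s*=', re.IGNORECASE)

# The mapping from this comment (ethernet index -> pciSlotNumber).
MAPPING = {0: 160, 1: 1184, 2: 192, 3: 1216, 4: 224, 5: 1248, 6: 256}


def rewrite_vmx(vmx_text, slots):
    """Return vmx_text with every ethernetN.pciSlotNumber line removed and
    one new line per entry in `slots` appended at the end of the file."""
    kept = [line for line in vmx_text.splitlines() if not SLOT_LINE.match(line)]
    for idx in sorted(slots):
        kept.append(f'ethernet{idx}.pciSlotNumber = "{slots[idx]}"')
    return "\n".join(kept) + "\n"
```

For a VM with five adapters you would pass only the first five entries, e.g. `{k: v for k, v in MAPPING.items() if k < 5}`, and write the result back over VM.vmx before booting.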

boot up your VM.

All interfaces should now match up with the correct MAC addresses. You can confirm this by correlating the ifconfig output with the VM's settings in VMware.
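The MAC comparison can be automated with a small Python sketch. It assumes (1) the .vmx file stores each adapter's MAC under ethernetN.generatedAddress, or ethernetN.address for a manually assigned MAC, (2) the ifconfig output is in the usual FreeBSD layout with an `ether` line under each interface header, and (3) ethernetN is expected to appear as vmxN in the guest, which is the correspondence this workaround aims for. The function names are hypothetical.

```python
import re

# ethernetN.generatedAddress (or ethernetN.address for a static MAC) in the .vmx file
VMX_MAC = re.compile(
    r'^\s*ethernet(\d+)\.(?:generatedAddress|address)\s*=\s*"([0-9a-f:]+)"',
    re.IGNORECASE | re.MULTILINE)
# "vmx0: flags=..." starts a new interface block in FreeBSD ifconfig output
IF_HEADER = re.compile(r'^(\S+):\s')
# "\tether 00:50:56:ad:b8:21" carries that interface's MAC
IF_ETHER = re.compile(r'^\s+ether\s+([0-9a-f:]{17})', re.IGNORECASE)


def vmx_macs(vmx_text):
    """Map ethernet index -> lower-case MAC, as configured in the .vmx file."""
    return {int(n): mac.lower() for n, mac in VMX_MAC.findall(vmx_text)}


def ifconfig_macs(ifconfig_text):
    """Map interface name -> lower-case MAC, as seen by the guest."""
    macs, current = {}, None
    for line in ifconfig_text.splitlines():
        header = IF_HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        ether = IF_ETHER.match(line)
        if ether and current:
            macs[current] = ether.group(1).lower()
    return macs


def mismatches(vmx_text, ifconfig_text, prefix="vmx"):
    """List (N, vmx_mac, guest_mac) for every ethernetN whose MAC is not on <prefix>N."""
    guest = ifconfig_macs(ifconfig_text)
    return [(n, mac, guest.get(f"{prefix}{n}"))
            for n, mac in sorted(vmx_macs(vmx_text).items())
            if guest.get(f"{prefix}{n}") != mac]
```

An empty list from `mismatches` means every ethernetN MAC showed up on the vmx interface with the same number; any entry points at an adapter that is still out of order.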

If you are adding more than 8 interfaces, the mapping should look like this:

PciSlot#               Interfaces                Hex values for PciSlot#
160  1184  2208  ===>  vmx0  vmx1   vmx2   ===>  A0   4A0  8A0
192  1216  2240  ===>  vmx3  vmx4   vmx5   ===>  C0   4C0  8C0
224  1248  2272  ===>  vmx6  vmx7   vmx8   ===>  E0   4E0  8E0
256  1280  2304  ===>  vmx9  vmx10  vmx11  ===>  100  500  900
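The table follows a regular pattern: moving down a row adds 32 (0x20) to the slot number and moving right a column adds 1024 (0x400), which the hex column makes visible (A0, 4A0, 8A0). The Python sketch below regenerates the mapping from that pattern. The pattern is an assumption read off this table, not taken from VMware documentation, so the sketch refuses anything beyond the 12 interfaces the table covers. Note that, following the table, going past 8 interfaces switches to three columns, which also changes the slot numbers of ethernet2 onwards. `slot_numbers` and `vmx_lines` are hypothetical names.

```python
# Pattern read off the table above (assumption: it holds only for the
# four rows shown, so anything beyond 12 interfaces is rejected).
BASE_SLOT = 160      # 0xA0, slot for ethernet0 / vmx0
ROW_STEP = 32        # 160 -> 192 -> 224 -> 256 (0x20 apart)
COLUMN_STEP = 1024   # 160 -> 1184 -> 2208 (0x400 apart)
ROWS_IN_TABLE = 4


def slot_numbers(n_interfaces):
    """Return pciSlotNumber values for ethernet0..ethernet(n-1), filling each
    row left to right: two columns up to 8 interfaces, three beyond that."""
    columns = 2 if n_interfaces <= 8 else 3
    if n_interfaces > ROWS_IN_TABLE * columns:
        raise ValueError("the table only covers up to 12 interfaces")
    slots = []
    for i in range(n_interfaces):
        row, col = divmod(i, columns)
        slots.append(BASE_SLOT + ROW_STEP * row + COLUMN_STEP * col)
    return slots


def vmx_lines(n_interfaces):
    """Format the slots as .vmx lines, e.g. ethernet1.pciSlotNumber = "1184"."""
    return [f'ethernet{i}.pciSlotNumber = "{s}"'
            for i, s in enumerate(slot_numbers(n_interfaces))]


if __name__ == "__main__":
    for line in vmx_lines(7):
        print(line)
```

Running it prints the same seven lines given in this comment, from ethernet0 = "160" through ethernet6 = "256".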
Comment 10 Luca Lesinigo 2019-07-10 13:54:45 UTC
I stumbled on this issue as well, so I'll add a few data points, hope it helps.

- host VMware ESXi 6.5 build 13635690 (latest patch as of this writing)
- guest pfSense 2.4.4_p3 (latest patch, based on FreeBSD 11.2-RELEASE-p10 amd64)
- VM: initially version 10 (ESXi 5.5), with four VMXNET3 interfaces

Adding the fifth VMXNET3 interface did not do any harm to the previous ones or their ordering, but the fifth simply didn't show up; instead, dmesg showed this:

pci7: <ACPI PCI bus> on pcib4
vmx4: <VMware VMXNET3 Ethernet Adapter> at device 0.0 on pci7
vmx4: Ethernet address: 00:50:56:ad:b8:21
vmx4: detached
pci7: detached

Trying to upgrade VM to latest hardware (version 13, ESXi 6.5) did not change anything.
Using VMXNET2 for the fifth interface (keeping the first four ones as VMXNET3) did not change anything.

I couldn't try Shawn's workaround and I couldn't use that in production anyway, so I resorted to keeping the first four interfaces as VMXNET3 and adding the fifth as E1000. The system is now working correctly with all five interfaces.
Comment 11 rsaanon 2019-09-18 15:34:37 UTC
Created attachment 207605
More than 3 interfaces, VMXNET3 & E1000 detaches

Running pfSense 2.4.4 (Guest OS FreeBSD 11 x64) under ESXi 6.7U3.

When adding the 4th virtual network adapter to the VM (irrespective of whether VMXNET3 or E1000 is chosen), I see a "detached" message on the console of the VM (see attached screenshot). As a result, the newly added network adapter does not show up in pfSense.

So, how can I get this working?

Thanks!

-rsa
Comment 12 Tomek Lutelmowski 2020-02-25 11:46:11 UTC
pfSense 2.4.4/2.4.5 under ESXi 6.7U3 here.

I had a similar issue which drove me nuts. With 4 VMXNET3 cards, under high load (~500 Mbit/s), pfSense was cutting off traffic on all VMXNET3s connected to vSwitches with physical interfaces (traffic within a virtual switch was fine). During the outage, tcpdump on the interface didn't show any incoming traffic. I tried changing the VM driver to E1000, adjusting buffers, and fiddling with MSI/interrupt settings, but no joy.

Then I tried pfSense 2.5.0 (devel, FreeBSD 12), and the problem was gone! Now, what is really, really strange... I created another VM (2.4.5 RC) in order to revert to a more stable version. Simply booting this VM, without any configuration (but with its interfaces connected to the virtual switches), was enough for the problem to reappear on the already working 2.5.0 machine on the same ESXi host! What's more, I had to reboot ESXi to fix it, as shutting down the 2.4.5 VM was not enough.

If you run pfSense under VMware, look at the gateway quality statistics for drops; you may notice that your VMXNET3 interface (connected to a physical NIC via a vSwitch) is dropping packets.

I hope this will help someone.

Cheers,
Tomek