Bug 219672 - vmxnet3: with LRO enabled under FreeBSD 11 as a router, outgoing speed of forwarded traffic becomes slower
Summary: vmxnet3: with LRO enabled under FreeBSD 11 as a router, outgoing speed of for...
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-31 03:44 UTC by John Wolfe
Modified: 2017-06-19 15:47 UTC (History)
5 users

See Also:


Attachments

Description John Wolfe 2017-05-31 03:44:32 UTC
The following vmxnet3 driver performance issue was reported to open-vm-tools at https://github.com/vmware/open-vm-tools/issues/166

Since vmxnet3 is a community-maintained driver on FreeBSD, the issue is being cross-filed with FreeBSD. This bug number will be forwarded to the reporter, who will be encouraged to provide the information needed for this problem report.

=====
Thank you for the excellent open-vm-tools package!

With Large Receive Offload (LRO) enabled in a FreeBSD 11 virtual machine acting as a router, the outgoing speed of forwarded traffic becomes 500 times slower with VMXNET3 on HP ProLiant G8/G9 servers (Broadcom BCM5719 Ethernet controller chipset)!

We are using it with pfSense (based on FreeBSD 11) virtual appliances running under VMware ESXi hosts on HP ProLiant G8/G9 servers; all virtual machines have 1-2 VMXNET3 adapters.

We have tried pfSense versions from 2.3.0-RELEASE to 2.4.0-BETA (built on Fri May 26 19:15:04 CDT 2017), Open-VM-Tools package 10.1.0,1, and FreeBSD 11.0-RELEASE-p10.

We have tried VMware ESXi versions from 6.0 to 6.5.0 with all Hewlett-Packard drivers (the highest version of ESXi we have used is HPE Customized Image ESXi 6.5.0, version 650.9.6.5.27, released in May 2017 and based on ESXi 6.5.0 VMkernel Release Build 5146846).

Regardless of the pfSense or VMware version, on FreeBSD 11.0-RELEASE-p10, if I un-check the pfSense option "Disable hardware large receive offload" (thereby enabling hardware LRO), the virtual machines routed via pfSense (FreeBSD) get very low upload speed (about 1/500th of normal) or drop connections. To restore their normal speed, I have to check this option again.

The other hardware offload options cause no problems; I keep them unchecked to enable hardware offload of checksums and TCP segmentation.

The Broadcom BCM5719 chipset, which supports Large Receive Offload (LRO), is quite cheap and ubiquitous; it was released in 2013. VMware also added hardware LRO support to VMXNET3 in 2013. Windows has supported LRO since Windows Server 2012 and Windows 8 (2012), and FreeBSD since version 8 (2009).

Open-VM-Tools 10.1.5 is already available at https://github.com/vmware/open-vm-tools/ ; perhaps it fixes the issue with Large Receive Offload (LRO) under FreeBSD with VMXNET3?

I saw some forum messages where people discourage using the VMXNET3 adapter in favour of the E1000 adapter; quoting https://forum.pfsense.org/index.php?topic=98309.0 : "We saw much better performance from the E1000 than VMXnet2 and 3".

There is a VMware blog post on the benefits of LRO for Linux and Windows: https://blogs.vmware.com/performance/2015/06/vmxnet3-lro.html . According to this post, LRO saves valuable CPU cycles and is also very beneficial for local VM-to-VM traffic, where VMs on the same host communicate with each other through a virtual switch.

I suspect that the problem is somewhere in the open-vm-tools-nox11 package (perhaps it includes VMXNET3 drivers that are not fully compatible or not fully stable), because Windows machines on our servers connected to the Internet either directly or via pfSense have LRO enabled and show no performance degradation.

There is definitely an incompatibility issue in open-vm-tools with VMXNET3 under FreeBSD when Large Receive Offload (LRO) is enabled! The other hardware TCP offloads work properly, and VMXNET3 under Windows handles LRO correctly!
Comment 1 Maxim Masiutin 2017-05-31 21:32:40 UTC
It should be emphasized that the speed drops are very significant.

I have done tests on a gigabit internet connection from our server to a server of our ISP:

with LRO disabled
ping: 2.01ms, download: 235.13 Mbps, upload 410.41 Mbps

with LRO enabled
ping: 2.01ms, download: 149.73 Mbps, upload 0.80 Mbps

Ping remained the same and download was almost the same, but upload dropped from 410 Mbps to 0.80 Mbps: a 512-fold drop!

And remote desktop connections are practically unusable, with frequent connection drops.

I suspect that there is a bug, because if it were a simple misconfiguration, the speed drop would not have been so huge.
Comment 2 Maxim Masiutin 2017-06-01 01:05:43 UTC
A reply came from the pfSense developers. They agreed that the difference on my side is excessive. They tested on their hardware and saw no difference with LRO enabled or disabled, so they confirmed that it does look like there is a problem.
Comment 3 Mark Peek freebsd_committer freebsd_triage 2017-06-08 22:13:57 UTC
From my testing this is not a bug and everything is working as designed. I am seeing a large decrease in performance when LRO is turned on and using pfSense as a gateway. This is due to the originating packets having the IP DF (don’t fragment) flag set which then gets combined into larger packets via LRO. When this (larger) packet needs to be fragmented to match the other NIC the FreeBSD kernel sees the DF flag, drops the packet, and then sends back an ICMP “unreachable - need to frag” message to the sender. The reason it works at all is due to other traffic which disallows the LRO to occur and some packets get forwarded.
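The mechanism described above can be illustrated with a minimal Python sketch (this is illustrative pseudologic, not FreeBSD kernel code; the names and sizes are assumptions): LRO merges received segments into one large packet that inherits the DF flag, and a router may not fragment a DF packet, so it must drop it and return ICMP "unreachable - need to frag" (type 3, code 4).

```python
from dataclasses import dataclass

@dataclass
class Packet:
    size: int   # total IP datagram size in bytes
    df: bool    # IP "don't fragment" flag

def lro_merge(segments):
    """Aggregate received segments into one large packet, as LRO does;
    the merged packet inherits the DF flag of its parts."""
    return Packet(size=sum(s.size for s in segments),
                  df=any(s.df for s in segments))

def forward(pkt, egress_mtu=1500):
    """Router forwarding decision for a packet larger than the egress MTU:
    fragment if allowed, otherwise drop and signal ICMP type 3 code 4."""
    if pkt.size <= egress_mtu:
        return "forwarded"
    if pkt.df:
        return "dropped + ICMP type 3 code 4 (need to frag)"
    return "fragmented and forwarded"

# Ten full-size TCP segments with DF set (typical PMTU-discovery traffic):
segments = [Packet(size=1500, df=True) for _ in range(10)]

# Without LRO, each segment fits the egress MTU and is forwarded...
assert all(forward(s) == "forwarded" for s in segments)

# ...but after LRO merges them, the 15000-byte DF packet cannot be
# fragmented and is dropped.
merged = lro_merge(segments)
print(merged.size, forward(merged))
```

This also explains why some traffic still trickles through: packets that LRO does not aggregate remain under the MTU and are forwarded normally.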

One test I did was turning LRO on and using scp to put a file onto the pfSense appliance which resulted in good performance (not seeing the same drop in performance). I would be interested if you 1) see good performance with LRO turned on and scp a large file to the appliance and 2) see ICMP "need to frag" with LRO turned on and scp to a machine on the remote side.

Since the pfSense appliance is being used as a gateway you should leave LRO turned off.
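At the FreeBSD level, pfSense's "Disable hardware large receive offload" checkbox corresponds to clearing the LRO flag on the interface. A sketch, assuming the vmxnet3 interface is named vmx0 (check the actual name with `ifconfig`):

```shell
# Disable LRO on the vmxnet3 interface for the current boot
# (vmx0 is an assumed interface name; substitute your own).
ifconfig vmx0 -lro

# To make the setting persistent, carry the flag in /etc/rc.conf, e.g.:
# ifconfig_vmx0="DHCP -lro"
```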
Comment 4 Maxim Masiutin 2017-06-08 22:20:49 UTC
Thank you very much for your investigation! I will send a link to your message to the pfSense developers. I think it would be reasonable to hide this option from the user interface altogether, or to add a notice in the user interface saying to enable it only if pfSense is used as an endpoint, not as a router. Currently such a notice exists only in the online documentation, not in the user interface, and it says the option is not recommended and is turned off by default due to bugs in the driver software. In fact there were no bugs, only misconfiguration. The note in the user interface should state the real reason and not mislead users. Thank you very much again for your help.
Comment 5 Mark Peek freebsd_committer freebsd_triage 2017-06-15 16:16:59 UTC
Maxim, were you able to run the experiments, discuss with the pfSense team, and/or confirm the conclusions I added in comment #3? Just looking to see if this bug (and the linked VMware bug) can be closed out.
Comment 6 Maxim Masiutin 2017-06-15 17:05:35 UTC
I agree with your conclusions, but I did not run the tests that you suggested (1. good performance with LRO turned on when scp-ing a large file to the appliance, and 2. ICMP "need to frag" messages with LRO turned on when scp-ing to a machine on the remote side).

When I used pfSense as a router and ran speed tests on virtual machines behind it, the download speed was almost unaffected but the upload speed dropped from around 400 Mbps to around 0.5 Mbps, so I suspect that your conclusions are correct.

So I agree with you that it was a misconfiguration, not a bug. But I did not run the tests that you suggested.

I have contacted the pfSense developers and asked them to change the notice in the pfSense configuration panel. Currently, a notice says that LRO is disabled by default because "most drivers have bugs". I have asked them to change this to "LRO should only be enabled on an endpoint". I am sure that this description in the LRO user interface is what misled me in the first place.

I will watch future versions of pfSense and will remind them if they do not change the notice.

Don't get me wrong, I'm just a simple user, not a professional sysadmin, and I'm not skilled enough, for example, to check for ICMP "need to frag" messages with LRO turned on while scp-ing to a machine on the remote side.

I would suggest closing this bug tracker entry.

Thank you very much again for your help.
Comment 7 Mark Peek freebsd_committer freebsd_triage 2017-06-19 15:47:22 UTC
Closing as not a bug. Per the discussion, routing with LRO can cause packets not to be forwarded, which causes this decrease in performance.