Created attachment 144852 [details]
Proposed patch

When using Xen and pf, performance of outbound traffic is very slow until TSO is disabled. The attached patch fixes it. This may or may not be related to bug 135178.
See also bug 154428.
Created attachment 144883 [details]
More conservative patch

When you say “TSO is disabled”, I assume you mean on the virtual interface (xnN) in the guest?

I'm not comfortable removing this block of code without a clear understanding of why it's there. It looks to me like the intent is to ensure that all packets have a valid checksum before they enter pf. Are we certain that pf won't break in some subtle way if that assumption is violated?

BTW, the patch does not remove the equivalent code in pf_check6_out(), but that code is broken anyway (as the comment points out). A more conservative fix is to keep calculating the checksum before pf_test() *unless* hardware checksum offloading is enabled.
(In reply to Dag-Erling Smørgrav from comment #2)
> When you say “TSO is disabled”, I assume you mean on the virtual interface
> (xnN) in the guest?

Yes, just running "ifconfig xn0 -tso4" makes the issue disappear.

> I'm not comfortable removing this block of code without a clear
> understanding of why it's there. It looks to me like the intent is to
> ensure that all packets have a valid checksum before they enter pf. Are we
> certain that pf won't break in some subtle way if that assumption is
> violated?

I share your concern; hopefully the new patch addresses that. I can't say what else might break. :)

> BTW, the patch does not remove the equivalent code in pf_check6_out(), but
> that code is broken anyway (as the comment points out).

Good point, I can test the IPv6 case as well if needed.

> A more conservative fix is to keep calculating the checksum before pf_test
> *unless* hardware checksum offloading is enabled.

I like it; testing now to ensure it still fixes the issue, will report back later.
Please wait, the IPv6 part of my patch is wrong. I will post a new patch in a bit.
Created attachment 144902 [details]
patch for pf delayed_csum bug

Here's a cleaned-up patch. Instead of removing the call to in_delayed_cksum(), it calls it only if the set of missing checksums does not completely overlap with the set of hardware-assisted checksums. I'm not sure it fixes the issue, but it's at least a step in the right direction.
(In reply to Dag-Erling Smørgrav from comment #5)
> Here's a cleaned-up patch. Instead of removing the call to
> in_delayed_cksum(), it calls it only if the set of missing checksums does
> not completely overlap with the set of hardware-assisted checksums.

This new patch builds fine and seems to solve the problem, at least on IPv4.
Something about the patch was nagging me, and I suddenly realized a) why pf insists on having a correct checksum and b) why this breaks TSO.

The answer to a) is that there are cases (e.g. scrub, redirect) where pf modifies the packet and then recalculates the checksum. I *think*, but am not 100% positive, that the answer to b) is that the code does not clear the CSUM_TSO flag, so the adapter expects the checksum field to be 0 and calculates an incorrect checksum. (Actually, I'm surprised that any of the patches work at all; my guess is that you're simply not exercising the parts of pf that require a correct checksum. Unfortunately, I don't have a Xen environment to test on, so I can't reproduce the bug.)

There are basically two ways to go from here: the first is to keep the code but modify it so that it clears the TSO flag after calculating the checksum; the second is to fix pf so that it doesn't try to update the checksum if CSUM_DELAY_DATA is set, but instead makes sure that the correct flags are set. The latter option is clearly preferable, as the former basically defeats hardware offloading, thereby increasing CPU usage.

I also want to modify in6_delayed_cksum() so it works like in_delayed_cksum() (i.e. takes no arguments other than the mbuf).
I've found that this issue also causes problems for FreeBSD running as a KVM guest using the VirtIO network drivers.

When Linux is the KVM host, it does some validation on the packets it gets from the guest. When it receives packets with the TSO flag set but without the flag indicating a header is needed, it rejects them and logs an error. On the Linux host, this manifests as a kernel warning (skb_warn_bad_offload, in case anyone's searching for this later on) and poor performance (since I believe it drops the invalid packets).

Initially, I thought this was a bug in the virtio driver, but this post led me to the pf code instead:

https://lists.freebsd.org/pipermail/freebsd-hackers/2014-August/045916.html

I tried the patch here, and I can confirm that it resolves the issue I was seeing when FreeBSD runs as a KVM guest. I cannot confirm anything with pf itself, as I only have a trivial rule present.
To summarize:

The attached patch unbreaks IPv4 but reduces performance because it effectively disables TSO. It may or may not fix IPv6 as well; the IPv6 checksum computation is much more complicated, and I'm not sure the in6_delayed_cksum() call is correct. But at least it's no worse than it was before.

I have been working on a complete patch in which checksum offloading is disabled only for packets that pf needs to modify. Unfortunately, this has proven very difficult due to the structure of the pf code and the large number of places in which a packet and its checksum can be modified. I'm not sure it's realistically feasible without rewriting significant portions of the pf code.
Created attachment 149819 [details]
Calculate non-hardware-assisted checksums

This new patch (relative to the latest head) is the best compromise I could come up with. It does not prevent pf from updating the checksum, but it forces recalculation before entering pf if there are delayed checksums which are not handled in hardware, so the result will be correct. If the delayed checksums are all handled in hardware, the result of pf's checksum manipulation will be garbage, but the NIC will compute the correct checksum. Of course, this assumes that the hardware does not require the checksum field to be zero...
I hit this in EC2. It was very surprising; it occurs even with an empty pf.conf.
This was on 10.1-GENERIC.
*** Bug 194838 has been marked as a duplicate of this bug. ***
Does setting net.inet.tcp.tso=0 solve the issue as well? Would it make sense to just make this sysctl default to 0 when virtualization is detected?
I only tested KVM, but it indeed solves it for me.
(In reply to Allan Jude from comment #14) I think this makes sense considering how long this has been going on. It would also require less out-of-the-box changes for companies providing FreeBSD VM images to choose from.
(In reply to Mark Felder from comment #16) IMHO, unconditionally disabling it without the user knowing is just a dirty workaround. What's the problem with the latest patch from des? Does it solve the issue?
It does not. The best compromise for now is to disable TSO. The checksum code in pf is so convoluted that a complete solution requires rewriting a significant portion of pf. It may be more cost-effective to import a newer and presumably cleaner version of pf.
(In reply to Allan Jude from comment #14) But if the guest has PCI passthrough devices, disabling TSO globally seems wrong; it should be disabled in virtio-net and xen-netfront only.
(In reply to Roger Pau Monné from comment #19) That also seems reasonable. Don't forget to include whatever device Microsoft Hyper-V uses as well.
(In reply to Allan Jude from comment #20) Right, that's hv_netvsc AFAIK. Does this also affect VMXNET3 (the VMware PV NIC)?
Is there a test that could be run to determine whether the issue occurs? I have 10.1-RELEASE-p10 VMs inside Citrix XenServer 6.5 SP1 (basically, the latest patched version) and I am using pf there along with the following interface options:

xn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=503<RXCSUM,TXCSUM,TSO4,LRO>

I can't complain about anything when it comes to network performance, but perhaps I am missing something?
That's a really new version of Xen and probably doesn't have the problem. But TSO issues are also observed on KVM. I can't comment on VMware.
I just found there's a tunable for the KVM virtio NIC: hw.vtnet.tso_disable=1.
(In reply to Ruben Kerkhof from comment #24) Is there a reason why one would want to do that instead of net.inet.tcp.tso=0 ?
Yes, I think the latter disables TSO for all NICs. We have VMs where customers can switch between virtio-net and the em driver, and we only want TSO disabled for virtio-net.
(In reply to Ruben Kerkhof from comment #26) I was wondering if such a scenario could exist, but wasn't sure. Having to manually disable TSO with ifconfig for each interface would be tedious and error-prone, so it makes good sense to have this tunable.
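For anyone landing here via a search, the workarounds discussed in this thread can be made persistent roughly as follows. This is a sketch assembled from the commands mentioned above; the interface name is an example, and hw.vtnet.tso_disable is a boot-time tunable while net.inet.tcp.tso is a runtime sysctl:

```
# /boot/loader.conf -- disable TSO only for virtio-net NICs
hw.vtnet.tso_disable=1

# /etc/sysctl.conf -- disable TSO globally, for all NICs
net.inet.tcp.tso=0

# /etc/rc.conf -- disable TSO4 on a single interface (here a Xen xn0)
ifconfig_xn0="DHCP -tso4"
```

The per-driver and per-interface variants avoid penalizing passthrough or emulated NICs that handle TSO correctly.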
Has anyone had the chance to test for this issue with 289316 (head) or 289703 (stable/10)? I suspect this is fixed now.
I haven't tested it, but the patch looks good. I started on something similar earlier this year but gave up because the pf code gives me a headache.
I just tested it on KVM, and it's solved there now. Thanks a lot.