Created attachment 144852 [details]
Proposed patch

When using Xen and pf, performance of outbound traffic is very slow until TSO is disabled. The attached patch fixes it. This may or may not be related to bug 135178.
See also bug 154428.
Created attachment 144883 [details]
More conservative patch

When you say “TSO is disabled”, I assume you mean on the virtual interface (xnN) in the guest?

I'm not comfortable removing this block of code without a clear understanding of why it's there. It looks to me like the intent is to ensure that all packets have a valid checksum before they enter pf. Are we certain that pf won't break in some subtle way if that assumption is violated?

BTW, the patch does not remove the equivalent code in pf_check6_out(), but that code is broken anyway (as the comment points out). A more conservative fix is to keep calculating the checksum before pf_test() *unless* hardware checksum offloading is enabled.
(In reply to Dag-Erling Smørgrav from comment #2)
> When you say “TSO is disabled”, I assume you mean on the virtual interface
> (xnN) in the guest?

Yes, just running "ifconfig xn0 -tso4" makes the issue disappear.

> I'm not comfortable removing this block of code without a clear
> understanding of why it's there. It looks to me like the intent is to
> ensure that all packets have a valid checksum before they enter pf. Are we
> certain that pf won't break in some subtle way if that assumption is
> violated?

I share your concern; hopefully the new patch addresses that. I can't say what else might break. :)

> BTW, the patch does not remove the equivalent code in pf_check6_out(), but
> that code is broken anyway (as the comment points out).

Good point, I can test the IPv6 case as well if needed.

> A more conservative fix is to keep calculating the checksum before pf_test
> *unless* hardware checksum offloading is enabled.

I like it; testing now to ensure it still fixes the issue, will report back later.
Please wait, the IPv6 part of my patch is wrong. I will post a new patch in a bit.
Created attachment 144902 [details]
patch for pf delayed_csum bug

Here's a cleaned-up patch. Instead of removing the call to in_delayed_cksum(), it calls it only if the set of missing checksums does not completely overlap with the set of hardware-assisted checksums. I'm not sure it fixes the issue, but it's at least a step in the right direction.
(In reply to Dag-Erling Smørgrav from comment #5)
> Here's a cleaned-up patch. Instead of removing the call to
> in_delayed_cksum(), it calls it only if the set of missing checksums does
> not completely overlap with the set of hardware-assisted checksums.

This new patch builds fine and seems to solve the problem, at least on IPv4.
Something about the patch was nagging me, and I suddenly realized a) why pf insists on having a correct checksum and b) why this breaks TSO.

The answer to a) is that there are cases (e.g. scrub, redirect) where pf modifies the packet and then recalculates the checksum. I *think*, but am not 100% positive, that the answer to b) is that the code does not clear the CSUM_TSO flag, so the adapter expects the checksum field to be 0 and calculates an incorrect checksum. (Actually, I'm surprised that any of the patches work at all; my guess is that you're simply not exercising the parts of pf that require a correct checksum. Unfortunately, I don't have a Xen environment to test on, so I can't reproduce the bug.)

There are basically two ways to go from here: the first is to keep the code but modify it so that it clears the TSO flag after calculating the checksum; the second is to fix pf so that it doesn't try to update the checksum if CSUM_DELAY_DATA is set, but instead makes sure that the correct flags are set. The latter option is clearly preferable, as the former basically defeats hardware offloading, thereby increasing CPU usage.

I also want to modify in6_delayed_cksum() so it works like in_delayed_cksum() (i.e. takes no arguments other than the mbuf).
I've found that this issue also causes problems for FreeBSD running as a KVM guest using the VirtIO network drivers.

When Linux is the KVM host, it does some validation on the packets it gets from the guest. When it receives packets with the TSO flag set but without the flag indicating a header is needed, it rejects them and logs an error. On the Linux host, this manifests as a kernel warning (skb_warn_bad_offload, in case anyone's searching for this later on) and poor performance (since I believe it drops the invalid packets).

Initially, I thought this was a bug in the virtio driver, but this post led me to the pf code instead:

https://lists.freebsd.org/pipermail/freebsd-hackers/2014-August/045916.html

I tried the patch here, and I can confirm that it resolves the issue I was seeing when FreeBSD runs as a KVM guest. I cannot confirm anything with pf itself, as I only have a trivial rule present.
To summarize:

The attached patch unbreaks IPv4 but reduces performance because it effectively disables TSO. It may or may not fix IPv6 as well; the IPv6 checksum computation is much more complicated, and I'm not sure the in6_delayed_cksum() call is correct. But at least it's no worse than it was before.

I have been working on a complete patch in which checksum offloading is disabled only for packets that pf needs to modify. Unfortunately, this has proven very difficult due to the structure of the pf code and the large number of places in which a packet and its checksum can be modified. I'm not sure it's realistically feasible without rewriting significant portions of the pf code.
Created attachment 149819 [details]
Calculate non-hardware-assisted checksums

This new patch (relative to the latest head) is the best compromise I could come up with. It does not prevent pf from updating the checksum, but it forces recalculation before entering pf if there are delayed checksums which are not handled in hardware, so the result will be correct. If the delayed checksums are all handled in hardware, the result of pf's checksum manipulation will be garbage, but the NIC will compute the correct checksum. Of course, this assumes that the hardware does not require the checksum field to be zero...
I hit this in EC2. It was very surprising; it occurs even with an empty pf.conf.
This was on 10.1-GENERIC.
*** Bug 194838 has been marked as a duplicate of this bug. ***
Does setting net.inet.tcp.tso=0 solve the issue as well? Would it make sense to just make this sysctl default to 0 when virtualization is detected?
I only tested KVM, but it indeed solves it for me.
(In reply to Allan Jude from comment #14) I think this makes sense considering how long this has been going on. It would also require less out-of-the-box changes for companies providing FreeBSD VM images to choose from.
(In reply to Mark Felder from comment #16) IMHO, unconditionally disabling it without the user knowing is just a dirty workaround. What's the problem with the latest patch from des? Does it solve the issue?
It does not. The best compromise for now is to disable TSO. The checksum code in pf is so convoluted that a complete solution requires rewriting a significant portion of pf. It may be more cost-effective to import a newer and presumably cleaner version of pf.
(In reply to Allan Jude from comment #14) But if the guest has PCI passthrough devices, disabling TSO globally seems wrong; it should be disabled in virtio-net and xen-netfront only.
(In reply to Roger Pau Monné from comment #19) That also seems reasonable. Don't forget to include whatever device Microsoft Hyper-V uses as well.
(In reply to Allan Jude from comment #20) Right, that's hv_netvsc AFAIK. Does this also affect VMXNET3 (the VMware PV NIC)?
Is there a test that could be run to determine whether the issue occurs? I have 10.1-RELEASE-p10 VMs inside Citrix XenServer 6.5 SP1 (basically, the latest patched version) and I am using pf there along with the following interface options:

xn0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=503<RXCSUM,TXCSUM,TSO4,LRO>

I can't complain about anything when it comes to network performance, but perhaps I am missing something?
That's a really new version of Xen and probably doesn't have the problem. But TSO issues are also observed on KVM. I can't comment on VMware.
I just found there's a tunable for the KVM virtio NIC: hw.vtnet.tso_disable=1.
(In reply to Ruben Kerkhof from comment #24) Is there a reason why one would want to do that instead of net.inet.tcp.tso=0 ?
Yes, I think the latter disables TSO for all NICs. We have VMs where customers can switch between virtio-net and the em driver, and we only want TSO disabled for virtio-net.
(In reply to Ruben Kerkhof from comment #26) I was wondering if such a scenario could exist, but wasn't sure. Having to manually disable TSO with ifconfig for each interface would be tedious and error-prone, so it makes good sense to have this tunable.
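For anyone landing here via a search, the workarounds discussed in this thread can be made persistent roughly as follows. This is a sketch assembled from the commands mentioned above; the interface name is an example, and hw.vtnet.tso_disable is a boot-time tunable while net.inet.tcp.tso is a runtime sysctl:

```
# /boot/loader.conf -- disable TSO only for virtio-net NICs
hw.vtnet.tso_disable=1

# /etc/sysctl.conf -- disable TSO globally, for all NICs
net.inet.tcp.tso=0

# /etc/rc.conf -- disable TSO4 on a single interface (here a Xen xn0)
ifconfig_xn0="DHCP -tso4"
```

The per-driver and per-interface variants avoid penalizing passthrough or emulated NICs that handle TSO correctly.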
Has anyone had the chance to test for this issue with 289316 (head) or 289703 (stable/10)? I suspect this is fixed now.
I haven't tested it, but the patch looks good. I started on something similar earlier this year but gave up because the pf code gives me a headache.
I just tested it on KVM, and it's solved there now. Thanks a lot.