Bug 154428

Summary: [xen] xn0 network interface and PF - Massive performance drop
Product: Base System
Reporter: Alex <alex>
Component: kern
Assignee: Kristof Provost <kp>
Status: Closed FIXED
Severity: Affects Only Me
CC: bdrewery, meyer.sydney, mmpestorich, pgadmin, rainer, royger, thinker.lp, xen
Priority: Normal
Flags: koobs: mfc-stable10+, koobs: mfc-stable9+
Version: Unspecified
Hardware: Any
OS: Any

Description Alex 2011-02-01 01:30:09 UTC
Hi Guys,

I have been forced to file a PR as I have had no answer on this from the freebsd-xen mailing list.

I am running FreeBSD in a Xen HVM environment with a commercial VPS provider. I recently went from running a generic type of kernel to one that includes the XENHVM options. I now have a network interface called xn0 instead of re0, so I had to update my pf.conf because the interface name changed.

All I did was edit the pf.conf file and replace all instances of re0 with xn0. The performance seems to be awful. I was wondering why network connectivity was so slow: a download test from Apache struggled to do 2KB/s. When I disabled pf, the speed suddenly skyrocketed. Any ideas where to look? I have the following in my kernel for PF:

device pf
device pflog
device pfsync
options         ALTQ
options         ALTQ_CBQ        # Class Based Queuing (CBQ)
options         ALTQ_RED        # Random Early Detection (RED)
options         ALTQ_RIO        # RED In/Out
options         ALTQ_HFSC       # Hierarchical Packet Scheduler (HFSC)
options         ALTQ_PRIQ       # Priority Queuing (PRIQ)
options         ALTQ_NOPCC      # Required for SMP build

and pf.conf (very basic setup):
--------------------------------

mailblocklist = "{ 69.6.26.0/24 }"
#blacklist = "{ 202.16.0.11 }"

# Rule  0 (xn0)
#pass in quick on xn0 inet proto icmp from any  to (xn0)  label "RULE 0 -- ACCEPT "

#block mail server(s) that continue to try and send me junk
block in quick on xn0 inet proto tcp  from $mailblocklist to (xn0) port 25

#block anyone else who's in the blacklist
#block in quick on xn0 inet from $blacklist to (xn0)

pass in quick on xn0 inet proto tcp  from any  to (xn0) port { 110, 25, 80, 443, 21, 53 } flags any  label "RULE 0 -- ACCEPT "
pass in  quick on xn0 inet proto udp  from any  to (xn0) port 53  label "RULE 0 -- ACCEPT "

#
# Rule  1 (lo0)
pass  quick on lo0 inet  from any  to any no state  label "RULE 1 -- ACCEPT "
#
# Rule  2 (xn0) -- allow all outbound connectivity
pass out  quick on xn0 inet  from any  to any  label "RULE 2 -- ACCEPT "

# Rule  3 (xn0)
# deny all not matched by above
block in quick on xn0 inet  from any  to any no state  label "RULE 3 -- DROP "

--------------------------

Any ideas why I would be seeing such a performance hit? I need to get to the bottom of this, as leaving a public-facing machine with its firewall disabled is bad news.

I am not sure whether this is a PF or a network interface issue.

How-To-Repeat: Install FreeBSD 8.2RC2 in a Xen HVM environment (other versions of FreeBSD could also be affected), build the XENHVM kernel, then enable a simple PF ruleset like the one above. Test network throughput with PF enabled and with PF disabled, and compare the results.
Comment 1 joovke 2011-02-01 03:29:58 UTC
Confirmed: the problem is still evident in 8.2 RC3 (did a full source csup and
buildworld).
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2011-02-01 04:53:08 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-xen

Over to maintainer(s).
Comment 3 Alex 2011-02-13 03:01:51 UTC
 Hi,

 Any update on this? I've had to disable external connections for some
 services on my VPS due to dictionary/brute-force attacks, since I cannot
 use PF to firewall out the offending IPs/ranges. If nobody
 is interested, I will go back to a generic kernel.
Comment 4 Alex 2011-02-13 04:21:20 UTC
 Fixed by net.inet.tcp.tso: 1 -> 0

 But why?? I found this by trial and error. Setting net.inet.tcp.tso to 0
 with pf enabled gives good performance; if I set it to 1, speeds plummet
 to below dialup!
Comment 5 Colin Percival freebsd_committer freebsd_triage 2011-02-13 04:31:07 UTC
On 02/12/11 20:18, Alex wrote:
> Fixed by net.inet.tcp.tso: 1 -> 0
> 
> But why?? I found this by trial and error. Setting net.inet.tcp.tso to 0
> with pf enabled gives good performance; if I set it to 1, speeds plummet
> to below dialup!

There have been problems with Xen and TSO in the past relating to how much
data gets handed off to the hypervisor at once... why this would cause issues
only with PF, I have no idea, though.

-- 
Colin Percival
Security Officer, FreeBSD | freebsd.org | The power to serve
Founder / author, Tarsnap | tarsnap.com | Online backups for the truly paranoid
Comment 6 Alex 2011-02-13 04:45:28 UTC
Beats me. Perhaps someone can look into it, as I am out of my league with
this one from this point onwards. I am happy to try any patches,
rebuild, and report the outcomes if need be.

It's a *big* relief to have a firewall again though!
Comment 7 joovke 2011-02-14 12:07:29 UTC
I stumbled across PR 135178. Perhaps the two PRs are related,
though the reporter of that PR has not responded in some time.

On 02/13/11 15:31, Colin Percival wrote:
> On 02/12/11 20:18, Alex wrote:
>> Fixed by net.inet.tcp.tso: 1 ->  0
>>
>> But why?? I found this by trial and error. Setting net.inet.tcp.tso to 0
>> with pf enabled gives good performance; if I set it to 1, speeds plummet
>> to below dialup!
> There have been problems with Xen and TSO in the past relating to how much
> data gets handed off to the hypervisor at once... why this would cause issues
> only with PF, I have no idea, though.
>
Comment 8 Mark Felder freebsd_committer freebsd_triage 2012-08-23 22:14:18 UTC
I've hit this on 9.0-RELEASE as well using XCP 1.5beta as the hypervisor.
Comment 9 Mark Felder freebsd_committer freebsd_triage 2013-07-15 14:43:30 UTC
I wasn't able to replicate this on an 8.4 XENHVM kernel -- perhaps this
has now been fixed?

When 9.2-RELEASE drops we should test there as well before closing this
out.
Comment 10 Alex 2013-07-16 05:09:49 UTC
On 2013-07-15 23:43, Mark Felder wrote:
> I wasn't able to replicate this on an 8.4 XENHVM kernel -- perhaps
> this  has now been fixed?
>
> When 9.2-RELEASE drops we should test there as well before closing 
> this  out.

Hi Mark,

Are you certain TSO is enabled for the NIC, i.e. not disabled via
ifconfig or sysctl?

Cheers,
Alex.
Comment 11 Bryan Drewery freebsd_committer freebsd_triage 2015-05-18 17:34:17 UTC
I just hit this on 10.1-GENERIC on EC2. Empty pf.conf with pf enabled = horrible performance. Disabling pf or TSO with pf fixes it.
Comment 12 Jonas Liepuonius 2015-09-20 23:04:17 UTC
Hello,

I've encountered the same problem on FreeBSD 10.2 on XenServer 6.5.

Enabling pf basically shuts off my interfaces: pings go through, but nothing else does.

I tried setting net.inet.tcp.tso=0 at boot in /etc/sysctl.conf and setting it manually with sysctl, but with no luck. ifconfig xn0 -tso -lro didn't help either.
Comment 13 Bryan Drewery freebsd_committer freebsd_triage 2015-09-21 19:50:46 UTC
(In reply to Bryan Drewery from comment #11)
> I just hit this on 10.1-GENERIC on EC2. Empty pf.conf with pf enabled =
> horrible performance. Disabling pf or TSO with pf fixes it.

My exact issue is with ifconfig showing TSO4 on the interfaces. Just changing the sysctl to 0 fixes the issue. Setting it to 1 brings back the extreme packet loss/latency issues.
Comment 14 Bryan Drewery freebsd_committer freebsd_triage 2015-10-02 17:03:14 UTC
Possible patch in https://reviews.freebsd.org/D3779
Comment 15 commit-hook freebsd_committer freebsd_triage 2015-10-14 16:22:11 UTC
A commit references this bug:

Author: kp
Date: Wed Oct 14 16:21:41 UTC 2015
New revision: 289316
URL: https://svnweb.freebsd.org/changeset/base/289316

Log:
  pf: Fix TSO issues

  In certain configurations (mostly but not exclusively as a VM on Xen) pf
  produced packets with an invalid TCP checksum.

  The problem was that pf could only handle packets with a full checksum. The
  FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
  addresses, length and protocol).
  Certain network interfaces expect to see the pseudo-header checksum, so they
  end up producing packets with invalid checksums.

  To fix this stop calculating the full checksum and teach pf to only update TCP
  checksums if TSO is disabled or the change affects the pseudo-header checksum.

  PR:		154428, 193579, 198868
  Reviewed by:	sbruno
  MFC after:	1 week
  Relnotes:	yes
  Sponsored by:	RootBSD
  Differential Revision:	https://reviews.freebsd.org/D3779

Changes:
  head/sys/net/pfvar.h
  head/sys/netpfil/pf/pf.c
  head/sys/netpfil/pf/pf_ioctl.c
  head/sys/netpfil/pf/pf_norm.c
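
The distinction the commit log draws between a "full checksum" and a "pseudo-header checksum" can be illustrated with a minimal Python sketch. This is not pf's code. The function names, the example addresses and the assumption that the interface finishes the checksum by summing the whole segment (with the partial sum left in the checksum field) are illustrative only.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Return the folded 16-bit one's-complement sum of data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header_sum(src: str, dst: str, tcp_len: int) -> int:
    """Partial sum over the IPv4 pseudo-header: addresses, protocol, length."""
    pseudo = (
        socket.inet_aton(src)
        + socket.inet_aton(dst)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, tcp_len)
    )
    return ones_complement_sum(pseudo)


def full_tcp_checksum(src: str, dst: str, segment: bytes) -> int:
    """Full checksum over pseudo-header plus segment.

    The caller is expected to zero the checksum field (bytes 16-17) first.
    """
    total = pseudo_header_sum(src, dst, len(segment)) + ones_complement_sum(segment)
    total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def finish_offloaded_checksum(segment: bytes) -> int:
    """What an offloading interface is assumed to do: sum the segment,
    including the partial sum already stored in the checksum field, and invert."""
    return ~ones_complement_sum(segment) & 0xFFFF
```

If the checksum field already holds a full checksum instead of the pseudo-header sum, an interface that finishes the calculation this way would produce a wrong value, which is consistent with the invalid checksums described in the log.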
Comment 16 Alex 2015-10-14 22:32:21 UTC
Good to see some resolution to this.

But seriously:

2011-02-01 12:30:09 AEDT 

That's when I opened this PR. Why has it taken over 4 years to fix???? Bit long in the tooth?

Anyway. Better late than never. Kudos to those who actually did something about fixing this issue.

Alex.
Comment 17 Jonas Liepuonius 2015-10-19 23:13:44 UTC
Well, I tried the latest FreeBSD-HEAD, which should include the patch, but unfortunately it didn't solve the problem for me. I can still get pings through, but nothing else. TCP connections just time out. Either my issue is different or there's something special about my XenServer. Anyway, great work guys!
Comment 18 Sydney Meyer 2015-10-19 23:25:26 UTC
(In reply to Jonas Liepuonius from comment #17)

I also tested a recent snapshot of both head and stable, and on my Xen 4.4/Linux 4.2 setup the IPv4 TCP performance problems seem to be gone, i.e. for a host-to-host single IPv4 TCP stream.

The routing problems, as in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=188261, are still present, i.e. ICMP works, TCP times out.
Comment 19 Alex 2015-10-19 23:26:26 UTC
@Jonas

I think your issue is something else. The issue was never a complete lack of TCP flow; it was excruciatingly slow TCP performance.
Comment 20 Jonas Liepuonius 2015-10-19 23:40:03 UTC
I probably have the routing issue mentioned above by Sydney. Anyway, we're moving in the right direction.
Comment 21 commit-hook freebsd_committer freebsd_triage 2015-10-21 15:33:21 UTC
A commit references this bug:

Author: kp
Date: Wed Oct 21 15:32:21 UTC 2015
New revision: 289703
URL: https://svnweb.freebsd.org/changeset/base/289703

Log:
  MFC r289316:

  pf: Fix TSO issues

  In certain configurations (mostly but not exclusively as a VM on Xen) pf
  produced packets with an invalid TCP checksum.

  The problem was that pf could only handle packets with a full checksum. The
  FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
  addresses, length and protocol).
  Certain network interfaces expect to see the pseudo-header checksum, so they
  end up producing packets with invalid checksums.

  To fix this stop calculating the full checksum and teach pf to only update TCP
  checksums if TSO is disabled or the change affects the pseudo-header checksum.

  PR:             154428, 193579, 198868
  Relnotes:       yes
  Sponsored by:   RootBSD

Changes:
_U  stable/10/
  stable/10/sys/net/pfvar.h
  stable/10/sys/netpfil/pf/pf.c
  stable/10/sys/netpfil/pf/pf_ioctl.c
  stable/10/sys/netpfil/pf/pf_norm.c
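
The rule stated in the log ("only update TCP checksums if TSO is disabled or the change affects the pseudo-header checksum") can be sketched roughly as follows. This is a hedged illustration, not pf's implementation: `checksum_fixup` applies the generic incremental update from RFC 1624 to a full checksum, and `should_update_checksum` is a hypothetical predicate that merely restates the sentence from the log.

```python
import struct


def fold(total: int) -> int:
    """Fold carries of a one's-complement sum back into 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum(data: bytes) -> int:
    """Internet checksum computed from scratch over data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    return ~fold(sum(w for (w,) in struct.iter_unpack("!H", data))) & 0xFFFF


def checksum_fixup(cksum: int, old: int, new: int) -> int:
    """Incremental update per RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m').

    cksum is the existing checksum, old and new are the 16-bit word
    before and after the rewrite (for example half of an address, or a port).
    """
    return ~fold((~cksum & 0xFFFF) + (~old & 0xFFFF) + new) & 0xFFFF


def should_update_checksum(tso_enabled: bool, affects_pseudo_header: bool) -> bool:
    """Hypothetical predicate restating the commit log's rule."""
    return (not tso_enabled) or affects_pseudo_header
```

Under this reading, an address rewrite changes the pseudo-header and always needs a fixup, while a port rewrite touches only the TCP header and is left for the interface to cover when it finishes the checksum.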
Comment 22 commit-hook freebsd_committer freebsd_triage 2015-12-25 15:13:08 UTC
A commit references this bug:

Author: kp
Date: Fri Dec 25 15:12:12 UTC 2015
New revision: 292731
URL: https://svnweb.freebsd.org/changeset/base/292731

Log:
  pf: Fix TSO issues

  In certain configurations (mostly but not exclusively as a VM on Xen) pf
  produced packets with an invalid TCP checksum.

  The problem was that pf could only handle packets with a full checksum. The
  FreeBSD IP stack produces TCP packets with a pseudo-header checksum (only
  addresses, length and protocol).
  Certain network interfaces expect to see the pseudo-header checksum, so they
  end up producing packets with invalid checksums.

  To fix this stop calculating the full checksum and teach pf to only update TCP
  checksums if TSO is disabled or the change affects the pseudo-header checksum.

  PR:             154428, 193579, 198868
  Sponsored by:   RootBSD

Changes:
  stable/9/sys/contrib/pf/net/pf.c
  stable/9/sys/contrib/pf/net/pf_ioctl.c
  stable/9/sys/contrib/pf/net/pf_norm.c
  stable/9/sys/contrib/pf/net/pfvar.h
Comment 23 Kubilay Kocak freebsd_committer freebsd_triage 2015-12-25 15:51:39 UTC
Assign to the committer who is taking care of it.
Comment 24 Andreas Pflug 2016-01-30 10:24:58 UTC
Is this fix supposed to be included in 10.2p11? If so, it doesn't fix https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197344 which is a similar problem but is present whether pf is enabled or not.
Comment 25 Kristof Provost freebsd_committer freebsd_triage 2016-01-30 14:14:40 UTC
(In reply to Andreas Pflug from comment #24)
The fix for the TSO problem with pf is included in 10.2p11, yes.

That problem was exclusively a pf problem though, so if you saw the problem with pf disabled then I'd expect it to be a different problem.
Comment 26 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:49:01 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.