Bug 172648 - [pf] [ip6]: 'scrub reassemble tcp' breaks IPv6 packet checksum on SYN ACK
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 9.1-PRERELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Kristof Provost
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-12 20:10 UTC by Mark.Martinec
Modified: 2019-06-18 14:22 UTC

See Also:


Attachments
Minimal pf config (947 bytes, text/plain)
2015-01-23 21:40 UTC, Daniel Ylitalo

Description Mark.Martinec 2012-10-12 20:10:00 UTC
When pf (packet filter) is enabled and configured with 'scrub reassemble tcp',
IPv6 TCP connections take 9 seconds to establish. Packet capture shows
checksum errors on SYN ACK packets but not on other packets.

A TCP connection establishment (SYN) on IPv6 is (re-)tried four times,
with a 3 second delay between each attempt, while the TCP options are
being simplified each time by the kernel (dropping ECN, CWR, window
scaling, and the timestamp option). Only the fourth attempt
is successful, with no other options but SACK, and this TCP session
then proceeds normally.
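
The 9-second figure follows directly from the retry pattern described above. A minimal sketch of that arithmetic, assuming a constant 3-second spacing as reported (the function name is made up for illustration):

```python
# Model of the retry timeline from the report: four SYN attempts,
# three seconds apart, with only the fourth one succeeding.
ATTEMPTS = 4        # number of SYN attempts, as reported
RETRY_DELAY_S = 3   # delay between attempts in seconds, as reported


def time_until_attempt(n: int, delay_s: int = RETRY_DELAY_S) -> int:
    """Seconds elapsed when the n-th SYN (1-based) is sent."""
    if n < 1:
        raise ValueError("attempt numbers start at 1")
    return (n - 1) * delay_s


for attempt in range(1, ATTEMPTS + 1):
    print(f"SYN #{attempt} sent at t={time_until_attempt(attempt)}s")
```

The fourth SYN goes out after three delays, which matches the observed 9-second stall before the session is established.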

Disabling 'scrub reassemble tcp' in the pf avoids the problem.
Similarly, turning off net.inet.tcp.rfc1323 on either end
also avoids the problem, even with 'reassemble tcp' enabled.

The problem does not occur on IPv4 sessions, only on IPv6.

The problem is not associated with interface checksum offloading,
it is repeatable on gif, em, and re interfaces. Also a packet capture
(wireshark) shows packet checksum errors on SYN ACK packets (but
not on the SYN packet) in the first couple of failed attempts, and
no checksum errors on other packets (e.g. after a successfully
established session).

My guess is that the TCP timestamp option triggers a pf bug,
which then miscalculates a packet checksum on SYN ACK.
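
To make that guess concrete, here is a small user-space sketch (not pf's code) of what goes wrong if a middlebox rewrites the TCP timestamp value without adjusting the checksum. It computes the TCP checksum over the IPv6 pseudo-header and compares a blind rewrite with an RFC 1624 style incremental update. The addresses, ports, sequence numbers and helper names are invented for illustration; the timestamp value is simply borrowed from a capture elsewhere in this PR.

```python
import ipaddress
import struct


def fold16(total: int) -> int:
    """Fold carries back in until the value fits in 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    return fold16(sum(word for (word,) in struct.iter_unpack("!H", data)))


def tcp6_checksum(src: str, dst: str, segment: bytes) -> int:
    """TCP checksum over the IPv6 pseudo-header plus the TCP segment."""
    pseudo = (ipaddress.IPv6Address(src).packed
              + ipaddress.IPv6Address(dst).packed
              + struct.pack("!I3xB", len(segment), 6))  # length, next header = TCP
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


def build_syn_ack(src: str, dst: str, ts_val: int, ts_ecr: int) -> bytes:
    """Hypothetical SYN ACK carrying only NOP, NOP, Timestamps as options."""
    options = struct.pack("!BBBBII", 1, 1, 8, 10, ts_val, ts_ecr)
    header = struct.pack("!HHIIBBHHH", 80, 56746, 1000, 2001,
                         ((20 + len(options)) // 4) << 4, 0x12, 65535, 0, 0)
    segment = header + options
    csum = tcp6_checksum(src, dst, segment)
    return segment[:16] + struct.pack("!H", csum) + segment[18:]


def is_valid(src: str, dst: str, segment: bytes) -> bool:
    """A segment with a correct checksum sums to zero after complementing."""
    return tcp6_checksum(src, dst, segment) == 0


def rewrite_ts_val(segment: bytes, new_val: int, fix_checksum: bool) -> bytes:
    """Replace TSval (bytes 24..27 in this layout), optionally fixing the checksum."""
    old_field = segment[24:28]
    new_field = struct.pack("!I", new_val)
    out = bytearray(segment)
    out[24:28] = new_field
    if fix_checksum:
        # RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m')
        (old_csum,) = struct.unpack("!H", segment[16:18])
        inverted_old = bytes(b ^ 0xFF for b in old_field)
        total = fold16((~old_csum & 0xFFFF)
                       + ones_complement_sum(inverted_old)
                       + ones_complement_sum(new_field))
        out[16:18] = struct.pack("!H", ~total & 0xFFFF)
    return bytes(out)


SRC, DST = "2001:db8::1", "2001:db8::2"   # documentation prefix, not real hosts
syn_ack = build_syn_ack(SRC, DST, ts_val=2462372368, ts_ecr=0)
print("original valid:      ", is_valid(SRC, DST, syn_ack))
print("blind rewrite valid: ", is_valid(SRC, DST, rewrite_ts_val(syn_ack, 1, False)))
print("fixed rewrite valid: ", is_valid(SRC, DST, rewrite_ts_val(syn_ack, 1, True)))
```

Only the blind rewrite fails verification, which is the kind of checksum error the capture shows on SYN ACK packets. Whether pf actually takes such a path for IPv6 is exactly what this PR is trying to establish.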

Fix: 

No known fix.

Two workarounds:
- don't use 'scrub reassemble tcp' in PF, or disable PF
- sysctl net.inet.tcp.rfc1323=0

How-To-Repeat:

Use the following trivial pf config file:

  scrub all reassemble tcp
  pass all

Then try to establish any TCP session to any IPv6 address.
Any client will do (telnet, ssh, curl, web browser).
Try for example:
  curl -6 -L http://tools.ietf.org/rfc/rfc3021.txt | wc -l

The connection will 'hang' for 9 seconds (until sufficiently
dumbed-down SYN options are tried), then it proceeds normally.
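
To put a number on the hang rather than eyeballing it, the handshake time can be measured directly. A minimal sketch using only the Python standard library; the function name and defaults are made up for illustration:

```python
import socket
import time


def time_connect(host: str, port: int, family: int = socket.AF_INET6,
                 timeout: float = 30.0) -> float:
    """Return the seconds taken to complete a TCP handshake with host:port."""
    infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    af, socktype, proto, _canonname, sockaddr = infos[0]
    with socket.socket(af, socktype, proto) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.connect(sockaddr)          # returns once the handshake completes
        return time.monotonic() - start


# Example (needs working IPv6 connectivity):
#   print(f"{time_connect('tools.ietf.org', 80):.2f}s")
```

On an affected host one would expect a result of roughly 9 seconds with 'scrub all reassemble tcp' loaded, and well under a second once the scrub rule is removed.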
Comment 1 Mark.Martinec 2012-10-13 13:21:43 UTC
Btw, the effect described here looks very similar,
checksum errors on a SYN reply with IPv6 and pf:

http://lists.freebsd.org/pipermail/freebsd-stable/2012-July/068990.html

  Regression with jails/IPv6/pf
  Matthew Seaman <m.seaman@infracaninophile.co.uk>
  Thu Jul 26 23:10:43 UTC 2012


Mark
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2012-10-13 23:23:28 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-pf

Over to maintainer(s).
Comment 4 Kurt Jaeger freebsd_committer 2014-11-10 15:14:22 UTC
(In reply to Kurt Jaeger from comment #3)
> See 
> 
> https://lists.freebsd.org/pipermail/freebsd-net/2014-November/040319.html

Patch from Ermal Luçi inline in:

https://lists.freebsd.org/pipermail/freebsd-pf/2014-November/007500.html
Comment 5 Kurt Jaeger freebsd_committer 2014-12-07 11:01:06 UTC
In PR 179392 the commit r274709 worked on checksums. Can someone reproduce
the problem with that fix applied ?
Comment 6 Daniel Ylitalo 2015-01-06 01:42:09 UTC
I just tried the inline patch to these files on 10.1-p3
sys/netinet6/ip6_output.c
sys/netinet6/ip6_var.h
sys/netpfil/pf/pf_ioctl.c

(there was nothing changed in sys/netpfil/pf/pf.c if I'm reading the patch correctly)

It does not seem to work: with this rule, the traffic never arrives at ::1.8080:
rdr pass log on igb0 inet6 proto tcp from any to any port 80 -> ::1 port 8080

However, I can see the rule being executed in pflog:
rule 10..16777216/0(match): rdr in on igb0: 2a00:1a28:1200:11::2.56746 > ::1.8080: Flags [S], seq 4110669173, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 2462372368 ecr 0], length 0
Comment 7 Gleb Smirnoff freebsd_committer 2015-01-23 18:35:16 UTC
Can you check whether the bug is still valid for stable/10?
Comment 8 Daniel Ylitalo 2015-01-23 21:40:57 UTC
Created attachment 152061
Minimal pf config

Just built kernel against stable r.277607

Added the pf devices to the generic kernel and rebuilt; the bug is still there.

Attaching minimal pf config.

Here is how to test:

On the server with pf:
nc -6 -l 8080

From any other ipv6 enabled server:
nc -6 yourfbsdserversipv6 80
Comment 9 doktornotor 2015-06-14 15:55:27 UTC
(In reply to Gleb Smirnoff from comment #7)

This bug has been valid for 8.x, 9.x, and 10.x and is not solved anywhere. There's no need to validate it; a fix is strongly needed, though, just as it was fixed for ipfw (Bug 145733).
Comment 10 Gleb Smirnoff freebsd_committer 2015-06-15 09:09:57 UTC
Kristof, can you please look at this bug?
Comment 11 Gleb Smirnoff freebsd_committer 2015-06-15 09:11:13 UTC
Sorry, markp@. For some unknown reason, Bugzilla rewrites kp@FreeBSD.org to your login name. I will assign the bug to myself until this is fixed.
Comment 12 Gleb Smirnoff freebsd_committer 2015-06-15 09:52:55 UTC
Kristof, can you please look at this bug?
Comment 13 Kristof Provost freebsd_committer 2015-06-23 12:25:32 UTC
I've thus far been unable to reproduce this on either a bhyve guest (vtnet) or a physical machine (ale(4)).

I might be missing some part of the reproduction scenario, but I don't see what.
My pf.conf:
> scrub all reassemble tcp
> pass all

> curl -6 -L http://tools.ietf.org/rfc/rfc3021.txt | wc -l
Responds almost instantly. There's no 9 second delay.
Comment 14 Daniel Ylitalo 2015-06-23 17:18:05 UTC
If you change the listen port of your webserver to 8080 and then change 
pass all 
to 
rdr pass log on $ext inet6 proto tcp from any to any port 80 -> port 8080

You will see that the rule is executed in pflog, but the traffic never reaches the webserver.
Comment 15 Mark Felder freebsd_committer 2015-07-15 13:48:35 UTC
I initially noticed this bug on all of my previous employer's FreeBSD servers when I upgraded to FreeBSD 9.x. The cause was definitely "scrub all reassemble tcp", as removing it across all our servers solved the problem for us. The symptoms we had were long connection establishment times and, once connected, very high latency and terribly slow sessions; ssh over IPv6 was unusable.
Comment 16 Kristof Provost freebsd_committer 2015-07-21 20:09:10 UTC
(In reply to daniel from comment #14)

I'm still having no luck reproducing this.

The rdr rule you gave as an example (rdr pass log on $ext inet6 proto tcp from any to any port 80 -> port 8080) doesn't actually work because it doesn't specify a redirect target.

I've tested with this one instead:
rdr log on $ext inet6 proto tcp from any to any port 1234 -> 2001:db8::2 port 8080
and things work correctly.

(As an aside, I was initially testing with a rule which redirected to ::1. This doesn't work, and I'm not quite sure yet if I think that's a feature or a bug.)
Comment 17 Daniel Ylitalo 2015-07-21 20:40:43 UTC
(In reply to Kristof Provost from comment #16)

Sorry for forgetting to specify the target of ::1 in my reply to you; I must have been tired :)

I did, however, specify it in the other comments in this PR.

I don't know if it's a missing feature or a bug, but rdr with a target of 127.0.0.1 does work for IPv4, which is why I've been pushing this PR for IPv6 (I thought it was a bug).

Sorry for any misunderstandings.
Comment 18 Kristof Provost freebsd_committer 2015-07-21 20:55:15 UTC
No worries.

From my perspective that means we're looking at two different issues though.
The first (which I can't reproduce) is that 'scrub all reassemble tcp' breaks TCP checksums.

The second is that rdr to ::1 doesn't work. This I can reproduce, and perhaps even do something about.

Let's look at the second one first, because there's more likely to be progress on that.
Comment 19 Kristof Provost freebsd_committer 2015-07-25 12:53:23 UTC
Ok, I think I've got a handle on what the problem is.

With rdr to ::1 we fail the scope check in ip6_input() (right after the pfil hook) because we have a packet to localhost with a m->m_pkthdr.rcvif which is not a loopback interface.
That can be fixed by having pf rewrite the rcvif, but that'd special-case rdr to ::1.

We've got a similar problem for the reply. There we've got a packet from ::1 to something else. This fails the scope lookup too.
In essence the problem is that we've already made the routing decision before pf gets the chance to rewrite the destination address.

I'm not quite sure how to fix this though.
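
For readers following along, the failure described above can be modelled in a few lines. This is a deliberately simplified user-space model, not the kernel code: the Packet type, its rcvif_is_loopback field and both function names are invented for illustration, and the check is reduced to "loopback addresses are only acceptable on a loopback interface".

```python
import ipaddress
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    rcvif_is_loopback: bool   # stands in for the interface in m->m_pkthdr.rcvif


def passes_input_scope_check(pkt: Packet) -> bool:
    """Simplified model of the scope check that runs after the pfil hook."""
    touches_loopback = (ipaddress.IPv6Address(pkt.src).is_loopback
                        or ipaddress.IPv6Address(pkt.dst).is_loopback)
    return pkt.rcvif_is_loopback or not touches_loopback


def apply_rdr(pkt: Packet, new_dst: str) -> Packet:
    """Model of rdr: the destination is rewritten, the receive interface is not."""
    return replace(pkt, dst=new_dst)


inbound = Packet(src="2001:db8::10", dst="2001:db8::1", rcvif_is_loopback=False)
for target in ("::1", "2001:db8::2"):
    ok = passes_input_scope_check(apply_rdr(inbound, target))
    print(f"rdr to {target}: {'accepted' if ok else 'dropped'}")
```

In this model the packet redirected to ::1 is dropped because it still carries the physical receive interface, while a redirect to a global address is unaffected. Rewriting rcvif in pf would make the first case pass, at the cost of the special-casing mentioned above.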
Comment 20 Daniel Ylitalo 2015-07-27 08:35:27 UTC
If I understand the problem correctly, wouldn't the same problem arise if you were to rdr IPv6 from a public NIC to a LAN NIC, since you can't affect the routing from pf?

I.e:
rdr pass log on igb0 inet6 proto tcp from any to 2a00:1a28:1252::2 port 80 -> fc00::1234 port 8080

Or any scenario where you want to rdr the traffic from one NIC to another.
Comment 21 Kristof Provost freebsd_committer 2015-07-28 08:32:30 UTC
At first glance the issue is in in6_clearscope() and in6_setscope(). Those will only fail for the loopback address (::1) or link local addresses.
The rdr rule does work with GUAs.
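
As a quick way to tell which rdr targets fall into the two affected classes, here is a small sketch using Python's ipaddress module. It only classifies addresses according to the statement above (loopback and link-local are affected, everything else is not); the function names are made up, and it says nothing about what the kernel actually does with each class.

```python
import ipaddress


def rdr_target_scope(addr: str) -> str:
    """Classify an IPv6 rdr target as loopback, link-local, or other."""
    ip = ipaddress.IPv6Address(addr)
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:        # fe80::/10
        return "link-local"
    return "other"


def likely_affected(addr: str) -> bool:
    """True for the address classes named above as failing the scope handling."""
    return rdr_target_scope(addr) in {"loopback", "link-local"}


for target in ("::1", "fe80::1", "2001:db8::2", "fc00::1234"):
    print(f"{target:14} {rdr_target_scope(target):10} affected={likely_affected(target)}")
```

Note that fc00::1234 from the previous comment is neither loopback nor link-local, so by this reasoning it would behave like a GUA. That is an inference from the description here, not something tested in this PR.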
Comment 22 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:49:53 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 23 ultramage 2019-06-18 14:22:58 UTC
I have been observing this issue for several years. I believe the last time I tested it was on FreeBSD 11.2 from 2017. I re-tested this today on FreeBSD 12.0-RELEASE-p3 r343997 and have not noticed the disruption in TCP traffic that comes from bad checksums. From the tcpdumps, it looks like the OS is correctly performing timestamp randomization/masquerade on behalf of computers on both sides of the connection. So... I guess it's fixed? I don't know exactly when. A second confirmation would be appreciated.