Bug 253476

Summary: ipfw keepalive: tcp_do_segment: Timestamp missing, segment silently dropped
Product: Base System Reporter: Stefan Ehmann <shoesoft>
Component: kernAssignee: freebsd-ipfw (Nobody) <ipfw>
Status: Open ---    
Severity: Affects Only Me CC: ae, chris, donner, emaste, fbsdbugs4, freebsd, ipfw, marek, raul.munoz, rgrimes, rscheff, topical, tuexen
Priority: ---    
Version: 13.0-STABLE   
Hardware: amd64   
OS: Any   

Description Stefan Ehmann 2021-02-13 07:28:59 UTC
After upgrading from 12.2 to 13.0-BETA1 TCP connections can no longer transfer data after being idle.

git bisect revealed the regression was introduced in
283c76c7c3f2f634f19f303a771a3f81fe890cab

The problem is that the keepalive packets for keep-state rules are dropped. Dynamic rules are removed and the existing connections no longer work.
As a workaround net.inet.tcp.tolerate_missing_ts=1 can be used.

How to reproduce:
The test setup uses epair for network communication to rule out a faulty 3rd party TCP stack. I can also reproduce the problem with real network hosts.
Tested with 13.0-BETA1 amd64 VM-image in virtualbox.

Terminal 1:
kldload ipfw
sysctl net.inet.tcp.log_debug=1
sysctl net.inet.ip.fw.dyn_ack_lifetime=60 # optional, speeds up testing
ipfw add allow tcp from me to any setup keep-state
ifconfig epair0 create
ifconfig epair0a 10.0.0.101 up
ifconfig epair0b 10.0.0.102 up
while true; do ipfw -Da list; sleep 1; done # list dynamic rules, sh syntax

Terminal 2:
nc -l 10.0.0.101 12345

Terminal 3:
nc -s 10.0.0.102 10.0.0.101 12345

With TCP debug enabled, I can see the following log entry:
TCP: [10.0.0.102]:13140 to [10.0.0.101]:12345 tcpflags 0x10<ACK>; tcp_do_segment: Timestamp missing, segment silently dropped

The timer of the dynamic rule expires and is removed.
Comment 1 Richard Scheffenegger freebsd_committer 2021-02-13 15:58:23 UTC
Looking into this; without the tolerate_missing_ts, using the TCP_NOOPT socket option would also result in sessions not being set up:

 nc --no-tcpopt -l 10.0.0.101 12345 &
 nc -s 10.0.0.102 10.0.0.101 12345

The server side would respond without timestamp, and even so it was not yet negotiated, the client would apparently request the SYN,ACK missing the TS...



16:48:10.515731 IP 10.0.0.102.46343 > 10.0.0.101.12345: Flags [S], seq 1670959926, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 1505575656 ecr 0], length 0
16:48:10.515908 IP 10.0.0.101.12345 > 10.0.0.102.46343: Flags [S.], seq 3686780973, ack 1670959927, win 65535, length 0
16:48:10.515959 IP 10.0.0.102.46343 > 10.0.0.101.12345: Flags [.], ack 1, win 65535, length 0
16:48:11.514681 IP 10.0.0.101.12345 > 10.0.0.102.46343: Flags [S.], seq 3686780973, ack 1670959927, win 65535, length 0
16:48:11.514829 IP 10.0.0.102.46343 > 10.0.0.101.12345: Flags [.], ack 1, win 65535, length 0
16:48:13.713713 IP 10.0.0.101.12345 > 10.0.0.102.46343: Flags [S.], seq 3686780973, ack 1670959927, win 65535, length 0
16:48:13.713975 IP 10.0.0.102.46343 > 10.0.0.101.12345: Flags [.], ack 1, win 65535, length 0
16:48:17.916016 IP 10.0.0.101.12345 > 10.0.0.102.46343: Flags [S.], seq 3686780973, ack 1670959927, win 65535, length 0
16:48:17.917153 IP 10.0.0.102.46343 > 10.0.0.101.12345: Flags [.], ack 1, win 65535, length 0
16:48:58.977772 IP 10.0.0.102.46343 > 10.0.0.101.12345: Flags [.], ack 1, win 0, length 0
16:48:58.979551 IP 10.0.0.101.12345 > 10.0.0.102.46343: Flags [R], seq 3686780974, win 0, length 0


For the reported issue, this doesn't appear to be ipfw related per se,
the persist / keepalive segemtns are being sent without TS, even though TS was negotiated:


16:26:39.655911 IP 10.0.0.102.43262 > 10.0.0.101.12345: Flags [S], seq 1299597894, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 3308112659 ecr 0], length 0
16:26:39.656269 IP 10.0.0.101.12345 > 10.0.0.102.43262: Flags [S.], seq 3229957175, ack 1299597895, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 2214622743 ecr 3308112659], length 0
16:26:39.656370 IP 10.0.0.102.43262 > 10.0.0.101.12345: Flags [.], ack 1, win 1277, options [nop,nop,TS val 3308112659 ecr 2214622743], length 0

16:27:23.455682 IP 10.0.0.102.43262 > 10.0.0.101.12345: Flags [.], ack 1, win 0, length 0 ## should have the TS option too.
16:27:28.456635 IP 10.0.0.102.43262 > 10.0.0.101.12345: Flags [.], ack 1, win 0, length 0
16:27:33.453985 IP 10.0.0.102.43262 > 10.0.0.101.12345: Flags [.], ack 1, win 0, length 0
16:27:38.456062 IP 10.0.0.102.43262 > 10.0.0.101.12345: Flags [.], ack 1, win 0, length 0
16:28:06.343812 IP 10.0.0.102.43262 > 10.0.0.101.12345: Flags [F.], seq 1, ack 1, win 1277, options [nop,nop,TS val 3308199350 ecr 2214622743], length 0
16:28:06.344116 IP 10.0.0.101.12345 > 10.0.0.102.43262: Flags [.], ack 2, win 1277, options [nop,nop,TS val 2214709434 ecr 3308199350], length 0
16:28:06.344363 IP 10.0.0.101.12345 > 10.0.0.102.43262: Flags [F.], seq 1, ack 2, win 1277, options [nop,nop,TS val 2214709434 ecr 3308199350], length 0
16:28:06.344419 IP 10.0.0.102.43262 > 10.0.0.101.12345: Flags [.], ack 2, win 1276, options [nop,nop,TS val 3308199350 ecr 2214709434], length 0
Comment 2 Michael Tuexen freebsd_committer 2021-02-13 16:34:20 UTC
I can take a look... Should I take over the bugreport?
Comment 3 Richard Scheffenegger freebsd_committer 2021-02-13 16:56:27 UTC
Handing back to #ipfw, as the offensive code is in ip_fw_dynamic.c, and there is not sufficient state there to reconstruct a keepalive ACK (TSval, TSecr of the last outgoing packet).

The issue around TF_NOOPT is in D28652
Comment 4 Helge Oldach 2021-02-14 06:12:48 UTC
I believe this goes back to D27148, enforcing TSopt for RFC 7323 compliance if negotiated. This is MFC'd so I expect we will see said behaviour in recent stable/12 (and stable/11) as well.

I am however wondering about the strict TSopt requirement for keepalive packets not carrying user data (as in this case). IMHO this requirement will help neither PAWS nor RTTM: We are in an idle situation where there is simply no user data to transmit, so there is no risk of corrupting user data nor a compelling reason for RTT measurements.

RFC 7223 section 3.2 already mentions situations where the guidelines are too strict.

The case with ipfw generated keepalives is that ipfw apparently doesn't store sufficient context to generate a proper TSopt. We can perhaps reuse previous TSval/TSecr but that might potentially impact PAWS/RTTM?

As a quick fix meant to improve interoperability I would suggest to amend the tolerate_missing_ts sysctl in a way that we have three alternatives for dealing with negotiated TSopt:
- Strict RFC 7223, dropping packets w/o TSopt (currently: tolerate_missing_ts=0)
- Accept packets w/o TSopt if zero data
- Accept all packets (currently: tolerate_missing_ts=1)
Comment 5 Michael Tuexen freebsd_committer 2021-02-14 13:34:37 UTC
(In reply to Helge Oldach from comment #4)

This middlebox code lets an RFC compliant end-point look like an endpoint violating the specification. So if the peer wants to talk to a broken end-point, the peer can set the corresponding sysctl. So I don't see a value in making the configuration more complex.

I also haven't thought about the consequence of a firewall pretending that an end point is still alive, although it might not be.

Personally, I wouldn't expect a middlebox inserting packets in a communication, which break the specification, but instead would like them to filter out communications which break the specifications. But that is a personal opinion.
Comment 6 Helge Oldach 2021-02-14 14:00:13 UTC
(In reply to Michael Tuexen from comment #5)
Indeed a proper fix would be in ipfw - but that is our code as well, and it looks like a major effort while tweaking TSopt slightly seems more straightforward.

The essence of this bug report is that D27148 broke working setups. Maybe tolerate_missing_ts=1 should be a sensible default?
Comment 7 Michael Tuexen freebsd_committer 2021-02-14 17:45:54 UTC
(In reply to Helge Oldach from comment #6)
> Indeed a proper fix would be in ipfw - but that is our code as well, and it
> looks like a major effort while tweaking TSopt slightly seems more straightforward.

OK. We agree that there this is a bug in ipfw. Why not use in ipfw a timeout which is in tune with standard keepalive timeout. Then there is no need for ipfw to send out packets pretending that a peer is still alive...

> The essence of this bug report is that D27148 broke working setups.
> Maybe tolerate_missing_ts=1 should be a sensible default?

D27148 breaks setups with broken peers or with middleboxes transforming non-broken peers into broken peers. D27148 just uncovers bug in ipfw which has been there for a longer time. In my personal view, having a sysctl you have to tweak if you want to communicate with broken peers is fine. You seem to have a different opinion.
I can bring this up at the next bi-weekly transport telco and see what others think.
Comment 8 Andrey V. Elsukov freebsd_committer 2021-02-15 21:49:42 UTC
(In reply to Michael Tuexen from comment #7)

>OK. We agree that there this is a bug in ipfw. Why not use in ipfw a timeout 
>which is in tune with standard keepalive timeout. Then there is no need for ipfw 
>to send out packets pretending that a peer is still alive...

ipfw by default uses 300 seconds as TTL for TCP states. The default keepalive idle interval in TCP stack, AFAIR, is 2 hours. For 2 hours typical gateway with ipfw for some network can create several tens millions of states. Small interval is used to reduce memory requirements and CPU usage, since state search can be done for every packet several times, depending from the ruleset.
This keepalive implementation in ipfw was used and worked well at least last 20 years.
Comment 9 Michael Tuexen freebsd_committer 2021-02-15 21:54:44 UTC
(In reply to Andrey V. Elsukov from comment #8)
>This keepalive implementation in ipfw was used and worked well at least
>last 20 years.

So you think it is good behaviour to let compliant end-points look as broken ones?
Comment 10 Andrey V. Elsukov freebsd_committer 2021-02-15 22:02:22 UTC
(In reply to Michael Tuexen from comment #9)
> So you think it is good behaviour to let compliant end-points look as broken ones?

I did not investigate the problem and didn't read corresponding RFC, I even don't use keepalives personally, so I just say how it looks like from user point of view - it worked for 20 years, and now it doesn't :)
Comment 11 Richard Scheffenegger freebsd_committer 2021-02-16 11:58:04 UTC
To add more confusion: 

The advisory UTO option could be used by ipfw to piggy-back on some ACKs, to indicate to the end hosts, within which interval it expects the end hosts to send out another keepalive or data/control packet.

https://tools.ietf.org/html/rfc5482

(Obviously, FBSD would need to implement processing of this options too.
Also, it doesn't help the puristic case, where no modification of the on-wire packet stream is allowable).

Pragmatically, the most easy way out would be for ipfw to keep additional state with the TSopt val and ecr of the most recent packet, and include the TSopt in ipfw-originated keepalives.
Comment 12 Richard Scheffenegger freebsd_committer 2021-02-25 17:58:22 UTC
We discussed this behavior in the #transport call today.

The stack behaves according to RFC73232 with the default options.

The consensus of the groups was that the TCP stack default should not be changed. The current behavior of ipfw with the injected keepalive would break with other (non-fbsd) stacks which adhere to RFC7323 just the same.

Two other remediation options were discussed: 
a) retain more state in ipfw when a timestamp option is present, and use the most recent TSval / ecr combination observed when injecting the keepalive
b) intercept the 3WHS and remove the timestamp option there.

Option a) was preferred heavily - option b) fails when the firewall only gets to see an ongoing session (e.g. rerouting events) but not the syn, and reduces the information available to the TCP endpoints to run a number of mechanisms designed to improve performance and enhance data integrity at high speeds.

option a) could still fail, for example if ipfw does not see the most recent segment in the direction, where the keepalive is to be injected - as per RFC7323, old timestamp values render the segment not acceptable. But in the vast majority of instances, that approach will make ipfw compliant with RFC7323.
Comment 13 Helge Oldach 2021-02-25 19:03:44 UTC
(In reply to Richard Scheffenegger from comment #12)
Adding state to ipfw is fine, but will not be quick and easy I suspect.

Until that is fixed, can we either default to tolerate_missing_ts=1, or just silently accept rather than drop zero data (!) packets that are missing TS (yes, gently violating RFC7323)? Either option should restore the behaviour before D27148.

Again, just for the time being until ipfw is fixed.
Comment 14 topical 2021-04-28 10:52:39 UTC
*** Bug 255438 has been marked as a duplicate of this bug. ***