Bug 217637 - One TCP connection accepted TWO times
Summary: One TCP connection accepted TWO times
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: Any Any
: --- Affects Only Me
Assignee: Michael Tuexen
URL: https://reviews.freebsd.org/D10272
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2017-03-08 09:44 UTC by Alexandre martins
Modified: 2017-10-02 19:26 UTC (History)
14 users

See Also:
koobs: mfc-stable11+
koobs: mfc-stable10+


Attachments
tcpdump from the client side view (3.32 KB, text/plain)
2017-03-08 09:44 UTC, Alexandre martins
no flags Details
Server side view capture (3.95 KB, application/vnd.tcpdump.pcap)
2017-03-10 09:36 UTC, Alexandre martins
no flags Details
Client side view (3.87 KB, application/vnd.tcpdump.pcap)
2017-03-10 09:36 UTC, Alexandre martins
no flags Details
Scapy client (1.42 KB, text/x-python)
2017-03-14 13:10 UTC, Alexandre martins
no flags Details
packetdrill script to reproduce the duplicate accept problem (2.19 KB, text/x-csrc)
2017-03-14 21:03 UTC, Michael Tuexen
no flags Details
Linux server, short delay between header and data (2.12 KB, application/vnd.tcpdump.pcap)
2017-03-20 13:42 UTC, Alexandre martins
no flags Details
Linux server, long delay between header and data (1.72 KB, application/vnd.tcpdump.pcap)
2017-03-20 13:44 UTC, Alexandre martins
no flags Details
packetdrill script to show how Linux handles data on closed socket (2.20 KB, text/x-csrc)
2017-03-20 14:35 UTC, Michael Tuexen
no flags Details
Updated packetdrill script to show how Linux handled data on closed socket (2.19 KB, text/x-csrc)
2017-03-20 14:59 UTC, Michael Tuexen
no flags Details
patch to not send acks in this case (672 bytes, patch)
2017-07-24 22:09 UTC, Richard Russo
no flags Details | Diff

Description Alexandre martins 2017-03-08 09:44:17 UTC
Created attachment 180626 [details]
tcpdump from the client side view

Hello,

I am currently facing a strange problem: my web server accepts the SAME TCP connection twice.

After digging, it turns out to be due to a TCP retransmission. Here is the TCP scenario:


Client     |     SERVER     | Comment

SYN ------------->
    <------------- SYN/ACK
ACK ------------->             Connection accepted in the server

PUSH/ACK -------->             POST /someURL HTTP/1.0\r\nContent-length: 10\r\n\r\n

    <------------- PUSH/ACK    500 error
    <------------- FIN/ACK     Connection closed in the server

PUSH/ACK -------->             data of the post
    <------------- RST         This reset seems to be lost or filtered by some firewall 

PUSH/ACK -------->             Re-xmit all: POST /someURL HTTP/1.0\r\nContent-length: 10\r\n\r\ndatadatada
        ?????????? =======>    this causes the connection to be RE-ACCEPTED by the server !?!?!

ACK ------------->             Ack the 500 error message
ACK ------------->             Ack the FIN
    <------------- ACK         This ACK seems BROKEN. The TCP sequence numbers mismatch

And the connection never closes properly until the TCP timeout.

I have attached the tcpdump of this connection from the client side.
Comment 1 Hiren Panchasara freebsd_committer freebsd_triage 2017-03-10 01:06:17 UTC
If possible, it'd be useful to look at real pcaps (and not text excerpts) from both ends.
Comment 2 Alexandre martins 2017-03-10 09:36:24 UTC
Created attachment 180691 [details]
Server side view capture
Comment 3 Alexandre martins 2017-03-10 09:36:47 UTC
Created attachment 180692 [details]
Client side view
Comment 4 Alexandre martins 2017-03-10 09:41:09 UTC
I have attached captures of the simplest test I made to reproduce the bug.

The connection passes through a firewall that drops the TCP reset.

The server logs show that the connection is accepted twice.
Comment 5 Hiren Panchasara freebsd_committer freebsd_triage 2017-03-11 00:57:41 UTC
Adding Michael for inputs as he had done some cleanup in this area.

What FreeBSD version are you running on both sides? Is it 10.3R as reported in the bug? Do you mind trying -head to rule out any fixes?

The server sends a FIN and, on the following request, sends a RST which gets dropped somehow and doesn't reach the client. But even after receiving the first FIN, the client asks for data, at which point the server reopens the connection (which seems like the real bug), sends data, and again sends a FIN after that. Now the client acks that FIN and sends its own FIN, which the server acks, but the client doesn't like that ack. So it keeps sending its FIN, which the server keeps acking.

Also, client and server seem off by 10 seconds?
Comment 6 Michael Tuexen freebsd_committer freebsd_triage 2017-03-11 08:27:31 UTC
Which OSes are used on the client and on the server side? Both FreeBSD 10.3?
Comment 7 Alexandre martins 2017-03-13 08:36:28 UTC
The server runs FreeBSD 10.3. The client (originally) was an Android smartphone. In the capture, I used an up-to-date Ubuntu 16.10.

I cut the capture to avoid a big file, but in reality the FIN/ACK storm runs for about a minute. After that a reset is sent and the storm ends.

I'll try to build a head version today. I hope to give you the result in a few hours.
Comment 8 Alexandre martins 2017-03-14 10:25:19 UTC
I ran the test on a freshly compiled FreeBSD:

root@FreeBSD-head:~ # uname -a
FreeBSD FreeBSD-head 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r315233+87dae13e7e9(master): Tue Mar 14 10:04:57 CET 2017     root@FreeBSD-head:/usr/obj/root/freebsd/sys/GENERIC  i386

The problem is still present: the connection is accepted twice by the server. The trace looks the same.
Comment 9 Michael Tuexen freebsd_committer freebsd_triage 2017-03-14 10:58:21 UTC
Great. Let me try to reproduce the issue with packetdrill. If that works, we have a simple way of debugging the problem.
Comment 10 Alexandre martins 2017-03-14 13:10:06 UTC
Created attachment 180814 [details]
Scapy client

I wrote a scapy script that reproduces the bug. This client ignores the RST, but does not produce the storm.

To make it work, you must have a server that responds to the POST before the end of the data.
Comment 11 Michael Tuexen freebsd_committer freebsd_triage 2017-03-14 21:03:56 UTC
Created attachment 180829 [details]
packetdrill script to reproduce the duplicate accept problem

This is a test script which reproduces the duplicate accept problem on FreeBSD head.
Comment 12 Michael Tuexen freebsd_committer freebsd_triage 2017-03-15 19:36:23 UTC
I did some testing and can explain some of the aspects of the traces, especially the ones on the FreeBSD side.

The client establishes the TCP connection and sends the first fragment of the HTTP request. The server receives this, sends a response which acknowledges the first fragment, and closes the socket. This results in sending the FIN-ACK. This is covered in Frame 3 to Frame 8 of the server.pcap file. This looks OK.

Now the server receives the second fragment of the HTTP request and responds correctly with a RST segment. This gets dropped by some middlebox and is not received by the client. This is Frame 9 and 10.

The client has not processed the response from the server; I have no idea why. This response also ACKs the first fragment of the HTTP request. Since it was not processed, the client retransmits the complete HTTP request in Frame 11.

When the server gets this TCP ACK segment, it finds a listening socket and verifies that the segment is a valid SYN-cookie. Therefore it establishes a new connection, moves it into ESTABLISHED and processes the data. It sends an error message and closes the socket which results in the sending of the FIN-ACK segment. This explains Frame 12 and Frame 13.

The peer processes these two segments and sends corresponding ACK-segments in Frame 14 and 15. I guess the client also closes the socket which results in the sending of the FIN-ACK in Frame 16. The server ACKs it in Frame 17.

However, this ACK is not processed by the client, therefore the client retransmits its FIN-ACK, which the server ACKs again. This is Frame 18 and 19. Since the ACK from the server is never processed, this pattern repeats until the client finally gives up.

What I do not understand: why does the client not process these packets? The client.pcap shows that they are received. The sequence numbers look OK to me. However, the client side is not FreeBSD.

So at least the above explains the FreeBSD behaviour.
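The syncookie check that makes this re-accept possible can be sketched as a toy model (this is NOT the FreeBSD implementation — real SYN cookies also encode a timestamp and MSS, and the secret rotates — but the property that matters here is the same: the cookie depends only on the connection tuple and a secret, not on whether a connection was already established from it):

```python
import hmac, hashlib

SECRET = b"per-boot-secret"   # stands in for the kernel's rotating secret

def syn_ack_isn(src, sport, dst, dport):
    # Server's ISN in the SYN-ACK: a keyed hash of the 4-tuple.
    mac = hmac.new(SECRET, f"{src}:{sport}>{dst}:{dport}".encode(),
                   hashlib.sha256).digest()
    return int.from_bytes(mac[:4], "big")

def cookie_valid(src, sport, dst, dport, ack_field):
    # An incoming ACK validates if it acknowledges exactly ISN+1.
    return ack_field == (syn_ack_isn(src, sport, dst, dport) + 1) % 2**32

isn = syn_ack_isn("10.0.0.2", 40000, "10.0.0.1", 80)
third_ack = (isn + 1) % 2**32

# The real third message of the handshake validates...
assert cookie_valid("10.0.0.2", 40000, "10.0.0.1", 80, third_ack)
# ...and so does a later retransmission of the full request: it carries the
# SAME ack field, so after the first connection is torn down the listener
# happily establishes a second one from it.
assert cookie_valid("10.0.0.2", 40000, "10.0.0.1", 80, third_ack)
```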
Comment 13 Alexandre martins 2017-03-16 08:47:41 UTC
I want to remind you that the original client is a smartphone. The first time I saw the problem, I made a tcpdump on the wireless box, not on the smartphone itself.

The server response may have been delayed in the WiFi stack (poor signal?) and taken time to reach the phone (while having already been captured in the pcap). The phone may have retransmitted because it thought the HTTP request was lost.

I just managed to reproduce it with the scapy script on Ubuntu, with an iptables configuration that drops the TCP reset.
Comment 14 Michael Tuexen freebsd_committer freebsd_triage 2017-03-16 12:04:13 UTC
(In reply to Alexandre martins from comment #13)
> I want to remind you that the original client is a smartphone. The first time that
> I saw the problem, I made a tcpdump on the wireless box, not on the
> smartphone itself.

OK. That explains why the client has not processed TCP segments which were reported
as received. They are just dropped between the wireless box and the smartphone.

> The server response may have been delayed into the wifi process (poor signal ?)
> and takes time to reach the phone (but has already been captured into the pcap).
> The phone may have done a re-transmit because it thinks that the http request
> was lost.
I guess this is exactly why the client was retransmitting the complete HTTP request.
Please note that the retransmission looks like the third message (an ACK) of the
three way handshake. It only contains in addition some data and has the PSH bit set.
Therefore it looks like such a handshake message and the server accepts it and
establishes the TCP connection (again). 

>I just managed that to reproduce it through the scapy script on the ubuntu with
>a iptables configuration that drops the TCP reset.
The packetdrill script allows you to reproduce the double accept behaviour
on a FreeBSD head system. I used that to figure out why it happens.
Comment 15 Sepherosa Ziehau 2017-03-16 12:36:27 UTC
(In reply to Michael Tuexen from comment #12)

I think we should drop the ACK w/ data (in this case the HTTP request) if a syncookie is going to be used.  This at least fixes this issue on our side.
Comment 16 Alexandre martins 2017-03-16 12:56:33 UTC
I think it's not a good idea because, to mitigate DDoS attacks, firewalls rate-limit SYN packets per second. If a connection can be re-opened with a simple ACK, how can we mitigate DDoS?
Comment 17 Sepherosa Ziehau 2017-03-16 12:59:11 UTC
(In reply to Alexandre martins from comment #16)

Well, an ACK w/ data in the 3-way handshake is already _not_ the normal case; I am not sure about fast-open, though.  But for non-fast-open cases, we definitely want to drop an ACK w/ data in the 3-way handshake.
Comment 18 Michael Tuexen freebsd_committer freebsd_triage 2017-03-16 13:55:17 UTC
(In reply to Sepherosa Ziehau from comment #17)
Having data on the third message is fine by the TCP specification. I don't think we should drop them in general.
Comment 19 Michael Tuexen freebsd_committer freebsd_triage 2017-03-16 14:01:51 UTC
(In reply to Alexandre martins from comment #16)
If you don't want to use syn-cookies, you can disable them by setting the sysctl variable net.inet.tcp.syncookies to 0.
Comment 20 Michael Tuexen freebsd_committer freebsd_triage 2017-03-16 14:24:40 UTC
One could add some additional logic to only accept syncookies if either the syncache is not used or, if it is used, some entries have recently been dropped due to memory shortage.

That would fix this issue whenever the syncache did not overflow recently.
Comment 21 Sepherosa Ziehau 2017-03-17 01:42:44 UTC
(In reply to Michael Tuexen from comment #18)

Hmm, do any OS's TCP stacks really send data along w/ the last ACK in the 3-way handshake at all?

And if we are checking the syncookie, it indicates we previously suffered a memory shortage.  I don't think we need to record that situation.

Let me rephrase my original suggestion (I am not a native speaker, sorry):
drop the last ACK w/ data before the syncookie is checked.
Comment 22 Michael Tuexen freebsd_committer freebsd_triage 2017-03-17 07:03:11 UTC
(In reply to Sepherosa Ziehau from comment #21)
> Hmm, do any OS's TCP stacks really send data along w/ the last ACK in the
> 3-way handshake at all?
Assume that the initial ACK (without data) being the third message of the handshake is
lost and the client then sends data, like an HTTP client does. Then the server will get
exactly such a packet. This will happen with most OSes, I guess.

> And if we are checking the syncookie, it indicates we suffered short-of-memory
> previously.  I don't think we need to record that situation.
This is not true. We use the syncookie if we don't find an entry in the syncache.
The reason can be that we had an overflow, but it can also be that we don't find
it because we already used the entry to setup a connection. This is the code path
we use in the traces provided.
Comment 23 Sepherosa Ziehau 2017-03-17 07:30:23 UTC
(In reply to Michael Tuexen from comment #22)

Thank you for the explanation for the ACK w/ data.


As for syncookie usage: if we use the syncookie, it probably means two things:
- No INPCB for the connection.  That's why we enter the syncache/syncookie code in the first place.
- No syncache entry.  Memory shortage could be one cause; or the connection was established and then dropped (this bug report).

I think it's an acceptable and reasonable heuristic to just drop the ACK w/ data inside the syncache code before checking the syncookie.
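A minimal sketch of the heuristic proposed above (all names here are made up for illustration; this is not the kernel's syncache code):

```python
def drop_before_syncookie_check(flags, payload_len, syncache_hit, fastopen=False):
    """Sketch of the proposed heuristic: when there is no syncache entry and
    we are about to fall back to syncookie validation, drop a bare ACK that
    carries data instead of validating the cookie.  Illustrative only."""
    return (not syncache_hit and not fastopen
            and flags == {"ACK"} and payload_len > 0)

# The retransmitted "third ACK" carrying the full HTTP request would be dropped:
assert drop_before_syncookie_check({"ACK"}, 395, syncache_hit=False)
# A normal, data-less handshake ACK validated via syncookie still passes:
assert not drop_before_syncookie_check({"ACK"}, 0, syncache_hit=False)
# And nothing changes for segments matched by a live syncache entry:
assert not drop_before_syncookie_check({"ACK"}, 395, syncache_hit=True)
```

As the thread notes, this trades away the (legitimate) case where the real third handshake ACK is lost and the client's first data segment has to complete the handshake.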
Comment 24 slw 2017-03-17 10:06:13 UTC
(In reply to Michael Tuexen from comment #12)

> When the server gets this TCP ACK segment, it finds a listening socket and verifies that the segment is a valid SYN-cookie.

Can you explain some more?
Why is the SYN-cookie valid for this ACK? I mean, only the first ACK to the SYN-ACK can be valid; other ACKs use different SEQs and must be invalid.
Comment 25 Alexandre martins 2017-03-17 10:46:35 UTC
(In reply to slw from comment #24)
In fact, the first ACK is replayed.

-> SYN (seq 1)
<- SYN/ACK (seq 80 ACK 1)
-> ACK (seq 1 ACK 81)

-> [DATA] (seq 1 ACK 81)
<- [DATA] (delayed/lost/ignored/...)
<- FIN (delayed/lost/ignored/...)
-> [DATA]
<- RST => the connection disappears from the server (delayed/lost/ignored/...)
-> [DATA: replay of the first packet] (seq 1 ACK 81) => reopens the connection without SYN + SYN/ACK!
Comment 26 slw 2017-03-17 11:17:53 UTC
(In reply to Alexandre martins from comment #25)

Ah, I see. It looks like the server discards the inpcb for this connection too early / does an incorrect state transition (it needs something like CLOSE_WAIT for 2MSL, I mean).

For an existing inpcb in CLOSE_WAIT state, the RST must be generated with ACK set (I mean a RST w/o ACK is filtered by some firewalls), and the retransmitted data would also be answered with a correct FIN/ACK and server reply.
Comment 27 Michael Tuexen freebsd_committer freebsd_triage 2017-03-17 15:53:43 UTC
(In reply to slw from comment #24)
Consider a client doing a connect() call followed by a send() call. The connect call triggers the three way handshake. Assume that the third message is lost. So the send call triggers the sending of a TCP segment with the same sequence number as the third message of the handshake and it contains data. From the server perspective this TCP segment passes the syncookie checks.
Comment 28 Julian Elischer freebsd_committer freebsd_triage 2017-03-19 09:59:41 UTC
To me it looks like the client is implementing TTCP.
Comment 29 Michael Tuexen freebsd_committer freebsd_triage 2017-03-19 13:13:33 UTC
(In reply to Julian Elischer from comment #28)
No, I think even FreeBSD would behave the same as a client if the client does
socket()
connect()
send(part1)
send(part2)
and the third segment from the three way handshake is lost and both segments triggered by the send() calls are lost. It will retransmit the whole data after a timeout.
Comment 30 slw 2017-03-19 13:26:14 UTC
(In reply to Michael Tuexen from comment #29)

I am asking again about the CLOSE_WAIT state and retransmitting the lost reply.
Comment 31 Michael Tuexen freebsd_committer freebsd_triage 2017-03-19 14:04:54 UTC
(In reply to slw from comment #30)
The client retransmits the whole data and only acks the SYN. So it hasn't processed the FIN sent by the server; all that has been dropped. So I think the client is not in CLOSE_WAIT, but in ESTABLISHED.
Comment 32 slw 2017-03-19 14:13:40 UTC
The server is in CLOSE_WAIT, not the client.
After the application closes the socket, the server must wait for acknowledgment of the sent data and resend it if needed.
So the client's retransmit should be routed to the existing inpcb in CLOSE_WAIT state, and the server should resend the lost segments and the FIN again. Not a RST, and not dropped incoming segments.
Comment 33 Michael Tuexen freebsd_committer freebsd_triage 2017-03-19 17:10:55 UTC
(In reply to slw from comment #32)
The server does
socket()
listen()
accept()
read()
write()
close()
which means it reads the first part of the request, sends the response (an error message), and closes the socket. Then the state is FIN-WAIT-1. Then it receives the
second part of the request. Since the socket is closed, it responds with a RST segment and moves the connection to CLOSED. This covers the first TCP connection of the two.

After that, the client retransmits the complete request, which results in establishing the second TCP connection (via the syncookie processing). The server sends a response (the same error message) and closes the socket. So it is again in FIN-WAIT-1. The FIN gets acked by the client. So the server goes into FIN-WAIT-2. Then the client sends its FIN. This is processed by the server and acked. Now the server is in TIME-WAIT. However, the ACK is not received by the client and the client retransmits the FIN-ACK.

So I don't see any end point moving into CLOSE-WAIT state...
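The state walk above, written out as a small lookup table (a reading aid using the states and events from this comment, not stack code):

```python
# Server side of the second TCP connection, per the walk-through above.
transitions = {
    ("ESTABLISHED", "close()"):        "FIN-WAIT-1",  # server closes after the response
    ("FIN-WAIT-1",  "rcv ACK of FIN"): "FIN-WAIT-2",  # client acks the server's FIN
    ("FIN-WAIT-2",  "rcv FIN"):        "TIME-WAIT",   # client's own FIN arrives
}

state = "ESTABLISHED"
for event in ("close()", "rcv ACK of FIN", "rcv FIN"):
    state = transitions[(state, event)]

assert state == "TIME-WAIT"   # and indeed no endpoint ever enters CLOSE-WAIT
```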
Comment 34 slw 2017-03-19 18:03:30 UTC
(In reply to Michael Tuexen from comment #33)

Right, FIN_WAIT_1.

> Then the state is FIN-WAIT-1. Then it receives the second part of the request. Since the socket is closed, it responds with a RST segment and moves the connection to CLOSED.

I don't understand this. CLOSING should happen only after a FIN is received.
No FIN was received and the data in the buffer was not acked, so what is the reason for the CLOSING transition and the lost data?

I mean: resend the response and the FIN again.

"If a socket structure is associated with the file structure, soclose is called, f_data is cleared, and any posted error is returned. 

soclose Function

This function aborts any connections that are pending on the socket (i.e., that have not yet been
accepted by a process), waits for data to be transmitted to the foreign system, and releases the data
structures that are no longer needed.
soclose is shown in Figure 15.39."

"In general, TCP must still process incoming data,
so it calls tcpacked to handle incoming acknowledgements, tcpdata to process data in the segment, and tcpswindow to adjust its sending window size. Once the input has been processed, tcpfin1 checks to see if it should make a state transition. If the TCBF_RDONE bit is set, a FIN has arrived and so has all data in the sequence up to the FIN. If the TCPF_FIN bit is cleared, an ACK has arrived for the outgoing FIN. Tcpfin1 uses these two bits to determine whether to make a transition to the CLOSING, FINWAIT2, or TIMEWAIT states"
Comment 35 Michael Tuexen freebsd_committer freebsd_triage 2017-03-19 18:32:06 UTC
(In reply to slw from comment #34)
> I don't understand this. CLOSING should happen only after a FIN is received.
> No FIN was received and the data in the buffer was not acked, so what is the reason
> for the CLOSING transition and the lost data?
I haven't said that the state CLOSING is ever used. Not sure why you bring this up...

> I mean resend response and FIN again.
This doesn't happen because NEW data arrives and the application has called close().
So this new data cannot be delivered to the application; the TCP stack has to drop it. That is why a RST is sent and the connection is moved to CLOSED.

I'm not sure which text you are citing, but are you trying to argue that a TCP stack should drop data and not signal this to the other side? Just letting the peer think everything is fine?
Comment 36 slw 2017-03-19 18:49:16 UTC
(In reply to Michael Tuexen from comment #35)

> This doesn't happen because NEW data arrives and the application has called
> close(). So this new data cannot be delivered to the application. The TCP stack
> has to drop it. That is why a RST is sent and the connection is moved to CLOSED.

I mean that in this case the RST must be sent in sequence, with ACK set, acking the received data.
And the connection should not be moved to CLOSED.

RFC793:

"
3.5.  Closing a Connection

  CLOSE is an operation meaning "I have no more data to send."  The
  notion of closing a full-duplex connection is subject to ambiguous
  interpretation, of course, since it may not be obvious how to treat
  the receiving side of the connection.  We have chosen to treat CLOSE
  in a simplex fashion.  The user who CLOSEs may continue to RECEIVE
  until he is told that the other side has CLOSED also.  Thus, a program
  could initiate several SENDs followed by a CLOSE, and then continue to
  RECEIVE until signaled that a RECEIVE failed because the other side
  has CLOSED.  We assume that the TCP will signal a user, even if no
  RECEIVEs are outstanding, that the other side has closed, so the user
  can terminate his side gracefully.  A TCP will reliably deliver all
  buffers SENT before the connection was CLOSED so a user who expects no
  data in return need only wait to hear the connection was CLOSED
  successfully to know that all his data was received at the destination
  TCP.  Users must keep reading connections they close for sending until
  the TCP says no more data.

  Case 1:  Local user initiates the close

    In this case, a FIN segment can be constructed and placed on the
    outgoing segment queue.  No further SENDs from the user will be
    accepted by the TCP, and it enters the FIN-WAIT-1 state.  RECEIVEs
    are allowed in this state.  All segments preceding and including FIN
    will be retransmitted until acknowledged.  When the other TCP has
    both acknowledged the FIN and sent a FIN of its own, the first TCP
    can ACK this FIN.  Note that a TCP receiving a FIN will ACK but not
    send its own FIN until its user has CLOSED the connection also.
"
Comment 37 Michael Tuexen freebsd_committer freebsd_triage 2017-03-19 20:17:31 UTC
(In reply to slw from comment #36)
Please note that RFC 793 defines the CLOSE operation as "I have no more data to send."
So this does NOT map to the close() system call in the socket API, but to the shutdown(..., SHUT_WR) system call. When doing this, you can continue to receive data.
But I guess the server calls close(), not shutdown(), in the case we are looking at. That is why the RST gets sent.
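The distinction is easy to demonstrate from user space. A loopback sketch, assuming a POSIX sockets stack: shutdown(SHUT_WR) sends the FIN (RFC 793's CLOSE, "I have no more data to send") while leaving the receive side usable, which close() does not:

```python
import socket, threading

def server(ls, received):
    conn, _ = ls.accept()
    conn.sendall(b"response")
    conn.shutdown(socket.SHUT_WR)   # send FIN: "no more data to send"
    received.append(conn.recv(16))  # ...but receiving is still allowed
    conn.close()

ls = socket.socket()
ls.bind(("127.0.0.1", 0))           # ephemeral port
ls.listen(1)
received = []
t = threading.Thread(target=server, args=(ls, received))
t.start()

c = socket.socket()
c.connect(ls.getsockname())
assert c.recv(16) == b"response"
assert c.recv(16) == b""            # the server's FIN: EOF on our side
c.sendall(b"late data")             # yet we can still send...
t.join()
assert received == [b"late data"]   # ...and the server still receives it
c.close(); ls.close()
```

With close() instead of shutdown() there is no socket left to deliver the late data to, which is exactly why the stack answers it with a RST.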
Comment 38 slw 2017-03-19 21:01:55 UTC
(In reply to Michael Tuexen from comment #37)

RST+ACK, in sequence with the data.
And don't transition out of FIN-WAIT-1 until the ACK of the FIN is received.

"All segments preceding and including FIN will be retransmitted until acknowledged." This holds regardless of close()/shutdown().

Only after the FIN is acknowledged may a RST be generated, and this RST may have ACK set (because this state falls under "any state except SYN-SENT").
Comment 39 Michael Tuexen freebsd_committer freebsd_triage 2017-03-19 23:13:23 UTC
(In reply to slw from comment #38)
The text you are referring to deals with the case that the application has called shutdown(..., SHUT_WR), not close(). So the text doesn't apply here...
Comment 40 Sepherosa Ziehau 2017-03-20 02:35:53 UTC
(In reply to Michael Tuexen from comment #33)

I think the code that transitions FIN-WAIT-1 to CLOSED is this:
        /*
         * If new data are received on a connection after the
         * user processes are gone, then RST the other end.
         */
        if ((so->so_state & SS_NOFDREF) &&
            tp->t_state > TCPS_CLOSE_WAIT && tlen) {
                KASSERT(ti_locked == TI_RLOCKED, ("%s: SS_NOFDEREF && "
                    "CLOSE_WAIT && tlen ti_locked %d", __func__, ti_locked));
                INP_INFO_RLOCK_ASSERT(&V_tcbinfo);

                if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
                        log(LOG_DEBUG, "%s; %s: %s: Received %d bytes of data "
                            "after socket was closed, "
                            "sending RST and removing tcpcb\n",
                            s, __func__, tcpstates[tp->t_state], tlen);
                        free(s, M_TCPLOG);
                }
                tp = tcp_close(tp);
                TCPSTAT_INC(tcps_rcvafterclose);
                rstreason = BANDLIM_UNLIMITED;
                goto dropwithreset;
        }

I don't completely understand the background for the ->CLOSED transition here.
RFC 793 on page 36 seems to say:

    If an incoming segment has a security level, or compartment, or
    precedence which does not exactly match the level, and compartment,
    and precedence requested for the connection,a reset is sent and
    connection goes to the CLOSED state.  The reset takes its sequence
    number from the ACK field of the incoming segment.

I don't think we actually implemented any security level/compartment/precedence.  So do we really need to do the ->CLOSED transition here?

Thanks,
sephe
Comment 41 Mike Karels freebsd_committer freebsd_triage 2017-03-20 05:10:54 UTC
Yes, if new data are received after the close, there is no way to deliver data anywhere.  If we ack it, the peer may just keep sending data, the window may go closed, and the peer could probe it forever. The appropriate response is an RST. And the connection can't do anything further, so CLOSED is the correct state.

It seems to me that this situation is an unavoidable flaw of syn cookies.
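This behaviour can be reproduced from user space on a loopback socket (a sketch of the rationale above: closing a socket with unread data in the receive buffer elicits a RST rather than an orderly FIN, and the peer observes it as ECONNRESET; this is how both FreeBSD and Linux behave):

```python
import socket, threading, time

def server(ls):
    conn, _ = ls.accept()
    conn.recv(1)     # read a single byte of the request...
    conn.close()     # ...and close with the rest unread -> the stack sends RST

ls = socket.socket()
ls.bind(("127.0.0.1", 0))           # ephemeral port
ls.listen(1)
t = threading.Thread(target=server, args=(ls,))
t.start()

c = socket.socket()
c.connect(ls.getsockname())
c.sendall(b"x" * 4096)              # more than the server will ever read
t.join()
time.sleep(0.2)                     # give the RST time to arrive
try:
    c.recv(16)
    got_reset = False
except ConnectionResetError:
    got_reset = True
assert got_reset                    # the peer is told its data was not delivered
c.close(); ls.close()
```

If the stack acked the data and sent a plain FIN instead, the peer would believe everything was delivered, which is the silent-data-loss case the RST exists to prevent.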
Comment 42 Sepherosa Ziehau 2017-03-20 05:38:27 UTC
(In reply to Mike Karels from comment #41)

Would transition from {FIN-WAIT-1,FIN-WAIT-2,CLOSING} to TIME-WAIT for this code segment be more "appropriate" than transition to CLOSED?
Comment 43 Alexandre martins 2017-03-20 09:06:20 UTC
I tested on Linux. Its TCP stack just continues to reset the connection, incrementing its sequence number.

-> SYN (seq 1)
<- SYN/ACK (seq 80 ACK 1)
-> ACK (seq 1 ACK 81)

-> [DATA] (seq 1 ACK 81)
<- [DATA 100] (seq 81)
<- FIN (delayed/lost/ignored/...)
-> [DATA]
<- RST (seq 181)
-> [DATA]
<- RST (seq 182)
-> [DATA]
<- RST (seq 183)
-> [DATA]
<- RST (seq 184)
-> [DATA]

...
Comment 44 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 12:21:10 UTC
(In reply to Alexandre martins from comment #43)
Please share a .pcap file.
Comment 45 slw 2017-03-20 12:45:53 UTC
(In reply to Alexandre martins from comment #43)

Can you do additional tests in this setup?

1. Drop the DATA from the server, the FIN, and the first RST
2. Drop the DATA from the server, the FIN, and the first TWO RSTs
Comment 46 Alexandre martins 2017-03-20 13:42:54 UTC
Created attachment 180996 [details]
Linux server, short delay between header and data

Capture using the scapy script.
A short delay (<200 ms) is introduced between the header and the data.
Comment 47 Alexandre martins 2017-03-20 13:44:06 UTC
Created attachment 180997 [details]
Linux server, long delay between header and data

Capture using the scapy script.
A long delay (>400 ms) is introduced between the header and the data.

We can see that the Linux stack doesn't do the same thing ...
Comment 48 slw 2017-03-20 14:17:59 UTC
(In reply to Mike Karels from comment #41)

> Yes, if new data are received after the close, there is no way to deliver data anywhere.  If we ack it, the peer may just keep sending data, the window may go closed, and the peer could probe it forever. The appropriate response is an RST. And the connection can't do anything further, so CLOSED is the correct state.

The RFC requires retransmitting data until it is acked, no matter how the application closes the socket (do you really mean this retransmission must depend on the application reading the socket? That is strange and irrational). Yes, acking all received data may not be good. Not sure. Maybe don't ack it before the ACK of the FIN is received? And only after the FIN is acked, generate RST+ACK (the RST MUST have ACK set; a RST w/o a correct ACK is ignored by Linux and Windows, I checked this before. And accepting such a RST for a live connection, as FreeBSD does, is a bug: it violates the RFC and is a security issue).

Generating a RST before the FIN is acked doesn't allow making sure that "All segments preceding and including FIN will be retransmitted until acknowledged."

> It seems to me that this situation is an unavoidable flaw of syn cookies.

Please ignore syn cookies here. Look only at "last data chunk not acked, FIN lost".
Comment 49 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 14:35:48 UTC
Created attachment 180999 [details]
packetdrill script to show how Linux handles data on closed socket
Comment 50 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 14:37:49 UTC
(In reply to Michael Tuexen from comment #49)
I have added a packetdrill script to show how Linux handles the reception of data when the user called close() before the data is received. It does the same as FreeBSD (as I would expect): it sends a reset and closes the socket. If the peer retransmits the data, a new RST is generated.
Comment 51 slw 2017-03-20 14:46:24 UTC
(In reply to Michael Tuexen from comment #50)

===
+0.00 write(4, ..., 395) = 395
+0.00 > P. 1:396(395) ack 46 win 58
+0.00 close(4) = 0
+0.00 > F. 396:396(0) ack 46 win 58
+0.00 < P. 46:54(8) ack 396 win 8192
===

This is a different case: the remote side ACKs the response from the server.
For a correct comparison, "+0.00 > P. 1:396(395) ack 46 win 58" must be lost.
Comment 52 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 14:54:39 UTC
I ran the script on Ubuntu 16.10 (kernel 4.8).
Not sure if the behaviour depends on the version...
@Alexandre: Can you figure out whether the server calls shutdown() or close()?
Not sure why the behaviour is different.
Comment 53 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 14:59:20 UTC
Created attachment 181000 [details]
Updated packetdrill script to show how Linux handled data on closed socket

This corrects the bug in the script that the data from the server was acked.
Comment 54 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 15:00:05 UTC
(In reply to slw from comment #51)
Right. I updated the script. The behaviour hasn't changed.
Comment 55 Alexandre martins 2017-03-20 15:01:18 UTC
(In reply to Michael Tuexen from comment #52)

In all of my tests, the server calls close() because of the error. I didn't check the shutdown(..., SHUT_WR) case.
Comment 56 slw 2017-03-20 15:09:42 UTC
(In reply to Michael Tuexen from comment #54)

1. Linux generated a RST w/ ACK. Is the socket not moved to the CLOSED state?

2. This is also wrong behavior: the server tries to reset the connection, and the client ignores it (because of the wrong ACK on the RST packet). Also, data is lost.
Comment 57 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 15:29:31 UTC
(In reply to Alexandre martins from comment #55)
Great. That is what I'm also testing. It seems that the trace with the long delay shows the same as my testing. However, the trace with a short delay is different. Not sure why. And I don't see a reason why they should be different...
Comment 58 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 15:32:23 UTC
(In reply to slw from comment #56)
> 1. Linux generated RST w/ ACK. socket don't moved to CLOSED state?
Linux generates a RST, not a RST-ACK. I guess it is moved to the CLOSED state. Can't check.

> 2. This is also wrong behavior: server try to reset connection, client ignore
> this (by wrong ACK on RST packet). Also, data is lost.
You are right, user data is lost (the second part of the request). There is no way
around that, since the user closed the socket before the data arrived. That is why the server sends a RST.
Comment 59 slw 2017-03-20 15:35:22 UTC
(In reply to Michael Tuexen from comment #58)

> Linux generates a RST, not a RST-ACK. 

My bad, yes.

> You are right, user data is lost (the second part of the request). There is no way around that, since the user closed the socket before the data arrived. That is why the server sends a RST.

No, I mean different data: the server's response. And there is no reason not to deliver/resend it to the client.
Comment 60 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 15:43:06 UTC
(In reply to slw from comment #59)
> No, different data: server response. And no reasson to not deliver/resend
> it to client.
Well, the server knows that it has to drop incoming data, since the application has
closed the socket and the new data cannot be delivered. I guess we agree on that.
The reaction is to let the peer know as soon as possible that something went wrong.
I guess this is what we might not agree upon.
However, this is the reason why the server sends a RST without trying to retransmit
data it has sent but which has not been acknowledged by the peer.
Comment 61 slw 2017-03-20 15:58:57 UTC
(In reply to Michael Tuexen from comment #60)

> Well, the server knows that it has to drop incoming data, since the application
> has closed the socket and the new data cannot be delivered. I guess we agree on that.

Agree.

> The reaction is to let the peer know as soon as possible that something went wrong.
> I guess this is what we might not agree upon.
> However, this is the reason why the server sends a RST without trying to retransmit
> data it has sent but which has not been acknowledged by the peer.

Now the server has several tasks:

1. retransmit its response and (optionally) the FIN
2. let the peer know that something went wrong.

I don't see an ideal answer to this.

Sending a RST now may not allow the server to be sure its data has been acknowledged. Not sure. This needs some tests: retransmit the data, retransmit the FIN, send RST+ACK with SEQ=FIN.SEQ+1, and watch the reaction from the remote side: do we get an ACK for the data? An ACK for the FIN?

In any case, retransmission is mandatory per the RFC. Some trick to discard the client's data may be used here (zero window? silent discard with only SEQ/ACK analysis? ACK and discard? send RST+ACK (as you see, a RST w/o ACK is ignored by the client) after the client has acknowledged the server data?)
Comment 62 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 16:19:49 UTC
(In reply to slw from comment #61)
I don't see why just sending a RST is not allowed. The RFC describes a graceful termination of a TCP connection. We are in a situation where it is not graceful.
It is more like handling the ABORT primitive...
Comment 63 slw 2017-03-20 16:29:18 UTC
(In reply to Michael Tuexen from comment #62)

An accepted RST can cause the client to close the connection silently and immediately. No acknowledgment will be sent to the server (in case the server's response was delivered to the client), and the server will retransmit the response until it also gets a RST. In case the response was not delivered to the client, the response is lost with no chance of delivery.

Sending a RST is ok if the RST is accepted and processed after the client has sent an ACK for the server's FIN (the FIN, not the server data). This needs testing on real clients (Linux, Windows, Android and iOS).

The server can do a graceful termination of this TCP connection and there is no reason not to do so (the application does a graceful close, and only a rare combination of packet loss causes this behavior).
Comment 64 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 16:47:29 UTC
(In reply to slw from comment #63)
> An accepted RST can cause the client to close the connection silently and
> immediately. No acknowledgment will be sent to the server (in case the server's
> response was delivered to the client), and the server will retransmit the
> response until it also gets a RST.
No. The server sends a RST and moves the connection to CLOSED. The connection is
terminated immediately from the server's perspective.
> In case the response was not delivered to the client, the response is lost with
> no chance of delivery.
> Sending a RST is ok if the RST is accepted and processed after the client has
> sent an ACK for the server's FIN (the FIN, not the server data). This needs
> testing on real clients (Linux, Windows, Android and iOS).
No. This is an ungraceful termination. No need to wait for anything. All peers
should handle the RST.

> The server can do a graceful termination of this TCP connection and there is no
> reason not to do so (the application does a graceful close, and only a rare
> combination of packet loss causes this behavior).
No. Incoming user data is lost on the server side. I think the server should notify
the peer as soon as possible. Please note that the RST is only sent if the server
has to drop user data. If that is not the case, a graceful shutdown will be performed.
Comment 65 slw 2017-03-20 16:54:33 UTC
(In reply to Michael Tuexen from comment #64)

> No. The server sends a RST and moves the connection to CLOSED. The connection is
> terminated immediately from the server's perspective.

This is wrong behavior. It causes the loss of the server's data.

> No. This is an ungraceful termination. No need to wait for anything. All peers
> should handle the RST.

No.

pwrite();
close();

This is graceful termination.

> No. Incoming user data is lost on the server side. I think the server should notify
> the peer as soon as possible. Please note that the RST is only sent if the server
> has to drop user data. If that is not the case, a graceful shutdown will be performed.

The user data is a secondary question for the server. The main question is resending the server's response to the client. This is an RFC requirement. And it is more important for the user to see the response from the server, not just a dropped connection.

The server can send the RST after the client has ACKed the server's response and the server's FIN.
Comment 66 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 17:31:08 UTC
(In reply to slw from comment #65)
> This is wrong behavior. It causes the loss of the server's data.
No, the loss of data is caused by the application calling close() *before* incoming user data arrived. So the TCP stack on the server has to drop that user data.

> pwrite();
> close();

> This is graceful termination.
Sure. This is what the application triggers. However, when user data arrives after
the close call, this gets ungraceful, since this user data can't be delivered to
the user anymore. To avoid this, the application could call shutdown(..., SHUT_WR)
to trigger the sending of the FIN, then process incoming data until a FIN from the
peer arrives, and then call close(). But the application didn't. This would be
more in line with how RFC 793 describes a connection termination. Please note that
the CLOSE primitive in RFC 793 maps to a shutdown(..., SHUT_WR) system call, not
to the close() system call. Bad naming...

I don't see text in RFC 793, where it is required that you continue to process
a connection after you know that it failed. I think the RFC doesn't cover the case
where the application says "I don't want to receive anymore".
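The half-close pattern described above can be sketched with the sockets API. This is a minimal illustration using an AF_UNIX socket pair as a stand-in for a TCP connection; the socket-level semantics of shutdown(SHUT_WR) are the same, though the wire-level FIN/ACK exchange of real TCP is of course not shown:

```python
import socket

# An AF_UNIX socket pair stands in for a TCP connection (illustrative only).
client, server = socket.socketpair()

client.sendall(b"request")
client.shutdown(socket.SHUT_WR)  # "I have no more to send": RFC 793's CLOSE

request = server.recv(1024)      # the server still receives the request
eof = server.recv(1024)          # the client's half-close reads as EOF (b"")
server.sendall(b"response")      # the server's write side is still open
server.close()

response = client.recv(1024)     # the client can still read after SHUT_WR
client.close()
```

Only after the peer's data (and EOF) has been read does the application call close(), so no incoming data is stranded and no RST is needed.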
Comment 67 slw 2017-03-20 18:47:49 UTC
(In reply to Michael Tuexen from comment #66)

> No, the loss of data is caused by the application calling close() *before* 
> incoming user data arrived.

Loss of sent application data caused by close() before incoming user data arrived? Really? TCP makes no such claim.

> So the TCP stack on the server has to drop that user data.

No problem. This is not application data.

> Sure. This is what the application triggers. However, when user data arrives after
> the close call, this gets ungraceful, since this user data can't be delivered to
> the user anymore.

I am not worried about the user data, I am worried about the server's data. The server's data is lost, and that is the problem.
Losing the user data is not a problem.

> I don't see text in RFC 793, where it is required that you continue to process
> a connection after you know that it failed. 

What reason is there to consider the connection failed?

> I think the RFC doesn't cover the case
> where the application says "I don't want to receive anymore".

Yes, the RFC says of CLOSE that it 'means "I have no more to send" but does not mean "I will not receive any more."'

I mean that "Thus, it should be acceptable to make several SEND calls, followed by a CLOSE, and expect all the data to be sent to the destination." takes precedence over everything else.

"Reset Generation" allows generating a RST from the ESTABLISHED/FIN-WAIT-1/FIN-WAIT-2 states only "If an incoming segment has a security level, or compartment, or precedence which does not exactly match the level".

Also, "3.9. Event Processing", "SEGMENT ARRIVES" does not generate a RST in the ESTABLISHED/FIN-WAIT-1/FIN-WAIT-2 states.

I mean the connection is not in a failed state until all server data and the FIN have been ACKed, or a timeout occurs.
Only after that may the connection be moved to a failed state.

PS: My proposal resolves the following issues:

1. the server response is not lost any more
2. the client sees a valid reply from the server, not just "connection dropped"
3. late segments from the client don't match a syncookie and don't re-open the connection
4. no new restrictions on first-segment syncookie processing
5. the client side of the connection is cleanly closed (currently the generated RST w/o ACK is ignored by all clients except FreeBSD; this is another bug)
6. it is compatible with existing applications
Comment 68 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 20:44:27 UTC
(In reply to slw from comment #67)
> Loss of sent application data caused by close() before incoming user
> data arrived? Really? TCP makes no such claim.
Hmm. My understanding is that TCP provides a reliable bi-directional byte-stream.
This means that no user data is lost in any of the two directions.
> No problem. This is not application data.
For me it doesn't matter in which direction user data is dropped. If that happens,
TCP fails.
> I am not worried about the user data, I am worried about the server's data.
> The server's data is lost, and that is the problem.
> Losing the user data is not a problem.
I don't agree. TCP does not provide a service which is reliable in one direction and
unreliable in the other. If user data is dropped, the connection failed.

RFC 793 only talks about the CLOSE primitive, which shuts down only the writing direction, not the reading one. This is not related to the close() call, the program
is using. The problem comes up because the reading direction is shut down. This is
not described in the RFC. Therefore you have to go back to the high level description
of the service TCP provides.
Comment 69 slw 2017-03-20 21:00:38 UTC
(In reply to Michael Tuexen from comment #68)

> Hmm. My understanding is that TCP provides a reliable bi-directional byte-stream.
> This means that no user data is lost in any of the two directions.

"Lost" in this context means "not delivered to the remote IP stack". Not to the application; to the IP stack.
The remote system ACKs data after placing it in a system buffer, not after it is read by the application.
I.e., as long as the system can accept the data and generate an ACK, all is ok. The data is delivered.
Comment 70 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 21:12:16 UTC
(In reply to slw from comment #69)
> "Lost" in this context means "not delivered to the remote IP stack".
> Not to the application; to the IP stack.
The semantics of an ACK is "delivered to the remote TCP stack", not "received by
the application".
But the service is from app to app. In my understanding, TCP should signal
to the peer if it knows that some data will not be delivered to the app. This
signalling is sending a RST.
> The remote system ACKs data after placing it in a system buffer, not after it is
> read by the application.
> I.e., as long as the system can accept the data and generate an ACK, all is ok.
> The data is delivered.
That is fine by the semantic of an ACK.
But my point is that a TCP connection fails if some data can't be delivered to
the application.
Comment 71 slw 2017-03-20 21:22:54 UTC
(In reply to Michael Tuexen from comment #70)

RFC: "An acknowledgment by TCP does not guarantee that the data has been
  delivered to the end user, but only that the receiving TCP has taken
  the responsibility to do so."

Nothing there about failed delivery to the application. That information is unavailable at the TCP level.
Comment 72 Michael Tuexen freebsd_committer freebsd_triage 2017-03-20 21:40:27 UTC
(In reply to slw from comment #71)
I'm not arguing about the semantics of the TCP-level ACKs.

I'm saying that if a TCP stack knows that some user data can not be delivered
to its application, it should signal that to the peer.

This includes:
* receiving data after the socket was closed or shutdown(..., SHUT_RD) was called.
* closing the socket or calling shutdown(..., SHUT_RD) with user data still in the
  receive buffer.

I think:
* these four cases should be handled the same way.
* Resetting the TCP connection is what I would expect.

I can write some packetdrill test scripts to see what the current behaviour is
under FreeBSD (and Linux).
Comment 73 Sepherosa Ziehau 2017-03-21 01:23:10 UTC
(In reply to Michael Tuexen from comment #72)

IMO, transitioning from {FIN-WAIT-1,FIN-WAIT-2,CLOSING} to TIME_WAIT instead of CLOSED upon receiving data for a closed socket can partially cover this case by leveraging the 2MSL wait, i.e. keep sending RSTs.  Once the 2MSL is done, syncookies kick in, which is unfixable.
Comment 74 Mike Karels freebsd_committer freebsd_triage 2017-03-21 03:08:30 UTC
In responding to an earlier comment directed to me, and to the comments since:

This connection is dead.  Only an RST and the CLOSED state apply.  Data cannot be delivered to the application, which has lost the handle to this connection, and may not even be alive in some cases.  TCP cannot wait for a FIN, for example, because there may be gigabytes of data to deliver before the FIN, and there is nowhere for them to go.

fwiw, I added the original version of this code about 30 years ago.  I believe it was in 4.3BSD.  It is still correct.
Comment 75 Michael Tuexen freebsd_committer freebsd_triage 2017-03-21 07:34:24 UTC
(In reply to Mike Karels from comment #74)
I agree completely with Mike.
Comment 76 slw 2017-03-21 10:57:49 UTC
(In reply to Mike Karels from comment #74)

I'm not saying "TCP waits for the FIN", only "TCP waits for ACK=SERVER_FIN.SEQ". The server TCP has sent data that has not been confirmed; what reason is there to discard this server response?

After getting ACK=FIN.SEQ, there is no objection to sending a RST.

About incoming data on the server: is close(fd) equivalent to shutdown(fd, SHUT_RDWR)?
If yes, incoming data can be acknowledged and then silently discarded:

==
SHUT_RD
The read half of the connection is closed— No more data can be received on the
socket and any data currently in the socket receive buffer is discarded. The
process can no longer issue any of the read functions on the socket. Any data
received after this call for a TCP socket is acknowledged and then silently
discarded.

SHUT_WR
The write half of the connection is closed— In the case of TCP, this is called a
half-close (Section 18.5 of TCPv1). Any data currently in the socket send buffer
will be sent, followed by TCP's normal connection termination sequence. As we
mentioned earlier, this closing of the write half is done regardless of whether or
not the socket descriptor's reference count is currently greater than 0. The
process can no longer issue any of the write functions on the socket.

SHUT_RDWR
The read half and the write half of the connection are both closed— This is
equivalent to calling shutdown twice: first with SHUT_RD and then with SHUT_WR.
==
Comment 77 Sepherosa Ziehau 2017-03-22 08:06:12 UTC
(In reply to Mike Karels from comment #74)

Is this {FIN-WAIT-1,FIN-WAIT-2,CLOSING} to CLOSED transition in the quoted code kind of similar to TWA (RFC 1337)?  And would it suffer the same hazards listed in that RFC?

Thanks,
sephe
Comment 78 Mike Karels freebsd_committer freebsd_triage 2017-03-24 00:50:25 UTC
(In reply to Sepherosa Ziehau from comment #77)

I don't think it is similar, although I haven't thought it through in great detail.  For starters, there is no TIME_WAIT to assassinate.  I don't see how the situations would be similar.

The only way I can think of to stop this would be to invent a new state, similar to TIME_WAIT but not awaiting only FIN, that would remember that the connection has terminated unsuccessfully.  I don't think this is warranted.
Comment 79 Sepherosa Ziehau 2017-03-24 02:26:46 UTC
(In reply to Mike Karels from comment #78)

Hmm, FIN-WAIT-1, FIN-WAIT-2 and CLOSING transit to TIME_WAIT, so killing them (moving them to CLOSED) in the code we talked about assassinates TIME_WAIT indirectly.

Or add a flag to tcpcb for TIME_WAIT state.
Comment 80 Michael Tuexen freebsd_committer freebsd_triage 2017-04-04 09:56:23 UTC
I put a patch under review: https://reviews.freebsd.org/D10272

If syncookies are used in case of an overflow of the syncache, their usage is restricted to the case where there was an overflow recently. This mitigates the problem described in the PR.

I'm still investigating the behaviour in the case of close()/shutdown(..., SHUT_RD) and received user data.
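The restriction can be sketched as a toy model (not the kernel code; the class, method names, and the length of the "recently" window are illustrative):

```python
SYNCOOKIE_WINDOW = 15 * 60.0  # assumed "recently" window, in seconds

class SynCacheModel:
    """Toy model of the D10272 mitigation: a syncookie in an incoming ACK
    is only honored if the syncache overflowed recently."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = set()
        self.last_overflow = None  # time of the most recent overflow, if any

    def on_syn(self, conn_id, now):
        """Cache a half-open connection, or fall back to a syncookie on
        overflow, remembering when the overflow happened."""
        if len(self.entries) >= self.capacity:
            self.last_overflow = now
            return "syncookie"
        self.entries.add(conn_id)
        return "syncache"

    def may_accept_cookie(self, now):
        """Pre-patch behaviour was unconditionally True; post-patch, only
        accept a cookie if an overflow happened within the window."""
        return (self.last_overflow is not None
                and now - self.last_overflow < SYNCOOKIE_WINDOW)
```

A stray ACK that happens to decode as a valid cookie (the re-accept seen in this PR) is thus rejected unless the cache actually overflowed recently.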
Comment 81 commit-hook freebsd_committer freebsd_triage 2017-04-20 19:20:34 UTC
A commit references this bug:

Author: tuexen
Date: Thu Apr 20 19:19:34 UTC 2017
New revision: 317208
URL: https://svnweb.freebsd.org/changeset/base/317208

Log:
  Syncoockies can be used in combination with the syncache. If the cache
  overflows, syncookies are used.
  This patch restricts the usage of syncookies in this case: accept
  syncookies only if there was an overflow of the syncache recently.
  This mitigates a problem reported in PR217637, where is syncookie was
  accepted without any recent drops.
  Thanks to glebius@ for suggesting an improvement.

  PR:			217637
  Reviewed by:		gnn, glebius
  MFC after:		1 week
  Sponsored by:		Netflix, Inc.
  Differential Revision:	https://reviews.freebsd.org/D10272

Changes:
  head/sys/netinet/tcp_syncache.c
  head/sys/netinet/tcp_syncache.h
Comment 82 Alexandre martins 2017-04-24 14:55:39 UTC
I confirm that the workaround works for my test case.

I let you develop a more complex solution to avoid data loss and/or TCP re-open on heavy load.

Thank you for the patch :)
Comment 83 commit-hook freebsd_committer freebsd_triage 2017-06-01 08:32:54 UTC
A commit references this bug:

Author: tuexen
Date: Thu Jun  1 08:32:35 UTC 2017
New revision: 319400
URL: https://svnweb.freebsd.org/changeset/base/319400

Log:
  MFC r317208:

  Syncoockies can be used in combination with the syncache. If the cache
  overflows, syncookies are used.
  This patch restricts the usage of syncookies in this case: accept
  syncookies only if there was an overflow of the syncache recently.
  This mitigates a problem reported in PR217637, where is syncookie was
  accepted without any recent drops.
  Thanks to glebius@ for suggesting an improvement.

  PR:			217637
  Reviewed by:		gnn, glebius
  Differential Revision:	https://reviews.freebsd.org/D10272

Changes:
_U  stable/11/
  stable/11/sys/netinet/tcp_syncache.c
  stable/11/sys/netinet/tcp_syncache.h
Comment 84 Kubilay Kocak freebsd_committer freebsd_triage 2017-06-02 00:16:12 UTC
Track merge to stable/11
Comment 85 Kubilay Kocak freebsd_committer freebsd_triage 2017-06-02 00:18:28 UTC
Since the original report was on 10.3-RELEASE (comment 4) and 10.4 will be released, request a merge to stable/10.

While I'm here, assign to committer resolving.
Comment 86 Kubilay Kocak freebsd_committer freebsd_triage 2017-06-02 00:18:58 UTC
Previous comment was supposed to read " comment 7 "
Comment 87 Richard Russo 2017-07-24 22:09:05 UTC
Created attachment 184680 [details]
patch to not send acks in this case

We've recently started hitting this at WhatsApp as well. I've applied the syncookie patches from CURRENT to 10.3 manually, and it successfully prevents this from happening as long as the syncache hasn't overflowed recently or been disabled.

Unfortunately, if the syncache does overflow and this case does happen, then once the connection is re-opened the connection states on the two peers are out of sync, and each peer responds to a packet with unreasonable seq/ack data by sending an empty ack with its current seq/ack; the other peer in turn finds this unreasonable, and the resulting packet storms were causing availability problems.

I've attached a patch we've been running on a few machines. With this, when the connections do get into this state, we don't contribute to the packet storm; instead, the connection will end up eventually closing without sending very many packets.

I have some complete connection pcaps available (from before patching), and can share them (after masking IPs and TCP payloads) if they'll be useful. From the traces I've seen, we're getting many retransmits from the peer (or a middlebox), and the peer also ends the connection soon after opening by sending a FIN. Our host acks the FIN and also closes with a FIN. After the peer's ack of our FIN, we receive a new ACK that is a retransmit of the original ACK, and reopen the connection at the connection's original SEQ/ACK, while the peer is in TIME_WAIT at the final SEQ/ACK. In the traces I was able to capture, the peers were mobile devices across the world on high-latency links.
Comment 88 Richard Russo 2017-07-25 04:07:51 UTC
Also, I had intended to include a link to a similar change in the Linux kernel: https://github.com/torvalds/linux/commit/96e0bf4b5193d0d97d139f99e2dd128763d55521 (Although the comments there say this behavior is consistent with the RFC, I don't think it is; the RFC says, "If the ACK acks something not yet sent (SEG.ACK > SND.NXT) then send an ACK, drop the segment, and return.")
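The quoted RFC 793 rule can be written out as follows (a simplified sketch; real stacks compare sequence numbers with wraparound-safe modular arithmetic, which is omitted here):

```python
def on_ack(seg_ack, snd_una, snd_nxt):
    """RFC 793 ACK processing for an established connection, simplified."""
    if seg_ack > snd_nxt:
        # "If the ACK acks something not yet sent (SEG.ACK > SND.NXT)
        # then send an ACK, drop the segment, and return."
        return "send-ack-and-drop"
    if seg_ack < snd_una:
        return "ignore"  # duplicate ACK, can be ignored
    return "accept"      # SND.UNA =< SEG.ACK =< SND.NXT: new data acked
```

It is the "send an ACK" branch that feeds the packet storm described in comment 87: each desynchronized peer keeps answering the other's out-of-window ACKs with more ACKs.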
Comment 89 Jonathan T. Looney freebsd_committer freebsd_triage 2017-08-09 01:16:22 UTC
FWIW, I have created a patch to address the behavior reported in comment #87. It maintains the RFC-compliant behavior of sending ACKs, but rate limits them. This isn't perfect, but seems better (to me) than either of the extreme alternatives (drop every packet or ACK every packet). I'd be curious to have someone try this and see if it provides effective mitigation.
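The approach can be sketched as a per-second window counter (a toy model in the spirit of FreeBSD's ppsratecheck(9); it is not the actual D11929 patch, and the parameter name is illustrative):

```python
class AckRateLimit:
    """Allow at most max_per_sec ACKs per one-second window and drop the
    rest: still RFC-conformant ACKs, but bounded in volume."""

    def __init__(self, max_per_sec):
        self.max_per_sec = max_per_sec
        self.window_start = None
        self.sent = 0

    def may_send_ack(self, now):
        # Start a fresh one-second window when the old one expires.
        if self.window_start is None or now - self.window_start >= 1.0:
            self.window_start = now
            self.sent = 0
        if self.sent < self.max_per_sec:
            self.sent += 1
            return True
        return False  # over the limit: drop instead of ACKing
```

Once the limit is reached, further out-of-window segments in that second are dropped silently, so a pair of desynchronized peers can no longer sustain an ACK storm.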
Comment 90 Jonathan T. Looney freebsd_committer freebsd_triage 2017-08-09 01:17:06 UTC
(In reply to Jonathan T. Looney from comment #89)
Forgot to include a link to the patch:

https://reviews.freebsd.org/D11929
Comment 91 Richard Russo 2017-08-09 05:20:38 UTC
For us, this may have limited effect, because our icmplim is much higher than the default (16k): we do want to send closed-port RSTs in high volume when our service ports are closed. This patch should keep our system available, but we'd still be sending out a rather large number of acks on these broken connections. Given that r317208 means we only see this condition when the syncache overflows, this may be a reasonable trade-off.

I have been thinking that it might be useful if these acks (or other unexpected packets) would actually push the connection into one of the existing state recovery behaviors, such as immediately triggering a keepalive sequence if one wasn't already ongoing. That's certainly a larger change though.

I'll get this patch up somewhere and let you know how it goes.
Comment 92 Jonathan T. Looney freebsd_committer freebsd_triage 2017-08-09 13:21:17 UTC
(In reply to Richard Russo from comment #91)
> For us, this may have limited effect, because our icmplim is much higher than the default (16k)
> [...]

Yes, I agree it is a problem that you can't set these independently. I had already been planning to propose a change to allow separate limits for all the rate-limited things. You would then be able to set this limit separately.
Comment 93 commit-hook freebsd_committer freebsd_triage 2017-08-09 13:26:51 UTC
A commit references this bug:

Author: tuexen
Date: Wed Aug  9 13:26:12 UTC 2017
New revision: 322315
URL: https://svnweb.freebsd.org/changeset/base/322315

Log:
  MFC r317208:

  Syncoockies can be used in combination with the syncache. If the cache
  overflows, syncookies are used.
  This patch restricts the usage of syncookies in this case: accept
  syncookies only if there was an overflow of the syncache recently.
  This mitigates a problem reported in PR217637, where is syncookie was
  accepted without any recent drops.
  Thanks to glebius@ for suggesting an improvement.

  PR:			217637
  Reviewed by:		gnn, glebius
  Differential Revision:	https://reviews.freebsd.org/D10272

Changes:
_U  stable/10/
  stable/10/sys/netinet/tcp_syncache.c
  stable/10/sys/netinet/tcp_syncache.h
Comment 94 Michael Tuexen freebsd_committer freebsd_triage 2017-08-09 13:29:11 UTC
(In reply to Kubilay Kocak from comment #85)
Merged to stable/10 in r322315.
Comment 95 Richard Russo 2017-08-18 20:31:19 UTC
(In reply to Richard Russo from comment #91)

I've tested the linked patch, manually applied to 10.3. With syncookie-only mode set (to trigger the condition) and the ICMP limit tuned to a reasonable range, I was able to see the limiting kick in as expected. This looks good for us (and will be better when the rate limits are individually adjustable!)