Created attachment 180626 [details] tcpdump from the client side view

Hello,

I am currently having a strange problem: my web server accepts the SAME TCP connection twice. After digging, it turns out to be caused by a TCP retransmission. Here is the TCP scenario:

Client                  SERVER          Comment
SYN           ------->
              <------- SYN/ACK
ACK           ------->                  Connection accepted on the server
PUSH/ACK      ------->                  POST /someURL HTTP/1.0\r\nContent-length: 10\r\n\r\n
              <------- PUSH/ACK         500 error
              <------- FIN/ACK          Connection closed on the server
PUSH/ACK      ------->                  data of the POST
              <------- RST              This reset seems to be lost or filtered by some firewall
PUSH/ACK      ------->                  Re-xmit all: POST /someURL HTTP/1.0\r\nContent-length: 10\r\n\r\ndatadatada
                                        =======> this causes the connection to be RE-ACCEPTED by the server !?!?!
ACK           ------->                  Acks the 500 error message
ACK           ------->                  Acks the FIN
              <------- ACK              This ACK seems BROKEN. The TCP sequence mismatches.

And the connection never closes properly until the TCP timeout. I attach the tcpdump of this connection from the client side.
If possible, it'd be useful to look at real pcaps (and not text excerpts) from both ends.
Created attachment 180691 [details] Server side view capture
Created attachment 180692 [details] Client side view
I attached the captures of the simplest test that I made to reproduce the bug. The connection passes through a firewall that drops the TCP reset. The server logs show that the connection is accepted twice.
Adding Michael for input as he has done some cleanup in this area.

What FreeBSD version are you running on both sides? Is it 10.3R as reported in the bug? Do you mind trying -head to rule out any fixes?

The server sends a FIN and, on the following request, sends a RST which gets dropped somehow and doesn't reach the client. But even after receiving the first FIN, the client sends more data, upon which the server reopens the connection (which seems like the real bug), sends data and again sends a FIN after that. Now the client acks that FIN and sends its own FIN, which the server acks, but the client now doesn't like it. So it keeps sending the FIN and the server keeps acking it. Also, the client and server clocks seem off by 10 seconds?
Which OSes are used on the client and on the server side? Both FreeBSD 10.3?
The server runs FreeBSD 10.3. The client (originally) was an Android smartphone. In the capture, I used an up-to-date Ubuntu 16.10. I cut the capture to avoid a big file, but in reality the FIN/ACK storm runs for about 1 minute. After that a reset is sent and the storm ends. I'll try to build a head version today. I hope to give you the result in a few hours.
I made the test on the freshly compiled FreeBSD:

root@FreeBSD-head:~ # uname -a
FreeBSD FreeBSD-head 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r315233+87dae13e7e9(master): Tue Mar 14 10:04:57 CET 2017 root@FreeBSD-head:/usr/obj/root/freebsd/sys/GENERIC i386

The problem is still present: the connection is accepted twice by the server. The trace looks the same.
Great. Let me try to reproduce the issue with packetdrill. If that works, we have a simple way of debugging the problem.
Created attachment 180814 [details] Scapy client

I made a scapy script that reproduces the bug. This client ignores the RST, but doesn't produce the storm. To make it work, you must have a server that responds to the POST before the end of the data.
Created attachment 180829 [details] packetdrill script to reproduce the duplicate accept problem This is a test script which reproduces the duplicate accept problem on FreeBSD head.
I did some testing and can explain some aspects of the traces, especially the ones on the FreeBSD side.

The client establishes the TCP connection and sends the first fragment of the HTTP request. The server receives this, sends a response which acknowledges the first fragment, and closes the socket. This results in sending the FIN-ACK. This is covered in Frame 3 to Frame 8 of the server.pcap file. This looks OK.

Now the server receives the second fragment of the HTTP request and responds correctly with a RST segment. This gets dropped by some middlebox and is not received by the client. This is Frame 9 and 10.

The client has not processed the response from the server. I have no idea why. This response also ACKs the first fragment of the HTTP request. Since it was not processed, the client retransmits the complete HTTP request in Frame 11. When the server gets this TCP ACK segment, it finds a listening socket and verifies that the segment carries a valid SYN cookie. Therefore it establishes a new connection, moves it into ESTABLISHED and processes the data. It sends an error message and closes the socket, which results in the sending of the FIN-ACK segment. This explains Frame 12 and Frame 13.

The peer processes these two segments and sends corresponding ACK segments in Frame 14 and 15. I guess the client also closes the socket, which results in the sending of the FIN-ACK in Frame 16. The server ACKs it in Frame 17. However, this ACK is not processed by the client, therefore the client retransmits its FIN-ACK, which the server ACKs again. This is Frame 18 and 19. Since the ACK from the server is never processed, this pattern repeats until the client finally gives up.

What I do not understand: why does the client not process those packets? The client.pcap shows that they are received. The sequence numbers look OK to me. However, the client side is not FreeBSD. So at least the above explains the FreeBSD behaviour.
I want to remind you that the original client is a smartphone. The first time I saw the problem, I made a tcpdump on the wireless box, not on the smartphone itself. The server response may have been delayed in the wifi processing (poor signal?) and taken time to reach the phone (but had already been captured in the pcap). The phone may have done a retransmit because it thought the HTTP request was lost. I just managed to reproduce it with the scapy script on the Ubuntu box, using an iptables configuration that drops the TCP reset.
(In reply to Alexandre martins from comment #13)
> I want to remind you that the original client is a smartphone. The first time
> I saw the problem, I made a tcpdump on the wireless box, not on the
> smartphone itself.
OK. That explains why the client has not processed TCP segments which were reported as received. They are just dropped between the wireless box and the smartphone.
> The server response may have been delayed in the wifi processing (poor signal?)
> and taken time to reach the phone (but had already been captured in the pcap).
> The phone may have done a retransmit because it thought the HTTP request
> was lost.
I guess this is exactly why the client was retransmitting the complete HTTP request. Please note that the retransmission looks like the third message (an ACK) of the three-way handshake. It just contains some data in addition and has the PSH bit set. Therefore it looks like such a handshake message and the server accepts it and establishes the TCP connection (again).
> I just managed to reproduce it with the scapy script on the Ubuntu box, using
> an iptables configuration that drops the TCP reset.
The packetdrill script allows you to reproduce the double-accept behaviour on a FreeBSD head system. I used that to figure out why it happens.
(In reply to Michael Tuexen from comment #12)
I think we should drop the ACK w/ data (in this case the HTTP request) if a syncookie is going to be used. This at least fixes this issue on our side.
I think it's not a good idea because, to mitigate DDoS attacks, firewalls limit the number of SYN packets per second. If the connection can be re-opened with a simple ACK, how can we mitigate a DDoS?
(In reply to Alexandre martins from comment #16) Well, ACK w/ data in the 3-way handshake is already _not_ a normal case; I am not sure about fast-open though. But for non-fast-open cases, we definitely want to drop ACK w/ data in 3-way handshake.
(In reply to Sepherosa Ziehau from comment #17) Having data on the third message is fine by the TCP specification. I don't think we should drop them in general.
(In reply to Alexandre martins from comment #16) If you don't want to use syn-cookies, you can disable them by setting the sysctl variable net.inet.tcp.syncookies to 0.
One could add some additional logic to only accept syncookies if either the syncache is not used or, if it is used, some entries have recently been dropped due to memory shortage. That would fix this issue whenever the syncache did not overflow recently.
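Roughly, the idea could be modelled like this (a sketch only; all names and the lifetime value below are invented for illustration and are not from an actual patch):

#include <stdbool.h>
#include <time.h>

/*
 * Illustrative model only (hypothetical names): remember when the
 * syncache last had to drop an entry, and accept a syncookie-validated
 * ACK only if that happened recently enough that the cookie could have
 * been issued because of an overflow.
 */
#define SYNCOOKIE_WINDOW 16     /* seconds; roughly a cookie's validity */

static time_t last_overflow;

static void
syncache_entry_dropped(void)    /* call when an entry is evicted */
{
        last_overflow = time(NULL);
}

static bool
syncookie_acceptable(void)
{
        return (time(NULL) <= last_overflow + SYNCOOKIE_WINDOW);
}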
(In reply to Michael Tuexen from comment #18)
Hmm, do any OS's TCP stacks really send data along w/ the last ACK of the 3-way handshake at all?

And if we are checking the syncookie, it indicates we suffered a memory shortage previously. I don't think we need to record that situation.

Let me rephrase my original suggestion (I am not a native speaker, sorry): drop the last ACK w/ data before the syncookie is checked.
(In reply to Sepherosa Ziehau from comment #21)
> Hmm, do any OS's TCP stacks really send data along w/ the last ACK of the
> 3-way handshake at all?
Assume the initial ACK without data (the third message of the handshake) is lost and the client then sends data, like an HTTP client does. Then the server will get exactly such a packet. This will happen with most OSes, I guess.
> And if we are checking the syncookie, it indicates we suffered a memory
> shortage previously. I don't think we need to record that situation.
This is not true. We use the syncookie if we don't find an entry in the syncache. The reason can be that we had an overflow, but it can also be that we don't find the entry because we already used it to set up a connection. This is the code path we use in the traces provided.
(In reply to Michael Tuexen from comment #22)
Thank you for the explanation of the ACK w/ data.

As for syncookie usage: if we use the syncookie it probably means two things:
- No inpcb for the connection. That's why we enter the syncache/syncookie code in the first place.
- No syncache entry. A memory shortage could be one cause; another is that the connection was established and dropped (this bug report).

I think it's an acceptable and reasonable heuristic to just drop the ACK w/ data inside the syncache code before checking the syncookie.
(In reply to Michael Tuexen from comment #12)
> When the server gets this TCP ACK segment, it finds a listening socket and verifies that the segment carries a valid SYN cookie.
Can you explain this some more? Why is the SYN cookie valid for this ACK? I mean, only the first ACK to the SYN-ACK can be valid; other ACKs use a different SEQ and must be invalid.
(In reply to slw from comment #24)
In fact, the first ACK is replayed:

-> SYN (seq 1)
<- SYN/ACK (seq 80 ACK 1)
-> ACK (seq 1 ACK 81)
-> [DATA] (seq 1 ACK 81)
<- [DATA] (delayed/lost/ignored/...)
<- FIN (delayed/lost/ignored/...)
-> [DATA]
<- RST => the connection disappears from the server (delayed/lost/ignored/...)
-> [DATA: replay of the first packet] (seq 1 ACK 81) => reopens the connection without SYN + SYN/ACK!
(In reply to Alexandre martins from comment #25)
Ah, I see. So the server discards the inpcb for this connection too early / makes an incorrect state transition (it needs something like CLOSE_WAIT for 2MSL, I mean). For an existing inpcb in the CLOSE_WAIT state, the RST must be generated with ACK set (an RST w/o ACK is filtered by some firewalls), and the retransmitted data would also be answered by replaying the correct FIN/ACK and the server's reply.
(In reply to slw from comment #24)
Consider a client doing a connect() call followed by a send() call. The connect call triggers the three-way handshake. Assume that the third message is lost. So the send call triggers the sending of a TCP segment with the same sequence number as the third message of the handshake, and it contains data. From the server's perspective this TCP segment passes the syncookie checks.
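For illustration only, syncookie validation conceptually looks like the sketch below; the struct and helpers are invented for the example and are not the FreeBSD implementation:

#include <stdint.h>

/* Hypothetical types and helpers, for illustration only. */
struct conn_tuple {
        uint32_t src, dst;
        uint16_t sport, dport;
};

static uint32_t
cookie_hash(uint32_t secret, const struct conn_tuple *t)
{
        /* stand-in for a real keyed hash over the 4-tuple */
        return (secret ^ t->src ^ t->dst ^
            ((uint32_t)t->sport << 16 | t->dport));
}

/*
 * A SYN cookie encodes a keyed hash of the 4-tuple (plus a coarse time
 * counter and MSS class) in the ISS the server chose for its SYN-ACK.
 * Validating a later ACK therefore needs nothing but the 4-tuple and
 * (th_ack - 1); a retransmitted "third ACK" carrying data is
 * indistinguishable from the original handshake completion, which is
 * why the retransmission re-creates the connection.
 */
static int
cookie_valid(uint32_t th_ack, const struct conn_tuple *t, uint32_t secret)
{
        return ((th_ack - 1) == cookie_hash(secret, t));
}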
to me it looks like the client is implementing TTCP.
(In reply to Julian Elischer from comment #28)
No, I think even FreeBSD would behave the same as a client: if the client does socket() connect() send(part1) send(part2), and the third segment from the three-way handshake is lost, and both segments triggered by the send() calls are lost, it will retransmit the whole data after a timeout.
(In reply to Michael Tuexen from comment #29)
I am asking again about the CLOSE_WAIT state and retransmitting the lost reply.
(In reply to slw from comment #30)
The client retransmits the whole data and only acks the SYN. So it hasn't processed the FIN sent by the server. All of that has been dropped. So I think the client is not in CLOSE_WAIT, but in ESTABLISHED.
I mean the server in CLOSE_WAIT, not the client. After the application closes the socket, the server must wait for the sent data to be acknowledged and resend it if needed. So the client's retransmission must be routed to the existing inpcb in the CLOSE_WAIT state, and the server must resend the lost segments and the FIN again. Not a RST, and not dropped incoming segments.
(In reply to slw from comment #32)
The server does socket() listen() accept() read() write() close(), which means it reads the first part of the request, sends the response (an error message) and closes the socket. Then the state is FIN-WAIT-1. Then it receives the second part of the request. Since the socket is closed, it responds with a RST segment and moves the connection to CLOSED. This covers the first TCP connection of the two.

After that the client retransmits the complete request, which results in establishing the second TCP connection (via the syncookie processing). The server sends a response (the same error message) and closes the socket. So it is again in FIN-WAIT-1. The FIN gets acked by the client, so the server goes into FIN-WAIT-2. Then the client sends its FIN. This is processed by the server and acked. Now the server is in TIME-WAIT. However, the ACK is not received by the client and the client retransmits the FIN-ACK.

So I don't see any end point moving into the CLOSE-WAIT state...
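For what it's worth, here is a minimal sketch of a server doing exactly that call sequence (plain sockets; the port is arbitrary and error handling is omitted). It answers and close()s before the rest of the request can arrive:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(void)
{
        int s = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in sin;
        char buf[512];
        const char resp[] = "HTTP/1.0 500 Internal Server Error\r\n\r\n";

        memset(&sin, 0, sizeof(sin));
#ifdef __FreeBSD__
        sin.sin_len = sizeof(sin);
#endif
        sin.sin_family = AF_INET;
        sin.sin_port = htons(8080);             /* arbitrary port for the sketch */
        sin.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(s, (struct sockaddr *)&sin, sizeof(sin));
        listen(s, 5);

        for (;;) {
                int c = accept(s, NULL, NULL);

                (void)read(c, buf, sizeof(buf));        /* first fragment only */
                (void)write(c, resp, sizeof(resp) - 1); /* respond early */
                close(c);       /* FIN now; data arriving later gets a RST */
        }
}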
(In reply to Michael Tuexen from comment #33)
Right, FIN-WAIT-1.

> Then the state is FIN-WAIT-1. Then it receives the second part of the request. Since the socket is closed, it responds with a RST segment and moves the connection to CLOSED.

I don't understand this. CLOSING must happen only after a FIN is received. The FIN was not received and the data in the buffer has not been acked, so what is the reason for a closing transition and losing data? I mean: resend the response and the FIN again.

"If a socket structure is associated with the file structure, soclose is called, f_data is cleared, and any posted error is returned.
soclose Function
This function aborts any connections that are pending on the socket (i.e., that have not yet been accepted by a process), waits for data to be transmitted to the foreign system, and releases the data structures that are no longer needed. soclose is shown in Figure 15.39."

"In general, TCP must still process incoming data, so it calls tcpacked to handle incoming acknowledgements, tcpdata to process data in the segment, and tcpswindow to adjust its sending window size. Once the input has been processed, tcpfin1 checks to see if it should make a state transition. If the TCBF_RDONE bit is set, a FIN has arrived and so has all data in the sequence up to the FIN. If the TCPF_FIN bit is cleared, an ACK has arrived for the outgoing FIN. Tcpfin1 uses these two bits to determine whether to make a transition to the CLOSING, FINWAIT2, or TIMEWAIT states"
(In reply to slw from comment #34)
> I don't understand this. CLOSING must happen only after a FIN is received.
> The FIN was not received and the data in the buffer has not been acked, so
> what is the reason for a closing transition and losing data?
I haven't said that the state CLOSING is ever used. Not sure why you bring this up...
> I mean: resend the response and the FIN again.
This doesn't happen because NEW data arrives and the application has called close(). So this new data cannot be delivered to the application. The TCP stack has to drop it. That is why a RST is sent and the connection is moved to CLOSED. I'm not sure which text you are citing, but are you trying to argue that a TCP stack should drop data and not signal this to the other side? Just letting the peer think everything is fine?
(In reply to Michael Tuexen from comment #35)
> This doesn't happen because NEW data arrives and the application has called close(). So this new data cannot be delivered to the application. The TCP stack has to drop it. That is why a RST is sent and the connection is moved to CLOSED.

I mean that in this case the RST must be sent in sequence, w/ ACK set and ACKing the received data. And the connection must not be moved to CLOSED.

RFC 793:
"3.5. Closing a Connection

CLOSE is an operation meaning "I have no more data to send." The notion of closing a full-duplex connection is subject to ambiguous interpretation, of course, since it may not be obvious how to treat the receiving side of the connection. We have chosen to treat CLOSE in a simplex fashion. The user who CLOSEs may continue to RECEIVE until he is told that the other side has CLOSED also. Thus, a program could initiate several SENDs followed by a CLOSE, and then continue to RECEIVE until signaled that a RECEIVE failed because the other side has CLOSED. We assume that the TCP will signal a user, even if no RECEIVEs are outstanding, that the other side has closed, so the user can terminate his side gracefully. A TCP will reliably deliver all buffers SENT before the connection was CLOSED so a user who expects no data in return need only wait to hear the connection was CLOSED successfully to know that all his data was received at the destination TCP. Users must keep reading connections they close for sending until the TCP says no more data.

Case 1: Local user initiates the close

In this case, a FIN segment can be constructed and placed on the outgoing segment queue. No further SENDs from the user will be accepted by the TCP, and it enters the FIN-WAIT-1 state. RECEIVEs are allowed in this state. All segments preceding and including FIN will be retransmitted until acknowledged. When the other TCP has both acknowledged the FIN and sent a FIN of its own, the first TCP can ACK this FIN. Note that a TCP receiving a FIN will ACK but not send its own FIN until its user has CLOSED the connection also."
(In reply to slw from comment #36)
Please note that RFC 793 defines the CLOSE operation as "I have no more data to send." So this does NOT map to the close() system call in the socket API, but to the shutdown(..., SHUT_WR) system call. When doing this, you can continue to receive data. But I guess the server calls close(), not shutdown(), in the case we are looking at. That is why the RST gets sent.
(In reply to Michael Tuexen from comment #37)
RST+ACK, in sequence with the data. And the transition from FIN-WAIT-1 must not happen until the ACK of the FIN is received. "All segments preceding and including FIN will be retransmitted until acknowledged." This is irrelevant to close()/shutdown(). Only after the FIN is acknowledged may a RST be generated, and this RST must be w/ ACK set (because this state falls under "any state except SYN-SENT").
(In reply to slw from comment #38) The text you are referring to deals with the case that the application has called shutdown(..., SHUT_WR), not close(). So the text doesn't apply here...
(In reply to Michael Tuexen from comment #33)
I think the code that transitions FIN-WAIT-1 to CLOSED is this:

        /*
         * If new data are received on a connection after the
         * user processes are gone, then RST the other end.
         */
        if ((so->so_state & SS_NOFDREF) &&
            tp->t_state > TCPS_CLOSE_WAIT && tlen) {
                KASSERT(ti_locked == TI_RLOCKED, ("%s: SS_NOFDEREF && "
                    "CLOSE_WAIT && tlen ti_locked %d", __func__, ti_locked));
                INP_INFO_RLOCK_ASSERT(&V_tcbinfo);

                if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
                        log(LOG_DEBUG, "%s; %s: %s: Received %d bytes of data "
                            "after socket was closed, "
                            "sending RST and removing tcpcb\n",
                            s, __func__, tcpstates[tp->t_state], tlen);
                        free(s, M_TCPLOG);
                }
                tp = tcp_close(tp);
                TCPSTAT_INC(tcps_rcvafterclose);
                rstreason = BANDLIM_UNLIMITED;
                goto dropwithreset;
        }

I don't completely understand the background for the ->CLOSED transition here. RFC 793 on page 36 seems to say:

    If an incoming segment has a security level, or compartment, or
    precedence which does not exactly match the level, and compartment,
    and precedence requested for the connection, a reset is sent and the
    connection goes to the CLOSED state. The reset takes its sequence
    number from the ACK field of the incoming segment.

I don't think we actually implement any security level/compartment/precedence. So do we really need to do the ->CLOSED transition here?

Thanks,
sephe
Yes, if new data are received after the close, there is no way to deliver data anywhere. If we ack it, the peer may just keep sending data, the window may go closed, and the peer could probe it forever. The appropriate response is an RST. And the connection can't do anything further, so CLOSED is the correct state. It seems to me that this situation is an unavoidable flaw of syn cookies.
(In reply to Mike Karels from comment #41) Would transition from {FIN-WAIT-1,FIN-WAIT-2,CLOSING} to TIME-WAIT for this code segment be more "appropriate" than transition to CLOSED?
I tested on Linux. Its TCP stack just keeps resetting the connection, incrementing its sequence number each time:

-> SYN (seq 1)
<- SYN/ACK (seq 80 ACK 1)
-> ACK (seq 1 ACK 81)
-> [DATA] (seq 1 ACK 81)
<- [DATA 100] (seq 81)
<- FIN (delayed/lost/ignored/...)
-> [DATA]
<- RST (seq 181)
-> [DATA]
<- RST (seq 182)
-> [DATA]
<- RST (seq 183)
-> [DATA]
<- RST (seq 184)
-> [DATA]
...
(In reply to Alexandre martins from comment #43) Please share a .pcap file.
(In reply to Alexandre martins from comment #43)
Can you do additional tests in this setup?
1. Lose the DATA from the server, the FIN, and the first RST
2. Lose the DATA from the server, the FIN, and the first TWO RSTs
Created attachment 180996 [details] Linux server, short delay between header and data

Capture using the scapy script. A short delay (<200 ms) is inserted between the header and the data.
Created attachment 180997 [details] Linux server, long delay between header and data

Capture using the scapy script. A long delay (>400 ms) is inserted between the header and the data. We can see that the Linux stack doesn't do the same thing...
(In reply to Mike Karels from comment #41)
> Yes, if new data are received after the close, there is no way to deliver data anywhere. If we ack it, the peer may just keep sending data, the window may go closed, and the peer could probe it forever. The appropriate response is an RST. And the connection can't do anything further, so CLOSED is the correct state.

The RFC requires retransmitting data until it is acked, no matter how the application closes the socket (do you really mean this retransmission should depend on the application reading the socket? that is strange and irrational). Yes, acking all received data may not be good. Not sure. Maybe don't ack it before the ACK of the FIN is received? And only after the FIN is ACKed generate RST+ACK (the RST MUST be w/ ACK; w/o a correct ACK the RST is ignored by Linux and Windows, I checked this before. And accepting such a RST for a live connection, as FreeBSD does, is a bug, violates the RFC and is a security issue). Generating the RST before the FIN is ACKed makes it impossible to ensure that "All segments preceding and including FIN will be retransmitted until acknowledged."

> It seems to me that this situation is an unavoidable flaw of syn cookies.

Please ignore syn cookies here. Look only at "last data chunk not acked, FIN lost".
Created attachment 180999 [details] packetdrill script to show how Linux handles data on closed socket
(In reply to Michael Tuexen from comment #49)
I have added a packetdrill script to show how Linux handles the reception of data when the user called close() before the data is received. It does the same as FreeBSD (as I would expect): it sends a reset and closes the socket. If the peer retransmits the data, a new RST is generated.
(In reply to Michael Tuexen from comment #50)
===
+0.00 write(4, ..., 395) = 395
+0.00 > P. 1:396(395) ack 46 win 58
+0.00 close(4) = 0
+0.00 > F. 396:396(0) ack 46 win 58
+0.00 < P. 46:54(8) ack 396 win 8192
===
This is a different case: the remote side ACKs the response from the server. For a correct comparison, "+0.00 > P. 1:396(395) ack 46 win 58" must be lost.
I ran the script on Ubuntu 16.10 (kernel 4.8). Not sure if the behaviour depends on the version... @Alexandre: Can you figure out whether the server calls shutdown() or close()? Not sure why the behaviour is different.
Created attachment 181000 [details] Updated packetdrill script to show how Linux handled data on closed socket This corrects the bug in the script that the data from the server was acked.
(In reply to slw from comment #51) Right. I updated the script. The behaviour hasn't changed.
(In reply to Michael Tuexen from comment #52)
In all of my tests, the server calls close() because of the error. I didn't check the shutdown(.., SHUT_WR) case.
(In reply to Michael Tuexen from comment #54)
1. Linux generates the RST w/ ACK. Is the socket not moved to the CLOSED state?
2. This is also wrong behavior: the server tries to reset the connection, and the client ignores this (because of the wrong ACK on the RST packet). Also, data is lost.
(In reply to Alexandre martins from comment #55) Great. That is what I'm also testing. It seems that the trace with the long delay shows the same as my testing. However, the trace with a short delay is different. Not sure why. And I don't see a reason why they should be different...
(In reply to slw from comment #56)
> 1. Linux generates the RST w/ ACK. Is the socket not moved to the CLOSED state?
Linux generates a RST, not a RST-ACK. I guess it is moved to the CLOSED state. Can't check.
> 2. This is also wrong behavior: the server tries to reset the connection, and the client
> ignores this (because of the wrong ACK on the RST packet). Also, data is lost.
You are right, user data is lost (the second part of the request). There is no way around that, since the user closed the socket before the data arrived. That is why the server sends a RST.
(In reply to Michael Tuexen from comment #58)
> Linux generates a RST, not a RST-ACK.
My bad, yes.
> You are right, user data is lost (the second part of the request). There is no way around that, since the user closed the socket before the data arrived. That is why the server sends a RST.
No, I mean different data: the server response. And there is no reason not to deliver/resend it to the client.
(In reply to slw from comment #59)
> No, I mean different data: the server response. And there is no reason not to
> deliver/resend it to the client.
Well, the server knows that it has to drop incoming data, since the application has closed the socket and there is new data to deliver. I guess we agree on that. The reaction is to let the peer know as soon as possible that something went wrong. I guess this is what we might not agree upon. However, this is the reason why the server sends a RST without trying to retransmit data it has sent but which has not been acknowledged by the peer.
(In reply to Michael Tuexen from comment #60)
> Well, the server knows that it has to drop incoming data, since the application
> has closed the socket and there is new data to deliver. I guess we agree on that.
Agree.
> The reaction is to let the peer know as soon as possible that something went wrong.
> I guess this is what we might not agree upon.
> However, this is the reason why the server sends a RST without trying to retransmit
> data it has sent but which has not been acknowledged by the peer.
Now the server has several tasks:
1. retransmit its response and (optionally) the FIN
2. let the peer know that something went wrong.

I don't see an ideal answer to this. Sending the RST now may not allow making sure the server's data was acknowledged. Not sure. Some tests are needed: retransmit the data, retransmit the FIN, send RST+ACK with SEQ=FIN.SEQ+1, and see the reaction from the remote side: is the data ACKed? is the FIN ACKed? In any case, retransmission is mandatory per the RFC. Some tricks to discard the client data may be used here (zero window? silent discard with only SEQ/ACK analysis? ACK and discard? send RST+ACK (as you see, a RST w/o ACK is ignored by the client) after the client has acknowledged the server data?)
(In reply to slw from comment #61) I don't see why just sending a RST is not allowed. The RFC describes a graceful termination of a TCP connection. We are in a situation where it is not graceful. It is more like handling the ABORT primitive...
(In reply to Michael Tuexen from comment #62)
An accepted RST can cause a silent, immediate closing of the connection by the client. No acknowledgment will be sent to the server (in case the server response was delivered to the client), and the server will retransmit the response until it also gets a RST. In case the response was not delivered to the client, the response is lost with no chance of being delivered. Sending the RST is OK in case the RST is accepted and processed after the client has sent the ACK of the server's FIN (the FIN, not the server data). This needs testing on real clients (Linux, Windows, Android and iOS). The server can do a graceful termination of this TCP connection, and there is no reason not to do so (the application does a graceful close, and only a rare combination of packet losses causes this behaviour).
(In reply to slw from comment #63)
> An accepted RST can cause a silent, immediate closing of the connection by the client.
> No acknowledgment will be sent to the server (in case the server response was
> delivered to the client), and the server will retransmit the response until it also gets a RST.
No. The server sends a RST and moves the connection to CLOSED. The connection is terminated immediately from the server's perspective.
> In case the response was not delivered to the client, the response is lost with no chance
> of being delivered.
> Sending the RST is OK in case the RST is accepted and processed after the client has sent
> the ACK of the server's FIN (the FIN, not the server data). This needs testing on real clients
> (Linux, Windows, Android and iOS).
No. This is an ungraceful termination. No need to wait for anything. All peers should handle the RST.
> The server can do a graceful termination of this TCP connection, and there is no reason
> not to do so (the application does a graceful close, and only a rare combination
> of packet losses causes this behaviour).
No. Incoming user data is lost on the server side. I think the server should notify the peer as soon as possible. Please note that the RST is only sent if the server has to drop user data. If that is not the case, a graceful shutdown will be performed.
(In reply to Michael Tuexen from comment #64)
> No. The server sends a RST and moves the connection to CLOSED. The connection is
> terminated immediately from the server's perspective.
This is wrong behaviour. It causes the loss of the server's data.
> No. This is an ungraceful termination. No need to wait for anything. All peers
> should handle the RST.
No:
pwrite();
close();
This is a graceful termination.
> No. Incoming user data is lost on the server side. I think the server should notify
> the peer as soon as possible. Please note that the RST is only sent if the server
> has to drop user data. If that is not the case, a graceful shutdown will be performed.
The user data is a secondary question for the server. The main question is resending the server's response to the client. This is an RFC requirement. And it is more important for the user to see the response from the server than just a dropped connection. The server can send the RST after the client has ACKed the server response and the server's FIN.
(In reply to slw from comment #65)
> This is wrong behaviour. It causes the loss of the server's data.
No, the loss of data is caused by the application calling close() *before* the incoming user data arrived. So the TCP stack on the server has to drop that user data.
> pwrite();
> close();
> This is a graceful termination.
Sure. This is what the application triggers. However, when user data arrives after the close call, this gets ungraceful, since this user data can't be delivered to the user anymore. To avoid this, the application could call shutdown(..., SHUT_WR) to trigger the sending of the FIN, then process incoming data until a FIN from the peer arrives, and only then call close(). But the application didn't. This would be more in line with how RFC 793 describes a connection termination. Please note that the CLOSE primitive in RFC 793 maps to a shutdown(..., SHUT_WR) system call, not to the close() system call. Bad naming... I don't see text in RFC 793 where it is required that you continue to process a connection after you know that it failed. I think the RFC doesn't cover the case where the application says "I don't want to receive anymore".
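For illustration, the graceful variant described above could look roughly like this (a sketch under those assumptions, not the actual application code):

#include <sys/socket.h>
#include <unistd.h>

/*
 * Sketch of the graceful termination described above: shut down only
 * the write side so the FIN is sent, keep draining whatever the client
 * still sends, and only then close().  This avoids the "new data after
 * close()" situation that forces the stack to answer with a RST.
 */
static void
finish_request(int c, const char *resp, size_t resplen)
{
        char buf[1024];

        (void)write(c, resp, resplen);  /* send the response */
        shutdown(c, SHUT_WR);           /* send our FIN, keep reading */
        while (read(c, buf, sizeof(buf)) > 0)
                ;                       /* drain until the peer's FIN */
        close(c);                       /* nothing left that can be lost */
}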
(In reply to Michael Tuexen from comment #66)
> No, the loss of data is caused by the application calling close() *before*
> the incoming user data arrived.
Loss of sent application data caused by close() before incoming user data arrived? Really? TCP makes no such claim.
> So the TCP stack on the server has to drop that user data.
No problem. That is not the server's application data.
> Sure. This is what the application triggers. However, when user data arrives after
> the close call, this gets ungraceful, since this user data can't be delivered to
> the user anymore.
I am not worried about the user data, I am worried about the server data. The server data is lost and that is the problem. Losing the user data is not a problem.
> I don't see text in RFC 793 where it is required that you continue to process
> a connection after you know that it failed.
What reason is there to consider the connection failed?
> I think the RFC doesn't cover the case
> where the application says "I don't want to receive anymore".
Yes, the RFC only says that CLOSE 'means "I have no more to send" but does not mean "I will not receive any more."' I mean that "Thus, it should be acceptable to make several SEND calls, followed by a CLOSE, and expect all the data to be sent to the destination." takes precedence over everything. "Reset Generation" allows generating a RST from the ESTABLISHED/FIN-WAIT-1/FIN-WAIT-2 states only "If an incoming segment has a security level, or compartment, or precedence which does not exactly match the level". Also, "3.9. Event Processing", "SEGMENT ARRIVES" doesn't generate a RST in the ESTABLISHED/FIN-WAIT-1/FIN-WAIT-2 states.

I mean the connection is not in a failed state until all server data and the FIN have been ACKed, or a timeout occurs. Only after that may the connection be moved to a failed state. Not until then.

PS: My proposal resolves the following issues:
1. the server response is not lost any more
2. the client sees a valid reply from the server, not just 'connection dropped'
3. late segments from the client don't match the syncookie and don't re-open the connection
4. no new restrictions on first-segment syncookie processing
5. the client side of the connection is cleanly closed (currently the generated RST w/o ACK is ignored by all clients except FreeBSD; this is another bug)
6. it is compatible w/ existing applications
(In reply to slw from comment #67)
> Loss of sent application data caused by close() before incoming user
> data arrived? Really? TCP makes no such claim.
Hmm. My understanding is that TCP provides a reliable bi-directional byte stream. This means that no user data is lost in either of the two directions.
> No problem. That is not the server's application data.
For me it doesn't matter in which direction user data is dropped. If that happens, TCP fails.
> I am not worried about the user data, I am worried about the server data.
> The server data is lost and that is the problem.
> Losing the user data is not a problem.
I don't agree. TCP does not provide a service which is reliable in one direction and unreliable in the other. If user data is dropped, the connection failed. RFC 793 only talks about the CLOSE primitive, which shuts down only the writing direction, not the reading one. This is not related to the close() call the program is using. The problem comes up because the reading direction is shut down. This is not described in the RFC. Therefore you have to go back to the high-level description of the service TCP provides.
(In reply to Michael Tuexen from comment #68)
> Hmm. My understanding is that TCP provides a reliable bi-directional byte stream.
> This means that no user data is lost in either of the two directions.
"Lost" in this context means "not delivered to the remote IP stack". Not to the application; to the IP stack. The remote system ACKs data after placing it in the system buffer, not after it is read by the application. I.e. as long as the system can accept the data and generate an ACK, all is OK. The data is delivered.
(In reply to slw from comment #69)
> "Lost" in this context means "not delivered to the remote IP stack".
> Not to the application; to the IP stack.
The semantics of an ACK is "delivered to the remote TCP stack", not "received by the application". But the service is from app to app. In my understanding, TCP should signal to the peer if it knows that some data will not be delivered to the app. This signalling is sending an RST.
> The remote system ACKs data after placing it in the system buffer, not after it is read
> by the application.
> I.e. as long as the system can accept the data and generate an ACK, all is OK. The data is delivered.
That is fine by the semantics of an ACK. But my point is that a TCP connection fails if some data can't be delivered to the application.
(In reply to Michael Tuexen from comment #70)
RFC: "An acknowledgment by TCP does not guarantee that the data has been delivered to the end user, but only that the receiving TCP has taken the responsibility to do so."

Nothing about failed delivery to the application. This information is unavailable at the TCP level.
(In reply to slw from comment #71)
I'm not arguing the semantics of the TCP-level ACKs. I'm saying that if a TCP stack knows that some user data can not be delivered to its application, it should signal that to the peer. This includes:
* receiving data after the socket was closed or shutdown(..., SHUT_RD) was called.
* closing the socket or calling shutdown(..., SHUT_RD) with user data still in the receive buffer.

I think:
* these four cases should be handled the same way.
* Resetting the TCP connection is what I would expect.

I can write some packetdrill test scripts to see what the current behaviour is under FreeBSD (and Linux).
(In reply to Michael Tuexen from comment #72) IMO, transition from {FIN-WAIT-1,FIN-WAIT-2,CLOSING} to TIME_WAIT instead of CLOSED upon receiving data from a closed socket can partially cover this case by leveraging the 2MSL waiting, i.e. keep sending RST. Once 2MSL is done, syncookie kicks in, which is unfixable.
In responding to an earlier comment directed to me, and to the comments since: this connection is dead. Only an RST and the CLOSED state apply. Data cannot be delivered to the application, which has lost the handle to this connection and may not even be alive in some cases. TCP cannot wait for a FIN, for example, because there may be gigabytes of data to deliver before the FIN, and there is nowhere for them to go. FWIW, I added the original version of this code about 30 years ago. I believe it was in 4.3BSD. It is still correct.
(In reply to Mike Karels from comment #74) I agree completely with Mike.
(In reply to Mike Karels from comment #74)
I am not talking about "TCP waits for the FIN", only about "TCP waits for ACK=SERVER_FIN.SEQ". The server TCP has sent but unconfirmed data; what reason is there to discard this server response? After ACK=FIN.SEQ is received, there is no objection to sending the RST.

About the incoming data to the server: is close(fd) equivalent to shutdown(fd, SHUT_RDWR)? If yes, the incoming data can be acknowledged and then silently discarded:
==
SHUT_RD The read half of the connection is closed— No more data can be received on the socket and any data currently in the socket receive buffer is discarded. The process can no longer issue any of the read functions on the socket. Any data received after this call for a TCP socket is acknowledged and then silently discarded.

SHUT_WR The write half of the connection is closed— In the case of TCP, this is called a half-close (Section 18.5 of TCPv1). Any data currently in the socket send buffer will be sent, followed by TCP's normal connection termination sequence. As we mentioned earlier, this closing of the write half is done regardless of whether or not the socket descriptor's reference count is currently greater than 0. The process can no longer issue any of the write functions on the socket.

SHUT_RDWR The read half and the write half of the connection are both closed— This is equivalent to calling shutdown twice: first with SHUT_RD and then with SHUT_WR.
==
(In reply to Mike Karels from comment #74) Is this {FIN-WAIT-1,FIN-WAIT-2,CLOSING} to CLOSED transition by the quoted code kinda similar to TWA (RFC1337)? And would it suffer the same hazards listed in that RFC? Thanks, sephe
(In reply to Sepherosa Ziehau from comment #77) I don't think it is similar, although I haven't thought it through in great detail. For starters, there is no TIME_WAIT to assassinate. I don't see how the situations would be similar. The only way I can think of to stop this would be to invent a new state, similar to TIME_WAIT but not awaiting only FIN, that would remember that the connection has terminated unsuccessfully. I don't think this is warranted.
(In reply to Mike Karels from comment #78)
Hmm, WAIT1, WAIT2 and CLOSING would transition to TIME_WAIT, so killing them (moving to CLOSED) in the code we talked about assassinates TIME_WAIT indirectly. Or we could add a flag to the tcpcb for the TIME_WAIT state.
I put a patch under review: https://reviews.freebsd.org/D10272

Syncookies are used in case of an overflow of the syncache; the patch restricts their usage to the case where there was an overflow recently. This mitigates the problem described in the PR. I'm still investigating the behaviour in case of close()/shutdown(,SHUT_RD) and received user data.
A commit references this bug: Author: tuexen Date: Thu Apr 20 19:19:34 UTC 2017 New revision: 317208 URL: https://svnweb.freebsd.org/changeset/base/317208 Log: Syncoockies can be used in combination with the syncache. If the cache overflows, syncookies are used. This patch restricts the usage of syncookies in this case: accept syncookies only if there was an overflow of the syncache recently. This mitigates a problem reported in PR217637, where is syncookie was accepted without any recent drops. Thanks to glebius@ for suggesting an improvement. PR: 217637 Reviewed by: gnn, glebius MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D10272 Changes: head/sys/netinet/tcp_syncache.c head/sys/netinet/tcp_syncache.h
I confirm that the workaround works for my test case. I'll let you develop a more complex solution to avoid data loss and/or TCP re-open under heavy load. Thank you for the patch :)
A commit references this bug: Author: tuexen Date: Thu Jun 1 08:32:35 UTC 2017 New revision: 319400 URL: https://svnweb.freebsd.org/changeset/base/319400 Log: MFC r317208: Syncoockies can be used in combination with the syncache. If the cache overflows, syncookies are used. This patch restricts the usage of syncookies in this case: accept syncookies only if there was an overflow of the syncache recently. This mitigates a problem reported in PR217637, where is syncookie was accepted without any recent drops. Thanks to glebius@ for suggesting an improvement. PR: 217637 Reviewed by: gnn, glebius Differential Revision: https://reviews.freebsd.org/D10272 Changes: _U stable/11/ stable/11/sys/netinet/tcp_syncache.c stable/11/sys/netinet/tcp_syncache.h
Track merge to stable/11
Since the original report was on 10.3-RELEASE (comment 4) and 10.4 will be released, request/ask about merge to stable/10. While I'm here, assign to committer resolving.
Previous comment was supposed to read " comment 7 "
Created attachment 184680 [details] patch to not send acks in this case

We've recently started hitting this at WhatsApp as well. I've applied the syncookie patches from CURRENT to 10.3 manually, and they successfully prevent this from happening as long as the syncache hasn't overflowed recently or been disabled. Unfortunately, if the syncache does overflow and this case does happen, once the connection is re-opened the connection states on the two peers are out of sync, and each peer responds to a packet with unreasonable seq/ack data by sending an empty ACK with its current seq/ack; the other peer in turn finds this unreasonable, and the resulting packet storms were causing availability problems.

I've attached a patch we've been running on a few machines. With this, when connections do get into this state, we don't contribute to the packet storm; instead, the connection eventually closes without sending very many packets.

I have some complete connection pcaps available (from before patching), and can share them (after masking IPs and TCP payloads) if they'll be useful. From the traces I've seen, we're getting many retransmits from the peer (or a middlebox), and the peer also ends the connection soon after opening by sending a FIN. Our host acks the FIN and also closes with a FIN. After the peer's ack of our FIN, we receive a new ACK that is a retransmit of the original ACK, and we reopen the connection at the connection's original SEQ/ACK, while the peer is in TIME_WAIT at the final SEQ/ACK. In the traces I was able to capture, the peers were mobile devices across the world on high-latency links.
Also, I had intended to include a link to a similar change in the Linux kernel: https://github.com/torvalds/linux/commit/96e0bf4b5193d0d97d139f99e2dd128763d55521 (Although the comments there say this behavior is consistent with the RFC, I don't think it is consistent; the RFC says, "If the ACK acks something not yet sent (SEG.ACK > SND.NXT) then send an ACK, drop the segment, and return.")
FWIW, I have created a patch to address the behavior reported in comment #87. It maintains the RFC-compliant behavior of sending ACKs, but rate limits them. This isn't perfect, but seems better (to me) than either of the extreme alternatives (drop every packet or ACK every packet). I'd be curious to have someone try this and see if it provides effective mitigation.
(In reply to Jonathan T. Looney from comment #89) Forgot to include a link to the patch: https://reviews.freebsd.org/D11929
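To sketch the general idea (illustrative only, not the code under review; the budget value is arbitrary):

#include <stdbool.h>
#include <time.h>

/*
 * Illustrative sketch of rate limiting these ACK replies, in the spirit
 * of the existing ICMP/RST bandwidth limiter: replies over a per-second
 * budget are dropped instead of feeding an ACK storm.
 */
static bool
ack_ratelimit_ok(void)
{
        static time_t cur_second;
        static unsigned count;
        const unsigned limit = 200;     /* arbitrary per-second budget */
        time_t now = time(NULL);

        if (now != cur_second) {
                cur_second = now;
                count = 0;
        }
        return (++count <= limit);
}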
For us, this may have limited effect, because our icmplim is much higher than the default (16k), since we do want to send closed-port RSTs in high volume when our service ports are closed. This patch should keep our system available, but we'd still be sending out a rather large number of ACKs on these broken connections. Given that r317208 means we only see this condition when the syncache overflows, this may be a reasonable trade-off.

I have been thinking that it might be useful if these ACKs (or other unexpected packets) would actually push the connection into one of the existing state-recovery behaviors, such as immediately triggering a keepalive sequence if one isn't already ongoing. That's certainly a larger change, though.

I'll get this patch up somewhere and let you know how it goes.
(In reply to Richard Russo from comment #91) > For us, this may have limited effect, because our icmplim is much higher than the default (16k) > [...] Yes, I agree it is a problem you can't set these independently. I had already been planning to propose a change to allow separate limits for all the rate-limited things. You would then be able to set this limit separately.
A commit references this bug: Author: tuexen Date: Wed Aug 9 13:26:12 UTC 2017 New revision: 322315 URL: https://svnweb.freebsd.org/changeset/base/322315 Log: MFC r317208: Syncoockies can be used in combination with the syncache. If the cache overflows, syncookies are used. This patch restricts the usage of syncookies in this case: accept syncookies only if there was an overflow of the syncache recently. This mitigates a problem reported in PR217637, where is syncookie was accepted without any recent drops. Thanks to glebius@ for suggesting an improvement. PR: 217637 Reviewed by: gnn, glebius Differential Revision: https://reviews.freebsd.org/D10272 Changes: _U stable/10/ stable/10/sys/netinet/tcp_syncache.c stable/10/sys/netinet/tcp_syncache.h
(In reply to Kubilay Kocak from comment #85) Merged to stable/10 in r322315.
(In reply to Richard Russo from comment #91)
I've tested the linked patch, manually applied to 10.3. With syncookie-only mode set (to trigger the condition) and the ICMP limit tuned to a reasonable range, I was able to see the limiting kick in, as expected. This looks good for us (and will be even better when the rate limits are individually adjustable!)