Bug 253464

Summary: [tcp] Fast Recovery not entered, when SACK negotiated, but no SACK blocks received in DupAcks
Product: Base System Reporter: Richard Scheffenegger <rscheff>
Component: kernAssignee: Richard Scheffenegger <rscheff>
Status: Closed FIXED    
Severity: Affects Only Me CC: tuexen
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   

Description Richard Scheffenegger freebsd_committer freebsd_triage 2021-02-12 17:44:22 UTC
Just to document a peculiar effect.

Using packetdrill to set up a session, with fbsd being the client (actively establishing the session), SACK was negotiated. However, the script itself
would only send duplicate ACKs without SACK blocks.

In this instance, Fastrecovery doesn't appear to get triggered - even after a high number of duplicate ACKs. Only RTO resume forward progress eventually.

Note that this is an edge case: hosts negotiating SACK are expected to provide the information which segment triggered the duplicate ACK with the SACK option, thus impact is small.

But curiously enough, this effect does NOT happen, when fbsd is the server side (listening and accepting the session) - even when the remainder of the two sripts is absolutely identical.

```
// Testing ECN Packet markings
// Active Client

--tolerance_usecs=50000

// Flush Hostcache
0.0 `sysctl net.inet.tcp.cc.algorithm=newreno`
+0.01 `sysctl net.inet.tcp.ecn.enable=1`
+0.01 `sysctl net.inet.tcp.initcwnd_segments=10`
+0.01 `sysctl net.inet.tcp.hostcache.purgenow=1`

// Create a TCP socket.
+0.50 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
+0.01 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0.01 setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0.01 setsockopt(4, SOL_SOCKET, SO_DEBUG, [1], 4) = 0
+0.01 setsockopt(4, SOL_SOCKET, SO_SNDBUF, [1048576], 4) = 0

// Establish a TCP connection with ECN

+0.04 connect(4, ..., ...) = -1 // EINPROGRESS

+0 >[noecn] SEW 0:0(0) win 65535 <mss 1460, nop,wscale 6,sackOK,TS val ... ecr 0>
+0 <[noecn] SE. 0:0(0) ack 1 win 65535 <mss 1000, nop, wscale 10, sackOK, eol, nop>
+0 >[noecn] .   1:1(0) ack 1 <...>

+0.01 getsockopt(4, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
+0.01 fcntl(4, F_SETFL, O_RDWR) = 0   // set back to blocking

// Send IW plus 1 segment, check ECN bits
1.00 write(4, ..., 11000) = 11000
+0    >[ect0]  .      1:1001(1000) ack 1
+0    <[noecn] E.     1:1(0) ack 1001 win 65535
+0    >[ect0]  .   1001:2001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 2001 win 65535
+0    >[ect0]  .   2001:3001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 3001 win 65535
+0    >[ect0]  .   3001:4001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 4001 win 65535
+0    >[ect0]  .   4001:5001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 5001 win 65535
+0    >[ect0]  .   5001:6001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 6001 win 65535
+0    >[ect0]  .   6001:7001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 7001 win 65535
+0    >[ect0]  .   7001:8001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 8001 win 65535
+0    >[ect0]  .   8001:9001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 9001 win 65535
+0    >[ect0]  .   9001:10001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 10001 win 65535
+0    >[ect0]  PW. 10001:11001(1000) ack 1


// Send few more "loss" packets
+0.01 write(4, ..., 5000) = 5000

+0    >[ect0]  .  11001:12001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 10001 win 65535  // 1st dupack
+0    >[ect0]  .  12001:13001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 10001 win 65535  // 2nd dupack
+0    >[ect0]  .  13001:14001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 10001 win 65535  // 3rd dupack
+0    >[ect0]  .  14001:15001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 10001 win 65535  // 4th dupack

+0    >[ect0]  P. 15001:16001(1000) ack 1         // not sent, RTO for the missing segment instead
+0    <[noecn] .      1:1(0) ack 10001 win 65535
/*
// Retransmission without ECT
+0    >[noecn] .  10001:11001(1000) ack 1

+0    <[noecn] .      1:1(0) ack 11001 win 65535
+0    >[noecn] .  11001:12001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 12001 win 65535
+0    >[noecn] .  12001:13001(1000) ack 1
+0    <[noecn] .      1:1(0) ack 16001 win 65535

//More data should be ECT marked again, and
// have the CWR flag set now.
+0.01 write(4, ..., 1000) = 1000
+0    >[ect0]  PW. 16001:17001(1000) ack 1
+0    <[noecn] .       1:1(0) ack 17001 win 65535

+0.00 close(4) = 0
+0.01 >[noecn] F.  17001:17001(0) ack 1
+0.03 <[noecn] F.      1:1(0) ack 17002 win 65535
+0.01 >[noecn] .   17002:17002(0) ack 2

```
Comment 1 Richard Scheffenegger freebsd_committer freebsd_triage 2021-02-13 15:11:31 UTC
Looked into this issue. The discrepancy between Client and Server side turned out to be a typo in the testscript. Provided that SACK is negotiated for in the 3WHS, any subsequent duplicate ACKs without SACK blocks are effectively ignored for purposes of recovering from loss.

https://reviews.freebsd.org/D28634 

When duplicate ACKs arrive after SACK was negotiated, make sure that SACK processing only happens with actual SACK blocks are also present in the TCP option of the ACK. Otherwise, the normal newreno processing path would be taken.

And as always, the retransmission timer remains as ultimate fallback, thus any problematic TCP sessions would only exhibit poor performance under loss, but no catastrophic failure (stall / closing of the session).
Comment 2 Richard Scheffenegger freebsd_committer freebsd_triage 2021-02-26 10:21:43 UTC
Fix under discussion: https://reviews.freebsd.org/D28634
Comment 3 Michael Tuexen freebsd_committer freebsd_triage 2021-07-01 23:08:51 UTC
The fix under discussion has been committed.