Bug 116335

Summary: [tcp] Excessive TCP window updates
Product: Base System
Reporter: Kevin Oberman <oberman>
Component: kern
Assignee: Andre Oppermann <andre>
Status: Closed FIXED
Severity: Affects Only Me
CC: hiren
Priority: Normal    
Version: 6.2-STABLE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
patch-1.diff none

Description Kevin Oberman 2007-09-13 21:20:06 UTC
	While testing over a transcontinental 10GE link between two boxes with
mxge cards, about 2.5 seconds into the transfer the receiving
node starts updating the window size as fast as it can process the
data. The result is that it is sending updates at intervals of between
0 and 4 microseconds. This can result in several hundred window
updates between "real" packets and, I suspect, is causing performance
problems.

I see an old message at:
http://lists.freebsd.org/pipermail/freebsd-net/2005-January/006141.html
that may be the source of the problem, though I have not yet figured
out exactly how this code works.

Fix: 

Unknown.
How-To-Repeat: 	Send a TCP stream between two hosts with a ~100 ms RTT between
them at speeds exceeding 3 Gbps.
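
For a sense of scale, the window needed to keep such a path full can be estimated from the bandwidth-delay product. The following is a minimal sketch using the 3 Gbps and 100 ms figures above; the helper names and the MSS of 1448 bytes are assumptions for illustration and do not come from the report.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes: bits/s * RTT(ms) / 8000. */
static uint64_t
bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_ms)
{
	return (bits_per_sec * rtt_ms / 8000);
}

/* Number of whole 2*MSS increments that fit into a window (hypothetical helper). */
static uint64_t
two_mss_steps(uint64_t window, uint64_t mss)
{
	assert(mss > 0);
	return (window / (2 * mss));
}
```

With these inputs, bdp_bytes(3000000000, 100) yields 37500000 bytes (about 37.5 MB), and two_mss_steps(37500000, 1448) yields 12948. If an update were sent for every two segments of freed space, a window of this size would leave room for thousands of updates per round trip, which may help explain the rate observed.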
Comment 1 Craig Rodrigues freebsd_committer freebsd_triage 2007-09-14 01:53:26 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Send to freebsd-net@
Comment 2 Andre Oppermann freebsd_committer freebsd_triage 2007-09-15 09:45:13 UTC
Responsible Changed
From-To: freebsd-net->andre

Take over.
Comment 3 Andre Oppermann freebsd_committer freebsd_triage 2010-08-15 11:07:25 UTC
State Changed
From-To: open->analyzed

Cause found and patch upcoming.
Comment 4 Andre Oppermann freebsd_committer freebsd_triage 2010-08-16 22:27:15 UTC
Kevin,

thanks for your bug report about the window updates.  Please try
the attached patch.  It changes TCP to be much more restrictive
in generating window updates.  Window updates are actually only
really necessary when the socket buffer is close to being full
and a zero window was announced.  Then independent window updates
make the remote end send again.  In all other cases the ACK clock
will handle reporting of the current window just fine.

The patch will generate window updates only if the window can be
increased by two segments at least (silly window avoidance), and:
  - the free space in the socket buffer is 1/8 of its size or less, or
  - the window is increased by at least 1/4 of the sockbuf, or
  - the socket buffer is smaller than 8 times MSS.

And it won't issue an independent window update if a delayed ACK
is pending.
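
The conditions above can be sketched as a standalone predicate. This is only an illustration under assumptions: want_window_update() and its parameter names are hypothetical, it is not the kernel code, and the "small buffer" test is written as "at most 8 times MSS".

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not a kernel function.
 *   adv    - bytes by which the advertised window could grow
 *   recwin - free space in the receive socket buffer
 *   hiwat  - capacity of the receive socket buffer
 *   mss    - maximum segment size
 *   delack - true if a delayed ACK is pending
 */
static bool
want_window_update(long adv, long recwin, long hiwat, long mss, bool delack)
{
	assert(mss > 0 && hiwat > 0);

	if (delack)
		return (false);		/* the pending ACK will carry the window */
	if (adv < 2 * mss)
		return (false);		/* silly window avoidance */
	return (recwin <= hiwat / 8 ||	/* buffer is nearly full */
	    adv >= hiwat / 4 ||		/* large increase of the window */
	    hiwat <= 8 * mss);		/* very small socket buffer */
}
```

For example, with a 64 KB buffer and an MSS of 1448, an increase of two segments alone is not enough while half the buffer is still free, but it is once the free space drops to 1/8 of the buffer.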

Lawrence: could you review the patch as well?

-- 
Andre
Comment 5 Kevin Oberman 2010-08-16 22:50:27 UTC
> Date: Mon, 16 Aug 2010 23:27:15 +0200
> From: Andre Oppermann <andre@freebsd.org>
> 
> Kevin,
> 
> thanks for your bug report about the window updates.  Please try
> the attached patch.  It changes TCP to be much more restrictive
> in generating window updates.  Window updates are actually only
> really necessary when the socket buffer is close to being full
> and a zero window was announced.  Then independent window updates
> make the remote end send again.  In all other cases the ACK clock
> will handle reporting of the current window just fine.
> 
> The patch will generate window updates only if the window can be
> increased by two segments at least (silly window avoidance), and:
>   - the free space in the socket buffer is 1/8 of its size or less, or
>   - the window is increased by at least 1/4 of the sockbuf, or
>   - the socket buffer is smaller than 8 times MSS.
> 
> And it won't issue an independent window update if a delayed ACK
> is pending.
> 
> Lawrence: could you review the patch as well?
> 
> -- 
> Andre
> 

Wow! I had given up on hearing anything about this.

I no longer have my test setup for looking at this and am currently
swamped by deadlines, so it may take a while. I'll try to get to it no
later than next week.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: oberman@es.net			Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751
Comment 6 dfilter service freebsd_committer freebsd_triage 2013-07-05 16:48:07 UTC
Author: andre
Date: Fri Jul  5 15:47:59 2013
New Revision: 252793
URL: http://svnweb.freebsd.org/changeset/base/252793

Log:
  MFC r242251, r242311:
  
   Defer sending an independent window update if a delayed ACK is pending
   saving a packet.  The window update then gets piggy-backed on the next
   already scheduled ACK.
  
  MFC r242252:
  
   Prevent a flurry of forced window updates when an application is
   doing small reads on a (partially) filled receive socket buffer.
  
   Normally one would send a window update every time the available
   space in the socket buffer increases by two times MSS.  This leads
   to a flurry of window updates that do not provide any meaningful
   new information to the sender.  There still is available space in
   the window and the sender can continue sending data.  All window
   updates then get carried by the regular ACKs.  Only when the socket
   buffer was (almost) full and the window closed accordingly does a
   window update deliver new information and allow the sender to start
   sending more data again.
  
   Send window updates only every two MSS when the socket buffer
   has less than 1/8 space available, or the available space in the
   socket buffer increased by 1/4 its full capacity, or the socket
   buffer is very small.  The next regular data ACK will carry and
   report the exact window size again.
  
   Reported by:	sbruno
   Tested by:	darrenr
   Tested by:	Darren Baginski
   PR:		kern/116335

Modified:
  stable/9/sys/netinet/tcp_output.c
Directory Properties:
  stable/9/sys/   (props changed)

Modified: stable/9/sys/netinet/tcp_output.c
==============================================================================
--- stable/9/sys/netinet/tcp_output.c	Fri Jul  5 15:30:02 2013	(r252792)
+++ stable/9/sys/netinet/tcp_output.c	Fri Jul  5 15:47:59 2013	(r252793)
@@ -540,19 +540,39 @@ after_sack_rexmit:
 	}
 
 	/*
-	 * Compare available window to amount of window
-	 * known to peer (as advertised window less
-	 * next expected input).  If the difference is at least two
-	 * max size segments, or at least 50% of the maximum possible
-	 * window, then want to send a window update to peer.
-	 * Skip this if the connection is in T/TCP half-open state.
-	 * Don't send pure window updates when the peer has closed
-	 * the connection and won't ever send more data.
+	 * Sending of standalone window updates.
+	 *
+	 * Window updates are important when we close our window due to a
+	 * full socket buffer and are opening it again after the application
+	 * reads data from it.  Once the window has opened again and the
+	 * remote end starts to send again the ACK clock takes over and
+	 * provides the most current window information.
+	 *
+	 * We must avoid the silly window syndrome whereas every read
+	 * from the receive buffer, no matter how small, causes a window
+	 * update to be sent.  We also should avoid sending a flurry of
+	 * window updates when the socket buffer had queued a lot of data
+	 * and the application is doing small reads.
+	 *
+	 * Prevent a flurry of pointless window updates by only sending
+	 * an update when we can increase the advertized window by more
+	 * than 1/4th of the socket buffer capacity.  When the buffer is
+	 * getting full or is very small be more aggressive and send an
+	 * update whenever we can increase by two mss sized segments.
+	 * In all other situations the ACK's to new incoming data will
+	 * carry further window increases.
+	 *
+	 * Don't send an independent window update if a delayed
+	 * ACK is pending (it will get piggy-backed on it) or the
+	 * remote side already has done a half-close and won't send
+	 * more data.  Skip this if the connection is in T/TCP
+	 * half-open state.
 	 */
 	if (recwin > 0 && !(tp->t_flags & TF_NEEDSYN) &&
+	    !(tp->t_flags & TF_DELACK) &&
 	    !TCPS_HAVERCVDFIN(tp->t_state)) {
 		/*
-		 * "adv" is the amount we can increase the window,
+		 * "adv" is the amount we could increase the window,
 		 * taking into account that we are limited by
 		 * TCP_MAXWIN << tp->rcv_scale.
 		 */
@@ -572,9 +592,11 @@ after_sack_rexmit:
 		 */
 		if (oldwin >> tp->rcv_scale == (adv + oldwin) >> tp->rcv_scale)
 			goto dontupdate;
-		if (adv >= (long) (2 * tp->t_maxseg))
-			goto send;
-		if (2 * adv >= (long) so->so_rcv.sb_hiwat)
+
+		if (adv >= (long)(2 * tp->t_maxseg) &&
+		    (adv >= (long)(so->so_rcv.sb_hiwat / 4) ||
+		     recwin <= (long)(so->so_rcv.sb_hiwat / 8) ||
+		     so->so_rcv.sb_hiwat <= 8 * tp->t_maxseg))
 			goto send;
 	}
 dontupdate:
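
To illustrate the effect of the changed condition, here is a rough model that counts standalone window updates while an application drains a partially filled buffer in small reads. It is a sketch under simplifying assumptions that are not part of the commit: no new data arrives, no delayed ACK is pending, window scaling is ignored, and old_rule(), new_rule() and count_updates() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition removed by the diff: two segments, or half of the buffer. */
static bool
old_rule(long adv, long recwin, long hiwat, long mss)
{
	(void)recwin;
	return (adv >= 2 * mss || 2 * adv >= hiwat);
}

/* Condition added by the diff. */
static bool
new_rule(long adv, long recwin, long hiwat, long mss)
{
	return (adv >= 2 * mss &&
	    (adv >= hiwat / 4 || recwin <= hiwat / 8 || hiwat <= 8 * mss));
}

typedef bool (*rule_fn)(long, long, long, long);

/*
 * Drain "queued" bytes from a buffer of size "hiwat" in reads of
 * "chunk" bytes and count how often the rule asks for an update.
 */
static int
count_updates(rule_fn rule, long hiwat, long mss, long queued, long chunk)
{
	long recwin = hiwat - queued;	/* free space in the buffer */
	long advertised = recwin;	/* window the peer already knows */
	int updates = 0;

	assert(chunk > 0 && queued >= 0 && queued <= hiwat);
	while (queued > 0) {
		long n = queued < chunk ? queued : chunk;

		queued -= n;
		recwin += n;
		if (rule(recwin - advertised, recwin, hiwat, mss)) {
			advertised = recwin;	/* update sent */
			updates++;
		}
	}
	return (updates);
}
```

In this model, a 256 KB buffer that is half full and drained in 512-byte reads with an MSS of 1448 produces 42 updates under the old condition and 2 under the new one.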
Comment 7 Hiren Panchasara freebsd_committer freebsd_triage 2016-02-12 20:23:14 UTC
Reopen if this is still a problem.