Bug 132832

Summary: [netinet] [patch] tcp_output() might generate invalid TSO frames when len > TCP_MAXWIN - hdrlen - optlen
Product: Base System Reporter: Renaud Lienhart <renaud>
Component: kernAssignee: Andre Oppermann <andre>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
file.diff none

Description Renaud Lienhart 2009-03-20 01:50:01 UTC
The tcp_output() routine has an issue when the send window exceeds TCP_MAXWIN and the underlying interface supports TSO.

There is a check to ensure the data being pushed doesn't exceed the maximum TSO packet size. If this is the case, "len" is trimmed and "sendalot" is set:

--- 8< ---
if (tso) {
	if (len > TCP_MAXWIN - hdrlen - optlen) {
		len = TCP_MAXWIN - hdrlen - optlen;
		len = len - (len % (tp->t_maxopd - optlen));
		sendalot = 1;
--- >8 ---

Everything's hunky-dory, until the next "sendalot" iteration;

If the remaining window does not require TSO (i.e. len <= tp->t_maxseg), this following piece of code fails to properly reset "tso":

--- 8< ---
if (len > tp->t_maxseg) {
	if (<tso_conds>)
		tso = 1;
	} else {
		len = tp->t_maxseg;
		sendalot = 1;
		tso = 0;
	}
}
--- >8 ---

"tso" remains set to 1 and the resulting packet is tagged with CSUM_TSO although it doesn't require TSO. This causes problems with a large number of nics which refuse to send the resulting frame and, in some case, wedge.

Fix: The solution is to always reset "tso" at the beginning of the "sendalot" loop in order to ensure it is not stale.

In my patch I also added a KASSERT() which directly catches any problematic frame before it reaches any other layer.

Patch attached with submission follows:
How-To-Repeat: Using netperf (or any TCP workload) with a large socket buffer size exposes the issue in a matter of seconds.
Comment 1 Gavin Atkinson freebsd_committer freebsd_triage 2009-04-16 09:19:28 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).  This may be the cause of some of the other 
TSO issues that have been spotted recently.
Comment 2 Andre Oppermann freebsd_committer freebsd_triage 2010-08-10 23:22:47 UTC
Responsible Changed
From-To: freebsd-net->andre

Take over.
Comment 3 dfilter service freebsd_committer freebsd_triage 2010-08-14 22:41:46 UTC
Author: andre
Date: Sat Aug 14 21:41:33 2010
New Revision: 211317
URL: http://svn.freebsd.org/changeset/base/211317

Log:
  When using TSO and sending more than TCP_MAXWIN sendalot is set
  and we loop back to 'again'.  If the remainder is less or equal
  to one full segment, the TSO flag was not cleared even though
  it isn't necessary anymore.  Enabling the TSO flag on a segment
  that doesn't require any offloaded segmentation by the NIC may
  cause confusion in the driver or hardware.
  
  Reset the internal tso flag in tcp_output() on every iteration
  of sendalot.
  
  PR:		kern/132832
  Submitted by:	Renaud Lienhart <renaud-at-vmware com>
  MFC after:	1 week

Modified:
  head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==============================================================================
--- head/sys/netinet/tcp_output.c	Sat Aug 14 21:04:27 2010	(r211316)
+++ head/sys/netinet/tcp_output.c	Sat Aug 14 21:41:33 2010	(r211317)
@@ -153,7 +153,7 @@ tcp_output(struct tcpcb *tp)
 	int idle, sendalot;
 	int sack_rxmit, sack_bytes_rxmt;
 	struct sackhole *p;
-	int tso = 0;
+	int tso;
 	struct tcpopt to;
 #if 0
 	int maxburst = TCP_MAXBURST;
@@ -211,6 +211,7 @@ again:
 	    SEQ_LT(tp->snd_nxt, tp->snd_max))
 		tcp_sack_adjust(tp);
 	sendalot = 0;
+	tso = 0;
 	off = tp->snd_nxt - tp->snd_una;
 	sendwin = min(tp->snd_wnd, tp->snd_cwnd);
 	sendwin = min(sendwin, tp->snd_bwnd);
@@ -490,9 +491,9 @@ after_sack_rexmit:
 		} else {
 			len = tp->t_maxseg;
 			sendalot = 1;
-			tso = 0;
 		}
 	}
+
 	if (sack_rxmit) {
 		if (SEQ_LT(p->rxmit + len, tp->snd_una + so->so_snd.sb_cc))
 			flags &= ~TH_FIN;
@@ -1051,6 +1052,8 @@ send:
 	 * XXX: Fixme: This is currently not the case for IPv6.
 	 */
 	if (tso) {
+		KASSERT(len > tp->t_maxopd - optlen,
+		    ("%s: len <= tso_segsz", __func__));
 		m->m_pkthdr.csum_flags |= CSUM_TSO;
 		m->m_pkthdr.tso_segsz = tp->t_maxopd - optlen;
 	}
_______________________________________________
svn-src-all@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscribe@freebsd.org"
Comment 4 Andre Oppermann freebsd_committer freebsd_triage 2010-08-14 23:33:56 UTC
State Changed
From-To: open->patched
Comment 5 Andre Oppermann freebsd_committer freebsd_triage 2010-08-23 15:26:00 UTC
State Changed
From-To: patched->closed

All MFC's done.