There is a bug tcp_output() for at least FreeBSD 6.1 that causes a perfectly good TCP to be dropped by its retransmit timer; the application receives ETIMEDOUT. Consider a TCP that never transmits (the receive end of the ttcp utility is an example), while the TCP is established snd_max == snd_una == snd_nxt == (isr + 1) and the retransmit timer should never be started. If the retransmit timer is started then it is never stopped by tcp_input/tcp_out because snd_max == snd_una == snd_nxt (always). Once started the timer continues its count up till tp->t_rxtshift == 12 and the connection that never transmitted gets falsely killed. The bug is to blindly rely on the return value of ip_output(). If ip_output() returns ENOBUFS then the retransmit timer is activated: From the end of tcp_output(): out: SOCKBUF_UNLOCK_ASSERT(&so->so_snd); /* Check gotos. */ if (error == ENOBUFS) { if (!callout_active(tp->tt_rexmt) && !callout_active(tp->tt_persist)) callout_reset(tp->tt_rexmt, tp->t_rxtcur, tcp_timer_rexmt, tp); tp->snd_cwnd = tp->t_maxseg; return (0); } My simple minded fix would be not to start the retransmit timer; if tcp_output() wanted to time this transmit it would have started the timer up above. This ETIMEDOUT problem is easily recreated on any old machine using a single slow ethernet device and the ttcp test utility. First, fire up a couple ttcp receivers. Second, flood the same interface with enough ttcp transmitters to cause the driver's transmit ring and interface queue to back up. Eventually, one of the ttcp receives will get ENOBUFS from ip_output() and the retransmit timer will be wrongly activated for a pure ACK segment. I was able to do it w/ the following on freeBSD6.1: box1: ttcp -s -l 16384 -p 9444 -v -b 128000 -r ttcp -s -l 16384 -p 9445 -v -b 128000 -r ttcp -s -n 6553600 -l 4096 -p 9446 -v -b 128000 -t 192.168.222.13 ttcp -s -n 9999999 -l 333 -p 9447 -v -b 128000 -t 192.168.222.13 ttcp -s -n 9999999 -l 8192 -p 9448 -v -b 128000 -t 192.168.222.13 ttcp -s -n 9999999 -l 333 -p 9449 -v -b 128000 -t 192.168.222.13 ttcp -s -n 9999999 -l 8192 -p 9450 -v -b 128000 -t 192.168.222.13 box2: ttcp -s -n 6553600 -l 8192 -p 9444 -v -b 128000 -t 192.168.222.222 ttcp -s -n 9999999 -l 128 -p 9445 -v -b 128000 -t 192.168.222.222 ttcp -s -l 16384 -p 9446 -v -b 128000 -r ttcp -s -l 16384 -p 9447 -v -b 128000 -r ttcp -s -l 16384 -p 9448 -v -b 128000 -r ttcp -s -l 16384 -p 9449 -v -b 128000 -r ttcp -s -l 16384 -p 9450 -v -b 128000 -r Fix: Do not start the retransmit timer based on error codes from ip_output() ? How-To-Repeat: I was able to do it w/ the following on freeBSD6.1: box1: ttcp -s -l 16384 -p 9444 -v -b 128000 -r ttcp -s -l 16384 -p 9445 -v -b 128000 -r ttcp -s -n 6553600 -l 4096 -p 9446 -v -b 128000 -t 192.168.222.13 ttcp -s -n 9999999 -l 333 -p 9447 -v -b 128000 -t 192.168.222.13 ttcp -s -n 9999999 -l 8192 -p 9448 -v -b 128000 -t 192.168.222.13 ttcp -s -n 9999999 -l 333 -p 9449 -v -b 128000 -t 192.168.222.13 ttcp -s -n 9999999 -l 8192 -p 9450 -v -b 128000 -t 192.168.222.13 box2: ttcp -s -n 6553600 -l 8192 -p 9444 -v -b 128000 -t 192.168.222.222 ttcp -s -n 9999999 -l 128 -p 9445 -v -b 128000 -t 192.168.222.222 ttcp -s -l 16384 -p 9446 -v -b 128000 -r ttcp -s -l 16384 -p 9447 -v -b 128000 -r ttcp -s -l 16384 -p 9448 -v -b 128000 -r ttcp -s -l 16384 -p 9449 -v -b 128000 -r ttcp -s -l 16384 -p 9450 -v -b 128000 -r
I realized, late last night, that I was wrong on a few details concerning this bug: 1.) The retransmit timer does not keep popping on without being restarted. 2.) ip_output() must return ENOBUFS (TCP_MAXRXTSHIFT + 1) times to the same, non-transmitting TCP. 3.) Given a TCP as described below, when tcp_output() uses ENOBUFS to blindly start the retransmit timer then tp->t_rxtshift will be falsely incremented and never cleared. Thus the bug manifests itself because it appears for a TCP that never transmits nobody ever clears clears tp->t_rxtshift; this allows tp->t_rxtshift to slowly count up to TCP_MAXRXTSHIFT; once TCP_MAXRXTSHIFT is exceeded tcp_timer_rexmt() will kill the poor innocent TCP.
Responsible Changed From-To: freebsd-bugs->freebsd-net Over to maintainer(s).
Responsible Changed From-To: freebsd-net->andre Take over.
Responsible Changed From-To: andre->silby silby - this sounds like a bug you fixed not too long ago - please take a look
I think i've been hit by this. I played around with a more current version of tcp_output.c (FreeBSD 9.1) and believe the described behaviour still exists. But without knowing too much about TCP and the FreeBSD kernel, I cannot create a real patch that won't break other cases. Is anybody still working on this since 2007? Till
Dave Baukus here, again, after 8.5 years. This bug still exists in FreeBSD 10.1 Stable. Below is a dtrace script that helps illustrate the bug: ----------- start dtrace script BEGIN { self->ip_output_return = 0; self->tcp_output_enter = 0; } fbt::tcp_output:entry { self->tcp_output_enter = 1; } fbt::tcp_output:return { self->tcp_output_enter = 0; } fbt::ip_output:return /self->tcp_output_enter == 1/ { self->ip_output_return = arg1; } /* ENOBUFS == 55 */ fbt::tcp_timer_activate:entry /self->ip_output_return == 55/ { tp = (struct tcpcb *)args[0]; timer_type = (unsigned int)args[1]; inp = (struct inpcb *)tp->t_inpcb; self->ip_output_return = 0; printf( "pid/tid %d/%d: start %d = %s, local 0x%x:%x, foreign 0x%x:%x, rtx shift %d\n", pid, tid, timer_type, timer_type == 2 ? "RXMT" : "OTHER", inp->inp_inc.inc_ie.ie_dependladdr.ie46_local.ia46_addr4.s_addr, inp->inp_inc.inc_ie.ie_lport, inp->inp_inc.inc_ie.ie_dependfaddr.ie46_foreign.ia46_addr4.s_addr, inp->inp_inc.inc_ie.ie_fport, tp->t_rxtshift); } fbt::tcp_timer_rexmt:entry { tp = (struct tcpcb *)args[0]; inp = (struct inpcb *)tp->t_inpcb; printf( "%s RTX on local 0x%x:%x, foreign 0x%x:%x, rtx shift %d, soft_err %d, snd_una 0x%x, snd_nxt 0x%x, snd_max 0x%x\n", tp->t_rxtshift + 1 > 12 ? "ETIMEDOUT" : "", inp->inp_inc.inc_ie.ie_dependladdr.ie46_local.ia46_addr4.s_addr, inp->inp_inc.inc_ie.ie_lport, inp->inp_inc.inc_ie.ie_dependfaddr.ie46_foreign.ia46_addr4.s_addr, inp->inp_inc.inc_ie.ie_fport, tp->t_rxtshift, tp->t_softerror, tp->snd_una, tp->snd_nxt, tp->snd_max); } --------- END dtrace script Updating my original test scenario: box1 (one of the first 2 ttcp's will receive ETIMEDOUT) ttcp -s -l 16384 -p 9444 -v -b 128000 -r ttcp -s -l 16384 -p 9445 -v -b 128000 -r ttcp -s -n 6553600 -l 4096 -p 9446 -v -b 128000 -t 10.1.4.204 ttcp -s -n 9999999 -l 333 -p 9447 -v -b 128000 -t 10.1.4.204 ttcp -s -n 9999999 -l 333 -p 9448 -v -b 128000 -t 10.1.4.204 ttcp -s -n 9999999 -l 333 -p 9449 -v -b 128000 -t 10.1.4.204 ping -s 1024 -l 999999999 sm4u-20-mgmt.eng box2: ttcp -s -n 9999999 -l 12800 -p 9444 -v -b 128000 -t 10.1.2.48 ttcp -s -n 9999999 -l 32765 -p 9445 -v -b 128000 -t 10.1.2.48 ttcp -s -l 16384 -p 9446 -v -b 128000 -r ttcp -s -l 16384 -p 9447 -v -b 128000 -r ttcp -s -l 16384 -p 9448 -v -b 128000 -r ttcp -s -l 16384 -p 9449 -v -b 128000 -r ------------ post processed dtrace output: In this example the first ttcp on box1 using port 9444 receives ETIMEDOUT from tcp_timer_rexmt. Note the ever increasing rtx shift. grep "local 0x3002010a:e424, foreign 0xcc04010a:2ba4" dtrace.out | grep -v OTHER 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 0 0 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 0, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 0 22786 tcp_timer_activate:entry pid/tid 12/100019: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 1 0 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 1, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 2 3 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 2, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 3 22786 tcp_timer_activate:entry pid/tid 12/100019: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 3 0 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 3, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 0 22786 tcp_timer_activate:entry pid/tid 12/100019: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 4 2 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 4, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 5 0 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 5, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 6 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 6 0 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 6, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 7 0 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 7, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 8 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 8 0 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 8, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 9 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 9 0 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 9, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 10 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 10 0 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 10, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 11 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 11 4 9785 tcp_timer_rexmt:entry RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 11, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46 4 22786 tcp_timer_activate:entry pid/tid 12/100019: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 12 2 22786 tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 12 0 9785 tcp_timer_rexmt:entry ETIMEDOUT RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 12, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
Reassign to the wild with permission of assignee.
And this would most likely be in -head too.
batch change: For bugs that match the following - Status Is In progress AND - Untouched since 2018-01-01. AND - Affects Base System OR Documentation DO: Reset to open status. Note: I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
This was fixed in base r300981.
Thank you for the reference Michael ^Triage: Assign to committer that resolved