Bug 108670 - [tcp] TCP connection ETIMEDOUT
Summary: [tcp] TCP connection ETIMEDOUT
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.1-STABLE
Hardware: Any Any
: Normal Affects Only Me
Assignee: Sepherosa Ziehau
URL: https://reviews.freebsd.org/D5872
Keywords:
Depends on:
Blocks:
 
Reported: 2007-02-01 23:40 UTC by dave baukus
Modified: 2020-01-25 03:35 UTC (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description dave baukus 2007-02-01 23:40:18 UTC
There is a bug  tcp_output() for at least FreeBSD 6.1
that causes a perfectly good TCP to be dropped by its
retransmit timer; the application receives ETIMEDOUT.

Consider a TCP that never transmits (the receive end of the ttcp
utility is an example), while the TCP is established
snd_max == snd_una == snd_nxt == (isr + 1) and the retransmit
timer should never be started. If the retransmit timer is started
then it is never stopped by tcp_input/tcp_out because
snd_max == snd_una == snd_nxt (always). Once started the
timer continues its count up till tp->t_rxtshift == 12 and
the connection that never transmitted gets falsely killed.

The bug is to blindly rely on the return value of ip_output().
If ip_output() returns ENOBUFS then the retransmit timer is
activated:

From the end of tcp_output():
out:
SOCKBUF_UNLOCK_ASSERT(&so->so_snd);    /* Check gotos. */
if (error == ENOBUFS) {
        if (!callout_active(tp->tt_rexmt) &&
            !callout_active(tp->tt_persist))
                     callout_reset(tp->tt_rexmt, tp->t_rxtcur,
                         tcp_timer_rexmt, tp);
                     tp->snd_cwnd = tp->t_maxseg;
                     return (0);
}

My simple minded fix would be not to start the retransmit timer;
if tcp_output() wanted to time this transmit it would have started
the timer up above.

This ETIMEDOUT problem is easily recreated on any old machine
using a single slow ethernet device and the ttcp test utility.
First, fire up a couple ttcp receivers. Second, flood the same
interface with enough ttcp transmitters to cause the driver's transmit
ring and interface queue to back up. Eventually, one of the ttcp
receives will get ENOBUFS from ip_output() and the retransmit
timer will be wrongly activated for a pure ACK segment.

I was able to do it w/ the following on freeBSD6.1:

box1:
ttcp -s -l 16384 -p 9444 -v -b 128000 -r
ttcp -s -l 16384 -p 9445 -v -b 128000 -r
ttcp -s -n 6553600 -l 4096 -p 9446 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9447 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9448 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9449 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9450 -v -b 128000 -t 192.168.222.13

box2:
ttcp -s -n 6553600 -l 8192 -p 9444 -v -b 128000 -t  192.168.222.222
ttcp -s -n 9999999 -l 128  -p 9445 -v -b 128000  -t  192.168.222.222
ttcp -s -l 16384 -p 9446 -v -b 128000 -r
ttcp -s -l 16384 -p 9447 -v -b 128000 -r
ttcp -s -l 16384 -p 9448 -v -b 128000 -r
ttcp -s -l 16384 -p 9449 -v -b 128000 -r
ttcp -s -l 16384 -p 9450 -v -b 128000 -r

Fix: 

Do not start the retransmit timer based on error codes from ip_output() ?
How-To-Repeat: 
I was able to do it w/ the following on freeBSD6.1:

box1:
ttcp -s -l 16384 -p 9444 -v -b 128000 -r
ttcp -s -l 16384 -p 9445 -v -b 128000 -r
ttcp -s -n 6553600 -l 4096 -p 9446 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9447 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9448 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 333  -p 9449 -v -b 128000 -t 192.168.222.13
ttcp -s -n 9999999 -l 8192  -p 9450 -v -b 128000 -t 192.168.222.13

box2:
ttcp -s -n 6553600 -l 8192 -p 9444 -v -b 128000 -t  192.168.222.222
ttcp -s -n 9999999 -l 128  -p 9445 -v -b 128000  -t  192.168.222.222
ttcp -s -l 16384 -p 9446 -v -b 128000 -r
ttcp -s -l 16384 -p 9447 -v -b 128000 -r
ttcp -s -l 16384 -p 9448 -v -b 128000 -r
ttcp -s -l 16384 -p 9449 -v -b 128000 -r
ttcp -s -l 16384 -p 9450 -v -b 128000 -r
Comment 1 dave baukus 2007-02-02 15:30:14 UTC
I realized, late last night, that I was wrong on a few
details concerning this bug:

1.) The retransmit timer does not keep popping on without
being restarted.

2.) ip_output() must return ENOBUFS (TCP_MAXRXTSHIFT + 1) times
to the same, non-transmitting TCP.

3.) Given a TCP as described below, when tcp_output() uses ENOBUFS
to blindly start the retransmit timer then tp->t_rxtshift will be
falsely incremented and never cleared.

Thus the bug manifests itself because it appears for a TCP that
never transmits nobody ever clears clears tp->t_rxtshift;
this allows tp->t_rxtshift to slowly count up to TCP_MAXRXTSHIFT;
once TCP_MAXRXTSHIFT is exceeded tcp_timer_rexmt() will
kill the poor innocent TCP.
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2007-04-24 03:50:19 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 3 Andre Oppermann freebsd_committer freebsd_triage 2007-05-13 19:38:09 UTC
Responsible Changed
From-To: freebsd-net->andre

Take over.
Comment 4 K. Macy freebsd_committer freebsd_triage 2007-11-15 23:24:05 UTC
Responsible Changed
From-To: andre->silby


silby - this sounds like a bug you fixed not too long ago - please take a look
Comment 5 Till Toenges 2013-08-18 13:10:06 UTC
I think i've been hit by this. I played around with a more current 
version of tcp_output.c (FreeBSD 9.1) and believe the described 
behaviour still exists. But without knowing too much about TCP and the 
FreeBSD kernel, I cannot create a real patch that won't break other 
cases. Is anybody still working on this since 2007?

Till
Comment 6 Dave Baukus 2015-09-15 19:42:57 UTC
Dave Baukus here, again, after 8.5 years.
This bug still exists in FreeBSD 10.1 Stable.

Below is a dtrace script that helps illustrate the bug:

----------- start dtrace script
BEGIN
{
        self->ip_output_return = 0;
        self->tcp_output_enter = 0;
}

fbt::tcp_output:entry
{
        self->tcp_output_enter = 1;
}

fbt::tcp_output:return
{
        self->tcp_output_enter = 0;
}

fbt::ip_output:return
/self->tcp_output_enter == 1/
{
        self->ip_output_return = arg1;
}

/* ENOBUFS == 55 */
fbt::tcp_timer_activate:entry
/self->ip_output_return == 55/
{
        tp = (struct tcpcb *)args[0];
        timer_type = (unsigned int)args[1];
        inp = (struct inpcb *)tp->t_inpcb;


        self->ip_output_return = 0; 

        printf(
"pid/tid %d/%d: start %d = %s, local 0x%x:%x, foreign 0x%x:%x, rtx shift %d\n",
         pid, tid, 
         timer_type, timer_type == 2 ? "RXMT" : "OTHER",
         inp->inp_inc.inc_ie.ie_dependladdr.ie46_local.ia46_addr4.s_addr,
         inp->inp_inc.inc_ie.ie_lport,
         inp->inp_inc.inc_ie.ie_dependfaddr.ie46_foreign.ia46_addr4.s_addr,
         inp->inp_inc.inc_ie.ie_fport,
         tp->t_rxtshift);
}

fbt::tcp_timer_rexmt:entry
{
        tp = (struct tcpcb *)args[0];
        inp = (struct inpcb *)tp->t_inpcb;

        printf(
"%s RTX on local 0x%x:%x, foreign 0x%x:%x, rtx shift %d, soft_err %d, snd_una 0x%x, snd_nxt 0x%x, snd_max 0x%x\n",
                tp->t_rxtshift + 1 > 12 ? "ETIMEDOUT" : "", 
                inp->inp_inc.inc_ie.ie_dependladdr.ie46_local.ia46_addr4.s_addr,
                inp->inp_inc.inc_ie.ie_lport,             
            inp->inp_inc.inc_ie.ie_dependfaddr.ie46_foreign.ia46_addr4.s_addr,
                inp->inp_inc.inc_ie.ie_fport,
                tp->t_rxtshift,
                tp->t_softerror,
                tp->snd_una, tp->snd_nxt, tp->snd_max);
}

--------- END dtrace script

Updating my original test scenario:
box1 (one of the first 2 ttcp's will receive ETIMEDOUT)

ttcp -s -l 16384 -p 9444 -v -b 128000 -r
ttcp -s -l 16384 -p 9445 -v -b 128000 -r
ttcp -s -n 6553600 -l 4096 -p 9446 -v -b 128000 -t 10.1.4.204
ttcp -s -n 9999999 -l 333  -p 9447 -v -b 128000 -t 10.1.4.204
ttcp -s -n 9999999 -l 333  -p 9448 -v -b 128000 -t 10.1.4.204
ttcp -s -n 9999999 -l 333  -p 9449 -v -b 128000 -t 10.1.4.204
ping -s 1024 -l 999999999 sm4u-20-mgmt.eng

box2:
ttcp -s -n 9999999 -l 12800  -p 9444 -v -b 128000  -t  10.1.2.48
ttcp -s -n 9999999 -l 32765 -p 9445 -v -b 128000  -t  10.1.2.48
ttcp -s -l 16384 -p 9446 -v -b 128000 -r
ttcp -s -l 16384 -p 9447 -v -b 128000 -r
ttcp -s -l 16384 -p 9448 -v -b 128000 -r
ttcp -s -l 16384 -p 9449 -v -b 128000 -r

------------ post processed dtrace output:

In this example the first ttcp on box1 using port 9444 receives 
ETIMEDOUT from tcp_timer_rexmt. Note the ever increasing rtx shift.


grep "local 0x3002010a:e424, foreign 0xcc04010a:2ba4" dtrace.out | grep -v OTHER


  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 0
  0   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 0, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  0  22786         tcp_timer_activate:entry pid/tid 12/100019: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 1
  0   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 1, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 2
  3   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 2, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  3  22786         tcp_timer_activate:entry pid/tid 12/100019: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 3
  0   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 3, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  0  22786         tcp_timer_activate:entry pid/tid 12/100019: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 4
  2   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 4, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 5
  0   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 5, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 6
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 6
  0   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 6, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 7
  0   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 7, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 8
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 8
  0   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 8, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 9
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 9
  0   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 9, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 10
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 10
  0   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 10, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 11
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 11
  4   9785            tcp_timer_rexmt:entry  RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 11, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
  4  22786         tcp_timer_activate:entry pid/tid 12/100019: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 12
  2  22786         tcp_timer_activate:entry pid/tid 12/100058: start 2 = RXMT, local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 12
  0   9785            tcp_timer_rexmt:entry ETIMEDOUT RTX on local 0x3002010a:e424, foreign 0xcc04010a:2ba4, rtx shift 12, soft_err 0, snd_una 0xb304fb46, snd_nxt 0xb304fb46, snd_max 0xb304fb46
Comment 7 Mark Linimon freebsd_committer freebsd_triage 2015-11-12 01:31:02 UTC
Reassign to the wild with permission of assignee.
Comment 8 Hiren Panchasara freebsd_committer freebsd_triage 2015-11-12 10:13:32 UTC
And this would most likely be in -head too.
Comment 9 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:48:34 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 10 Michael Tuexen freebsd_committer freebsd_triage 2019-12-29 12:12:42 UTC
This was fixed in base r300981.
Comment 11 Kubilay Kocak freebsd_committer freebsd_triage 2020-01-25 03:35:40 UTC
Thank you for the reference Michael

^Triage: Assign to committer that resolved