Bug 192599 - zero window and persist timer not set
Summary: zero window and persist timer not set
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 9.2-STABLE
Hardware: Any Any
: --- Affects Only Me
Assignee: Hiren Panchasara
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2014-08-11 21:52 UTC by Jeremiah
Modified: 2016-08-02 18:01 UTC (History)
7 users (show)

See Also:


Attachments
Proposed patch (622 bytes, patch)
2014-08-11 21:52 UTC, Jeremiah
no flags Details | Diff
recv.py (456 bytes, text/x-python)
2014-08-12 16:04 UTC, Jeremiah
no flags Details
send.py (639 bytes, text/x-python)
2014-08-12 16:04 UTC, Jeremiah
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremiah 2014-08-11 21:52:51 UTC
Created attachment 145689 [details]
Proposed patch

We've been seeing a problem where a tcp connection is stuck in a zero window condition and even though the client has opened more window space, our FreeBSD box never sends any more.  After some analysis it appears that the FreeBSD box is not sending zero window probes, because the persist timer did not get set (we can see in kgdb that the tcpcb shows 0 window, there is data in the socket buffer, but the persist timer is not active).

After looking over the code for a while, I think I see the problem.  When tcp_output chooses to send a packet, it never arms the persist timer.  This causes a problem in the following scenario:

1.   A --> B: packet containing enough data to fill the window
2.   B --> A: ACK for #1 + new data (0 window advertisement)
3.   A --> B: ACK for #2, 0 len packet

In this case, A will not activate the persist timer, because it chose to send a packet.  Unless tcp_output is called for some other reason (delayed ack timer, another input packet from B, socket syscall), A will not send zero window probes.  I was finally able to recreate this condition by setting an very small window and running programs that send very specific sequences of packets without calling recv (purposefully forcing a zero window condition).  Here is a packet capture that shows the sequence:

A == 10.2.15.69 == FreeBSD 9.2
B == 10.2.14.61 == FreeBSD 8.2

16:19:49.664790 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [S], seq 2362665163, win 4300, options [mss 1460,nop,wscale 6,sackOK,TS val 88804503 ecr 0], length 0
16:19:49.664821 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [S.], seq 3306387947, ack 2362665164, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 1605043666 ecr 88804503], length 0
16:19:49.664859 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], ack 1, win 67, options [nop,nop,TS val 88804503 ecr 1605043666], length 0
16:19:49.664921 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 1:101, ack 1, win 67, options [nop,nop,TS val 88804503 ecr 1605043666], length 100
16:19:49.665137 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [P.], seq 1:3001, ack 101, win 2046, options [nop,nop,TS val 1605043666 ecr 88804503], length 3000
16:19:49.665208 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 101:1321, ack 1449, win 45, options [nop,nop,TS val 88804503 ecr 1605043666], length 1220
16:19:49.666195 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], seq 1321:2769, ack 3001, win 21, options [nop,nop,TS val 88804504 ecr 1605043666], length 1448
16:19:49.666205 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], ack 2769, win 2004, options [nop,nop,TS val 1605043667 ecr 88804503], length 0
16:19:49.666207 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 2769:2771, ack 3001, win 21, options [nop,nop,TS val 88804504 ecr 1605043666], length 2
16:19:49.667183 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [.], seq 2771:4219, ack 3001, win 21, options [nop,nop,TS val 88804505 ecr 1605043667], length 1448
16:19:49.667190 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], seq 3001:4345, ack 4219, win 1982, options [nop,nop,TS val 1605043668 ecr 88804504], length 1344
16:19:49.667193 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 4219:4221, ack 3001, win 21, options [nop,nop,TS val 88804505 ecr 1605043667], length 2
16:19:49.766487 IP 10.2.14.61.23133 > 10.2.15.69.12345: Flags [P.], seq 4221:4321, ack 4345, win 0, options [nop,nop,TS val 88804605 ecr 1605043668], length 100
16:19:49.766499 IP 10.2.15.69.12345 > 10.2.14.61.23133: Flags [.], ack 4321, win 1980, options [nop,nop,TS val 1605043768 ecr 88804505], length 0

The important packets are the last four:

1. A --> B: length 1344, fills the remaining window
2. B --> A: length 2, does not ack additional data, delayed ack timer is set
3. B --> A: length 100, acks #1, immediate ack (delayed ack timer cancelled, tcp_output called with ACKNOW)
4. A --> B: length 0, acks #1 and #2, because a packet is sent tcp_output does not activate the persist timer.

I would normally expect A to begin sending zero-window probes, but (since it didn't activate the persist timer) it does not.  Using kgdb, I can see that the persist timer is not set, only the keep timer is set.  This is kgdb on "A":

(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_nxt
$5 = 3306392292
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_max
$6 = 3306392292
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_una
$7 = 3306392292
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_wnd
$8 = 0
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->snd_cwnd
$9 = 4380
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_rexmt->c_flags
$11 = 16
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_persist->c_flags
$12 = 16
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_keep->c_flags
$13 = 22
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_2msl->c_flags
$14 = 16
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_timers->tt_delack->c_flags
$15 = 16
(kgdb) print ((struct tcpcb*)(0xfffffe02ae289b70))->t_inpcb->inp_socket.so_snd.sb_cc
$16 = 1656

There is zero window, data in the socket buffer, and the persist timer is not set.

My proposed fix is attached.  If you send a 0-length packet, but there is data is the socket buffer, and neither the rexmt or persist timer is already set, then activate the persist timer.
Comment 1 John Baldwin freebsd_committer freebsd_triage 2014-08-12 14:54:51 UTC
I think this is probably correct, but adding a few more folks from the e-mail thread to chime in here as well.  I believe Peter has a test case for this as well as Adrian.
Comment 2 Jeremiah 2014-08-12 16:04:14 UTC
Created attachment 145717 [details]
recv.py

I was asked to share my test programs, so I will.  However, I'm not sure that it will help reproduce it in another environment.  It was very dependent on network timing as well as the features of the network drivers on both side (do they support tso/lro, etc.).  I tweaked it endlessly until I got something that reproduced it somewhat consistently in our environment.

The names of the files are probably a bit of a misnomer, too (backwards from what you would expect).  Run recv.py on the box you want to reproduce the problem on and send.py on another freebsd box.  It does not work running on same box, since MTU gets set to 16K in that case.  You can only tell if the problem is reproduced by running a packet trace during the run and seeing if there are zero window probes sent or not.  You'll have to SIGINT both programs after checking the packet trace, they just wait in the zero window condition forever whether the probes are sent or not.
Comment 3 Jeremiah 2014-08-12 16:04:36 UTC
Created attachment 145718 [details]
send.py
Comment 4 Hiren Panchasara freebsd_committer freebsd_triage 2015-05-09 21:29:55 UTC
I think this should be committed.
Comment 5 Hiren Panchasara freebsd_committer freebsd_triage 2015-05-12 06:43:38 UTC
--- a/sys/netinet/tcp_output.c
+++ b/sys/netinet/tcp_output.c
@@ -1394,6 +1394,11 @@ timer:
                                tp->t_rxtshift = 0;
                        }
                        tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
+               } else if (len == 0 && sbavail(&so->so_snd) &&
+                   !tcp_timer_active(tp, TT_REXMT) &&
+                   !tcp_timer_active(tp, TT_PERSIST)) {
+                       tp->t_rxtshift = 0;
+                       tcp_setpersist(tp);
                }
        } else {
                /*

Updated patch.

Glebius/Julian, Can you guys take a look?
Comment 6 Julien Charbon freebsd_committer freebsd_triage 2015-05-15 12:29:50 UTC
Hi Hiren, I can take a look to this issue fix, I will reply back before end of next week.

Note: I thought you tried to reach Julian (with a 'a', julian /at/ FreeBSD) and not me Julien (with a 'e', jch /at/ FreeBSD)
Comment 7 Hiren Panchasara freebsd_committer freebsd_triage 2015-05-18 18:38:58 UTC
(In reply to Julien Charbon from comment #6)
Apologies for the name confusion. And thanks for looking into it. :-)
Comment 8 Julien Charbon freebsd_committer freebsd_triage 2015-05-27 12:04:01 UTC
As far I can see the patch looks good. I am going to reproduce the issue and also check that there are no better place for this test.
Comment 9 John Baldwin freebsd_committer freebsd_triage 2015-05-27 13:00:16 UTC
My only suggestion to the patch would be to add a comment to note why this is being done.
Comment 10 Hiren Panchasara freebsd_committer freebsd_triage 2015-06-17 15:30:38 UTC
Julien,

Did you get a chance to look into this?
Comment 11 Julien Charbon freebsd_committer freebsd_triage 2015-06-23 09:20:23 UTC
Hi Hiren, I am still testing it/reviewing it.  If I am starting to be confident in the TCP stack implementation on server side, I am still learning its client side.  I believe I will be confident enough by the end of this week.
Comment 12 Julien Charbon freebsd_committer freebsd_triage 2015-06-29 12:31:43 UTC
Hi guys, I  reviewed of this patch. Looks good to me. I would just suggest to:

1. Add a comment to explain the rational behind this patch (as jhb suggested)
2. (Not mandatory) Maybe create a review on Phabricator (and I will directly accept it)

Thanks for this detailed bug report.
Comment 13 Hiren Panchasara freebsd_committer freebsd_triage 2015-06-29 17:13:53 UTC
Thanks Julien.

I've tried to copy/paste what Jeremiah already had in his analysis as a comment around the code.

Opened a phabric review: https://reviews.freebsd.org/D2946

Not sure if Jeremiah has an account there so here is the diff. Let me know if anything needs updating.

I feel the code is complex enough so detailed comments like this helps newcomers digging into the code. But if someone thinks this is more verbose than necessary, let me know.

Index: sys/netinet/tcp_output.c
===================================================================
--- sys/netinet/tcp_output.c
+++ sys/netinet/tcp_output.c
@@ -1394,6 +1394,30 @@
 				tp->t_rxtshift = 0;
 			}
 			tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);
+		} else if (len == 0 && sbavail(&so->so_snd) &&
+		    !tcp_timer_active(tp, TT_REXMT) &&
+		    !tcp_timer_active(tp, TT_PERSIST)) {
+			/*
+			 * Avoid a situation where we do not set persist timer
+			 * after a zero window condition. For example:
+			 * 1) A -> B: packet with enough data to fill the window
+			 * 2) B -> A: ACK for #1 + new data (0 window
+			 *    advertisement)
+			 * 3) A -> B: ACK for #2, 0 len packet
+			 *
+			 * In this case, A will not activate the persist timer,
+			 * because it chose to send a packet. Unless tcp_output
+			 * is called for some other reason (delayed ack timer,
+			 * another input packet from B, socket syscall), A will
+			 * not send zero window probes.
+			 *
+			 * So, if you send a 0-length packet, but there is data
+			 * in the socket buffer, and neither the rexmt or
+			 * persist timer is already set, then activate the
+			 * persist timer.
+			 */
+			tp->t_rxtshift = 0;
+			tcp_setpersist(tp);
 		}
 	} else {
 		/*
Comment 14 commit-hook freebsd_committer freebsd_triage 2015-06-29 21:24:47 UTC
A commit references this bug:

Author: hiren
Date: Mon Jun 29 21:23:55 UTC 2015
New revision: 284941
URL: https://svnweb.freebsd.org/changeset/base/284941

Log:
  Avoid a situation where we do not set persist timer after a zero window
  condition.
  If you send a 0-length packet, but there is data is the socket buffer, and
  neither the rexmt or persist timer is already set, then activate the persist
  timer.

  PR:			192599
  Differential Revision:	D2946
  Submitted by:		jlott at averesystems dot com
  Reviewed by:		jhb, jch, gnn, hiren
  Tested by:		jlott at averesystems dot com, jch
  MFC after:		2 weeks

Changes:
  head/sys/netinet/tcp_output.c
Comment 15 commit-hook freebsd_committer freebsd_triage 2015-07-22 15:06:03 UTC
A commit references this bug:

Author: hiren
Date: Wed Jul 22 15:05:46 UTC 2015
New revision: 285793
URL: https://svnweb.freebsd.org/changeset/base/285793

Log:
  MFC r284941:
  Avoid a situation where we do not set persist timer after a zero window
  condition.
  If you send a 0-length packet, but there is data is the socket buffer, and
  neither the rexmt or persist timer is already set, then activate the persist
  timer.

  PR:		192599
  Approved by:	re (delphij)

Changes:
_U  stable/10/
  stable/10/sys/netinet/tcp_output.c