Bug 21791

Summary: Hang on FIN_WAIT_2
Product: Base System
Reporter: jd <jd>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description jd 2000-10-06 22:20:00 UTC
Connecting to machines with certain TCP stacks leaves the connection hung in FIN_WAIT_2, which never times out.

Fix: 

Perhaps time out FIN_WAIT_2 connections after some period?
How-To-Repeat: telnet 216.105.145.50 80 and quit.
It is a Novell-HTTP-Server/3.1R1 server.

Look at netstat: the FIN_WAIT_2 entry will stay. It gets worse, though. I wrote some code to initiate more than 2048 connections, which filled up the mbufs on the fbsd-3.5-s test machine (4.x will obviously behave differently) and denied all further connections until reboot. This could possibly be used as a local DoS attack by an ordinary user on a shell server; at the least, the occasional hung FIN_WAIT_2s are annoying to look at.
Comment 1 jd 2000-10-10 20:21:29 UTC
The FIN_WAIT_2s DO eventually time out and get cleaned up. On a 3.4-S server
it took about 4 days for the FIN_WAITs to disappear; it was quicker on 3.5-S
(about 18-24 hours?). I still find it strange. Is there anyone who could give
a detailed explanation?

-jd
Comment 2 ryan 2000-10-17 04:08:54 UTC
From what I can tell, this is not a fault with FreeBSD but a fault with the
remote server.  The remote server fails to send a final FIN to the client,
and the state of the connection remains in FIN_WAIT_2 indefinitely.  The
same test was repeated against www.freebsd.org, and that connection
proceeded to the TIME_WAIT state.

Thus, from what I can tell, faulty TCP implementations on remote machines
will cause FreeBSD to hold connections in a FIN_WAIT_2 state for an indefinite
period of time (I believe the OP said something on the order of several days).

I have written a patch to sys/netinet/tcp_timer.c that pseudo-corrects this.
Instead of testing only that the state is not TIME_WAIT, it tests that the
state is less than FIN_WAIT_2 (thus ruling out both FIN_WAIT_2 and TIME_WAIT).
I tested this with FreeBSD 4.1.1-RELEASE and the FIN_WAIT_2's dropped out of
netstat's output after 10 minutes (which the TCP code states it should do),
even with the faulty remote server's implementation.
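
To make the changed test concrete, here is a minimal user-space sketch of
the two predicates. It is not the kernel code itself. It assumes the TCPS_*
numbering from sys/netinet/tcp_fsm.h (CLOSED = 0 through TIME_WAIT = 10) and
the stock keepalive defaults of 8 probes at 75-second intervals, which would
account for the 10 minutes. The function names rearm_original, rearm_patched
and maxidle_seconds are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* State numbering assumed to match sys/netinet/tcp_fsm.h of that era. */
enum {
    TCPS_CLOSED = 0, TCPS_LISTEN = 1, TCPS_SYN_SENT = 2,
    TCPS_SYN_RECEIVED = 3, TCPS_ESTABLISHED = 4, TCPS_CLOSE_WAIT = 5,
    TCPS_FIN_WAIT_1 = 6, TCPS_CLOSING = 7, TCPS_LAST_ACK = 8,
    TCPS_FIN_WAIT_2 = 9, TCPS_TIME_WAIT = 10
};

/*
 * Original test: re-arm the 2MSL timer unless the connection is in
 * TIME_WAIT or has been idle longer than maxidle.  Returning false
 * corresponds to the branch that deletes the control block.
 */
static bool rearm_original(int state, int idle, int maxidle)
{
    assert(state >= TCPS_CLOSED && state <= TCPS_TIME_WAIT);
    return state != TCPS_TIME_WAIT && idle <= maxidle;
}

/*
 * Patched test: only states numerically below FIN_WAIT_2 can re-arm,
 * so FIN_WAIT_2 and TIME_WAIT are both dropped when the timer fires.
 */
static bool rearm_patched(int state, int idle, int maxidle)
{
    assert(state >= TCPS_CLOSED && state <= TCPS_TIME_WAIT);
    return state < TCPS_FIN_WAIT_2 && idle <= maxidle;
}

/* Idle limit in seconds: keepalive probe count times probe interval. */
static int maxidle_seconds(int keepcnt, int keepintvl_seconds)
{
    return keepcnt * keepintvl_seconds;
}
```

The two predicates differ only for a FIN_WAIT_2 connection whose idle time
is still within the limit: rearm_original(TCPS_FIN_WAIT_2, 10, 600) is true,
while rearm_patched(TCPS_FIN_WAIT_2, 10, 600) is false.  Under the assumed
defaults, maxidle_seconds(8, 75) gives 600 seconds.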

Please review this code.  It has not been tested on -CURRENT, but it has been
diffed against -CURRENT sources.  The affected line has been tested with
4.1.1-RELEASE and has at least indicated correct behavior.  If somebody out there
with greater knowledge of TCP/IP networking sees a flaw, please bring this up.
This marks my first attempt at debugging BSD networking kernel code so any
suggestions would be appreciated.  Thank you.

Ryan Younce

--- tcp_timer.c.orig    Mon Oct  2 18:28:49 2000
+++ tcp_timer.c Mon Oct 16 22:31:55 2000
@@ -201,10 +201,10 @@
        /*
         * 2 MSL timeout in shutdown went off.  If we're closed but
         * still waiting for peer to close and connection has been idle
-        * too long, or if 2MSL time is up from TIME_WAIT, delete connection
-        * control block.  Otherwise, check again in a bit.
+        * too long, or if 2MSL time is up from TIME_WAIT or FIN_WAIT_2,
+        * delete connection control block.  Otherwise, check again in a bit.
         */
-       if (tp->t_state != TCPS_TIME_WAIT &&
+       if (tp->t_state < TCPS_FIN_WAIT_2 &&
            (ticks - tp->t_rcvtime) <= tcp_maxidle)
                callout_reset(tp->tt_2msl, tcp_keepintvl,
                              tcp_timer_2msl, tp);

-- 
Ryan "Cheshire" Younce | ryan@manunkind.org | http://www.manunkind.org/~ryan
Comment 5 ryan 2000-10-17 04:19:04 UTC
From what I can tell, this is not a fault with FreeBSD but a fault with the
remote server.  The remote server fails to send a final FIN to the client,
and the state of the connection remains in FIN_WAIT_2 indefinitely.  The
same test was repeated against www.freebsd.org, and that connection
proceeded to the TIME_WAIT state.

Thus, from what I can tell, faulty TCP implementations on remote machines
will cause FreeBSD to hold connections in a FIN_WAIT_2 state for an indefinite
period of time (I believe the OP said something on the order of several days).

I have written a patch to sys/netinet/tcp_timer.c that pseudo-corrects this.
Instead of testing only that the state is not TIME_WAIT, it tests that the
state is less than FIN_WAIT_2 (thus ruling out both FIN_WAIT_2 and TIME_WAIT).
I tested this with FreeBSD 4.1.1-RELEASE and the FIN_WAIT_2's dropped out of
netstat's output after 10 minutes (which the TCP code states it should do),
even with the faulty remote server's implementation.

Please review this code.  It has not been tested on -CURRENT, but it has been
diffed against -CURRENT sources.  The affected line has been tested with
4.1.1-RELEASE and has at least indicated correct behavior.  If somebody out
there with greater knowledge of TCP/IP networking sees a flaw, please bring
this up.  This marks my first attempt at debugging BSD networking kernel code
so any suggestions would be appreciated.  Thank you.

Ryan Younce

*** tcp_timer.c.orig    Mon Oct  2 18:28:49 2000
--- tcp_timer.c Mon Oct 16 22:31:55 2000
***************
*** 201,210 ****
        /*
         * 2 MSL timeout in shutdown went off.  If we're closed but
         * still waiting for peer to close and connection has been idle
!        * too long, or if 2MSL time is up from TIME_WAIT, delete connection
!        * control block.  Otherwise, check again in a bit.
         */
!       if (tp->t_state != TCPS_TIME_WAIT &&
            (ticks - tp->t_rcvtime) <= tcp_maxidle)
                callout_reset(tp->tt_2msl, tcp_keepintvl,
                              tcp_timer_2msl, tp);
--- 201,210 ----
        /*
         * 2 MSL timeout in shutdown went off.  If we're closed but
         * still waiting for peer to close and connection has been idle
!        * too long, or if 2MSL time is up from TIME_WAIT or FIN_WAIT_2,
!        * delete connection control block.  Otherwise, check again in a bit.
         */
!       if (tp->t_state < TCPS_FIN_WAIT_2 &&
            (ticks - tp->t_rcvtime) <= tcp_maxidle)
                callout_reset(tp->tt_2msl, tcp_keepintvl,
                              tcp_timer_2msl, tp);

--
Ryan Younce         ryan@manunkind.org         http://www.manunkind.org/~ryan
"B can be thought of as C without types; more accurately, it is BCPL squeezed
 into 8K bytes of memory and filtered through Thompson's brain."
                  --Dennis Ritchie, "The Development of the C Language", 1993
Comment 6 ryan 2000-10-17 04:24:22 UTC
I don't know for certain if that patch will work.  Here's a context one
generated via diff -r -c.  Sorry about that.

Ryan

*** tcp_timer.c.orig    Mon Oct  2 18:28:49 2000
--- tcp_timer.c Mon Oct 16 22:31:55 2000
***************
*** 201,210 ****
        /*
         * 2 MSL timeout in shutdown went off.  If we're closed but
         * still waiting for peer to close and connection has been idle
!        * too long, or if 2MSL time is up from TIME_WAIT, delete connection
!        * control block.  Otherwise, check again in a bit.
         */
!       if (tp->t_state != TCPS_TIME_WAIT &&
            (ticks - tp->t_rcvtime) <= tcp_maxidle)
                callout_reset(tp->tt_2msl, tcp_keepintvl,
                              tcp_timer_2msl, tp);
--- 201,210 ----
        /*
         * 2 MSL timeout in shutdown went off.  If we're closed but
         * still waiting for peer to close and connection has been idle
!        * too long, or if 2MSL time is up from TIME_WAIT or FIN_WAIT_2,
!        * delete connection control block.  Otherwise, check again in a bit.
         */
!       if (tp->t_state < TCPS_FIN_WAIT_2 &&
            (ticks - tp->t_rcvtime) <= tcp_maxidle)
                callout_reset(tp->tt_2msl, tcp_keepintvl,
                              tcp_timer_2msl, tp);

--
Ryan Younce         ryan@manunkind.org         http://www.manunkind.org/~ryan
"B can be thought of as C without types; more accurately, it is BCPL squeezed
 into 8K bytes of memory and filtered through Thompson's brain."
                  --Dennis Ritchie, "The Development of the C Language", 1993
Comment 7 iedowse freebsd_committer freebsd_triage 2001-11-17 18:05:46 UTC
State Changed
From-To: open->feedback


Does this problem still exist in more recent releases?
Comment 8 iedowse freebsd_committer freebsd_triage 2002-06-02 11:43:21 UTC
State Changed
From-To: feedback->closed


Feedback timeout.