| Summary: | Hang on FIN_WAIT_2 | ||
|---|---|---|---|
| Product: | Base System | Reporter: | jd <jd> |
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | Unspecified | ||
| Hardware: | Any | ||
| OS: | Any | ||
|
Description
jd
2000-10-06 22:20:00 UTC
The FIN_WAIT_2 connections DO eventually time out and get cleaned up. On a 3.4-S server it took about 4 days for the FIN_WAITs to disappear; it was quicker on 3.5-S (about 18-24 hours?). I still find it strange. Is there anyone who could give a detailed explanation? -jd

From what I can tell, this is not a fault in FreeBSD but in the remote
server. The remote server never sends its final FIN to the client, so the
connection stays in FIN_WAIT_2 indefinitely. The same test against
www.freebsd.org proceeded to the TIME_WAIT state as expected.

Thus, from what I can tell, faulty TCP implementations on remote machines
cause FreeBSD to hold connections in FIN_WAIT_2 for an indefinite period of
time (I believe the OP said something on the order of several days).

I have written a patch that pseudo-corrects this in the 2MSL timeout check
in sys/netinet/tcp_timer.c. Instead of testing only whether the state is not
TIME_WAIT, it tests whether the state is less than FIN_WAIT_2, which rules
out both FIN_WAIT_2 and TIME_WAIT. I tested this with FreeBSD 4.1.1-RELEASE,
and the FIN_WAIT_2 entries dropped out of netstat's output after 10 minutes
(which the TCP code states they should), even with the faulty remote server.

Please review this code. It has not been tested on -current, but it has been
diffed against current sources. The affected line has been tested on
4.1.1-RELEASE and has at least shown correct behavior. If somebody with
greater knowledge of TCP/IP networking sees a flaw, please bring it up.
This is my first attempt at debugging BSD networking kernel code, so any
suggestions would be appreciated. Thank you.

Ryan Younce
--- tcp_timer.c.orig Mon Oct 2 18:28:49 2000
+++ tcp_timer.c Mon Oct 16 22:31:55 2000
@@ -201,10 +201,10 @@
         /*
          * 2 MSL timeout in shutdown went off. If we're closed but
          * still waiting for peer to close and connection has been idle
-         * too long, or if 2MSL time is up from TIME_WAIT, delete connection
-         * control block. Otherwise, check again in a bit.
+         * too long, or if 2MSL time is up from TIME_WAIT or FIN_WAIT_2,
+         * delete connection control block. Otherwise, check again in a bit.
          */
-        if (tp->t_state != TCPS_TIME_WAIT &&
+        if (tp->t_state < TCPS_FIN_WAIT_2 &&
             (ticks - tp->t_rcvtime) <= tcp_maxidle)
                 callout_reset(tp->tt_2msl, tcp_keepintvl,
                               tcp_timer_2msl, tp);
--
Ryan "Cheshire" Younce | ryan@manunkind.org | http://www.manunkind.org/~ryan
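To make the effect of the one-line change concrete, here is a small userland sketch (not kernel code) that compares the original and patched tests for every TCP state. The state numbering is written out from memory of sys/netinet/tcp_fsm.h and should be treated as an assumption; the patch only works as described if FIN_WAIT_2 and TIME_WAIT are the two highest-numbered states. The function names rearm_old, rearm_new and count_differences are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * TCP states in the order assumed from sys/netinet/tcp_fsm.h.
 * Assumption: FIN_WAIT_2 and TIME_WAIT are the two highest values.
 */
enum {
	TCPS_CLOSED, TCPS_LISTEN, TCPS_SYN_SENT, TCPS_SYN_RECEIVED,
	TCPS_ESTABLISHED, TCPS_CLOSE_WAIT, TCPS_FIN_WAIT_1, TCPS_CLOSING,
	TCPS_LAST_ACK, TCPS_FIN_WAIT_2, TCPS_TIME_WAIT
};

/* Original test: re-arm the timer unless in TIME_WAIT or idle too long. */
bool rearm_old(int state, int idle, int maxidle)
{
	return state != TCPS_TIME_WAIT && idle <= maxidle;
}

/* Patched test: states at or above FIN_WAIT_2 are never re-armed. */
bool rearm_new(int state, int idle, int maxidle)
{
	return state < TCPS_FIN_WAIT_2 && idle <= maxidle;
}

/* Count the states for which the two tests disagree at a given idle time. */
int count_differences(int idle, int maxidle)
{
	int n = 0;

	for (int s = TCPS_CLOSED; s <= TCPS_TIME_WAIT; s++)
		if (rearm_old(s, idle, maxidle) != rearm_new(s, idle, maxidle))
			n++;
	return n;
}
```

Under these assumptions the two tests disagree only for FIN_WAIT_2, and only while the connection has not yet been idle longer than the limit: the original test re-arms the timer, the patched test lets the control block be deleted.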
I don't know for certain if that patch will work. Here's a context one
generated via diff -r -c. Sorry about that.
Ryan
*** tcp_timer.c.orig Mon Oct 2 18:28:49 2000
--- tcp_timer.c Mon Oct 16 22:31:55 2000
***************
*** 201,210 ****
          /*
           * 2 MSL timeout in shutdown went off. If we're closed but
           * still waiting for peer to close and connection has been idle
!          * too long, or if 2MSL time is up from TIME_WAIT, delete connection
!          * control block. Otherwise, check again in a bit.
           */
!         if (tp->t_state != TCPS_TIME_WAIT &&
              (ticks - tp->t_rcvtime) <= tcp_maxidle)
                  callout_reset(tp->tt_2msl, tcp_keepintvl,
                                tcp_timer_2msl, tp);
--- 201,210 ----
          /*
           * 2 MSL timeout in shutdown went off. If we're closed but
           * still waiting for peer to close and connection has been idle
!          * too long, or if 2MSL time is up from TIME_WAIT or FIN_WAIT_2,
!          * delete connection control block. Otherwise, check again in a bit.
           */
!         if (tp->t_state < TCPS_FIN_WAIT_2 &&
              (ticks - tp->t_rcvtime) <= tcp_maxidle)
                  callout_reset(tp->tt_2msl, tcp_keepintvl,
                                tcp_timer_2msl, tp);
--
Ryan Younce ryan@manunkind.org http://www.manunkind.org/~ryan
"B can be thought of as C without types; more accurately, it is BCPL squeezed
into 8K bytes of memory and filtered through Thompson's brain."
--Dennis Ritchie, "The Development of the C Language", 1993
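As a rough illustration of where the two tests could behave differently over time, the following toy model (userland C, not kernel code) walks the timer forward for a connection stuck in FIN_WAIT_2. Everything in it is an assumption made for the sketch: the 75-second re-check interval, the 600-second idle limit (taken as 8 x 75 s to match the "10 minutes" mentioned above), the idea that the timer first fires one idle limit after entering FIN_WAIT_2, and the hypothetical peer that keeps sending segments but never a FIN. It does not claim to reproduce the multi-day delays reported in this PR.

```c
#include <assert.h>
#include <stdbool.h>

#define KEEPINTVL 75              /* seconds between re-checks (assumed) */
#define MAXIDLE   (8 * KEEPINTVL) /* 600 s idle limit (assumed) */

/*
 * Toy model of one connection sitting in FIN_WAIT_2.
 *
 * patched:       use the "state < FIN_WAIT_2" test instead of the original.
 * peer_interval: the peer sends a non-FIN segment every peer_interval
 *                seconds; 0 means the peer is completely silent.
 * limit:         give up after this many seconds.
 *
 * Returns the second at which the connection is dropped, or -1 if it is
 * still present when the limit is reached.
 */
long seconds_until_drop(bool patched, long peer_interval, long limit)
{
	for (long now = MAXIDLE; now <= limit; now += KEEPINTVL) {
		/* Time at which the last segment arrived from the peer. */
		long last_rcv = peer_interval > 0 ? now - now % peer_interval : 0;
		long idle = now - last_rcv;

		/*
		 * In FIN_WAIT_2 the patched test never re-arms the timer.
		 * The original test re-arms while idle <= MAXIDLE.
		 */
		bool rearm = patched ? false : idle <= MAXIDLE;

		if (!rearm)
			return now;
	}
	return -1;
}
```

In this model a silent peer is cleaned up by both variants (one re-check later with the original test), while a peer that keeps sending segments holds the connection open under the original test for as long as the simulation runs, and is dropped at the first timer expiry with the patched test.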
State Changed From-To: open->feedback
Does this problem still exist in more recent releases?
State Changed From-To: feedback->closed
Feedback timeout. |