Bug 71184

Summary: tcp-sessions hangs on FIN_WAIT_2 state
Product: Base System
Reporter: Pavel Gulchouck <gul>
Component: kern
Assignee: Andre Oppermann <andre>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 5.3-BETA2   
Hardware: Any   
OS: Any   

Description Pavel Gulchouck 2004-08-31 09:50:23 UTC
netstat shows TCP sessions that stay in the FIN_WAIT_2 state for several days.
Their number grows with uptime and they never disappear by timeout. Packets
are sent on these sessions with no response (as shown by tcpdump). Here's an
example:
root@cheetah;~>netstat -na | grep FIN_WAIT
tcp4       0      0  193.109.240.9.65053    193.109.240.4.3128     FIN_WAIT_2
tcp4       0      0  193.109.240.9.53877    193.109.241.14.80      FIN_WAIT_2
tcp4       0      0  193.109.240.51.59997   218.36.147.216.113     FIN_WAIT_2
tcp4       0      0  193.109.240.51.53426   219.78.239.150.113     FIN_WAIT_2
tcp4       0      0  193.109.240.51.52094   221.217.51.76.113      FIN_WAIT_2
tcp4       0      0  193.109.240.51.64677   201.129.34.53.113      FIN_WAIT_2
tcp4       0      0  193.109.240.51.59761   68.160.224.228.113     FIN_WAIT_2
tcp4       0      0  193.109.240.51.61273   67.17.7.10.113         FIN_WAIT_2
tcp4       0      0  193.109.240.51.53997   201.130.132.40.113     FIN_WAIT_2
tcp4       0      0  193.109.240.51.51541   192.117.121.119.113    FIN_WAIT_2
tcp4       0      0  193.109.240.51.50182   218.239.123.122.113    FIN_WAIT_2
tcp4       0      0  193.109.240.51.50852   218.80.94.186.113      FIN_WAIT_2
tcp4       0      0  193.109.240.51.25      61.32.51.122.4037      FIN_WAIT_2
root@cheetah;~>tcpdump -i fxp0 host 61.32.51.122
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on fxp0, link-type EN10MB (Ethernet), capture size 96 bytes
10:28:55.361886 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 3922185006 win 0
10:28:55.362072 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 65535 
10:33:40.361292 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 0
10:33:40.361839 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 65535
10:38:25.403144 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 0
10:38:25.403322 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 65535
10:43:10.414956 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 0
10:43:10.415099 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 65535
10:47:55.436833 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 0
10:47:55.437094 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 65535
10:52:40.458684 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 0
10:52:40.458793 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 65535
10:57:25.460519 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 0
10:57:25.460806 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 65535
11:02:10.472346 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 0
11:02:10.472635 IP relay2.itpark.com.ua.smtp > 61.32.51.122.4037: . ack 1 win 65535

I can give additional information (sysctl -a net.inet.tcp, dmesg) if needed.
The same problem exists on 5.2.1-RELEASE-p9.
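
The spacing of the probes in the tcpdump output above can be measured with a small filter. This is only a sketch: the function name probe_intervals is made up for illustration, it assumes the tcpdump line format shown above (timestamp in the first field, zero-window ACKs ending in "win 0"), and it does not handle a capture that crosses midnight.

```shell
# Print the number of seconds between consecutive "win 0" ACKs.
# Reads tcpdump text output on stdin; probe_intervals is a hypothetical helper.
probe_intervals() {
    awk '/ win 0 *$/ {
        # The first field looks like HH:MM:SS.microseconds
        split($1, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (seen) printf "%.0f\n", now - prev
        prev = now
        seen = 1
    }'
}
```

Saving the capture to a file and running `probe_intervals < capture.txt` prints one interval per line, which shows whether the probes are sent on a fixed schedule.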
Comment 1 Andre Oppermann freebsd_committer freebsd_triage 2004-09-14 00:05:47 UTC
Responsible Changed
From-To: freebsd-bugs->andre

Take over.
Comment 2 Andre Oppermann freebsd_committer freebsd_triage 2004-09-16 21:21:05 UTC
Pavel,

could you please send me the output of:

 # sysctl net.inet.tcp
 # netstat -s -p ip
 # netstat -s -p tcp
 # netstat -an | grep FIN_WAIT_2 | wc -l

Do you have any kind or type of firewall active on the machine or
on your Internet connection where all these connections pass
through?

I have tried to connect from a machine here to 193.109.240.51:25
to reproduce the problem but the fancy ACKs never made it to me.
My connection was coming from 62.48.0.xxx.  Maybe you have it in
your netstat -an output?

This is a really tough problem and I don't have any clear idea what
is going wrong.  The output of the said programs should help to
narrow down the problem some more.

-- 
Andre
Comment 3 Pavel Gulchouck 2004-09-20 22:28:41 UTC
Hello.

I've made some tests...
The situation occurs only if I'm using stateful firewall like this:

add allow all from any to any via lo0
add check-state
# allow outgoing connections
add allow tcp  from me to any setup keep-state
add allow udp  from me to any keep-state
add allow icmp from me to any
add allow icmp from any to me
# open some services
add allow tcp from any to me 22 setup keep-state                # ssh
add allow tcp from any to me 25 setup keep-state                # smtp
add allow tcp from any to me 80 setup keep-state                # http
add allow tcp from any to me 110 setup keep-state               # pop3
add allow tcp from any to me 113 setup keep-state               # auth
# disable the rest
add unreach port tcp from any to me setup
add unreach port udp from any to me
add deny tcp from any to me


I think the reason is understandable: the temporary (dynamic) rule created
by the SYN packet can expire before the end of the session, or in
one of the FIN states.

But AFAIU the same situation occurs if the remote host goes down
during the session and no ICMP "no route to host" is received. Is it
correct that in this case the session hangs indefinitely in the FIN_WAIT_2
state, when the application has already closed the corresponding sockets?
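
To check how long a dynamic rule lives in each phase of a connection, the ipfw lifetime settings can be inspected. A minimal sketch follows. The sysctl names are the ones documented in ipfw(8); that all of them exist on this particular release is an assumption, and dyn_lifetime_oids is a hypothetical helper that only prints the names.

```shell
# Print the sysctl names that control ipfw dynamic rule lifetimes.
# Names are taken from ipfw(8); their presence on a given release is assumed.
dyn_lifetime_oids() {
    for name in dyn_syn_lifetime dyn_ack_lifetime dyn_fin_lifetime \
                dyn_rst_lifetime dyn_keepalive; do
        printf 'net.inet.ip.fw.%s\n' "$name"
    done
}
```

On the affected machine, `sysctl $(dyn_lifetime_oids)` would show the current values, and `ipfw -d show` lists the dynamic rules next to the static ones, so a hung session can be compared against the state the firewall still holds for it.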

On Thu, Sep 16, 2004 at 10:21:05PM +0200, Andre Oppermann writes:
AO> Pavel,

AO> could you please send me the output of:

AO>  # sysctl net.inet.tcp
AO>  # netstat -s -p ip
AO>  # netstat -s -p tcp
AO>  # netstat -an | grep FIN_WAIT_2 | wc -l

root@racoon:~>sysctl net.inet.tcp
net.inet.tcp.rfc1323: 1
net.inet.tcp.rfc1644: 0
net.inet.tcp.mssdflt: 512
net.inet.tcp.keepidle: 7200000
net.inet.tcp.keepintvl: 75000
net.inet.tcp.sendspace: 32768
net.inet.tcp.recvspace: 65536
net.inet.tcp.keepinit: 75000
net.inet.tcp.delacktime: 100
net.inet.tcp.hostcache.cachelimit: 15360
net.inet.tcp.hostcache.hashsize: 512
net.inet.tcp.hostcache.bucketlimit: 30
net.inet.tcp.hostcache.count: 338
net.inet.tcp.hostcache.expire: 3600
net.inet.tcp.hostcache.purge: 0
net.inet.tcp.log_in_vain: 0
net.inet.tcp.blackhole: 0
net.inet.tcp.delayed_ack: 1
net.inet.tcp.rfc3042: 0
net.inet.tcp.rfc3390: 0
net.inet.tcp.reass.maxsegments: 556
net.inet.tcp.reass.cursegments: 3
net.inet.tcp.reass.overflows: 0
net.inet.tcp.path_mtu_discovery: 1
net.inet.tcp.slowstart_flightsize: 1
net.inet.tcp.local_slowstart_flightsize: 4
net.inet.tcp.newreno: 1
net.inet.tcp.minmss: 216
net.inet.tcp.minmssoverload: 0
net.inet.tcp.tcbhashsize: 512
net.inet.tcp.do_tcpdrain: 1
net.inet.tcp.pcbcount: 140
net.inet.tcp.icmp_may_rst: 1
net.inet.tcp.isn_reseed_interval: 0
net.inet.tcp.inflight_enable: 0
net.inet.tcp.inflight_debug: 0
net.inet.tcp.inflight_min: 6144
net.inet.tcp.inflight_max: 1073725440
net.inet.tcp.inflight_stab: 20
net.inet.tcp.syncookies: 1
net.inet.tcp.syncache.bucketlimit: 30
net.inet.tcp.syncache.cachelimit: 15359
net.inet.tcp.syncache.count: 0
net.inet.tcp.syncache.hashsize: 512
net.inet.tcp.syncache.rexmtlimit: 3
net.inet.tcp.msl: 30000
net.inet.tcp.rexmit_min: 30
net.inet.tcp.rexmit_slop: 200
net.inet.tcp.always_keepalive: 1
root@racoon:~>netstat -s -p ip
ip:
        6292019 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        1 with data size < data length
        0 with ip length > max ip packet size
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        17 fragments received
        0 fragments dropped (dup or out of space)
        17 fragments dropped after timeout
        0 packets reassembled ok
        4638133 packets for this host
        4879 packets for unknown/unsupported protocol
        1608300 packets forwarded (0 packets fast forwarded)
        5165 packets not forwardable
        1216 packets received for unknown multicast group
        0 redirects sent
        4660995 packets sent from this host
        8780 packets sent with fabricated ip header
        1894 output packets dropped due to no bufs, etc.
        4 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 datagrams that can't be fragmented
        0 tunneling packets that can't find gif
        0 datagrams with bad address in header
root@racoon:~>netstat -s -p tcp
tcp:
        3993222 packets sent
                2213904 data packets (1732336285 bytes)
                46903 data packets (36441214 bytes) retransmitted
                1799 data packets unnecessarily retransmitted
                7 resends initiated by MTU discovery
                1580493 ack-only packets (336226 delayed)
                0 URG only packets
                1348 window probe packets
                46229 window update packets
                104346 control packets
        4287719 packets received
                1590542 acks (for 1728080325 bytes)
                70188 duplicate acks
                2 acks for unsent data
                2912871 packets (798425464 bytes) received in-sequence
                16175 completely duplicate packets (3799804 bytes)
                569 old duplicate packets
                292 packets with some dup. data (121472 bytes duped)
                37672 out-of-order packets (39737925 bytes)
                1058 packets (1496558 bytes) of data after window
                0 window probes
                37742 window update packets
                9089 packets received after close
                588 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
        35667 connection requests
        50023 connection accepts
        3057 bad connection attempts
        0 listen queue overflows
        75467 connections established (including accepts)
        86260 connections closed (including 6644 drops)
                44786 connections updated cached RTT on close
                45816 connections updated cached RTT variance on close
                27606 connections updated cached ssthresh on close
        7135 embryonic connections dropped
        1409564 segments updated rtt (of 1208743 attempts)
        52420 retransmit timeouts
                2156 connections dropped by rexmit timeout
        2266 persist timeouts
                4 connections dropped by persist timeout
        680 keepalive timeouts
                0 keepalive probes sent
                679 connections dropped by keepalive
        248821 correct ACK header predictions
        2391596 correct data packet header predictions
        50481 syncache entries added
                1790 retransmitted
                1453 dupsyn
                3 dropped
                50023 completed
                0 bucket overflow
                0 cache overflow
                310 reset
                130 stale
                0 aborted
                0 badack
                18 unreach
                0 zone failures
        0 cookies sent
        0 cookies received
root@racoon:~>netstat -an | grep FIN_WAIT_2 | wc -l
      72
root@racoon:~>uptime
11:05PM  up 16:10, 9 users, load averages: 1.50, 1.74, 1.71

AO> Do you have any kind or type of firewall active on the machine or
AO> on your Internet connection where all these connections pass
AO> through?

See above.

AO> I have tried to connect from a machine here to 193.109.240.51:25
AO> to reproduce the problem but the fancy ACKs never made it to me.
AO> My connection was coming from 62.48.0.xxx.  Maybe you have it in
AO> your netstat -an output?

No.
But the server receives many connections on ports 25 and 80 (it is a
mail relay and web host), and only several sessions per hour hang.

AO> This is a really tough problem and I don't have any clear idea what
AO> is going wrong.  The output of the said programs should help to
AO> narrow down the problem some more.

After changing
add allow tcp  from me to any setup keep-state
to
add allow tcp  from me to any keep-state
I see no sessions in the FIN_WAIT_2 state, but I'm still not sure
that no timeout for this state is correct behavior.
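
To see which kind of traffic the stuck sessions belong to, the netstat listing can be grouped by remote port. This is a sketch under assumptions: fin_wait2_by_remote_port is a made-up name, and the filter expects the `netstat -an` column layout shown earlier in this report (foreign address in the fifth column, port after the last dot, state in the last column).

```shell
# Count FIN_WAIT_2 sockets per remote port, most frequent first.
# Reads `netstat -an` output on stdin; fin_wait2_by_remote_port is hypothetical.
fin_wait2_by_remote_port() {
    awk '$NF == "FIN_WAIT_2" {
        # The port is the last dot-separated component of the foreign address
        n = split($5, a, ".")
        count[a[n]]++
    }
    END { for (p in count) print count[p], p }' | sort -k1,1nr -k2,2n
}
```

Usage: `netstat -an | fin_wait2_by_remote_port`. Each output line is a count followed by a remote port, so a listing dominated by well-known server ports would point at outgoing connections.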

-- 
                                Lucky carrier,
                                                  Pavel.
Comment 4 Andre Oppermann freebsd_committer freebsd_triage 2004-09-20 22:53:19 UTC
Pavel Gulchouck wrote:
> AO> This is a really tough problem and I don't have any clear idea what
> AO> is going wrong.  The output of the said programs should help to
> AO> narrow down the problem some more.
> 
> After changing
> add allow tcp  from me to any setup keep-state
> to
> add allow tcp  from me to any keep-state
> I see no sessions in the FIN_WAIT_2 state, but I'm still not sure
> that no timeout for this state is correct behavior.

Ok, this is something to work with.  I'll try to reproduce your setup
here.

-- 
Andre
Comment 5 K. Macy freebsd_committer freebsd_triage 2007-11-16 09:52:00 UTC
State Changed
From-To: open->feedback


Does the problem still exist on RELENG_7?
Comment 6 Mark Linimon freebsd_committer freebsd_triage 2008-03-02 01:07:27 UTC
State Changed
From-To: feedback->closed

Feedback timeout (> 3 months).