254159 – [tcp] Keepalive not working/tcp rst tolerance

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	12.2-STABLE
Hardware:	amd64 Any

Importance:	--- Affects Only Me
Assignee:	Michael Tuexen

URL:	https://reviews.freebsd.org/D28143
Keywords:

Depends on:
Blocks:

Reported:	2021-03-09 14:36 UTC by ant2
Modified:	2021-03-22 04:58 UTC
CC List:	5 users

See Also:

Flags:	koobs: mfc-stable12+

Attachments
pcap from server side (22.25 KB, application/vnd.tcpdump.pcap) 2021-03-10 13:55 UTC, ant2

Description ant2 2021-03-09 14:36:08 UTC

OS: 12.2-STABLE r368657 GENERIC  amd64

I've encountered a reproducible problem with keepalive probes: the server tolerates a TCP RST sent in response to a probe.

There is a connection between a user application (client) and a database (RDBMS) on a FreeBSD server. After the user application is killed, the connection is still alive on the server (observed with netstat -x).

The keepalive timer for the connection (initially set to the net.inet.tcp.keepidle value) counts down. When it reaches zero, the server sends a keepalive probe and the client responds with a TCP RST packet. The connection on the server nevertheless stays alive, and its keepalive timer is reset to the net.inet.tcp.keepintvl value. When that interval expires, the server does not send any packet but resets the connection's keepalive timer back to the net.inet.tcp.keepidle value. The cycle then repeats forever.

P.S.
There is no firewall.
It is confirmed that the server receives the TCP RST from the client (visible with tcpdump).
The RDBMS is running in a jail.

tcpdump output from the client:

193.017688000	192.168.0.3	192.168.0.2	TCP	66	3050→3583 [ACK] Seq=1 Ack=1 Win=1026 Len=0
193.017816000	192.168.0.2	192.168.0.3	TCP	54	3583→3050 [RST] Seq=1 Win=0 Len=0
1093.033029000	192.168.0.3	192.168.0.2	TCP	66	[TCP Dup ACK 5#1] 3050→3583 [ACK] Seq=1 Ack=1 
1093.033154000	192.168.0.2	192.168.0.3	TCP	54	3583→3050 [RST] Seq=1 Win=0 Len=0
1993.113115000	192.168.0.3	192.168.0.2	TCP	66	[TCP Dup ACK 5#2] 3050→3583 [ACK] Seq=1 Ack=1 
1993.113245000	192.168.0.2	192.168.0.3	TCP	54	3583→3050 [RST] Seq=1 Win=0 Len=0
...
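
The spacing of the probes in the dump above can be checked with a short calculation. This is only a minimal sketch: the timestamps are copied from the capture lines above, the 900 s reference is the net.inet.tcp.keepidle value (900000 ms) reported in comment 2, and the variable names are illustrative.

```python
# Timestamps (seconds) of the keepalive probes (ACK) and of the RST replies,
# copied from the client-side dump above.
probes = [193.017688, 1093.033029, 1993.113115]
resets = [193.017816, 1093.033154, 1993.113245]

# net.inet.tcp.keepidle is expressed in milliseconds.
KEEPIDLE_S = 900000 / 1000

# Interval between consecutive probes sent by the server.
probe_gaps = [later - earlier for earlier, later in zip(probes, probes[1:])]

# Delay between each probe and the RST the client sends back.
rst_delays = [rst - probe for probe, rst in zip(probes, resets)]

for gap in probe_gaps:
    print(f"probe gap: {gap:.3f} s (keepidle = {KEEPIDLE_S:.0f} s)")
for delay in rst_delays:
    print(f"RST sent {delay * 1e6:.0f} us after the probe")
```

Both gaps come out at roughly 900 s, i.e. about one probe per keepidle period, and every probe is answered with an RST within a fraction of a millisecond.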

Comment 1 Michael Tuexen freebsd_committer 2021-03-10 10:45:24 UTC

Can you provide two .pcap files, one taken at each host?

Comment 2 ant2 2021-03-10 13:55:01 UTC

Created attachment 223153
pcap from server side

I made a pcap only on the server side because I thought it would be enough for a full view.

It shows correct behavior except for the probe sent after the connection was terminated.

The capture ran for 30 minutes. Meanwhile, the connection was still alive on the server.

Right after killing the client:

tcp4       0      0 192.168.0.3.3050       192.168.0.2.1316            0      0      0      0  65700  99116      1   2048      0      0 525600 792928    0.00    0.00  870.08    0.00    0.00   15.11

tcp4       0      0 192.168.0.3.3050       192.168.0.2.1316            0      0      0      0  65700  99116      1   2048      0      0 525600 792928    0.00    0.00  420.41    0.00    0.00  464.78

tcp4       0      0 192.168.0.3.3050       192.168.0.2.1316            0      0      0      0  65700  99116      1   2048      0      0 525600 792928    0.00    0.00   17.44    0.00    0.00   32.55

tcp4       0      0 192.168.0.3.3050       192.168.0.2.1316            0      0      0      0  65700  99116      1   2048      0      0 525600 792928    0.00    0.00  849.44    0.00    0.00   50.55

~ 20 minutes ~

tcp4       0      0 192.168.0.3.3050       192.168.0.2.1316            0      0      0      0  65700  99116      1   2048      0      0 525600 792928    0.00    0.00   19.77    0.00    0.00  880.22
! timer set to keepintvl
tcp4       0      0 192.168.0.3.3050       192.168.0.2.1316            0      0      0      0  65700  99116      1   2048      0      0 525600 792928    0.00    0.00   49.64    0.00    0.00    0.35
! timer set to keepidle
tcp4       0      0 192.168.0.3.3050       192.168.0.2.1316            0      0      0      0  65700  99116      1   2048      0      0 525600 792928    0.00    0.00  849.59    0.00    0.00   50.40
end of file dumping
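
To make the timer annotations above easier to follow, here is a small sketch that pulls the timer fields out of one such line. It assumes the six trailing fields of `netstat -x` output are, in order, rexmt, persist, keep, 2msl, delack and rcvtime (the header row is not shown in this report, so this ordering is an assumption); `parse_timers` is a hypothetical helper, not part of any tool.

```python
# Assumed column order of the six trailing timer fields in `netstat -x`.
TIMER_COLUMNS = ("rexmt", "persist", "keep", "2msl", "delack", "rcvtime")


def parse_timers(line):
    """Return the trailing timer fields of one `netstat -x` line as floats."""
    fields = line.split()
    values = [float(v) for v in fields[-len(TIMER_COLUMNS):]]
    return dict(zip(TIMER_COLUMNS, values))


# The line annotated "timer set to keepintvl" above, whitespace collapsed.
sample = (
    "tcp4 0 0 192.168.0.3.3050 192.168.0.2.1316 0 0 0 0 65700 99116 "
    "1 2048 0 0 525600 792928 0.00 0.00 49.64 0.00 0.00 0.35"
)

timers = parse_timers(sample)
print(f"keep timer: {timers['keep']} s, time since last receive: {timers['rcvtime']} s")
```

Under that assumption, the sample line shows a keep timer just under the 50 s keepintvl value, started 0.35 s after the last segment was received.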


net.inet.tcp.keepidle: 900000
net.inet.tcp.keepintvl: 50000
net.inet.tcp.keepcnt: 3
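
For reference, the timing implied by these settings can be worked out as follows. This is a rough sketch that assumes the usual reading of the three sysctls (idle time before the first probe, interval between unanswered probes, number of unanswered probes before the connection is dropped); the variable names are illustrative.

```python
# Values reported above; keepidle and keepintvl are in milliseconds.
keepidle_ms = 900000
keepintvl_ms = 50000
keepcnt = 3

idle_s = keepidle_ms / 1000
intvl_s = keepintvl_ms / 1000

# Approximate upper bound for dropping a peer that never answers:
# one idle period followed by keepcnt unanswered probe intervals.
worst_case_drop_s = idle_s + keepcnt * intvl_s

print(f"first probe after {idle_s:.0f} s of idle time")
print(f"unanswered peer dropped after about {worst_case_drop_s:.0f} s "
      f"({worst_case_drop_s / 60:.1f} min)")
```

With these values the bound is about 1050 s (17.5 minutes), which is shorter than the 30-minute capture during which the connection stayed alive.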

I'll capture pcaps on both sides soon.

Comment 3 Michael Tuexen freebsd_committer 2021-03-10 14:43:32 UTC

The .pcap file you provided shows a TCP connection between 192.168.0.2:1339 and 192.168.0.3:3050, which is closed successfully.

Then there is an ACK segment coming from 192.168.0.3:3050 towards 192.168.0.2:1316.
The node at 192.168.0.2:1316 seems to not have such an end point and therefore
sends back a RST segment. The RST seems to be correct.

Which side is the client? Which side is the server? What OS is the client using? The server is using FreeBSD. Correct?

Comment 4 Michael Tuexen freebsd_committer 2021-03-10 15:03:50 UTC

OK. Assuming the server side is FreeBSD and running stable/12 r368657, then this problem is fixed in https://svnweb.freebsd.org/base?view=revision&revision=369132.
If this assumption is correct, please update the FreeBSD machine, retest and report the outcome.

Comment 5 ant2 2021-03-10 15:36:50 UTC

OK. I'll update and report.

Comment 6 ant2 2021-03-14 15:56:51 UTC

After updating to 12.2-STABLE r369447 the error has gone away.

Comment 7 Michael Tuexen freebsd_committer 2021-03-14 17:02:33 UTC

Thanks for testing and reporting. Will close the bug report.

Comment 8 Kubilay Kocak freebsd_committer 2021-03-22 04:58:07 UTC

Fixed in https://cgit.freebsd.org/src/commit/?id=cc3c34859eab1b317d0f38731355b53f7d978c97

^Triage:

 * Assign to committer that resolved
 * Track merge to stable/12