Bug 214630 - netstat displays spurious count for connections in LAST_ACK state inside jail with VIMAGE kernel
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-net mailing list
Reported: 2016-11-18 17:21 UTC by tony181116
Modified: 2018-11-15 20:18 UTC (History)

Attachments
Showing duplicate fin packets (1.69 KB, application/octet-stream)
2016-11-23 10:59 UTC, tony181116
duplicate fin from another jail on same machine (454 bytes, application/octet-stream)
2016-11-23 11:34 UTC, tony181116

Description tony181116 2016-11-18 17:21:39 UTC
When running 'netstat -sp tcp' inside a jail on a VIMAGE kernel while the system is under load, a very large number of connections appears in the LAST_ACK state.

TCP connection count by state:
        0 connections in CLOSED state
        1 connection  in LISTEN state
        6 connections in SYN_SENT state
        0 connections in SYN_RCVD state
        15 connections in ESTABLISHED state
        0 connections in CLOSE_WAIT state
        0 connections in FIN_WAIT_1 state
        0 connections in CLOSING state
        18446744073709551604 connections in LAST_ACK state
        0 connections in FIN_WAIT_2 state
        31 connections in TIME_WAIT state

The machine has not made this many connections since boot.
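
For reference, the reported value sits just below 2^64, which is what a small negative number looks like when printed as an unsigned 64-bit integer. A minimal sketch of that reinterpretation follows; it assumes the counter is 64 bits wide, and the helper name as_signed64 is purely illustrative and not part of netstat or the kernel.

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a two's-complement signed one."""
    if value >= 1 << 63:
        return value - (1 << 64)
    return value


# The LAST_ACK count from the output above.
print(as_signed64(18446744073709551604))  # -12
```

Read this way, the LAST_ACK count above is -12 rather than an implausibly large positive number.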
Comment 1 Hiren Panchasara freebsd_committer 2016-11-18 18:47:03 UTC
Looks like a long underflowing or overflowing under certain conditions with VIMAGE.
TCPSTATES_DEC() might not be handled correctly, or the cleanup of connections may be buggy in some cases.

ccing Bjoern who's been making changes in this area.
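
The underflow suspected here can be illustrated with a small model. This is only a sketch under stated assumptions: the class StateCounters and its inc/dec methods are hypothetical stand-ins for per-state counters and do not reproduce the kernel's TCPSTATES_INC()/TCPSTATES_DEC() macros. It assumes unsigned 64-bit counters and shows what a single unmatched decrement does to the displayed value.

```python
MASK64 = (1 << 64) - 1  # assumption: counters are unsigned 64-bit


class StateCounters:
    """Hypothetical per-state connection counters with 64-bit wraparound."""

    def __init__(self):
        self.counts = {}

    def inc(self, state):
        self.counts[state] = (self.counts.get(state, 0) + 1) & MASK64

    def dec(self, state):
        self.counts[state] = (self.counts.get(state, 0) - 1) & MASK64


counters = StateCounters()
counters.inc("LAST_ACK")  # connection enters LAST_ACK
counters.dec("LAST_ACK")  # connection leaves LAST_ACK
counters.dec("LAST_ACK")  # unmatched decrement: wraps below zero
print(counters.counts["LAST_ACK"])  # 18446744073709551615
```

In this model, each additional unmatched decrement lowers the displayed value by one, which would be consistent with counts slightly below 2^64.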
Comment 2 tony181116 2016-11-22 11:21:17 UTC
I have now seen this on two machines with FIN_WAIT_1 also affected:

TCP connection count by state:
        0 connections in CLOSED state
        1 connection  in LISTEN state
        0 connections in SYN_SENT state
        0 connections in SYN_RCVD state
        0 connections in ESTABLISHED state
        0 connections in CLOSE_WAIT state
        18446744073709551408 connections in FIN_WAIT_1 state
        0 connections in CLOSING state
        18446744073709551613 connections in LAST_ACK state
        33 connections in FIN_WAIT_2 state
        0 connections in TIME_WAIT state

TCP connection count by state:
        0 connections in CLOSED state
        1 connection  in LISTEN state
        0 connections in SYN_SENT state
        0 connections in SYN_RCVD state
        0 connections in ESTABLISHED state
        0 connections in CLOSE_WAIT state
        18446744073709551395 connections in FIN_WAIT_1 state
        0 connections in CLOSING state
        18446744073709551606 connections in LAST_ACK state
        33 connections in FIN_WAIT_2 state
        0 connections in TIME_WAIT state
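
To spot affected states quickly, the per-state lines can be scanned for values that only make sense as wrapped negatives. The following is a rough sketch that assumes the line format shown above; the function name wrapped_states and the 2^63 threshold are illustrative choices, not an existing tool.

```python
import re

# Matches lines such as "        33 connections in FIN_WAIT_2 state".
LINE_RE = re.compile(r"^\s*(\d+) connections? +in (\w+) state\s*$")


def wrapped_states(netstat_text):
    """Return {state: signed value} for counts that look like wrapped negatives."""
    wrapped = {}
    for line in netstat_text.splitlines():
        match = LINE_RE.match(line)
        if match and int(match.group(1)) >= 1 << 63:
            wrapped[match.group(2)] = int(match.group(1)) - (1 << 64)
    return wrapped


sample = """\
        18446744073709551408 connections in FIN_WAIT_1 state
        0 connections in CLOSING state
        18446744073709551613 connections in LAST_ACK state
        33 connections in FIN_WAIT_2 state
"""
print(wrapped_states(sample))  # {'FIN_WAIT_1': -208, 'LAST_ACK': -3}
```

Applied to the first sample above, this reads FIN_WAIT_1 as -208 and LAST_ACK as -3.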
Comment 3 Michael Tuexen freebsd_committer 2016-11-22 12:34:36 UTC
Just to double check: You are using 11 RELEASE?
Comment 4 tony181116 2016-11-23 10:46:42 UTC
(In reply to Michael Tuexen from comment #3)

# uname -r
11.0-RELEASE-p3
Comment 5 tony181116 2016-11-23 10:59:32 UTC
Created attachment 177326
Showing duplicate fin packets


On further investigation I have noticed duplicate FIN packets being created, although due to the setup of the machine this may or may not be helpful, as the packets are manipulated by:

1) ipfw, which sends them to
2) a divert socket, which is used to add a GTP header to the packet, and then
3) back into the firewall for final dispatch to the gateway.

It did occur to me, though, that if these additional FIN packets are counted before the UDP header is added, it would be possible to end up with a negative number of connections in some TCP states.
Comment 6 tony181116 2016-11-23 11:34:37 UTC
Created attachment 177327
duplicate fin from another jail on same machine

This packet capture is from another jail, which uses ipfw but no divert or GTP encapsulation, and it also exhibits the duplicate packets.
Comment 7 Michael Tuexen freebsd_committer 2016-11-23 15:44:41 UTC
Can you provide a capture file that also shows some packets before the ones shown in duplicate_FIN_no_tunnel.pcap?
The FIN is retransmitted by 102.1.0.10 since there is no ACK for it. The peer expects the sequence number 3850418715, but the FIN-ACK has 385041978. So there is some mismatch here and I would like to understand what happened.

Best regards
Michael
Comment 8 Palle Girgensohn freebsd_committer 2016-11-23 16:54:56 UTC
Just for reference, I see this as well, using VIMAGE and netgraph interfaces in jails.
Comment 9 Palle Girgensohn freebsd_committer 2016-11-23 17:03:12 UTC
I'm investigating strange, sporadic short network outages in jails, lasting a minute or less. They are related to

kernel: sonewconn: pcb 0xfffff80bfa0263a0: Listen queue overflow: 767 already in queue awaiting acceptance (365 occurrences)

Maybe they are not related to this bug at all, but I do see this kind of report:

TCP connection count by state:
	0 connections in CLOSED state
	11 connections in LISTEN state
	0 connections in SYN_SENT state
	3 connections in SYN_RCVD state
	136 connections in ESTABLISHED state
	4 connections in CLOSE_WAIT state
	8 connections in FIN_WAIT_1 state
	7 connections in CLOSING state
	18446744073709551578 connections in LAST_ACK state
	671 connections in FIN_WAIT_2 state
	846 connections in TIME_WAIT state

The large number is obviously a negative value (< 0), so something is sending duplicate packets.

We use VIMAGE with netgraph interfaces (not epair).


As I said, maybe it is not related to the sonewconn problem; I don't know enough about the internals. But we do see the same strange reports from netstat.
Comment 10 tony181116 2016-11-24 16:19:17 UTC
For me it looks as though this is only an issue in jails where the firewall config is using divert to modify the packets.

I'll see if I can create a simple scenario where traffic is diverted but no changes are made to it, to see if this causes the behavior to occur.

I won't have access to the systems until early next week so will try to get some better captures asap.
Comment 11 Bjoern A. Zeeb freebsd_committer 2018-11-02 14:51:49 UTC
Hi,

Is this still a problem?  Were you able to create the simple scenario you had indicated back then?

I've not seen any other reports of this apart from you two, and I wonder if this is (was) indeed a VIMAGE bug, or people fiddling with counters back then.

Can you give me an update as to what happened?
Comment 12 Bjoern A. Zeeb freebsd_committer 2018-11-15 17:48:34 UTC
jhb recently fixed one of these kinds of bugs (probably related to different counters) in https://svnweb.freebsd.org/changeset/base/340304 which would only happen with TOE on Chelsio NICs.

I won't rule out that similar bugs existed elsewhere in the last two years and have been fixed.

I'll close this for the moment.  In case this is still an issue, please simply re-open it with more details and we can look into it.
Comment 13 Bjoern A. Zeeb freebsd_committer 2018-11-15 20:18:47 UTC
jhb followed up saying https://svnweb.freebsd.org/base?view=revision&revision=308832 might have been relevant.  I am just adding it here for future reference.