When running 'netstat -sp tcp' inside a jail on a VIMAGE kernel while the system is under load, a large number of connections appear to be in LAST_ACK state:

TCP connection count by state:
    0 connections in CLOSED state
    1 connection in LISTEN state
    6 connections in SYN_SENT state
    0 connections in SYN_RCVD state
    15 connections in ESTABLISHED state
    0 connections in CLOSE_WAIT state
    0 connections in FIN_WAIT_1 state
    0 connections in CLOSING state
    18446744073709551604 connections in LAST_ACK state
    0 connections in FIN_WAIT_2 state
    31 connections in TIME_WAIT state

The machine has not made this many connections since boot.
Looks like a long is under- or overflowing under certain conditions with VIMAGE. TCPSTATES_DEC() might not be handled correctly, or the cleanup of connections may be buggy in some cases. CCing Bjoern, who has been making changes in this area.
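To make the underflow theory concrete, here is a minimal sketch of what happens when a per-state counter is decremented more often than it is incremented. It assumes the counter is an unsigned 64-bit value, which the printed number suggests but which this thread does not confirm; `dec_u64` and `as_signed64` are hypothetical helper names, not kernel functions.

```python
MASK64 = (1 << 64) - 1  # assumption: the counter is an unsigned 64-bit integer


def dec_u64(value: int) -> int:
    """Decrement with unsigned 64-bit wraparound, as C does for uint64_t."""
    return (value - 1) & MASK64


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    return value - (1 << 64) if value >= (1 << 63) else value


last_ack = 0
for _ in range(12):  # twelve decrements with no matching increment
    last_ack = dec_u64(last_ack)

print(last_ack)               # 18446744073709551604
print(as_signed64(last_ack))  # -12
```

Under that assumption, the LAST_ACK value reported above corresponds to twelve more decrements than increments.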
I have now seen this on two machines, with FIN_WAIT_1 also affected:

TCP connection count by state:
    0 connections in CLOSED state
    1 connection in LISTEN state
    0 connections in SYN_SENT state
    0 connections in SYN_RCVD state
    0 connections in ESTABLISHED state
    0 connections in CLOSE_WAIT state
    18446744073709551408 connections in FIN_WAIT_1 state
    0 connections in CLOSING state
    18446744073709551613 connections in LAST_ACK state
    33 connections in FIN_WAIT_2 state
    0 connections in TIME_WAIT state

TCP connection count by state:
    0 connections in CLOSED state
    1 connection in LISTEN state
    0 connections in SYN_SENT state
    0 connections in SYN_RCVD state
    0 connections in ESTABLISHED state
    0 connections in CLOSE_WAIT state
    18446744073709551395 connections in FIN_WAIT_1 state
    0 connections in CLOSING state
    18446744073709551606 connections in LAST_ACK state
    33 connections in FIN_WAIT_2 state
    0 connections in TIME_WAIT state
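For anyone checking their own systems, the following is a rough sketch of a script that scans 'netstat -sp tcp' output for state counters that look wrapped and prints them as signed values. The regular expression is based only on the output format quoted in this report, the 64-bit width is an assumption, and `find_wrapped` is a hypothetical helper name.

```python
import re

U64 = 1 << 64  # assumption: counters are unsigned 64-bit

# Matches lines such as "33 connections in FIN_WAIT_2 state"
# and "1 connection in LISTEN state".
LINE_RE = re.compile(r"(\d+) connections? in (\w+) state")


def find_wrapped(text: str) -> dict[str, int]:
    """Return {state: signed value} for counters in the upper half of the u64 range."""
    wrapped = {}
    for count, state in LINE_RE.findall(text):
        value = int(count)
        if value >= U64 // 2:
            wrapped[state] = value - U64
    return wrapped


sample = """\
TCP connection count by state:
    0 connections in CLOSED state
    1 connection in LISTEN state
    18446744073709551408 connections in FIN_WAIT_1 state
    18446744073709551613 connections in LAST_ACK state
    33 connections in FIN_WAIT_2 state
"""

print(find_wrapped(sample))  # {'FIN_WAIT_1': -208, 'LAST_ACK': -3}
```

Read this way, the FIN_WAIT_1 and LAST_ACK counters in the first listing above are at -208 and -3 respectively.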
Just to double check: You are using 11 RELEASE?
(In reply to Michael Tuexen from comment #3)

# uname -r
11.0-RELEASE-p3
Created attachment 177326
Showing duplicate FIN packets

On further investigation I have noticed duplicate FIN packets being created. Due to the setup of the machine this may or may not be helpful, because the packets are manipulated as follows:
1) ipfw sends them to
2) a divert socket, which is used to add a GTP header to the packet, and then
3) they go back into the firewall for final dispatch to the gateway.

It did occur to me that if these additional FIN packets are being counted before the UDP header is added, it would be possible to end up with a negative number of connections in some TCP states.
Created attachment 177327
duplicate fin from another jail on same machine

This packet capture is from another jail, which does use ipfw but has no divert or GTP encapsulation, and which also exhibits the duplicate packets.
Can you provide a capture file that also shows some packets before the ones in duplicate_FIN_no_tunnel.pcap? The FIN is retransmitted by 102.1.0.10 since there is no ACK for it. The peer expects the sequence number 3850418715, but the FIN-ACK has 385041978. So there is some mismatch here and I would like to understand what happened.

Best regards
Michael
Just for reference, I see this as well, using VIMAGE and netgraph interfaces in jails.
I'm investigating strange, sporadic short network outages in jails, lasting a minute or less. They are related to:

kernel: sonewconn: pcb 0xfffff80bfa0263a0: Listen queue overflow: 767 already in queue awaiting acceptance (365 occurrences)

Maybe they are not related at all, but I do see this kind of report:

TCP connection count by state:
    0 connections in CLOSED state
    11 connections in LISTEN state
    0 connections in SYN_SENT state
    3 connections in SYN_RCVD state
    136 connections in ESTABLISHED state
    4 connections in CLOSE_WAIT state
    8 connections in FIN_WAIT_1 state
    7 connections in CLOSING state
    18446744073709551578 connections in LAST_ACK state
    671 connections in FIN_WAIT_2 state
    846 connections in TIME_WAIT state

The large number is obviously < 0, so something is sending double packets. We use VIMAGE with netgraph interfaces (not epair). As I said, maybe it is not related to the sonewconn problem; I don't know enough about the internals, but we do see the same strange reports from netstat.
For me it looks as though this is only an issue in jails where the firewall config is using divert to modify the packets. I'll see if I can create a simple scenario where traffic is diverted but no changes are made to it, to see if this causes the behavior to occur. I won't have access to the systems until early next week so will try to get some better captures asap.
Hi, is this still a problem? Were you able back then to create a simple scenario as you had indicated? I've not seen any other reports of this apart from you two, and I wonder if this is (was) indeed a VIMAGE bug, or people fiddling with counters back then. Can you give me an update as to what happened?
jhb recently fixed one of these kinds of bugs (probably related to different counters) in https://svnweb.freebsd.org/changeset/base/340304 which would only happen with TOE on Chelsio NICs. I won't rule out that similar bugs existed elsewhere in the last two years and have been fixed. I'll close this for the moment. In case this is still an issue, please simply re-open it with more details and we can look into it.
jhb followed up saying https://svnweb.freebsd.org/base?view=revision&revision=308832 might have been relevant. I am just adding it here for future reference.