Bug 209471 - Listen queue overflow due to too many sockets stuck in CLOSED state
Summary: Listen queue overflow due to too many sockets stuck in CLOSED state
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-RELEASE
Hardware: amd64
OS: Any
Importance: --- Affects Some People
Assignee: freebsd-net mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-05-12 19:46 UTC by Robert Blayzor
Modified: 2019-08-18 19:42 UTC

See Also:


Description Robert Blayzor 2016-05-12 19:46:18 UTC
10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297856M

We are randomly seeing daemon applications on a mail server (Dovecot and Exim) hit listen queue overflows because hundreds or thousands of TCP connections get stuck in a CLOSED state.

Kernel messages:
sonewconn: pcb 0xfffff800155a3498: Listen queue overflow: 301 already in queue awaiting acceptance (50 occurrences)
sonewconn: pcb 0xfffff800155a3498: Listen queue overflow: 301 already in queue awaiting acceptance (50 occurrences)
sonewconn: pcb 0xfffff800155a3498: Listen queue overflow: 301 already in queue awaiting acceptance (50 occurrences)
sonewconn: pcb 0xfffff800155a3498: Listen queue overflow: 301 already in queue awaiting acceptance (48 occurrences)
sonewconn: pcb 0xfffff800155a3498: Listen queue overflow: 301 already in queue awaiting acceptance (50 occurrences)
sonewconn: pcb 0xfffff800155a3498: Listen queue overflow: 301 already in queue awaiting acceptance (50 occurrences)
...
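For reference, the per-listener queue depth (qlen/incqlen/maxqlen) can be checked directly; e.g., for the port 4190 listener here:

netstat -Lan | grep 4190

Partial netstat -an output of the stuck connections follows (local/foreign addresses are truncated by netstat):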


tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.19266 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.12342 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.29123 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.23215 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.56331 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.52066 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.33798 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.34610 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.15283 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.51922 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.7406  CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.41955 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.56028 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.6446  CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.2474  CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.51723 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.51069 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.18158 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.38435 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.46607 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.33359 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.62935 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.11673 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.51459 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.36490 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.27831 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.44081 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.28384 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.43745 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.64070 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.35722 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.63738 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.14573 CLOSED
tcp6      32      0 2607:f058:110:2:.4190  2607:f058:110:2:.12311 CLOSED
...
(hundreds and hundreds of these lines removed)


Looking at sockstat, these connections no longer appear to belong to any process...

?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:25 2607:f058:110:2::f:0:49398
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:110 2607:f058:110:2::f:0:28079
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:25 2607:f058:110:2::f:1:52383
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:110 2607:f058:110:2::f:1:35856
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:143 2607:f058:110:2::f:0:27734
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:143 2607:f058:110:2::f:1:36851
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:25 2607:f058:110:2::f:0:40977
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:110 2607:f058:110:2::f:0:51172
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:25 2607:f058:110:2::f:1:16197
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:110 2607:f058:110:2::f:1:1999
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:143 2607:f058:110:2::f:0:60423
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:143 2607:f058:110:2::f:1:16527
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:25 2607:f058:110:2::f:0:34327
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:110 2607:f058:110:2::f:0:5437
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:25 2607:f058:110:2::f:1:30114
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:110 2607:f058:110:2::f:1:57136
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:143 2607:f058:110:2::f:0:58399
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:143 2607:f058:110:2::f:1:37073
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:0:11673
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:0:33798
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:0:65207
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:0:13326
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:27879
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:0:2899
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:39172
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:0:19330
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:18694
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:0:1251
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:43392
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:44343
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:36523
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:41551
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:24288
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:3830
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:43978
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:8897
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:0:65187
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:0:14214
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:1:55279
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:0:31178
?        ?          ?     ?  tcp6   2607:f058:110:2::1:2:4190 2607:f058:110:2::f:0:49242
...
(hundreds or thousands of lines removed)


The only way to fix the issue is to reboot the server (in this case, a VMware ESXi 5.5 VM).

The network driver is "vmx", if that makes any difference.
Comment 1 Robert Blayzor 2016-05-16 01:25:38 UTC
Also experiencing this on other VMs (same FreeBSD release), but with Apache 2.4...

The majority of sockets appear to be from our F5 load balancers that were just doing simple HTTP health checks.



tcp6      36      0 webmail1.http          web-slb-1.alb1.i.13777 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.45830 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.12729 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.5479  CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.54684 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.24819 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.54619 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.6339  CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.54550 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.52960 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.17200 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.24141 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.33182 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.5888  CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.39082 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.57296 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.65150 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.4126  CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.12629 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.57158 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.34179 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.25479 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.26378 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.57018 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.34255 CLOSED
tcp6     113      0 localhost.http         localhost.10722        CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.30656 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.53942 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.30094 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.46475 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.20403 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.45081 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.56752 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.15115 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.41385 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.53667 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.25213 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.50966 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.15194 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.53532 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.13519 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.40583 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.2167  CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.53402 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.3610  CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.44162 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.43930 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.53267 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.12863 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.61568 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.56140 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.53133 CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.59403 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.4895  CLOSED
tcp6      36      0 webmail1.http          web-slb-0.alb1.i.57703 CLOSED
tcp6      36      0 webmail1.http          web-slb-1.alb1.i.27469 CLOSED
Comment 2 Robert Blayzor 2016-05-16 18:51:53 UTC
We have the following sysctl knobs set at boot time...

net.inet.tcp.msl=7500
net.inet.tcp.finwait2_timeout=15000
net.inet.tcp.fast_finwait2_recycle=1
net.inet.icmp.log_redirect=0
net.inet.icmp.drop_redirect=1
net.inet.tcp.delayed_ack=0
net.inet.ip.redirect=0
net.inet6.ip6.redirect=0
net.link.ether.inet.log_arp_wrong_iface=0
net.inet.tcp.keepidle=60000
net.inet.tcp.keepintvl=10000
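
These are applied at boot, presumably via /etc/sysctl.conf; the live values can be spot-checked at runtime with, e.g.:

sysctl net.inet.tcp.msl net.inet.tcp.finwait2_timeout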
Comment 3 Hiren Panchasara freebsd_committer 2016-05-16 18:59:14 UTC
Could https://lists.freebsd.org/pipermail/freebsd-net/2008-June/018544.html be what is happening here?

Does this problem occur only with certain applications? Is this a regression caused by an OS update or an app update?
Comment 4 Robert Blayzor 2016-05-16 19:02:26 UTC
We are seeing this with multiple applications and on multiple VMs.

We've seen this happen on our mail servers running Dovecot & Exim (both socket types stuck in CLOSED; tcp6 ports 25, 110, 143, 4190, etc.), and we've also seen it with Apache 2.4 (port 80).
Comment 5 Hiren Panchasara freebsd_committer 2016-05-16 19:17:13 UTC
Is there a known-good OS version? Can you try up/down-grading the troubling app on an affected box and see if that helps?
(I am just throwing out ideas to see how you can isolate this problem.)
Comment 6 Robert Blayzor 2016-05-16 19:25:24 UTC
I do not really have an environment that would allow me to test that. This problem does seem a lot more apparent after upgrading to 10.3, however. It's either that, or the workaround for bug 204426 unmasked this one.

With bug 204426 we would normally see our processes die, so they never really ran for long periods before we had to restart them.

We added the patch for PR 204426, and processes seem stable now, but now we have this bug. I believe we used to see this in 10.2 as well, though not nearly as often as now that 204426 seems fixed.

The application doesn't seem to matter, and our environment has not changed. The only real special setup we have is also described in 204426: ESXi hypervisor, VMX NIC driver, and NFS-mounted root FS. Other than the sysctl knobs previously mentioned, nothing else is special beyond a non-GENERIC kernel; all we did there was remove modules and drivers we do not use so the kernel builds faster.

One thing I did not check is whether this is an IPv6-only issue or whether IPv4 is also affected. I can still SSH into the server when this happens; only the process with the full queue and lots of sockets stuck in CLOSED seems to be hung. Nine times out of ten only a server reboot clears the issue; rarely we can manage to kill -9 the process and restart it.
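
A quick way to check that next time, counting stuck sockets per protocol (tcp4 vs. tcp6), would be something like:

netstat -an | grep CLOSED | awk '{print $1}' | sort | uniq -c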
Comment 7 Robert Blayzor 2016-05-16 20:11:21 UTC
Definitely more common on 10.3-RELEASE...

Just had two Apache servers fall victim...

See:

http://pastebin.com/tfHArSYv
Comment 8 Steven Hartland freebsd_committer 2016-05-16 20:22:40 UTC
Your connections all seem to be IPv6; is this always the case, or have you noticed it with IPv4?
Comment 9 Robert Blayzor 2016-05-16 20:26:31 UTC
When I notice this problem happen, 95-99% of the connections stuck in "CLOSED" seem to be from the F5 load balancer doing a TCP health check. All of the traffic between the servers and the F5 uses IPv6 addresses.

I did notice this time, when two Apache 2.4 servers croaked with this problem, that IPv4 connections stalled out as well...


I just tried adding two sysctl knobs:

net.inet.tcp.drop_synfin=1
net.inet.tcp.nolocaltimewait=1



These will probably do nothing, but we need to try something at this point...
Comment 10 Hiren Panchasara freebsd_committer 2016-05-16 20:34:55 UTC
Hmm. The changes are a bit convoluted, so it's hard to single them out and try.

Bug 204426 got fixed with https://reviews.freebsd.org/D6085, which seems to suggest that it fixed a problem amplified by https://svnweb.freebsd.org/base?view=revision&revision=292261 (the stable/10 MFC of r291576).

Could you back out r292261 and its fix (i.e., the patch in D6085) and see if that helps?
Comment 11 Robert Blayzor 2016-05-16 20:46:23 UTC
Bug 204426 references that review/patch. I got the patch/diff and applied it to 10.3, and I've been running that for a week or two. While it corrected the problem I originally saw in bug 204426, now I see what is reported here. In the notes on 204426 I was told the two are not related; hence this PR.
Comment 12 Hiren Panchasara freebsd_committer 2016-05-16 21:05:36 UTC
Well, we are running a bunch of systems on stable/10 r296969, and I just checked and didn't find any connections stuck in CLOSED. (Most of the serving is over v4, though.) Just a data point.
Comment 13 Navdeep Parhar freebsd_committer 2016-05-16 21:10:04 UTC
Is tcpdrop(8) able to clean up these connections by any chance?
Comment 14 Robert Blayzor 2016-05-16 21:47:48 UTC
Will try tcpdrop the next time it happens. We normally can't catch it until the health monitors mark the service dead; that's when we notice the kernel messages. Usually by that time a "kill -9" will not work on the process, and the only way to fix the problem is a reboot.

I'm still trying to figure out why we are affected by this issue and bug 204426. Am I correct that bug 204426 is only VM memory related? If so, maybe I can jack up the physical memory on the VM and try turning VM off.

AFAIK we're not doing anything special on the servers; just diskless-boot VMs, NFS root, and a couple of memory-disk filesystems (/etc and /var, which do not grow at all)... I'm just trying to figure out what in our environment is causing us to see this A LOT. One would think these two bugs (or the same one) would be a major regression other people would run into. If not, why?
Comment 15 Robert Blayzor 2016-05-16 22:19:28 UTC
I believe I have stumbled onto this problem in its early stages. I noticed it by catching a process with a connection that seems to stay in a "CLOSED" state forever.

Looking at it more closely...

dev@mta2 [~] netstat -an | grep CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.40858 CLOSED

dev@mta2 [~] sockstat | grep 40858
mailnull exim       35876 7  tcp6   2607:f058:110:2::1:2:25 2607:f058:110:2::f:0:40858
mailnull exim       35876 9  tcp6   2607:f058:110:2::1:2:25 2607:f058:110:2::f:0:40858

dev@mta2 [~] sudo tcpdrop 2607:f058:110:2::1:2 25 2607:f058:110:2::f:0 40858
tcpdrop: 2607:f058:110:2::1:2 25 2607:f058:110:2::f:0 40858: No such process


If I kill the process and restart it... the old process sticks around and the socket never clears.


dev@mta2 [~] sudo killall -9 exim
dev@mta2 [~] ps auxww | grep exim
mailnull 35876   0.0  0.4  41708   8184  -  DL    9:21PM     0:09.40 /usr/local/sbin/exim -bd -q10m


dev@mta2 [~] netstat -an | grep CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.40858 CLOSED
dev@mta2 [~] sudo service exim start
Starting exim.
dev@mta2 [~] gps exim
root     39120   0.2  0.4  41708   8364  -  Ss   10:15PM     0:00.00 /usr/local/sbin/exim -bd -q10m
mailnull 35876   0.0  0.4  41708   8184  -  DL    9:21PM     0:09.48 /usr/local/sbin/exim -bd -q10m


sockstat | grep 40858
mailnull exim       35876 7  tcp6   2607:f058:110:2::1:2:25 2607:f058:110:2::f:0:40858
mailnull exim       35876 9  tcp6   2607:f058:110:2::1:2:25 2607:f058:110:2::f:0:40858


I'm no expert on the kernel VM, but this smells like it's related to the previous bug.

top shows the process state as "vmf_de"; what is that?

35876 mailnull      1  20    0 41708K  8184K vmf_de  2   0:10   0.00% exim
Comment 16 Robert Blayzor 2016-05-16 22:31:09 UTC
Sure enough, only several minutes later...


netstat -an | grep CLOSE
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.21636 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.56257 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.29702 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.43568 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.54536 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.46349 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.43627 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.1415  CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.17268 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.48771 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.14033 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.38239 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.21242 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.25488 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.64057 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.13191 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:10::10.42818 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.13892 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.36679 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.54607 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.29255 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.61873 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.40489 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.21086 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.39435 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.43388 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.36112 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.7947  CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.32606 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.12869 CLOSED
tcp6       0      0 2607:f058:110:2:.25    2607:f058:110:2:.41813 CLOSED


The connections cannot be dropped with tcpdrop...


Exim stuck in:

35876 mailnull      1  20    0 41708K  8184K vmf_de  0   0:12   0.00% exim


and results in:

sonewconn: pcb 0xfffff80015d67000: Listen queue overflow: 31 already in queue awaiting acceptance (26 occurrences)
sonewconn: pcb 0xfffff80015d67000: Listen queue overflow: 31 already in queue awaiting acceptance (26 occurrences)
sonewconn: pcb 0xfffff80015d67000: Listen queue overflow: 31 already in queue awaiting acceptance (26 occurrences)
sonewconn: pcb 0xfffff80015d67000: Listen queue overflow: 31 already in queue awaiting acceptance (25 occurrences)
sonewconn: pcb 0xfffff80015d67000: Listen queue overflow: 31 already in queue awaiting acceptance (26 occurrences)



The process cannot be killed... time to reboot...
Comment 17 Jason Wolfe 2016-05-17 03:27:05 UTC
Run tcpdrop -l -a to have it output, for each active connection, a command in the format it expects. Try one of those to confirm you have the proper syntax; you can get 'No such process' if it isn't matching the connection.

While this obviously won't fix the issue, if you are able to drop them you could write a quick cron job to get by in the meantime; see the sketch below.
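
A rough, untested sketch of such a script (it assumes port 4190 is the overflowing listener; note it would also drop any live connections on that port):

#!/bin/sh
# tcpdrop -l -a emits one "tcpdrop laddr lport faddr fport" line per
# active connection; re-issue the ones matching the stuck listener port.
PORT=4190
tcpdrop -l -a | while read -r cmd laddr lport faddr fport; do
    [ "$lport" = "$PORT" ] && tcpdrop "$laddr" "$lport" "$faddr" "$fport"
done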
Comment 18 Twingly 2016-05-17 11:29:40 UTC
(In reply to Robert Blayzor from comment #14)
> One would think these two bugs (or the same one) would be a major regression other people would run into. If not, why?

I do think other people are running into this; we do. It has happened to us 5 times since early January. In our case it is mysqld that hangs. I think this is related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=204764.

Robert, can you check with procstat -t if your processes are in "vodead" state when this happens?
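
For example, something like this (assuming mysqld is the hung process):

procstat -t $(pgrep -x mysqld) | grep vodead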

Sorry for the quick post; I will try to post more information about our setup later. Some quick notes: we have physical machines running FreeBSD 10.2, and we use ZFS (mirrored SSDs).
Comment 19 Robert Blayzor 2016-05-20 13:48:13 UTC
I am still tracking this. I have found it almost impossible to reproduce manually, and I'm not sure what triggers it.

I will certainly check the next time it happens... This may be completely unrelated, but since I added the following sysctl knobs the problem seems to happen less...

net.inet.tcp.nolocaltimewait=1
net.inet.tcp.drop_synfin=1


This is probably coincidence, however...
Comment 20 Robert Blayzor 2016-05-23 12:53:15 UTC
Several days have now passed in which we normally would have seen this bug hit. Since adding the two sysctl knobs...

net.inet.tcp.drop_synfin=1
net.inet.tcp.nolocaltimewait=1


The issue seems to be mitigated, at least so far. I can certainly turn these off and see whether the problem reappears. I can't see how the above would be related to the issue we were having, though...
Comment 21 Robert Blayzor 2016-05-27 00:19:38 UTC
This finally reared its head again...


sonewconn: pcb 0xfffff80001b2bc40: Listen queue overflow: 301 already in queue awaiting acceptance (25 occurrences)
sonewconn: pcb 0xfffff80001b2bc40: Listen queue overflow: 301 already in queue awaiting acceptance (28 occurrences)
sonewconn: pcb 0xfffff80001b2bc40: Listen queue overflow: 301 already in queue awaiting acceptance (25 occurrences)
...

netstat -an | grep CLOSED | wc -l
     301

procstat -ta

  726 100087 dovecot          -                  1  120 sleep   kqread
61961 100200 dovecot          -                  3  120 sleep   vmf_de


The process is dovecot... and when I see this happen, it is stuck in the "vmf_de" state, which comes from the patch in Bug 204426.



tcpdrop -l -a | tail
tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:0 49170
tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:1 55998
tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:0 47559
tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:1 36319
tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:0 47496
tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:1 46326
tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:0 36871
tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:1 24142
tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:0 7962
tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:1 13402

[~] sudo tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:0 7962
tcpdrop: 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:0 7962: No such process

[~] sudo tcpdrop 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:1 13402
tcpdrop: 2607:f058:110:2::1:1 4190 2607:f058:110:2::f:1 13402: No such process
Comment 22 y2wjegieo8c2 2019-08-18 19:42:11 UTC
I am facing the same issue with FreeBSD 11.2 (a FreeNAS machine).
In the log I have this:
Aug 18 18:50:19 freenas ctld[2680]: 192.168.0.122: exiting due to timeout
Aug 18 18:50:19 freenas ctld[2683]: 192.168.0.122: exiting due to timeout
Aug 18 18:50:19 freenas ctld[2679]: 192.168.0.122: exiting due to timeout
Aug 18 18:50:19 freenas ctld[2681]: 192.168.0.122: exiting due to timeout
Aug 18 18:50:19 freenas ctld[2682]: 192.168.0.122: exiting due to timeout
sonewconn: pcb 0xfffff80045e88ae0: Listen queue overflow: 193 already in queue awaiting acceptance (1 occurrences)
sonewconn: pcb 0xfffff80045e88ae0: Listen queue overflow: 193 already in queue awaiting acceptance (322 occurrences)
sonewconn: pcb 0xfffff80045e88ae0: Listen queue overflow: 193 already in queue awaiting acceptance (340 occurrences)
sonewconn: pcb 0xfffff80045e88ae0: Listen queue overflow: 193 already in queue awaiting acceptance (340 occurrences)

netstat -Lan reported the issue on port 3260 (iscsi).
It seems to have happened while I was trying to rename a ZVOL (zfs rename ...).

I tried to stop the iscsi service from the GUI with no luck:
/etc/rc.d/ctld stop did not produce any effect (the process was stuck).
I tried to kill the process manually (2281 is for /usr/sbin/ctld):
kill -9 2481
kill -HUP 2481
kill -KILL 2481
kill -19 2481
but no luck.

Strangely, /etc/rc.d/ctld stop returned:
ctld not running? (check /var/run/ctld.pid)
(the service was definitely running)

ps aux | awk '$8=="Z" {print $2}' returns nothing (so no zombie processes)

In the end, I rebooted the VM (I had to force the power-off, as I kept getting more sonewconn messages on the console after the sync message).

Hope it helps