Bug 231950 - www/squid: bug in 4.3, No connections are accepted after ECONNABORTED
Summary: www/squid: bug in 4.3, No connections are accepted after ECONNABORTED
Status: Closed Overcome By Events
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s) (show other bugs)
Version: Latest
Hardware: Any Any
: --- Affects Some People
Assignee: Kurt Jaeger
URL: https://bugs.squid-cache.org/show_bug...
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-04 17:12 UTC by Mark.Martinec
Modified: 2018-12-23 01:31 UTC (History)
6 users (show)

See Also:
pi: maintainer-feedback+


Attachments
patch-from-bug4889 (2.50 KB, patch)
2018-10-10 16:33 UTC, Kurt Jaeger
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark.Martinec 2018-10-04 17:12:18 UTC
This is just a heads-up on the following upstream bug:

  https://bugs.squid-cache.org/show_bug.cgi?id=4889

which is pretty devastating, making squid 4.3 mostly unusable.
It  locks up (at our site) every couple of hours, no longer
accept(2)-ing new connections, and needs to be restarted.

The symptoms are:

- a storm of messages in /var/log/messages like the following:

Oct  4 13:07:34 xxx kernel: [65543] sonewconn: pcb 0xfffff8019df833a0: Listen queue overflow: 193 already in queue awaiting acceptance (231 occurrences)
Oct  4 13:08:34 xxx kernel: [65604] sonewconn: pcb 0xfffff8019df833a0: Listen queue overflow: 193 already in queue awaiting acceptance (129 occurrences)
Oct  4 13:09:35 xxx kernel: [65664] sonewconn: pcb 0xfffff8019df833a0: Listen queue overflow: 193 already in queue awaiting acceptance (154 occurrences)

- and the following in /var/log/squid/cache.log :

2018/10/04 12:54:11 kid2| oldAccept  FD 14, x.x.x.x [ job5]: (53) Software caused connection abort
2018/10/04 12:59:47 kid1| oldAccept  FD 14, x.x.x.x [ job5]: (53) Software caused connection abort

I'm now running with a provisional patch from the above mentioned
bug 4889.  So far so good...

Please follow closely the upstream bug report, and unless the 4.3 is
replaced very soon with a fixed version, I think the provisional patch
in the FreeBSD ports is in order as a quick mitigation.
Comment 1 Kurt Jaeger freebsd_committer freebsd_triage 2018-10-04 19:54:46 UTC
Does changing the value of sysctl

kern.ipc.soacceptqueue=<some higher value, like 1000>

help ?
Comment 2 Mark.Martinec 2018-10-04 22:05:34 UTC
> Does changing the value of sysctl
> kern.ipc.soacceptqueue=<some higher value, like 1000>

That was indeed my first thought, but after realizing that when squid decides
to stop ACCEPTing and the socket queue fills up and remains full at a fixed
value of 1.5 times soacceptqueue (even when no new connection attempts are
coming in and the host and squid are idle), then it becomes apparent that
the socket queue will eventually just fill up regardless of its size limit.
The analysis of the problem in the upstream bug seems to confirm this.

The queue being full is just a consequence of the application not accepting
new requests any longer.
Comment 3 Josmar Calin De Pierri 2018-10-09 20:18:15 UTC
I see this error happening here too, within about 10 minutes of load in our production servers (FreeBSD 11.2-RELEASE-p4 and Squid 4.3 from ports)

Roll back to Squid 3.5.28 (also from ports) solved this issue.
Comment 4 Oleh Hushchenkov 2018-10-10 04:32:10 UTC
Upstream proposed patch is solve issue for us.
Comment 5 andrkaz 2018-10-10 08:53:17 UTC
(In reply to Josmar from comment #3)
I am in same situation, Freebsd 11.2-RELEASE-p4. Downgrade Squid to version 3.5.27. Error "sonewconn: pcb 0xfffff80202bf2cb0: Listen queue overflow: 193 already in queue awaiting acceptance (63 occurrences)" persist.
Comment 6 Josmar Calin De Pierri 2018-10-10 13:28:48 UTC
(In reply to Oleh Hushchenkov from comment #4)
I'll wait until it arrive on ports.
Comment 7 Kurt Jaeger freebsd_committer freebsd_triage 2018-10-10 16:33:26 UTC
Created attachment 198005 [details]
patch-from-bug4889

testbuilds are fine, no run-tests, no build-test with SSL=on option and security/openssl111.
Comment 8 Pavel Timofeev 2018-10-10 19:42:24 UTC
Comment on attachment 198005 [details]
patch-from-bug4889

Thanks a lot!
Comment 9 Pavel Timofeev 2018-10-10 19:45:37 UTC
Comment on attachment 198005 [details]
patch-from-bug4889

Well, for unknown reason bugzilla ignores my attempts to mark the patch with '+'.
No doubt, Bugzilla, I approve it ;)
Comment 10 commit-hook freebsd_committer freebsd_triage 2018-10-11 07:14:38 UTC
A commit references this bug:

Author: pi
Date: Thu Oct 11 07:14:16 UTC 2018
New revision: 481788
URL: https://svnweb.freebsd.org/changeset/ports/481788

Log:
  www/squid: fix bug in 4.3, no connections are accepted after ECONNABORTED

  - without the fix, squid locks up every couple of hours, no longer
    accept(2)-ing new connections, and needs to be restarted.

  PR:		231950
  Submitted by:	Mark.Martinec@ijs.si
  Reviewed by:	Oleh Hushchenkov <gor@clogic.com.ua>
  Approved by:	Pavel Timofeev <timp87@gmail.com> (maintainer)
  Obtained from:	https://bugs.squid-cache.org/show_bug.cgi?id=4889

Changes:
  head/www/squid/Makefile
  head/www/squid/files/patch-src_comm_TcpAcceptor.cc
Comment 11 Marko Cupać 2018-11-14 08:53:24 UTC
(In reply to commit-hook from comment #10)

I'm running squid-4.3_1, and it crashes under load constantly. Not sure if it's related to this particular bug report.

From squid jail's cache log, just before crash I always get 'assertion failed':

cat /var/log/squid/cache.log | grep assertion

2018/11/14 08:34:47 kid1| assertion failed: http.cc:1530: "!Comm::MonitorsRead(serverConnection->fd)"
2018/11/14 08:37:00 kid1| assertion failed: http.cc:1530: "!Comm::MonitorsRead(serverConnection->fd)"
2018/11/14 08:39:03 kid1| assertion failed: http.cc:1530: "!Comm::MonitorsRead(serverConnection->fd)"
2018/11/14 08:41:05 kid1| assertion failed: http.cc:1530: "!Comm::MonitorsRead(serverConnection->fd)"
2018/11/14 08:43:08 kid1| assertion failed: http.cc:1530: "!Comm::MonitorsRead(serverConnection->fd)"
2018/11/14 08:45:12 kid1| assertion failed: http.cc:1530: "!Comm::MonitorsRead(serverConnection->fd)"
2018/11/14 08:50:22 kid1| assertion failed: stmem.cc:98: "lowestOffset () <= target_offset"
2018/11/14 08:55:43 kid1| assertion failed: stmem.cc:98: "lowestOffset () <= target_offset"
2018/11/14 09:00:51 kid1| assertion failed: stmem.cc:98: "lowestOffset () <= target_offset"
2018/11/14 09:05:56 kid1| assertion failed: stmem.cc:98: "lowestOffset () <= target_offset"

From jail host's messages.log:

Nov 14 08:43:52 warden4 kernel: pid 95207 (squid), uid 100: exited on signal 6 (core dumped)
Nov 14 08:43:55 warden4 kernel: Limiting closed port RST response from 305 to 200 packets/sec
Nov 14 08:45:27 warden4 kernel: sonewconn: pcb 0xfffff801f2145910: Listen queue overflow: 193 already in queue awaiting acceptance (1394 occurrences)
Nov 14 08:46:03 warden4 kernel: pid 96661 (squid), uid 100: exited on signal 6 (core dumped)
Nov 14 08:46:04 warden4 kernel: Limiting closed port RST response from 567 to 200 packets/sec
Nov 14 08:50:35 warden4 kernel: sonewconn: pcb 0xfffff8016dd02910: Listen queue overflow: 193 already in queue awaiting acceptance (1647 occurrences)
Nov 14 08:51:25 warden4 kernel: pid 98427 (squid), uid 100: exited on signal 6 (core dumped)
Nov 14 08:51:26 warden4 kernel: Limiting closed port RST response from 586 to 200 packets/sec
Nov 14 08:55:59 warden4 kernel: sonewconn: pcb 0xfffff80288af5570: Listen queue overflow: 193 already in queue awaiting acceptance (1793 occurrences)
Nov 14 08:56:32 warden4 kernel: pid 2397 (squid), uid 100: exited on signal 6 (core dumped)
Nov 14 08:56:34 warden4 kernel: Limiting closed port RST response from 489 to 200 packets/sec
Nov 14 09:01:10 warden4 kernel: sonewconn: pcb 0xfffff80152101cb0: Listen queue overflow: 193 already in queue awaiting acceptance (1460 occurrences)
Nov 14 09:01:37 warden4 kernel: pid 5938 (squid), uid 100: exited on signal 6 (core dumped)
Comment 12 Pavel Timofeev 2018-11-17 08:07:20 UTC
(In reply to Marko Cupać from comment #11)
Then it's better to create a ticket in squid bugzilla.
This particular issue was fixed in scope of https://bugs.squid-cache.org/show_bug.cgi?id=4889 (I hope)
Comment 13 Kurt Jaeger freebsd_committer freebsd_triage 2018-12-22 20:47:41 UTC
I assume that this is solved (as the port is already @4.4
Comment 14 Kubilay Kocak freebsd_committer freebsd_triage 2018-12-23 01:31:46 UTC
Resolved (indirectly) by swills via bug 233845 in ports r487000