Bug 212920

Summary:	Highly loaded web server cath race condition on _close () from /lib/libc.so.7 with accf_http
Product:	Base System	Reporter:	Alexey <ucu8u1b-ol>
Component:	kern	Assignee:	Julien Charbon <jch>
Status:	Closed FIXED
Severity:	Affects Some People	CC:	chris, cstdenis, glebius, iam, jcharbon
Priority:	---
Version:	10.3-STABLE
Hardware:	amd64
OS:	Any

Description Alexey 2016-09-23 10:10:04 UTC

Hello!

Recently we upgraded our high loaded web server to FREEBSD-STABLE 10.3 r305091 and got problem with NGINX (nginx-1.10.1_2,2 compiled from latest ports with most default settings). After some time one worker stopped answer requests and top command shows it in state soclos
 1072 nobody           1  22    0  1698M 65680K soclos  5   0:13   0.00% nginx

after short while next worker stops in same state and so on untill all workers become "soclos" and web server stops serve requests (but still accept connections, which die on timeout after client sent a request). Increasing workers count only move problem to next half an hour.

Restarting nginx fix for some not so long time. Server is more or less high loaded with 1000-2000 request/sec. Actually server is frontend proxy with proxy_cache functionality. We tried on 2 different phisical servers with actually different NICs and CPUs. When we returned kernel (only kernel and modules at /boot/kernel, not world) to r302223, problem gone.

We tried to upgrade to yesterdey's r306194. Problem is still here. Something 
changed between end of Jun and end of Aug in kernel code what generate a problem

backtrace from nginx while it in "soclos"

#0  0x0000000801a17d28 in _close () from /lib/libc.so.7
#1  0x000000080098a925 in pthread_suspend_all_np () from /lib/libthr.so.3
#2  0x00000000004329b9 in ngx_close_connection (c=0x869c1de70) at src/core/ngx_connection.c:1169
#3  0x0000000000486370 in ngx_http_close_connection (c=0x869c1de70) at src/http/ngx_http_request.c:3543
#4  0x0000000000488e86 in ngx_http_close_request (r=0x80244c050, rc=408) at src/http/ngx_http_request.c:3406
#5  0x000000000048d9ed in ngx_http_process_request_headers (rev=0x807810b70) at src/http/ngx_http_request.c:1202
#6  0x000000000044fdbd in ngx_event_expire_timers () at src/event/ngx_event_timer.c:94
#7  0x000000000044e60f in ngx_process_events_and_timers (cycle=0x802488050) at src/event/ngx_event.c:256
#8  0x000000000045f406 in ngx_worker_process_cycle (cycle=0x802488050, data=0xa) at src/os/unix/ngx_process_cycle.c:753
#9  0x000000000045ae7c in ngx_spawn_process (cycle=0x802488050, proc=0x45f2f0 <ngx_worker_process_cycle>, data=0xa, name=0x53ecea "worker process", respawn=-3) at src/os/unix/ngx_process.c:198
#10 0x000000000045cc89 in ngx_start_worker_processes (cycle=0x802488050, n=16, type=-3) at src/os/unix/ngx_process_cycle.c:358
#11 0x000000000045c486 in ngx_master_process_cycle (cycle=0x802488050) at src/os/unix/ngx_process_cycle.c:130
#12 0x0000000000413288 in main (argc=1, argv=0x7fffffffead0) at src/core/nginx.c:367

(gdb) list src/core/ngx_connection.c:1169
1164   
1165        if (c->shared) {
1166            return;
1167        }
1168   
1169        if (ngx_close_socket(fd) == -1) { <<<<<<<<
1170   
1171            err = ngx_socket_errno;
1172   
1173            if (err == NGX_ECONNRESET || err == NGX_ENOTCONN) {

and actually called close(fd):
#define ngx_close_socket    close

All TCP sessions opened by worker frose in present state.

Same if we do not load and do not use in nginx config accf_http, problem not repeased with all 3 tested kernels

kernel GENERIC and only extra accf_http ipmi smbus mfip ums zfs and opensolaris  module loaded

As long as  accf_http did some good for our server, we can not simple disabe the module in production env.

I'll debug more, but as long as I'm not is good C programmer, it will take some time. If someone knows what changed in related functions, may be it will be faster to check from that side..

Comment 1 Mikanoshi 2016-09-24 09:06:50 UTC

I have the same problem on 11.0-RC3 if reset_timedout_connection is enabled in Nginx (accept filters are enabled too).

Running truss on a worker stuck in soclos produces this output:
    close(49) ERR#9 'Bad file descriptor'
    write(36,"2016/09/19 18:16:03 [crit] 15495#0: close() socket 49 failed (9: Bad file descriptor)\n",86) = 86 (0x56)
and then worker starts to process requests as usual.

Such entries are written to log when reset_timedout_connection is disabled (workers do not stuck in this case):
[info] 89888#0: close() socket 12 failed (54: Connection reset by peer)

Problem appeared after moving from FreeBSD 9.2 to 11.0-RC3.

Discussion on Nginx forum (in russian):
https://forum.nginx.org/read.php?21,269501,269778#msg-269778

Comment 2 Alexey 2016-09-27 13:01:17 UTC

I checked some SVN versions between r302223 and r305091 (10-STABLE)

problem started since r302995
https://svnweb.freebsd.org/base?view=revision&sortby=date&revision=302995

Also, if I remove accept_filter=httpready from listen in nginx or disbale reset_timedout_connection option in nginx, problem is not repeated with SVN verions after r302995.

/Alexey

Comment 3 jcharbon 2016-09-28 09:02:29 UTC

Thanks for reporting this issue, this is quite interesting.

#1. Could you revert r302995 and check that everything is fine without it?  It will increase a bit the contention on accept() but at 2000 req/sec it is fine, it starts to matter around 30k accept/sec.

#2. Then could you try this change with r302995 in place:

diff --git a/sys/netinet/tcp_syncache.c b/sys/netinet/tcp_syncache.c
index f71d028..6d7ab69 100644
--- a/sys/netinet/tcp_syncache.c
+++ b/sys/netinet/tcp_syncache.c
@@ -922,7 +922,9 @@ syncache_socket(struct syncache *sc, struct socket *lso, struct mbuf *m)
 
        INP_WUNLOCK(inp);
 
-       soisconnected(so);
+       if ((so->so_options & SO_ACCEPTFILTER) == 0) {
+               soisconnected(so);
+       }
 
        TCPSTAT_INC(tcps_accepts);
        return (so);


I just smoke tested it but not validated it, I am currently setting up nginx and accept_filter to reproduce your issue.

The rational is that r302995 and using accept_filter=httpready are quite symmetric but in r302995+accept_filter=httpready soisconnected() case soisconnected() is called sooner than later, and in a different locking context.

Comment 4 Alexey 2016-09-28 10:17:02 UTC

Thank you for you answer Julien. 

I tested both r302995 and r302994. In r302995 problem exist and in r302994 is not.

I'll compile r302995+your patch right now and will be back with results.

If you want to test, please check you have "reset_timedout_connection on;" in nginx.conf also. As It OFF by default, and problem is not repeated without reset_timedout_connection=on.

/Alexey

Comment 5 Alexey 2016-09-28 13:05:46 UTC

1-st result after 1 hour: everything works OK with r302995M (+patch). We'll test more.

Comment 6 Julien Charbon freebsd_committer

2016-09-28 16:18:58 UTC

Ok, thanks for your time testing.  Actually the more I look at the code, the more I think this soisconnected() call is useless in all cases (accept filter or not).

If your test is ok I will create a review to see if others think the same.

Comment 7 Alexey 2016-09-28 16:52:29 UTC

Thank you! Looks like patch solved our problem. 5 hours under test works OK (when problem exists, 1-st worker stopped after 5-10 minutes and all workers stopped no more then in 1 hour under same load.)

Comment 8 Julien Charbon freebsd_committer

2016-09-29 07:19:38 UTC

Ok, I will push this exact same change in HEAD. And MFC it in stable/10 and stable/11.  I was not able to reproduce your issue, but I did not see any changes with or without the patch.

I will later create a review to completely remove this extraneous soisconnected() call.

Thanks again.

Comment 9 commit-hook freebsd_committer

2016-09-29 11:19:49 UTC

A commit references this bug:

Author: jch
Date: Thu Sep 29 11:18:48 UTC 2016
New revision: 306443
URL: https://svnweb.freebsd.org/changeset/base/306443

Log:
  Fix an issue with accept_filter introduced with r261242:

  As a side effect of r261242 when using accept_filter the
  first call to soisconnected() is done earlier in tcp_input()
  instead of tcp_do_segment() context.  Restore the expected behaviour.

  Note:  This call to soisconnected() seems to be extraneous in all
  cases (with or without accept_filter).  Will be addressed in a
  separate commit.

  PR:			212920
  Reported by:		Alexey
  Tested by:              Alexey, jch
  Sponsored by:           Verisign, Inc.
  MFC after:		1 week

Changes:
  head/sys/netinet/tcp_syncache.c

Comment 10 Julien Charbon freebsd_committer

2016-09-29 11:44:42 UTC

The review for removing this extraneous soisconnected() call:

Remove an extraneous call to soisconnected() in syncache_socket(), introduced with r261242.
https://reviews.freebsd.org/D8072

Comment 11 commit-hook freebsd_committer

2016-10-09 21:03:23 UTC

A commit references this bug:

Author: jch
Date: Sun Oct  9 21:02:34 UTC 2016
New revision: 306922
URL: https://svnweb.freebsd.org/changeset/base/306922

Log:
  MFC r306443:

  Fix an issue with accept_filter introduced with r261242:

  As a side effect of r261242 when using accept_filter the
  first call to soisconnected() is done earlier in tcp_input()
  instead of tcp_do_segment() context.  Restore the expected behaviour.

  Note:  This call to soisconnected() seems to be extraneous in all
  cases (with or without accept_filter).  Will be addressed in a
  separate commit.

  PR:			212920
  Reported by:		Alexey
  Tested by:		Alexey, jch
  Sponsored by:		Verisign, Inc.

Changes:
_U  stable/11/
  stable/11/sys/netinet/tcp_syncache.c

Comment 12 commit-hook freebsd_committer

2016-10-09 21:35:31 UTC

A commit references this bug:

Author: jch
Date: Sun Oct  9 21:35:12 UTC 2016
New revision: 306925
URL: https://svnweb.freebsd.org/changeset/base/306925

Log:
  MFC r306443:

  Fix an issue with accept_filter introduced with r261242:

  As a side effect of r261242 when using accept_filter the
  first call to soisconnected() is done earlier in tcp_input()
  instead of tcp_do_segment() context.  Restore the expected behaviour.

  Note:  This call to soisconnected() seems to be extraneous in all
  cases (with or without accept_filter).  Will be addressed in a
  separate commit.

  PR:			212920
  Reported by:		Alexey
  Tested by:		Alexey, jch
  Sponsored by:		Verisign, Inc.

Changes:
  stable/10/sys/netinet/tcp_syncache.c

Comment 13 Alexey 2016-10-11 16:06:27 UTC

Hello!


After update to r306942 problem gone

Thanks a lot!

With best regards
/Alexey

Comment 14 Julien Charbon freebsd_committer

2016-10-19 08:03:01 UTC

Thanks Alexey for your time and clear bug report.

Comment 15 cstdenis 2017-04-17 06:54:07 UTC

Please push this fix out to 11.0-RELEASE, it is a problematic bug that has been fixed for months but never pushed to release versions.

Comment 16 Gleb Smirnoff freebsd_committer

2017-04-17 15:51:37 UTC

Sorry, we don't have time machine, thus we can't push it to 11.0-RELEASE, which was cut before the bug was fixed.

The fix is in the stable/11 branch, so the future 11.1-RELEASE will be not affected.