210642 – www/nginx: "service nginx upgrade" results in shutdown() failed (54: Connection reset by peer)

Status:	Closed Not A Bug

Alias:	None

Product:	Ports & Packages
Classification:	Unclassified
Component:	Individual Port(s)
Version:	Latest
Hardware:	Any Any

Importance:	--- Affects Some People
Assignee:	Sergey A. Osokin

Reported:	2016-06-27 20:56 UTC by Adam Strohl
Modified:	2016-07-26 23:29 UTC
CC List:	1 user

Flags:	bugzilla: maintainer-feedback? (osa)

Attachments
NginX config file (2.14 KB, text/plain) 2016-06-27 22:16 UTC, Adam Strohl
Default site config (172 bytes, text/plain) 2016-06-27 22:35 UTC, Adam Strohl

Description Adam Strohl 2016-06-27 20:56:24 UTC

We use 'service nginx upgrade' to restart NginX processes seamlessly on high-volume sites when needed (i.e., after an upgrade).

We recently upgraded most of our clients to 1.10.1 and started seeing this in the log when we run the 'service nginx upgrade' command:
Code:

2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [alert] 73753#0: worker process 73757 exited on signal 11
2016/06/27 00:18:56 [emerg] 74294#0: bind() to 1.2.3.4:443 failed (48: Address already in use)
nginx: [emerg] bind() to 1.2.3.4:443 failed (48: Address already in use)
2016/06/27 00:18:56 [emerg] 74294#0: still could not bind()
nginx: [emerg] still could not bind()

This happens even on idle servers (backups) with no traffic.

The end result is that NginX exits entirely and doesn't come back up.  Waiting a few seconds and then running 'service nginx start' works, but by that point a busy site has missed a lot of traffic.
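
For context, nginx's documented on-the-fly binary upgrade works by sending USR2 to the running master, which renames its pid file with an .oldbin suffix and starts a new master that inherits the listening sockets; the old master is then told to exit with QUIT. The log above is consistent with the old master being stopped while the new one was still failing to bind(). The following is only a minimal sketch of a more defensive manual sequence, not the port's rc script: the function name safe_upgrade, the KILL override and the retry count are hypothetical, and it assumes the new master writes its pid file only after it has bound its sockets.

```sh
#!/bin/sh
# Hypothetical helper (not part of the www/nginx port).
# Usage: safe_upgrade [pidfile] [tries]
# KILL may be set to another command for a dry run (assumption for illustration).
safe_upgrade() {
    pidfile=${1:-/var/run/nginx.pid}
    tries=${2:-10}
    kill_cmd=${KILL:-kill}

    oldpid=$(cat "$pidfile" 2>/dev/null) || return 1
    [ -n "$oldpid" ] || return 1

    # USR2 asks the running master to start a new master from the binary on disk.
    $kill_cmd -USR2 "$oldpid" || return 1

    # Wait until both pid files exist and the new one names a different process.
    while [ "$tries" -gt 0 ]; do
        if [ -f "$pidfile.oldbin" ] && [ -f "$pidfile" ]; then
            newpid=$(cat "$pidfile")
            if [ -n "$newpid" ] && [ "$newpid" != "$oldpid" ]; then
                # The new master is up, so the old one can exit gracefully.
                $kill_cmd -QUIT "$oldpid"
                return
            fi
        fi
        sleep 1
        tries=$((tries - 1))
    done

    # Never send QUIT if the new master did not appear.
    echo "new master did not come up; old master $oldpid left running" >&2
    return 1
}
```

Called as `safe_upgrade /var/run/nginx.pid`, the function leaves the old master serving traffic when the new one fails to start, instead of stopping it after a fixed delay.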

Comment 1 Sergey A. Osokin freebsd_committer

2016-06-27 22:12:10 UTC

Hi Adam,

Thanks for the report!

To reproduce the issue, please attach a copy of the nginx.conf to the bug report.

Comment 2 Adam Strohl 2016-06-27 22:16:39 UTC

Created attachment 171890 [details]
NginX config file

Comment 3 Sergey A. Osokin freebsd_committer

2016-06-27 22:23:45 UTC

The attached nginx.conf isn't complete because it contains the following directive:

include /share/conf/nginx/sites/*.conf

Please attach all those files too.

Comment 4 Adam Strohl 2016-06-27 22:35:05 UTC

Created attachment 171891 [details]
Default site config

I have found the issue exists even with just this single default site config.

One thing I did note is that you need to make at least one HTTP call post-startup to see the issue.  If nginx hasn't handled at least one request, it seems to function fine.  After 1 request though you'll see the shutdown() errors.

Comment 5 Sergey A. Osokin freebsd_committer

2016-06-27 22:51:40 UTC

Could you please also provide some details about the environment where this nginx instance is running, i.e.:

o) is it a physical box or a VM?
o) is it a jail?
o) output of the uname -spr command
o) output of nginx -V

Thanks.
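
The details requested above can be gathered in one pass. This is a rough sketch rather than an official procedure: collect_env is a hypothetical name, and the two sysctl names are the FreeBSD ones for jail and hypervisor detection; on systems where they are missing the function prints "unknown".

```sh
#!/bin/sh
# Hypothetical helper: print the environment details asked for in this comment.
collect_env() {
    echo "uname -spr: $(uname -spr)"
    # security.jail.jailed is 1 inside a jail and 0 on the host (FreeBSD).
    echo "jailed: $(sysctl -n security.jail.jailed 2>/dev/null || echo unknown)"
    # kern.vm_guest names the hypervisor, or "none" on physical hardware (FreeBSD).
    echo "vm_guest: $(sysctl -n kern.vm_guest 2>/dev/null || echo unknown)"
    if command -v nginx >/dev/null 2>&1; then
        # nginx -V writes its version and configure arguments to stderr.
        nginx -V 2>&1
    else
        echo "nginx: not found in PATH"
    fi
}
```

Running `collect_env > env.txt` produces a file that can be attached to the report.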

Comment 6 Adam Strohl 2016-06-27 23:06:33 UTC

This affects a number of our machines, so I've been trying to give you a typical configuration. So far they are all physical servers.

nginx version: nginx/1.10.1
built with OpenSSL 1.0.2h  3 May 2016
TLS SNI support enabled
configure arguments: --prefix=/usr/local/etc/nginx --with-cc-opt='-I /usr/local/include' --with-ld-opt='-L /usr/local/lib' --conf-path=/usr/local/etc/nginx/nginx.conf --sbin-path=/usr/local/sbin/nginx --pid-path=/var/run/nginx.pid --error-log-path=/var/log/nginx-error.log --user=www --group=www --modules-path=/usr/local/libexec/nginx --with-debug --with-ipv6 --http-client-body-temp-path=/var/tmp/nginx/client_body_temp --http-fastcgi-temp-path=/var/tmp/nginx/fastcgi_temp --http-proxy-temp-path=/var/tmp/nginx/proxy_temp --http-scgi-temp-path=/var/tmp/nginx/scgi_temp --http-uwsgi-temp-path=/var/tmp/nginx/uwsgi_temp --http-log-path=/var/log/nginx-access.log --add-dynamic-module=/wrkdirs/usr/ports/www/nginx/work/nginx-auth-ldap-dbcef31 --with-http_dav_module --add-module=/wrkdirs/usr/ports/www/nginx/work/nginx-dav-ext-module-0.0.3 --add-module=/wrkdirs/usr/ports/www/nginx/work/nginx-notice-3c95966 --add-module=/wrkdirs/usr/ports/www/nginx/work/nchan-0.99.15 --with-http_slice_module --with-http_stub_status_module --with-http_sub_module --add-module=/wrkdirs/usr/ports/www/nginx/work/nginx_upstream_check_module-10782ea --add-module=/wrkdirs/usr/ports/www/nginx/work/nginx_upstream_fair-20090923 --with-pcre --with-http_v2_module --with-stream=dynamic --with-stream_ssl_module --with-http_ssl_module


The FreeBSD versions differ; here are a few examples:

FreeBSD 10.1-RELEASE-p26 amd64 (some)
FreeBSD 10.1-RELEASE-p31 amd64 (most of them)

Comment 7 Sergey A. Osokin freebsd_committer

2016-06-28 01:04:18 UTC

Hi Adam,

I've just tried to reproduce the issue on my machine but had no luck.

Two recommendations here:
o) recompile nginx without third-party modules and try to reproduce the issue;
o) check whether anything else on the box is listening on port 443.
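
The second check can be done with FreeBSD's sockstat. As a sketch, the helper below (foreign_listeners is a hypothetical name) filters `sockstat -4 -6 -l` output read from standard input and prints any listener on the given port whose command is not nginx. It assumes the usual column order of USER, COMMAND, PID, FD, PROTO, LOCAL ADDRESS, FOREIGN ADDRESS.

```sh
#!/bin/sh
# Hypothetical helper: list non-nginx listeners on a port.
# Reads sockstat-style output on stdin; prints "command pid local-address".
foreign_listeners() {
    port=$1
    awk -v port="$port" '
        # Skip the header line, ignore nginx itself, match ":<port>" at the end
        # of the local address column.
        NR > 1 && $2 != "nginx" && $6 ~ (":" port "$") { print $2, $3, $6 }
    '
}
```

A typical call would be `sockstat -4 -6 -l | foreign_listeners 443`; empty output means no other program holds the port.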

Comment 8 ganbold-freebsd 2016-07-10 05:17:21 UTC

When using port 80 I can't reproduce the issue.
However, when using SSL, sending the USR2 signal does not seem to work, which reproduces this problem.

Can you try to reproduce it with SSL?

thanks,

Comment 9 Sergey A. Osokin freebsd_committer

2016-07-14 03:04:01 UTC

Dear Ganbold,

could you please provide the nginx.conf that reproduces the issue?
Thanks,

Comment 10 Sergey A. Osokin freebsd_committer

2016-07-14 22:43:09 UTC

Ganbold,

could you also please try to reproduce the issue with a vanilla version of nginx (i.e. without third-party modules)?

Please also provide the output of nginx -V.

Thanks.

Comment 11 ganbold-freebsd 2016-07-19 00:23:10 UTC

I've tried to reproduce it in an environment as close to production as possible.
We found the cause, and it wasn't directly related to nginx.
The OpenSSL package was built with the ASM option (optimized assembler code), which somehow made nginx fail when running 'service nginx upgrade'.
In this case the logs look similar to the following (note the 'invalid socket number' line):
...
2016/07/18 12:31:02 [notice] 53854#0: using inherited sockets from "6;�'"
2016/07/18 12:31:02 [emerg] 53854#0: invalid socket number "�'" in NGINX environment variable, ignoring
nginx: [emerg] invalid socket number "�'" in NGINX environment variable, ignoring
2016/07/18 12:31:02 [emerg] 53854#0: bind() to 192.168.0.200:443 failed (48: Address already in use)
nginx: [emerg] bind() to 192.168.0.200:443 failed (48: Address already in use)
2016/07/18 12:31:02 [emerg] 53854#0: bind() to 192.168.0.200:443 failed (48: Address already in use)
nginx: [emerg] bind() to 192.168.0.200:443 failed (48: Address already in use)
2016/07/18 12:31:02 [emerg] 53854#0: bind() to 192.168.0.200:443 failed (48: Address already in use)
nginx: [emerg] bind() to 192.168.0.200:443 failed (48: Address already in use)
2016/07/18 12:31:02 [emerg] 53854#0: bind() to 192.168.0.200:443 failed (48: Address already in use)
nginx: [emerg] bind() to 192.168.0.200:443 failed (48: Address already in use)
2016/07/18 12:31:02 [emerg] 53854#0: bind() to 192.168.0.200:443 failed (48: Address already in use)
nginx: [emerg] bind() to 192.168.0.200:443 failed (48: Address already in use)
2016/07/18 12:31:02 [emerg] 53854#0: still could not bind()
nginx: [emerg] still could not bind()
...

When OpenSSL was built without the ASM option, the log shows the following:

2016/07/18 12:37:14 [notice] 53934#0: using inherited sockets from "6;7;"


So I think this PR can be closed since the issue is not really related to nginx.

thanks a lot,
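
The two "using inherited sockets" lines in this comment differ only in the socket list handed to the new master: the healthy value "6;7;" is a run of descriptor numbers, each followed by a separator, while the failing value has garbage after the first descriptor. A small sketch of a format check follows. The name valid_inherited_sockets is hypothetical, and the accepted separators (';' or ':') are an assumption about what nginx parses.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the argument looks like a well-formed
# inherited-socket list such as "6;7;".
valid_inherited_sockets() {
    # One or more decimal fd numbers, each followed by ';' or ':'.
    # LC_ALL=C makes grep treat stray high bytes as plain non-matching bytes.
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^([0-9]+[;:])+$'
}
```

For example, `valid_inherited_sockets "6;7;"` succeeds, whereas a value with any non-digit, non-separator byte makes it fail.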

Comment 12 Sergey A. Osokin freebsd_committer

2016-07-26 23:29:28 UTC

Closing this PR because the issue is reproducible only when nginx is built against OpenSSL compiled with the ASM option.