We use 'service nginx upgrade' to restart NginX processes seamlessly on high-volume sites when needed (i.e., for an upgrade). We recently upgraded most of our clients to 1.10.1 and started seeing this in the log when we run the 'service nginx upgrade' command:

Code:
2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
2016/06/27 00:18:58 [alert] 73753#0: worker process 73757 exited on signal 11
2016/06/27 00:18:56 [emerg] 74294#0: bind() to 1.2.3.4:443 failed (48: Address already in use)
nginx: [emerg] bind() to 1.2.3.4:443 failed (48: Address already in use)
2016/06/27 00:18:56 [emerg] 74294#0: still could not bind()
nginx: [emerg] still could not bind()

This happens even on idle servers (backups) with no traffic. The end result is that NginX exits entirely and doesn't come back up. Waiting a few seconds and then running 'service nginx start' works, but by that point a busy site has missed a lot of traffic.
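The two symptoms in the log above (a worker exiting on signal 11 and the failed bind()) can be picked out mechanically when checking many machines. A minimal sketch in Python, assuming error-log lines follow the "date time [level] pid#tid: message" layout seen in this report; the name `summarize` is hypothetical and not part of nginx:

```python
import re
from collections import Counter

# Matches lines such as:
# 2016/06/27 00:18:58 [error] 73757#0: shutdown() failed (54: Connection reset by peer)
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] (?P<pid>\d+)#\d+: (?P<msg>.*)$"
)

def summarize(lines):
    """Count messages per level and flag the two symptoms from this report."""
    levels = Counter()
    crashed = False
    bind_failed = False
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # e.g. the "nginx: [emerg] ..." copies without a timestamp
        levels[m.group("level")] += 1
        msg = m.group("msg")
        if "exited on signal 11" in msg:
            crashed = True
        if msg.startswith("bind() to") and "Address already in use" in msg:
            bind_failed = True
    return levels, crashed, bind_failed
```

Feeding it the lines of /var/log/nginx-error.log (the error-log path from the configure arguments in this report) returns the per-level counts plus two booleans for the crash and the bind failure.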
Hi Adam, thanks for the report! To reproduce the issue, please attach a copy of your nginx.conf to the bug report.
Created attachment 171890 [details] NginX config file
The attached nginx.conf isn't complete because it contains the following directive:

include /share/conf/nginx/sites/*.conf

Please attach all of those files too.
Created attachment 171891 [details] Default site config

I have found that the issue exists even with just this single default site config. One thing I did note is that you need to make at least one HTTP call post-startup to see the issue. If nginx hasn't handled at least one request, it seems to function fine. After one request, though, you'll see the shutdown() errors.
Could you please also provide details about the environment where this nginx instance is running, i.e.:
o) is it a physical box or a VM?
o) is it a jail?
o) output of the uname -spr command
o) output of nginx -V
Thanks.
This affects a number of our machines, so I've been trying to give you a typical configuration. So far they are all physical servers.

nginx version: nginx/1.10.1
built with OpenSSL 1.0.2h 3 May 2016
TLS SNI support enabled
configure arguments: --prefix=/usr/local/etc/nginx --with-cc-opt='-I /usr/local/include' --with-ld-opt='-L /usr/local/lib' --conf-path=/usr/local/etc/nginx/nginx.conf --sbin-path=/usr/local/sbin/nginx --pid-path=/var/run/nginx.pid --error-log-path=/var/log/nginx-error.log --user=www --group=www --modules-path=/usr/local/libexec/nginx --with-debug --with-ipv6 --http-client-body-temp-path=/var/tmp/nginx/client_body_temp --http-fastcgi-temp-path=/var/tmp/nginx/fastcgi_temp --http-proxy-temp-path=/var/tmp/nginx/proxy_temp --http-scgi-temp-path=/var/tmp/nginx/scgi_temp --http-uwsgi-temp-path=/var/tmp/nginx/uwsgi_temp --http-log-path=/var/log/nginx-access.log --add-dynamic-module=/wrkdirs/usr/ports/www/nginx/work/nginx-auth-ldap-dbcef31 --with-http_dav_module --add-module=/wrkdirs/usr/ports/www/nginx/work/nginx-dav-ext-module-0.0.3 --add-module=/wrkdirs/usr/ports/www/nginx/work/nginx-notice-3c95966 --add-module=/wrkdirs/usr/ports/www/nginx/work/nchan-0.99.15 --with-http_slice_module --with-http_stub_status_module --with-http_sub_module --add-module=/wrkdirs/usr/ports/www/nginx/work/nginx_upstream_check_module-10782ea --add-module=/wrkdirs/usr/ports/www/nginx/work/nginx_upstream_fair-20090923 --with-pcre --with-http_v2_module --with-stream=dynamic --with-stream_ssl_module --with-http_ssl_module

The FreeBSD versions differ; here are a few examples:
FreeBSD 10.1-RELEASE-p26 amd64 (some)
FreeBSD 10.1-RELEASE-p31 amd64 (most of them)
Hi Adam, I've just tried to reproduce the issue on my machine, but with no luck. Two recommendations:
o) recompile nginx without third-party modules and try to reproduce the issue;
o) check whether anything else is running on the box on port 443.
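One way to run the second check without extra tools is to attempt the bind yourself and look for the same errno that nginx reports. A minimal sketch in Python; the function name `address_in_use` is hypothetical, and binding to a port below 1024 normally requires root, so an unprivileged run against 443 raises a permission error instead of answering the question:

```python
import errno
import socket

def address_in_use(host, port):
    """Return True if binding host:port fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as e:
            if e.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. EACCES when not running as root
    return False
```

Running `address_in_use("1.2.3.4", 443)` as root with nginx stopped shows whether some other process still holds the address. It only answers yes or no; finding the owning process still needs a tool such as sockstat.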
When using port 80 I can't reproduce the issue. However, when using SSL, sending the USR2 signal doesn't seem to work, which reproduces this problem. Can you try to reproduce it with SSL? Thanks,
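For context, the on-the-fly binary upgrade described in the nginx documentation works by sending USR2 to the running master, which renames its pid file with an .oldbin suffix and starts a new master, and then sending QUIT to the old master. A rough sketch of that sequence in Python, assuming the pid path from the nginx -V output in this report; the function name, the injectable `kill` parameter and the one-second pause are illustrative assumptions, and this is not the rc script itself:

```python
import os
import signal
import time

def upgrade_binary(pid_file="/var/run/nginx.pid", kill=os.kill, pause=1.0):
    """Send USR2 to the current master, then QUIT to the old one."""
    with open(pid_file) as f:
        old_master = int(f.read().strip())
    kill(old_master, signal.SIGUSR2)  # ask the master to start the new binary
    time.sleep(pause)                 # give the new master time to come up
    old_pid_file = pid_file + ".oldbin"
    if not os.path.exists(old_pid_file):
        # No .oldbin file: the pid file was never renamed, so leave the
        # old master running rather than taking the site down.
        raise RuntimeError("pid file was not renamed; old master left running")
    with open(old_pid_file) as f:
        kill(int(f.read().strip()), signal.SIGQUIT)  # graceful shutdown
```

The check for the .oldbin file is a design choice of this sketch. It only shows that the old master received USR2 and renamed its pid file, so it would not by itself catch the failure reported here, where the new master starts and then exits at bind(); confirming that the new master is still alive before sending QUIT would be the stronger guard.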
Dear Ganbold, could you please provide the nginx.conf with which the issue is reproducible? Thanks,
Ganbold, could you also please reproduce the issue with a vanilla version of nginx (i.e., without third-party modules)? Please provide the output of nginx -V as well. Thanks.
I tried to reproduce it in an environment as close to production as possible. We found the cause, and it wasn't directly related to nginx. The OpenSSL package was built with the ASM option (optimized assembler code), which somehow made nginx fail when running 'service nginx upgrade'. In this case the logs look like the following (please note the 'invalid socket number' line):

...
2016/07/18 12:31:02 [notice] 53854#0: using inherited sockets from "6;�'"
2016/07/18 12:31:02 [emerg] 53854#0: invalid socket number "�'" in NGINX environment variable, ignoring
nginx: [emerg] invalid socket number "�'" in NGINX environment variable, ignoring
2016/07/18 12:31:02 [emerg] 53854#0: bind() to 192.168.0.200:443 failed (48: Address already in use)
nginx: [emerg] bind() to 192.168.0.200:443 failed (48: Address already in use)
2016/07/18 12:31:02 [emerg] 53854#0: bind() to 192.168.0.200:443 failed (48: Address already in use)
nginx: [emerg] bind() to 192.168.0.200:443 failed (48: Address already in use)
2016/07/18 12:31:02 [emerg] 53854#0: bind() to 192.168.0.200:443 failed (48: Address already in use)
nginx: [emerg] bind() to 192.168.0.200:443 failed (48: Address already in use)
2016/07/18 12:31:02 [emerg] 53854#0: bind() to 192.168.0.200:443 failed (48: Address already in use)
nginx: [emerg] bind() to 192.168.0.200:443 failed (48: Address already in use)
2016/07/18 12:31:02 [emerg] 53854#0: bind() to 192.168.0.200:443 failed (48: Address already in use)
nginx: [emerg] bind() to 192.168.0.200:443 failed (48: Address already in use)
2016/07/18 12:31:02 [emerg] 53854#0: still could not bind()
nginx: [emerg] still could not bind()
...

When OpenSSL was built without the ASM option, the log shows the following:

2016/07/18 12:37:14 [notice] 53934#0: using inherited sockets from "6;7;"

So I think this PR can be closed, since the issue is not really related to nginx. Thanks a lot,
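The difference between the two logs is easier to see when the inherited-sockets value is split the way the messages suggest: a semicolon-terminated list of descriptor numbers, with parsing abandoned at the first entry that is not a number. A simplified model in Python, based only on the log lines above and not on nginx's actual C code; `parse_inherited` is a hypothetical name:

```python
def parse_inherited(value):
    """Split a value such as "6;7;" into descriptor numbers.

    Returns (fds, bad): the descriptors parsed before the first invalid
    entry, and that invalid entry (None if every entry was a number).
    """
    tokens = value.split(";")
    if tokens and tokens[-1] == "":
        tokens.pop()  # the value normally ends with a separator
    fds = []
    for token in tokens:
        if not (token.isascii() and token.isdigit()):
            return fds, token  # everything after this point is ignored
        fds.append(int(token))
    return fds, None
```

With the good value "6;7;" both listening sockets are recovered. With the corrupted value only descriptor 6 survives, so in this model the new master has no inherited socket for the second listener and has to bind() it again, which would explain the "Address already in use" errors that follow while the old master still holds it.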
Closing this PR because the issue is reproducible only when nginx is built against OpenSSL compiled with the ASM option.