Created attachment 204020
Stack trace from kernel panic
Our setup: FreeBSD 12.0-RELEASE-p3 running deployments of a web service in multiple jails. Each jail runs nginx as an upstream to the nginx running on the host, which acts as a reverse proxy (I think that's the right term?). The in-jail nginx instances are themselves partially reverse proxies to a Go program: static files are served by nginx itself, while dynamic content comes from the Go server.
Problem: Occasionally, loading the web page crashes the entire host.
There doesn't seem to be any reliable way to reproduce it. We've tried turning HTTP/2 on and off, and it happens either way. We haven't found any combination of circumstances that causes it to happen more than occasionally (I'd estimate roughly 1 in 20 page reloads on average). Several times we've made a change and thought we had fixed it after >50 successful reloads, only to find that it still crashes occasionally.
I'll attach the stack trace from the crash.
nginx version: 1.14.2_13,2
@rlwestlund Could you please also provide:
- Complete uname -a
- /var/run/dmesg.boot output (as an attachment)
- /etc/rc.conf output (sanitized where necessary) as an attachment
- nginx configuration (sanitized where necessary) as an attachment
- pkg version -v output (as an attachment)
Sorry, but I'm afraid the server was wiped since then. I'll see if we can reproduce it now.
We're unable to reproduce it now. It may have been fixed by the patches released since then. I did see another symptom earlier that might be related: the jails failing to unmount on jail -r with "Device busy". Previously we traced that error to TCP connections being left open in the TIME_WAIT state, but on the server that used to crash it always happened, even if the jail had just been started. We had since worked around it with a tcpdrop script as an exec.poststop hook, but on the server I just spun up for testing it still happened a few times. After I looked into it and couldn't find anything wrong with the tcpdrop script, I couldn't get the Device busy error to happen anymore. I'm not sure whether this is related.
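For anyone following along, here is a rough sketch of the kind of tcpdrop filtering such a poststop hook could do. This is not our actual script: the function name filter_jail_conns and the address 10.0.0.5 are made up for illustration, and it assumes that tcpdrop -l -a prints one line per connection in the form "tcpdrop <local-addr> <local-port> <foreign-addr> <foreign-port>".

```shell
#!/bin/sh
# Hypothetical helper (not the script from this report).
# Reads candidate "tcpdrop ..." command lines on stdin and keeps only those
# whose local or foreign address equals the given jail IP.
# Assumed line format: tcpdrop <laddr> <lport> <faddr> <fport>
filter_jail_conns() {
    jail_ip="$1"
    awk -v ip="$jail_ip" '$1 == "tcpdrop" && ($2 == ip || $4 == ip)'
}

# Possible use from a poststop hook (assumption, not executed here):
#   tcpdrop -l -a | filter_jail_conns 10.0.0.5 | sh
```

Filtering by address before piping to sh keeps the hook from dropping connections that belong to the host or to other jails.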
@rlwestlund Please re-open the issue if it becomes reproducible once more.