Bug 237568 - nginx causes host panic when closing socket descriptor?
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-RELEASE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Kubilay Kocak
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2019-04-25 22:03 UTC by rlwestlund
Modified: 2019-10-15 01:54 UTC
CC List: 4 users

See Also:
koobs: mfc-stable11?
koobs: mfc-stable12?


Attachments
Stack trace from kernel panic (1.34 KB, text/plain)
2019-04-25 22:03 UTC, rlwestlund

Description rlwestlund 2019-04-25 22:03:41 UTC
Created attachment 204020
Stack trace from kernel panic

Our setup: FreeBSD 12.0-RELEASE-p3 running deployments of a web service in multiple jails. Each jail runs its own nginx, which serves as an upstream for the nginx running on the host; the host nginx is a reverse proxy (I think that's the right term?). The in-jail nginx instances are themselves partly reverse proxies to a Go program: nginx serves static files itself, while dynamic content comes from the Go server.

Problem: Occasionally, loading the web page crashes the entire host.

There doesn't seem to be any reliable way to reproduce it. We've tried turning HTTP/2 on and off, and it happens either way. We haven't found any combination of circumstances that makes it happen more than occasionally (I'd estimate roughly 1 in 20 page reloads on average). Several times we made a change and thought we had fixed it after more than 50 successful reloads, only to find that it still crashes occasionally.

I'll attach the stack trace from the crash.


Kernel: GENERIC
Architecture: amd64
nginx version: 1.14.2_13,2
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2019-10-13 22:01:47 UTC
@rlwestlund Could you please also provide:

- Complete uname -a
- /var/run/dmesg.boot output (as an attachment)
- /etc/rc.conf output (sanitized where necessary) as an attachment
- nginx configuration (sanitized where necessary) as an attachment
- pkg version -v output (as an attachment)
Comment 2 rlwestlund 2019-10-13 22:15:52 UTC
Sorry, but I'm afraid the server was wiped since then. I'll see if we can reproduce it now.
Comment 3 rlwestlund 2019-10-14 13:22:49 UTC
We're unable to reproduce it now. It may have been fixed in the patches since then. I did see another symptom earlier that might be related: jails failing to unmount on jail -r with "Device busy". We had previously traced that error to TCP connections being left open in TIME_WAIT state, but on the server that used to crash it happened every time, even if the jail had just been started. We had since solved it with a tcpdrop script as an exec.poststop hook, but on the server I just spun up for testing it still happened a few times. After I looked into it and found nothing wrong with the tcpdrop script, I could no longer get the Device busy error to occur. I'm not sure if this is related.
Comment 4 Kubilay Kocak freebsd_committer freebsd_triage 2019-10-15 01:54:27 UTC
@rlwestlund Please re-open the issue if it becomes reproducible again.