Bug 252553 - HardenedBSD Kernel panic on big network and IO load
Summary: HardenedBSD Kernel panic on big network and IO load
Status: Closed Not Accepted
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
Keywords: panic
Reported: 2021-01-10 09:16 UTC by dmilith
Modified: 2021-01-10 18:23 UTC

Description dmilith 2021-01-10 09:16:38 UTC
Hello.

I'm struggling with a very annoying kernel panic on one of my dedicated servers.

Long story short: it's an i7 machine with 8 cores and 64 GiB of RAM, running an ELK server (Elasticsearch 5.6 + Kibana 5.6 + Logstash 5.6) that gathers application logs from our 350+ external hosts.

In other words, it's under heavy network and IO load. I'm also using ZFS on root with sync=disabled on the datasets (this way, after a panic we lose only part of the logs and the whole ES index doesn't get corrupted).
The system has 64G of swap on ZFS enabled, but it never really fills up.

Every 5-6 hours (sometimes 5-6 days…), the kernel just panics.
Yesterday I built a fresh 12.2 kernel with the NETDUMP feature enabled and set up netdumpd on our second dedicated machine.

As a result, I woke up to a minidump with some information about what is (hopefully) causing the panic:

https://gist.github.com/dmilith/9606ebf422ae1770b42d9e23f1116c7b
Comment 1 Ed Maste freebsd_committer 2021-01-10 17:22:25 UTC
This looks like memory corruption; even if it is due to an issue in vanilla FreeBSD, there is not enough information here to fix it.

If you can reproduce this on FreeBSD and store the vmcore somewhere we might be able to solve it, or if you can track down the cause we'll see about a fix.
Comment 2 Mark Johnston freebsd_committer 2021-01-10 18:23:29 UTC
There was a similar report on FreeBSD stable/12 recently, so I suspect it is a FreeBSD bug.  Maybe only in 12; UMA in head has changed a lot.

So there is a minidump?  Can you show the backtrace from kgdb?  In particular I would like to see line numbers for the stack trace.  The linked stack trace is from DDB, the in-kernel debugger.