Bug 128263

Summary: [panic] 2 amd64 dl380 g5 with dual quadcore xeons, 8 and 16gb ram, crash and dump mem
Product: Base System
Reporter: Martin W <martin.wikesjo>
Component: amd64
Assignee: freebsd-amd64 (Nobody) <amd64>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Martin W 2008-10-21 09:10:01 UTC
Two HP DL380 G5s with dual quad-core Xeons, 8GB RAM on one and 16GB RAM
on the other. They have recently been crashing and dumping memory. The 16GB
one has crashed 3 times in the last month and a half, twice yesterday. The 8GB
one had its first crash yesterday. They are DB servers which run PostgreSQL
or MySQL.

They are both running 7.0 amd64 RELEASE with the GENERIC kernel, completely
default installs except for some sysctl tweaks:

kern.ipc.maxsockets=16424
kern.maxfiles=65536
kern.maxfilesperproc=32768
net.inet.tcp.recvspace=32768
net.inet.tcp.sendspace=65536

Below is the output from kgdb, which looks about the same on both of them.

How-To-Repeat: Unknown; the crashes occur at random.
Comment 1 spry 2008-10-31 12:01:04 UTC
Hiya

     I have a 2 CPU Quad-Core Xeon HP DL380 w/ 8GB of RAM:

     http://pastebin.com/f4a00a1f5

     I've installed 7.0R, then built the world to 7.0Rp5, and now
7.1-PRERELEASE. It is one of the fastest boxes I've had my hands on;
it can build the world in just:
          2772.643u 1392.701s 13:23.41 518.4% 6056+1294k 4423+8405io 2295pf+0w

     No problems whatsoever; could it be a faulty component in your system?


-- 
cheers
mars
Comment 2 mw 2009-07-31 17:30:41 UTC
Sorry for the late follow-up. Since this has been rated as "serious" 
with a "high" priority, yet I have received no real feedback, I haven't 
put much effort into reporting anymore.
Anyhow, we have been having the same issues with a few more machines 
now: random, spontaneous crashes.

I do suspect faulty hardware, more specifically RAM or CPU.
But since the errors I see don't give me any proof, I am unable to 
convince our hardware vendor, HP, that they are broken. I have run the 
HP diagnostics for 7 loops as HP recommends, and they report no errors.
I have also recompiled the kernel on one of these machines with "options 
PRINTF_BUFR_SIZE=128" to see if the output would be more than just 
garbage, but it did not help.
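
For reference, a kernel with that option is typically built along the following lines. This is only a sketch: the config name MYKERNEL is hypothetical, the source tree is assumed to live in /usr/src, and it assumes the "include" directive is accepted by config(8) on the release in use. If GENERIC already sets the option, the extra line is redundant.

```sh
# Create a custom kernel config based on GENERIC (MYKERNEL is a made-up name).
cd /usr/src/sys/amd64/conf
cat > MYKERNEL <<'EOF'
include GENERIC
ident   MYKERNEL
# Buffer kernel printf output so lines from different CPUs interleave less.
options PRINTF_BUFR_SIZE=128
EOF

# Build and install the new kernel (run as root), then reboot into it.
cd /usr/src
make buildkernel KERNCONF=MYKERNEL
make installkernel KERNCONF=MYKERNEL
```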

We will attempt to upgrade one machine to 7.2 next week to see if it 
will produce better error logs if/when they crash again (or maybe we'll 
be incredibly lucky and it's a software bug that is now fixed).

FWIW, these machines are part of a large online gaming platform. It has 
well over 100 more of these machines with the same hardware and FreeBSD 
setup.

If someone could look into this that would be much appreciated.
Comment 3 Joel Dahl freebsd_committer freebsd_triage 2009-07-31 17:32:05 UTC
For the record: I've chatted with the submitter and with jhb.  The panics are 
due to NMIs, most likely caused by bad RAM.  One workaround would be to turn 
off machdep.panic_on_nmi ...
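
A minimal sketch of that workaround, assuming the machdep.panic_on_nmi sysctl is present and writable on the running kernel. Note that this only stops the kernel from panicking when an NMI arrives; it does nothing about whatever hardware fault is raising the NMI.

```sh
# Show the current value (1 means the kernel panics when an NMI arrives).
sysctl machdep.panic_on_nmi

# Turn the panic off on the running system (run as root).
sysctl machdep.panic_on_nmi=0

# Keep the setting across reboots.
echo 'machdep.panic_on_nmi=0' >> /etc/sysctl.conf
```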

--
Joel
Comment 4 Andriy Gapon freebsd_committer freebsd_triage 2010-12-05 11:17:57 UTC
Is this still an issue?
Can it be reproduced with head or stable/8 (8.2 prerelease)?

-- 
Andriy Gapon
Comment 5 Martin W 2010-12-06 13:02:49 UTC
This customer is no longer managed by myself, or the company I work for.
So I can't answer that unfortunately.

/Martin
Comment 6 Andriy Gapon freebsd_committer freebsd_triage 2010-12-06 13:45:31 UTC
State Changed
From-To: open->closed

The reporter no longer has access to the problematic system.