| Summary: | Kernel Panic on Dual Processor System during heavy disk IO | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | cjm88 <cjm88> | ||||||
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> | ||||||
| Status: | Closed FIXED | ||||||||
| Severity: | Affects Only Me | ||||||||
| Priority: | Normal | ||||||||
| Version: | 4.2-RELEASE | ||||||||
| Hardware: | Any | ||||||||
| OS: | Any | ||||||||
| Attachments: |
|
||||||||
|
Description
cjm88
2001-03-28 03:40:02 UTC
On Tue, Mar 27, 2001 at 06:31:21PM -0800, cjm88@home.com wrote:
>
> >Number: 26161
> >Category: kern
> >Synopsis: Kernel Panic on Dual Processor System during heavy disk IO
> >Originator: Christophe Michel
> >Release: 4.2-RELEASE
> >Organization:
> >Environment:
> FreeBSD u2 4.2-RELEASE FreeBSD 4.2-RELEASE #1: Sat Mar 24 21:27:43 EST 2001
> root@u2:/usr/src/sys/compile/U2 i386
>
> >Description:
> The system panics when subjected to heavy disk IO. The system is an Intel
> altserver with two Pentium 166 processors on a motherboard supporting SMP
> 1.4. I'm using the on-board adaptec SCSI controller with 2G Seagate drive.
> It is quite stable until something requires heavy disk IO and then
> crashes within 15 to 30 minutes. The behavior is the same whether the IO
> is for swapping or just heavy file access. I managed to photograph the
> console just after the panic on two occasions and can forward via e-mail,
> those jpgs to whoever would be interested in looking at this problem.
>
> I tried to replicate the problem on two other FreeBSD platforms but they
> were single-cpu boxes. The problem did not occur even after extended
> disk pounding (over 24 hours).

This is all very nice :) But, can you either:

1. update your system to 4.2-stable (which is actually 4.3-RC now), or

2. follow the instructions on http://www.FreeBSD.org/handbook/kerneldebug.html
   to build a debugging kernel, run dumpon, have the kernel panic again,
   this time storing the core dump, then run savecore and examine
   the kernel crash dump, posting more information about the dump?

G'luck,
Peter

--
Nostalgia ain't what it used to be.

cjm88@home.com wrote:
> OK, Here's what I'll do (in the hope that it's the most useful way to proceed in
> terms of QA for future releases).
>
> 1) I'll follow your first suggestion and update to 4.2-stable/4.3-RC1
> 2) I'll try to replicate the problem
> 3) If it occurs I'll follow your second suggestion and try to recreate the problem
> again
> 4) I'll advise regarding the outcome with additional information in either case.
>
> I should be able to perform the above this evening.
>
> Thanks for your help :)
>
> C
>
> PS. I attached the jpegs of the console for your interest although the info will
> be moot once I complete step 1)

OK,
I followed the first suggestion and initially it seemed that the system was more
stable (i.e. it took longer for it to panic).
When it panicked I built a debug version of the kernel and ran the tests again.
This time it took even longer for the system to crash. So I ran the tests a few
more times with increasing intensity. It seemed that the time required to crash
the system was inversely proportional to the intensity of the disk IO that the
system was subjected to.
Here is what the gdb session (run by a newbie... i.e. 'me' ) showed... I probably
need some further direction from someone more experienced to extract more useful
information.
u2# gdb -k
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-unknown-freebsd".
(kgdb) symbol-file kernel.debug
Reading symbols from kernel.debug...done.
(kgdb) exec-file kernel
(kgdb) core-file /usr/wrk/vmcore.0
SMP 2 cpus
IdlePTD 3461120
initial pcb at 2bdae0
panicstr: NMI indicates hardware failure
panic messages:
---
---
#0 0xc0158eae in dumpsys ()
(kgdb) where
#0 0xc0158eae in dumpsys ()
#1 0xc0158ccf in boot ()
#2 0xc0159080 in poweroff_wait ()
#3 0xc0256d90 in trap (frame={tf_fs = 47, tf_es = 47, tf_ds = 47,
tf_edi = 134533120, tf_esi = 69, tf_ebp = -1077937120,
tf_isp = -931377196, tf_ebx = 183304184, tf_edx = -1077937252,
tf_ecx = 672025592, tf_eax = 15, tf_trapno = 19, tf_err = 0,
tf_eip = 134514178, tf_cs = 31, tf_eflags = 514, tf_esp = -1077937160,
tf_ss = 47}) at ../../i386/i386/trap.c:396
#4 0x8048602 in ?? ()
#5 0x80484bd in ?? ()
(kgdb)
OK... :) so now what do I do???
Thanks for your help.
C
State Changed From-To: open->feedback
NMI traps are usually caused by some sort of hardware failure (RAM parity errors etc). I guess maybe try replacing bits of hardware to see if you can isolate the problem and let us know if it helps.

State Changed From-To: feedback->closed
Feedback timeout. |
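
For anyone reading the kgdb backtrace above and wondering what the trap frame says, here is a minimal Python sketch that decodes a few of its fields. The numbers are copied from frame #3 of the session. The names `frame`, `u32` and `decode` are made up for illustration, and the mapping of trap number 19 to `T_NMI` is an assumption based on FreeBSD's i386 `trap.h` (it is at least consistent with the `panicstr` line in the dump).

```python
# Values copied from the trap frame kgdb printed for frame #3.
frame = {
    "tf_trapno": 19,
    "tf_err": 0,
    "tf_eip": 134514178,
    "tf_cs": 31,
    "tf_esp": -1077937160,
    "tf_ebp": -1077937120,
}

# Assumption: trap number 19 is T_NMI in FreeBSD's i386 <machine/trap.h>.
T_NMI = 19


def u32(value):
    """Reinterpret a signed 32-bit integer (as gdb prints it) as unsigned."""
    return value & 0xFFFFFFFF


def decode(frame):
    """Return a small, human-readable summary of the trap frame."""
    return {
        "eip": hex(u32(frame["tf_eip"])),
        "esp": hex(u32(frame["tf_esp"])),
        "ebp": hex(u32(frame["tf_ebp"])),
        # The low two bits of a segment selector are the privilege level;
        # 3 means the CPU was running user-mode code when the trap hit.
        "user_mode": (frame["tf_cs"] & 3) == 3,
        "is_nmi": frame["tf_trapno"] == T_NMI,
    }


summary = decode(frame)
for key, value in summary.items():
    print(f"{key:10s} {value}")
```

The decoded `eip` comes out as `0x8048602`, the same address as frame #4 in the backtrace, and the code segment selector carries privilege level 3. So the NMI appears to have arrived while a user process was running rather than inside a particular kernel code path. That does not prove a hardware fault by itself, but it fits the suggestion above to look at the hardware (RAM parity and so on) first.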