Bug 65801

Summary: 5.2.1 locks up with SMP kernel
Product: Base System Reporter: Russell Francis <rf358197>
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 5.2.1-RELEASE   
Hardware: Any   
OS: Any   

Description Russell Francis 2004-04-20 02:50:21 UTC
The machine, which is a dual PIII, "locks up" after a day or so when it runs an SMP kernel.  I have been unable to duplicate this with a UP kernel.  When I say lock up, the following things occur.

- Screen goes black.

- Keyboard becomes unresponsive; Num Lock/Caps Lock don't toggle the lights on the keyboard.

- The machine becomes unpingable and ssh into the machine no longer works.

----------------------------------------------------------------------

I am interested in helping resolve the problem but don't really know what information to provide.  Any guidance would be appreciated :)

How-To-Repeat: I haven't found a way to repeat the problem consistently; it just happens within a day or two.

It doesn't seem to occur with any particular application.  It has happened with only X running and no users on the machine, and it has also happened with many applications open while I was interacting with it.
Comment 1 Kris Kennaway 2004-04-24 05:27:09 UTC
On Mon, Apr 19, 2004 at 06:44:38PM -0700, Russell Francis wrote:

> >Description:
> The machine which is a dual PIII "locks up" after a day or so when it is compiled with an SMP kernel.  I have been unable to duplicate this with a UP kernel.  When I say lock up, the following things occur.
> 
> - Screen goes black.
> 
> - Keyboard becomes unresponsive [Numlock/capslock] don't toggle lights on the keyboard.
> 
> - The machine becomes unpingable and ssh into the machine no longer works.

The machine is probably either panicking, or deadlocking due to a locking
error.  Enable WITNESS, DDB, INVARIANTS and kernel crashdumping, and
try to obtain debugging information.  Leaving the machine out of X
Windows may also help, because you'll see the panic message on the
system console.  See the Developer's Handbook for more details.

Kris
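
The options named above could be enabled roughly as follows. This is a minimal sketch for a 5.2.x-era system, not a tested configuration: the kernel config file name and the dump device are assumptions and must be adapted to the machine. INVARIANT_SUPPORT is included because INVARIANTS depends on it, and the dump device should be a swap partition at least as large as physical memory.

```
# Kernel config additions, e.g. in a custom SMP config
# (the file name /usr/src/sys/i386/conf/MYSMP is hypothetical)
makeoptions     DEBUG=-g            # build the kernel with debug symbols
options         DDB                 # in-kernel debugger
options         INVARIANTS          # run-time consistency checks
options         INVARIANT_SUPPORT   # support code required by INVARIANTS
options         WITNESS             # lock order verification

# /etc/rc.conf additions for kernel crash dumps
dumpdev="/dev/ad0s1b"               # hypothetical swap partition, adjust to your disk
dumpdir="/var/crash"                # where savecore(8) writes the dump at next boot
```

After rebuilding and booting the new kernel, a panic or WITNESS report on the system console should either drop to the debugger prompt or leave a dump that savecore(8) picks up on the next boot.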
Comment 2 Kris Kennaway freebsd_committer freebsd_triage 2004-04-24 05:27:12 UTC
State Changed
From-To: open->feedback

Awaiting more debugging information
Comment 3 Russell Francis 2004-05-06 18:55:54 UTC
 > >Description:
 > The machine which is a dual PIII "locks up" after a day or so when it 
 > is compiled with an SMP kernel.  I have been unable to duplicate this 
 > with a UP kernel.  When I say lock up, the following things occur.
 >
 > - Screen goes black.
 >
 > - Keyboard becomes unresponsive [Numlock/capslock] don't toggle
 >   lights on the keyboard.
 >
 > - The machine becomes unpingable and ssh into the machine no longer
 >   works.

  The machine is probably either panicking, or deadlocking due a locking
  error.  Enable WITNESS, DDB, INVARIANTS and kernel crashdumping, and
  try to obtain debugging information.  Leaving the machine out of X
  Windows may also help, because you'll see the panic message on the
  system console.  See the Developer's Handbook and for more details.

  Kris,

I haven't had any luck getting a core, nor has the machine locked up as
it did before.  I have, however, been able to get it to drop to the kernel
debugger.  Here is the stack trace if that helps.  It looks like a
possible locking issue.

lock order reversal
  1st 0xc070a800 UMA lock (UMA lock) @ vm/uma_core.c:1200
  2nd 0xc0c31100 system map (system map) @ vm/vm_map.c:2210
Stack backtrace:

backtrace(c066e42c,c0c31100,c0679189,c0679189,c06791e4) at backtrace+0x17

witness_lock(c0c31100,8,c06791e4,8a2,c0c310a0) at witness_lock+0x5aa

_mtx_lock_flags(c0c31100,0,c06791db,8a2,c4a5f000) at _mtx_lock_flags+0x6a

_vm_map_lock(c0c310a0,c06791db,8a2,c070a0c0,1) at _vm_map_lock+0x36

vm_map_remove(c0c310a0,c4a5e000,c4a5f000,d767fbf8,c05faadb) at vm_map_remove+0x30

kmem_free(c0c310a0,c4a5e000,1000,d767fc28,c05fa4ef) at kmem_free+0x32

page_free(c4a5e000,1000,2,0,c4a5e000) at page_free+0x3b

zone_drain(c0c2b380,0,c067a289,4b0,0) at zone_drain+0x2cf

zone_foreach(c05fa220,d767fcf0,c05f7aa6,c067a191,0) at zone_foreach+0x45

uma_reclaim(c067a191,0,c067a13a,29e,c06e1bc0) at uma_reclaim+0x17

vm_pageout_scan(0,0,c067a13a,5a9,1f4) at vm_pageout_scan+0xf6

vm_pageout(0,d767fd48,c066a061,311,0) at vm_pageout+0x31b

fork_exit(c05f8840,0,d767fd48) at fork_exit+0x7e

fork_trampoline() at fork_trampoline+0x8

--- trap 0x1, eip = 0, esp = 0xd767fd7c, ebp = 0 ---

Debugger("witness_lock")
Stopped at Debugger+0x55: xchgl %ebx,in_Debugger.0



--------------------------------------------------------------------------

Looking at the output from dmesg also revealed this:

lock order reversal
  1st 0xc070a800 UMA lock (UMA lock) @ vm/uma_core.c:1200
  2nd 0xc0c31100 system map (system map) @ vm/vm_map.c:2210
Stack backtrace:
psmintr: delay too long; resetting byte count
drm0: <Matrox G400/G450 (AGP)> mem 0xd7000000-0xd77fffff,0xd6000000-0xd6003fff,0xd4000000-0xd5ffffff irq 9 at device 0.0 on pci1
info: [drm] AGP at 0xd0000000 64MB
info: [drm] Initialized mga 3.1.0 20021029 on minor 0
drm0: [MPSAFE]
lock order reversal
  1st 0xc4678108 vm object (vm object) @ vm/swap_pager.c:1323
  2nd 0xc0709c80 swap_pager swhash (swap_pager swhash) @ vm/swap_pager.c:1838
  3rd 0xc0c358c4 vm object (vm object) @ vm/uma_core.c:873
Stack backtrace:



I hope this helps a little.  I am still trying to get a core ...

Thanks,
Russell Francis
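
For readers unfamiliar with the "lock order reversal" messages above, the idea can be illustrated with a toy checker. This is only a sketch in Python, not how WITNESS is implemented: the class `OrderChecker` and the function `demo` are hypothetical names, and the "earlier" acquisition order in the demo is an assumption made up for illustration. It records which lock was already held when another was acquired, and reports a reversal when the same pair is later taken in the opposite order.

```python
class OrderChecker:
    """Toy lock-order checker (illustrative only, not the WITNESS algorithm)."""

    def __init__(self):
        self.order = set()      # (first, second) pairs observed so far
        self.held = []          # locks currently held, in acquisition order
        self.reversals = []     # (1st, 2nd) pairs that contradict an earlier order

    def acquire(self, name):
        for prior in self.held:
            # If we previously saw `name` taken before `prior`, this is a reversal.
            if (name, prior) in self.order:
                self.reversals.append((prior, name))
            self.order.add((prior, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


def demo():
    checker = OrderChecker()

    # Assumed earlier code path: system map first, then UMA lock.
    checker.acquire("system map")
    checker.acquire("UMA lock")
    checker.release("UMA lock")
    checker.release("system map")

    # Later code path takes the same two locks in the opposite order.
    checker.acquire("UMA lock")
    checker.acquire("system map")
    return checker.reversals


if __name__ == "__main__":
    for first, second in demo():
        print("lock order reversal")
        print("  1st", first)
        print("  2nd", second)
```

A reversal like this does not mean a deadlock has happened, only that two code paths disagree about ordering and could deadlock if they ran concurrently on different CPUs.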
Comment 4 Russell Francis 2004-07-07 11:51:56 UTC
After further investigation, I have identified the graphics card and 
memory in the machine as being faulty.  Putting them in other machines 
causes the same symptoms regardless of whether they run Linux/Windows or 
FreeBSD.  Putting new memory and graphics card in this machine causes it 
to run flawlessly :)

This is only a hardware problem.

Thanks,
Russ
Comment 5 Mark Linimon freebsd_committer freebsd_triage 2004-08-30 23:58:43 UTC
State Changed
From-To: feedback->closed

Submitter identified this as a hardware problem.