Bug 105134

Summary: 'panic: lockmgr: thread ... not exclusive lock owner' on SMP Alpha EV6 machine
Product: Base System
Reporter: Kevin Thompson <antiduh>
Component: alpha
Assignee: freebsd-alpha (Nobody) <alpha>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Kevin Thompson 2006-11-04 07:20:20 UTC
Reproduced on FreeBSD 6.1-RELEASE, 6.2-BETA3 with SMP and PREEMPTION enabled.

I am currently trying 6.2-PRERELEASE with preemption disabled; if that doesn't work, I will turn off SMP.

Under heavy disk access and CPU load (e.g. make -j2 buildkernel), the machine locks up with the following message:
panic: lockmgr: thread 0xfffffc005f9a4000, not exclusive lock holder 0xfffffc00487e1500 unlocking cpuid = 1
It does not display anything more and it does not reboot; it just stops right there with that message. The machine is completely unresponsive on every interface: Ethernet, serial console, VGA console, etc.
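
The panic text indicates that a thread tried to release an exclusive lock whose recorded holder is a different thread. A minimal sketch of that kind of ownership check follows. It assumes a hypothetical `excl_lock` structure and hypothetical `excl_acquire`/`excl_release` helpers; it is not the actual lockmgr code, it is not thread-safe, and it returns an error where a kernel would panic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified exclusive lock; not FreeBSD's struct lock. */
struct excl_lock {
    const void *holder;   /* thread that holds the lock, or NULL if free */
};

/* Take the lock for `thread`. Returns 0 on success, -1 if already held. */
static int excl_acquire(struct excl_lock *lk, const void *thread)
{
    assert(lk != NULL && thread != NULL);
    if (lk->holder != NULL)
        return -1;
    lk->holder = thread;
    return 0;
}

/*
 * Release the lock on behalf of `thread`.
 * Returns 0 on success, or -1 when `thread` is not the recorded holder,
 * which is the condition a kernel would report as
 * "thread X, not exclusive lock holder Y unlocking".
 */
static int excl_release(struct excl_lock *lk, const void *thread)
{
    assert(lk != NULL && thread != NULL);
    if (lk->holder != thread)
        return -1;
    lk->holder = NULL;
    return 0;
}
```

If the holder field is read or written without proper ordering on an SMP machine, the releasing thread can observe a stale holder and fail this check even though the locking logic itself is correct.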

The machine is a dual 500 MHz Alpha EV6 with 1.5 GB of RAM.

System disk:
   
   da0 at isp0 bus 0 target 0 lun 0
   da0: <IBM DDYS-T18350N S80D> Fixed Direct Access SCSI-3 device
   da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
   da0: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)

Controller for system disk:
   isp0: <Qlogic ISP 1020/1040 PCI SCSI Adapter> port 0x1000-0x10ff mem 0x1010000-0x1010fff irq 47 at device 7.0 on pci1
   isp0: interrupting at TSUNAMI irq 47
   isp0: [GIANT-LOCKED]


The important bits of bootup:
   ST6600
   AlphaPC 264DP 500 MHz, 500MHz
   8192 byte page size, 2 processors.
   CPU: EV6 (21264) major=8 minor=7 extensions=0x303<BWX,FIX,MVI,PRECISE>
   OSF PAL rev: 0x2004500020157
   real memory  = 1607819264 (1533 MB)
   avail memory = 1567784960 (1495 MB)
   FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
    cpu0 (BSP): PAL ID:  0
    cpu1 (AP): PAL ID:  1
   tsunami0: <21271 Core Logic chipset>

How-To-Repeat: Heavy disk I/O and CPU usage on Alpha with SMP and PREEMPTION enabled.
Comment 1 kevin 2006-11-04 17:34:59 UTC
Turning off PREEMPTION and leaving SMP on seems to have done it. The
machine has made it through a full kernel and world rebuild several times
now; previously it would usually crash before the first kernel finished.

--Kevin Thompson
Comment 2 Kevin Thompson 2006-11-07 01:14:36 UTC
Okay, so the machine has made me a liar. It died again under heavy
I/O and CPU load (recursively fetching a website while building apache22). It
would seem that the failures are much harder to reproduce with
PREEMPTION off, but they are still a problem. Maybe turning SMP off would
be the final kicker.

I might be willing to provide access to this machine and its serial  
console if it would be of any help in debugging the problem.

--Kevin
Comment 3 kevin 2006-11-07 18:08:17 UTC
I turned on the following kernel options and tried to reproduce it - no  
luck. I had it running make -j2 buildworld repeatedly and it was still  
running a day later.
options         WITNESS
options         INVARIANTS
options         INVARIANT_SUPPORT
options         DIAGNOSTIC

Preemption was still off.

Of note, I couldn't get the kernel to compile with options DEBUG_LOCKS
on.
The only output that I got every once in a while was the following:
Expensive timeout(9) function: 0xfffffc00005f7820(0) 0.002491656 s
Expensive timeout(9) function: 0xfffffc000054f7c0(0xfffffc005f928810) 0.010740188 s
Expensive timeout(9) function: 0xfffffc00005746e0(0) 0.011898436 s
Expensive timeout(9) function: 0xfffffc000054f7c0(0xfffffc005f929830) 0.012660264 s

I'm going to try to turn PREEMPTION back on and see if I can reproduce it.

--Kevin
Comment 5 kevin 2006-11-08 07:05:50 UTC
Didn't turn preemption on, but did turn off the debug options, and yeah,  
it blew up. Anyone willing to help on this?

--Kevin
Comment 6 kevin 2006-11-12 19:04:16 UTC
Downgraded to FreeBSD 5.5 and the machine has been stable for a couple
of days now. I guess something bad happened between 5 and 6; my guess is
that it involves Giant.

--Kevin
Comment 7 John Baldwin freebsd_committer freebsd_triage 2010-11-03 13:30:11 UTC
State Changed
From-To: open->closed

Other platforms have not reported problems with this, so I do not think
this is a machine-independent (MI) problem.  This could possibly have been due to insufficiently strong
memory clobbers in atomic operations (if so, that might also have explained 
instability with PREEMPTION enabled).  However, since Alpha development has 
stopped, this will not be fixed.
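
To illustrate what a "memory clobber" in an atomic operation buys, here is a minimal sketch of a test-and-set lock. It is not the FreeBSD Alpha atomic(9) code: `compiler_barrier`, `struct spin`, `spin_try_acquire`, `spin_release`, and `guarded_increment` are hypothetical names, and the GCC/Clang `__atomic` builtins stand in for hand-written Alpha assembly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compiler-only barrier. The "memory" clobber tells GCC/Clang that the asm
 * may read or write any memory, so the compiler must not cache values in
 * registers across it or move loads and stores past it. It emits no
 * instructions by itself.
 */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

struct spin {
    int locked;   /* 0 = free, 1 = held */
};

/* Try to take the lock once. Returns 1 on success, 0 if it was already held. */
static int spin_try_acquire(struct spin *s)
{
    assert(s != NULL);
    int old = __atomic_exchange_n(&s->locked, 1, __ATOMIC_RELAXED);
    __atomic_thread_fence(__ATOMIC_ACQUIRE); /* hardware ordering on weakly ordered CPUs */
    compiler_barrier();                      /* keep protected accesses below this point */
    return old == 0;
}

/* Release the lock; protected accesses must be visible before the store. */
static void spin_release(struct spin *s)
{
    assert(s != NULL);
    compiler_barrier();                      /* keep protected accesses above this point */
    __atomic_thread_fence(__ATOMIC_RELEASE);
    __atomic_store_n(&s->locked, 0, __ATOMIC_RELAXED);
}

/* Increment *counter under the lock. Returns 1 if the increment happened. */
static int guarded_increment(struct spin *s, int *counter)
{
    assert(counter != NULL);
    if (!spin_try_acquire(s))
        return 0;
    ++*counter;
    spin_release(s);
    return 1;
}
```

The fences already constrain the compiler, so `compiler_barrier()` is redundant in this sketch; it is written out only to make the clobber visible. In hand-written inline assembly without such a clobber, the compiler is free to move accesses to lock-protected data (such as a lock's holder field) outside the critical section.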