| Summary: | 'panic: lockmgr: thread ... not exclusive lock owner' on SMP Alpha EV6 machine | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Kevin Thompson <antiduh> |
| Component: | alpha | Assignee: | freebsd-alpha (Nobody) <alpha> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | Unspecified | ||
| Hardware: | Any | ||
| OS: | Any | ||
Turning off PREEMPTION and leaving SMP on seems to have done it. The machine has made it through a full kernel and world rebuild several times now; before, it would usually crash before the first kernel finished.

--Kevin Thompson

Okay, so the machine has made me a liar. It died again under heavy I/O and CPU load (recursively fetching a website while building apache22). It seems the failures are much harder to reproduce with PREEMPTION off, but they are still a problem. Maybe turning SMP off would be the final kicker. I might be willing to provide access to this machine and its serial console if that would help in debugging the problem.

--Kevin

I turned on the following kernel options and tried to reproduce it - no luck. I had it running make -j2 buildworld repeatedly and it was still running a day later.

    options WITNESS
    options INVARIANTS
    options INVARIANT_SUPPORT
    options DIAGNOSTIC

Preemption was still off. Of note, I couldn't get the kernel to compile with options DEBUG_LOCKS on. The only output I got, every once in a while, was the following:

    Expensive timeout(9) function: 0xfffffc00005f7820(0) 0.002491656 s
    Expensive timeout(9) function: 0xfffffc000054f7c0(0xfffffc005f928810) 0.010740188 s
    Expensive timeout(9) function: 0xfffffc00005746e0(0) 0.011898436 s
    Expensive timeout(9) function: 0xfffffc000054f7c0(0xfffffc005f929830) 0.012660264 s

I'm going to try to turn PREEMPTION back on and see if I can reproduce it.

--Kevin

Didn't turn preemption on, but did turn off the debug options, and yeah, it blew up. Anyone willing to help on this?

--Kevin

Downgraded to FreeBSD 5.5 and the machine has been stable for a couple of days now. I guess something bad happened between 5 and 6; I'm guessing Giant.

--Kevin

State Changed From-To: open->closed

Other platforms have not reported problems with this, so I do not think this is MI (machine-independent). This could possibly have been due to insufficiently strong memory clobbers in atomic operations (if so, that might also have explained instability with PREEMPTION enabled). However, since Alpha development has stopped, this will not be fixed.
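To illustrate what a "memory clobber" means in the closing note, here is a minimal sketch using GCC/Clang extended asm. It is not the FreeBSD Alpha atomic code; the names `compiler_barrier` and `sketch_release` are hypothetical. The `"memory"` clobber tells the compiler that the asm statement may read or write any memory, so it must not keep memory values cached in registers across it or move loads and stores past it. If such a clobber were missing from a lock-release primitive, the compiler would be free to move the store that releases the lock ahead of the stores it is meant to protect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: an empty asm statement with a "memory" clobber.
 * It emits no instructions; it only constrains the compiler.
 */
static inline void
compiler_barrier(void)
{
	__asm__ __volatile__("" : : : "memory");
}

/*
 * Hypothetical release path: publish the protected data first, then
 * clear the lock word. The barrier keeps the compiler from reordering
 * the two stores.
 */
void
sketch_release(volatile uint32_t *lock_word, uint32_t *protected_data,
    uint32_t value)
{
	assert(lock_word != NULL && protected_data != NULL);

	*protected_data = value;	/* store made under the lock */
	compiler_barrier();		/* no compiler reordering across here */
	*lock_word = 0;			/* release the lock */
}
```

This only constrains the compiler. On a weakly ordered CPU such as the Alpha, real lock code also needs a hardware memory barrier (the `mb` instruction), which this sketch deliberately leaves out so that it builds on any platform.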
Reproduced on FreeBSD 6.1-RELEASE and 6.2-BETA3 with SMP and PREEMPTION enabled. Currently trying 6.2-PRERELEASE with preemption disabled; if that doesn't work, I'll turn off SMP.

Under heavy disk access and CPU load (e.g. make -j2 buildkernel), the machine deadlocks with the following message:

    panic: lockmgr: thread 0xfffffc005f9a4000, not exclusive lock holder 0xfffffc00487e1500 unlocking
    cpuid = 1

It does not display anything more and it does not reboot; it just stops right there with that message. The machine is completely unresponsive on every interface: Ethernet, serial console, VGA console, etc.

The machine is a dual 500 MHz Alpha EV6 with 1.5 GB of RAM.

System disk:

    da0 at isp0 bus 0 target 0 lun 0
    da0: <IBM DDYS-T18350N S80D> Fixed Direct Access SCSI-3 device
    da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
    da0: 17501MB (35843670 512 byte sectors: 255H 63S/T 2231C)

Controller for system disk:

    isp0: <Qlogic ISP 1020/1040 PCI SCSI Adapter> port 0x1000-0x10ff mem 0x1010000-0x1010fff irq 47 at device 7.0 on pci1
    isp0: interrupting at TSUNAMI irq 47
    isp0: [GIANT-LOCKED]

The important bits of bootup:

    ST6600
    AlphaPC 264DP 500 MHz, 500MHz
    8192 byte page size, 2 processors.
    CPU: EV6 (21264) major=8 minor=7 extensions=0x303<BWX,FIX,MVI,PRECISE>
    OSF PAL rev: 0x2004500020157
    real memory = 1607819264 (1533 MB)
    avail memory = 1567784960 (1495 MB)
    FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
    cpu0 (BSP): PAL ID: 0
    cpu1 (AP): PAL ID: 1
    tsunami0: <21271 Core Logic chipset>

How-To-Repeat: Heavy disk I/O and CPU usage on Alpha with SMP and PREEMPTION.
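For readers unfamiliar with the panic text in this report, the following is a rough sketch of the kind of ownership check that produces such a message. It is a simplified illustration, not the FreeBSD lockmgr implementation: the struct and the functions `sketch_lock_excl` and `sketch_unlock` are hypothetical, and the sketch returns an error code where the kernel would panic. The idea is that an exclusive lock records which thread took it, and an unlock by any other thread is treated as a fatal inconsistency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified exclusive lock; NOT the real lockmgr structure. */
struct sketch_lock {
	const void *owner;	/* thread holding the lock, or NULL if free */
};

/* Take the lock for thread 'td'. Returns 0 on success, -1 if already held. */
int
sketch_lock_excl(struct sketch_lock *lk, const void *td)
{
	assert(lk != NULL && td != NULL);

	if (lk->owner != NULL)
		return (-1);
	lk->owner = td;
	return (0);
}

/*
 * Release the lock on behalf of thread 'td'. If 'td' is not the recorded
 * owner, report the mismatch and return -1 (the kernel would panic here).
 */
int
sketch_unlock(struct sketch_lock *lk, const void *td)
{
	assert(lk != NULL && td != NULL);

	if (lk->owner != td) {
		fprintf(stderr,
		    "sketch: thread %p, not exclusive lock holder %p unlocking\n",
		    (void *)td, (void *)lk->owner);
		return (-1);
	}
	lk->owner = NULL;
	return (0);
}
```

In a sketch like this, the mismatch can only arise from a caller bug or from the owner field being observed with a stale or corrupted value, which would be consistent with a problem that shows up only on SMP.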