Bug 46537

Summary: amr(4) hangs system on -CURRENT or make panic, and conditionally
Product: Base System Reporter: Jason Li <delphij>
Component: kern Assignee: Andre Oppermann <andre>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 5.0-CURRENT   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
smime.p7s none

Description Jason Li 2002-12-26 09:30:00 UTC
	FreeBSD 5-CURRENT hangs or panics in the amr(4) driver.

	Our university recently purchased some new servers. Among them is one 
Dell PowerEdge 2650 with two Pentium4-XEON 2G processors and 2GB of RAM. 
The problem occurs regardless of whether memory is limited to 256MB or 
whether Hyper-Threading is enabled or disabled.

	The server has an AIC 7899 SCSI controller and a Dell Extensible RAID 
Controller (LSILogic MegaRAID with 128MB of on-board RAM) installed. When 
booting from the 5.0-RC-20021213 CD-ROM, the system stops responding after 
the following line is displayed on screen:

	amr0: <LSILogic MegaRAID> mem 0xf0000000-0xf7ffffff irq 5 at device 0.0 on 
pci3

	I have also tried 4.7-STABLE. When ANY ONE of the following options is 
enabled (and of course any combination of them), the system hangs on amr0 too:
	ENABLE_SSE
	SMP (with APIC_IO)
	makeoptions     CONF_CFLAGS=-fno-builtin
	options         MAXDSIZ="(2048*1024*1024)"
	options         MAXSSIZ="(2048*1024*1024)"
	options         DFLDSIZ="(2048*1024*1024)"

	What's more, amr(4) panics on an HP6000 whose MegaRAID has 32MB of RAM:

	amr0: <LSILogic MegaRAID> mem 0xe0000000-0xefffffff irq 5 at device 3.1 on 
pci4

	Fatal trap 12: Page fault in kernel
	fault virtual address 	= 0xde
	fault code		= supervisor read, no page present
	instruction pointer	= 0x8:0xc01a8229
	stack pointer		= 0x10:0xc0b0f9fc
	frame pointer		= 0x10:0xc0b0fa0c
	code segment		= base 0x0, limit 0xfffff, type 0x1b
				= DPL 0, pres 1, def32 1, grow 1
	processor eflags	= interrupt enabled, resume, IOPL=0
	current process		= 0 (swapper)
	kernel: type 12 trap, code = 0
	stopped at amr_alloccmd+0x19:	cmpl $0, 0(%ecx)

	This behavior doesn't appear on 5.0-DP1. DP2 seems to have the problem, 
though, as does every -CURRENT since then, including all recent RCs.

Fix: 

Currently unknown, because I have final exams in the coming weeks. I would be 
glad to help work out this problem in January 2003 if needed, as I am 
familiar with i386 assembly language, C and C++, and hardware programming. 
If this problem persists, I think delaying the release of 5.0 should be 
considered, as the LSI MegaRAID is widely used around the world.

How-To-Repeat: 	Boot from the FreeBSD 5-CURRENT installation CD-ROM on a Dell PowerEdge 2650. 
The system hangs immediately after the amr0 probe line quoted in the 
description, and the keyboard stops responding to anything, including 
CapsLock. The system appears to hang completely.
Comment 1 Liu Kang 2003-01-01 01:59:04 UTC
I got the same problem when using the new amr driver with a Dell PERC3/DC 
RAID card.
I tried the newest source (using cvsup with tag=. and tag=RELENG_5_0), but it 
cannot boot and hangs when probing the amr device. I think it is a serious 
problem, because a lot of servers use this kind of RAID card.
Here is a workaround for this PR: roll back all files in src/sys/dev/amr to 
their state as of Oct 30 (or 31). I have not compared the newest driver with 
the October driver, but after the rollback the system works fine.

PS: 4.7-STABLE has the same problem; the rollback is a temporary way to get 
rid of it.

Happy new year, daemons!!


liukang=liukang->next;




Comment 2 Liu Kang 2003-01-09 05:11:35 UTC
I tried to enable the debug option in the amr driver by adding "CFLAGS= -O
-pipe -DAMR_DEBUG COPTFLAGS= -O -pipe -DAMR_DEBUG" to /etc/make.conf and
adding some printf() calls to amr.c. I think I have found the problem, but I
do not know how to solve it.

In amr.c (Revision 1.38)
Line 1008: while(sc->amr_mailbox->mb_nstatus == 0xFF);
The value of sc->amr_mailbox->mb_nstatus is always 255 (0xFF), so this loop
never exits. It seems that the amr card itself should change the value of
mb_nstatus. (My card is a Dell PERC3/DC.)
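
For illustration only, here is a minimal user-space sketch of how a wait
on mb_nstatus could be bounded so that a status stuck at 0xFF produces an
error instead of a hang. It is not the driver's code and not the fix that
was later committed; struct mailbox_sketch, MB_STATUS_PENDING and
wait_for_status() are hypothetical names, and only the mb_nstatus field
and the 0xFF value come from the report above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the driver's mailbox; only the field named
 * in the report is modelled here. */
struct mailbox_sketch {
    volatile uint8_t mb_nstatus;
};

/* Value the report observes in mb_nstatus on every read. */
#define MB_STATUS_PENDING 0xFF

/*
 * Read mb_nstatus up to max_polls times.
 * Returns 0 as soon as the status is something other than 0xFF,
 * or -1 if it is still 0xFF after max_polls reads.
 * A real driver would also delay between reads; that is omitted here.
 */
static int
wait_for_status(struct mailbox_sketch *mb, unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        if (mb->mb_nstatus != MB_STATUS_PENDING)
            return 0;
    }
    return -1;
}
```

With a bound like this, a caller could print a diagnostic and fail the
probe when -1 comes back, which would at least let the rest of the boot
continue while the real cause is investigated.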

In PR 45698 (http://www.freebsd.org/cgi/query-pr.cgi?pr=45698) I found
the same problem reported for HP's RAID card on 5.0.
5.0 is about to be released and the amr device is widely used in servers,
so I think this problem should be considered critical.
Comment 3 Liu Kang 2003-01-09 05:31:01 UTC
I tried to enable the debug option in the amr driver by adding "CFLAGS= -O
-pipe -DAMR_DEBUG COPTFLAGS= -O -pipe -DAMR_DEBUG" to /etc/make.conf and
adding some printf() calls to amr.c. I think I have found the problem, but I
do not know how to solve it.

In amr.c (Revision 1.38)
Line 1008: while(sc->amr_mailbox->mb_nstatus == 0xFF);
The value of sc->amr_mailbox->mb_nstatus is always 255 (0xFF), so this loop
never exits. It seems that the amr card itself should change the value of
mb_nstatus. (My card is a Dell PERC3/DC.)

In PR 45698 (http://www.freebsd.org/cgi/query-pr.cgi?pr=45698) I found
the same problem reported for HP's RAID card on 5.0.
5.0 is about to be released. Since the amr device is widely used in servers, 
this problem should be considered critical.


Comment 4 Andre Oppermann freebsd_committer freebsd_triage 2003-12-27 14:40:32 UTC
Xin,

do you still have the problem with FreeBSD 5.2 or -CURRENT?
There have been many fixes to the amr driver since you filed
the PR.

-- 
Andre
Comment 5 Andre Oppermann freebsd_committer freebsd_triage 2003-12-27 14:41:39 UTC
State Changed
From-To: open->feedback

There have been many changes and fixes to the amr driver. 
Check back with Originator if fixed. 


Comment 6 Andre Oppermann freebsd_committer freebsd_triage 2003-12-27 14:41:39 UTC
Responsible Changed
From-To: freebsd-bugs->andre

There have been many changes and fixes to the amr driver. 
Check back with Originator if fixed.
Comment 7 Jason Li 2003-12-27 17:18:43 UTC
This was believed to be fixed when 5.0 was released. If memory serves me
right, it was fixed in revision 1.39 of sys/dev/amr/amr.c, and MFC'ed as
1.36.2.2 for 5.0-RELEASE, and 1.7.2.13 for 4-STABLE.

Please close this. The problem no longer exists :) Thank you for your great
work!

Xin LI
Frontfree Technology Network

Comment 8 Andre Oppermann freebsd_committer freebsd_triage 2003-12-27 17:39:53 UTC
State Changed
From-To: feedback->closed

Fixed according to Originator.