Bug 30744

Summary: UDMA ICRC error results in kernel panic
Product: Base System
Reporter: david <david>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.4-RELEASE   
Hardware: Any   
OS: Any   

Description david 2001-09-22 18:40:01 UTC
I use a Promise TX2/100 card. From /var/run/dmesg.boot:
ar0: 19458MB <ATA RAID1 array> [2480/255/63] subdisks:
  ad4: 19458MB <Maxtor 2B020H1> [39535/16/63] at ata2-master UDMA100
  ad6: 19458MB <Maxtor 2B020H1> [39535/16/63] at ata3-master UDMA100

When a UDMA ICRC error is printed, the kernel panics with 'integer divide fault'.

Fix: 

Comment out the bit of diskerr() in ufs/ufs_disksubr.c that prints the (hp0 bn %d cn %d tn %d sn %d) bit (i.e. lines 367 to 384) - that's where the problem seems to lie...
How-To-Repeat: 
Generate a UDMA ICRC error on a Promise TX2/100
Comment 1 greid freebsd_committer freebsd_triage 2001-09-22 18:47:03 UTC
On Sat, Sep 22, 2001 at 10:38:28AM -0700, David Hedley wrote:

> >Fix:
> 
> Comment out the bit of diskerr() in ufs/ufs_disksubr.c that prints the
> (hp0 bn %d cn %d tn %d sn %d) bit (i.e. lines 367 to 384) - that's where 
> the problem seems to lie...

The correct fix is to replace the cable and/or drive(s) which are at 
fault...

~greid
Comment 2 david 2001-09-22 18:53:39 UTC
Well, yes.

However, I don't expect the whole OS to go belly up if it gets one CRC
error, especially if it can simply retry the request (which the code tries
to do anyhow).

A better solution may be to not pass the disklabel structure in to diskerr, as
some of its fields obviously haven't been initialised (properly).
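
To illustrate the kind of guard I mean, here is a minimal sketch, not the actual diskerr() code. It assumes the divide fault comes from dividing by a geometry field that is still zero; the struct and function names (geom, chs, block_to_chs, report_block) are hypothetical stand-ins, not the real disklabel API.

```c
#include <stdio.h>

/* Hypothetical stand-in for the geometry fields of a disklabel. */
struct geom {
    unsigned long secpercyl;    /* sectors per cylinder */
    unsigned long nsectors;     /* sectors per track */
};

/* Cylinder / track / sector triple derived from a block number. */
struct chs {
    unsigned long cn, tn, sn;
};

/*
 * Convert a block number to cylinder/track/sector.
 * Returns 0 and fills *out on success, or -1 if the geometry is
 * missing or zero, so the caller never divides by zero.
 */
int block_to_chs(const struct geom *g, unsigned long bn, struct chs *out)
{
    if (g == NULL || g->secpercyl == 0 || g->nsectors == 0)
        return -1;
    out->cn = bn / g->secpercyl;
    bn %= g->secpercyl;
    out->tn = bn / g->nsectors;
    out->sn = bn % g->nsectors;
    return 0;
}

/* Print the detailed form only when the geometry is usable. */
void report_block(const struct geom *g, unsigned long bn)
{
    struct chs c;

    if (block_to_chs(g, bn, &c) == 0)
        printf("bn %lu cn %lu tn %lu sn %lu\n", bn, c.cn, c.tn, c.sn);
    else
        printf("bn %lu\n", bn);
}
```

With a check like this, an uninitialised label would only cost the cn/tn/sn detail in the error message rather than the whole machine.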



Comment 3 Søren Schmidt freebsd_committer freebsd_triage 2002-05-06 19:36:54 UTC
State Changed
From-To: open->closed

Fixed in 4.6