Bug 22650

Summary: SCSI CD drives not attached on boot on isp driver
Product: Base System    Reporter: wilko <wilko>
Component: alpha    Assignee: Matt Jacob <mjacob>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   
Attachments:
scsi_cd.c.scsi_status.20001109 (flags: none)

Description wilko freebsd_committer freebsd_triage 2000-11-06 22:30:00 UTC
	Typical error is:

da1: 4357MB (8925000 512 byte sectors: 255H 63S/T 555C)
(cd0:isp0:0:6:0): got CAM status 0x50
(cd0:isp0:0:6:0): fatal error, failed to attach to device   
(cd0:isp0:0:6:0): lost device 
(cd0:isp0:0:6:0): removing device entry

The same hardware works OK with an NCR 810 adapter driven by sym(4).
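
For reference, later in this thread the 0x50 in this log is identified as CAM_AUTOSENSE_FAIL with CAM_DEV_QFRZN set. Below is a minimal C sketch of decoding such a value. It assumes the constant values mirror FreeBSD's <cam/cam.h> (check the header in your own tree), and the helper function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to mirror FreeBSD's <cam/cam.h>. */
enum {
	CAM_AUTOSENSE_FAIL = 0x10, /* autosense REQUEST SENSE failed */
	CAM_STATUS_MASK    = 0x3f, /* low bits hold the status code */
	CAM_DEV_QFRZN      = 0x40, /* device queue is frozen */
	CAM_AUTOSNS_VALID  = 0x80  /* sense data in the CCB is valid */
};

/* Status code with the flag bits stripped off (hypothetical helper). */
static inline uint32_t cam_status_code(uint32_t status)
{
	return status & CAM_STATUS_MASK;
}

/* True if the SIM froze the device queue for this request. */
static inline bool cam_queue_frozen(uint32_t status)
{
	return (status & CAM_DEV_QFRZN) != 0;
}

/* True if the CCB claims to carry valid autosense data. */
static inline bool cam_sense_valid(uint32_t status)
{
	return (status & CAM_AUTOSNS_VALID) != 0;
}
```

With these values, cam_status_code(0x50) yields CAM_AUTOSENSE_FAIL, cam_queue_frozen(0x50) is true, and cam_sense_valid(0x50) is false: the request failed at the autosense step and the device queue was frozen.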

Please refer to mail thread with:

Subject: SCSI cdrom attach problems on 4-stable
Message-ID: <20001104200119.A13502@freebie.demon.nl>

posted on -alpha at Nov 4, 2000 

It contains multiple logs and experiments.

Fix: 

Use an adapter other than an isp-driven one. Alternatively, keeping a CD in the
drive appears to help.
How-To-Repeat: 
See above
Comment 1 Matt Jacob freebsd_committer freebsd_triage 2000-11-06 22:35:41 UTC
Responsible Changed
From-To: freebsd-alpha->mjacob

mine. 
Comment 2 wkb 2000-11-09 22:46:06 UTC
On Thu, Nov 09, 2000 at 03:37:17PM -0800, Matthew Jacob wrote:
> 
> 
> Wilko - does your machine that has this problem say:
> 
> isp0: invalid NVRAM header

Don't remember having seen this (ever). But will check tomorrow (heading to
bed now <snore>)

W/

-- 
Wilko Bulte  	 					Arnhem, the Netherlands
wilko@freebsd.org  	http://www.freebsd.org 		http://www.nlfug.nl
Comment 3 ken 2000-11-09 22:49:11 UTC
On Thu, Nov 09, 2000 at 14:34:19 -0800, Matthew Jacob wrote:
> 
> I updated the PR (22650) with edit-pr, but it doesn't seem to then email
> submitter/responsible person. The short answer is "f/w breakage(?), don't know
> how to fix yet".
> 
> mjacob Wed Nov  8 17:08:53 PST 2000
> 
> Okay- here's what is happening- the Qlogic f/w is returning an AUTOSENSE
> failure- that is, it's unable to automatically run a request sense
> (reason unknown). So, the CAM status being returned is 
> 
>         CAM_AUTOSENSE_FAIL 
>         CAM_DEV_QFRZN
> 
> It's been apparent to me for some time that cam_periph_error should be
> then running an INQUIRY command in this case.

Why an inquiry, if autosense failed?  Don't you mean a request sense?

> What's particularly obnoxious here is that the CD in question doesn't
> actually completely detach- that is, its pass instance stays, but
> the cd instance won't attach- and this, for some reason, makes it
> impossible to rescan it later.

The reason the device isn't gone is because the pass(4) driver actually
attached successfully.  The problem is that when there is no CD in the
drive, any CDROM drive will return an error in response to a READ CAPACITY
command.

Since autosense is failing, the cd(4) driver can't tell what sort of error
is getting returned (and therefore whether the drive is really accessible),
so it won't attach.

The pass(4) driver doesn't issue any commands to check the device (it
doesn't have any requirements for device functionality beyond the basic
probe code), so it attaches without problems.

Both drivers are doing the right thing from what I can see.

> I don't know why the Qlogic f/w is returning this code, but the fundamental
> problem here is that CAM is broken. And, no, it's not up to each SIM to
> run INQUIRY commands themselves if AUTOSENSE fails.

Don't you mean request sense?

Ken
-- 
Kenneth Merry
ken@kdm.org
Comment 4 mjacob 2000-11-09 22:54:35 UTC
On Thu, 9 Nov 2000, Kenneth D. Merry wrote:

> On Thu, Nov 09, 2000 at 14:34:19 -0800, Matthew Jacob wrote:
> > 
> > I updated the PR (22650) with edit-pr, but it doesn't seem to then email
> > submitter/responsible person. The short answer is "f/w breakage(?), don't know
> > how to fix yet".
> > 
> > mjacob Wed Nov  8 17:08:53 PST 2000
> > 
> > Okay- here's what is happening- the Qlogic f/w is returning an AUTOSENSE
> > failure- that is, it's unable to automatically run a request sense
> > (reason unknown). So, the CAM status being returned is 
> > 
> >         CAM_AUTOSENSE_FAIL 
> >         CAM_DEV_QFRZN
> > 
> > It's been apparent to me for some time that cam_periph_error should be
> > then running an INQUIRY command in this case.
> 
> Why an inquiry, if autosense failed?  Don't you mean a request sense?

Sorry. Ooops. Yes.

> 
> > What's particularly obnoxious here is that the CD in question doesn't
> > actually completely detach- that is, its pass instance stays, but
> > the cd instance won't attach- and this, for some reason, makes it
> > impossible to rescan it later.
> 
> The reason the device isn't gone is because the pass(4) driver actually
> attached successfully.  The problem is that when there is no CD in the
> drive, any CDROM drive will return an error in response to a READ CAPACITY
> command.
> 
> Since autosense is failing, the cd(4) driver can't tell what sort of error
> is getting returned (and therefore whether the drive is really accessible),
> so it won't attach.

An AUTOSENSE failing means that a check condition occurred, but no sense data
is available. That should, in fact, be treated identically to READ CAPACITY
failing because there's no media.

> 
> The pass(4) driver doesn't issue any commands to check the device (it
> doesn't have any requirements for device functionality beyond the basic
> probe code), so it attaches without problems.
> 
> Both drivers are doing the right thing from what I can see.

But a later rescan should see it, yet it doesn't. And see above.

But the high order bit is that the autosense is failing. All other stuff in
scsi_cd is secondary.

What's more important is that cam_periph_error or the periph should send a
REQUEST SENSE if AUTOSENSE fails- the sim should not be the one doing this.

> 
> > I don't know why the Qlogic f/w is returning this code, but the fundamental
> > problem here is that CAM is broken. And, no, it's not up to each SIM to
> > run INQUIRY commands themselves if AUTOSENSE fails.
> 
> Don't you mean request sense?

Yes, sorry. Brains.....

-matt
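
A minimal sketch of the policy proposed above, in which the peripheral side (not the SIM) issues its own REQUEST SENSE when autosense fails. The enum and function names are hypothetical and the CAM values are assumed to mirror <cam/cam.h>. The CDB is the standard 6-byte REQUEST SENSE form (opcode 0x03, allocation length in byte 4). Justin's caveat later in the thread applies: a failed autosense attempt may already have cleared or changed the sense data, so a follow-up REQUEST SENSE is best effort.

```c
#include <stdint.h>
#include <string.h>

/* Values assumed to mirror FreeBSD's <cam/cam.h>. */
#define CAM_REQ_CMP         0x01
#define CAM_AUTOSENSE_FAIL  0x10
#define CAM_STATUS_MASK     0x3f

#define SCSI_REQUEST_SENSE  0x03  /* REQUEST SENSE opcode */

/* Hypothetical recovery decisions a peripheral error handler could make. */
enum recovery_action {
	RECOVER_NONE,          /* command succeeded */
	RECOVER_DEFAULT,       /* ordinary error handling */
	RECOVER_REQUEST_SENSE  /* CHECK CONDITION but no sense: fetch it */
};

static enum recovery_action choose_recovery(uint32_t cam_status)
{
	switch (cam_status & CAM_STATUS_MASK) {
	case CAM_REQ_CMP:
		return RECOVER_NONE;
	case CAM_AUTOSENSE_FAIL:
		/* A check condition occurred, but the SIM could not get sense. */
		return RECOVER_REQUEST_SENSE;
	default:
		return RECOVER_DEFAULT;
	}
}

/* Fill a 6-byte REQUEST SENSE CDB asking for up to alloc_len bytes. */
static void build_request_sense_cdb(uint8_t cdb[6], uint8_t alloc_len)
{
	memset(cdb, 0, 6);
	cdb[0] = SCSI_REQUEST_SENSE;
	cdb[4] = alloc_len;  /* allocation length */
}
```

Here choose_recovery(0x50) returns RECOVER_REQUEST_SENSE. A real implementation would then queue a REQUEST SENSE built with something like build_request_sense_cdb(cdb, 18), and would presumably also have to release the frozen device queue.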
Comment 5 mjacob 2000-11-09 22:57:46 UTC
I should also note, btw, that this doesn't always happen predictably. 

One 8200 running 4.2 Beta does:

(cd0:isp3:0:4:0): READ CD RECORDED CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0 
(cd0:isp3:0:4:0): NOT READY asc:3a,0
(cd0:isp3:0:4:0): Medium not present
cd0 at isp3 bus 0 target 4 lun 0
cd0: <DEC RRD45   (C) DEC 0436> Removable CD-ROM SCSI-2 device 
isp3: 0.4 get current period 0x3e offset 0xc flags 0xd500
cd0: 4.032MB/s transfers (4.032MHz, offset 12)
cd0: Attempt to query device size failed: NOT READY, Medium not present
da0: invalid primary partition table: no magic

While another does:

(cd0:isp3:0:4:0): got CAM status 0x50
(cd0:isp3:0:4:0): fatal error, failed to attach to device
(cd0:isp3:0:4:0): lost device
(cd0:isp3:0:4:0): removing device entry



Essentially the same hardware is involved.

-matt
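
On the machine that works, the drive returns NOT READY with ASC 0x3a (Medium not present), and cd0 attaches anyway. That outcome depends on the driver being able to read the sense data. Below is a minimal sketch of extracting the sense key, ASC and ASCQ from fixed-format sense data (response code 0x70 or 0x71, sense key in the low nibble of byte 2, ASC in byte 12, ASCQ in byte 13). The struct, macro and function names are hypothetical; this is not cd(4)'s actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_KEY_NOT_READY     0x02  /* sense key NOT READY */
#define ASC_MEDIUM_NOT_PRESENT  0x3a  /* MEDIUM NOT PRESENT */

struct sense_info {
	uint8_t key;   /* sense key (byte 2, low nibble) */
	uint8_t asc;   /* additional sense code (byte 12) */
	uint8_t ascq;  /* additional sense code qualifier (byte 13) */
};

/*
 * Parse fixed-format sense data. Returns false if the buffer is too
 * short to hold ASC/ASCQ or is not in fixed format.
 */
static bool parse_fixed_sense(const uint8_t *buf, size_t len,
    struct sense_info *out)
{
	if (len < 14)
		return false;
	uint8_t code = buf[0] & 0x7f;  /* drop the VALID bit */
	if (code != 0x70 && code != 0x71)
		return false;
	out->key = buf[2] & 0x0f;
	out->asc = buf[12];
	out->ascq = buf[13];
	return true;
}

/* "NOT READY asc:3a": the drive is reachable; there is just no disc. */
static bool is_medium_not_present(const struct sense_info *si)
{
	return si->key == SENSE_KEY_NOT_READY &&
	    si->asc == ASC_MEDIUM_NOT_PRESENT;
}
```

When autosense fails, there is no buffer to hand to a parser like this, so the driver cannot distinguish "no disc" from a real fault.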
Comment 6 ken 2000-11-09 23:13:38 UTC
On Thu, Nov 09, 2000 at 14:54:35 -0800, Matthew Jacob wrote:
> On Thu, 9 Nov 2000, Kenneth D. Merry wrote:
> > On Thu, Nov 09, 2000 at 14:34:19 -0800, Matthew Jacob wrote:
> > > What's particularly obnoxious here is that the CD in question doesn't
> > > actually completely detach- that is, its pass instance stays, but
> > > the cd instance won't attach- and this, for some reason, makes it
> > > impossible to rescan it later.
> > 
> > The reason the device isn't gone is because the pass(4) driver actually
> > attached successfully.  The problem is that when there is no CD in the
> > drive, any CDROM drive will return an error in response to a READ CAPACITY
> > command.
> > 
> > Since autosense is failing, the cd(4) driver can't tell what sort of error
> > is getting returned (and therefore whether the drive is really accessible),
> > so it won't attach.
> 
> An AUTOSENSE failing means that a check condition occurred, but no sense data
> is available. That should, in fact, be treated identically to READ CAPACITY
> failing because there's no media.

Right.

> > The pass(4) driver doesn't issue any commands to check the device (it
> > doesn't have any requirements for device functionality beyond the basic
> > probe code), so it attaches without problems.
> > 
> > Both drivers are doing the right thing from what I can see.
> 
> But a later rescan should see it, yet it doesn't. And see above.

That's how the rescan semantics work.  The device only gets announced to
the peripheral drivers if it is a new device, or if it has gone away.

In this case the device is still there, and still attached to the pass(4)
driver.  So from the transport layer's perspective, all the peripheral
drivers have already seen the device and had a chance to attach or not.

> But the high order bit is that the autosense is failing. All other stuff in
> scsi_cd is secondary.
> 
> What's more important is that cam_periph_error or the periph should send a
> REQUEST SENSE if AUTOSENSE fails- the sim should not be the one doing this.

I agree.  It would be interesting to find out what the SCSI status byte is
here.  I've attached a patch that should print it out.

Ken
-- 
Kenneth Merry
ken@kdm.org
Comment 7 mjacob 2000-11-09 23:33:47 UTC
It reported:

got SCSI status 0x2
Comment 8 mjacob 2000-11-09 23:37:17 UTC
Wilko - does your machine that has this problem say:

isp0: invalid NVRAM header

?
Comment 9 Justin T. Gibbs 2000-11-09 23:37:40 UTC
>> Since autosense is failing, the cd(4) driver can't tell what sort of error
>> is getting returned (and therefore whether the drive is really accessible),
>> so it won't attach.
>
>An AUTOSENSE failing means that a check condition occurred, but no sense data
>is available. That should, in fact, be treated identically to READ CAPACITY
>failing because there's no media.

Are you saying that the Qlogic firmware will return autosense fail
if the sense information is all zeros (no sense)?  That would be
really broken.

>What's more important is that cam_periph_error or the periph should send a
>REQUEST SENSE if AUTOSENSE fails- the sim should not be the one doing this.

I don't know that you are guaranteed to get correct sense in this case
as the first attempt to retrieve sense may have cleared or changed
the sense information.  I do have this implemented, BTW, in some error
recovery enhancements I've made, but I still don't know that it is
the appropriate thing to do.

--
Justin
Comment 10 ken 2000-11-09 23:41:35 UTC
On Thu, Nov 09, 2000 at 15:33:47 -0800, Matthew Jacob wrote:
> 
> It reported:
> 
> got SCSI status 0x2

Which is check condition.  So what would happen in cam_periph_error() is
that we would retry until our retry count was exhausted, and then return
EIO.

I suppose it would be nice to do a request sense there.

Ken
-- 
Kenneth Merry
ken@kdm.org
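
Status 0x2 is CHECK CONDITION. The sketch below shows why extra retries cannot help when every attempt ends in CHECK CONDITION with no usable sense. The loop walks a scripted list of per-attempt status bytes and, like the cam_periph_error() behavior Ken describes, returns EIO once the retry budget is spent. The status macro names are assumed to mirror <cam/scsi/scsi_all.h>; retry_outcome() and scsi_status_name() are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Names assumed to mirror <cam/scsi/scsi_all.h>. */
#define SCSI_STATUS_OK          0x00
#define SCSI_STATUS_CHECK_COND  0x02
#define SCSI_STATUS_BUSY        0x08

static const char *scsi_status_name(uint8_t status)
{
	switch (status) {
	case SCSI_STATUS_OK:         return "GOOD";
	case SCSI_STATUS_CHECK_COND: return "CHECK CONDITION";
	case SCSI_STATUS_BUSY:       return "BUSY";
	default:                     return "other";
	}
}

/*
 * statuses[i] is the SCSI status byte of attempt i. Allow one initial
 * attempt plus up to `retries` more; return 0 on the first GOOD status,
 * or EIO once the attempts are used up.
 */
static int retry_outcome(const uint8_t *statuses, size_t n, unsigned retries)
{
	for (size_t i = 0; i < n && i <= retries; i++) {
		if (statuses[i] == SCSI_STATUS_OK)
			return 0;   /* command finally succeeded */
	}
	return EIO;         /* retries exhausted: give up */
}
```

If every attempt returns 0x02, retry_outcome() returns EIO no matter how large the retry count is. This matches Matt's later observation that raising the READ CAPACITY retry count did not help.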
Comment 11 mjacob 2000-11-09 23:42:48 UTC
On Thu, 9 Nov 2000, Justin T. Gibbs wrote:

> >> Since autosense is failing, the cd(4) driver can't tell what sort of error
> >> is getting returned (and therefore whether the drive is really accessible),
> >> so it won't attach.
> >
> >An AUTOSENSE failing means that a check condition occurred, but no sense data
> >is available. That should, in fact, be treated identically to READ CAPACITY
> >failing because there's no media.
> 
> Are you saying that the Qlogic firmware will return autosense fail
> if the sense information is all zeros (no sense)?  That would be
> really broken.

No, no, no... It gives you a special status of "AUTOSENSE FAIL", much
like how the AHA1542 does it. But you know a check condition occurred.

> 
> >What's more important is that cam_periph_error or the periph should send a
> >REQUEST SENSE if AUTOSENSE fails- the sim should not be the one doing this.
> 
> I don't know that you are guaranteed to get correct sense in this case
> as the first attempt to retrieve sense may have cleared or changed
> the sense information.  I do have this implemented, BTW, in some error
> recovery enhancements I've made, but I still don't know that it is
> the appropriate thing to do.

Yes- that troubles me also. But w/o being able to say *why* autosense failed
it's the best one can do.

The pragmatics here are that if you have a CHECK CONDITION, maybe 7 times out
of 10 you really don't care what the Sense Key is. It's either an operation
you can retry on (most I/O to stateless devices) or fundamentally don't much
care about (e.g., a CHECK CONDITION on a tape unload).

-matt
Comment 12 mjacob 2000-11-09 23:45:15 UTC
On Thu, 9 Nov 2000, Kenneth D. Merry wrote:

> On Thu, Nov 09, 2000 at 15:33:47 -0800, Matthew Jacob wrote:
> > 
> > It reported:
> > 
> > got SCSI status 0x2
> 
> Which is check condition.  So what would happen in cam_periph_error() is
> that we would retry until our retry count was exhausted, and then return
> EIO.

Yes. I added a > 1 retry count to READ CAPACITY, but the problem seems to be
persistent.

It's this !@$*!$)!*$)!$*!)$*!)$!*$!)$*!$) problem with Qlogic and startup
resurfacing again. Remember that something similar bunged up Andrew for quite
some time about 8 months ago.

Now that I have a bus analyzer, I can probably even track down what's actually
up when I get time to do it.

> 
> I suppose it would be nice to do a request sense there.
> 

Yes. But it still puzzles me (at a high level) why I can't rescan and get cd
to attach later.

-matt
Comment 13 ken 2000-11-09 23:52:53 UTC
On Thu, Nov 09, 2000 at 15:45:15 -0800, Matthew Jacob wrote:
> On Thu, 9 Nov 2000, Kenneth D. Merry wrote:
> 
> > I suppose it would be nice to do a request sense there.
> > 
> 
> Yes. But it still puzzles me (at a high level) why I can't rescan and get cd
> to attach later.

Because the device hasn't gone away, it's still in the EDT.  So there's no
need to re-announce a device that hasn't shown up or gone away since the
last time the bus was scanned.

Ken
-- 
Kenneth Merry
ken@kdm.org
Comment 14 mjacob 2000-11-09 23:53:44 UTC
> Because the device hasn't gone away, it's still in the EDT.  So there's no
> need to re-announce a device that hasn't shown up or gone away since the
> last time the bus was scanned.

Yeah, but it's ready now to attach a different driver other than pass... :-)
Comment 15 ken 2000-11-10 00:04:28 UTC
On Thu, Nov 09, 2000 at 15:53:44 -0800, Matthew Jacob wrote:
> 
> > Because the device hasn't gone away, it's still in the EDT.  So there's no
> > need to re-announce a device that hasn't shown up or gone away since the
> > last time the bus was scanned.
> 
> Yeah, but it's ready now to attach a different driver other than pass... :-)

The cd(4) driver had its chance, and declined.  It doesn't get another
chance.

We could make a design decision to always re-announce all devices during a
rescan, but it would take some thought and discussion to get to that point,
and then some code changes to make all the probe code work that way.

Ken
-- 
Kenneth Merry
ken@kdm.org
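
A minimal model of the rescan rule described above: peripheral drivers hear about a device only when it newly appears or goes away. A device that stays in the EDT (here, because pass(4) attached) is therefore never re-offered to cd(4). The always_reannounce flag models the alternative Ken mentions. The struct, enum and function are hypothetical and are not CAM's actual data structures.

```c
#include <stdbool.h>

/* Hypothetical per-target EDT entry; not CAM's real structure. */
struct edt_entry {
	bool present;  /* device was found on the previous scan */
};

enum scan_event { EV_NONE, EV_FOUND, EV_LOST };

/*
 * Update one EDT entry after probing a target and report what, if
 * anything, peripheral drivers would be told about it.
 */
static enum scan_event rescan_target(struct edt_entry *e, bool found,
    bool always_reannounce)
{
	enum scan_event ev = EV_NONE;

	if (found && !e->present)
		ev = EV_FOUND;          /* new device: announce it */
	else if (!found && e->present)
		ev = EV_LOST;           /* device went away */
	else if (found && always_reannounce)
		ev = EV_FOUND;          /* proposed alternative: re-offer it */

	e->present = found;
	return ev;
}
```

Under the current rule, a second scan that finds the same drive yields EV_NONE, so a driver that declined on the first announcement never sees the device again.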
Comment 16 mjacob 2000-11-10 00:07:50 UTC
> > Yeah, but it's ready now to attach a different driver other than pass... :-)
> 
> The cd(4) driver had its chance, and declined.  It doesn't get another
> chance.

Well, if it were loadable it would.
Comment 17 wkb 2000-11-13 09:13:56 UTC
On Thu, Nov 09, 2000 at 11:46:06PM +0100, Wilko Bulte wrote:
> On Thu, Nov 09, 2000 at 03:37:17PM -0800, Matthew Jacob wrote:
> > 
> > 
> > Wilko - does your machine that has this problem say:
> > 
> > isp0: invalid NVRAM header
> 
> Don't remember having seen this (ever). But will check tomorrow (heading to
> bed now <snore>)

Took a bit longer but I checked it today: my Miata never displayed this 
message.

W/

-- 
Wilko Bulte  	 					Arnhem, the Netherlands
wilko@freebsd.org  	http://www.freebsd.org 		http://www.nlfug.nl
Comment 18 mjacob 2000-11-14 03:28:41 UTC
'kay, thanks, that's one theory.


On Mon, 13 Nov 2000, Wilko Bulte wrote:

> On Thu, Nov 09, 2000 at 11:46:06PM +0100, Wilko Bulte wrote:
> > On Thu, Nov 09, 2000 at 03:37:17PM -0800, Matthew Jacob wrote:
> > > 
> > > 
> > > Wilko - does your machine that has this problem say:
> > > 
> > > isp0: invalid NVRAM header
> > 
> > Don't remember having seen this (ever). But will check tomorrow (heading to
> > bed now <snore>)
> 
> Took a bit longer but I checked it today: my Miata never displayed this 
> message.
> 
> W/
> 
> -- 
> Wilko Bulte  	 					Arnhem, the Netherlands
> wilko@freebsd.org  	http://www.freebsd.org 		http://www.nlfug.nl
Comment 19 mjacob 2001-01-15 18:16:28 UTC
Wilko- try this patch and see if this fixes things for you
(don't hit me! don't hit me! I'm just an idiot...)

Index: isp_freebsd.h
===================================================================
RCS file: /home/ncvs/src/sys/dev/isp/isp_freebsd.h,v
retrieving revision 1.43
diff -u -r1.43 isp_freebsd.h
--- isp_freebsd.h	2001/01/09 02:47:56	1.43
+++ isp_freebsd.h	2001/01/15 18:14:19
@@ -244,6 +244,7 @@
 	XS_SETERR(ccb, CAM_REQ_INPROG), (ccb)->ccb_h.spriv_field0 = 0
 
 #define	XS_SAVE_SENSE(xs, sp)				\
+	(xs)->ccb_h.status |= CAM_AUTOSNS_VALID,	\
 	bcopy(sp->req_sense_data, &(xs)->sense_data,	\
 	    imin(XS_SNSLEN(xs), sp->req_sense_len))
 

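The patch adds a single line to XS_SAVE_SENSE: it ORs CAM_AUTOSNS_VALID into the CCB status before copying the sense bytes. Below is a simplified userland model of that idea. The fake_ccb and fake_fw_status types and both functions are hypothetical stand-ins, and CAM_AUTOSNS_VALID's value is assumed to mirror <cam/cam.h>. In the model, a consumer that checks the flag ignores the copied bytes unless the flag is set, which is the omission the commit log in Comment 21 describes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CAM_AUTOSNS_VALID 0x80  /* assumed to mirror <cam/cam.h> */

/* Hypothetical, cut-down stand-ins for a CCB and a firmware status entry. */
struct fake_ccb {
	uint32_t status;
	uint8_t  sense_data[32];
	size_t   sense_len;        /* room available in sense_data */
};

struct fake_fw_status {
	const uint8_t *req_sense_data;
	size_t         req_sense_len;
};

/* Copy sense data and, as the patch adds, mark it valid. */
static void save_sense(struct fake_ccb *ccb, const struct fake_fw_status *sp)
{
	/* Same role as imin() in the macro: never overrun either buffer. */
	size_t n = sp->req_sense_len < ccb->sense_len ?
	    sp->req_sense_len : ccb->sense_len;

	ccb->status |= CAM_AUTOSNS_VALID;
	memcpy(ccb->sense_data, sp->req_sense_data, n);
}

/* A consumer trusts sense_data only when the flag says it is valid. */
static bool sense_usable(const struct fake_ccb *ccb)
{
	return (ccb->status & CAM_AUTOSNS_VALID) != 0;
}
```

Dropping the status line from save_sense() leaves the bytes copied but sense_usable() false, which is the state the one-line fix addresses.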

> 
> >Number:         22650
> >Category:       alpha
> >Synopsis:       SCSI CD drives not attached on boot on isp driver
> >Confidential:   no
> >Severity:       serious
> >Priority:       medium
> >Responsible:    freebsd-alpha
> >State:          open
> >Quarter:        
> >Keywords:       
> >Date-Required:
> >Class:          sw-bug
> >Submitter-Id:   current-users
> >Arrival-Date:   Mon Nov 06 14:30:00 PST 2000
> >Closed-Date:
> >Last-Modified:
> >Originator:     Wilko Bulte
> >Release:        FreeBSD 4-stable on alpha
> >Organization:
> Private FreeBSD site - The Netherlands
> >Environment:
> 
> 	Alpha with an isp-driven adapter and a SCSI CD-ROM. DEC RRD4[56] drives
> 	reproduce it reliably.
> 
> >Description:
> 
> 	Typical error is:
> 
> da1: 4357MB (8925000 512 byte sectors: 255H 63S/T 555C)
> (cd0:isp0:0:6:0): got CAM status 0x50
> (cd0:isp0:0:6:0): fatal error, failed to attach to device   
> (cd0:isp0:0:6:0): lost device 
> (cd0:isp0:0:6:0): removing device entry
> 
> The same hardware works OK with an NCR 810 adapter driven by sym(4).
> 
> Please refer to mail thread with:
> 
> Subject: SCSI cdrom attach problems on 4-stable
> Message-ID: <20001104200119.A13502@freebie.demon.nl>
> 
> posted on -alpha at Nov 4, 2000 
> 
> It contains multiple logs and experiments.
> 
> >How-To-Repeat:
> 
> See above
> 
> >Fix:
> 
> Use an adapter other than an isp-driven one. Alternatively, keeping a CD in the
> drive appears to help.
> 
> >Release-Note:
> >Audit-Trail:
> >Unformatted:
Comment 20 wkb 2001-01-15 20:02:00 UTC
On Mon, Jan 15, 2001 at 10:16:28AM -0800, Matthew Jacob wrote:

> Wilko- try this patch and see if this fixes things for you
> (don't hit me! don't hit me! I'm just an idiot...)

I never said so...

But the patch fixes it nicely:

on -current:

cd0 at isp0 bus 0 target 6 lun 0
cd0: <TOSHIBA CD-ROM XM-6201TA 1206> Removable CD-ROM SCSI-2 device 
cd0: 10.000MB/s transfers (10.000MHz, offset 8)
cd0: Attempt to query device size failed: NOT READY, Medium not present

TATA (victory).

I'll try -stable next.

I saw you committed it to -current already.

	Wilko

-- 
|   / o / /  _  	 Arnhem, The Netherlands    	email: wilko@freebsd.org
|/|/ / / /( (_) Bulte	 http://www.freebsd.org 	http://www.nlfug.nl
Comment 21 wilko freebsd_committer freebsd_triage 2001-01-15 20:25:38 UTC
State Changed
From-To: open->closed

Fixed in -current per the following commit: 

mjacob      2001/01/15 10:36:09 PST 

Modified files: 
sys/dev/isp          isp_freebsd.h 
Log: 
Use the isp_lastmbxcmd tag to report timed out mailbox commands. 

Arrrggghhhh! Very likely fix 22650 by remembering to, ahem, set 
CAM_AUTOSNS_VALID when one has sense data.
Comment 22 mjacob 2001-01-15 21:48:49 UTC
On Mon, 15 Jan 2001, Wilko Bulte wrote:

> On Mon, Jan 15, 2001 at 10:16:28AM -0800, Matthew Jacob wrote:
> 
> > Wilko- try this patch and see if this fixes things for you
> > (don't hit me! don't hit me! I'm just an idiot...)
> 
> I never said so...

Well, *I* say so....

> 
> But the patch fixes it nicely:
> 
> on -current:
> 
> cd0 at isp0 bus 0 target 6 lun 0
> cd0: <TOSHIBA CD-ROM XM-6201TA 1206> Removable CD-ROM SCSI-2 device 
> cd0: 10.000MB/s transfers (10.000MHz, offset 8)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> 
> TATA (victory).
> 
> I'll try -stable next.
> 
> I saw you committed it to -current already.

Yes, it was a necessary thing entirely. I'll take care of MFC'ing. Thanks.

-matt