| Summary: | SCSI CD drives not attached on boot on isp driver | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | wilko <wilko> | ||||
| Component: | alpha | Assignee: | Matt Jacob <mjacob> | ||||
| Status: | Closed FIXED | ||||||
| Severity: | Affects Only Me | ||||||
| Priority: | Normal | ||||||
| Version: | Unspecified | ||||||
| Hardware: | Any | ||||||
| OS: | Any | ||||||
| Attachments: |
|
||||||
|
Description
wilko
2000-11-06 22:30:00 UTC
Responsible Changed From-To: freebsd-alpha->mjacob
mine.

On Thu, Nov 09, 2000 at 03:37:17PM -0800, Matthew Jacob wrote:
>
>
> Wilko - does your machine that has this problem say:
>
> isp0: invalid NVRAM header

Don't remember having seen this (ever). But will check tomorrow (heading to
bed now <snore>

W/
--
Wilko Bulte Arnhem, the Netherlands
wilko@freebsd.org http://www.freebsd.org http://www.nlfug.nl

On Thu, Nov 09, 2000 at 14:34:19 -0800, Matthew Jacob wrote:
>
> I updated the PR (22650) with edit-pr, but it doesn't seem to then email
> submitter/responsible person. The short answer is "f/w breakage(?), don't know
> how to fix yet".
>
> mjacob Wed Nov 8 17:08:53 PST 2000
>
> Okay- here's what is happening- the Qlogic f/w is returning an AUTOSENSE
> failure- that is, it's unable to automatically run a request sense
> (reason unknown). So, the CAM status being returned is
>
> CAM_AUTOSENSE_FAIL
> CAM_DEV_QFRZN
>
> It's been apparent to me for some time that cam_periph_error should be
> then running an INQUIRY command in this case.

Why an inquiry, if autosense failed? Don't you mean a request sense?

> What's particularly obnoxious here is that the CD in question doesn't
> actually completely detach- that is, its pass instance stays, but
> the cd instance won't attach- and this, for some reason, makes it
> impossible to rescan it later.

The reason the device isn't gone is because the pass(4) driver actually
attached successfully. The problem is that when there is no CD in the
drive, any CDROM drive will return an error in response to a READ CAPACITY
command.

Since autosense is failing, the cd(4) driver can't tell what sort of error
is getting returned (and therefore whether the drive is really accessible),
so it won't attach.

The pass(4) driver doesn't issue any commands to check the device (it
doesn't have any requirements for device functionality beyond the basic
probe code), so it attaches without problems.

Both drivers are doing the right thing from what I can see.

> I don't know why the Qlogic f/w is returning this code, but the fundamental
> problem here is that CAM is broken. And, no, it's not up to each SIM to
> run INQUIRY commands themselves if AUTOSENSE fails.

Don't you mean request sense?

Ken
--
Kenneth Merry
ken@kdm.org

On Thu, 9 Nov 2000, Kenneth D. Merry wrote:

> On Thu, Nov 09, 2000 at 14:34:19 -0800, Matthew Jacob wrote:
> >
> > I updated the PR (22650) with edit-pr, but it doesn't seem to then email
> > submitter/responsible person. The short answer is "f/w breakage(?), don't know
> > how to fix yet".
> >
> > mjacob Wed Nov 8 17:08:53 PST 2000
> >
> > Okay- here's what is happening- the Qlogic f/w is returning an AUTOSENSE
> > failure- that is, it's unable to automatically run a request sense
> > (reason unknown). So, the CAM status being returned is
> >
> > CAM_AUTOSENSE_FAIL
> > CAM_DEV_QFRZN
> >
> > It's been apparent to me for some time that cam_periph_error should be
> > then running an INQUIRY command in this case.
>
> Why an inquiry, if autosense failed? Don't you mean a request sense?

Sorry. Ooops. Yes.

> > What's particularly obnoxious here is that the CD in question doesn't
> > actually completely detach- that is, its pass instance stays, but
> > the cd instance won't attach- and this, for some reason, makes it
> > impossible to rescan it later.
>
> The reason the device isn't gone is because the pass(4) driver actually
> attached successfully. The problem is that when there is no CD in the
> drive, any CDROM drive will return an error in response to a READ CAPACITY
> command.
>
> Since autosense is failing, the cd(4) driver can't tell what sort of error
> is getting returned (and therefore whether the drive is really accessible),
> so it won't attach.

An AUTOSENSE failing means that a check condition occurred, but no sense data
is available. That should, in fact, be treated identically to READ CAPACITY
failing because there's no media.

> The pass(4) driver doesn't issue any commands to check the device (it
> doesn't have any requirements for device functionality beyond the basic
> probe code), so it attaches without problems.
>
> Both drivers are doing the right thing from what I can see.

But a later rescan should see it but it doesn't. And see above.

But the high order bit is that the autosense is failing. All other stuff in
scsi_cd is secondary.

What's more important is that cam_periph_error or the periph should send a
REQUEST SENSE if AUTOSENSE fails- the sim should not be the one doing this.

> > I don't know why the Qlogic f/w is returning this code, but the fundamental
> > problem here is that CAM is broken. And, no, it's not up to each SIM to
> > run INQUIRY commands themselves if AUTOSENSE fails.
>
> Don't you mean request sense?

Yes, sorry. Brains.....

-matt

I should also note, btw, that this doesn't always happen predictably.

One 8200 running 4.2 Beta does:

(cd0:isp3:0:4:0): READ CD RECORDED CAPACITY. CDB: 25 0 0 0 0 0 0 0 0 0
(cd0:isp3:0:4:0): NOT READY asc:3a,0
(cd0:isp3:0:4:0): Medium not present
cd0 at isp3 bus 0 target 4 lun 0
cd0: <DEC RRD45 (C) DEC 0436> Removable CD-ROM SCSI-2 device
isp3: 0.4 get current period 0x3e offset 0xc flags 0xd500
cd0: 4.032MB/s transfers (4.032MHz, offset 12)
cd0: Attempt to query device size failed: NOT READY, Medium not present
da0: invalid primary partition table: no magic

While another does:

(cd0:isp3:0:4:0): got CAM status 0x50
(cd0:isp3:0:4:0): fatal error, failed to attach to device
(cd0:isp3:0:4:0): lost device
(cd0:isp3:0:4:0): removing device entry

Essentially the same hardware is involved.

-matt

On Thu, Nov 09, 2000 at 14:54:35 -0800, Matthew Jacob wrote:
> On Thu, 9 Nov 2000, Kenneth D. Merry wrote:
> > On Thu, Nov 09, 2000 at 14:34:19 -0800, Matthew Jacob wrote:
> > > What's particularly obnoxious here is that the CD in question doesn't
> > > actually completely detach- that is, its pass instance stays, but
> > > the cd instance won't attach- and this, for some reason, makes it
> > > impossible to rescan it later.
> >
> > The reason the device isn't gone is because the pass(4) driver actually
> > attached successfully. The problem is that when there is no CD in the
> > drive, any CDROM drive will return an error in response to a READ CAPACITY
> > command.
> >
> > Since autosense is failing, the cd(4) driver can't tell what sort of error
> > is getting returned (and therefore whether the drive is really accessible),
> > so it won't attach.
>
> An AUTOSENSE failing means that a check condition occurred, but no sense data
> is available. That should, in fact, be treated identically to READ CAPACITY
> failing because there's no media.

Right.

> > The pass(4) driver doesn't issue any commands to check the device (it
> > doesn't have any requirements for device functionality beyond the basic
> > probe code), so it attaches without problems.
> >
> > Both drivers are doing the right thing from what I can see.
>
> But a later rescan should see it but it doesn't. And see above.

That's how the rescan semantics work. The device only gets announced to the
peripheral drivers if it is a new device, or if it has gone away.

In this case the device is still there, and still attached to the pass(4)
driver. So from the transport layer's perspective, all the peripheral
drivers have already seen the device and had a chance to attach or not.

> But the high order bit is that the autosense is failing. All other stuff in
> scsi_cd is secondary.
>
> What's more important is that cam_periph_error or the periph should send a
> REQUEST SENSE if AUTOSENSE fails- the sim should not be the one doing this.

I agree.

It would be interesting to find out what the SCSI status byte is here.
I've attached a patch that should print it out.

Ken
--
Kenneth Merry
ken@kdm.org

It reported:

got SCSI status 0x2

Wilko - does your machine that has this problem say:

isp0: invalid NVRAM header

?

>> Since autosense is failing, the cd(4) driver can't tell what sort of error
>> is getting returned (and therefore whether the drive is really accessible),
>> so it won't attach.
>
>An AUTOSENSE failing means that a check condition occurred, but no sense data
>is available. That should, in fact, be treated identically to READ CAPACITY
>failing because there's no media.

Are you saying that the Qlogic firmware will return autosense fail
if the sense information is all zeros (no sense)? That would be
really broken.

>What's more important is that cam_periph_error or the periph should send a
>REQUEST SENSE if AUTOSENSE fails- the sim should not be the one doing this.

I don't know that you are guaranteed to get correct sense in this case
as the first attempt to retrieve sense may have cleared or changed
the sense information. I do have this implemented, BTW, in some error
recovery enhancements I've made, but I still don't know that it is
the appropriate thing to do.

--
Justin

On Thu, Nov 09, 2000 at 15:33:47 -0800, Matthew Jacob wrote:
>
> It reported:
>
> got SCSI status 0x2

Which is check condition. So what would happen in cam_periph_error() is
that we would retry until our retry count was exhausted, and then return
EIO.

I suppose it would be nice to do a request sense there.

Ken
--
Kenneth Merry
ken@kdm.org

On Thu, 9 Nov 2000, Justin T. Gibbs wrote:

> >> Since autosense is failing, the cd(4) driver can't tell what sort of error
> >> is getting returned (and therefore whether the drive is really accessible),
> >> so it won't attach.
> >
> >An AUTOSENSE failing means that a check condition occurred, but no sense data
> >is available. That should, in fact, be treated identically to READ CAPACITY
> >failing because there's no media.
>
> Are you saying that the Qlogic firmware will return autosense fail
> if the sense information is all zeros (no sense)? That would be
> really broken.

No, no, no... It gives you a special status of "AUTOSENSE FAIL"- sort of much
like how the AHA1542 does it. But you know a check condition occurred.

> >What's more important is that cam_periph_error or the periph should send a
> >REQUEST SENSE if AUTOSENSE fails- the sim should not be the one doing this.
>
> I don't know that you are guaranteed to get correct sense in this case
> as the first attempt to retrieve sense may have cleared or changed
> the sense information. I do have this implemented, BTW, in some error
> recovery enhancements I've made, but I still don't know that it is
> the appropriate thing to do.

Yes- that troubles me also. But w/o being able to say *why* autosense failed
it's the best one can do.

The pragmatics here are that if you have a CHECK CONDITION, maybe 7 times out
of 10 you really don't care what the Sense Key is. It's either an operation
you can retry on (most I/O to stateless devices) or fundamentally don't much
care about (e.g., a CHECK CONDITION on a tape unload).

-matt

On Thu, 9 Nov 2000, Kenneth D. Merry wrote:

> On Thu, Nov 09, 2000 at 15:33:47 -0800, Matthew Jacob wrote:
> >
> > It reported:
> >
> > got SCSI status 0x2
>
> Which is check condition. So what would happen in cam_periph_error() is
> that we would retry until our retry count was exhausted, and then return
> EIO.

Yes. I added a > 1 retry count to READ CAPACITY, but the problem seems to be
persistent. It's this !@$*!$)!*$)!$*!)$*!)$!*$!)$*!$) problem with Qlogic and
startup resurfacing again. Remember that something similar bunged up Andrew
for quite some time about 8 months ago. Now that I have a bus analyzer, I can
probably even track down what's actually up when I get time to do it.

> I suppose it would be nice to do a request sense there.
>

Yes. But it still puzzles me (at a high level) why I can't rescan and get cd
to attach later.

-matt

On Thu, Nov 09, 2000 at 15:45:15 -0800, Matthew Jacob wrote:
> On Thu, 9 Nov 2000, Kenneth D. Merry wrote:
> > I suppose it would be nice to do a request sense there.
> >
>
> Yes. But it still puzzles me (at a high level) why I can't rescan and get cd
> to attach later.

Because the device hasn't gone away, it's still in the EDT. So there's no
need to re-announce a device that hasn't shown up or gone away since the
last time the bus was scanned.

Ken
--
Kenneth Merry
ken@kdm.org

> Because the device hasn't gone away, it's still in the EDT. So there's no
> need to re-announce a device that hasn't shown up or gone away since the
> last time the bus was scanned.
Yeah, but it's ready now to attach a different driver other than pass... :-)

On Thu, Nov 09, 2000 at 15:53:44 -0800, Matthew Jacob wrote:
>
> > Because the device hasn't gone away, it's still in the EDT. So there's no
> > need to re-announce a device that hasn't shown up or gone away since the
> > last time the bus was scanned.
>
> Yeah, but it's ready now to attach a different driver other than pass... :-)

The cd(4) driver had its chance, and declined. It doesn't get another
chance.

We could make a design decision to always re-announce all devices during a
rescan, but it would take some thought and discussion to get to that point,
and then some code changes to make all the probe code work that way.

Ken
--
Kenneth Merry
ken@kdm.org

> > Yeah, but it's ready now to attach a different driver other than pass... :-)
>
> The cd(4) driver had its chance, and declined. It doesn't get another
> chance.
Well, if it were loadable it would.
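
A minimal sketch of the rescan behaviour Ken describes, written as a toy model and not as CAM code. The names `toy_edt`, `toy_edt_entry` and `toy_scan_found()` are hypothetical and exist only for this illustration; the real transport layer tracks far more state, and this model covers only the "new device" half of the rule, not departures. It shows why a target that is already in the table is not offered to the peripheral drivers again, so cd(4) gets no second look while pass(4) keeps the device alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEVS 8

/* Hypothetical model of one entry in the existing device table (EDT). */
struct toy_edt_entry {
    int  target;
    bool present;
};

struct toy_edt {
    struct toy_edt_entry devs[TOY_MAX_DEVS];
    size_t               ndevs;
    int                  announcements; /* times drivers were offered a device */
};

/*
 * Record that a bus scan found `target`. Returns true only when the target
 * is new to the table, which is the only case in which the peripheral
 * drivers would be asked whether they want to attach.
 */
static bool
toy_scan_found(struct toy_edt *edt, int target)
{
    assert(edt != NULL);

    for (size_t i = 0; i < edt->ndevs; i++) {
        if (edt->devs[i].target == target && edt->devs[i].present)
            return false; /* already known: no re-announcement */
    }
    if (edt->ndevs == TOY_MAX_DEVS)
        return false;     /* toy table is full */

    edt->devs[edt->ndevs].target = target;
    edt->devs[edt->ndevs].present = true;
    edt->ndevs++;
    edt->announcements++;
    return true;
}
```

In this model, calling `toy_scan_found()` twice for the same target announces it once; the second call is the rescan that leaves cd(4) out.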

On Thu, Nov 09, 2000 at 11:46:06PM +0100, Wilko Bulte wrote:
> On Thu, Nov 09, 2000 at 03:37:17PM -0800, Matthew Jacob wrote:
> >
> >
> > Wilko - does your machine that has this problem say:
> >
> > isp0: invalid NVRAM header
>
> Don't remember having seen this (ever). But will check tomorrow (heading to
> bed now <snore>

Took a bit longer but I checked it today: my Miata never displayed this
message.

W/
--
Wilko Bulte Arnhem, the Netherlands
wilko@freebsd.org http://www.freebsd.org http://www.nlfug.nl

'kay, thanks, that's one theory.

On Mon, 13 Nov 2000, Wilko Bulte wrote:
> On Thu, Nov 09, 2000 at 11:46:06PM +0100, Wilko Bulte wrote:
> > On Thu, Nov 09, 2000 at 03:37:17PM -0800, Matthew Jacob wrote:
> > >
> > >
> > > Wilko - does your machine that has this problem say:
> > >
> > > isp0: invalid NVRAM header
> >
> > Don't remember having seen this (ever). But will check tomorrow (heading to
> > bed now <snore>
>
> Took a bit longer but I checked it today: my Miata never displayed this
> message.
>
> W/
>
> --
> Wilko Bulte Arnhem, the Netherlands
> wilko@freebsd.org http://www.freebsd.org http://www.nlfug.nl
>
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-alpha" in the body of the message
>
Wilko- try this patch and see if this fixes things for you
(don't hit me! don't hit me! I'm just an idiot...)
Index: isp_freebsd.h
===================================================================
RCS file: /home/ncvs/src/sys/dev/isp/isp_freebsd.h,v
retrieving revision 1.43
diff -u -r1.43 isp_freebsd.h
--- isp_freebsd.h 2001/01/09 02:47:56 1.43
+++ isp_freebsd.h 2001/01/15 18:14:19
@@ -244,6 +244,7 @@
XS_SETERR(ccb, CAM_REQ_INPROG), (ccb)->ccb_h.spriv_field0 = 0
#define XS_SAVE_SENSE(xs, sp) \
+ (xs)->ccb_h.status |= CAM_AUTOSNS_VALID, \
bcopy(sp->req_sense_data, &(xs)->sense_data, \
imin(XS_SNSLEN(xs), sp->req_sense_len))
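
To make the status values in this thread concrete, the sketch below decodes a CAM status word and mimics what the patched `XS_SAVE_SENSE` does. The numeric constants are assumptions recalled from FreeBSD's sys/cam/cam.h and should be checked against the tree; `toy_ccb`, `toy_describe_status()` and `toy_save_sense()` are hypothetical stand-ins invented for illustration, not driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed to match sys/cam/cam.h; verify before relying on them. */
#define CAM_STATUS_MASK    0x3fu /* low bits hold the status code proper */
#define CAM_AUTOSENSE_FAIL 0x10u /* status code: autosense request failed */
#define CAM_DEV_QFRZN      0x40u /* flag: device queue is frozen */
#define CAM_AUTOSNS_VALID  0x80u /* flag: sense data in the CCB is valid */

/* Hypothetical stand-in for the CCB fields that XS_SAVE_SENSE touches. */
struct toy_ccb {
    uint32_t status;
    uint8_t  sense_data[32];
};

/* Split a status word into its code and the two flags discussed here. */
static void
toy_describe_status(uint32_t status, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "code=0x%02x%s%s",
        (unsigned)(status & CAM_STATUS_MASK),
        (status & CAM_DEV_QFRZN) ? " QFRZN" : "",
        (status & CAM_AUTOSNS_VALID) ? " AUTOSNS_VALID" : "");
}

/*
 * Mirrors the patched macro: mark the sense data as valid, then copy at
 * most as many bytes as the destination buffer can hold.
 */
static void
toy_save_sense(struct toy_ccb *ccb, const uint8_t *sense, size_t sense_len)
{
    size_t n = sense_len < sizeof(ccb->sense_data) ?
        sense_len : sizeof(ccb->sense_data);

    assert(ccb != NULL && sense != NULL);
    ccb->status |= CAM_AUTOSNS_VALID;
    memcpy(ccb->sense_data, sense, n);
}
```

Under these assumed values, 0x50 decodes as status code 0x10 (`CAM_AUTOSENSE_FAIL`) with `CAM_DEV_QFRZN` set, which matches the two values named earlier in the thread. Without the added line in the macro, the copied sense bytes would be present but nothing in the status word would say so.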
>
> >Number: 22650
> >Category: alpha
> >Synopsis: SCSI CD drives not attached on boot on isp driver
> >Confidential: no
> >Severity: serious
> >Priority: medium
> >Responsible: freebsd-alpha
> >State: open
> >Quarter:
> >Keywords:
> >Date-Required:
> >Class: sw-bug
> >Submitter-Id: current-users
> >Arrival-Date: Mon Nov 06 14:30:00 PST 2000
> >Closed-Date:
> >Last-Modified:
> >Originator: Wilko Bulte
> >Release: FreeBSD 4-stable on alpha
> >Organization:
> Private FreeBSD site - The Netherlands
> >Environment:
>
> Alpha with isp driven adapter and SCSI cdrom. DEC RRD4[56]
> does reproduce well.
>
> >Description:
>
> Typical error is:
>
> da1: 4357MB (8925000 512 byte sectors: 255H 63S/T 555C)
> (cd0:isp0:0:6:0): got CAM status 0x50
> (cd0:isp0:0:6:0): fatal error, failed to attach to device
> (cd0:isp0:0:6:0): lost device
> (cd0:isp0:0:6:0): removing device entry
>
> Same hardware but using sym driven ncr810 works OK.
>
> Please refer to mail thread with:
>
> Subject: SCSI cdrom attach problems on 4-stable
> Message-ID: <20001104200119.A13502@freebie.demon.nl>
>
> posted on -alpha at Nov 4, 2000
>
> This contains multiple log / experiments.
>
> >How-To-Repeat:
>
> See above
>
> >Fix:
>
> Use other adapter than isp-drive one. Alternatively keeping a CD in the
> drive appears to help.
>
> >Release-Note:
> >Audit-Trail:
> >Unformatted:
>

On Mon, Jan 15, 2001 at 10:16:28AM -0800, Matthew Jacob wrote:
> Wilko- try this patch and see if this fixes things for you
> (don't hit me! don't hit me! I'm just an idiot...)

I never said so...

But the patch fixes it nicely:

on -current:

cd0 at isp0 bus 0 target 6 lun 0
cd0: <TOSHIBA CD-ROM XM-6201TA 1206> Removable CD-ROM SCSI-2 device
cd0: 10.000MB/s transfers (10.000MHz, offset 8)
cd0: Attempt to query device size failed: NOT READY, Medium not present

TATA (victory).

I'll try -stable next.

I saw you committed it to -current already.

Wilko
--
| / o / / _ Arnhem, The Netherlands email: wilko@freebsd.org
|/|/ / / /( (_) Bulte http://www.freebsd.org http://www.nlfug.nl

State Changed From-To: open->closed

Fixed in -current per the following commit:

mjacob 2001/01/15 10:36:09 PST

Modified files:
sys/dev/isp isp_freebsd.h
Log:
Use the isp_lastmbxcmd tag to report timed out mailbox commands.

Arrrggghhhh! Very likely fix 22650 by remembering to, ahem, set
CAM_AUTOSNS_VALID when one has sense data.

On Mon, 15 Jan 2001, Wilko Bulte wrote:

> On Mon, Jan 15, 2001 at 10:16:28AM -0800, Matthew Jacob wrote:
>
> > Wilko- try this patch and see if this fixes things for you
> > (don't hit me! don't hit me! I'm just an idiot...)
>
> I never said so...

Well, *I* say so....

> But the patch fixes it nicely:
>
> on -current:
>
> cd0 at isp0 bus 0 target 6 lun 0
> cd0: <TOSHIBA CD-ROM XM-6201TA 1206> Removable CD-ROM SCSI-2 device
> cd0: 10.000MB/s transfers (10.000MHz, offset 8)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
>
> TATA (victory).
>
> I'll try -stable next.
>
> I saw you committed it to -current already.

Yes, it was a necessary thing entirely. I'll take care of MFC'ing. Thanks.

-matt |