Bug 218572 - pass(4) driver sometimes does error recovery when CAM_PASS_ERR_RECOVER is not set
Summary: pass(4) driver sometimes does error recovery when CAM_PASS_ERR_RECOVER is not...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.3-STABLE
Hardware: Any Any
: --- Affects Some People
Assignee: Kenneth D. Merry
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-11 21:27 UTC by Terry Kennedy
Modified: 2017-05-10 15:23 UTC (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Terry Kennedy 2017-04-11 21:27:59 UTC
[This is a summation of a long discussion between me, ken@ and mav@]

After SVN rev 236814 in FreeBSD/head, the pass(4) driver does some error 
recovery, but not all cases, when the retry_count is set in the CCB and
CAM_PASS_ERR_RECOVER is not set.

Previously, the pass(4) driver would only do error recovery if
CAM_PASS_ERR_RECOVER is set.

This can be seen with 'camcontrol tur -v'.  camcontrol sets the retry_count
to 1 by default, so that the user will have at least one retry if he turns
on retries with -E.

If you reset a hard drive:

# camcontrol reset 1:172:0
Reset of 1:172:0 was successful

There should be a Unit Attention pending:

# camcontrol tur 1:172:0 -v
Unit is ready

But that doesn't happen, because the kernel is doing error recovery when
we have not turned it on with -E (which sets the CAM_PASS_ERR_RECOVER flag
on the CCB).

Retrying the experiment:

# camcontrol reset 1:172:0
Reset of 1:172:0 was successful

Now set the retry count to 0:

# camcontrol tur 1:172:0 -v -C 0
Unit is not ready
(pass42:mps1:0:172:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 
(pass42:mps1:0:172:0): CAM status: SCSI Status Error
(pass42:mps1:0:172:0): SCSI status: Check Condition
(pass42:mps1:0:172:0): SCSI sense: UNIT ATTENTION asc:29,2 (SCSI bus reset occurred)
(pass42:mps1:0:172:0): Field Replaceable Unit: 2

We get the unit attention.

Also, the "Filemark detected" asc/ascq entry (0x00,0x01) and other, similar
tape error recovery entries should probably have an error recovery action
of SS_NOP instead of SS_RDEF.  The application should be notified of 
Filemarks, setmarks, end of medium, etc.

[This affects everything after r237326 in 9-STABLE, so the affected releases are 9.1/2/3, 10.0/1/2/3, 11.0, HEAD. As everything before 10.3 is EoL, the fix only needs to be MFC'd back to 10.3.]
Comment 1 Kenneth D. Merry freebsd_committer freebsd_triage 2017-04-12 14:14:44 UTC
I'll take this one.
Comment 2 commit-hook freebsd_committer freebsd_triage 2017-05-03 20:59:53 UTC
A commit references this bug:

Author: ken
Date: Wed May  3 20:59:47 UTC 2017
New revision: 317775
URL: https://svnweb.freebsd.org/changeset/base/317775

Log:
  Fix error recovery behavior in the pass(4) driver.

  After FreeBSD SVN revision 236814, the pass(4) driver changed from
  only doing error recovery when the CAM_PASS_ERR_RECOVER flag was
  set on a CCB to sometimes doing error recovery if the passed in
  retry count was non-zero.

  Error recovery would happen if two conditions were met:

  1.  The error recovery action was simply a retry.  (Which is most
      cases.)
  2.  The retry_count is non-zero. (Which happened a lot because of
      cut-and-pasted code.)

  This explains a bug I noticed in with camcontrol:

  # camcontrol tur da34 -v
  Unit is ready
  # camcontrol reset da34
  Reset of 1:172:0 was successful

  At this point, there should be a Unit Attention:

  # camcontrol tur da34 -v
  Unit is ready

  No Unit Attention.

  Try it again:

  # camcontrol reset da34
  Reset of 1:172:0 was successful

  Now set the retry_count to 0 for the TUR:

  # camcontrol tur da34 -v -C 0
  Unit is not ready
  (pass42:mps1:0:172:0): TEST UNIT READY. CDB: 00 00 00 00 00 00
  (pass42:mps1:0:172:0): CAM status: SCSI Status Error
  (pass42:mps1:0:172:0): SCSI status: Check Condition
  (pass42:mps1:0:172:0): SCSI sense: UNIT ATTENTION asc:29,2 (SCSI bus reset occurred)
  (pass42:mps1:0:172:0): Field Replaceable Unit: 2

  There is the unit attention. camcontrol(8) has a default
  retry_count of 1, in case someone sets the -E flag without
  setting -C.

  The CAM_PASS_ERR_RECOVER behavior was only broken with the
  CAMIOCOMMAND ioctl, which is the synchronous pass(4) API.  It has
  worked as intended (error recovery is only done when the flag
  is set) in the asynchronous API (CAMIOQUEUE ioctl).

  sys/cam/scsi/scsi_pass.c:
  	In passsendccb(), when calling cam_periph_runccb(), only
  	specify the error routine when CAM_PASS_ERR_RECOVER is set.

  share/man/man4/pass.4:
  	Document that CAM_PASS_ERR_RECOVER is needed to enable
  	error recovery.

  Reported by:	Terry Kennedy <TERRY@glaver.org>
  PR:		kern/218572
  MFC after:	1 week
  Sponsored by:	Spectra Logic

Changes:
  head/share/man/man4/pass.4
  head/sys/cam/scsi/scsi_pass.c
Comment 3 commit-hook freebsd_committer freebsd_triage 2017-05-10 15:20:55 UTC
A commit references this bug:

Author: ken
Date: Wed May 10 15:20:38 UTC 2017
New revision: 318139
URL: https://svnweb.freebsd.org/changeset/base/318139

Log:
  MFC r317775:

    Fix error recovery behavior in the pass(4) driver.

    After FreeBSD SVN revision 236814, the pass(4) driver changed from
    only doing error recovery when the CAM_PASS_ERR_RECOVER flag was
    set on a CCB to sometimes doing error recovery if the passed in
    retry count was non-zero.

    Error recovery would happen if two conditions were met:

    1.  The error recovery action was simply a retry.  (Which is most
        cases.)
    2.  The retry_count is non-zero. (Which happened a lot because of
        cut-and-pasted code.)

    This explains a bug I noticed in with camcontrol:

    # camcontrol tur da34 -v
    Unit is ready
    # camcontrol reset da34
    Reset of 1:172:0 was successful

    At this point, there should be a Unit Attention:

    # camcontrol tur da34 -v
    Unit is ready

    No Unit Attention.

    Try it again:

    # camcontrol reset da34
    Reset of 1:172:0 was successful

    Now set the retry_count to 0 for the TUR:

    # camcontrol tur da34 -v -C 0
    Unit is not ready
    (pass42:mps1:0:172:0): TEST UNIT READY. CDB: 00 00 00 00 00 00
    (pass42:mps1:0:172:0): CAM status: SCSI Status Error
    (pass42:mps1:0:172:0): SCSI status: Check Condition
    (pass42:mps1:0:172:0): SCSI sense: UNIT ATTENTION asc:29,2 (SCSI bus reset
    occurred)
    (pass42:mps1:0:172:0): Field Replaceable Unit: 2

    There is the unit attention. camcontrol(8) has a default
    retry_count of 1, in case someone sets the -E flag without
    setting -C.

    The CAM_PASS_ERR_RECOVER behavior was only broken with the
    CAMIOCOMMAND ioctl, which is the synchronous pass(4) API.  It has
    worked as intended (error recovery is only done when the flag
    is set) in the asynchronous API (CAMIOQUEUE ioctl).

    sys/cam/scsi/scsi_pass.c:
    	In passsendccb(), when calling cam_periph_runccb(), only
    	specify the error routine when CAM_PASS_ERR_RECOVER is set.

    share/man/man4/pass.4:
    	Document that CAM_PASS_ERR_RECOVER is needed to enable
    	error recovery.

  Reported by:	Terry Kennedy <TERRY@glaver.org>
  PR:		kern/218572
  Sponsored by:	Spectra Logic

Changes:
_U  stable/11/
  stable/11/share/man/man4/pass.4
  stable/11/sys/cam/scsi/scsi_pass.c
Comment 4 commit-hook freebsd_committer freebsd_triage 2017-05-10 15:20:57 UTC
A commit references this bug:

Author: ken
Date: Wed May 10 15:20:40 UTC 2017
New revision: 318140
URL: https://svnweb.freebsd.org/changeset/base/318140

Log:
  MFC r317775:

    Fix error recovery behavior in the pass(4) driver.

    After FreeBSD SVN revision 236814, the pass(4) driver changed from
    only doing error recovery when the CAM_PASS_ERR_RECOVER flag was
    set on a CCB to sometimes doing error recovery if the passed in
    retry count was non-zero.

    Error recovery would happen if two conditions were met:

    1.  The error recovery action was simply a retry.  (Which is most
        cases.)
    2.  The retry_count is non-zero. (Which happened a lot because of
        cut-and-pasted code.)

    This explains a bug I noticed in with camcontrol:

    # camcontrol tur da34 -v
    Unit is ready
    # camcontrol reset da34
    Reset of 1:172:0 was successful

    At this point, there should be a Unit Attention:

    # camcontrol tur da34 -v
    Unit is ready

    No Unit Attention.

    Try it again:

    # camcontrol reset da34
    Reset of 1:172:0 was successful

    Now set the retry_count to 0 for the TUR:

    # camcontrol tur da34 -v -C 0
    Unit is not ready
    (pass42:mps1:0:172:0): TEST UNIT READY. CDB: 00 00 00 00 00 00
    (pass42:mps1:0:172:0): CAM status: SCSI Status Error
    (pass42:mps1:0:172:0): SCSI status: Check Condition
    (pass42:mps1:0:172:0): SCSI sense: UNIT ATTENTION asc:29,2 (SCSI bus reset
    occurred)
    (pass42:mps1:0:172:0): Field Replaceable Unit: 2

    There is the unit attention. camcontrol(8) has a default
    retry_count of 1, in case someone sets the -E flag without
    setting -C.

    The CAM_PASS_ERR_RECOVER behavior was only broken with the
    CAMIOCOMMAND ioctl, which is the synchronous pass(4) API.  It has
    worked as intended (error recovery is only done when the flag
    is set) in the asynchronous API (CAMIOQUEUE ioctl).

    sys/cam/scsi/scsi_pass.c:
    	In passsendccb(), when calling cam_periph_runccb(), only
    	specify the error routine when CAM_PASS_ERR_RECOVER is set.

    share/man/man4/pass.4:
    	Document that CAM_PASS_ERR_RECOVER is needed to enable
    	error recovery.

  Reported by:	Terry Kennedy <TERRY@glaver.org>
  PR:		kern/218572
  Sponsored by:	Spectra Logic

Changes:
_U  stable/10/
  stable/10/share/man/man4/pass.4
  stable/10/sys/cam/scsi/scsi_pass.c
Comment 5 Kenneth D. Merry freebsd_committer freebsd_triage 2017-05-10 15:23:06 UTC
Fixed in head, stable/11 and stable 10.  Thank you for the report!