[This is a summation of a long discussion between me, ken@ and mav@] After SVN rev 236814 in FreeBSD/head, the pass(4) driver does some error recovery, but not all cases, when the retry_count is set in the CCB and CAM_PASS_ERR_RECOVER is not set. Previously, the pass(4) driver would only do error recovery if CAM_PASS_ERR_RECOVER is set. This can be seen with 'camcontrol tur -v'. camcontrol sets the retry_count to 1 by default, so that the user will have at least one retry if he turns on retries with -E. If you reset a hard drive: # camcontrol reset 1:172:0 Reset of 1:172:0 was successful There should be a Unit Attention pending: # camcontrol tur 1:172:0 -v Unit is ready But that doesn't happen, because the kernel is doing error recovery when we have not turned it on with -E (which sets the CAM_PASS_ERR_RECOVER flag on the CCB). Retrying the experiment: # camcontrol reset 1:172:0 Reset of 1:172:0 was successful Now set the retry count to 0: # camcontrol tur 1:172:0 -v -C 0 Unit is not ready (pass42:mps1:0:172:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 (pass42:mps1:0:172:0): CAM status: SCSI Status Error (pass42:mps1:0:172:0): SCSI status: Check Condition (pass42:mps1:0:172:0): SCSI sense: UNIT ATTENTION asc:29,2 (SCSI bus reset occurred) (pass42:mps1:0:172:0): Field Replaceable Unit: 2 We get the unit attention. Also, the "Filemark detected" asc/ascq entry (0x00,0x01) and other, similar tape error recovery entries should probably have an error recovery action of SS_NOP instead of SS_RDEF. The application should be notified of Filemarks, setmarks, end of medium, etc. [This affects everything after r237326 in 9-STABLE, so the affected releases are 9.1/2/3, 10.0/1/2/3, 11.0, HEAD. As everything before 10.3 is EoL, the fix only needs to be MFC'd back to 10.3.]
I'll take this one.
A commit references this bug: Author: ken Date: Wed May 3 20:59:47 UTC 2017 New revision: 317775 URL: https://svnweb.freebsd.org/changeset/base/317775 Log: Fix error recovery behavior in the pass(4) driver. After FreeBSD SVN revision 236814, the pass(4) driver changed from only doing error recovery when the CAM_PASS_ERR_RECOVER flag was set on a CCB to sometimes doing error recovery if the passed in retry count was non-zero. Error recovery would happen if two conditions were met: 1. The error recovery action was simply a retry. (Which is most cases.) 2. The retry_count is non-zero. (Which happened a lot because of cut-and-pasted code.) This explains a bug I noticed in with camcontrol: # camcontrol tur da34 -v Unit is ready # camcontrol reset da34 Reset of 1:172:0 was successful At this point, there should be a Unit Attention: # camcontrol tur da34 -v Unit is ready No Unit Attention. Try it again: # camcontrol reset da34 Reset of 1:172:0 was successful Now set the retry_count to 0 for the TUR: # camcontrol tur da34 -v -C 0 Unit is not ready (pass42:mps1:0:172:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 (pass42:mps1:0:172:0): CAM status: SCSI Status Error (pass42:mps1:0:172:0): SCSI status: Check Condition (pass42:mps1:0:172:0): SCSI sense: UNIT ATTENTION asc:29,2 (SCSI bus reset occurred) (pass42:mps1:0:172:0): Field Replaceable Unit: 2 There is the unit attention. camcontrol(8) has a default retry_count of 1, in case someone sets the -E flag without setting -C. The CAM_PASS_ERR_RECOVER behavior was only broken with the CAMIOCOMMAND ioctl, which is the synchronous pass(4) API. It has worked as intended (error recovery is only done when the flag is set) in the asynchronous API (CAMIOQUEUE ioctl). sys/cam/scsi/scsi_pass.c: In passsendccb(), when calling cam_periph_runccb(), only specify the error routine when CAM_PASS_ERR_RECOVER is set. share/man/man4/pass.4: Document that CAM_PASS_ERR_RECOVER is needed to enable error recovery. Reported by: Terry Kennedy <TERRY@glaver.org> PR: kern/218572 MFC after: 1 week Sponsored by: Spectra Logic Changes: head/share/man/man4/pass.4 head/sys/cam/scsi/scsi_pass.c
A commit references this bug: Author: ken Date: Wed May 10 15:20:38 UTC 2017 New revision: 318139 URL: https://svnweb.freebsd.org/changeset/base/318139 Log: MFC r317775: Fix error recovery behavior in the pass(4) driver. After FreeBSD SVN revision 236814, the pass(4) driver changed from only doing error recovery when the CAM_PASS_ERR_RECOVER flag was set on a CCB to sometimes doing error recovery if the passed in retry count was non-zero. Error recovery would happen if two conditions were met: 1. The error recovery action was simply a retry. (Which is most cases.) 2. The retry_count is non-zero. (Which happened a lot because of cut-and-pasted code.) This explains a bug I noticed in with camcontrol: # camcontrol tur da34 -v Unit is ready # camcontrol reset da34 Reset of 1:172:0 was successful At this point, there should be a Unit Attention: # camcontrol tur da34 -v Unit is ready No Unit Attention. Try it again: # camcontrol reset da34 Reset of 1:172:0 was successful Now set the retry_count to 0 for the TUR: # camcontrol tur da34 -v -C 0 Unit is not ready (pass42:mps1:0:172:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 (pass42:mps1:0:172:0): CAM status: SCSI Status Error (pass42:mps1:0:172:0): SCSI status: Check Condition (pass42:mps1:0:172:0): SCSI sense: UNIT ATTENTION asc:29,2 (SCSI bus reset occurred) (pass42:mps1:0:172:0): Field Replaceable Unit: 2 There is the unit attention. camcontrol(8) has a default retry_count of 1, in case someone sets the -E flag without setting -C. The CAM_PASS_ERR_RECOVER behavior was only broken with the CAMIOCOMMAND ioctl, which is the synchronous pass(4) API. It has worked as intended (error recovery is only done when the flag is set) in the asynchronous API (CAMIOQUEUE ioctl). sys/cam/scsi/scsi_pass.c: In passsendccb(), when calling cam_periph_runccb(), only specify the error routine when CAM_PASS_ERR_RECOVER is set. share/man/man4/pass.4: Document that CAM_PASS_ERR_RECOVER is needed to enable error recovery. Reported by: Terry Kennedy <TERRY@glaver.org> PR: kern/218572 Sponsored by: Spectra Logic Changes: _U stable/11/ stable/11/share/man/man4/pass.4 stable/11/sys/cam/scsi/scsi_pass.c
A commit references this bug: Author: ken Date: Wed May 10 15:20:40 UTC 2017 New revision: 318140 URL: https://svnweb.freebsd.org/changeset/base/318140 Log: MFC r317775: Fix error recovery behavior in the pass(4) driver. After FreeBSD SVN revision 236814, the pass(4) driver changed from only doing error recovery when the CAM_PASS_ERR_RECOVER flag was set on a CCB to sometimes doing error recovery if the passed in retry count was non-zero. Error recovery would happen if two conditions were met: 1. The error recovery action was simply a retry. (Which is most cases.) 2. The retry_count is non-zero. (Which happened a lot because of cut-and-pasted code.) This explains a bug I noticed in with camcontrol: # camcontrol tur da34 -v Unit is ready # camcontrol reset da34 Reset of 1:172:0 was successful At this point, there should be a Unit Attention: # camcontrol tur da34 -v Unit is ready No Unit Attention. Try it again: # camcontrol reset da34 Reset of 1:172:0 was successful Now set the retry_count to 0 for the TUR: # camcontrol tur da34 -v -C 0 Unit is not ready (pass42:mps1:0:172:0): TEST UNIT READY. CDB: 00 00 00 00 00 00 (pass42:mps1:0:172:0): CAM status: SCSI Status Error (pass42:mps1:0:172:0): SCSI status: Check Condition (pass42:mps1:0:172:0): SCSI sense: UNIT ATTENTION asc:29,2 (SCSI bus reset occurred) (pass42:mps1:0:172:0): Field Replaceable Unit: 2 There is the unit attention. camcontrol(8) has a default retry_count of 1, in case someone sets the -E flag without setting -C. The CAM_PASS_ERR_RECOVER behavior was only broken with the CAMIOCOMMAND ioctl, which is the synchronous pass(4) API. It has worked as intended (error recovery is only done when the flag is set) in the asynchronous API (CAMIOQUEUE ioctl). sys/cam/scsi/scsi_pass.c: In passsendccb(), when calling cam_periph_runccb(), only specify the error routine when CAM_PASS_ERR_RECOVER is set. share/man/man4/pass.4: Document that CAM_PASS_ERR_RECOVER is needed to enable error recovery. Reported by: Terry Kennedy <TERRY@glaver.org> PR: kern/218572 Sponsored by: Spectra Logic Changes: _U stable/10/ stable/10/share/man/man4/pass.4 stable/10/sys/cam/scsi/scsi_pass.c
Fixed in head, stable/11 and stable 10. Thank you for the report!