Bug 207247 - GEOM multipath cycles through all paths to a device even in error conditions
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-geom (Nobody)
Reported: 2016-02-16 13:15 UTC by Jan Bramkamp
Modified: 2018-05-29 09:11 UTC
Description Jan Bramkamp 2016-02-16 13:15:24 UTC
GEOM multipath cycles through all paths to a device without remembering previous path failures and without differentiating between path errors and medium errors, as mentioned in #178473. I had a device fail during a planned clean power cycle, with this result:

GEOM_MULTIPATH: Error 5, da22 in jbod2data35 marked FAIL
GEOM_MULTIPATH: all paths in jbod2data35 were marked FAIL, restore da68
GEOM_MULTIPATH: da68 is now active path in jbod2data35
(da68:mps1:0:99:0): READ(10). CDB: 28 00 02 00 06 00 00 01 00 00 
(da68:mps1:0:99:0): CAM status: SCSI Status Error
(da68:mps1:0:99:0): SCSI status: Check Condition
(da68:mps1:0:99:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da68:mps1:0:99:0): Info: 0x2000654
(da68:mps1:0:99:0): Actual Retry Count: 63
(da68:mps1:0:99:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da68 in jbod2data35 marked FAIL
GEOM_MULTIPATH: all paths in jbod2data35 were marked FAIL, restore da22
GEOM_MULTIPATH: da22 is now active path in jbod2data35
(da22:mps0:0:40:0): READ(10). CDB: 28 00 02 00 06 00 00 01 00 00 
(da22:mps0:0:40:0): CAM status: SCSI Status Error
(da22:mps0:0:40:0): SCSI status: Check Condition
(da22:mps0:0:40:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da22:mps0:0:40:0): Info: 0x2000654
(da22:mps0:0:40:0): Actual Retry Count: 63
(da22:mps0:0:40:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da22 in jbod2data35 marked FAIL
GEOM_MULTIPATH: all paths in jbod2data35 were marked FAIL, restore da68
GEOM_MULTIPATH: da68 is now active path in jbod2data35

The system is stuck in the boot process in an endless loop over all paths to the failed device.
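
The loop in the log above can be modelled outside the kernel. The following is only a minimal Python sketch, not the g_multipath code: `Outcome`, `Path`, `naive_read`, and `error_aware_read` are hypothetical names, and the sketch assumes the multipath layer could be told whether a failure is a medium error, whereas the log only shows a bare "Error 5" reaching GEOM_MULTIPATH. The first function mimics the logged behaviour (mark FAIL, restore, retry); the second shows one possible policy that stops on a medium error instead of failing over.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    OK = auto()
    PATH_ERROR = auto()    # transport problem: another path may still work
    MEDIUM_ERROR = auto()  # the disk itself cannot read the block


@dataclass
class Path:
    name: str
    failed: bool = False


def naive_read(paths, do_io, max_attempts=6):
    """Mimic the logged behaviour: any error marks the path FAIL, and once
    every path is FAIL one of them is restored and the request is retried."""
    log = []
    active = 0
    # Bounded for the demonstration; the report describes an endless loop.
    for _ in range(max_attempts):
        path = paths[active]
        if do_io(path) is Outcome.OK:
            return True, log
        path.failed = True
        log.append(f"{path.name} marked FAIL")
        if all(p.failed for p in paths):
            active = (active + 1) % len(paths)
            paths[active].failed = False
            log.append(f"restore {paths[active].name}")
        else:
            active = next(i for i, p in enumerate(paths) if not p.failed)
    return False, log


def error_aware_read(paths, do_io):
    """Hypothetical alternative: a medium error is returned to the caller
    at once, and only path errors mark a path FAIL and trigger failover."""
    log = []
    for path in paths:
        if path.failed:
            continue
        outcome = do_io(path)
        if outcome is Outcome.OK:
            return True, log
        if outcome is Outcome.MEDIUM_ERROR:
            log.append(f"{path.name}: medium error, not failing over")
            return False, log
        path.failed = True
        log.append(f"{path.name} marked FAIL")
    return False, log  # every path failed; no path is restored


if __name__ == "__main__":
    def bad_disk(path):
        # Same unreadable block regardless of the path used.
        return Outcome.MEDIUM_ERROR

    print(naive_read([Path("da22"), Path("da68")], bad_disk)[1])
    print(error_aware_read([Path("da22"), Path("da68")], bad_disk)[1])
```

With a medium error on every path, the naive model keeps alternating between da22 and da68 until the attempt limit is reached, while the error-aware model gives up after the first read and leaves both paths usable.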
Comment 1 Jan Bramkamp 2017-10-26 14:57:31 UTC

*** This bug has been marked as a duplicate of bug 178473 ***
Comment 2 Jan Bramkamp 2017-10-26 14:58:12 UTC
The original bug is still unfixed.
Comment 3 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:46:04 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 4 Jan Bramkamp 2018-05-29 09:11:30 UTC
This bug is still very much open, and the problem still exists in 11.1-p10. AFAIK nothing changed in 12-CURRENT that would fix the problem described in this PR. I have moved away from JBOD chassis with dual-ported expanders, but I still have access to one to test changes, although I suspect that the problem can be simulated with GEOM nop.