Bug 178473

Summary: gmultipath(8) should not restore providers that have already failed the current bio
Product: Base System Reporter: wollman <wollman>
Component: bin Assignee: Alan Somers <asomers>
Status: Closed Works As Intended    
Severity: Affects Only Me CC: asomers, crest, crest, daniel.uvehag, otis, rkunert, wollman
Priority: Normal    
Version: 10.2-RELEASE   
Hardware: Any   
OS: Any   

Description wollman 2013-05-10 08:20:00 UTC
Currently, hard medium errors reported by the underlying provider
cause geom_multipath to cycle infinitely, turning what should be a
failure reported to the consumer (ZFS in my case, which can do
something useful with it) into a request that simply never returns at
all, until the hardware is physically offlined.

It looks like this:

(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 d6 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x2103f4f7
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 d6 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x2103f4f7
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040687
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040687
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040f4f
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040f4f
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 43 fe 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x210443ff
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 43 fe 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x210443ff
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 ac 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x2103f4f7
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 ac 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x2103f4f7
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040687
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040687
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040f4f
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040f4f
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 17 f3 0 0 2a 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21041817
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 67 eb 0 0 2a 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x210467ec
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 67 eb 0 0 2a 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x210467ec
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 3 f4 d9 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x2103f4f7
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 3 f4 d9 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x2103f4f7
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040687
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040687
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040f4f
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040f4f
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 d6 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x2103f4f7
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 d6 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x2103f4f7
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040687
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040687
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 e 7d 0 1 0 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040f4f
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 e 7d 0 1 0 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040f4f
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
pid 62004 (ntpd), uid 0: exited on signal 11 (core dumped)
pid 62021 (ntpd), uid 0: exited on signal 11 (core dumped)
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 d6 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x2103f4f7
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 d6 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x2103f4f7
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040687
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040687
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040f4f
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040f4f
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040687
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040687
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12
(da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 
(da158:mps2:0:77:0): CAM status: SCSI Status Error
(da158:mps2:0:77:0): SCSI status: Check Condition
(da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da158:mps2:0:77:0): Info: 0x21040f4f
(da158:mps2:0:77:0): Actual Retry Count: 63
(da158:mps2:0:77:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85
GEOM_MULTIPATH: da85 is now active path in s25d12
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 
(da85:mps1:0:102:0): CAM status: SCSI Status Error
(da85:mps1:0:102:0): SCSI status: Check Condition
(da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted)
(da85:mps1:0:102:0): Info: 0x21040f4f
(da85:mps1:0:102:0): Actual Retry Count: 63
(da85:mps1:0:102:0): Error 5, Unretryable error
GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL
GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158
GEOM_MULTIPATH: da158 is now active path in s25d12

Fix: 

Don't know.  geom_multipath should have some way of figuring out that
a failure is "really" a failure on the underlying device, and notify
the consumer appropriately.  Perhaps keep some state in the bio that
remembers which providers have failed the request already, and if
every available provider fails the request without any intervening
configuration changes, then return the last failure to the consumer.
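The proposal above could look roughly like the following C sketch. It is not gmultipath code: `struct mp_device`, `struct mp_request`, `mp_on_error()`, the 32-path limit and the generation counter are all hypothetical. Each request remembers which paths have already failed it, and once every path has failed it with no intervening configuration change, the error is handed back to the consumer instead of restoring paths and retrying.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical multipath device state (not the real gmultipath softc). */
struct mp_device {
	unsigned npaths;     /* configured paths, at most 32 in this sketch */
	uint32_t failed;     /* bit i set: path i is currently marked FAIL */
	unsigned generation; /* bumped only by operator-driven config changes,
	                        never by gmultipath's own automatic restores */
};

/* Hypothetical per-request state that would live in (or hang off) the bio. */
struct mp_request {
	uint32_t tried;      /* bit i set: path i already failed this request */
	unsigned generation; /* device generation when the request was issued */
};

enum mp_action { MP_RETRY, MP_FAIL_TO_CONSUMER };

/* Called when path `path` completes `req` with an error. */
static enum mp_action
mp_on_error(struct mp_device *dev, struct mp_request *req, unsigned path)
{
	uint32_t all;

	assert(dev->npaths > 0 && dev->npaths <= 32 && path < dev->npaths);
	all = (dev->npaths == 32) ? UINT32_MAX :
	    ((UINT32_C(1) << dev->npaths) - 1);

	dev->failed |= UINT32_C(1) << path;

	/* A configuration change since the last failure invalidates old results. */
	if (req->generation != dev->generation) {
		req->tried = 0;
		req->generation = dev->generation;
	}
	req->tried |= UINT32_C(1) << path;

	/* Every path has failed this request: report the error, don't loop. */
	if ((req->tried & all) == all)
		return (MP_FAIL_TO_CONSUMER);
	return (MP_RETRY);
}
```

On a two-path device a medium error is retried once on the second path and then returned to ZFS; only an operator action such as adding a path resets the per-request history.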
How-To-Repeat: 
Build a big multipathed zpool, and use it.  Eventually, a disk will
fail.  Wonder why ZFS doesn't notice it.  (Apparently the SMART
implementation on the drive doesn't consider a single hard read error
to be "bad enough" to raise the failure-imminent bit.)  Finally get
the answer in the nightly root mail.
Comment 1 Jan Bramkamp 2016-08-23 09:30:06 UTC
ZFS won't see the I/O request fail because geom_multipath retries all paths into eternity instead of reporting failure. This might trigger the ZFS deadman timer and panic() the system if enough geom_multipath providers are affected. The worst part is that rebooting a system with a defective disk behind geom_multipath hangs during (re-)boot and won't even reach single-user mode until geom_multipath is unloaded or the defective device is removed.
Comment 2 Jan Bramkamp 2017-10-26 14:57:31 UTC
*** Bug 207247 has been marked as a duplicate of this bug. ***
Comment 3 Jan Bramkamp 2017-10-26 14:59:30 UTC
Is anyone working on multipath SAS and differentiating between medium and link failure in FreeBSD?
Comment 4 Richard Kunert 2017-11-17 17:33:55 UTC
I think what is needed is a gmultipath equivalent to the no_path_retry setting in DM Multipath on Linux, which allows specifying the number of retries before the queue is released and the I/O fails. I assume that would allow ZFS to realize that something is wrong. Retrying forever is not always a rational or useful behavior.

no_path_retry:
A numeric value for this attribute specifies the number of times the system should attempt to use a failed path before disabling queuing.
A value of "fail" indicates immediate failure, without queuing.
A value of "queue" indicates that queuing should not stop until the path is fixed.

I just set up a new FreeBSD server with dual HBAs, dual SAS expander backplanes, etc., using gmultipath. I am on the verge of migrating the server to Ubuntu and ZFS on Linux just to avoid this issue - particularly seeing that it was reported years ago and there has been no movement.
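A gmultipath knob modelled on no_path_retry might reduce to a decision like the one below. This is a hypothetical sketch: gmultipath had no such setting in the releases discussed here, and `keep_queuing()` and the `NO_PATH_RETRY_*` constants are invented for illustration. It approximates the DM Multipath semantics by counting how many times one I/O has been retried while no usable path remains.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values mirroring DM Multipath's no_path_retry. */
#define NO_PATH_RETRY_QUEUE (-1) /* "queue": keep queuing until a path is fixed */
#define NO_PATH_RETRY_FAIL  0    /* "fail": fail immediately, without queuing */

/*
 * Called when every path is failed. `attempts` is how many times this I/O
 * has already been retried. Returns true to keep queuing/retrying, false to
 * complete the I/O with an error so the consumer (e.g. ZFS) can see it.
 */
static bool
keep_queuing(int no_path_retry, int attempts)
{
	assert(no_path_retry >= NO_PATH_RETRY_QUEUE);
	assert(attempts >= 0);

	if (no_path_retry == NO_PATH_RETRY_QUEUE)
		return (true);
	/* "fail" (0) never queues; N > 0 allows N retries. */
	return (attempts < no_path_retry);
}
```

With a small numeric value the medium error in this report would reach ZFS after a bounded number of retries, while "queue" preserves the current wait-forever behaviour for transient path outages.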
Comment 5 Eitan Adler freebsd_committer freebsd_triage 2018-05-20 23:59:53 UTC
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"
Comment 6 Jan Bramkamp 2018-05-29 09:12:54 UTC
The problem described in this PR still exists in FreeBSD 11.1-p10.
Comment 7 Juraj Lutter freebsd_committer freebsd_triage 2019-10-31 20:57:55 UTC
It still persists in 12-STABLE as well.
Comment 8 crest 2019-11-04 15:13:22 UTC
The gmultipath logic in active-passive mode uses one path until it fails. On failure it picks the next path, round-robin style, until all paths are marked as failed. In that case it "recovers" by re-enabling all paths. No per-BIO state is used to track which paths have already been tried for a request. I don't know whether the BIO struct contains enough information to reliably differentiate between path and medium errors.
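The active-passive behaviour described above can be modelled with the simplified sketch below. `struct mp_model` and `mp_path_failed()` are hypothetical and not taken from g_multipath.c. When every path is marked FAIL, the sketch restores every path except the one that just failed, which reproduces the "restore da85" / "restore da158" alternation in the original log for a two-path device. Because nothing is recorded per request, the model cannot tell a link failure from a medium error that fails identically on every path.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXPATHS 8

/* Hypothetical, simplified model of an active-passive multipath device. */
struct mp_model {
	int npaths;
	int active;            /* index of the active path, or -1 if none */
	bool failed[MAXPATHS]; /* path is marked FAIL */
	int restores;          /* "all paths ... were marked FAIL, restore" events */
};

/*
 * Mark `path` as failed and pick a new active path round-robin. If every
 * path is now failed, restore all paths except the one that just failed.
 * Returns the new active path.
 */
static int
mp_path_failed(struct mp_model *m, int path)
{
	assert(m->npaths > 0 && m->npaths <= MAXPATHS);
	assert(path >= 0 && path < m->npaths);

	m->failed[path] = true;
	for (int step = 1; step <= m->npaths; step++) {
		int cand = (path + step) % m->npaths;
		if (!m->failed[cand]) {
			m->active = cand;
			return (cand);
		}
	}
	/* All paths failed: "restore" the others and activate the next one. */
	m->restores++;
	for (int i = 0; i < m->npaths; i++)
		if (i != path)
			m->failed[i] = false;
	m->active = (m->npaths > 1) ? (path + 1) % m->npaths : -1;
	return (m->active);
}
```

Feeding it alternating failures on paths 0 and 1 (da85 and da158 in the log) reproduces the FAIL/restore ping-pong, with the restore count growing on every round.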
Comment 9 crest 2019-11-04 18:17:31 UTC
It's a bit more complex than removing the code that re-enables failed paths when none are left, because that state isn't tracked per request in the existing code, and gmultipath should recover from path errors without operator intervention (e.g. two iSCSI connections to the same target can fail and recover). Just ripping the flawed path-recovery code out of gmultipath would leave the multipath GEOM provider in a failed state; the existing code recovers in this case. Each BIO request contains two pointer-sized fields to store per-request state. They could be used to track the tried paths for each request. Attempting all paths round-robin, starting with the active one, would at worst multiply the number of retries by the number of paths, which is expected to be small (between 2 and 4 in most cases).
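One way to use a pointer-sized per-request field, as suggested above, is to pack the starting path and an attempt counter into it. This is a sketch under assumptions: `struct fake_bio` stands in for struct bio (its `state1`/`state2` members are placeholders, not real bio field names), the 8-bit encoding is arbitrary, and storing an integer in a `void *` via `uintptr_t` relies on implementation-defined, though widely supported, conversions. Paths are tried round-robin starting from the active one, so each request is attempted at most once per path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the two pointer-sized per-request fields of struct bio. */
struct fake_bio {
	void *state1; /* used below */
	void *state2; /* left free in this sketch */
};

/* Hypothetical encoding: low byte = starting path, next byte = attempts. */
static void
mp_state_store(struct fake_bio *bp, unsigned start, unsigned attempts)
{
	assert(start < 256 && attempts < 256);
	bp->state1 = (void *)(uintptr_t)(start | (attempts << 8));
}

static unsigned
mp_state_start(const struct fake_bio *bp)
{
	return ((unsigned)((uintptr_t)bp->state1 & 0xff));
}

static unsigned
mp_state_attempts(const struct fake_bio *bp)
{
	return ((unsigned)(((uintptr_t)bp->state1 >> 8) & 0xff));
}

/*
 * After a failed attempt, pick the next path round-robin from the starting
 * path. Returns false once every path has been tried, so the error can be
 * passed to the consumer. Total attempts never exceed npaths.
 */
static bool
mp_next_path(struct fake_bio *bp, unsigned npaths, unsigned *pathp)
{
	unsigned start = mp_state_start(bp);
	unsigned attempts = mp_state_attempts(bp) + 1;

	assert(npaths > 0 && npaths < 256);
	if (attempts >= npaths)
		return (false);
	mp_state_store(bp, start, attempts);
	*pathp = (start + attempts) % npaths;
	return (true);
}
```

A request would be issued after `mp_state_store(bp, active, 0)`; on a two-path device a bad sector then costs two attempts before the error reaches the consumer, matching the "multiply by the number of paths" bound above.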
Comment 10 Alan Somers freebsd_committer freebsd_triage 2019-11-04 18:22:05 UTC
crest, yes that matches my understanding of the problem.  Would you care to review my test case for it?  https://reviews.freebsd.org/D22235 .
Comment 11 Alan Somers freebsd_committer freebsd_triage 2019-11-04 21:47:38 UTC
Actually, there is no infinite loop bug in gmultipath.  It just looks that way from a casual perusal of the logs.  Reality is that:

1) gmultipath will retry each operation up to as many times as there are configured paths.  If necessary, it will restore failed paths.  So for a four-path device, you may see as many as four "all paths in XXX were marked FAIL, restore" log messages.

2) ZFS will retry failed I/Os many times.  Even after it gives up on one operation, it will swiftly move on to another.  Failed reads may trigger scrub operations.  So a small amount of user activity can still result in a large number of gmultipath restores.  However, if zfsd is running, it will eventually fault the bad drive.  Then ZFS won't attempt to access it any more.

3) Operations that open and close a device (such as "zpool import") will trigger every other geom class to taste the device on every close.  That can add up to a large (but still finite) amount of I/O.

If you think that you're experiencing this bug, you should:
1) Turn on zfsd (service zfsd onestart).  That may fault the bad vdev.

2) If you're seeing the problem during zpool import, try setting "sysctl kern.geom.notaste=1".  That will reduce the amount of I/O to the bad drive during zpool import.  But it will cause other problems, so change it back to 0 after zpool import is done.

3) If you still think you're experiencing this problem and you think that gmultipath is retrying an I/O indefinitely, run the following dtrace script.  I predict that you'll see more "config:restore" events than "io:restart" events.  If the retry were infinite, you would see equal numbers.
dtrace -i 'geom:multipath:io:restart' -i 'geom:multipath:config:restore'
(note: I haven't yet committed those probes, but I expect to soon).
Comment 12 crest 2019-11-05 17:22:03 UTC
In at least one case, a server with one failed drive in a 45-drive dual-ported SAS2 JBOD spent more than 48 hours "tasting" the GEOM providers after a reboot, before I physically removed the drive (identified by its activity LED). Unloading geom_multipath.ko at the loader prompt worked as well. I wouldn't call that "works as intended".
Comment 13 Alan Somers freebsd_committer freebsd_triage 2019-11-05 17:32:25 UTC
(In reply to crest from comment #12)
Well, gmultipath can't make up for having a bad drive.  But it "works as intended" in the sense that it doesn't loop infinitely.  In the worst case it merely multiplies the number of failing commands by the number of paths that you have.  The next time this happens to you, you might want to try disabling the failing disk's SAS phy.  That's basically the same as pulling the drive, except it doesn't require physical access.  See camcontrol(8) and search for "smppc".  Another pro tip: with an SES-capable enclosure like yours you can control the fault and locate LEDs using the sesutil(8) command.
Comment 14 crest 2019-11-05 18:25:20 UTC
The system was stuck and didn't even reach single-user mode (I rebooted Friday evening and it was still stuck Monday morning). There were two dual-ported SAS expanders in the JBOD. Each expander was hooked up to both HBAs, resulting in two paths to each disk. Unloading geom_multipath allowed the system to boot. It did more than multiply the retries by the number of paths (two in this case). Without geom_multipath the system would attempt to access the faulty drive and give up within a reasonable timeframe (less than 5 minutes to reboot). With geom_multipath loaded the system cycled through both paths, retrying them for at least two days. I needed the system back in operation, so I couldn't test for more than two days at a time.
Comment 15 Alan Somers freebsd_committer freebsd_triage 2019-11-05 18:40:03 UTC
I simply can't reproduce any infinite loop behavior.  If you can, and you're willing to work with me, then reopen the bug.  Do you still have the bad drives?  Are you able to reproduce the situation?  And can you run head?
Comment 16 commit-hook freebsd_committer freebsd_triage 2019-12-06 00:13:12 UTC
A commit references this bug:

Author: asomers
Date: Fri Dec  6 00:12:15 UTC 2019
New revision: 355431
URL: https://svnweb.freebsd.org/changeset/base/355431

Log:
  gmultipath: add ATF tests

  Add ATF tests for most gmultipath operations. Add some dtrace probes too,
  primarily for configuration changes that happen in response to provider
  errors.

  PR:		178473
  MFC after:	2 weeks
  Sponsored by:	Axcient
  Differential Revision:	https://reviews.freebsd.org/D22235

Changes:
  head/etc/mtree/BSD.tests.dist
  head/sys/geom/geom_subr.c
  head/sys/geom/multipath/g_multipath.c
  head/tests/sys/geom/class/Makefile
  head/tests/sys/geom/class/multipath/
  head/tests/sys/geom/class/multipath/Makefile
  head/tests/sys/geom/class/multipath/conf.sh
  head/tests/sys/geom/class/multipath/failloop.sh
  head/tests/sys/geom/class/multipath/misc.sh
Comment 17 commit-hook freebsd_committer freebsd_triage 2020-02-13 20:32:45 UTC
A commit references this bug:

Author: asomers
Date: Thu Feb 13 20:32:06 UTC 2020
New revision: 357876
URL: https://svnweb.freebsd.org/changeset/base/357876

Log:
  MFC r355431:

  gmultipath: add ATF tests

  Add ATF tests for most gmultipath operations. Add some dtrace probes too,
  primarily for configuration changes that happen in response to provider
  errors.

  PR:		178473
  Sponsored by:	Axcient
  Differential Revision:	https://reviews.freebsd.org/D22235

Changes:
_U  stable/12/
  stable/12/etc/mtree/BSD.tests.dist
  stable/12/sys/geom/geom_subr.c
  stable/12/sys/geom/multipath/g_multipath.c
  stable/12/tests/sys/geom/class/Makefile
  stable/12/tests/sys/geom/class/multipath/