Currently, hard medium errors reported by the underlying provider will cause geom_multipath to cycle infinitely, turning what should be a failure reported to the consumer (ZFS in my case, which can do something useful with it) into a write that simply never returns at all, until the hardware is physically offlined. It looks like this:
(da85:mps1:0:102:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 d6 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x2103f4f7 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 d6 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x2103f4f7 (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040687 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). 
CDB: 28 0 21 4 6 5d 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040687 (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040f4f (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040f4f (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). 
CDB: 28 0 21 4 43 fe 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x210443ff (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 43 fe 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x210443ff (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 ac 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x2103f4f7 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). 
CDB: 28 0 21 3 f4 58 0 0 ac 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x2103f4f7 (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040687 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040687 (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). 
CDB: 28 0 21 4 f 28 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040f4f (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040f4f (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 17 f3 0 0 2a 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21041817 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). 
CDB: 28 0 21 4 67 eb 0 0 2a 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x210467ec (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 67 eb 0 0 2a 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x210467ec (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 3 f4 d9 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x2103f4f7 (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). 
CDB: 28 0 21 3 f4 d9 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x2103f4f7 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040687 (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040687 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). 
CDB: 28 0 21 4 f 28 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040f4f (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040f4f (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 d6 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x2103f4f7 (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). 
CDB: 28 0 21 3 f4 58 0 0 d6 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x2103f4f7 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040687 (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040687 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). 
CDB: 28 0 21 4 e 7d 0 1 0 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040f4f (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 e 7d 0 1 0 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040f4f (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 pid 62004 (ntpd), uid 0: exited on signal 11 (core dumped) pid 62021 (ntpd), uid 0: exited on signal 11 (core dumped) (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 3 f4 58 0 0 d6 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x2103f4f7 (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). 
CDB: 28 0 21 3 f4 58 0 0 d6 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x2103f4f7 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040687 (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040687 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). 
CDB: 28 0 21 4 f 28 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040f4f (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040f4f (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 6 5d 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040687 (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). 
CDB: 28 0 21 4 6 5d 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040687 (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12 (da158:mps2:0:77:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 (da158:mps2:0:77:0): CAM status: SCSI Status Error (da158:mps2:0:77:0): SCSI status: Check Condition (da158:mps2:0:77:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da158:mps2:0:77:0): Info: 0x21040f4f (da158:mps2:0:77:0): Actual Retry Count: 63 (da158:mps2:0:77:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da158 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da85 GEOM_MULTIPATH: da85 is now active path in s25d12 (da85:mps1:0:102:0): READ(10). CDB: 28 0 21 4 f 28 0 0 2b 0 (da85:mps1:0:102:0): CAM status: SCSI Status Error (da85:mps1:0:102:0): SCSI status: Check Condition (da85:mps1:0:102:0): SCSI sense: MEDIUM ERROR asc:11,1 (Read retries exhausted) (da85:mps1:0:102:0): Info: 0x21040f4f (da85:mps1:0:102:0): Actual Retry Count: 63 (da85:mps1:0:102:0): Error 5, Unretryable error GEOM_MULTIPATH: Error 5, da85 in s25d12 marked FAIL GEOM_MULTIPATH: all paths in s25d12 were marked FAIL, restore da158 GEOM_MULTIPATH: da158 is now active path in s25d12
Fix: Don't know. geom_multipath should have some way of figuring out that a failure is "really" a failure on the underlying device, and notify the consumer appropriately. Perhaps keep some state in the bio that remembers which providers have failed the request already, and if every available provider fails the request without any intervening configuration changes, then return the last failure to the consumer.
How-To-Repeat: Build a big multipathed zpool, and use it. Eventually, a disk will fail. Wonder why ZFS doesn't notice it. (Apparently the SMART implementation on the drive doesn't consider a single hard read error to be "bad enough" to raise the failure-imminent bit.) Finally get the answer in the nightly root mail.
ZFS won't see the I/O request fail because geom_multipath retries all paths into eternity instead of reporting failure. This might trigger the ZFS deadman timer and panic() the system if enough geom_multipath providers are affected. The worst part is that rebooting a system with a defective disk behind geom_multipath hangs during (re-)boot and won't even reach single user mode until geom_multipath is unloaded or the defective device is removed.
*** Bug 207247 has been marked as a duplicate of this bug. ***
Is anyone working on multipath SAS and differentiating between medium and link failure in FreeBSD?
I think what is needed is a gmultipath equivalent to the no_path_retry setting in DM Multipath on Linux, which allows specifying the number of retries before the queue is released and the I/O fails. I assume that would allow ZFS to realize that something is wrong. Retrying forever is not always a rational or useful behavior.
no_path_retry: A numeric value for this attribute specifies the number of times the system should attempt to use a failed path before disabling queuing. A value of fail indicates immediate failure, without queuing. A value of queue indicates that queuing should not stop until the path is fixed.
I just set up a new FreeBSD server with dual HBAs, dual SAS expander backplanes, etc., using gmultipath. I am on the verge of migrating the server to Ubuntu and ZFS on Linux just to avoid this issue, particularly seeing that it was reported years ago and there has been no movement.
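As a rough illustration of the no_path_retry semantics quoted above, the three settings can be modelled as a small policy check. This is only a sketch in userspace C: nothing in this report suggests gmultipath has such a knob, and `retry_mode`, `retry_policy`, and `keep_queuing` are hypothetical names that belong to neither DM Multipath nor GEOM.

```c
#include <stdbool.h>

/* Hypothetical policy modelled on the no_path_retry description above. */
enum retry_mode {
    RETRY_FAIL,   /* "fail": fail immediately, never queue */
    RETRY_COUNT,  /* numeric value: queue for a limited number of retries */
    RETRY_QUEUE   /* "queue": keep queuing until the path is fixed */
};

struct retry_policy {
    enum retry_mode mode;
    unsigned max_retries;   /* only meaningful with RETRY_COUNT */
};

/*
 * Decide whether an I/O that found no usable path should stay queued
 * for another retry (true) or be failed back to the consumer (false).
 * retries_done is the number of retries already made for this I/O.
 */
static bool
keep_queuing(const struct retry_policy *p, unsigned retries_done)
{
    switch (p->mode) {
    case RETRY_FAIL:
        return false;
    case RETRY_COUNT:
        return retries_done < p->max_retries;
    case RETRY_QUEUE:
        return true;
    }
    return false;   /* unknown mode: fail rather than queue forever */
}
```

With a numeric limit, the consumer (ZFS here) would get the error back once the budget is spent, which is the behavior the comment asks for.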
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017
Do
- Set Status to "Open"
The problem described in this PR still exists in FreeBSD 11.1-p10.
It also still persists in 12-STABLE.
The gmultipath logic in active-passive mode uses one path until it fails. On failure it picks the next path round-robin style until all paths are marked as failed. In that case it "recovers" by re-enabling all paths. No per-BIO state is used to track which paths have already been tried for a given request. I don't know if the BIO struct contains enough information to reliably differentiate between path and medium errors.
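The failover bookkeeping described above can be modelled in a few lines of userspace C. This is a sketch, not the g_multipath.c code: `mp_device` and `mp_fail_active` are hypothetical names, and the restore step re-enables only the next path, an assumption taken from the "restore da85" and "restore da158" log lines in the original report. The FAIL markers belong to the device, not to any single request.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PATHS 8

/* Hypothetical model of a multipath device; not the kernel's structures. */
struct mp_device {
    size_t   npaths;             /* number of configured paths (<= MAX_PATHS) */
    size_t   active;             /* index of the path currently in use */
    bool     failed[MAX_PATHS];  /* per-path FAIL marker, shared by all I/O */
    unsigned restores;           /* count of "all paths FAIL, restore" events */
};

/*
 * Mark the active path as failed and pick the next usable path in
 * round-robin order. If every path is already marked failed, clear the
 * marker on the next path (the "restore" step) and make it active.
 * Returns true when a restore was needed.
 */
static bool
mp_fail_active(struct mp_device *dev)
{
    dev->failed[dev->active] = true;

    for (size_t step = 1; step <= dev->npaths; step++) {
        size_t cand = (dev->active + step) % dev->npaths;

        if (!dev->failed[cand]) {
            dev->active = cand;
            return false;
        }
    }

    /* All paths are marked FAIL: restore the next one in order. */
    size_t next = (dev->active + 1) % dev->npaths;
    dev->failed[next] = false;
    dev->active = next;
    dev->restores++;
    return true;
}
```

With two paths and an error that follows the disk rather than the path, every call after the first returns true, which matches the alternating pattern in the log.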
It's a bit more complex than removing the code that re-enables failed paths if none are left, because that state isn't tracked per request in the existing code, and gmultipath should recover from path errors without operator intervention, e.g. two iSCSI connections to the same target can fail (and recover). Just ripping the flawed path recovery code out of gmultipath would leave the multipath GEOM provider in a failed state. The existing code recovers in this case. Each BIO request contains two pointer-sized fields to store per-request state. They could be used to track the tried paths for each request. Attempting all paths round-robin, starting with the active one, would at worst multiply the number of retries by the number of paths, which is expected to be small (between 2 and 4 in most cases).
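The per-request tracking proposed above might look roughly like the following userspace sketch. It assumes a bitmask that fits in one pointer-sized field, so the path count must stay below the number of bits in `uintptr_t`. All names here (`request_state`, `next_untried_path`, `submit_request`, and the two demo callbacks) are hypothetical and do not come from g_multipath.c.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: issue the I/O on one path, return 0 or an errno. */
typedef int (*path_io_fn)(size_t path, void *arg);

/* Per-request state; small enough for one pointer-sized field. */
struct request_state {
    uintptr_t tried;   /* bit i set: path i already failed this request */
};

/* Return the next untried path starting at 'start', or -1 if none is left. */
static int
next_untried_path(struct request_state *rs, size_t npaths, size_t start)
{
    for (size_t step = 0; step < npaths; step++) {
        size_t path = (start + step) % npaths;
        uintptr_t bit = (uintptr_t)1 << path;

        if ((rs->tried & bit) == 0) {
            rs->tried |= bit;
            return (int)path;
        }
    }
    return -1;
}

/*
 * Try one request on each path at most once, beginning with the active
 * path. Returns 0 on success, otherwise the error from the last path.
 */
static int
submit_request(size_t npaths, size_t active, path_io_fn io, void *arg,
    unsigned *attempts)
{
    struct request_state rs = { 0 };
    int error = ENXIO;   /* reported when there are no paths at all */
    int path;

    *attempts = 0;
    while ((path = next_untried_path(&rs, npaths, active)) >= 0) {
        (*attempts)++;
        error = io((size_t)path, arg);
        if (error == 0)
            return 0;
    }
    return error;
}

/* Demo callback: a medium error that fails identically on every path. */
static int
medium_error(size_t path, void *arg)
{
    (void)path;
    (void)arg;
    return EIO;
}

/* Demo callback: only the path whose index is stored in *arg is broken. */
static int
one_bad_path(size_t path, void *arg)
{
    return path == *(size_t *)arg ? EIO : 0;
}
```

In this model a medium error costs at most one attempt per path before the error reaches the consumer, while a genuine path failure still succeeds on the surviving path.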
crest, yes that matches my understanding of the problem. Would you care to review my test case for it? https://reviews.freebsd.org/D22235 .
Actually, there is no infinite loop bug in gmultipath. It just looks that way from a casual perusal of the logs. The reality is that:
1) gmultipath will retry each operation up to as many times as there are configured paths. If necessary, it will restore failed paths. So for a four-path device, you may see as many as four "all paths in XXX were marked FAIL, restore" log messages.
2) ZFS will retry failed I/Os many times. Even after it gives up on one operation, it will swiftly move on to another. Failed reads may trigger scrub operations. So a small amount of user activity can still result in a large number of gmultipath restores. However, if zfsd is running, it will eventually fault the bad drive. Then ZFS won't attempt to access it any more.
3) Operations that open and close a device (such as "zpool import") will trigger every other geom class to taste the device on every close. That can add up to a large (but still finite) amount of I/O.
If you think that you're experiencing this bug, you should:
1) Turn on zfsd (service zfsd onestart). That may fault the bad vdev.
2) If you're seeing the problem during zpool import, try setting "sysctl kern.geom.notaste=1". That will reduce the amount of I/O to the bad drive during zpool import. But it will cause other problems, so change it back to 0 after zpool import is done.
3) If you still think you're experiencing this problem and you think that gmultipath is retrying an I/O indefinitely, run the following dtrace script. I predict that you'll see more "config:restore" events than "io:restart" events. If the retry were infinite, you would see equal numbers.
dtrace -n 'geom:multipath:io:restart' -n 'geom:multipath:config:restore'
(note: I haven't yet committed those probes, but I expect to soon).
At least a server with one failed drive in a 45-drive dual-ported SAS2 JBOD spent >48 hours "tasting" the GEOM providers after a reboot before I physically removed the drive (identified by the activity LED). Unloading geom_multipath.ko at the loader prompt worked as well. I wouldn't call that "works as intended".
(In reply to crest from comment #12) Well, gmultipath can't make up for having a bad drive. But it "works as intended" in the sense that it doesn't infinitely loop. In the worst case it merely multiplies the number of failing commands by the number of paths that you have. The next time this happens to you you might want to try disabling the failing disk's SAS phy. That's basically the same as pulling the drive, except it doesn't require physical access. man camcontrol and search for "smppc". Another pro tip: with a ses-capable enclosure like yours you can control the fault and locate LEDs using the sesutil(8) command.
The system was stuck and didn't even reach single user mode (I rebooted Friday evening and it was still stuck Monday morning). There were two dual-ported SAS expanders in the JBOD. Each expander was hooked up to both HBAs, resulting in two paths to each disk. Unloading geom_multipath allowed the system to boot. It didn't just amplify the retries by the number of paths (two in this case). Without geom_multipath the system would attempt to access the faulty drive and give up within a reasonable timeframe (less than 5 minutes to reboot). With geom_multipath loaded the system would cycle through both paths, retrying them for at least two days. I needed the system back in operation, so I couldn't test for more than two days at a time.
I simply can't reproduce any infinite loop behavior. If you can, and you're willing to work with me, then reopen the bug. Do you still have the bad drives? Are you able to reproduce the situation? And can you run head?
A commit references this bug:
Author: asomers
Date: Fri Dec 6 00:12:15 UTC 2019
New revision: 355431
URL: https://svnweb.freebsd.org/changeset/base/355431
Log:
  gmultipath: add ATF tests
  Add ATF tests for most gmultipath operations. Add some dtrace probes too, primarily for configuration changes that happen in response to provider errors.
  PR: 178473
  MFC after: 2 weeks
  Sponsored by: Axcient
  Differential Revision: https://reviews.freebsd.org/D22235
Changes:
  head/etc/mtree/BSD.tests.dist
  head/sys/geom/geom_subr.c
  head/sys/geom/multipath/g_multipath.c
  head/tests/sys/geom/class/Makefile
  head/tests/sys/geom/class/multipath/
  head/tests/sys/geom/class/multipath/Makefile
  head/tests/sys/geom/class/multipath/conf.sh
  head/tests/sys/geom/class/multipath/failloop.sh
  head/tests/sys/geom/class/multipath/misc.sh
A commit references this bug:
Author: asomers
Date: Thu Feb 13 20:32:06 UTC 2020
New revision: 357876
URL: https://svnweb.freebsd.org/changeset/base/357876
Log:
  MFC r355431: gmultipath: add ATF tests
  Add ATF tests for most gmultipath operations. Add some dtrace probes too, primarily for configuration changes that happen in response to provider errors.
  PR: 178473
  Sponsored by: Axcient
  Differential Revision: https://reviews.freebsd.org/D22235
Changes:
  _U stable/12/
  stable/12/etc/mtree/BSD.tests.dist
  stable/12/sys/geom/geom_subr.c
  stable/12/sys/geom/multipath/g_multipath.c
  stable/12/tests/sys/geom/class/Makefile
  stable/12/tests/sys/geom/class/multipath/