Created attachment 216866
This patch removes raid map sync functionality from mfi driver

We have found a RAID map sync failure on Invader (device id: 5d) as soon as the <mfi> driver is loaded. The driver fails to fetch the updated RAID map from the firmware because RAID map is logically unsupported in the driver: it does not get any RAID map data from MFI_DCMD_LD_MAP_GET_INFO. Instead it gets the config sequence number from the MFI_DCMD_LD_GET_LIST DCMD, which results in the RAID map sync failure.

Below is the firmware log snippet showing the config sequence number mismatch between driver and firmware:

C0:ld sync: non-matching seqNums 1
C0:ld sync: 01 unsync'd lds remaining

This issue applies to controllers that have RAID map support, such as Thunderbolt (device id: 5b), Invader (device id: 5d) and Fury (device id: 5f). It does not apply to controllers up to and including Liberator (Gen1 and Gen2), as these controllers don't have RAID map support.

Hence, we propose to remove the RAID map sync functionality from the <mfi> driver. I have attached a sample patch that covers the removal of the RAID map sync functionality. It does not cover any test cases and is just for reference. If it looks feasible to remove RAID map sync support from the driver, please consider my patch.

Note: we are not seeing this issue with the <mrsas> driver.
Can FreeBSD comment on this bug? Dell has found that this bug causes excessive event logging, because the incorrect raid map sync call fails. This in turn causes premature wear on the flash components where the logs are stored, i.e. the bug eventually damages the hardware.
[responding with hat bugmeister@] So, from looking over the src tree, there does not seem to be anyone actively maintaining this driver -- most of the last few years' worth of commits are issues that arose elsewhere in the codebase. I'll try to find someone versed in disk driver code to comment on this.
^Triage: Request feedback from original mfi(4) author
Any update/ETA on when the mfi driver will be updated to resolve this issue? The issue currently impacts card hardware and also renders the log useless due to the excessive prints.
@kubilay or Mark: any update on finding a maintainer?
Unfortunately I don't think mfi has any interested maintainers. I've done a bit of work on it, but that was a long time ago and I don't have access to any hardware. I don't quite understand the bug. I believe the claim is that mfi_tbolt_sync_map() is misusing MFI_DCMD_LD_GET_LIST, and so the RAID map sync functionality is simply broken on thunderbolt/invader/fury?
The defect is for the following products:

Thunderbolt device ID: 5b
Invader device ID: 5d
Fury device ID: 5f

The issue is that the mfi driver uses the incorrect method to do RAID map sync; it is using the GET_LD_LIST DCMD to get seqNum.

Dell has one set of RAID map sync events, which consists of 4 lines of text with a total of 262 bytes:

07/09/20 17:16:53: C0:LdDcmdRaidMapCompleteLegacy: Completing FW_RAID_MAP cmd
07/09/20 17:16:53: C0:ldIsFPCapable: LD 00 disabled reason LD properties
07/09/20 17:16:53: C0:ld sync: non-matching seqNums 1
07/09/20 17:16:53: C0:ld sync: 01 unsync'd lds remaining

This set of events occurred 32 times per second (not 128 times per second), which gives a write rate of one page (512KB) per minute (not 4 pages per minute). The total would then be 17.5 * 4 = 70 days to complete 100,000 write cycles.

BSD will need to resolve or remove the RAID map sync from the mfi driver.
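The wear estimate above can be checked with a short calculation. This is only a sketch: the 262 bytes per event set, 32 sets per second, 512KB page size and 100,000 write cycles come from the comment, while the function names and the model (one write cycle consumed per page written, 512KB read as 512 * 1024 bytes) are assumptions made for illustration, not a description of how the controller firmware manages its flash.

```python
# Figures quoted in the comment above.
EVENT_SET_BYTES = 262        # one set of 4 log lines
PAGE_BYTES = 512 * 1024      # assumption: "512KB" read as 512 KiB
ENDURANCE_CYCLES = 100_000   # write cycles quoted in the comment
MINUTES_PER_DAY = 24 * 60


def bytes_per_minute(sets_per_second):
    """Log volume generated per minute at the given event-set rate."""
    return EVENT_SET_BYTES * sets_per_second * 60


def days_to_exhaust(pages_per_minute):
    """Days until ENDURANCE_CYCLES are used up, assuming (hypothetically)
    that every page written consumes one write cycle."""
    return ENDURANCE_CYCLES / pages_per_minute / MINUTES_PER_DAY


if __name__ == "__main__":
    volume = bytes_per_minute(32)
    print(volume)                         # 503040 bytes per minute
    print(volume / PAGE_BYTES)            # a little under one page per minute
    print(round(days_to_exhaust(1), 1))   # roughly 70 days at one page/minute
    print(round(days_to_exhaust(4), 1))   # roughly 17.5 days at four pages/minute
```

At 32 event sets per second the log volume comes to slightly less than one 512KB page per minute, which is consistent with the rounded figures of one page per minute and about 70 days.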
Just a note that my RAID card (PERC H330) started throwing this error on startup: "Disabling writes to flash as the flash part has gone bad. Please contact technical support to resolve this issue. Please press 'Y' to continue." Dell pointed me at this bug. We're internally moving everything we have over to the mrsas driver, but this really should be looked at, or maybe mrsas should be made the default.
^Triage: mfc-stable12 and previous are now obsolete.