Bug 248352 - mfi(4): Remove RAID map sync functionality
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-RELEASE
Hardware: Any Any
Importance: Normal, Affects Many People
Assignee: freebsd-bugs (Nobody)
URL: https://svnweb.freebsd.org/base/head/...
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2020-07-29 19:25 UTC by Chandrakanth Patil
Modified: 2024-03-21 17:17 UTC
CC List: 10 users

See Also:
scottl: maintainer-feedback-
koobs: mfc-stable12?
koobs: mfc-stable11?


Attachments
This patch removes raid map sync functionality from mfi driver (7.68 KB, patch)
2020-07-29 19:25 UTC, Chandrakanth Patil

Description Chandrakanth Patil 2020-07-29 19:25:23 UTC
Created attachment 216866
This patch removes raid map sync functionality from mfi driver

We have found a RAID map sync failure on Invader (device ID: 5d) as soon as the <mfi> driver is loaded. The failure occurs because the driver cannot fetch the updated RAID map from the firmware: RAID map is logically unsupported in the driver. The driver does not get any RAID map data from MFI_DCMD_LD_MAP_GET_INFO; instead, it gets the config sequence number from the MFI_DCMD_LD_GET_LIST DCMD, which results in the RAID map sync failure. Below is a firmware log snippet showing the config sequence number mismatch between driver and firmware:

C0:ld sync: non-matching seqNums 1
C0:ld sync: 01 unsync'd lds remaining

This issue applies to controllers that have RAID map support, such as Thunderbolt (device ID: 5b), Invader (device ID: 5d) and Fury (device ID: 5f). It does not apply to controllers up to and including Liberator (Gen1 and Gen2), as these controllers don't have RAID map support.

Hence, we propose to remove the RAID map sync functionality from the <mfi> driver. I have attached a sample patch that covers only the removal of the RAID map sync functionality. It does not cover any test cases and is for reference only.

If it looks feasible to remove RAID map sync support from the driver, then please consider my patch.

Note: we are not seeing this issue with <mrsas> driver.
Comment 1 David Papasan 2020-08-14 19:31:20 UTC
Can FreeBSD comment on this bug?
Dell has found this bug is causing excessive event logging, due to the incorrect raidmap sync call which fails. This in turn causes premature wear on the flash components where the logs are stored.

i.e. the bug eventually damages the hardware.
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2020-08-15 01:58:16 UTC
[responding with hat bugmeister@]

So, from looking over the src tree, there does not seem to be anyone actively maintaining this driver -- most of the last few years' worth of commits are issues that arose elsewhere in the codebase.

I'll try to find someone versed in disk driver code to comment on this.
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2020-08-17 03:48:45 UTC
^Triage: Request feedback from original mfi(4) author
Comment 4 Bill Steinke 2020-09-08 15:02:47 UTC
Any update/ETA on when the mfi driver will be updated to resolve this issue? The issue currently impacts card hardware and also renders the log useless due to the excessive prints.
Comment 5 David Papasan 2020-10-13 13:19:06 UTC
@kubilay or Mark
Any update on finding a maintainer?
Comment 6 Mark Johnston freebsd_committer freebsd_triage 2020-11-12 19:43:37 UTC
Unfortunately I don't think mfi has any interested maintainers.  I've done a bit of work on it, but that was a long time ago and I don't have access to any hardware.

I don't quite understand the bug.  I believe the claim is that mfi_tbolt_sync_map() is misusing MFI_DCMD_LD_GET_LIST, and so the RAID map sync functionality is simply broken on thunderbolt/invader/fury?
Comment 7 Bill Steinke 2021-04-13 14:58:23 UTC
The defect is for the following products:
Thunderbolt device ID: 5b
Invader device ID: 5d
Fury device ID: 5f
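
The scope listed above can be sketched as a simple lookup. This is an illustrative Python sketch, not mfi(4) code: the names `RAID_MAP_CONTROLLERS` and `is_affected` are hypothetical, and the IDs are assumed to be the hexadecimal PCI device IDs quoted in this report.

```python
# Hypothetical table built only from the device IDs quoted in this bug report.
# Assumption: the IDs are hexadecimal PCI device IDs.
RAID_MAP_CONTROLLERS = {
    0x5B: "Thunderbolt",
    0x5D: "Invader",
    0x5F: "Fury",
}


def is_affected(device_id: int) -> bool:
    """Return True if the controller is one of those reported as affected."""
    return device_id in RAID_MAP_CONTROLLERS


if __name__ == "__main__":
    for dev_id, name in sorted(RAID_MAP_CONTROLLERS.items()):
        print(f"{name}: device ID {dev_id:#04x}, affected={is_affected(dev_id)}")
```

Any controller whose device ID is not in the table is treated as unaffected, which matches the report's statement that earlier controllers have no RAID map support.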

The issue is that the mfi driver uses the incorrect method to do RAID map sync; it is using the MFI_DCMD_LD_GET_LIST DCMD to get seqNum.

Dell has one set of RAID map sync events, which consists of 4 lines of text totaling 262 bytes.

07/09/20 17:16:53: C0:LdDcmdRaidMapCompleteLegacy: Completing FW_RAID_MAP cmd
07/09/20 17:16:53: C0:ldIsFPCapable: LD 00 disabled reason LD properties
07/09/20 17:16:53: C0:ld sync: non-matching seqNums 1
07/09/20 17:16:53: C0:ld sync: 01 unsync'd lds remaining
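
To check how often a controller logs this failure, one could count the "non-matching seqNums" lines per timestamp in an exported firmware log. The following is a minimal Python sketch under stated assumptions: the log is available as plain text in exactly the line format shown above, and the names `SYNC_FAILURE` and `sync_failures_per_second` are hypothetical. How the log is exported from the controller is not covered here.

```python
import re
from collections import Counter

# Assumption: each line starts with "MM/DD/YY HH:MM:SS: C<n>:" as in the
# excerpt quoted in this report.
SYNC_FAILURE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}): C\d+:ld sync: non-matching seqNums"
)

SAMPLE_LOG = """\
07/09/20 17:16:53: C0:LdDcmdRaidMapCompleteLegacy: Completing FW_RAID_MAP cmd
07/09/20 17:16:53: C0:ldIsFPCapable: LD 00 disabled reason LD properties
07/09/20 17:16:53: C0:ld sync: non-matching seqNums 1
07/09/20 17:16:53: C0:ld sync: 01 unsync'd lds remaining
"""


def sync_failures_per_second(log_text: str) -> Counter:
    """Count 'non-matching seqNums' lines, keyed by their timestamp."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SYNC_FAILURE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for timestamp, count in sync_failures_per_second(SAMPLE_LOG).items():
        print(f"{timestamp}: {count} sync failure(s)")
```

Because the timestamps have one-second resolution, the count per timestamp approximates the number of event sets logged per second.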

This set of events occurred 32 times per second (not 128 times per second), which gives a write rate of one page (512KB) per minute (not 4 pages per minute).
The total time to complete 100,000 write cycles would then be 17.5 * 4 = 70 days.
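
The estimate above can be reproduced with a short calculation. This Python sketch uses only the figures quoted in this comment and makes two simplifying assumptions that are not stated in the report: a 512KB page is 512 * 1024 bytes, and every page written consumes one of the 100,000 write cycles. The function names are hypothetical.

```python
EVENT_SET_BYTES = 262          # one set of 4 log lines, per this comment
EVENT_SETS_PER_SECOND = 32     # corrected rate, per this comment
PAGE_BYTES = 512 * 1024        # assumption: 512KB means 512 KiB
WRITE_CYCLES = 100_000         # write-cycle budget, per this comment
MINUTES_PER_DAY = 24 * 60


def log_bytes_per_minute(sets_per_second: int) -> int:
    """Bytes of sync-failure log text produced per minute."""
    return EVENT_SET_BYTES * sets_per_second * 60


def days_to_wear_out(pages_per_minute: float) -> float:
    """Days until the write-cycle budget is used up.

    Assumption: each page write consumes exactly one write cycle.
    """
    return WRITE_CYCLES / pages_per_minute / MINUTES_PER_DAY


if __name__ == "__main__":
    per_minute = log_bytes_per_minute(EVENT_SETS_PER_SECOND)
    print(f"log volume: {per_minute} bytes/minute "
          f"({per_minute / PAGE_BYTES:.2f} pages/minute)")
    print(f"at 1 page/minute:  {days_to_wear_out(1):.1f} days")
    print(f"at 4 pages/minute: {days_to_wear_out(4):.1f} days")
```

At 32 event sets per second the log volume is 503,040 bytes per minute, slightly under one 512KB page, so the one-page-per-minute figure is a close approximation and the result lands near the 70 days quoted above.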


BSD will need to resolve or remove the RAID Map sync from the mfi driver.
Comment 8 Dan Mahoney 2024-03-21 17:17:03 UTC
Just a note that when my raid card (Perc H330) started throwing errors on startup with the error:

Disabling writes to flash as the flash part has gone bad.
Please contact technical support to resolve this issue.
Please press 'Y' to continue.

Dell pointed me at this bug.  We're internally moving everything we have over to the mrsas driver, but this really should be looked at, or maybe mrsas made the default.