Summary: | after upgrading from 12.1 to 12.2 the mfi driver crashes during boot | ||
---|---|---|---|
Product: | Base System | Reporter: | cstamas+fbsdbz |
Component: | kern | Assignee: | Justin Hibbits <jhibbits> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | afedorov, ambrisko, bz, eugen, jhibbits, lwhsu, mizhka, number201724, pfduch, zarychtam |
Priority: | --- | ||
Version: | 12.2-RELEASE | ||
Hardware: | amd64 | ||
OS: | Any |
Description
cstamas+fbsdbz
2020-11-30 15:01:16 UTC
Hi, It looks like commit 358689 introduced regression: https://svnweb.freebsd.org/base/head/sys/dev/mfi/mfi_tbolt.c?revision=358689&view=markup Added jhibbit@ into ticket. Thx! Hm, it looks really strange. I can revert it, but Clang will complain, as it would be a tautology... 'if (cdb[0] != 0x28 || cdb[0] != 0x2A)' is *always* true, because if it is one, it's not the other. Adding ambrisko@ for his thoughts, due to PR 187831. I reverted the change and the kernel boots up properly now: root@backup20vh:/usr/src/sys # svnlite log | head -2 ------------------------------------------------------------------------ r368402 | kib | 2020-12-07 01:05:39 +0000 (Mon, 07 Dec 2020) | 3 lines root@backup20vh:/usr/src/sys # svnlite status M dev/mfi/mfi_tbolt.c root@backup20vh:/usr/src/sys # svnlite diff Index: dev/mfi/mfi_tbolt.c =================================================================== --- dev/mfi/mfi_tbolt.c (revision 368414) +++ dev/mfi/mfi_tbolt.c (working copy) @@ -1109,7 +1109,7 @@ if (hdr->cmd == MFI_CMD_PD_SCSI_IO) { /* check for inquiry commands coming from CLI */ - if (cdb[0] != 0x28 && cdb[0] != 0x2A) { + if (cdb[0] != 0x28 || cdb[0] != 0x2A) { if ((req_desc = mfi_tbolt_build_mpt_cmd(sc, cm)) == NULL) { device_printf(sc->mfi_dev, "Mapping from MFI " root@backup20vh:/usr/src/sys # uname -a FreeBSD backup20vh.removed-domain 12.2-STABLE FreeBSD 12.2-STABLE #1 r368414M: Mon Dec 7 17:35:28 UTC 2020 cstamas@backup20vh.removed-domain:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 I even ran zpool scrub and all seems fine. (In reply to Justin Hibbits from comment #2) At the same time, it's strange that the comment talks about "inquiry commands" while 0x28 and 0x2A are READ_10 and WRITE_10 commands respectively. There could be a bug-cancels-bug situation there. I never figured out what this code was dealing with. It came from LSI/Avago/Broadcom with their initial support of Thunderbolt family of RAID cards. This was before they decided to make mrsas. That is why I put in the printf. I assume that the CLI was MegaCLI. Note that the MFI_CMD_PD_SCSI_IO is for commands going to the raw disk versus to the RAID volume. This should mostly be handled as a "SYSPDIO" which they called system physical disk IO, which is a talking straight to the disk. That is handled the "mfisyspd" of the driver. it's kind of strange mode. The real fix is probably to remove the cdb[0] != 0x28 && cdb[0] != 0x2A and always run mfi_tbolt_build_mpt_cmd and not fall through to "DJA NA XXX SYSPDIO". I never found out why we would need to do that check. I put that message there so if it happened we can try to figure out why. Since I've never seen that happen we can probably remove it. A commit references this bug: Author: jhibbits Date: Wed Dec 9 02:07:02 UTC 2020 New revision: 368473 URL: https://svnweb.freebsd.org/changeset/base/368473 Log: dev/mfi: Make a seemingly bogus conditional unconditional Summary: r358689 attempted to fix a clang warning/error by inferring the intent of the condition "(cdb[0] != 0x28 || cdb[0] != 0x2A)". Unfortunately, it looks like this broke things. Instead, fix this by making this path unconditional, effectively reverting to the previous state. PR: kern/251483 Reviewed By: ambrisko MFC after: 2 days Differential Revision: https://reviews.freebsd.org/D27515 Changes: head/sys/dev/mfi/mfi_tbolt.c Could you please add this commit into 12.2-p2? A commit references this bug: Author: jhibbits Date: Sun Dec 13 22:42:48 UTC 2020 New revision: 368620 URL: https://svnweb.freebsd.org/changeset/base/368620 Log: MFC r3684733: dev/mfi: Make a seemingly bogus conditional unconditional Summary: r358689 attempted to fix a clang warning/error by inferring the intent of the condition "(cdb[0] != 0x28 || cdb[0] != 0x2A)". Unfortunately, it looks like this broke things. Instead, fix this by making this path unconditional, effectively reverting to the previous state. PR: kern/251483 Changes: _U stable/12/ stable/12/sys/dev/mfi/mfi_tbolt.c Is this an EN candidate? I just got hit by it badly on a 16 spindle machine as well after doing a freebsd-update to 12.2. (In reply to Bjoern A. Zeeb from comment #10) Yeah, it's a good candidate for EN. Just for the record: I was hit with this panic too, in similar conditions trying to install 12.2-RELEASE on new hardware. I worked this around switching to mrsas(4) that was my intention anyway. Also note that for ZFS it may be preferable configuring HDD/SSD/NVMe as "Non-RAID" media (as opposed to JBOD/RAID0 per drive) in the controller's SETUP and using mrsas(4). Such configuration gives transparent S.M.A.R.T. support for smartmontools, too. I encountered the same problem on DELL PowerEdge T340. The solution is to use mrsas and D248727 ^Triage: apparently fixed back in 2020. The comment about using mrsas may still be correct, however. |