Bug 211081 - LSI MegaRAID SAS 9271-8i not detected after upgrade to 11; card doesn't attach to bus on 11
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: John Baldwin
URL:
Keywords: patch, regression
Depends on:
Blocks:
 
Reported: 2016-07-13 14:56 UTC by Sergey Renkas
Modified: 2016-08-08 07:07 UTC

See Also:


Attachments
pciconf.log (7.43 KB, text/plain)
2016-07-14 06:41 UTC, Sergey Renkas
picconf.log 11 version (7.20 KB, text/plain)
2016-07-14 07:27 UTC, Sergey Renkas
dmesg.boot 11 version (8.66 KB, text/plain)
2016-07-14 07:28 UTC, Sergey Renkas
dmesg.boot 11 version boot -v (56.67 KB, text/plain)
2016-07-14 08:08 UTC, Sergey Renkas
pciconf_lcbBev_pcib11 (1.20 KB, text/plain)
2016-07-15 06:18 UTC, Sergey Renkas
pci_hp_tunable.patch (657 bytes, patch)
2016-07-28 15:02 UTC, John Baldwin

Description Sergey Renkas 2016-07-13 14:56:46 UTC
I am using an LSI MegaRAID SAS 9271-8i RAID controller.

# uname -a
FreeBSD ... 10.3-STABLE FreeBSD 10.3-STABLE #0: Wed Jul 13 16:45:24 MSK 2016     ... amd64

# mfiutil show adapter
mfi0 Adapter:
    Product Name: LSI MegaRAID SAS 9271-8i
   Serial Number: SV42934928
        Firmware: 23.34.0-0005
     RAID Levels: JBOD, RAID0, RAID1, RAID5, RAID6, RAID10, RAID50
  Battery Backup: present
           NVRAM: 32K
  Onboard Memory: 1024M
  Minimum Stripe: 8K
  Maximum Stripe: 1M

After upgrading to 11.0-STABLE (revision 302669), the RAID controller is no longer detected.
Comment 1 Enji Cooper freebsd_committer freebsd_triage 2016-07-13 20:40:10 UTC
Have you tried the mrsas driver?

I'm CCing one of the driver maintainers for more input. kadesai hasn't registered
in Bugzilla yet (he maintains one of the other drivers).

pciconf -lv and /var/run/dmesg.boot output would help too.
Comment 2 Sergey Renkas 2016-07-14 06:41:12 UTC
Created attachment 172494
pciconf.log
Comment 3 Enji Cooper freebsd_committer freebsd_triage 2016-07-14 06:43:52 UTC
(In reply to Sergey Renkas from comment #2)

Is this from when you're booted into 10.3-STABLE ?
Comment 4 Sergey Renkas 2016-07-14 06:50:20 UTC
(In reply to Ngie Cooper from comment #3)
Yes. On 11, the mfid device does not show up.
Comment 5 Enji Cooper freebsd_committer freebsd_triage 2016-07-14 06:52:27 UTC
(In reply to Sergey Renkas from comment #4)

Ok. pciconf -lv and /var/run/dmesg.boot are more valuable from 11.x than 10.3-STABLE, but having both of those items from 10.3-STABLE would be helpful as well for comparing things.
Comment 6 Sergey Renkas 2016-07-14 06:58:41 UTC
(In reply to Ngie Cooper from comment #5)
It will take some time to boot an 11 live CD.
Comment 7 Sergey Renkas 2016-07-14 07:27:22 UTC
Created attachment 172495
picconf.log 11 version
Comment 8 Sergey Renkas 2016-07-14 07:28:11 UTC
Created attachment 172496
dmesg.boot 11 version
Comment 9 Sergey Renkas 2016-07-14 07:33:41 UTC
(In reply to Sergey Renkas from comment #6)
The files have been attached.
Comment 10 Enji Cooper freebsd_committer freebsd_triage 2016-07-14 07:38:33 UTC
(In reply to Sergey Renkas from comment #7)

The device isn't present here, i.e. there's nothing for mfi(4) to attach to according to the PCI bus.

This output at boot looks interesting...

unknown: I/O range not supported

That exact message doesn't pop up on my machine. It just says,

atrtc0: Warning: Couldn't map I/O.

Could you please boot 11.0 with boot -v and attach the output of /var/run/dmesg.boot?
Comment 11 Sergey Renkas 2016-07-14 08:08:18 UTC
Created attachment 172498
dmesg.boot 11 version boot -v
Comment 12 Sergey Renkas 2016-07-14 08:09:02 UTC
(In reply to Ngie Cooper from comment #10)
Done, the output is attached.
Comment 13 Kashyap 2016-07-14 14:18:24 UTC
Hi, I am Kashyap. I maintain the mrsas driver for LSI/Avago MegaRAID (MR) controllers. Let me know if I need to look at this.

From this discussion and the attached dmesg and PCI logs, I can see that the device is not even detected at the PCI layer, so the driver is not involved in the device going undetected. I also could not find anything odd in the logs.

Maybe others on this bug can point something out.
Comment 14 Enji Cooper freebsd_committer freebsd_triage 2016-07-14 15:09:18 UTC
(In reply to Sergey Renkas from comment #11)

Ok. Something's not working from the ACPI/PCI end of things. Is your BIOS up to date? Are you using the traditional BIOS boot or a UEFI boot? What's your motherboard model?
Comment 15 Sergey Renkas 2016-07-14 15:28:00 UTC
(In reply to Ngie Cooper from comment #14)
Supermicro X7DBU, traditional BIOS boot, BIOS version 2.1a (12/20/08).
FreeBSD 8, 9, and 10 all worked; 11 does not.
Comment 16 John Baldwin freebsd_committer freebsd_triage 2016-07-14 19:21:29 UTC
Please capture 'pciconf -lcBbev' output from 11.  The PLX PCI-express switch is HotPlug capable and it seems that the bridge is not being enumerated.  The pciconf output will tell me if the PLX chip is claiming that there is nothing plugged in.

(For future reference, pciconf -lv doesn't have the really useful bits; those are generally in -c and, to a lesser extent, -b, -B, and -e.)

You can just run this against the relevant bridges btw, e.g.:

'pciconf -lcbBev pcib11'
Comment 17 Sergey Renkas 2016-07-15 06:18:45 UTC
Created attachment 172545
pciconf_lcbBev_pcib11
Comment 18 Sergey Renkas 2016-07-15 06:20:02 UTC
(In reply to John Baldwin from comment #16)
Done. Waiting for a miracle. :)
Comment 19 John Baldwin freebsd_committer freebsd_triage 2016-07-15 11:28:45 UTC
Comment on attachment 172545
pciconf_lcbBev_pcib11

Interesting, this PCI-PCI bridge has many more of the "optional" hot plug features.  In this case it claims that there is a mechanical retention latch (MRL) holding the card in place, and that the latch is open.  We refuse to attach to the bus if the MRL is open (and in general we try to power down the slot, assuming that a user has opened the latch to remove the card).

Can you see any sort of latch near the slot the card is in?
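
The check described above can be sketched roughly as follows. This is a simplified, standalone illustration and not the code in sys/dev/pci/pci_pci.c: the struct, function, and macro names are hypothetical, and only the bit positions (MRL Sensor Present in Slot Capabilities, MRL Sensor State and Presence Detect State in Slot Status) are taken from the PCI Express register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical names; bit positions follow the PCI Express Slot
 * Capabilities and Slot Status register layouts.
 */
#define SLOT_CAP_MRL_SENSOR_PRESENT 0x00000004u /* Slot Capabilities bit 2 */
#define SLOT_STA_MRL_SENSOR_OPEN    0x0020u     /* Slot Status bit 5: 1 = latch open */
#define SLOT_STA_PRESENCE_DETECT    0x0040u     /* Slot Status bit 6: 1 = card present */

/* Illustrative snapshot of the two slot registers (not a kernel struct). */
struct slot_regs {
	uint32_t slot_cap;
	uint16_t slot_sta;
};

/* Return true if the slot should be treated as holding a usable card. */
static bool
slot_card_inserted(const struct slot_regs *regs)
{
	assert(regs != NULL);

	/* No card reported in the slot. */
	if ((regs->slot_sta & SLOT_STA_PRESENCE_DETECT) == 0)
		return (false);

	/* A latch sensor exists and reports "open": treat the slot as off. */
	if ((regs->slot_cap & SLOT_CAP_MRL_SENSOR_PRESENT) != 0 &&
	    (regs->slot_sta & SLOT_STA_MRL_SENSOR_OPEN) != 0)
		return (false);

	return (true);
}
```

Under these assumptions, a slot that reports a card present together with an open latch sensor yields false, so the bus behind the bridge would not be attached even though a card is physically installed.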
Comment 20 Sergey Renkas 2016-07-15 12:44:02 UTC
(In reply to John Baldwin from comment #19)
There is no latch. I can send pictures to your e-mail.
Comment 21 John Baldwin freebsd_committer freebsd_triage 2016-07-15 13:29:21 UTC
Is the LSI adapter in a physical slot or is it integrated onto the motherboard?

If you can send me pictures that would be nice, perhaps one with no card in the slot at all if possible?
Comment 22 Sergey Renkas 2016-07-15 13:41:48 UTC
(In reply to John Baldwin from comment #21)
The LSI adapter is in a physical slot. I have sent the pictures to your e-mail.
Comment 23 John Baldwin freebsd_committer freebsd_triage 2016-07-26 01:29:39 UTC
So this add-on card doesn't appear to honor the spec.  It should be wiring the sensor closed or some such if it isn't implemented.  OTOH, it seems like the Linux hotplug code (by my reading) doesn't actually care if the sensor is open.

As a test, can you try patching pcib_hotplug_inserted() in sys/dev/pci/pci_pci.c to comment out the last check?  (The lines under the 'If the MRL is disengaged...' comment)
Comment 24 Sergey Renkas 2016-07-27 06:39:14 UTC
(In reply to John Baldwin from comment #23)
The card is detected, but at mount time the following error appeared:
interrupt storm detected on "irq259"; throttling interrupt source
...
and so on, indefinitely.
Comment 25 John Baldwin freebsd_committer freebsd_triage 2016-07-27 18:20:11 UTC
Grr, ok.  Let's try to narrow down why it breaks.  I will probably add a tunable to disable hotplug entirely, though that will require a manual tunable to be set for this box unfortunately.

Can you drop your earlier patch and instead hack pcib_probe_hotplug() to always be empty (in particular, to never set PCIB_HOTPLUG)?
Comment 26 Sergey Renkas 2016-07-28 10:23:34 UTC
(In reply to John Baldwin from comment #25)
Hacking pcib_probe_hotplug() was successful.

static void
pcib_probe_hotplug(struct pcib_softc *sc)
{
	/* Test hack: body left empty so PCIB_HOTPLUG is never set. */
}
The disk was successfully mounted.
Comment 27 John Baldwin freebsd_committer freebsd_triage 2016-07-28 15:02:13 UTC
Created attachment 173071
pci_hp_tunable.patch

Please try the attached patch.  You will need to set 'hw.pci.enable_pcie_hp=0' in /boot/loader.conf for now.

I still want to try to figure out what write to the slot control register is breaking your device, but this tunable is something I can merge to 11.0 at least.
Comment 28 Sergey Renkas 2016-07-29 07:22:44 UTC
(In reply to John Baldwin from comment #27)
Thanks, the patch works.
Comment 29 commit-hook freebsd_committer freebsd_triage 2016-07-29 17:54:42 UTC
A commit references this bug:

Author: jhb
Date: Fri Jul 29 17:54:21 UTC 2016
New revision: 303497
URL: https://svnweb.freebsd.org/changeset/base/303497

Log:
  Add a loader tunable (hw.pci.enable_pcie_hp) to disable PCI-e HotPlug.

  Some systems and/or devices (such as riser cards) do not include a
  non-compliant implementation of PCI-e HotPlug that can result in devices
  not being attached (e.g. the HotPlug code might assume that a card is
  being unplugged and will power the slot off and detach it).  This
  tunable can be set to 0 to disable support for PCI-e HotPlug ignoring
  the incorrect HotPlug state on these slots.

  PR:		211081
  Reported by:	Sergey Renkas <serg_ic@mail.ru> (SuperMicro X7 riser card)
  Reported by:	Jeffrey E Pieper <jeffrey.e.pieper@intel.com>
  	 	(Intel X520 adapter)
  MFC after:	1 week
  Relnotes:	yes

Changes:
  head/sys/dev/pci/pci_pci.c
Comment 30 commit-hook freebsd_committer freebsd_triage 2016-08-01 22:19:51 UTC
A commit references this bug:

Author: jhb
Date: Mon Aug  1 22:19:23 UTC 2016
New revision: 303645
URL: https://svnweb.freebsd.org/changeset/base/303645

Log:
  Disable PCI hotplug support for slots with power controllers.

  After further review of the spec, I do not think the current HotPlug
  code handles slots with power controllers correctly.  In particular,
  the power state of the slot is to be inferred from other events, not
  from examining the state of the power control bit in SLOT_CTL.  For now,
  disable PCI hotplug support on such slots.

  PR:		211081
  Tested by:	Jeffrey E Pieper <jeffrey.e.pieper@intel.com>
  MFC after:	3 days

Changes:
  head/sys/dev/pci/pci_pci.c
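
Taken together, the two commit logs above describe a probe-time decision that might look roughly like the sketch below. It is an assumption-laden illustration rather than the committed change: the function and macro names are hypothetical, the `enable_pcie_hp` parameter stands in for the hw.pci.enable_pcie_hp tunable, and only the bit positions (Power Controller Present and Hot-Plug Capable in Slot Capabilities) come from the PCI Express register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions follow the Slot Capabilities register. */
#define SLOT_CAP_POWER_CONTROLLER 0x00000002u /* bit 1: power controller present */
#define SLOT_CAP_HOTPLUG_CAPABLE  0x00000040u /* bit 6: slot is hot-plug capable */

/*
 * Decide whether native hot-plug handling should be enabled for a slot.
 * enable_pcie_hp models the loader tunable (non-zero = enabled).
 */
static bool
hotplug_should_enable(uint32_t slot_cap, int enable_pcie_hp)
{
	/* Administrator disabled hot-plug support via the tunable. */
	if (enable_pcie_hp == 0)
		return (false);

	/* The slot does not advertise hot-plug capability. */
	if ((slot_cap & SLOT_CAP_HOTPLUG_CAPABLE) == 0)
		return (false);

	/* Slots with a power controller are skipped, per the second commit. */
	if ((slot_cap & SLOT_CAP_POWER_CONTROLLER) != 0)
		return (false);

	return (true);
}
```

In this sketch, setting the tunable to 0 disables hot-plug handling for every slot, while the power-controller check disables it only for slots that advertise one.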
Comment 31 commit-hook freebsd_committer freebsd_triage 2016-08-05 18:41:56 UTC
A commit references this bug:

Author: jhb
Date: Fri Aug  5 18:41:51 UTC 2016
New revision: 303781
URL: https://svnweb.freebsd.org/changeset/base/303781

Log:
  MFC 303497,303559,303645: Disable PCI-e hotplug on bridges with power
  controllers.

  303497:
  Add a loader tunable (hw.pci.enable_pcie_hp) to disable PCI-e HotPlug.

  Some systems and/or devices (such as riser cards) do not include a
  non-compliant implementation of PCI-e HotPlug that can result in devices
  not being attached (e.g. the HotPlug code might assume that a card is
  being unplugged and will power the slot off and detach it).  This
  tunable can be set to 0 to disable support for PCI-e HotPlug ignoring
  the incorrect HotPlug state on these slots.

  303559:
  Try to declare _hw_pci for all sysctl cases needed after r303497.

  303645:
  Disable PCI hotplug support for slots with power controllers.

  After further review of the spec, I do not think the current HotPlug
  code handles slots with power controllers correctly.  In particular,
  the power state of the slot is to be inferred from other events, not
  from examining the state of the power control bit in SLOT_CTL.  For now,
  disable PCI hotplug support on such slots.

  PR:		211081
  Approved by:	re (gjb)

Changes:
_U  stable/11/
  stable/11/sys/dev/pci/pci_pci.c