Bug 196944 - [bge] [ipmi] regression IPMI access disabled when bge driver is loaded
Summary: [bge] [ipmi] regression IPMI access disabled when bge driver is loaded
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.3-BETA2
Hardware: amd64 Any
Importance: Normal Affects Some People
Assignee: Pyun YongHyeon
URL: https://svnweb.freebsd.org/changeset/...
Keywords: regression
Depends on:
Blocks:
 
Reported: 2015-01-20 17:29 UTC by Laurent Frigault
Modified: 2016-03-01 19:59 UTC

See Also:
koobs: mfc-stable9?
koobs: mfc-stable10?


Description Laurent Frigault 2015-01-20 17:29:38 UTC
Hardware: Dell PowerEdge 860
bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004101> mem 0xfe5f0000-0xfe5fffff irq 16 at device 0.0 on pci4
bge0: CHIP ID 0x00004101; ASIC REV 0x04; CHIP REV 0x41; PCI-E
miibus0: <MII bus> on bge0
brgphy0: <BCM5750 1000BASE-T media interface> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

# pciconf -lvb bge0
bge0@pci0:4:0:0:        class=0x020000 card=0x01e61028 chip=0x165914e4 rev=0x11 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme BCM5721 Gigabit Ethernet PCI Express'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 64, base rxfe5f0000, size 65536, enabled


Up to at least 9.1-p4, hw.bge.allow_asf, which was set to 1 by default, allowed bge to cooperate with IPMI.

Under 10.1, on the same hardware, this does not work any more. hw.bge.allow_asf is still set to 1 by default, but IPMI access is disabled when the driver is loaded.

This looks like a regression between 9.1 and 10.1.
Comment 1 Andrew Daugherity 2016-02-18 20:22:08 UTC
I can confirm this regression on a Dell PowerEdge SC1435 with the same BCM5721 NICs:
% pciconf -lvb bge0
bge0@pci0:1:0:0:	class=0x020000 card=0x01eb1028 chip=0x165914e4 rev=0x21 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme BCM5721 Gigabit Ethernet PCI Express'
    class      = network
    subclass   = ethernet
    bar   [10] = type Memory, range 64, base rxefcf0000, size 65536, enabled

bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x004201> mem 0xefcf0000-0xefcfffff irq 33 at device 0.0 on pci1
bge0: CHIP ID 0x00004201; ASIC REV 0x04; CHIP REV 0x42; PCI-E
miibus0: <MII bus> on bge0
brgphy0: <BCM5750 1000BASE-T media interface> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow

I am running 10.2 and IPMI ceases to work after the kernel loads.  I previously ran Linux on this hardware and IPMI worked fine.  IPMI shares a physical port with bge0 but has its own MAC address and IP.

I have tested other versions of FreeBSD install images, and for my hardware at least, the regression seems to be between 9.1 and 9.2:
9.1: works
9.2: does not work
9.3: does not work
10.2: does not work

Interestingly, I have some PE850 (not 860) running 9.3 that also have a BCM5721 bge0, and IPMI *does* work on 3/4 of them, but only when connecting from the local subnet, despite the gateway being set in IPMI config.  Not sure what's broken with the last one, or when the others started working at all; I know the last time I tried IPMI on those a couple years ago it failed the same way (worked at boot, stopped working once kernel initialized bge0), but that was in the 7.x or 8.x days.

My SC1435 has IPMI 2.0, unlike the 850 & 860, which only have IPMI 1.5, if that matters.
Comment 2 Andrew Daugherity 2016-02-20 03:42:02 UTC
I've found the commit that breaks it: base r241438 (which was MFCed to stable/9 as r243546).  Reading the commit history between 9.1 and 9.2, I saw that r248226 (MFCed to stable/9 as r248858) claims to fix IPMI on a Sun X2200 that broke with r241438, so my first test was to see if it was working just before that.

I built a 9.2 kernel with sys/dev/bge/if_bge.c rolled back to the commit before the "bad" one, 243541 (MFC from 241436), and IPMI works!  (I did not touch any other files.)  Also works for a stable/9 kernel (identified as 9.3-STABLE #1 r243541:295788) with if_bge.c at 243541.

If I update if_bge.c to the commit in question (241438 aka 243546), IPMI is broken once more.  I also tried r248858 (248226) which supposedly fixed IPMI on those Sun servers but it did not help here.  I have not tried any other commits, as it appears that for my hardware, it works on <=241436 and is broken for >=241438.

I also fixed my 10.2 kernel in the same way by rolling back if_bge.c to r241436.  I had to merge r242426 and r242625 to get it to build; after doing so IPMI works in 10.2!

Obviously rolling all the way back like this isn't the solution for everyone, as there have been many other commits since then, but at least I found the breakage point.  I don't know the bge driver or kernel well enough to properly fix it, but hopefully this is good information for someone who does.
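
The search described in this comment (works at or before r241436, broken at or after r241438) is a bisection over revisions. A minimal sketch of that idea follows; it is not the procedure the commenter actually ran. The `ipmi_works` predicate is a hypothetical stand-in for "build a kernel with if_bge.c at this revision, boot it, and check IPMI", and the revision list is just sample data taken from numbers mentioned in the thread.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[int], is_good: Callable[[int], bool]) -> int:
    """Return the first revision for which is_good() is False.

    Assumes revisions is sorted ascending, revisions[0] is good,
    revisions[-1] is bad, and the behavior flips exactly once.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


def ipmi_works(rev: int) -> bool:
    # Hypothetical stand-in for a real build/boot/probe cycle; it mirrors
    # the result reported for this hardware (good up to r241436).
    return rev <= 241436


# Sample data: head revision numbers mentioned in the thread.
revs = [241436, 241438, 242426, 242625, 248226]
print(first_bad(revs, ipmi_works))  # prints 241438
```

Each call to the predicate costs a full kernel build and reboot on real hardware, so halving the range on every step keeps the number of test boots small.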
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2016-02-23 06:16:15 UTC
Assign to committer for apparent regressing changeset made in HEAD.

This is a 10.3-RELEASE candidate
Comment 4 Kubilay Kocak freebsd_committer freebsd_triage 2016-02-23 06:17:13 UTC
For clarity, this is a regression in 9.x, 10.x, current
Comment 5 Pyun YongHyeon freebsd_committer 2016-02-24 01:06:37 UTC
(In reply to Andrew Daugherity from comment #2)
Thank you very much for narrowing down the guilty changeset.
I don't see any differences in the ASF/IPMI code path before/after APE
support, except for an additional H/W reset in 9.1.
If you don't configure bge(4) at all (i.e. the kernel just attaches
the driver), does IPMI work?
Comment 6 Andrew Daugherity 2016-02-25 00:19:15 UTC
No, it doesn't.  The only difference is the interface speed is 100BaseTX at boot and then 1000BaseT after running ifconfig or dhclient, but IPMI ceases to work once the kernel loads, before any interface configuration is done.

However, I have found a workaround: enabling PXE in the BIOS.  I'm still booting via local disk, not over PXE, but with PXE enabled, it prints a message during BIOS load and apparently resets/initializes the NIC in such a way that IPMI still works after FreeBSD loads its bge driver.

To clarify: with FreeBSD 9.1 (and my test kernels with if_bge.c rolled back) and Linux, IPMI works regardless of PXE setting.

With FreeBSD >= 9.2, IPMI only works when PXE is enabled.  This is true for both the PowerEdge 850 and PowerEdge SC1435, and I would expect the 860 as well.

For completeness, I also tested OpenBSD (snapshot) and NetBSD 7.0, and IPMI also breaks with both of those, even with PXE enabled.

The default Dell BIOS setting is "enabled with PXE" for bge0 and "enabled without PXE" for bge1, but I had disabled PXE on some systems to speed up booting and avoid accidentally booting the wrong device.
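
The results reported in this comment can be collected into a small lookup table. The sketch below only restates what the comment says; `None` marks combinations the comment does not report explicitly, and the names `RESULTS` and `needs_pxe` are illustrative.

```python
from typing import Dict, Optional, Tuple

# (operating system, PXE enabled in BIOS) -> does IPMI still work after the
# NIC driver loads? None means the comment does not state that combination.
RESULTS: Dict[Tuple[str, bool], Optional[bool]] = {
    ("FreeBSD 9.1", True): True,
    ("FreeBSD 9.1", False): True,
    ("Linux", True): True,
    ("Linux", False): True,
    ("FreeBSD >= 9.2", True): True,
    ("FreeBSD >= 9.2", False): False,
    ("OpenBSD snapshot", True): False,
    ("OpenBSD snapshot", False): None,
    ("NetBSD 7.0", True): False,
    ("NetBSD 7.0", False): None,
}


def needs_pxe(os_name: str) -> bool:
    """True if IPMI works with PXE enabled and is known to fail without it."""
    return RESULTS[(os_name, True)] is True and RESULTS[(os_name, False)] is False


systems = sorted({name for name, _ in RESULTS})
print([name for name in systems if needs_pxe(name)])  # prints ['FreeBSD >= 9.2']
```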
Comment 7 Pyun YongHyeon freebsd_committer 2016-02-25 05:10:27 UTC
(In reply to Andrew Daugherity from comment #6)
Thanks for the PXE-related clue. But I'm confused about the ifconfig/dhclient
commands.  When did you run those commands?
bge(4) does not report the current link speed if the interface
is not UP.  So if you can see an established link, it means you
initialized/upped the controller.
By upping the interface, bge(4) will initialize the controller, which in turn
will touch many registers.  The same is true for dhclient(8).  The first
thing dhclient(8) does is UP the interface.

In order not to touch bge(4) H/W in bge_init(), you should not have any
'ifconfig_bge0=xxxx' line in rc.conf.  What I'd like to know is whether
IPMI is broken by bge_attach() call. Could you check it?
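
One way to confirm that rc.conf will not bring the interface up is to scan it for uncommented `ifconfig_bge0` assignments. This is a minimal sketch; the helper name `bge_rc_lines` and the sample file contents are made up for illustration, and it only looks at per-interface `ifconfig_<ifname>` variables.

```python
import re
from typing import List


def bge_rc_lines(rc_conf_text: str, ifname: str = "bge0") -> List[str]:
    """Return uncommented rc.conf lines that configure the given interface."""
    # Matches ifconfig_bge0=..., ifconfig_bge0_alias0=..., ifconfig_bge0_ipv6=...
    pattern = re.compile(rf"^\s*ifconfig_{re.escape(ifname)}(_\w+)?\s*=")
    return [line for line in rc_conf_text.splitlines() if pattern.match(line)]


# Hypothetical rc.conf contents used only to exercise the helper.
sample = """\
hostname="pe860"
#ifconfig_bge0="DHCP"
ifconfig_bge1="DHCP"
ifconfig_bge0_alias0="inet 192.0.2.10/24"
"""

print(bge_rc_lines(sample))
```

An empty result means nothing in that file asks rc(8) to configure bge0, so any IPMI breakage seen after boot would point at attach rather than at interface initialization.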
Comment 8 Andrew Daugherity 2016-02-26 19:41:00 UTC
I've done most of my testing with the FreeBSD memdisk install images, both with release kernels and test kernels copied onto the USB key.  After choosing "Live CD" and logging in, no network is configured until I run 'dhclient bge0' or 'ifconfig bge0 inet a.b.c.d/NN up'.

Some kernels on the PE850 reported the media speed even before bringing up the interface and others didn't, but that's not the issue here, since it fails when the driver loads, before any configuration happens.

Just to be sure this issue is on attach vs. network configuration, I built a test 9-stable (unmodified r296050) kernel with GENERIC + 'nodevice bge' and tested it with PXE disabled.  IPMI continues to work after this kernel is booted and I log in to the live CD environment, but as soon as I 'kldload if_bge' it breaks.
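
The logic of this test (IPMI reachable before the driver is loaded, unreachable afterwards) can be written down as a small check. This is a sketch only: `FakeHost` simulates the hardware, and on a real system the probe would be an IPMI query issued from a second machine while the load step would be `kldload if_bge`.

```python
from typing import Callable


def attach_breaks_ipmi(ipmi_reachable: Callable[[], bool],
                       load_driver: Callable[[], None]) -> bool:
    """Return True if IPMI answers before the driver loads but not after."""
    if not ipmi_reachable():
        raise RuntimeError("IPMI already unreachable before loading the driver")
    load_driver()
    return not ipmi_reachable()


class FakeHost:
    """Simulated machine standing in for the real BMC and NIC."""

    def __init__(self, breaks_on_attach: bool) -> None:
        self.breaks_on_attach = breaks_on_attach
        self.driver_loaded = False

    def ipmi_reachable(self) -> bool:
        return not (self.driver_loaded and self.breaks_on_attach)

    def kldload_if_bge(self) -> None:
        self.driver_loaded = True


host = FakeHost(breaks_on_attach=True)
print(attach_breaks_ipmi(host.ipmi_reachable, host.kldload_if_bge))  # prints True
```

Checking reachability first matters: without a known-good baseline, a failure after loading the driver could not be attributed to attach.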
Comment 9 Pyun YongHyeon freebsd_committer 2016-02-28 11:13:51 UTC
(In reply to Andrew Daugherity from comment #8)
OK, thank you very much for double checking.  Could you try a diff at the following URL?
https://people.freebsd.org/~yongari/bge/bge.ipmi.diff

I don't have access to IPMI-aware bge(4) hardware, so it has only been compile-tested.
The diff was generated against HEAD but I guess it will apply to stable/9 or stable/10.
Comment 10 Andrew Daugherity 2016-02-29 23:55:14 UTC
(In reply to Pyun YongHyeon from comment #9)
Unfortunately, the diff does not fix anything on my hardware.  IPMI still works when PXE is enabled and does not work without it.
Comment 11 Pyun YongHyeon freebsd_committer 2016-03-01 06:25:06 UTC
(In reply to Andrew Daugherity from comment #10)
Uploaded updated diff. The URL is the same as before.
Could you test it again?
Comment 12 Andrew Daugherity 2016-03-01 19:59:33 UTC
(In reply to Pyun YongHyeon from comment #11)
Is that diff meant to be cumulative with the previous one or replace it entirely?

With a kernel built with only the new diff (discarding the previous one), there is no change.