Bug 221350 - Unable to boot/install on HPE Proliant MicroServer Gen10 (AMD Opteron X3000): Hangs/Panics
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.1-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs mailing list
URL:
Keywords: crash, patch
Depends on:
Blocks:
 
Reported: 2017-08-08 22:23 UTC by Rafal Lukawiecki
Modified: 2018-06-20 06:48 UTC
CC List: 28 users

See Also:
koobs: mfc-stable9?
koobs: mfc-stable10?
koobs: mfc-stable11?


Attachments
Verbose boot console output at hang point (366.94 KB, image/jpeg)
2017-08-08 22:23 UTC, Rafal Lukawiecki
Non-verbose boot output showing more context before the hang (355.14 KB, image/jpeg)
2017-08-08 22:25 UTC, Rafal Lukawiecki
Panic when booting with "set hint.apic.0.disabled=1" (345.92 KB, image/jpeg)
2017-08-08 22:26 UTC, Rafal Lukawiecki
Panic when booting with ACPI off in boot options menu (263.75 KB, image/jpeg)
2017-08-08 22:28 UTC, Rafal Lukawiecki
Kernel Display Messages of the last boot (11.12 KB, text/plain)
2017-09-12 18:46 UTC, Thomas Neuber
verbose kernel display messages (58.84 KB, text/plain)
2017-09-25 16:06 UTC, Thomas Neuber
output of pciconf -lvbce (10.31 KB, text/plain)
2017-09-25 16:07 UTC, Thomas Neuber
output of devinfo -vr (15.75 KB, text/plain)
2017-09-25 16:07 UTC, Thomas Neuber
This patch adds a quirk to address the issue (1.23 KB, patch)
2017-10-24 14:57 UTC, Bob Bishop
verbose kernel messages with applied patch (61.38 KB, text/plain)
2017-11-29 15:46 UTC, Thomas Neuber
output of devinfo -vr with applied patch (15.85 KB, text/plain)
2017-11-29 15:47 UTC, Thomas Neuber
Difference Report of devinfo -vr output (56.88 KB, text/html)
2017-11-29 15:54 UTC, Thomas Neuber
Difference Report of dmesg (421.22 KB, text/html)
2017-11-29 16:02 UTC, Thomas Neuber
Panic trace with xhci driver attach (217.26 KB, image/jpeg)
2018-05-15 13:00 UTC, Rajesh

Description Rafal Lukawiecki 2017-08-08 22:23:28 UTC
Created attachment 185171 [details]
Verbose boot console output at hang point

I am unable to install/run FreeBSD 11.1 on a brand-new HPE MicroServer Gen10 containing an AMD Opteron X3421 with 16GB RAM. The installer hangs just after it prints:

pci0: <ACPI PCI bus> on pcib0

Please see the attached console screenshot showing verbose output, which suggests that the boot process hangs just after it has collected information about the RAM on pcib0. This happens both when booting from a USB stick (several were tried, each tested and integrity-validated) and when moving over an existing system that was pre-installed on another machine. FreeNAS 11.0 is affected in the same way on this system.

Switching off ACPI during boot causes a kernel panic with a message:

panic: running without device atpic requires a local APIC

Setting hint.apic.0.disabled=1 (tried with apic.0 through apic.3 on this 4-core machine) causes a different panic:

Fatal trap 12: page fault while in kernel mode.

I have started a forum discussion about this issue: https://forums.freebsd.org/threads/61936/

It has also been referred to on FreeNAS forum, as others have been affected:
https://forums.freenas.org/index.php?threads/installation-stop-after-a-few-seconds-on-a-microserver-gen10.56809/#post-399023

I only have a basic understanding of FreeBSD but I am a long-time IT professional and I would be happy to submit further traces and try suggestions to help debug it. As-is, it seems like FreeBSD is incompatible with HPE Proliant MicroServer Gen10.

Thank you for your kind help.
Comment 1 Rafal Lukawiecki 2017-08-08 22:25:49 UTC
Created attachment 185172 [details]
Non-verbose boot output showing more context before the hang
Comment 2 Rafal Lukawiecki 2017-08-08 22:26:19 UTC
Created attachment 185173 [details]
Panic when booting with "set hint.apic.0.disabled=1"
Comment 3 Rafal Lukawiecki 2017-08-08 22:28:27 UTC
Created attachment 185174 [details]
Panic when booting with ACPI off in boot options menu
Comment 4 Rafal Lukawiecki 2017-08-09 17:35:29 UTC
Following a suggestion by Miroslav Lachman, I have tested a few other releases of FreeBSD to see whether this issue persists and whether it is a regression. Unfortunately, in all tests, from 9.3 to 12.0-CURRENT, I get exactly the same error. To be precise, these are the versions that I have tested:

FreeBSD-9.3-RELEASE-amd64-memstick.img
FreeBSD-11.1-RELEASE-amd64-memstick.img
FreeBSD-11.1-STABLE-amd64-20170807-r322164-memstick.img
FreeBSD-12.0-CURRENT-amd64-20170807-r322167-memstick.img
Comment 5 Kubilay Kocak freebsd_committer freebsd_triage 2017-08-11 08:20:14 UTC
Thank you for testing all major version branches Rafal
Comment 6 Thomas Neuber 2017-08-31 18:44:50 UTC
First, I can confirm the behavior as described before.

In addition to that: I had a working installation for an HPE Proliant Microserver Gen8 (Intel Xeon E3 v1220L). The old system did not support UEFI. The old installation (SD card copied 1:1 to a USB stick + the 4 hard drives) runs properly on the Gen10 after switching the settings to UEFI with CSM Enabled + Boot Mode Legacy Only, except for the console. I expected the console menu to appear after a certain amount of time, but it never does.

The console output stops at:
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pcib0: _OSC returned error 0x10
pci0: <ACPI PCI bus> on pcib0

As I said, everything else works fine.
Comment 7 Jussi Hagman 2017-09-11 18:33:13 UTC
I can confirm this too. Trying to install FreeBSD (and FreeNAS) hangs as described by Rafal.

I would love to provide more information to help debugging and fixing this problem.


I can run Linux on the box, but would really prefer FreeNAS.
Comment 8 Thomas Neuber 2017-09-12 17:15:21 UTC
I created a UEFI-aware installation on a different PC and repeated the test as described in comment #6. The overall behavior when booting with UEFI is exactly the same as with legacy boot. All configured (network) services (samba, web, ssh, several jails, etc.) are up and running, but the console hangs. From a remote location, everything is functional with the expected level of performance. But locally, it is still unusable.

Let me know if I should provide any additional information, run any test cases, etc.
Comment 9 Thomas Neuber 2017-09-12 18:46:54 UTC
Created attachment 186306 [details]
Kernel Display Messages of the last boot

Compared to this file, the visible console messages stop after line 58. The next output would be:
vgapci0: <VGA-compatible display> port 0xf000-0xf0ff mem 0xe0000000-0xefffffff,0xf0000000-0xf07fffff at device 1.0 on pci0
vgapci0: Boot video device
Comment 10 Jan Bramkamp 2017-09-25 09:51:30 UTC
I just ran into the same bug.
Comment 11 Thomas Neuber 2017-09-25 16:06:46 UTC
Created attachment 186707 [details]
verbose kernel display messages
Comment 12 Thomas Neuber 2017-09-25 16:07:19 UTC
Created attachment 186708 [details]
output of pciconf -lvbce
Comment 13 Thomas Neuber 2017-09-25 16:07:46 UTC
Created attachment 186709 [details]
output of devinfo -vr
Comment 14 Mikael D 2017-10-05 10:24:12 UTC
(In reply to Thomas Neuber from comment #8)

Do you mean that it boots if VGA is not used, i.e. that it could run headless (without VGA)?
Comment 15 Thomas Neuber 2017-10-05 17:19:30 UTC
(In reply to Mikael D from comment #14)

My system has been up and running stably for the last 9 days. The last restart was after an update from FreeNAS 11.0 U3 to U4. Headless use is definitely possible. The challenge was the installation without a working local console (VGA). I do not know whether an installation from a remote location is possible or how it would work. I used another PC to set up the system.

The console does not work with either of the two DisplayPorts or the VGA connector. Is the Carrizo chipset supported by the kernel?

[root@jupiter ~]# uname -a                                                                                                          
FreeBSD jupiter.fritz.box 11.0-STABLE FreeBSD 11.0-STABLE #0 r321665+25fe8ba8d06(freenas/11.0-stable): Mon Sep 25 06:24:11 UTC 2017 
    root@gauntlet:/freenas-11-releng/freenas/_BE/objs/freenas-11-releng/freenas/_BE/os/sys/FreeNAS.amd64  amd64                     
[root@jupiter ~]# uptime                                                                                                            
 7:00PM  up 9 days,  1:05, 0 users, load averages: 0.14, 0.14, 0.16                                                                 
[root@jupiter ~]#
Comment 16 Mikael D 2017-10-05 17:23:34 UTC
(In reply to Thomas Neuber from comment #15)

This is the first time I have ever encountered a *nix system that does not boot due to faulty VGA, so I am curious what the issue is.

Anyway, I hope the FreeBSD people here can fix it, and I'll try running it headless once I receive my MicroServer Gen10 next Tuesday. (I've gotten myself a Gen8 too, so I can have one spare and use it to install FreeNAS to the USB.)

I'll do some tests and report my findings (though I think everything is covered here already).

Thanks for the quick reply!
Comment 17 Jure K. 2017-10-09 07:57:45 UTC
Different system, but same symptoms: OpenBSD also gets stuck. The trick is to disable ACPI at boot. Hope this info leads you on the right track.
Comment 18 Mikael D 2017-10-09 08:00:00 UTC
(In reply to Jure K. from comment #17)

It was mentioned above that this causes a kernel panic:

"Switching off ACPI during boot causes a kernel panic with a message"
Comment 19 Jan Bramkamp 2017-10-09 09:04:24 UTC
Disabling ACPI is no longer an acceptable workaround for modern systems because it disables a lot more than just power management for laptops.
Comment 20 Jörn Lentes 2017-10-09 11:08:07 UTC
I have the same problem with my HP Gen10 MicroServer with AMD Opteron X3216.

non-BSD based systems boot up fine from USB.
Comment 21 Jan Bramkamp 2017-10-09 11:32:37 UTC
The system does boot, but the only console hangs as soon as the kernel takes over from the UEFI firmware. I installed mine with mfsBSD and once FreeBSD is installed (including network configuration and SSH server) it works just fine, but it is annoying to have a headless x86 box around. And a USB<->RS232 adapter doesn't work as console because the USB stack comes up too late in the boot process.
Comment 22 Mikael D 2017-10-09 12:32:25 UTC
(In reply to Jan Bramkamp from comment #21)

Hopefully it can be fixed before we run into any boot/hardware issues that require the console before SSH has loaded.
Comment 23 Igor Porozov 2017-10-16 09:09:34 UTC
We succeeded in running a FreeBSD system that was installed on another computer.
It turns out that the video driver hangs, but the system itself continues to boot.
For a full boot, you need to add the hard drives to fstab and the network adapters to rc.conf.
The hard drives are called ada, and the network interfaces are called bge.
After that, you can connect via SSH.
Comment 24 Jörn Lentes 2017-10-16 09:32:06 UTC
Is there a possibility to have an unattended install to work around that problem?
What information do I need to add to the usb boot image?
Comment 25 Igor Porozov 2017-10-16 10:07:34 UTC
You cannot run the installation, because FreeBSD does not work with the video hardware in this AMD APU.
It is necessary to install and initially configure FreeBSD on another computer, then move the hard disk into the MicroServer and add the information about disks and network controllers using a live CD.
Comment 26 Jan Bramkamp 2017-10-16 15:50:00 UTC
FreeBSD can be installed on headless systems like the HP Gen 10 Microservers. The most common tool for the job is mfsBSD. You can put mfsBSD on a USB stick and boot from the USB stick. mfsBSD includes some scripts to start dhclient on all Ethernet-like interfaces and starts the SSH server by default. You can either use the minimal installer script included in mfsBSD or run bsdinstall over SSH. Due to a bug in bsdinstall you have to create /usr/freebsd-dist and load the MANIFEST manually if you want to go that way.
Comment 27 Jörn Lentes 2017-10-17 17:43:50 UTC
(In reply to Igor Porozov from comment #25)
Thanks a lot! This workaround did it. On another computer I booted from USB and installed onto another USB stick.
Network was set to DHCP.

I was able to plug this stick into the MicroServer and boot with it. I figured out the IP address it was assigned by my router and was able to access the web console.
Comment 28 Bob Bishop 2017-10-24 09:26:58 UTC
Adding: hw.pci.realloc_bars="1"
to /boot/loader.conf on an installed system will make the VGA console functional.

You can make the same modification to /boot/loader.conf on an installer memstick and that will then work to do a normal install. Tested with 11.1-R
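
As a minimal sketch of that edit, assuming a POSIX sh and a loader.conf that may or may not already contain the tunable: the helper below (set_loader_tunable is a hypothetical name, not a FreeBSD tool) drops any existing assignment of the given name and appends a single quoted one, so it can be run repeatedly without duplicating the line.

```shell
#!/bin/sh
# set_loader_tunable FILE NAME VALUE
# Hypothetical helper: ensure FILE ends up with exactly one line NAME="VALUE",
# leaving all other lines untouched.
set_loader_tunable() {
    file=$1
    name=$2
    value=$3
    touch "$file" || return 1
    tmp=$(mktemp) || return 1
    # Copy every line that does not start with the literal text NAME=
    awk -v n="$name" 'index($0, n "=") != 1' "$file" > "$tmp"
    # Append the assignment in loader.conf style: name="value"
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    # Write back in place so the original file's ownership and mode are kept
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

On an installed system this would be invoked as root as: set_loader_tunable /boot/loader.conf hw.pci.realloc_bars 1. On an installer memstick the same call would target the loader.conf on the mounted stick instead.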
Comment 29 Jan Bramkamp 2017-10-24 10:16:10 UTC
The same should be possible from the bootloader prompt, avoiding the need to change the install medium at all. Then add it to /boot/loader.conf when the installer asks whether you want to enter the fresh installation.
Comment 30 Bob Bishop 2017-10-24 11:46:53 UTC
(In reply to Jan Bramkamp from comment #29)

Yes. If you break out to the loader prompt and type:

set hw.pci.realloc_bars 1
boot

the installation proceeds as expected, and you can fix loader.conf at the end of installation.
Comment 31 Bob Bishop 2017-10-24 14:57:11 UTC
Created attachment 187442 [details]
This patch adds a quirk to address the issue

I've attached a patch that fixes the problem. It's tested against 11.1-R but should apply to HEAD.
Comment 32 Jan Bramkamp 2017-10-25 10:30:47 UTC
I tested the patch against FreeBSD 11.1/amd64 and it solved the problem.
Comment 33 Jan Bramkamp 2017-10-25 10:32:53 UTC
(In reply to Bob Bishop from comment #30)
The correct syntax is:
  
  set <name>=<value>

instead of:
  set <name> <value>
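
Applied to the tunable from comment #30, the loader-prompt sequence with this syntax would presumably read as follows (name and value are the ones given in that comment):

```text
set hw.pci.realloc_bars=1
boot
```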
Comment 34 Bob Bishop 2017-10-25 10:37:38 UTC
(In reply to Jan Bramkamp from comment #33)
Of course.
Comment 35 Rafal Lukawiecki 2017-10-26 10:33:04 UTC
Thank you, everyone, and Bob Bishop for the patch. I am impressed the community were able to fix this, but I am sorry I will not be able to test it, as I have since returned the Gen 10 MicroServer back to HPE. It has been replaced with a very-well functioning SuperMicro machine, which runs FreeBSD like a charm.
Comment 36 Andrew Irwin 2017-11-05 13:38:58 UTC
(In reply to Bob Bishop from comment #30)
I have successfully tested the workaround with FreeNAS-11.0-U4. The permanent setting can be made through the web interface under System->Tunables:
Variable:  hw.pci.realloc_bars
Value: 1
Type: Loader
Comment 37 Alexander Motin freebsd_committer 2017-11-27 08:35:09 UTC
Can somebody try to compare verbose `dmesg` and `devinfo -vr` output with and without the attached patch to find out what exactly it changes in resource allocation?

PS: I am not sure this is a video adapter problem rather than a BIOS one, so the patch may apply the workaround in the wrong place (and be somewhat overaggressive).
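
One way to make that comparison, sketched under the assumption that the output of each command has been saved to a file once per boot (for example with dmesg > dmesg-stock.txt and devinfo -vr > devinfo-stock.txt on the unpatched kernel, and matching -patched files afterwards; these file names are hypothetical). The helper name diff_boot_logs is likewise illustrative.

```shell
#!/bin/sh
# diff_boot_logs BEFORE AFTER
# Print a unified diff of two saved command outputs.
# diff exits 1 when the files differ, which is the expected case here,
# so only exit codes above 1 (real errors) are reported as failure.
diff_boot_logs() {
    status=0
    diff -u "$1" "$2" || status=$?
    [ "$status" -le 1 ]
}
```

Usage would be along the lines of diff_boot_logs dmesg-stock.txt dmesg-patched.txt > dmesg.diff, and the same for the two devinfo files. For verbose kernel messages, the system can be booted with boot -v from the loader prompt.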
Comment 38 Thomas Neuber 2017-11-29 15:46:17 UTC
Created attachment 188398 [details]
verbose kernel messages with applied patch
Comment 39 Thomas Neuber 2017-11-29 15:47:22 UTC
Created attachment 188399 [details]
output of devinfo -vr with applied patch
Comment 40 Thomas Neuber 2017-11-29 15:54:55 UTC
Created attachment 188401 [details]
Difference Report of devinfo -vr output

The configuration of storage and USB devices has changed slightly, but the onboard configuration is exactly the same as before.
Comment 41 Thomas Neuber 2017-11-29 16:02:09 UTC
Created attachment 188403 [details]
Difference Report of dmesg

Same as before - storage and usb configuration changed, everything else unchanged.
Comment 42 mishu 2017-11-30 18:09:28 UTC
What is the path the patch has to take to get to master? This way we can gauge if we should wait or try to figure out how to apply & build the patch to current code.
Comment 43 Bob Bishop 2017-11-30 19:35:40 UTC
(In reply to mishu from comment #42)
Don't know about FreeNAS, but I think the patch applies to FreeBSD HEAD.
Comment 44 Thomas Neuber 2017-11-30 20:31:24 UTC
I carried out yesterday's tests as follows: I built a complete FreeNAS system from the freenas-master sources. That uses FreeBSD 11.1-STABLE, as you can see in the dmesg output. I pulled all the sources and applied the attached patch. I used the resulting ISO installation image to set up the system. Console output is now OK. Everything works fine.
Comment 45 Bob Bishop 2017-12-07 10:29:59 UTC
(In reply to Alexander Motin from comment #37)
The patch is in the form of a quirk, so its effect on other hardware should be precisely zero. Any reason why it shouldn't be committed?
Comment 46 robert_welsh 2017-12-08 18:55:43 UTC
Comment on attachment 185172 [details]
Non-verbose boot output showing more context before the hang

Have reproduced this same error on HPE Gen10 Microserver when attempting to boot from USB thumb drive to install FreeNAS11U4.
Comment 47 curt 2018-03-15 04:19:01 UTC
I am experiencing the same behavior (locks after pci0: <ACPI PCI bus> on pcib0) using a USB FreeBSD 11.1 boot media. My system is a HP MicroServer Gen10 (X3421). I used the workaround (hw.pci.realloc_bars=1) to get it booted, installed and running (put hw.pci.realloc_bars=1 in /boot/loader.conf). Looking forward to a patch.
Comment 48 Rajesh 2018-05-15 12:59:34 UTC
I hit a similar issue when I try to install FreeBSD 11.1 release (memstick image), in one of our development boards.

Sometimes, I see installation hangs after the below messages
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0
pcib0: _OSC returned error 0x10
pci0: <ACPI PCI bus> numa-domain 0 on pcib0

But most of the time, I see a panic trace as shown in the attached image, and the system reboots. I see the panic during PCI device attach (for example in "hpet_attach" or "xhci_pci_attach"). If I disable one device in the BIOS, the boot proceeds further and panics in another driver. I can see the "_OSC returned error 0x10" message here as well (but the boot proceeds further without hanging and then panics).

I tried setting hw.pci.realloc_bars=1 from boot loader prompt. It doesn't help.

So, could this issue be related to the base issue reported in this bug?
Comment 49 Rajesh 2018-05-15 13:00:26 UTC
Created attachment 193428 [details]
Panic trace with xhci driver attach
Comment 50 Bob Bishop 2018-05-15 13:12:38 UTC
(In reply to Rajesh from comment #48)

Doesn't sound like it's related.
Comment 51 Rajesh 2018-05-16 18:02:01 UTC
(In reply to Bob Bishop from comment #50)

Thanks for your response, Bob. But I have a couple of questions:

1. "_OSC returned error 0x10": What does this message mean? Is it really an issue?

2. Before the panic, we see "Unable to map MSI-X table" (since the bus_alloc_resource_any call fails in xhci_pci_attach). So the driver falls back to allocating MSI (pci_alloc_msi), which also fails because rman_manage_region fails (in nexus_add_irq). Does this mean the PCI BAR mappings are not set up properly to allocate enough resources to the device? If so, what should be checked in this case?