Summary: | Unable to boot/install on HPE Proliant MicroServer Gen10 (AMD Opteron X3000): Hangs/Panics | ||
---|---|---|---|
Product: | Base System | Reporter: | Rafal Lukawiecki <raf> |
Component: | kern | Assignee: | John Baldwin <jhb> |
Status: | Closed FIXED | ||
Severity: | Affects Many People | CC: | 214748mv, andrew, anxanywhere, apirwin+freebsd, arinci, crest, crest_maintainer, curt, emaste, freebsd, grog, imp, jeff, jhb, joern.lentes, juhagman, jure.kozamernik, koobs, lg, matthew, mav, mishu, patrick.stasko, pheek, pi, raf, rajeshasp, rb, risaac, robert_welsh, rubendepedro, sysadmin, t.mikael.d, thomas, wulf |
Priority: | Normal | Keywords: | crash, easy |
Version: | 11.1-STABLE | Flags: | rb: mfc-stable12?, koobs: mfc-stable11? |
Hardware: | amd64 | ||
OS: | Any | ||
Attachments: |
Description
Rafal Lukawiecki
2017-08-08 22:23:28 UTC
Created attachment 185172 [details]
Non-verbose boot output showing more context before the hang
Created attachment 185173 [details]
Panic when booting with "set hint.apic.0.disabled=1"
Created attachment 185174 [details]
Panic when booting with ACPI off in boot options menu
Following a suggestion by Miroslav Lachman, I have tested a few other releases of FreeBSD to see whether this issue persists and whether it is a regression. Unfortunately, in all tests, from 9.3 to 12.0-CURRENT, I get exactly the same error. To be precise, these are the versions that I have tested:
FreeBSD-9.3-RELEASE-amd64-memstick.img
FreeBSD-11.1-RELEASE-amd64-memstick.img
FreeBSD-11.1-STABLE-amd64-20170807-r322164-memstick.img
FreeBSD-12.0-CURRENT-amd64-20170807-r322167-memstick.img

Thank you for testing all major version branches, Rafal.

First, I can confirm the behavior as described before. In addition to that: I had a working installation for an HPE Proliant Microserver Gen8 (Intel Xeon E3 v1220L). The old system did not support UEFI. The old installation (SD card copied 1:1 to a USB stick + the 4 hard drives) runs properly on the Gen10 after switching the settings to UEFI with CSM Enabled + Boot Mode Legacy Only, except for the console. I expected the console menu after a certain amount of time. The console output stops at:
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pcib0: _OSC returned error 0x10
pci0: <ACPI PCI bus> on pcib0
As I said, everything else works fine.

I can confirm this too. Trying to install FreeBSD (and FreeNAS) hangs as described by Rafal. I would love to provide more information to help debug and fix this problem. I can run Linux on the box, but would really prefer FreeNAS.

I created a UEFI-aware installation on a different PC and repeated the test as described in comment #6. The overall behavior when booting with UEFI is exactly the same as with legacy boot. All configured (network) services (samba, web, ssh, several jails, etc.) are up and running, but the console hangs. From a remote location, everything is functional with the expected level of performance. But locally, it is still unusable. Let me know if I should provide any additional information, run any test cases, etc.

Created attachment 186306 [details]
Kernel Display Messages of the last boot
Compared to this file, the visible console messages stop after line 58. The next output would be:
vgapci0: <VGA-compatible display> port 0xf000-0xf0ff mem 0xe0000000-0xefffffff,0xf0000000-0xf07fffff at device 1.0 on pci0
vgapci0: Boot video device
I just ran into the same bug.

Created attachment 186707 [details]
verbose kernel display messages
Created attachment 186708 [details]
output of pciconf -lvbce
Created attachment 186709 [details]
output of devinfo -vr
(In reply to Thomas Neuber from comment #8) Do you mean that it boots if VGA is not used?

(In reply to Thomas Neuber from comment #8) Do you mean that it could run headless (without VGA)?

(In reply to Mikael D from comment #14) My system has been up and running stably for the last 9 days. The last restart was after an update from FreeNAS 11.0 U3 to U4. Headless use is definitely possible. The challenge was the installation without a working local console (VGA). I am not aware whether an installation from a remote location is possible or how it would work. I used another PC to set up the system. The console does not work with either of the two DisplayPorts or the VGA connector. Is the Carrizo chipset supported by the kernel?
[root@jupiter ~]# uname -a
FreeBSD jupiter.fritz.box 11.0-STABLE FreeBSD 11.0-STABLE #0 r321665+25fe8ba8d06(freenas/11.0-stable): Mon Sep 25 06:24:11 UTC 2017 root@gauntlet:/freenas-11-releng/freenas/_BE/objs/freenas-11-releng/freenas/_BE/os/sys/FreeNAS.amd64 amd64
[root@jupiter ~]# uptime
7:00PM up 9 days, 1:05, 0 users, load averages: 0.14, 0.14, 0.16
[root@jupiter ~]#

(In reply to Thomas Neuber from comment #15) This is the first time ever that I have encountered a *nix system that does not boot due to faulty VGA, so I am kind of curious what the issue is. Anyway, I hope the FreeBSD people here can fix it, and I'll try running it headless once I receive my MicroServer Gen10 next Tuesday. (I've gotten myself a Gen8 too, so I can have one spare and use it to install FreeNAS to the USB.) I'll do some tests and report my findings (though I think everything is covered here already). Thanks for the quick reply!

Different system, but same symptoms: OpenBSD also gets stuck. The trick is to disable ACPI at boot. Hope this info puts you on the right track.

(In reply to Jure K. from comment #17) It was mentioned above that this causes a kernel panic: "Switching off ACPI during boot causes a kernel panic with a message"

Disabling ACPI is no longer an acceptable workaround for modern systems because it disables a lot more than just power management for laptops.

I have the same problem with my HP Gen10 MicroServer with AMD Opteron X3216. Non-BSD-based systems boot up fine from USB.

The system does boot, but the only console hangs as soon as the kernel takes over from the UEFI firmware. I installed mine with mfsBSD, and once FreeBSD is installed (including network configuration and SSH server) it works just fine, but it is annoying to have a headless x86 box around. And a USB<->RS232 adapter doesn't work as a console because the USB stack comes up too late in the boot process.

(In reply to Jan Bramkamp from comment #21) Hopefully it can be fixed before we run into any boot/hardware issues that require the console before SSH has loaded.

We succeeded in running FreeBSD that was installed on another computer. It turns out that the video driver hangs, but the system itself continues to load. For a complete boot, you need to add the hard drives to fstab and the network adapters to rc.conf. Hard drives are called ada, and network interfaces are called bge. After that, you can connect via SSH.

Is there a possibility of doing an unattended install to work around that problem? What information do I need to add to the USB boot image?

You cannot perform the installation, because FreeBSD does not work with the video hardware in this AMD APU. It is necessary to install and initially configure FreeBSD on another computer, then move the hard disk into the MicroServer and add information about disks and network controllers using a live CD.

FreeBSD can be installed on headless systems like the HP Gen 10 MicroServers. The most common tool for the job is mfsBSD. You can put mfsBSD on a USB stick and boot from the USB stick. mfsBSD includes some scripts to start dhclient on all Ethernet-like interfaces and starts the SSH server by default. You can either use the minimal installer script included in mfsBSD or run bsdinstall over SSH. Due to a bug in bsdinstall you have to create /usr/freebsd-dist and load the MANIFEST manually if you want to go that way.

(In reply to Igor Porozov from comment #25) Thanks a lot! This workaround did it. On another computer I booted from USB and installed onto another USB stick. Network was set to DHCP. I was able to plug this stick into the MicroServer and boot with it. I figured out the IP it got assigned from my router and was able to access the web console.

Adding:
hw.pci.realloc_bars="1"
to /boot/loader.conf on an installed system will make the VGA console functional. You can make the same modification to /boot/loader.conf on an installer memstick, and that will then work to do a normal install. Tested with 11.1-R.

The same should be possible from the bootloader prompt, avoiding the need to change the install medium at all. Just add it to /boot/loader.conf when the installer asks whether you want to enter the fresh installation.

(In reply to Jan Bramkamp from comment #29) Yes. If you break out to the loader prompt and type:
set hw.pci.realloc_bars 1
boot
the installation proceeds as expected, and you can fix loader.conf at the end of installation.

Created attachment 187442 [details]
This patch adds a quirk to address the issue
I've attached a patch that fixes the problem. It's tested against 11.1-R but should apply to HEAD.
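
For readers who cannot apply the patch, the loader tunable workaround discussed in this thread can be sketched roughly as follows. This is only an illustration: the helper name set_loader_tunable and its arguments are hypothetical and not part of FreeBSD; only the tunable name, the loader.conf line and the loader prompt commands come from the comments in this report.

```shell
#!/bin/sh
# Hypothetical helper: append NAME="VALUE" to a loader.conf-style file
# unless a line starting with NAME= is already present.
set_loader_tunable() {
    file="$1"; name="$2"; value="$3"
    if grep -q "^${name}=" "$file" 2>/dev/null; then
        echo "${name} is already set in ${file}; edit it by hand if the value differs"
    else
        printf '%s="%s"\n' "$name" "$value" >> "$file"
    fi
}

# One-off boot, typed at the loader prompt (not in sh):
#   set hw.pci.realloc_bars=1
#   boot
#
# Permanent setting on an installed system or an installer memstick:
#   set_loader_tunable /boot/loader.conf hw.pci.realloc_bars 1
```

The helper only appends when the tunable is missing, so running it twice should leave a single hw.pci.realloc_bars="1" line in the file.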
I tested the patch against FreeBSD 11.1/amd64 and it solved the problem.

(In reply to Bob Bishop from comment #30) The correct syntax is:
set <name>=<value>
instead of:
set <name> <value>

(In reply to Jan Bramkamp from comment #33) Of course.

Thank you, everyone, and Bob Bishop for the patch. I am impressed that the community was able to fix this, but I am sorry I will not be able to test it, as I have since returned the Gen 10 MicroServer to HPE. It has been replaced with a very well-functioning SuperMicro machine, which runs FreeBSD like a charm.

(In reply to Bob Bishop from comment #30) I have successfully tested the workaround with FreeNAS-11.0-U4. The permanent setting can be made through the web interface under System->Tunables:
Variable: hw.pci.realloc_bars
Value: 1
Type: Loader

Can somebody try to compare verbose `dmesg` and `devinfo -vr` output with and without the attached patch to find out exactly what it changes in resource allocation? PS: I am not sure whether it is a video adapter problem or a BIOS problem, so the patch may apply the workaround in the wrong place (and be somewhat overaggressive).

Created attachment 188398 [details]
verbose kernel messages with applied patch
Created attachment 188399 [details]
output of devinfo -vr with applied patch
Created attachment 188401 [details]
Difference Report of devinfo -vr output
The configuration of storage and USB devices has changed slightly, but the onboard configuration is exactly the same as before.
Created attachment 188403 [details]
Difference Report of dmesg
Same as before - storage and usb configuration changed, everything else unchanged.
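
The before/after comparison requested earlier in the thread could be reproduced along these lines. This is a minimal sketch, not the procedure the commenters actually used: the function names and the directory layout are assumptions for illustration, while `devinfo -vr`, `dmesg` and `diff -u` are the standard tools named in the thread.

```shell
#!/bin/sh
# Save the diagnostics under a label such as "before" or "after".
# Run this on the FreeBSD machine itself, ideally after a verbose boot (boot -v).
capture_snapshot() {
    label="$1"
    mkdir -p "$label"
    devinfo -vr > "$label/devinfo.txt"
    dmesg > "$label/dmesg.txt"
}

# Print a unified diff of one captured file between two snapshot directories.
# diff exits with status 1 when the files differ, which is expected here.
compare_snapshots() {
    before="$1"; after="$2"; file="$3"
    diff -u "$before/$file" "$after/$file"
}

# Example session (hypothetical labels):
#   capture_snapshot before      # on the unpatched kernel
#   capture_snapshot after       # on the patched kernel
#   compare_snapshots before after devinfo.txt
#   compare_snapshots before after dmesg.txt
```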
What is the path the patch has to take to get to master? Knowing this, we can gauge whether we should wait or try to figure out how to apply and build the patch against current code.

(In reply to mishu from comment #42) I don't know about FreeNAS, but I think the patch applies to FreeBSD HEAD.

I ran yesterday's tests as follows: I built a complete FreeNAS system from the freenas-master sources. That uses FreeBSD 11.1-STABLE, as you can see in the dmesg output. I pulled all the sources and applied the attached patch. I used the resulting ISO installation image to set up the system. Console output is now OK. Everything works fine.

(In reply to Alexander Motin from comment #37) The patch is in the form of a quirk, so its effect on other hardware should be precisely zero. Any reason why it shouldn't be committed?

Comment on attachment 185172 [details]
Non-verbose boot output showing more context before the hang
Have reproduced this same error on HPE Gen10 Microserver when attempting to boot from USB thumb drive to install FreeNAS11U4.
I am experiencing the same behavior (locks up after pci0: <ACPI PCI bus> on pcib0) using USB FreeBSD 11.1 boot media. My system is an HP MicroServer Gen10 (X3421). I used the workaround (hw.pci.realloc_bars=1) to get it booted, installed and running (I put hw.pci.realloc_bars=1 in /boot/loader.conf). Looking forward to a patch.

I hit a similar issue when I try to install FreeBSD 11.1 release (memstick image) on one of our development boards. Sometimes I see the installation hang after the messages below:
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff numa-domain 0 on acpi0
pcib0: _OSC returned error 0x10
pci0: <ACPI PCI bus> numa-domain 0 on pcib0
But most of the time I see a panic trace, as seen in the attached image, and the system reboots. I see the panic during PCI device attach (like "hpet_attach" or "xhci_pci_attach"). If I disable one device in the BIOS, it proceeds further and panics in another driver. I see the "_OSC returned error 0x10" messages here as well (but it proceeds further without hanging and then panics). I tried setting hw.pci.realloc_bars=1 from the boot loader prompt. It doesn't help. So, could this issue be related to the base issue reported in this bug?

Created attachment 193428 [details]
Panic trace with xhci driver attach
(In reply to Rajesh from comment #48) Doesn't sound like it's related.

(In reply to Bob Bishop from comment #50) Thanks for your response, Bob. But I have a couple of questions:
1. _OSC returned error 0x10 - What does this message mean? Is it really an issue?
2. Before the panic, we see "Unable to map MSI-X table" (since the bus_alloc_resource_any call fails in xhci_pci_attach). So the driver falls back to allocating MSI (pci_alloc_msi), which also fails because rman_manage_region fails (in nexus_add_irq). Does this mean the PCI BAR mappings are not set up properly to allocate enough resources to the device? If so, what should be checked in this case?

Just checked, the patch applies OK to STABLE and HEAD as of today.

John: This looks fairly trivial to commit. However, that usually means I misunderstand something.

I think the patch is probably fine. This looks to be a BIOS bug. The failing memory BAR for the VGA device is '0xfeb00000' with a size of 256K. However, ACPI reserves the first page of that range as a system resource (from devinfo -vr output without the patch):
nexus0
...
acpi0
...
I/O memory addresses:
...
0xfeb00000-0xfeb00fff
FWIW, hw.pci.realloc_bars is a bit of a conservative knob that I had considered enabling by default. I've added Warner to see what he thinks about possibly enabling this knob by default in HEAD. The current patch is probably fine to commit as-is and is probably safe to MFC as well, since it is specific to that device ID.

A commit references this bug:
Author: grog
Date: Mon Dec 17 07:09:46 UTC 2018
New revision: 342160
URL: https://svnweb.freebsd.org/changeset/base/342160
Log:
Work around BIOS quirks on HPE Proliant MicroServer Gen10
PR: 221350
Submitted by: Bob Bishop
Reported by: Rafal Lukawiecki
Reviewed by: jhb
MFC after: 2 weeks
Changes:
head/sys/dev/pci/pci.c

Committed, thanks. I don't have the hardware to test this myself, so all people affected: please test and confirm that it solves your problem.
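
The address conflict described in the BIOS-bug analysis above can be checked with a little arithmetic. The sketch below is illustrative only: the helper names bar_end and ranges_overlap are made up for this example, and the only inputs are the figures quoted in the thread (BAR base 0xfeb00000, size 256K, ACPI range 0xfeb00000-0xfeb00fff).

```shell
#!/bin/sh
# Last address covered by a BAR that starts at BASE ($1) and spans SIZE ($2) bytes.
bar_end() {
    printf '0x%x\n' $(( $1 + $2 - 1 ))
}

# Succeeds when the inclusive ranges [$1, $2] and [$3, $4] share at least one address.
ranges_overlap() {
    [ $(( $1 <= $4 && $3 <= $2 )) -eq 1 ]
}

vga_base=0xfeb00000          # failing VGA memory BAR, as quoted in the thread
vga_size=$(( 256 * 1024 ))   # 256K, as quoted in the thread
vga_end=$(bar_end "$vga_base" "$vga_size")

# ACPI system resource reported by devinfo -vr without the patch.
acpi_start=0xfeb00000
acpi_end=0xfeb00fff

if ranges_overlap "$vga_base" "$vga_end" "$acpi_start" "$acpi_end"; then
    echo "BAR $vga_base-$vga_end overlaps ACPI range $acpi_start-$acpi_end"
fi
```

With these figures the BAR ends at 0xfeb3ffff, so its first 4K page coincides with the range ACPI has reserved, which matches the conflict the analysis describes.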
Hi, still the same problem: "pci0: <ACPI PCI bus> on pcib0". I have tried 12 Stable (FreeBSD-12.0-STABLE-amd64-20181226-r342545-memstick.img) and 11.2 Stable (FreeBSD-11.2-STABLE-amd64-20181226-r342543-memstick.img). Regards, Petr

(In reply to Petr from comment #57) base r342160 has only been committed to head (CURRENT), and not yet MFC'd to stable/12 or stable/11. You'll want to try a 13.0-CURRENT snapshot. @triage: re-open pending MFCs

(In reply to Kubilay Kocak from comment #58) Yes, 13.0-CURRENT is working perfectly. Can I ask, will this fix get to stable/12?

(In reply to Petr from comment #59) The mfc-{12,11} flags have been set to ?, which means they are requested/candidates, and Greg included MFC: in the commit log message, so I'd say yes at least for 12, and given the minimal scope of the changeset, I'd expect it in stable/11 as well.

^Triage: Re-open to clarify MFC status. @Greg Were the relevant commits merged to stable/{11,12}? If so, can you please reference them in a comment with "base r<revision>" and set the mfc-stable* flags to +

The bug is still present in release 12.1. My server: HP MICROSERVER GEN10, OPTERON X3418, 8GB. Release 13.0 is OK.

Looks like the change has not been MFC'd. I can work on that today.

To avoid running a "current" version rather than a "stable" one on a small production server (the previous server is dead), and because I had to reinstall the system after a bad operation, I tried many versions (11.3, 12, 12.1) without success. It then took time and several tries to install version 13.0 again: I got a kernel panic (a new message), even after a cold boot, and finally, after 4 or 5 attempts, the system was installed. Now the system seems to be OK and stable.

*** Bug 208357 has been marked as a duplicate of this bug. ***

A commit references this bug:
Author: jhb
Date: Tue Dec 3 22:01:45 UTC 2019
New revision: 355359
URL: https://svnweb.freebsd.org/changeset/base/355359
Log:
MFC 342160: Work around BIOS quirks on HPE Proliant MicroServer Gen10
PR: 221350
Changes:
_U stable/11/
stable/11/sys/dev/pci/pci.c
_U stable/12/
stable/12/sys/dev/pci/pci.c

r344022 partially fixed a similar bug observed on my old "Acer Aspire S7" laptop, which I was able to work around by reverting the "agp: Do not attach to Intel GEN6+" commit (r296719). But it fixed only legacy VGA mode booting. UEFI boot still hangs at the same point.

Reassigned to jhb@, who committed a fix addressing this issue.

This is a comment strictly for reference history. On 20191126 I did a fresh install of FreeNAS 11.2.U7 on the same type of HPE Proliant MicroServer. I downloaded the install image from the FreeNAS website, burned it to a CD, connected an external CD drive to the server, and booted from the CD. On its own, the boot from CD did not work. In order to boot from CD successfully, I had to make the following environment changes (Note: I did not go through multiple iterations of booting to see which changes were critical. I just referenced old notes and made these mods):
1. set autoboot_delay=2
2. set vfs.mountroot.timeout=80000
3. set hw.pci.realloc_bars=1
When I made these environment changes, I was able to boot from CD. Once I had installed FreeNAS on the HPE server, I had to make one change to the loader.conf file in the /boot directory. I added the "set hw.pci.realloc_bars=1" string to the loader.conf file. After this change, the HPE server boots consistently. |