Bug 277062 - FreeBSD 14.0-RELEASE installer unable to install on HPE ProLiant DL385 Gen10v2 and HPE ProLiant DL345 Gen11
Summary: FreeBSD 14.0-RELEASE installer unable to install on HPE ProLiant DL385 Gen10v...
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 14.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-02-15 08:29 UTC by Arrigo Triulzi
Modified: 2024-04-04 11:49 UTC
CC List: 5 users

See Also:


Attachments
Installer boot kernel "loses" virtual CD-ROM drive (211.18 KB, image/png)
2024-02-15 08:29 UTC, Arrigo Triulzi
Attempt at circumventing issue with boot flags (does not work) (44.36 KB, image/png)
2024-02-15 08:38 UTC, Arrigo Triulzi
Slightly different USB error message with IMG file (94.11 KB, image/png)
2024-02-15 14:14 UTC, Arrigo Triulzi
FreeBSD 14.0-RELEASE mini-memstick console output (179.71 KB, image/png)
2024-02-15 14:39 UTC, Arrigo Triulzi
FreeBSD 11.4-RELEASE ISO booting on HPE ProLiant (150.53 KB, image/png)
2024-02-15 15:05 UTC, Arrigo Triulzi

Description Arrigo Triulzi 2024-02-15 08:29:15 UTC
Created attachment 248474
Installer boot kernel "loses" virtual CD-ROM drive
Comment 1 Arrigo Triulzi 2024-02-15 08:37:58 UTC
On HPE ProLiant DL385 Gen10 "Plus" v2 machines (board part no. P38409-B21, product no. P38409-B21) running the latest iLO 5 version 3.01 (Jan 23 2024) and BIOS version A42 v2.90 (Oct 27 2023), the installer "loses" the CD-ROM after the kernel boots when attempting to install via the "virtual CD-ROM" (or "virtual floppy").

The output is as per the attached image (sorry, there is no working serial console to copy and paste from into the ticket).
Comment 2 Arrigo Triulzi 2024-02-15 08:38:34 UTC
Created attachment 248475
Attempt at circumventing issue with boot flags (does not work)
Comment 3 Arrigo Triulzi 2024-02-15 08:39:58 UTC
A work-around is to install FreeBSD 12.4-RELEASE and then upgrade via FreeBSD 13.2-RELEASE and FreeBSD 14.0-RELEASE.

Note that FreeBSD 13.2-RELEASE does _not_ install either with exactly the same issue. The last "known good" installer is 12.4-RELEASE.

We did not test with either earlier versions or non -RELEASE installers.
Comment 4 Arrigo Triulzi 2024-02-15 11:37:43 UTC
Correction: this particular model does _not_ boot with the FreeBSD 12.4-RELEASE installer either… (it used to work with older models such as a DL360 Gen9).
Comment 5 Michael Osipov freebsd_committer freebsd_triage 2024-02-15 13:16:33 UTC
We also have several DL360 Gen9 machines and they do work. The previous iLO version failed to boot from some HTTP servers: "python -m http.server", for instance, failed, while Apache HTTP Server works.

How are you serving the ISO for the new ones?
Comment 6 Arrigo Triulzi 2024-02-15 13:26:20 UTC
(In reply to Michael Osipov from comment #5)

The same way as the others, via the iLO "virtual CD" (or "virtual floppy" for IMG files). We don't use any particular mechanism - we have tried both the "local file" (i.e. the browser becomes the "server" for the image file) and the HTTP from OpenBSD httpd, directly from a mirror (sorry… desperation), and from a Windows system via the browser.
Comment 7 Arrigo Triulzi 2024-02-15 13:28:38 UTC
For reference there is an HPE community post regarding a similar issue:

https://community.hpe.com/t5/proliant-servers-ml-dl-sl/ilo-disconnects-when-booting-off-of-virtual-media-cd-rom-image/m-p/3730353#M50079

but this is with a _shared_ NIC for the iLO, we use a dedicated port so the issue above is not relevant (at least in theory).

We added a new post on the community:

https://community.hpe.com/t5/proliant-servers-ml-dl-sl/hpe-proliant-dl386-gen10-amp-quot-plus-amp-quot-v2-virtual-media/td-p/7206703

and have opened a case with HPE (case# pending).
Comment 8 Michael Osipov freebsd_committer freebsd_triage 2024-02-15 13:30:35 UTC
(In reply to Arrigo Triulzi from comment #6)

I bet that the iLO HTTP client is very picky. I failed to bootstrap a Windows DVD ISO with Python's HTTP server. You should run tcpdump on OpenBSD while serving the file. You might see a TCP RST. I wouldn't use a mirror because it is too far away. Try another server, Apache HTTPd, and report. I am interested as well since I need to swap servers sooner or later.
Comment 9 Arrigo Triulzi 2024-02-15 13:34:54 UTC
(In reply to Michael Osipov from comment #8)
OK, trying Apache on FreeBSD - same physical network, same physical switch as the iLO being used for installation. Will report.
Comment 10 Arrigo Triulzi 2024-02-15 13:47:16 UTC
(In reply to Michael Osipov from comment #8)
While I can see your reasoning, I think there is a deeper problem with the USB emulation which the iLO provides and FreeBSD's kernel… my dmesg is full of:

usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)

and the console keyboard does not work… methinks this is something on the HPE side which breaks with new(er) FreeBSD kernels.
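To compare captures between firmware versions, the messages quoted above can be tallied per device address. The following is a minimal Python sketch that assumes only the message shapes shown in this comment; the function name and the sample constant are hypothetical, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Patterns derived only from the dmesg lines quoted in this report.
ADDR_RE = re.compile(r"(?:addr[ =]|set address )(\d+)")
DISCONNECT_RE = re.compile(r"^(ugen\d+\.\d+): .*\(disconnected\)")

# Hypothetical sample: a short excerpt of the output quoted above.
SAMPLE = """\
usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
"""


def summarize_usb_errors(dmesg_text):
    """Count USB_ERR_TIMEOUT lines per device address and list disconnected devices."""
    timeouts = Counter()
    disconnected = []
    for raw in dmesg_text.splitlines():
        line = raw.strip()
        gone = DISCONNECT_RE.match(line)
        if gone:
            disconnected.append(gone.group(1))
            continue
        if "USB_ERR_TIMEOUT" in line:
            addr = ADDR_RE.search(line)
            if addr:
                timeouts[int(addr.group(1))] += 1
    return timeouts, disconnected


if __name__ == "__main__":
    counts, gone = summarize_usb_errors(SAMPLE)
    for addr, n in sorted(counts.items()):
        print(f"addr {addr}: {n} timeouts")
    print("disconnected:", ", ".join(gone) or "none")
```

Feeding it the saved output of dmesg from two machines gives a quick side-by-side of which addresses time out and which ugen devices drop off the bus.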
Comment 11 Michael Osipov freebsd_committer freebsd_triage 2024-02-15 14:00:56 UTC
(In reply to Arrigo Triulzi from comment #10)

Yes, that could be a reason as well...
Comment 12 Arrigo Triulzi 2024-02-15 14:09:07 UTC
(In reply to Arrigo Triulzi from comment #9)
OK, no luck with ISO served via Apache 2.4 from FreeBSD - logs are clean, no RST (not attaching image because it is the same as the others :( ).
Comment 13 Michael Osipov freebsd_committer freebsd_triage 2024-02-15 14:11:54 UTC
(In reply to Arrigo Triulzi from comment #12)

It was worth a try. Do other OSes work? Did you try an older version such as 11, since 12 does not work? If not, then their USB emulation has either changed or is broken.
Comment 14 Arrigo Triulzi 2024-02-15 14:14:49 UTC
Created attachment 248484
Slightly different USB error message with IMG file

This one is the IMG file being served from Apache 2.4 (from FreeBSD packages) from a FreeBSD 13.2-RELEASE host on the same subnet and same switch as the iLO being installed.

No errors on the Apache side, last log entries before fail are:

192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096
192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096
192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 2048
192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096
192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 32768
192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096
Comment 15 Arrigo Triulzi 2024-02-15 14:17:16 UTC
(In reply to Michael Osipov from comment #13)
Trying Linux but I think it is going to work because the installer is loaded after the kernel boots and there is no intermediate loader - it is GRUB then kernel then installer so if GRUB is happy and the kernel is happy the installer goes off whereas here it seems like the USB is "lost" before the kernel is booted. It is honestly an interesting problem because I cannot see what "disconnects" the USB or tells the USB to disconnect.  The additional fact that the keyboard disconnects too seems to point to a USB issue of some sort.

I'll see if we can grab an 11 image… I was surprised 12 failed as that was the "fix" for the Gen 9 I had.
Comment 16 Michael Osipov freebsd_committer freebsd_triage 2024-02-15 14:19:07 UTC
(In reply to Arrigo Triulzi from comment #14)

Those are range requests with partial responses (206). Do you see anything not having status 206? You see the block sizes it is downloading (4 KiB, 32 KiB). Can you count after how many bytes the requests stop? Maybe a minimal image is also worth a try...
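One way to do that count is sketched below in Python. It assumes the access log uses the layout quoted in comment 14 (request line in quotes, followed by status and response size); the function name and the sample constant are hypothetical. Note that the sum is bytes served, so blocks the iLO fetches more than once are counted each time.

```python
import re
from collections import Counter

# Matches the tail of a log line: "METHOD /path PROTO" STATUS SIZE
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical sample: the six log lines quoted in comment 14.
SAMPLE = [
    '192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096',
    '192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096',
    '192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 2048',
    '192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096',
    '192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 32768',
    '192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096',
]


def tally_access_log(lines):
    """Return (count per HTTP status, total response bytes) for the given log lines."""
    statuses = Counter()
    total_bytes = 0
    for line in lines:
        m = LOG_RE.search(line)
        if not m:
            continue  # skip lines that do not look like access-log entries
        statuses[int(m.group("status"))] += 1
        if m.group("size") != "-":
            total_bytes += int(m.group("size"))
    return statuses, total_bytes


if __name__ == "__main__":
    statuses, total_bytes = tally_access_log(SAMPLE)
    print(dict(statuses), total_bytes)
```

Any status other than 206 in the first value would point at the HTTP side; a total that stops well short of the image size would show where the iLO gave up reading.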
Comment 17 Michael Osipov freebsd_committer freebsd_triage 2024-02-15 14:20:10 UTC
(In reply to Arrigo Triulzi from comment #15)

Try Windows 10, 11, Server as well, for the record.
Comment 18 Arrigo Triulzi 2024-02-15 14:26:22 UTC
(In reply to Michael Osipov from comment #16)
Everything in the log is 206 and the total bytes are 1389862656 for IMG and 1171579332 for ISO.

Trying a minimal image now, Linux afterwards and 11 after that (that's the current working queue).
Comment 19 Michael Osipov freebsd_committer freebsd_triage 2024-02-15 14:27:46 UTC
(In reply to Arrigo Triulzi from comment #18)

That is good, then: it means that the images could be streamed properly. One headache less.
Comment 20 Arrigo Triulzi 2024-02-15 14:39:01 UTC
Created attachment 248485
FreeBSD 14.0-RELEASE mini-memstick console output

So, …mini-memstick is different! Not sure how to interpret this to be honest.
Comment 21 Arrigo Triulzi 2024-02-15 15:05:41 UTC
Created attachment 248491
FreeBSD 11.4-RELEASE ISO booting on HPE ProLiant

This one (FreeBSD 11.4-RELEASE) is, again, slightly different but hangs too…
Comment 22 Arrigo Triulzi 2024-02-15 15:21:54 UTC
Another data point: the HPE ProLiant DL345 Gen11 has the same problem, so it seems to be something in the iLO behaviour which has changed, sadly.
Comment 23 Arrigo Triulzi 2024-02-15 15:28:36 UTC
Linux (Ubuntu 22.04 LTS "server") boots and gets to the installer just fine.
Comment 24 Arrigo Triulzi 2024-02-15 15:49:39 UTC
Final test: OpenBSD 7.4, install74.img or install74.iso don't even boot.
Comment 25 Arrigo Triulzi 2024-02-15 16:08:48 UTC
For reference the iLO 5 manual (latest version): https://support.hpe.com/hpesc/public/docDisplay?docId=a00105236en_us

It mentions clearly that the USB is UHCI but also, hidden away in the power settings…

https://support.hpe.com/hpesc/public/docDisplay?docId=a00105236en_us&page=GUID-6099A408-6792-431B-B947-A3BB73E49F1B.html

Enable persistent mouse and keyboard
* Enabled — The iLO virtual keyboard and mouse are always connected to the iLO UHCI USB controller.
* Disabled (default) — The iLO virtual keyboard and mouse are connected dynamically to the iLO UHCI controller only when a remote console application is open and connected to iLO.

When this feature is disabled, some servers are able to increase power savings by 15 watts when:
* The server OS is idle.
* No virtual USB keyboard and mouse are connected.

I am wondering if this might be relevant seeing as we saw those disconnects on the dmesg output.
Comment 26 Michael Osipov freebsd_committer freebsd_triage 2024-02-15 17:14:16 UTC
(In reply to Arrigo Triulzi from comment #25)

Maybe this is a bug in FreeBSD that never surfaced before...
Comment 27 Arrigo Triulzi 2024-02-15 17:16:38 UTC
(In reply to Michael Osipov from comment #26)
I would have agreed, at least partially, if OpenBSD had managed to boot, as it uses a different bootloader, etc., but it doesn't. I still think the HPE UHCI emulation has a problem. We might be able to find a work-around but it seems a bit peculiar that it only happens on HPE iLO5 systems.
Comment 28 Michael Osipov freebsd_committer freebsd_triage 2024-02-15 17:20:04 UTC
(In reply to Arrigo Triulzi from comment #27)

What about Windows?
Comment 29 Arrigo Triulzi 2024-02-15 17:30:05 UTC
(In reply to Michael Osipov from comment #28)
Windows 11 installer boots but we didn't have a day for the installation to complete… ;)
Comment 30 Chuck Tuffli freebsd_committer freebsd_triage 2024-02-16 01:16:32 UTC
I'm seeing a similar issue on a DL385. In my case, the CD image doesn't disappear, but the kernel spews a stream of:

usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
ugen0.4: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device

Note that I suspect there is a different issue using the "Virtual Floppy" device with a memstick image file.
Comment 31 Arrigo Triulzi 2024-02-16 08:00:00 UTC
(In reply to Chuck Tuffli from comment #30)
I believe the:

ugen0.4: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device

is where the problem lies. As I mentioned elsewhere there is an iLO5 power saving setting which smells relevant even though it only talks about the USB HID (https://support.hpe.com/hpesc/public/docDisplay?docId=a00105236en_us&page=GUID-6099A408-6792-431B-B947-A3BB73E49F1B.html).

I am going to try this today.
Comment 32 Arrigo Triulzi 2024-02-16 08:10:14 UTC
For reference: the HPE UEFI BIOS manual https://support.hpe.com/hpesc/public/docDisplay?docId=sd00001068en_us&page=GUID-0F514002-9AE6-41F1-9005-1B910268FFD0.html

Went through it with a fine-toothed comb (i.e. read every page) and there is nothing which obviously applies to the USB connect/disconnect issue.
Comment 33 Arrigo Triulzi 2024-02-16 10:04:23 UTC
Interesting regression with the BIOS versions on DL385 Gen10 Plus v2:

* iLO5 2.98, BIOS 2.84_08-17-2023 (https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_b9044fffa7404e82b45bd5a84f) - iLO5 HTML5 console works correctly; installed with the FreeBSD 12.3-RELEASE image via virtual CD-ROM and upgraded to FreeBSD 13.2-RELEASE via freebsd-update

* iLO5 2.99, BIOS 2.90_10-27-2023 (https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_e87fa7295f974fa6ae1d1303fe) - iLO5 HTML5 console does _not_ work correctly (no keyboard, no mouse, errors as detailed in comment 10 above); installed with the FreeBSD 12.3-RELEASE image via virtual CD-ROM and upgraded to FreeBSD 13.2-RELEASE via freebsd-update

* iLO5 3.01, BIOS 3.00_1-26-2024 (https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_f3ae8dc2ee8a40af9b8f6db1c6) - iLO5 HTML5 console status unknown (we don't have a FreeBSD install); unable to install via virtual CD-ROM.

Obviously the HPE ChangeLogs make it sound like nothing has changed except AMD microcode stuff… 

iLO5 ChangeLog to 3.01: https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_d300241929344f2191fa7966d8&tab=revisionHistory

(again, nothing of any obvious use).
Comment 34 Arrigo Triulzi 2024-02-16 10:55:01 UTC
(In reply to Arrigo Triulzi from comment #33)
Updated the machine with iLO 2.99 and BIOS 2.90_10_27 to iLO 3.01 and BIOS 3.00_01_26 and the behaviour is confirmed:

* will not boot a FreeBSD image of any version
* console does not work with repeated USB error messages

Captured the boot:

ivhd3: supported paging level:7, will use only: 4
ivhd3: device [0x8 - 0x3ffe] config:0
ivhd3: device [0xff00 - 0xffff] config:0
ivhd3: PCI cap 0x190b640f@0x40 feature:19<IOTLB,EFR,CapExt>
Starting powerd.
Security policy loaded: MAC/ntpd (mac_ntpd)
Starting ntpd.
Mounting late filesystems:.
Starting sendmail_submit.
Starting sendmail_msp-queue.
Performing sanity check on sshd configuration.
Starting sshd.
Configuring vt: keymap blanktime.
Starting cron.
Starting background file system checks in 60 seconds.
* CITOIC] starting jails...
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
+ Setting RCTL props
+ Setting RCTL props
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
+ Setting RCTL props

Fri Feb 16 10:48:49 UTC 2024

FreeBSD/amd64 (ops-1) (ttyv0)

login: usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
[repeats continuously]
Comment 35 Michael Osipov freebsd_committer freebsd_triage 2024-02-16 10:58:44 UTC
(In reply to Arrigo Triulzi from comment #34)

So HPE did break the firmware here? Can you report this?
Comment 36 Arrigo Triulzi 2024-02-16 11:03:54 UTC
(In reply to Michael Osipov from comment #35)
Well… I am going to try to report it but I suspect HPE's answer is going to be that it "works with Windows and Linux". This might need pushing within HPE if anyone has contacts, I'll definitely report it via commercial support channels.

At least we have a clear path to perdition through the iLO and BIOS versions, now I wonder if I can regress to the older versions. Worth a try.
Comment 37 Michael Osipov freebsd_committer freebsd_triage 2024-02-16 11:06:00 UTC
(In reply to Arrigo Triulzi from comment #36)

True, or it requires a fix in FreeBSD... thanks for searching for the needle in the haystack.
Comment 38 Arrigo Triulzi 2024-02-16 11:07:42 UTC
(In reply to Michael Osipov from comment #37)
Least I could do, you were all constructive and helpful. I am downgrading the iLO5 from 3.01 to 2.98 to see if that fixes (at least) the HTML5 console. That would be a good win and a further confirmation.
Comment 39 Arrigo Triulzi 2024-02-16 11:19:21 UTC
(In reply to Arrigo Triulzi from comment #38)
Oh this is bad… downgrading to 2.98 does _not_ fix the console. I might have to hit the BIOS too.
Comment 40 Arrigo Triulzi 2024-02-16 14:35:18 UTC
For completeness: I have tried going through the whole "Intelligent Provisioning", painfully… it boots a Linux variant which then uses, I presume, ipmitools to speak to the Redfish interface and "do things". Bottom line: you can't install FreeBSD that way either because it is designed for RH Linux and Windows.
This allows me to go back to HPE and say "I tried everything."
Comment 41 Arrigo Triulzi 2024-02-16 18:00:59 UTC
Additional data point: booting in Legacy Mode from a physical USB stick on iLO5 v3.01 and BIOS v3.00_1-26-2024 I do get a functional keyboard which, at least, allows you to install…
Comment 43 Arrigo Triulzi 2024-02-19 15:25:40 UTC
Additional data point: if we boot with

hw.usb.debug=-1

we get to keep a working console after booting from a USB stick (no change with the virtual devices).
Comment 44 Arrigo Triulzi 2024-02-23 18:13:46 UTC
Further comments from my team working on the problem:

I think that's what's hitting us, as the virtual drives (Floppy/img, CD/iso) are mounted as a USB3 device.
Tried with a mounted iso, img and http but the result is the same.
Tried out a bunch of kernel options but nothing made the mounted device appear for boot.
Managed to get to mountroot with a working keyboard, but none of the options make the virtual device stick. Using ? at mountroot only shows the disks.
Went through BIOS and iLO settings again but there is nothing to tweak that might make a difference.

Tried quite a few of the settings (and combinations of them) from the pages below.
https://man.freebsd.org/cgi/man.cgi?query=xhci
https://man.freebsd.org/cgi/man.cgi?query=uhci
https://man.freebsd.org/cgi/man.cgi?query=ohci
https://man.freebsd.org/cgi/man.cgi?query=ehci

amongst the tried options (taken from various bug reports and suggestions):

set hw.usb.xhci.dcepquirk=1
set hw.mfi.mrsas_enable="1"
set hw.usb.xhci.xhci_port_route="-1"
set debug.acpi.disabled="hostres"
set hw.pci.realloc_bars="1"
hw.usb.no_shutdown_wait=1
hw.usb.xhci.no_hs=1
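The options above are quoted in mixed form (some with "set", some without, some values quoted). For anyone repeating the tests, here is a small Python sketch that renders one list consistently, either as loader.conf lines or as commands for the interactive loader prompt. The helper names are hypothetical, and the sketch does not check whether a given tunable exists on a given release; it only reformats the names and values listed in this comment.

```python
# Tunables exactly as listed in this comment; values kept as strings.
TRIED = {
    "hw.usb.xhci.dcepquirk": "1",
    "hw.mfi.mrsas_enable": "1",
    "hw.usb.xhci.xhci_port_route": "-1",
    "debug.acpi.disabled": "hostres",
    "hw.pci.realloc_bars": "1",
    "hw.usb.no_shutdown_wait": "1",
    "hw.usb.xhci.no_hs": "1",
}


def as_loader_conf(tunables):
    """Render tunables as name="value" lines, the form used in /boot/loader.conf."""
    return [f'{name}="{value}"' for name, value in tunables.items()]


def as_loader_prompt(tunables):
    """Render tunables as `set name=value` commands for the loader prompt."""
    return [f"set {name}={value}" for name, value in tunables.items()]


if __name__ == "__main__":
    print("\n".join(as_loader_prompt(TRIED)))
    print("boot")
```

Keeping the list in one place makes it easier to record exactly which combination was active for each boot attempt.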
Comment 45 Arrigo Triulzi 2024-04-04 11:38:04 UTC
There also appears to be a note regarding Linux having the same issue on the HPE community website - no solution there either…

https://community.hpe.com/t5/proliant-servers-ml-dl-sl/usb-ilo-problem-on-dl320-g4/td-p/3759010