Created attachment 248474 [details] Installer boot kernel "loses" virtual CD-ROM drive
On HPE ProLiant DL386 Gen10 "Plus" v2 machines (board part no. P38409-B21, product no. P38409-B21) running the latest iLO 5 version 3.01 (Jan 23 2024) and BIOS version A42 v2.90 (Oct 27 2023), the installer "loses" the CD-ROM after the kernel boots when attempting to install via the "virtual CD-ROM" (or "virtual floppy"). The output is as per the attached image (sorry, there is no working serial console to copy and paste from into the ticket).
Created attachment 248475 [details] Attempt at circumventing issue with boot flags (does not work)
A work-around is to install FreeBSD 12.4-RELEASE and then upgrade via FreeBSD 13.2-RELEASE to FreeBSD 14.0-RELEASE. Note that the FreeBSD 13.2-RELEASE installer does _not_ work either; it fails with exactly the same issue. The last "known good" installer is 12.4-RELEASE. We did not test with either earlier versions or non-RELEASE installers.
Correction: this particular model does _not_ boot with the FreeBSD 12.4-RELEASE installer either… (it used to work with older models such as a DL360 Gen9).
We also have several DL360 Gen9 and they do work. The previous iLO version failed to boot from some HTTP servers, "python -m http.server" for instance while Apache HTTP server works. How are you serving the ISO for the new ones?
(In reply to Michael Osipov from comment #5) The same way as the others, via the iLO "virtual CD" (or "virtual floppy" for IMG files). We don't use any particular mechanism - we have tried both the "local file" (i.e. the browser becomes the "server" for the image file) and the HTTP from OpenBSD httpd, directly from a mirror (sorry… desperation), and from a Windows system via the browser.
For reference there is an HPE community post regarding a similar issue: https://community.hpe.com/t5/proliant-servers-ml-dl-sl/ilo-disconnects-when-booting-off-of-virtual-media-cd-rom-image/m-p/3730353#M50079 but this is with a _shared_ NIC for the iLO, we use a dedicated port so the issue above is not relevant (at least in theory). We added a new post on the community: https://community.hpe.com/t5/proliant-servers-ml-dl-sl/hpe-proliant-dl386-gen10-amp-quot-plus-amp-quot-v2-virtual-media/td-p/7206703 and have opened a case with HPE (case# pending).
(In reply to Arrigo Triulzi from comment #6) I bet that the iLO HTTP client is very picky. I failed to bootstrap a Windows DVD ISO with Python's HTTP server. You should run tcpdump on OpenBSD while serving the file. You might see a TCP RST. I wouldn't use a mirror because it is too far away. Try another server, Apache HTTPd, and report. I am interested as well since I need to swap servers sooner or later.
(In reply to Michael Osipov from comment #8) OK, trying Apache on FreeBSD - same physical network, same physical switch as the iLO being used for installation. Will report.
(In reply to Michael Osipov from comment #8) While I can see your reasoning, I think there is a deeper problem with the USB emulation which the iLO provides and FreeBSD's kernel… my dmesg is full of:

usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)

and the console keyboard does not work… methinks this is something on the HPE side which breaks with new(er) FreeBSD kernels.
(In reply to Arrigo Triulzi from comment #10) Yes, that could be a reason as well...
(In reply to Arrigo Triulzi from comment #9) OK, no luck with ISO served via Apache 2.4 from FreeBSD - logs are clean, no RST (not attaching image because it is the same as the others :( ).
(In reply to Arrigo Triulzi from comment #12) It was worth a try. Do other OSes work? Did you try an older version like 11, since 12 does not work? If that fails too, then their USB emulation has either changed or is broken.
Created attachment 248484 [details] Slightly different USB error message with IMG file

This one is the IMG file being served from Apache 2.4 (from FreeBSD packages) from a FreeBSD 13.2-RELEASE host on the same subnet and same switch as the iLO being installed. No errors on the Apache side, last log entries before fail are:

192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096
192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096
192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 2048
192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096
192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 32768
192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 4096
(In reply to Michael Osipov from comment #13) Trying Linux, but I think it is going to work because the installer is loaded after the kernel boots and there is no intermediate loader: it is GRUB, then kernel, then installer, so if GRUB is happy and the kernel is happy the installer goes off. Here, by contrast, it seems like the USB is "lost" before the kernel is booted. It is honestly an interesting problem because I cannot see what "disconnects" the USB or tells the USB to disconnect. The additional fact that the keyboard disconnects too seems to point to a USB issue of some sort. I'll see if we can grab an 11 image… I was surprised 12 failed as that was the "fix" for the Gen 9 I had.
(In reply to Arrigo Triulzi from comment #14) Those are range requests with partial responses (206). Do you see anything not having status 206? You see the block sizes it is downloading (4 KiB, 32 KiB). Can you count after how many bytes the requests stop? Maybe a minimal image is also worth a try...
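For anyone wanting to check a candidate HTTP server before pointing the iLO at it, here is a minimal Python sketch that sends a single Range request and reports whether the server answers 206 or ignores the header. It only approximates what the iLO client appears to do according to the access log; the function names (range_header, probe_range) and the 4096-byte default are assumptions made for illustration, and the URL is whatever you pass on the command line.

```python
import sys
import urllib.request


def range_header(start, length):
    """Build the value of an HTTP Range header for `length` bytes at `start`."""
    return f"bytes={start}-{start + length - 1}"


def probe_range(url, start=0, length=4096, timeout=10):
    """Send one ranged GET and return (HTTP status, number of bytes read).

    At most length + 1 bytes are read, so a server that ignores the Range
    header and replies 200 with the whole image is not downloaded in full.
    """
    req = urllib.request.Request(url, headers={"Range": range_header(start, length)})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, len(resp.read(length + 1))


# Only probe when a URL is given as the first command-line argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    status, received = probe_range(sys.argv[1])
    verdict = "honours Range" if status == 206 else "did not return 206"
    print(f"{status}: {received} bytes read, server {verdict}")
```

A 206 with exactly the requested number of bytes suggests the server handles partial requests the way the log shows Apache doing; anything else is worth a closer look with tcpdump.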
(In reply to Arrigo Triulzi from comment #15) Try Windows 10, 11, Server as well, for the record.
(In reply to Michael Osipov from comment #16) Everything in the log is 206 and the total bytes are 1389862656 for IMG and 1171579332 for ISO. Trying a minimal image now, Linux afterwards and 11 after that (that's the current working queue).
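The status codes and byte totals can be cross-checked from the Apache access log with a few lines of Python. This is a sketch that assumes the log format seen in the excerpt quoted earlier in this report (request in quotes, followed by status and size); summarize and LOG_RE are names made up for the example, and the sample lines are copied from that excerpt.

```python
import re
from collections import Counter

# Picks the request path, status and byte count out of an access-log line such as
#   192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /x.img HTTP/1.1" 206 4096
LOG_RE = re.compile(r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)')


def summarize(lines):
    """Return (bytes served per path, count of responses per status code)."""
    served = Counter()
    statuses = Counter()
    for line in lines:
        m = LOG_RE.search(line)
        if m is None:
            continue  # not an access-log line, skip it
        statuses[m["status"]] += 1
        if m["size"] != "-":  # "-" means no body was sent
            served[m["path"]] += int(m["size"])
    return served, statuses


PREFIX = '192.168.54.192 - - [15/Feb/2024:14:10:33 +0000] "GET /FreeBSD-14.0-RELEASE-amd64-memstick.img HTTP/1.1" 206 '
SAMPLE = [PREFIX + size for size in ("4096", "4096", "2048", "4096", "32768", "4096")]

if __name__ == "__main__":
    served, statuses = summarize(SAMPLE)
    for path, total in served.items():
        print(path, total, "bytes")
    print("status codes:", dict(statuses))
```

Feeding the full log instead of SAMPLE (for example via open(...) on the access log) gives the per-image totals; any status other than 206 would show up in the second counter.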
(In reply to Arrigo Triulzi from comment #18) That is good then that means that the images could be streamed properly. One headache less.
Created attachment 248485 [details] FreeBSD 14.0-RELEASE mini-memstick console output

So, …mini-memstick is different! Not sure how to interpret this to be honest.
Created attachment 248491 [details] FreeBSD 11.4-RELEASE ISO booting on HPE ProLiant

This one (FreeBSD 11.4-RELEASE) is, again, slightly different but hangs too…
Another data point: an HPE ProLiant DL345 Gen11 has the same problem, so it seems to be something in the iLO behaviour which has changed, sadly.
Linux (Ubuntu 22.04 LTS "server") boots and gets to the installer just fine.
Final test: OpenBSD 7.4, install74.img or install74.iso don't even boot.
For reference the iLO 5 manual (latest version): https://support.hpe.com/hpesc/public/docDisplay?docId=a00105236en_us

It mentions clearly that the USB is UHCI but also, hidden away in the power settings…

https://support.hpe.com/hpesc/public/docDisplay?docId=a00105236en_us&page=GUID-6099A408-6792-431B-B947-A3BB73E49F1B.html

Enable persistent mouse and keyboard
* Enabled: The iLO virtual keyboard and mouse are always connected to the iLO UHCI USB controller.
* Disabled (default): The iLO virtual keyboard and mouse are connected dynamically to the iLO UHCI controller only when a remote console application is open and connected to iLO.

When this feature is disabled, some servers are able to increase power savings by 15 watts when:
* The server OS is idle.
* No virtual USB keyboard and mouse are connected.

I am wondering if this might be relevant seeing as we saw those disconnects on the dmesg output.
(In reply to Arrigo Triulzi from comment #25) Maybe this is a bug in FreeBSD never surfaced before...
(In reply to Michael Osipov from comment #26) I would have agreed, at least partially, if OpenBSD had managed to boot, as it uses a different bootloader, etc., but it doesn't. I'm still in the "HPE UHCI emulation has a problem" camp. We might be able to find a work-around but it seems a bit peculiar that it only happens on HPE iLO5 systems.
(In reply to Arrigo Triulzi from comment #27) What about Windows?
(In reply to Michael Osipov from comment #28) Windows 11 installer boots but we didn't have a day for the installation to complete… ;)
I'm seeing a similar issue on a DL385. In my case, the CD image doesn't disappear, but the kernel spews a stream of:

usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
ugen0.4: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device

Note that I suspect there is a different issue using the "Virtual Floppy" device with a memstick image file.
(In reply to Chuck Tuffli from comment #30) I believe the:

ugen0.4: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device

is where the problem lies. As I mentioned elsewhere there is an iLO5 power saving setting which smells relevant even though it only talks about the USB HID (https://support.hpe.com/hpesc/public/docDisplay?docId=a00105236en_us&page=GUID-6099A408-6792-431B-B947-A3BB73E49F1B.html). I am going to try this today.
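To compare captures from different machines it may help to reduce the dmesg noise to two numbers: how many timeouts were logged per USB address, and which ugen devices were reported as disconnected. The Python below is a rough sketch built only from the message formats quoted in this report; summarize_usb_errors and both regular expressions are assumptions of the example, not anything taken from the FreeBSD USB stack.

```python
import re
from collections import Counter

# "addr 2", "addr=2" and "address 2" all occur in the messages quoted in this report.
TIMEOUT_RE = re.compile(r"addr(?:ess)?[ =](\d+).*USB_ERR_TIMEOUT")
DISCONNECT_RE = re.compile(r"(ugen\d+\.\d+): .*\(disconnected\)")


def summarize_usb_errors(lines):
    """Return (timeout count per USB address, devices reported disconnected)."""
    timeouts = Counter()
    disconnected = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[int(m.group(1))] += 1
            continue
        m = DISCONNECT_RE.search(line)
        if m:
            disconnected.append(m.group(1))
    return timeouts, disconnected


# Shortened excerpt of the output quoted above.
SAMPLE = """\
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 4 failed, USB_ERR_TIMEOUT
ugen0.4: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
""".splitlines()

if __name__ == "__main__":
    timeouts, disconnected = summarize_usb_errors(SAMPLE)
    print("timeouts per address:", dict(timeouts))
    print("disconnected:", disconnected)
```

Running it over a saved dmesg (one line per list element) makes it easier to see whether the address that times out is the same one that later shows up as disconnected.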
For reference: the HPE UEFI BIOS manual https://support.hpe.com/hpesc/public/docDisplay?docId=sd00001068en_us&page=GUID-0F514002-9AE6-41F1-9005-1B910268FFD0.html Went through it with a fine-toothed comb (i.e. read every page) and there is nothing which obviously applies to the USB connect/disconnect issue.
Interesting regression with the BIOS versions on DL385 Gen10 Plus v2:

* iLO5 2.98, BIOS 2.84_08-17-2023 (https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_b9044fffa7404e82b45bd5a84f) - iLO5 HTML5 console works correctly. Was installed with the FreeBSD 12.3-RELEASE image via virtual CD-ROM and upgraded to FreeBSD 13.2-RELEASE via freebsd-update.
* iLO5 2.99, BIOS 2.90_10-27-2023 (https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_e87fa7295f974fa6ae1d1303fe) - iLO5 HTML5 console does _not_ work correctly (no keyboard, no mouse, errors as detailed in comment 10 above). Installed with the FreeBSD 12.3-RELEASE image via virtual CD-ROM and upgraded to FreeBSD 13.2-RELEASE via freebsd-update.
* iLO5 3.01, BIOS 3.00_1-26-2024 (https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_f3ae8dc2ee8a40af9b8f6db1c6) - iLO5 HTML5 console unknown (we don't have a FreeBSD install), unable to install via virtual CD-ROM.

Obviously the HPE ChangeLogs make it sound like nothing has changed except AMD microcode stuff… iLO5 ChangeLog to 3.01: https://support.hpe.com/connect/s/softwaredetails?language=en_US&softwareId=MTX_d300241929344f2191fa7966d8&tab=revisionHistory (again, nothing of any obvious use).
(In reply to Arrigo Triulzi from comment #33) Updated the machine with iLO 2.99 and BIOS 2.90_10_27 to iLO 3.01 and BIOS 3.00_01_26 and the behaviour is confirmed:

* will not boot a FreeBSD image of any version
* console does not work with repeated USB error messages

Captured the boot:

ivhd3: supported paging level:7, will use only: 4
ivhd3: device [0x8 - 0x3ffe] config:0
ivhd3: device [0xff00 - 0xffff] config:0
ivhd3: PCI cap 0x190b640f@0x40 feature:19<IOTLB,EFR,CapExt>
Starting powerd.
Security policy loaded: MAC/ntpd (mac_ntpd)
Starting ntpd.
Mounting late filesystems:.
Starting sendmail_submit.
Starting sendmail_msp-queue.
Performing sanity check on sshd configuration.
Starting sshd.
Configuring vt: keymap blanktime.
Starting cron.
Starting background file system checks in 60 seconds.
* CITOIC] starting jails...
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
+ Setting RCTL props
+ Setting RCTL props
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
+ Setting RCTL props

Fri Feb 16 10:48:49 UTC 2024

FreeBSD/amd64 (ops-1) (ttyv0)

login: usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
ugen0.2: <Unknown > at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
usb_alloc_device: set address 2 failed (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
usbd_setup_device_desc: getting device descriptor at addr 2 failed, USB_ERR_TIMEOUT
usbd_req_re_enumerate: addr=2, set address failed! (USB_ERR_TIMEOUT, ignored)
[repeats continuously]
(In reply to Arrigo Triulzi from comment #34) So HPE did break the firmware here? Can you report this?
(In reply to Michael Osipov from comment #35) Well… I am going to try to report it but I suspect HPE's answer is going to be that it "works with Windows and Linux". This might need pushing within HPE if anyone has contacts, I'll definitely report it via commercial support channels. At least we have a clear path to perdition through the iLO and BIOS versions, now I wonder if I can regress to the older versions. Worth a try.
(In reply to Arrigo Triulzi from comment #36) True or it requires a fix in FreeBSD...thanks for searching the needle in the haystack.
(In reply to Michael Osipov from comment #37) Least I could do, you were all constructive and helpful. I am downgrading the iLO5 from 3.01 to 2.98 to see if that fixes (at least) the HTML5 console. That would be a good win and a further confirmation.
(In reply to Arrigo Triulzi from comment #38) Oh this is bad… downgrading to 2.98 does _not_ fix the console. I might have to hit the BIOS too.
For completeness: I have tried going through the whole "Intelligent Provisioning", painfully… it boots a Linux variant which then uses, I presume, ipmitools to speak to the Redfish interface and "do things". Bottom line: you can't install FreeBSD that way either because it is designed for RH Linux and Windows. This allows me to go back to HPE and say "I tried everything."
Additional data point: booting in Legacy Mode from a physical USB stick on iLO5 v3.01 and BIOS v3.00_1-26-2024 I do get a functional keyboard which, at least, allows you to install…
Similar issue: https://forums.freebsd.org/threads/freebsd-13-2-install-boot-issues-on-a-dell-pe-r430-server-hardware.89989/
Additional data point: if we boot with hw.usb.debug=-1 we get to keep a working console after booting from a USB stick (no change with the virtual devices).
Further comments from my team working on the problem:

I think that's what's hitting us, as the virtual drives (Floppy/img, CD/iso) are mounted as a USB3 device. Tried with a mounted iso, img and http but the result is the same. Tried out a bunch of kernel options but nothing made the mounted device appear for boot. Managed to get to mountroot with a working keyboard, but none of the options make the virtual device stick. Using ? at mountroot only shows the disks. Went through BIOS and iLO settings again but there is nothing to tweak that might make a difference. Tried quite a few of the settings (and combinations of them) from the below.

https://man.freebsd.org/cgi/man.cgi?query=xhci
https://man.freebsd.org/cgi/man.cgi?query=uhci
https://man.freebsd.org/cgi/man.cgi?query=ohci
https://man.freebsd.org/cgi/man.cgi?query=ehci

Amongst the tried options (taken from various bug reports and suggestions):

set hw.usb.xhci.dcepquirk=1
set hw.mfi.mrsas_enable="1"
set hw.usb.xhci.xhci_port_route="-1"
set debug.acpi.disabled="hostres"
set hw.pci.realloc_bars="1"
hw.usb.no_shutdown_wait=1
hw.usb.xhci.no_hs=1
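Since combinations of these options were tried by hand, a small script can at least make the test matrix explicit so that nothing is tested twice or forgotten. The sketch below is in Python and only prints the lines one would type at the loader prompt; the tunable names and values are the ones listed above, while test_plan, loader_commands and the limit of two options per boot are assumptions of the example. It makes no claim that any combination helps.

```python
from itertools import combinations

# Tunables and values exactly as listed in this comment.
TUNABLES = {
    "hw.usb.xhci.dcepquirk": "1",
    "hw.mfi.mrsas_enable": "1",
    "hw.usb.xhci.xhci_port_route": "-1",
    "debug.acpi.disabled": "hostres",
    "hw.pci.realloc_bars": "1",
    "hw.usb.no_shutdown_wait": "1",
    "hw.usb.xhci.no_hs": "1",
}


def test_plan(max_size=2):
    """Yield every combination of tunable names with 1..max_size members."""
    names = sorted(TUNABLES)
    for size in range(1, max_size + 1):
        yield from combinations(names, size)


def loader_commands(names):
    """Return the loader-prompt lines for one boot attempt."""
    return [f'set {name}="{TUNABLES[name]}"' for name in names] + ["boot"]


if __name__ == "__main__":
    for number, combo in enumerate(test_plan(), start=1):
        print(f"# attempt {number}")
        print("\n".join(loader_commands(combo)))
```

Recording the outcome next to each attempt number gives a table that can be attached to the HPE case as well.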
There also appears to be a note regarding Linux having the same issue on the HPE community website - no solution there either… https://community.hpe.com/t5/proliant-servers-ml-dl-sl/usb-ilo-problem-on-dl320-g4/td-p/3759010