Bug 241118 - [boot] 12.1-BETA3 installer hangs before loader menu
Summary: [boot] 12.1-BETA3 installer hangs before loader menu
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 12.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks: 240700
Reported: 2019-10-07 18:57 UTC by Ryan Moeller
Modified: 2019-12-02 22:27 UTC
Description Ryan Moeller 2019-10-07 18:57:16 UTC
FreeBSD 12.0-RELEASE disc1.iso written to a USB stick is able to boot on a FreeNAS Mini XL+, but the 12.1-BETA3 amd64 disc1.iso hangs in the spinner after printing "Consoles: EFI console" and before the loader menu is displayed. It does this on any USB port, even with extra features such as the EFI network stack disabled in firmware.
Doing a legacy boot, the screen is cleared and nothing gets displayed, not even the spinner.
I tried the snapshot 12.1-PRERELEASE 20190906 r351916 disc1.iso and it fails as well, so the break must have been introduced before that.
Comment 1 Ed Maste freebsd_committer 2019-10-08 14:42:47 UTC
Are you able to try a few other snapshots to narrow down the breakage?
Comment 2 Ryan Moeller 2019-10-08 14:57:39 UTC
(In reply to Ed Maste from comment #1)
That was the earliest snapshot I could find on download.freebsd.org. Is there some place I can find older snapshots?
Comment 3 Ed Maste freebsd_committer 2019-10-08 15:01:01 UTC
(In reply to Ryan Moeller from comment #2)
There are also CI snapshots available at
https://artifact.ci.freebsd.org/snapshot/stable-12/
Comment 4 Ryan Moeller 2019-10-08 15:26:47 UTC
(In reply to Ed Maste from comment #3)
Thanks, but I don't see install CD images there. I found FreeBSD-12.0-STABLE-amd64-20190411-r346111-mini-memstick.img in my collection so I'll give that a try.
Comment 5 Ryan Moeller 2019-10-08 15:30:31 UTC
(In reply to Ryan Moeller from comment #4)
I also found FreeBSD-12.0-STABLE-amd64-20190425-r346638-disc1.iso on another server, will try that as well.
Comment 6 Ryan Moeller 2019-10-08 15:37:46 UTC
(In reply to Ryan Moeller from comment #5)
r346638 works.
Comment 7 Ed Maste freebsd_committer 2019-10-08 15:56:17 UTC
(In reply to Ryan Moeller from comment #4)
I believe you should be able to uncompress one of the disk.img.xz images and write to a USB stick to test booting (assuming that it's a general problem in the loader, and not something specific to the install images).
Comment 8 Ryan Moeller 2019-10-08 16:54:18 UTC
(In reply to Ed Maste from comment #7)
Ok, I had to dig through a few folders to find one of those.

r347048
Blank loader screen, but after a little wait the kernel boots with a very low contrast font color. With the other images I have tested, the system would simply reset after sitting for a while.

r348988
Same; blank screen for a while then dim font for kernel.

r350956
All black, no visible activity. System doesn't reset (or I didn't wait long enough), but that might be a difference between the UEFI booted CD image and the legacy booted CI disk images.

I'll work on narrowing it down between r348988 and r350956.
Comment 9 Ryan Moeller 2019-10-09 15:19:41 UTC
r348508 - Blank screen for a while then dim font for kernel.
r348524 - All black, no visible activity.
Comment 10 Ryan Moeller 2019-10-09 16:17:43 UTC
None of the commits to stable/12 between r348508 and r348524 are anywhere near the loader, and the behavior flip-flops again later, so I think both behaviors must be the same problem. I'll have to look even further back.
Comment 11 Ryan Moeller 2019-10-09 19:07:19 UTC
r346844 - No loader, kernel visible
r346774 - No loader, kernel visible

On a hunch I tried the r346638 image from CI and that doesn't boot, it's just a blank screen, even though the disc1.iso for that revision did work. The r340154 image from CI is about as far back as history goes there, and it falls in the "doesn't show anything until the kernel" bucket.

I tried the CI image for r346638 again on a different USB drive and this time it still didn't show the loader, but the kernel did boot with a faint font. Same issue again I guess.

I tested the r346638 disc1.iso I had again. It turns out that UEFI booting it works correctly, but legacy booting it exhibits the same problems the CI images all have (they're all legacy boot only). That's a relief; for a minute I was worried the CI images had the console set to comconsole or something. To be sure, I confirmed there is no loader.conf or boot.config forcing weird settings on the CI images.

So vidconsole at least has been broken on stable/12 for quite a while it seems, but that must not be related to the UEFI boot issue.

Fun fun fun. I can't really narrow it down any further without starting to build my own images. I do need to do other things with this machine for the rest of the day, so that will have to wait.
Comment 12 Ed Maste freebsd_committer 2019-10-09 19:09:09 UTC
(In reply to Ryan Moeller from comment #11)
Thank you for all of the effort so far in trying to track this down. Unfortunately I have some travel coming up and won't be able to look at it in detail but hopefully either you or someone else will be able to chase details down before the release.
Comment 13 Ryan Moeller 2019-10-10 19:46:36 UTC
I found the issue I was having with vidconsole. It was an incorrect setting in the BIOS: [Advanced > PCIe/PCI/PnP Configuration > Onboard Video OPROM] was set to EFI instead of legacy. With it set to legacy I can see the boot text now. It doesn't fix the original issue though.

For sanity checking (legacy booting images from CI),

r353390.img (latest HEAD) - text visible

Consoles: internal video/keyboard
BIOS drive C: is disk0
BIOS drive D: is disk1ersion is 1.02 (looks like we're missing a screen clear)
BIOS drive E: is disk2
...
BIOS drive M: is disk10
BIOS drive N: is disk11
|

The spinner twiddled for a while then the system hung here. No loader menu.

r353385.img (latest stable/12) - text visible but stalls before menu (same as above, but without the glitch on the drive D: line)


r340154.img (earliest stable/12) - works correctly
r348988.img - works correctly
r351206.img - works correctly
r352298.img - stalls before loader menu

Now this feels like some progress. I'll narrow it down between r351206 and r352298 next.
Comment 14 Ryan Moeller 2019-10-10 21:33:01 UTC
r351752 - stalls
r351504 - stalls
r351358 - works

r351426 - stalls
r351390 - stalls
r351384 - stalls

There are no amd64 images in the CI between r351358 and r351384.

r351384 is a commit to stand/ so I'll try reverting that and building an image to test tomorrow.
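
The narrowing done above is a manual bisection over the available CI images. A minimal sketch of that search in C follows, assuming a single works-to-stalls transition in the revision list. The names first_bad() and boots_sample() are hypothetical, and boots_sample() is only a stand-in for physically booting an image; it mirrors the results reported in this comment (r351358 works, r351384 stalls).

```c
#include <stddef.h>

/* Hypothetical callback: nonzero if the image for this revision boots. */
typedef int (*boots_fn)(unsigned revision);

/*
 * Stand-in for booting a real image. It encodes the results reported
 * in this thread: everything before r351384 works, the rest stalls.
 */
int boots_sample(unsigned revision)
{
    return revision < 351384;
}

/*
 * Bisect a list of revisions sorted in ascending order.
 * Assumes n >= 2, revs[0] is known to boot, revs[n - 1] is known to
 * stall, and there is exactly one works-to-stalls transition.
 * Returns the index of the first revision that stalls.
 */
size_t first_bad(const unsigned *revs, size_t n, boots_fn boots)
{
    size_t good = 0;      /* highest index known to boot */
    size_t bad = n - 1;   /* lowest index known to stall */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (boots(revs[mid]))
            good = mid;
        else
            bad = mid;
    }
    return bad;
}
```

The single-transition assumption matters: the flip-flopping seen in comment 10 is the kind of result that breaks a bisection, which is why the unrelated vidconsole setting had to be sorted out first.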
Comment 15 Ryan Moeller 2019-10-16 00:28:28 UTC
I built release.iso on releng/12.1 and confirmed it stalls before the loader menu. Then I reverted r351384 and did another build. The second installer boots successfully.
Comment 16 Toomas Soome freebsd_committer 2019-10-16 06:05:27 UTC
(In reply to Ryan Moeller from comment #15)

Have you attempted boot from current? I wonder if this is something we have fixed already but not merged to 12...

In any case, I'll check it over; it will take a bit of time.
Comment 17 Ryan Moeller 2019-10-16 13:14:45 UTC
(In reply to Toomas Soome from comment #16)
It is probably broken on HEAD too. The boot fails in the same way. I'll build a test image with the appropriate commit reverted today to confirm it is the same issue.
Comment 18 Ryan Moeller 2019-10-18 13:43:46 UTC
I built an iso from HEAD at r353681 and confirmed the boot stalls, then reverted r350825 and r350772 and built a new iso, which successfully boots.
Comment 19 Toomas Soome freebsd_committer 2019-10-18 13:55:12 UTC
(In reply to Ryan Moeller from comment #18)

I have been trying to replicate the issue but have failed so far. Having now reviewed the messages here, I suspect my test setup is just not replicating what you have.

Could you post, or mail me directly, the output from zdb run without arguments?
Comment 20 Ryan Moeller 2019-11-02 22:50:08 UTC
I've run into this problem again trying to boot a 12.1-RC2 installer on a server currently running vanilla FreeBSD 12.0-RELEASE. This one has mirrored SSDs for boot and a pool of 24 disks grouped into mirrors (with two reserved for hot spares). :(

(I have been corresponding with Toomas by email but I wanted to document this publicly as well.)
Comment 21 Ryan Moeller 2019-11-07 23:54:22 UTC
Tested latest head snapshot FreeBSD-13.0-CURRENT-amd64-20191107-r354423-disc1.iso on the FreeBSD 12.0 machine.
Still hangs when the pool disks are installed. I see "Consoles: efi" and the spinner spins for a while then gets stuck. If I slide out the storage pool disks the image boots.
Comment 22 Chris R 2019-11-18 13:58:20 UTC
This is happening for me too. I have a SuperMicro X11SDV-4C-TP8F (Xeon-D) server which has two ZFS pools: a system pool consisting of two mirrored SATA SSDs and a data pool consisting of 12 SATA HDDs. The system was running/booting 12.0-RELEASE-p11 with no problems. I started upgrading to 12.1-RELEASE and the machine failed in the same way as the reports in the other comments here, at the "Consoles: EFI console" line. If I remove all the disks for the data pool, the machine boots fine (and I've since finished the 12.1 upgrade). However, now that the machine is on 12.1 it refuses to boot if the data pool disks are inserted, so I have to boot the machine with them removed, then manually insert the disks once it's booted.
Comment 23 Chris R 2019-11-19 14:40:44 UTC
Just commenting to confirm that the work-around I'm currently using is to copy the /boot/loader and /boot/loader.efi from 12.0-RELEASE into my /boot. This is definitely a regression in the loader since 12.0-RELEASE.
Comment 24 Ryan Moeller 2019-12-02 22:27:32 UTC
I found zfs_spa_init() in zfsimpl.c is stuck in an infinite loop iterating through a circular list of vdevs. It's not yet clear where the cycle comes from, but it's good to finally have a clue.
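
For illustration only, here is a minimal sketch of how a cycle in a linked list can be detected without extra memory, using Floyd's tortoise-and-hare technique. This is not the code in zfsimpl.c: struct node and list_has_cycle() are hypothetical, and the sketch assumes a singly linked list that is terminated by NULL when it is well formed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a list entry; not the real vdev type. */
struct node {
    struct node *next;
    int id;
};

/*
 * Floyd's cycle detection. One pointer advances one step at a time,
 * the other two steps. If the list ends in NULL the fast pointer
 * reaches it and there is no cycle. If the list is circular the fast
 * pointer eventually catches up with the slow one.
 */
bool list_has_cycle(const struct node *head)
{
    const struct node *slow = head;
    const struct node *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return true;
    }
    return false;
}
```

A check like this only reports that a cycle exists. It does not explain where the cycle comes from, which is still the open question here.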