Bug 228960 - Why is the boot loader taking so long
Summary: Why is the boot loader taking so long
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Colin Percival
URL:
Keywords:
Depends on:
Blocks: 228911 228959
Reported: 2018-06-12 16:22 UTC by Rodney W. Grimes
Modified: 2018-08-05 19:06 UTC (History)
CC List: 6 users

See Also:


Description Rodney W. Grimes freebsd_committer 2018-06-12 16:22:57 UTC
Why is the boot loader taking so long
Comment 1 Emmanuel Vadot freebsd_committer 2018-07-30 16:18:13 UTC
What does this mean?
Taking so long to do what? Compared to what?
Comment 2 Colin Percival freebsd_committer 2018-07-30 17:44:02 UTC
This is something I brought up during the BSDCan devsummit.  In my testing it seemed like the time from "system turns on" to "kernel starts running" was much longer than it should be, but I haven't had a chance to track down what's happening during that period.
Comment 3 Warner Losh freebsd_committer 2018-07-30 18:04:22 UTC
There are two things: (1) BIOS takes forever, and UEFI only a little less so; (2) we have a 10s autoboot_delay on top of that. It could likely be 2 or 3 by default these days.
Comment 4 Rodney W. Grimes freebsd_committer 2018-07-30 18:15:55 UTC
IIRC there are also major slowdowns when you're on a virtualized platform, because emulation of the devices on the LPC bus tends to be very slow in most hypervisors: things like 8042 keyboard controllers and serial ports.
Comment 5 Conrad Meyer freebsd_committer 2018-08-02 01:43:30 UTC
We need a consistent yardstick platform (or set of platforms) and baseline figures for those platforms in order to evaluate relative performance on HEAD.

It seems like one good candidate might be a specific cloud compute instance type at a given vendor.  I don't have any vendor in mind.

I'd also love to see a user-type system (laptop or desktop) included.  I don't really care what the exact hardware is, as long as it's x86 and has the same BIOS installed across all tests.

It would be good to stopwatch BIOS time, loader time, kernel time before rc, and rc to login, although that might be getting ahead of things.

Do RELEASE systems need any autoboot_delay?  It is useful for developers who often break their systems, but RELEASEs should be golden, right? ;-)  And I think developers can handle increasing the default low or zero value (although I strongly recommend 'nextboot' instead and leaving a known safe kernel configured as the default).
Comment 6 Warner Losh freebsd_committer 2018-08-02 09:46:23 UTC
People need to set tunables in the boot environment before loading the kernel, so yes, the installer needs autoboot_delay set.
Comment 7 Conrad Meyer freebsd_committer 2018-08-02 19:12:37 UTC
(In reply to Warner Losh from comment #6)
The installer, sure, for weird corner cases that totally prevent installation.


I don't expect the installed system to need this on a regular basis.  Any long-term loader variables can be set in loader.conf.  If an admin knows they need to regularly set different loader variables, they can configure autoboot_delay themselves.  Leaving it enabled on the default installation seems like prioritizing an extreme corner case at small cost to every other system.
Comment 8 Warner Losh freebsd_committer 2018-08-02 19:23:23 UTC
The problem is when the installed system needs it, it NEEDS IT and there's no ability to enable it. We have cases all the time where something breaks and you have to unload a module, or set a tunable to resolve the issue or boot an old kernel because the recently installed kernel panics before it gets to a point where you can switch back. It's hardly a rare or an edge case.

Having said that, we can set the delay to 1 or 2 rather than 10 to allow people to interrupt without unduly delaying the boot. I run my systems at 3 all the time, and 2 or even 1 would suffice.
Comment 9 Conrad Meyer freebsd_committer 2018-08-02 19:35:22 UTC
(In reply to Warner Losh from comment #8)
You can still break into loader with autoboot_delay="0".  I think you may have autoboot_delay="-1" in mind?

(And I still think you're thinking of only your particular use case as a developer and not the larger herd of boring servers and desktop systems who tend to keep defaults.  It's easy for Netflix to deploy a different autoboot_delay than the release branch default to all its development systems, if need be.)
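
To make the distinction between the values discussed here concrete, the following is a minimal sketch of how the autoboot_delay settings could be classified, based on the behaviour described in loader.conf(5): "NO" disables autoboot, "-1" boots without delay and only drops to the prompt if booting fails, and "0" or more counts down while still allowing a key press to interrupt. The type and function names (autoboot_policy, parse_autoboot_delay) are hypothetical and are not taken from the loader sources; the handling of unparsable values is an assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; not identifiers from the loader. */
enum autoboot_mode {
    AUTOBOOT_DISABLED,      /* "NO": do not boot automatically */
    AUTOBOOT_NO_INTERRUPT,  /* "-1": no delay, prompt only if boot fails */
    AUTOBOOT_INTERRUPTIBLE  /* "0" or more: a key press can still interrupt */
};

struct autoboot_policy {
    enum autoboot_mode mode;
    int delay_seconds;      /* only meaningful for AUTOBOOT_INTERRUPTIBLE */
};

#define DEFAULT_DELAY 10    /* the default mentioned in this thread */

struct autoboot_policy
parse_autoboot_delay(const char *value)
{
    struct autoboot_policy p = { AUTOBOOT_INTERRUPTIBLE, DEFAULT_DELAY };
    char *end;
    long n;

    /* Unset or empty: keep the default countdown. */
    if (value == NULL || *value == '\0')
        return p;

    if (strcmp(value, "NO") == 0) {
        p.mode = AUTOBOOT_DISABLED;
        p.delay_seconds = 0;
        return p;
    }

    n = strtol(value, &end, 10);
    /* Assumption: unparsable or out-of-range input falls back to the default. */
    if (*end != '\0' || n < -1 || n > 3600)
        return p;

    if (n == -1) {
        p.mode = AUTOBOOT_NO_INTERRUPT;
        p.delay_seconds = 0;
        return p;
    }

    assert(n >= 0);
    p.delay_seconds = (int)n;
    return p;
}
```

With this sketch, parse_autoboot_delay("0") still yields the interruptible mode with a zero-second countdown, while parse_autoboot_delay("-1") yields the mode in which the boot cannot be interrupted.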
Comment 10 Warner Losh freebsd_committer 2018-08-02 21:18:51 UTC
The problem is that when you need to interrupt the boot process, you really need to do it. You can't easily enable it when you need it.

autoboot=0 might be OK, but I'm still thinking we'd want 1 or 2 there since 0 has proven tough to interrupt at times for me.
Comment 11 Ed Maste freebsd_committer 2018-08-03 14:59:53 UTC
> autoboot=0 might be OK, but I'm still thinking we'd want 1 or 2 there since 0
> has proven tough to interrupt at times for me.

Agreed, I think 1 or 2 is a fair compromise for the default.
Comment 12 Conrad Meyer freebsd_committer 2018-08-03 15:03:11 UTC
Ok, reducing autoboot_delay is one concern we can make easy progress on in code we control.  Can we also speed up the actual loading of modules and kernel after the autoboot timer expires?
Comment 13 Colin Percival freebsd_committer 2018-08-05 08:30:44 UTC
I've just finished doing some digging here.  In EC2 (where I have autoboot_delay="-1"), almost all of the time (~3.6s out of ~4.0s) in the loader is spent reading /boot/kernel/kernel, and that's mostly doing 4 kB reads.  I strongly suspect that doing 32 kB reads would be faster, but we're limited by a 4 kB bounce buffer (see r313349).

If that bounce buffer could be made larger, I strongly suspect that we could get much higher I/O performance in the loader, but I don't know enough about the BTX code and the requirements of V86 calls to know if this is feasible.
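
For reference, ~3.6s out of ~4.0s is roughly 90% of the loader time. The sketch below illustrates why the bounce buffer size matters: every read has to be split into pieces no larger than the buffer, so a smaller buffer means more underlying calls. This is a simulation under stated assumptions, not the loader's code: struct bounce_reader, bounce_read and calls_needed are hypothetical names, the "BIOS call" is modelled as a memcpy from an in-memory disk image, and any file size used with it is made up for the arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a reader that must go through a bounce buffer. */
struct bounce_reader {
    const unsigned char *disk;  /* simulated disk contents */
    size_t disk_len;
    unsigned char *bounce;      /* stands in for the low-memory buffer */
    size_t bounce_len;          /* e.g. 4096 today, possibly larger later */
    unsigned long bios_calls;   /* number of simulated BIOS reads issued */
};

/* Read len bytes at offset into dst, one bounce buffer at a time. */
size_t
bounce_read(struct bounce_reader *r, size_t offset, void *dst, size_t len)
{
    unsigned char *out = dst;
    size_t done = 0;

    assert(r->bounce_len > 0);
    if (offset >= r->disk_len)
        return 0;
    if (len > r->disk_len - offset)
        len = r->disk_len - offset;

    while (done < len) {
        size_t chunk = len - done;

        if (chunk > r->bounce_len)
            chunk = r->bounce_len;
        /* Simulated BIOS call: data lands in the bounce buffer first. */
        memcpy(r->bounce, r->disk + offset + done, chunk);
        r->bios_calls++;
        /* Second copy: from the bounce buffer to the real destination. */
        memcpy(out + done, r->bounce, chunk);
        done += chunk;
    }
    return done;
}

/* Number of calls needed to read len bytes through a buffer of bounce_len. */
unsigned long
calls_needed(size_t len, size_t bounce_len)
{
    assert(bounce_len > 0);
    return (unsigned long)((len + bounce_len - 1) / bounce_len);
}
```

As an example with a made-up size, reading 8 MiB takes calls_needed(8 * 1024 * 1024, 4096) = 2048 calls with a 4 kB buffer and 256 calls with a 32 kB buffer. Whether fewer, larger calls actually translate into shorter boot time would have to be measured.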
Comment 14 Toomas Soome freebsd_committer 2018-08-05 17:07:01 UTC
(In reply to Colin Percival from comment #13)

Yes, at this time we are using a 4k bounce buffer that I found in low memory. The bounce buffer is needed because BIOS INT calls run in real mode and can only access low memory.

The 4k bounce buffer history is simple: the original code was using stack space (with alloca), but we cannot control that space nor give any guarantees for safety, so I found an unused 4k page in the BTX memory area. However, I recently found another area (16k) but it will need a bit of work but we will get there soon:)
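
The low-memory restriction follows from real-mode addressing, where an address is a 16-bit segment and a 16-bit offset combined as segment * 16 + offset. The sketch below shows that arithmetic and a conservative check for whether a buffer lies entirely below 1 MiB. The function names are hypothetical, and the check deliberately ignores the small region just above 1 MiB that segment 0xFFFF can reach; it is an illustration of the constraint, not the loader's actual buffer selection logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Conservative limit: treat only the first 1 MiB as reachable. */
#define REAL_MODE_LIMIT 0x100000u

/* Linear address named by a real-mode segment:offset pair. */
uint32_t
real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* True if the whole buffer [addr, addr + len) lies below 1 MiB. */
bool
reachable_from_real_mode(uint32_t addr, uint32_t len)
{
    assert(len > 0);
    return addr < REAL_MODE_LIMIT && len <= REAL_MODE_LIMIT - addr;
}

/* Split a linear address below 1 MiB into a normalized segment:offset. */
void
linear_to_segoff(uint32_t addr, uint16_t *seg, uint16_t *off)
{
    assert(addr < REAL_MODE_LIMIT);
    *seg = (uint16_t)(addr >> 4);
    *off = (uint16_t)(addr & 0xF);
}
```

A destination buffer above this limit, such as where the kernel is ultimately loaded, fails the check, which is why data has to pass through a buffer in low memory and be copied out afterwards.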
Comment 15 Colin Percival freebsd_committer 2018-08-05 19:06:08 UTC
> I recently found another area (16k) but it will need a bit of work but we will get there soon:)


Please let me know as soon as you have a potentially-working patch for this -- I'd like to benchmark it to see how it affects the boot time.