Bug 276720 - AMIs fail to boot in AWS EC2
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-29 20:30 UTC by mgrooms
Modified: 2024-01-30 21:29 UTC
CC List: 2 users

See Also:


Attachments
Stuck boot (97.31 KB, image/jpeg)
2024-01-29 20:30 UTC, mgrooms

Description mgrooms 2024-01-29 20:30:12 UTC
Created attachment 248065
Stuck boot

I can't seem to boot any version of FreeBSD in AWS EC2 at the moment. I tried the non-ZFS community AMIs for the following versions ...

FreeBSD 12
FreeBSD 13
FreeBSD 14

It seems to get to the point where the network stack is initialized, since I can ping the host, but it looks like it gets stuck shortly after that. I tried a t2.small and several different t3 instance sizes. All of them appear to exhibit the same behavior. I tried booting an Amazon Linux AMI with the same parameters and connected with ssh immediately after, so I'm quite certain it's not a user error.

I've attached a screenshot of the console for a t3.micro instance. It just sits in that stuck state without completing the boot process. When I attempt to attach the serial console, it just shows gibberish, as if there was a baud rate mismatch.
Comment 1 Colin Percival freebsd_committer freebsd_triage 2024-01-29 20:49:18 UTC
Can you tell me the exact AMI IDs you tried?
Comment 2 mgrooms 2024-01-30 15:08:49 UTC
Hi Colin. Thanks for the response. I'm running a bunch of FreeBSD in AWS, so thanks for all your hard work.

I left some of these new instances up for several minutes, knowing that the initial boot (growfs) can take some time to complete. After opening this bug report, I left one up and running overnight and the boot completed at some point. Here is the AMI used ...

ami-0059170a80e36d30f

When I look at the boot log, I see this in the output right after the console messages ...

Jan 29 20:56:27 freebsd freebsd-update[870]: src component not installed, skipped
Jan 29 20:56:27 freebsd freebsd-update[870]: Looking up aws.update.FreeBSD.org mirrors... 1 mirrors found.
Jan 29 21:00:57 freebsd freebsd-update[870]: Fetching public key from dualstack.aws.update.freebsd.org... failed.
Jan 29 21:00:57 freebsd freebsd-update[870]: No mirrors remaining, giving up.
Jan 29 21:00:57 freebsd freebsd-update[870]: 
Jan 29 21:00:57 freebsd freebsd-update[870]: This may be because upgrading from this platform (amd64)
Jan 29 21:00:57 freebsd freebsd-update[870]: or release (13.2-RELEASE) is unsupported by freebsd-update. Only
Jan 29 21:00:57 freebsd freebsd-update[870]: platforms with Tier 1 support can be upgraded by freebsd-update.
Jan 29 21:00:57 freebsd freebsd-update[870]: See https://www.freebsd.org/platforms/ for more info.
Jan 29 21:00:57 freebsd freebsd-update[870]: 
Jan 29 21:00:57 freebsd freebsd-update[870]: If unsupported, FreeBSD must be upgraded by source.

I think this caused a 4.5-minute delay because I booted the AMI with a security group that didn't allow access to aws.update.FreeBSD.org. That doesn't seem ideal :/
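
The 4.5-minute figure can be checked against the two freebsd-update timestamps in the log above: 20:56:27 for the mirror lookup and 21:00:57 for the failed key fetch. A minimal sketch of that arithmetic follows; `to_seconds` is a hypothetical helper name, not part of any FreeBSD tool.

```sh
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
# to_seconds is a hypothetical helper, used here only for illustration.
to_seconds() {
    # Split the argument on ':' into hours, minutes, and seconds.
    IFS=: read -r h m s <<EOF
$1
EOF
    # Strip a single leading zero so values like "08" are not read as octal.
    echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

start=$(to_seconds 20:56:27)   # "Looking up ... mirrors" log line
end=$(to_seconds 21:00:57)     # "Fetching public key ... failed" log line
delay=$(( end - start ))
echo "delay: ${delay}s"
```

The difference is 270 seconds, which matches the 4.5 minutes estimated above.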

In any case, I'm pretty sure that solves the mystery. Sorry for the noise.
Comment 3 Michael Osipov freebsd_committer freebsd_triage 2024-01-30 15:43:30 UTC
(In reply to mgrooms from comment #2)

It still shouldn't block for 4.5 min? No?
Comment 4 mgrooms 2024-01-30 16:56:37 UTC
This is likely just freebsd-update waiting to establish a connection. I believe the timeout is 60s by default, which would add up over multiple attempts. Maybe there is a way to shorten the TCP connect timeout in this case to prevent the boot delay?

It looks like freebsd-update uses fetch, which appears to honor the following ...

-T seconds, --timeout=seconds
    Set timeout value to seconds.  Overrides the environment
    variables FTP_TIMEOUT for FTP transfers or HTTP_TIMEOUT for
    HTTP transfers if set.

Maybe that means that the init script could export an HTTP_TIMEOUT value < 60?
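
As a rough sketch of that idea: fetch(1) reads HTTP_TIMEOUT from its environment, so a script can set the variable for a single command rather than for the whole boot. `run_with_http_timeout` is a hypothetical helper name, and the 5-second value is only illustrative.

```sh
#!/bin/sh
# Run a command with HTTP_TIMEOUT set only in that command's environment.
# run_with_http_timeout is a hypothetical helper, not part of FreeBSD.
run_with_http_timeout() {
    timeout_s="$1"
    shift
    env HTTP_TIMEOUT="$timeout_s" "$@"
}

# Show that the child process sees the variable.
run_with_http_timeout 5 sh -c 'echo "HTTP_TIMEOUT=$HTTP_TIMEOUT"'

# On a FreeBSD host, the same wrapper could be applied to the updater:
#   run_with_http_timeout 5 freebsd-update fetch
```

Whether the timeout applies per connection attempt or per transfer, and how many attempts freebsd-update makes, is not established here, so the total boot delay for a given value should be measured rather than assumed.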
Comment 5 Michael Osipov freebsd_committer freebsd_triage 2024-01-30 17:02:27 UTC
(In reply to mgrooms from comment #4)

Here it is:
./fetch.c:#define TIMEOUT               120

Two minutes by default is insane. I'd say that 30 s is more than enough...
Comment 6 commit-hook freebsd_committer freebsd_triage 2024-01-30 21:27:12 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/ports/commit/?id=39126a2d9768e46e0fe8795c815721d122adadad

commit 39126a2d9768e46e0fe8795c815721d122adadad
Author:     Colin Percival <cperciva@FreeBSD.org>
AuthorDate: 2024-01-30 21:20:11 +0000
Commit:     Colin Percival <cperciva@FreeBSD.org>
CommitDate: 2024-01-30 21:26:23 +0000

    sysutils/firstboot-freebsd-update: HTTP_TIMEOUT=5

    If a system with firstboot_freebsd_update_enable="YES" boots without
    access to the FreeBSD Update mirrors (e.g. an EC2 instance which has
    an EC2 security group settings which block outbound HTTP) the boot
    will hang until it times out.  The default timeout of 120 seconds is
    suboptimal.

    Run freebsd-update with a timeout of 5 seconds, and bump the package
    version to 1.4 to reflect this change.

    Reported by:    mgrooms@shrew.net
    PR:             276720
    Sponsored by:   https://www.patreon.com/cperciva

 sysutils/firstboot-freebsd-update/Makefile                          | 2 +-
 sysutils/firstboot-freebsd-update/files/firstboot_freebsd_update.in | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
Comment 7 Colin Percival freebsd_committer freebsd_triage 2024-01-30 21:28:10 UTC
Should be much faster now.
Comment 8 mgrooms 2024-01-30 21:29:45 UTC
Thank you!