Bug 257722 - Current RELEASE builds (11.4, 12.2, 13.0) give BTX boot crash, where 11.3 worked fine.
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 12.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Warner Losh
Keywords: regression
Reported: 2021-08-09 21:38 UTC by jwdevel
Modified: 2021-09-22 22:10 UTC

Attachments
console image for 11.4 BTX crash (115.97 KB, image/jpeg)
2021-08-09 21:48 UTC, jwdevel
console image for 12.2 BTX crash (190.18 KB, image/jpeg)
2021-08-09 21:48 UTC, jwdevel
console image for 13.0 BTX crash (71.21 KB, image/jpeg)
2021-08-09 21:48 UTC, jwdevel

Description jwdevel 2021-08-09 21:38:07 UTC
I am trying to upgrade my system from 11.3 to a supported release (amd64).
My initial plan was to go to 11.4, then to 12.2. But as you will see, none of the current supported releases can run the BTX loader. 11.3 seems to be the last release that works on my system.

See attached images for what the physical console shows for each of the 3 failure cases.

I am more than happy to provide more details if it will help fix this.
In the meantime, I am working on rolling back to 11.3.

11.4 problem:

My upgrade went smoothly until I rebooted after installing the new userland. Now, boot fails with a register dump and "BTX halted" every time.

I then tried creating a boot USB stick of 11.4, and indeed that has the same failure to boot on my system.

So, I don't think it is a problem with my upgrade steps. I think 11.4 boot code simply does not work with my system.

12.2 problem:

I thought perhaps that skipping 11.4 would work, so I made a 12.2-RELEASE USB boot stick too. This also failed to boot, but in a different way.

This time, when the BTX code runs, the screen fills with garbage ASCII, the machine beeps a few times, and then it halts. So, 12.x seems problematic, too.

13.0 problem:

Since this is the only other supported release, I tried it as well, again via USB boot stick.

This one also fails, but it simply hangs, with no output, after the list of BIOS drives.

----

Other notes:

* This is somewhat old hardware, but nothing that's unsupported, AFAIK. Again, happy to provide details, but I'm not sure what would be most useful.
Comment 1 jwdevel 2021-08-09 21:48:10 UTC
Created attachment 227062
console image for 11.4 BTX crash
Comment 2 jwdevel 2021-08-09 21:48:29 UTC
Created attachment 227063
console image for 12.2 BTX crash
Comment 3 jwdevel 2021-08-09 21:48:48 UTC
Created attachment 227064
console image for 13.0 BTX crash
Comment 4 jwdevel 2021-08-09 22:00:14 UTC
Out of curiosity, I have been looking at the BTX source to see what differences there are between 11.3 and 11.4, but interestingly, I find *none*!

Having both source trees checked out in SVN, I do:

    $ diff -du -r src_11.3-RELEASE/stand/i386/btx src_11.4-RELEASE/stand/i386/btx

and see no differences (other than version number comments).

But maybe I am looking in the wrong place? I am not sure if these are the BTX sources relevant to me. I notice it is in /i386, but I do not see any /amd64 directory, so am guessing that this is what would get run on my system....

It is the same situation when looking at src/usr.sbin/btxld/**

So, I am currently at a loss as to what could be causing the different behavior. But I am not at all experienced with the boot loader, so might just misunderstand something.
Comment 5 Warner Losh freebsd_committer 2021-08-09 23:45:53 UTC
[[ I sent this to the hackers@ thread ]]

The BIOS boot on amd64 is a 32-bit i386 binary, so all its source lives under stand/i386/btx... But it's not BTX that's dying, per se; rather, whatever BTX is loading or running is hitting a problem. The EIP is weird as well: 0x1011f002 is 269,611,010, which is well above where the ~500k loader should be and well below where I'd expect the BIOS routines to live...
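
As a quick check of that address arithmetic, plain shell can convert the EIP from the register dump to decimal and to whole MiB. This is only a sketch; the helper names `hex_to_dec` and `to_mib` are made up for illustration.

```shell
#!/bin/sh
# Convert a hex address (e.g. the EIP from a BTX register dump) to decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}

# Express a byte address in whole MiB (integer division, rounds down).
to_mib() {
    echo $(( $1 / 1048576 ))
}

eip_dec=$(hex_to_dec 0x1011f002)
echo "EIP = ${eip_dec} bytes, about $(to_mib "$eip_dec") MiB"
```

So the faulting address sits a little above the 256 MiB mark, nowhere near the low memory where a ~500k loader would be expected.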

Not much was merged between 11.3 and 11.4. Can you describe your system in terms of CPU, RAM, etc?

But the interface between the different parts of the boot system didn't change, so it's safe to use 11.3 gptboot pmbr with an 11.4 /boot/loader and kernel.

The bug posted shows what I think is gptboot dying.  At least I think so, the bug didn't say if it was UFS or ZFS...

My guess is that things are a bit bigger, and an allocation is failing and the boot code isn't coping as well as it should with a nice error, but it could always be some other rare, edge case bug that you're running into.

It may also make sense to check out the stable/13 branch to see where exactly on that branch it fails, which would give some clue about what the root cause might be. Doing this with 'git bisect' is likely easier to cope with than our mirrored svn branch, but I'm biased.
Comment 6 Warner Losh freebsd_committer 2021-08-09 23:53:15 UTC
Most of the boot loader stuff lives under stand (a few things pulled in from the kernel and libc as well)

% git diff --stat release/11.3.0 release/11.4.0 -- stand
 53 files changed, 556 insertions(+), 1128 deletions(-)
% git rev-list --count release/11.3.0..release/11.4.0 -- stand
39
% git rev-list --count release/11.3.0..release/11.4.0
1288

With git bisect, you're likely looking at 10 or 11 attempts between the two. No need to buildworld, since building in src/stand, installing from there and then doing the gpart to install the gptboot code would make this relatively quick to track down (though you'd likely need a USB stick to fall back booting to for the bad iterations).
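
The "10 or 11 attempts" estimate follows from bisection halving the candidate range each round. A rough sketch of that arithmetic in shell is below; the function name `bisect_steps` is made up here, and a real `git bisect` run can differ by a step depending on the shape of the history.

```shell
#!/bin/sh
# Worst-case number of halvings needed to narrow n candidates down to one,
# i.e. ceil(log2(n)), computed with integer arithmetic only.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

echo "all commits (1288): $(bisect_steps 1288) steps"
echo "stand/ only (39):   $(bisect_steps 39) steps"
```

Restricting the search to stand is possible with `git bisect start release/11.4.0 release/11.3.0 -- stand` (bad revision first, then good, then the path), but that only helps if the culprit really is under stand.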
Comment 7 jwdevel 2021-08-10 00:00:16 UTC
(In reply to Warner Losh from comment #5)

> Can you describe your system in terms of CPU, RAM, etc?

It's an old system, probably 10-15 years.
It boots from MBR, not GPT.
Also, it boots from a hardware RAID card ('twe' driver).

This is the boot disk, from 'gpart list':

	Geom name: twed0
	modified: false
	state: OK
	fwheads: 255
	fwsectors: 63
	last: 488395119
	first: 63
	entries: 4
	scheme: MBR
	Providers:
	1. Name: twed0s1
   	   Mediasize: 250058221056 (233G)
   	   Sectorsize: 512
   	   Stripesize: 0
   	   Stripeoffset: 64512
   	   Mode: r2w2e6
   	   efimedia: HD(1,MBR,0x90909090,0x7e,0x1d1c50d3)
   	   attrib: active
   	   rawtype: 165
   	   length: 250058221056
   	   offset: 64512
   	   type: freebsd
   	   index: 1
   	   end: 488395088
   	   start: 126
	Consumers:
	1. Name: twed0
   	   Mediasize: 250058301440 (233G)
   	   Sectorsize: 512
   	   Mode: r2w2e8

	Geom name: twed0s1
	modified: false
	state: OK
	fwheads: 255
	fwsectors: 63
	last: 488394962
	first: 0
	entries: 8
	scheme: BSD
	Providers:
	1. Name: twed0s1a
   	   Mediasize: 4294967296 (4.0G)
   	   Sectorsize: 512
   	   Stripesize: 0
   	   Stripeoffset: 65536
   	   Mode: r1w1e2
   	   rawtype: 7
   	   length: 4294967296
   	   offset: 1024
   	   type: freebsd-ufs
   	   index: 1
   	   end: 8388609
   	   start: 2
	2. Name: twed0s1b
   	   Mediasize: 34359738368 (32G)
   	   Sectorsize: 512
   	   Stripesize: 0
   	   Stripeoffset: 65536
   	   Mode: r0w0e0
   	   rawtype: 1
   	   length: 34359738368
   	   offset: 4294968320
   	   type: freebsd-swap
   	   index: 2
   	   end: 75497473
   	   start: 8388610
	3. Name: twed0s1d
   	   Mediasize: 17179869184 (16G)
   	   Sectorsize: 512
   	   Stripesize: 0
   	   Stripeoffset: 65536
   	   Mode: r0w0e0
   	   rawtype: 7
   	   length: 17179869184
   	   offset: 38654706688
   	   type: freebsd-ufs
   	   index: 4
   	   end: 109051905
   	   start: 75497474
	4. Name: twed0s1e
   	   Mediasize: 8589934592 (8.0G)
   	   Sectorsize: 512
   	   Stripesize: 0
   	   Stripeoffset: 65536
   	   Mode: r0w0e0
   	   rawtype: 7
   	   length: 8589934592
   	   offset: 55834575872
   	   type: freebsd-ufs
   	   index: 5
   	   end: 125829121
   	   start: 109051906
	5. Name: twed0s1f
   	   Mediasize: 185633710080 (173G)
   	   Sectorsize: 512
   	   Stripesize: 0
   	   Stripeoffset: 65536
   	   Mode: r1w1e2
   	   rawtype: 7
   	   length: 185633710080
   	   offset: 64424510464
   	   type: freebsd-ufs
   	   index: 6
   	   end: 488394961
   	   start: 125829122
	Consumers:
	1. Name: twed0s1
   	   Mediasize: 250058221056 (233G)
   	   Sectorsize: 512
   	   Stripesize: 0
   	   Stripeoffset: 64512
   	   Mode: r2w2e6
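
For anyone cross-checking the listing, the numbers for twed0s1 are internally consistent. The sketch below just redoes the arithmetic in shell with values copied from the output above; the variable names are arbitrary, and it assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Values copied from the twed0s1 provider in the gpart listing.
sectorsize=512
start=126
length=250058221056

offset=$(( start * sectorsize ))     # byte offset of the slice
sectors=$(( length / sectorsize ))   # slice size in sectors
end=$(( start + sectors - 1 ))       # last sector of the slice

echo "offset=${offset} sectors=${sectors} end=${end}"

# The efimedia field carries the same start and size, written in hex.
printf 'efimedia start=%d size=%d\n' 0x7e 0x1d1c50d3
```

The computed offset and end match the "offset: 64512" and "end: 488395088" lines, so the slice layout itself looks sane.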

Other system info, via 'dmesg':

	CPU: AMD Phenom(tm) 9600 Quad-Core Processor (2311.66-MHz K8-class CPU)
	...
	real memory  = 8589934592 (8192 MB)
	avail memory = 8001925120 (7631 MB)

> so it's safe to use 11.3 gptboot pmbr with an 11.4 /boot/loader and kernel.

If I understand, with MBR booting, I do *not* want pmbr. Am I right on that point?

Also, while I'm here: is there a good way to put my system back in a bootable state, just in the meantime?
I still have the old 11.3 source tree saved. Is there an easy way to install that boot loader code, to get it running (even though it's otherwise 11.4 userland + kernel) ?

I feel like it's probably doable, but not sure the actual commands to use. Maybe 'gpart bootcode -b XXX'? Maybe 'bsdlabel -B -b XXX'?
Comment 8 Warner Losh freebsd_committer 2021-08-10 00:10:19 UTC
Ah, MBR does use mbr to load boot1/boot2 which loads /boot/loader. Somewhere in that, before /boot/loader announces itself, we die.

Also, MBR is less well tested than GPT, alas, since most newer systems have GPT...

I think that `gpart bootcode -b /boot/boot twed0s1` is the most relevant. Substitute /boot/boot with your saved copy of the 11.3 boot blocks.

If that doesn't solve it, you'll need to restore boot0 as well with `gpart bootcode -b /boot/boot0 twed0`

If that doesn't solve it, then you'll need to copy /boot/loader from 11.3 onto your root partition as /boot/loader.
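
Put together, the three fallback steps might look like the sketch below. It only prints the commands (a dry run) unless RUN=1 is set. SAVED is a hypothetical directory holding copies of the 11.3 boot, boot0, and loader files, the helper names are invented, and the device names are the ones from this report.

```shell
#!/bin/sh
# SAVED is a hypothetical location for the known-good 11.3 files.
SAVED=${SAVED:-/path/to/11.3/boot}
DISK=${DISK:-twed0}        # disk from this report
SLICE=${SLICE:-twed0s1}    # FreeBSD slice from this report

# Print each command; execute it only when RUN=1 is set.
run() {
    echo "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Apply one step at a time, rebooting in between to see if it helped.
restore_step() {
    case "$1" in
        1) run gpart bootcode -b "$SAVED/boot" "$SLICE" ;;   # boot1/boot2
        2) run gpart bootcode -b "$SAVED/boot0" "$DISK" ;;   # boot0 in the MBR
        3) run cp "$SAVED/loader" /boot/loader ;;            # the loader itself
        *) echo "usage: restore_step 1|2|3" >&2; return 1 ;;
    esac
}

# Dry run: show what each step would do.
for step in 1 2 3; do
    restore_step "$step"
done
```

Running it once without RUN=1 is a cheap way to check the paths and device names before anything is written to disk.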
Comment 9 Warner Losh freebsd_committer 2021-08-10 00:21:48 UTC
So with 8GB we've ruled out the "your machine had only 256MB of memory and that's too small now" class of bugs.

I forgot to mention that.

Also, looking at the output on 11.4, we're dying inside of the boot loader, it seems after we've initialized the disks, so we're already loaded. I strongly suspect that boot0 and boot are going to turn out to not make a difference.
Comment 10 jwdevel 2021-08-10 00:50:27 UTC
I was having trouble locating the actual boot blocks for my old 11.3 source tree. I'm not sure if that is created on-the-fly somehow? I did not see a file called 'boot' in that tree, at any rate.

Instead, after booting from my stock 11.3 USB stick, I copied those boot blocks:

    # gpart bootcode -b /boot/boot twed0s1

(my assumption was that if the USB stick booted with that successfully, it should be okay to copy from there)

But after reboot to the main system, I got a plain blank screen (with blinking cursor). No hex dump or anything. So ... something changed, but doesn't seem like for the better.

I tried continuing with the advice, also restoring boot0:

    gpart bootcode -b /boot/boot0 twed0

Rebooted into main system again, and get the same black screen w/blinking cursor.

At this point, I am worried that I am making things worse, so not sure if I should proceed with copying the /boot/loader, too?

Question: do you know where, in the built source tree, the boot blocks are located? I'd like to at least get back to the broken 11.4 version which shows the BTX hex dump...
Comment 11 jwdevel 2021-08-10 01:10:21 UTC
(In reply to jwdevel from comment #10)

> where, in the built source tree...

Oh, I was being silly. Need to look in /usr/obj/... not /usr/src/...

I think the file is: /usr/obj/usr/src/sys/boot/i386/boot2/boot
Comment 12 jwdevel 2021-08-10 01:19:40 UTC
Well crud.

I tried restoring the old boot blocks:

    # gpart bootcode -b mounted-usrfs/obj/usr/src/sys/boot/i386/boot2/boot twed0s1
    # gpart bootcode -b mounted-usrfs/obj/usr/src/sys/boot/i386/boot0/boot0 twed0

(where 'mounted-usrfs' is a dir where I had temporarily mounted my real system's /usr, from the LiveCD).

But upon reboot, it is still just a plain blank screen with a blinking cursor.
No 'BTX halted' message, or anything.

My confusion grows (:
Comment 13 jwdevel 2021-08-10 01:23:53 UTC
From another thread, I found this quote:

"the MBR block is the first 512 byte block on the disk and it's actually in two parts, the code and then the MBR partition table. The bootcode is the first 446 bytes out of the 512 and the rest is reserved for the partition table. With this in mind it should be obvious that if you want to replace only the code part of the MBR block you have to know to write only 446 bytes and not the full 512 or you will mess up the partition table."

I wonder if this is what's happening here... I don't know how 'gpart bootcode' actually works; is it smart enough to retain the partition table correctly?
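
The 446-byte rule from the quote can be tried safely on scratch files before going near a real disk. The sketch below builds a fake 512-byte MBR image and replaces only its code area with dd, leaving the last 66 bytes (the 64-byte partition table plus the 2-byte signature) untouched. All file names are throwaway temp files, nothing here touches a device, and it is not a description of what 'gpart bootcode' does internally.

```shell
#!/bin/sh
tmp=$(mktemp -d)
mbr="$tmp/mbr.img"        # stand-in for the first sector of a disk
code="$tmp/newcode.bin"   # stand-in for replacement boot code

# Fake MBR: 512 bytes of 'A'. Fake new code: 512 bytes of 'B'.
dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\0' 'A' > "$mbr"
dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\0' 'B' > "$code"

# Overwrite only the first 446 bytes; conv=notrunc keeps the rest of the file.
dd if="$code" of="$mbr" bs=446 count=1 conv=notrunc 2>/dev/null

# Read back the code area and the table+signature area separately.
head_bytes=$(dd if="$mbr" bs=446 count=1 2>/dev/null)
tail_bytes=$(tail -c 66 "$mbr")
size=$(wc -c < "$mbr" | tr -d ' ')

echo "size: $size"
echo "code area starts with: $(printf '%s' "$head_bytes" | cut -c1-8)"
echo "table area starts with: $(printf '%s' "$tail_bytes" | cut -c1-8)"

rm -rf "$tmp"
```

Writing the full 512 bytes instead (bs=512) would replace the fake partition table as well, which is the failure mode the quote warns about.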

researching more...
Comment 14 Warner Losh freebsd_committer 2021-08-10 01:48:23 UTC
(In reply to jwdevel from comment #13)
> I wonder if this is what's happening here... I don't know how 'gpart bootcode'
> actually works; is it smart enough to retain the partition table correctly?

It should. I'm quite confused by your results. Here's a snippet from the test script I've used in the past to test the boot loader...

# For MBR, we have lots of choices, but select mbr, boot0 has issues with UEFI
mbr0=${srcroot}/boot/mbr
mbr2=${srcroot}/boot/boot

gpart bootcode -b ${mbr0} ${dev}
s=$(find-freebsd-part ${dev})
gpart bootcode -p ${mbr2} ${dev}s${s}

is what I've used in my regression scripts, but I've not run them recently enough to know how well they work. I think well, but I'm now unsure of what those issues were, so you might try that. Only difference is that I've used the installed result rather than files from the object tree.
Comment 15 jwdevel 2021-08-10 01:59:31 UTC
> # For MBR, we have lots of choices, but select mbr, boot0 has issues with UEFI
> mbr0=${srcroot}/boot/mbr

What is the difference between /boot/mbr and /boot/boot0 ?

I have seen various references, but nothing concrete.
Are they just alternate choices for early-stage boot code? The handbook just talks about boot0, for instance.
Comment 16 jwdevel 2021-08-10 03:01:55 UTC
Some progress!

I got the 11.4 system booting again.
For whatever reason, the default boot disk my BIOS was choosing was wrong. I'm a little confused by that, since I did not change anything in the BIOS.
Maybe the upgrade set some flag on some disk somewhere that caused the BIOS to change things? I have no idea; it is an excruciatingly old BIOS, at any rate.

Anyway, now that it is properly booting from twed0, which is where those boot blocks are located, it is indeed as Warner describes: I still got the "BTX halted" error until I updated /boot/loader on the root partition.

Once I copied that from the 11.3 boot stick, then the whole normal system boots again.

Remaining work:
* figure out what git revision introduced the bad loader code.

So, the next question:

For bisecting to find the broken revision, I believe what I need to do is:
1. sync the src to test
2. from inside '/usr/src/stand', run 'make'
3. install '/usr/obj/usr/src/sys/boot/i386/loader/loader', by copying it to the root partition (since that seems to be the problem file)

Right?
Comment 17 Warner Losh freebsd_committer 2021-08-10 04:22:26 UTC
(In reply to jwdevel from comment #15)
> What is the difference between /boot/mbr and /boot/boot0 ?

mbr boots the first active ufs partition automatically. boot0 will prompt for which partition to boot, maybe a different disk.
Comment 18 Warner Losh freebsd_committer 2021-08-10 04:25:45 UTC
> For bisecting to find the broken revision, I believe what I need to do is:

Yes. Except step 3 I'd just do a 'make install' as root from src/stand/i386/loader. I'd also save a working /boot/loader as /boot/loader.good. When you get a bad loader, you can break into the boot2 process and type /boot/loader.good to boot the working one, but you have to be quick about it: I think you have ~1s after the / is printed.
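
Keeping a known-good loader next to the one under test could be scripted roughly as below. `save_good_loader` is a made-up helper; it takes the boot directory as an argument so it can be tried on a scratch directory first (on the real system the argument would be /boot).

```shell
#!/bin/sh
# Copy <bootdir>/loader to <bootdir>/loader.good, but never overwrite an
# existing loader.good, since it may be the only working copy.
save_good_loader() {
    bootdir=$1
    if [ ! -f "$bootdir/loader" ]; then
        echo "no loader in $bootdir" >&2
        return 1
    fi
    if [ -e "$bootdir/loader.good" ]; then
        echo "$bootdir/loader.good already exists, leaving it alone"
        return 0
    fi
    cp -p "$bootdir/loader" "$bootdir/loader.good"
}

# Demonstration on a scratch directory rather than the real /boot.
scratch=$(mktemp -d)
echo "pretend loader" > "$scratch/loader"
save_good_loader "$scratch"
ls "$scratch"
```

Refusing to overwrite an existing loader.good is a deliberate choice here: during a bisect the current /boot/loader is often a broken build, and copying it over the good one would remove the fallback.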

I fear you'll find one of my large MFCs is to blame and sorting that out may be tricky, but no sense fretting over possible future difficulty before we get to that point, eh?
Comment 19 jwdevel 2021-08-10 06:25:18 UTC
Been poking at this a bit, but so far the "make build" in /usr/src/stand is not succeeding.

I am sure this is some novice thing, but I'm unfamiliar with the details of the FreeBSD build system (reading build(7) a bit, though).

The current situation:

* I made a git clone into /usr/src and checked out release/11.3.0 (ok)
* I still have /usr/obj leftover from my previous SVN-based build
* I 'cd stand' and 'make', and get:

	ld -m elf_i386_fbsd -static -N --gc-sections -Ttext 0x2000 -o boot2.out /usr/obj/usr/src/stand/i386/btx/lib/crt0.o boot2.o sio.o
	boot2.o: In function `fsread_size':
	/usr/src/stand/libsa/ufsread.c:234: undefined reference to `__ashldi3'
	/usr/src/stand/libsa/ufsread.c:270: undefined reference to `__ashldi3'
	/usr/src/stand/libsa/ufsread.c:295: undefined reference to `__ashldi3'
	/usr/src/stand/libsa/ufsread.c:297: undefined reference to `__ashldi3'
	*** Error code 1

I know that the OS build setup can be complex, sometimes needing a newer compiler than is available on the base system (if you're upgrading to something newer), etc.
This seems possibly related to such issues...

It seems that the crt0 from the /usr/obj (my 11.4 build leftovers) is not usable with this 11.3 codebase.

Maybe I really need to do a "full" build as I am bisecting to find the bad commit?
A full compile (like 'make buildworld') can take 12+ hours on this old system, so that will slow me down a bit.

Maybe there is a better way?
I'm researching/learning, but any advice appreciated.
Comment 20 jwdevel 2021-08-10 07:09:44 UTC
I tried moving my old /usr/obj out of the way, and building clean (inside /usr/src/stand), and got the same '__ashldi3' error.

I wonder if the issue is that I have clang 10.0 on this 11.4 system, but the 11.3 source code is assuming clang 8 (I believe that's what was used in 11.3)?

I *can* successfully build when I 'git checkout release/11.4.0'. Of course, this will produce a non-working loader for my system.

I'll try to do some amount of bisecting, where the 11.3 "good" end of the range is not buildable. Not sure if I'll hit a working loader (yay) or a non-buildable source tree (boo) first, though.
Comment 21 jwdevel 2021-08-10 07:12:27 UTC
(In reply to Warner Losh from comment #18)

> Except step 3 I'd just do a 'make install' as root from src/stand/i386/loader.

I have not gotten this to work. Even on a 11.4 tree that builds ok ('make' in /usr/src/stand succeeds).

I get:

    make: don't know how to make /usr/src/stand/i386/btx/lib/crt0.o. Stop
    make: stopped in /usr/src/stand/i386/loader

Not sure why it's looking for that file there. It *does* exist in /usr/obj/usr/src/stand/i386/btx/lib/crt0.o

I tried setting MAKEOBJDIRPREFIX=/usr/obj, but it made no difference.

Is doing 'make install' from that lower dir (.../stand/i386/loader) supported?
Comment 22 jwdevel 2021-08-10 08:51:56 UTC
Results from bisecting:

I narrowed it down to these 8 changes, down from the full release/11.3.0..release/11.4.0 range (limited to /stand/ directory):

	ce18c8bb63650bdf14295fb17f3b6ddf88d3f9a3
	60377393248e8128d70c693f68bf7609f023c3bd
	faf3e20aeee5829fd9f2ad69400ec75fcd1b8965
	9bb5b05ea4eddd2b489bcdb52842b29528a06a2d
	ca75fe1ec29097bca9f3a78c550f7d30aad248e4
	3610f4c0fa2082e0163a5ec012e45b7822c50b75
	d1718b0f2ebeae1303b62e9006576224a8c99f8e
	a1c9c3ef5558d2e367059686edcc8f918cb65ab9

Some notes on the process I used, and why I couldn't go back further, so far:

I built from /usr/src/stand, with:
    # rm -r /usr/obj && make obj && make

Then, I copied the result into place from /usr/obj/usr/src/stand/i386 :
    # cp loader_4th/loader_4th /boot/loader && chmod +x /boot/loader && chmod -w /boot/loader

Note: I didn't copy any of the other files, like the ".4th" sources. I think that's okay?

Anyway, using that process, the oldest I can successfully build as-is is f05451cb62e395f249ab4a52eab3f761244d1dd2, and that gives the "BTX halted" boot failure for me.

Trying to build one revision earlier (4e0475d1f82f863437f28f92d4799012b8f56f51) gives me:

    /usr/src/stand/i386/gptboot/gptldr.S:141:3: error: value of 36878 is too large for field of 2 bytes.
      jmp MEM_JMP # Start BTX
      ^
    *** Error code 1

However, that f0545... revision has just the fix for that issue, so I backported that one-line fix for continuing bisecting to older revs.
Using that, I could build all the way back to 275127189cfb002fd13e236f57b2a05e51cd750e, which again has the boot failure issue.

So, that leaves revisions between release/11.3.0..275127 as candidates (which is the list given up above).

The reason I can't build further back is because in ce18c8bb, the build fails with:

	cc: error: unknown argument: '-fformat-extensions'

Which I think boils down to my using the 11.4 system compiler trying to build old 11.3(ish) code.

Perhaps I can set up a VM or something to build the older code, and try to narrow this down further.
Or perhaps do some more dicey backporting of changes to make it compile again...
Comment 23 jwdevel 2021-08-10 09:12:43 UTC
Okay, narrowed this down a little more.
Now it's one of these 5 commits:

	9bb5b05ea4eddd2b489bcdb52842b29528a06a2d
	ca75fe1ec29097bca9f3a78c550f7d30aad248e4
	3610f4c0fa2082e0163a5ec012e45b7822c50b75
	d1718b0f2ebeae1303b62e9006576224a8c99f8e
	a1c9c3ef5558d2e367059686edcc8f918cb65ab9


Was able to get past the "unknown argument: '-fformat-extensions'" error by backporting a fix to mk/bsd.compiler (xxxx)

My money is on 9bb5b0... being the problematic commit. The earliest 3 are zfs-specific (not relevant for me), the next one is "mostly a nop" according to the notes, and the remaining one (9bb5b0) is a big combination of changes.
Comment 24 Graham Perrin 2021-08-13 06:24:11 UTC
(In reply to jwdevel from comment #3)

> console image for 13.0 BTX crash

Bug 255072 - boot (legacy): no progress beyond 'BIOS DRIVE D: is disk1'

See the photograph there. Maybe the same bug?
Comment 25 Warner Losh freebsd_committer 2021-08-13 13:50:18 UTC
(In reply to Graham Perrin from comment #24)
They both are crashes, but it's unclear if they are the same bug...
Comment 26 Warner Losh freebsd_committer 2021-08-14 01:23:41 UTC
(In reply to jwdevel from comment #23)
The last two can be eliminated because one undoes the other.
The others aren't looking like great candidates either, but you can eliminate the possibility it is ZFS related by MK_LOADER_ZFS=no if you land on one of those commits and really think it is it.
Comment 27 jwdevel 2021-08-14 02:57:23 UTC
(In reply to Warner Losh from comment #26)

My current trouble in going further is that I can't build these old commits on my new (11.4) system toolchain. I will look into using a VM or so, though.

Just to summarize where I'm at:

I can (with small patches) build as far back as faf3e20..., and that loader fails for me.
a1c9c3ef... is just the first rev after 11.3.0 (2dfc50997e...), the release which works successfully for me (based on memory stick boot testing)

So, assuming the issue is coming from something in stand/, those 5 seem like the only candidates...

What do you think the odds are that this is caused by something outside of stand/ though?
Comment 28 Warner Losh freebsd_committer 2021-08-14 03:53:46 UTC
You can install 11.3 into a jail and build there (but you'll have to hop to the host to install).

I'd say it's even money at the moment if it's inside or outside of stand.

I'd love to see if disabling ZFS has any effect...

I'd forgotten the build issues can be tricky.

Finally, this is a default build, right? Nothing in make.conf or src.conf, right?
Comment 29 jwdevel 2021-08-14 17:56:06 UTC
(In reply to Warner Losh from comment #28)
> Finally, this is a default build, right? Nothing in make.conf or src.conf, right?

No src.conf. I do have a few things set in make.conf, though I doubt they matter. Here are the contents of that file:

    # enable SASL for Sendmail
    SENDMAIL_CFLAGS=-I/usr/local/include/sasl -DSASL
    SENDMAIL_LDADD=/usr/local/lib/libsasl2.so
    
    DEFAULT_VERSIONS=python=3.8 python2=2.7 python3=3.8
    DEFAULT_VERSIONS+= apache=2.4
    DEFAULT_VERSIONS+= ssl=openssl
    DEFAULT_VERSIONS+= samba=4.13
    DEFAULT_VERSIONS+= bdb=5


> I'd love to see if disabling ZFS has any effect...
>> you can eliminate the possibility it is ZFS related by MK_LOADER_ZFS=no

Where do I put that option? In src.conf? I'm not familiar with it and cursory searching hasn't made it obvious to me...

Also, just to be clear: you're saying I should use that option on a failing commit (eg: faf3e20...) and see if it works, right?
Comment 30 Graham Perrin 2021-09-05 16:15:20 UTC
Glancing at the three attachments (alone), is there any overlap with bug 255072?
Comment 31 chris.torek 2021-09-22 15:21:51 UTC
I seem to have hit this same problem on my old ASUS box.  The EIP when it crashes is different, though.  Restoring the 11.3 loader gets things booting, but the kernel then crashes pretty early on.  More debugging later...
Comment 32 chris.torek 2021-09-22 15:47:17 UTC
(In reply to chris.torek from comment #31)

The kernel crash occurs when kldload-ing /boot/modules/radeonkms.o (which does seem to be updated by freebsd-update, but clearly doesn't quite work).

Additional note: This particular system won't boot off a USB stick (I installed FreeBSD ages ago from a DVD image).  I might have to futz with the BIOS; I wonder if this is related to the boot issues.
Comment 33 jwdevel 2021-09-22 22:01:31 UTC
(In reply to chris.torek from comment #32)

Does your system boot via MBR (no UEFI)?
That seems to be one common thread between similar reports I've seen so far; is it true for you as well?
Comment 34 chris.torek 2021-09-22 22:10:53 UTC
(In reply to jwdevel from comment #33)
Yes, it's an MBR-based system.