Bug 258467 - fails to boot on riscv: loader.efi Unhandled exception: Load access fault
Summary: fails to boot on riscv: loader.efi Unhandled exception: Load access fault
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: riscv
Version: CURRENT
Hardware: riscv Any
Importance: --- Affects Some People
Assignee: freebsd-riscv (Team)
Reported: 2021-09-13 01:22 UTC by Klaus Küchemann
Modified: 2022-02-03 15:51 UTC
CC List: 3 users
Description Klaus Küchemann 2021-09-13 01:22:51 UTC
U-Boot 2021.07 (Jul 26 2021 - 02:27:08 +0200)
....

root@:~ # uname -paKU
FreeBSD  14.0-CURRENT FreeBSD 14.0-CURRENT #5 main-n249326-b864b67a0d19: Mon Sep 13 00:22:12 UTC 2021     root@generic:/usr/obj/usr/src/riscv.riscv64/sys/GENERIC-NODEBUG  riscv riscv64 1400033 1400033
--
(cross-compiled on aarch64 into an NFS directory)
---------------------reset-loop with current loader.efi-----------------------------------
Consoles: EFI console  
    Reading loader env vars from /efi/freebsd/loader.env
FreeBSD/riscv EFI loader, Revision 1.1
(Tue Aug 17 02:06:19 UTC 2021 root@generic)

   Command line arguments: l
   Image base: 0xfe5dd000
   EFI version: 2.80
   EFI Firmware: Das U-Boot (rev 8225.1792)
   Console: comconsole (0)
   Load Path: /boot\loader.efi
   Load Device: /VenHw(e61d73b9-a384-4acc-aeab-82e828f3628b)/MAC(70b3d592f1d5,1)
Setting currdev to net0:
ethernet@10090000: PHY present at 0
ethernet@10090000: Starting autonegotiation...
ethernet@10090000: Autonegotiation complete
ethernet@10090000: link up, 1000Mbps full-duplex (lpa: 0x3800)
net0: cannot set rx. filters (status=3)
Unhandled exception: Load access fault
EPC: 00000000fe62ce6a RA: 00000000fe62b8be TVAL: ffffffd200000000
EPC: 000000007e8b0e6a RA: 000000007e8af8be reloc adjusted
UEFI image [0x00000000fe5dd000:0x00000000fe7386bf] pc=0x4fe6a '/boot\loader.efi'

resetting ...     
----

When I overwrite loader.efi with a backup of the previous loader.efi, the machine boots again.
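To relate the crash to the source, the absolute EPC can be converted into an offset inside loader.efi by subtracting the image base printed on the `UEFI image [...]` line; for this log that gives the `pc=0x4fe6a` shown there (and 0x4e8be for RA). The `EPC ... reloc adjusted` line appears to subtract U-Boot's own relocation offset (the same 0x7fd7c000 delta for both EPC and RA), so it is presumably not the useful figure for a fault inside loader.efi. The minimal C sketch below only does that subtraction; `image_offset()` and the `LOG_*` macros are hypothetical names, and resolving the offset to a symbol (for example with addr2line or objdump against an unstripped build) assumes you also account for that build's link address.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the crash log above. */
#define LOG_EPC        0xfe62ce6aULL  /* EPC (absolute) */
#define LOG_RA         0xfe62b8beULL  /* RA (absolute) */
#define LOG_IMAGE_BASE 0xfe5dd000ULL  /* start of the UEFI image range */
#define LOG_IMAGE_END  0xfe7386bfULL  /* end of the UEFI image range */

/*
 * Hypothetical helper: turn an absolute address reported by U-Boot's
 * trap handler into an offset inside the loaded UEFI image.
 */
uint64_t
image_offset(uint64_t addr, uint64_t image_base, uint64_t image_end)
{
	/* Only meaningful if the address lies inside the image range. */
	assert(addr >= image_base && addr <= image_end);
	return addr - image_base;
}
```

Calling `image_offset(LOG_EPC, LOG_IMAGE_BASE, LOG_IMAGE_END)` yields 0x4fe6a, which matches the `pc=` value U-Boot printed, so the faulting instruction is at that offset in this particular loader.efi build.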
Comment 1 Mitchell Horne 2021-09-13 13:57:30 UTC
What is the age of the previous loader.efi? I'd like to establish the time range so we might narrow down the list of suspect commits.
Comment 2 Klaus Küchemann 2021-09-13 14:15:54 UTC
(In reply to Mitchell Horne from comment #1)
Hi,
I did a few git pull/buildworld/buildkernel cycles over the last few days, and the breakage lines up closely with https://reviews.freebsd.org/D30848 (a guess; it seems to be a major change).
The last "good" loader.efi is approximately 3 days old, so the new broken one is from a fresh build.
Comment 3 Leandro Lupori 2021-09-13 14:41:15 UTC
I'm also experiencing boot failures, but on PowerPC64.

I can boot if I revert b4cb3fe0e39a/D30848.

Boot error messages:

>> FreeBSD/powerpc Open Firmware boot block
   Boot path:   /vdevice/v-scsi@71000002/disk@8100000000000000
   Boot loader: /boot/loader
   Boot volume:   /vdevice/v-scsi@71000002/disk@8100000000000000:2
Consoles: Open Firmware console  

FreeBSD/powerpc64 Open Firmware loader, Revision 0.1
(Mon Sep 13 10:45:57 -03 2021 luporl@p9c)
Memory: 16777216KB
Booted from: /vdevice/v-scsi@71000002/disk@8100000000000000

 

( 700 ) Program Exception [ 2c50f60 ]


    R0 .. R7           R8 .. R15         R16 .. R23         R24 .. R31
0000000002c38e88   0000000002c556f8   000000000003a407   0000000002c4f010   
0000000002c50f30   0000000000000000   00000000000433a9   0000000002c4f014   
0000000000000000   0000000000000000   000000000003a612   0000000003450000   
0000000002c50f40   0000000002c55680   0000000000040000   0000000000000000   
0000000002c55640   0000000000000380   0000000000000001   0000000002c55640   
0000000002c50f60   0000000000000000   0000000000000003   0000000000000000   
0000000000000000   000000000003a3fb   0000000000040000   0000000002c556c0   
0000000000000010   0000000000000008   0000000000000000   000000000003a62d   

    CR / XER           LR / CTR          SRR0 / SRR1        DAR / DSISR
        84000c00   0000000002c033ac   0000000000000000   0000000000000000   
0000000000000000   0000000000000000   0000000000083000           00000000   


2 >
Comment 4 Leandro Lupori 2021-09-20 18:03:01 UTC
For PowerPC64, this change fixed the boot failure: https://reviews.freebsd.org/D32027. The problem was archsw not being initialized when mount was called. Maybe the issue with RISC-V is similar.
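The failure mode described here is a generic ordering hazard: machine-independent loader code calls through a table of function pointers that the platform code is expected to fill in first, and if a call happens earlier, the loader jumps through a NULL (or stale) pointer. The SRR0 value of 0 in the PowerPC64 register dump appears consistent with such a jump. The C sketch below illustrates only that ordering issue under simplifying assumptions; `struct arch_switch_sketch`, `mount_hook`, `init_archsw()`, `mi_mount()` and `loader_main_sketch()` are hypothetical names, not the loader's actual `archsw` definition, and the NULL check in `mi_mount()` is an illustrative addition, not something taken from D32027.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the loader's architecture switch. */
struct arch_switch_sketch {
	int (*mount_hook)(const char *dev);
};

/* Static storage is zero-initialized, so the hook starts out NULL. */
static struct arch_switch_sketch archsw_sketch;

/* Hypothetical platform implementation of the hook. */
static int
platform_mount_hook(const char *dev)
{
	return (dev != NULL) ? 0 : EINVAL;
}

/* Platform setup: must run before any MI code uses the table. */
void
init_archsw(void)
{
	archsw_sketch.mount_hook = platform_mount_hook;
}

/* MI caller: refuse to jump through an unset hook. */
int
mi_mount(const char *dev)
{
	if (archsw_sketch.mount_hook == NULL)
		return ENXIO;	/* archsw not initialized yet */
	return archsw_sketch.mount_hook(dev);
}

/* Fixed ordering: fill in the table first, then mount. */
int
loader_main_sketch(void)
{
	init_archsw();
	return mi_mount("net0:");
}
```

In this sketch, calling `mi_mount()` before `init_archsw()` returns `ENXIO`; without the check it would call through a NULL function pointer. Moving the initialization ahead of the first use, as suggested for the archsw block, removes the problem regardless of the guard.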
Comment 5 Klaus Küchemann 2021-09-20 19:30:32 UTC
Well, this machine (HiFive Unleashed) is unusual in that accessing certain protected memory regions from S-mode faults (https://reviews.freebsd.org/D28471). For example, I can't reload the DTB from the loader prompt, so currently all "ofw-parameter" changes have to be hacked directly into U-Boot.
So my question is whether this is a known RISC-V issue or a memory violation specific to this machine (I don't have another (QEMU) RISC-V instance available).

(In reply to Leandro Lupori from comment #4)
Thanks for the hint.
Since there currently seems to be no stand/riscv directory,
do you know which main.c file I should modify for your suggested change?
Comment 6 Leandro Lupori 2021-09-21 16:17:01 UTC
(In reply to Klaus Küchemann from comment #5)
If RISC-V doesn't set archsw by itself, but uses common MI code instead, then it's probably a different issue.
Comment 7 Klaus Küchemann 2021-09-21 16:45:51 UTC
(In reply to Leandro Lupori from comment #6)
Yeah, but
I'll try your approach of moving the archsw block before defsw in /stand/uboot/common/main.c tonight,
because the previous loader.efi worked
(while aarch64 has no problems with the new loader changes).
The problem with this machine is:
Physical memory chunk(s):
  0x80000000 - 0x27fffffff,  8192 MB (2097152 pages)
Excluded memory regions:
  0x80000000 - 0x8001ffff,     0 MB (     32 pages) NoAlloc NoDump
  0xf6600000 - 0xf70ecfff,    10 MB (   2797 pages) NoAlloc 
Found 4 CPUs in the device tree
Copyright (c) 1992-2021 The FreeBSD Project.
----
real memory  = 8589934592 (8192 MB)
Physical memory chunk(s):
0x0000000080020000 - 0x00000000f65fffff, 1985871872 bytes (484832 pages)
0x00000000f70ed000 - 0x000000027348dfff, 6379147264 bytes (1557409 pages)
avail memory = 8337354752 (7951 MB)
-
Every attempt (after U-Boot) to access those regions results in a panic, but once the OS has booted everything is OK.
Everything in U-Boot is OK, since OpenSBI tells U-Boot to exclude these regions.
But IIRC there seems to be no mechanism to pass the protected memory parameters to loader.efi (whereas @mhorne made a patch for the booted OS, as for OpenSBI).
Since I don't know of any other code in the loader changes that could trigger the relocation issue, I hope your failure analysis also applies to riscv... thanks
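The missing mechanism described above amounts to a range check the loader would have to apply before touching memory. A minimal C sketch of such a check follows, with the two excluded ranges hardcoded from the boot log above (inclusive end addresses, 4 KiB pages). The names `struct mem_range`, `excluded`, `range_pages()` and `overlaps_excluded()` are hypothetical, and how loader.efi would actually obtain these ranges at run time is an open assumption that this report does not settle. As a cross-check on the log, the second range spans 0xaed000 bytes, i.e. 2797 pages or about 10.9 MB, which the log prints truncated as 10 MB.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096ULL

/* A physical range with an inclusive end, as printed in the boot log. */
struct mem_range {
	uint64_t start;
	uint64_t end;
};

/* Excluded regions copied from the boot log above (hypothetical table). */
const struct mem_range excluded[] = {
	{ 0x80000000ULL, 0x8001ffffULL },	/* 32 pages, NoAlloc NoDump */
	{ 0xf6600000ULL, 0xf70ecfffULL },	/* 2797 pages, NoAlloc */
};

/* Number of 4 KiB pages in an inclusive range. */
uint64_t
range_pages(struct mem_range r)
{
	return (r.end - r.start + 1) / PAGE_SIZE_4K;
}

/*
 * Return true if [addr, addr + len) overlaps any excluded region.
 * Assumes addr + len does not wrap around.
 */
bool
overlaps_excluded(uint64_t addr, uint64_t len)
{
	size_t i;

	if (len == 0)
		return false;
	for (i = 0; i < sizeof(excluded) / sizeof(excluded[0]); i++) {
		if (addr <= excluded[i].end &&
		    addr + len - 1 >= excluded[i].start)
			return true;
	}
	return false;
}
```

A loader using a check like this would skip any allocation or copy for which `overlaps_excluded()` returns true; for example, a page at 0x80020000 (the start of the kernel's first available chunk) passes, while one at 0xf6600000 does not.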
Comment 8 Mitchell Horne 2021-09-22 16:40:00 UTC
Others have reported similar issues with netbooting, and I believe tsoome@ is looking into it. So the good news is that it is likely nothing related to the quirks of your Unleashed, or anything specific to riscv.
Comment 9 Toomas Soome 2021-09-22 16:54:26 UTC
(In reply to Mitchell Horne from comment #8)

Yes, I am investigating it.
Comment 10 Klaus Küchemann 2021-09-22 19:57:15 UTC
(In reply to Mitchell Horne from comment #8)
(In reply to Toomas Soome from comment #9)
Thanks.
I hooked up some -j16 machines for quicker world builds, and for the moment
it seems that Leandro Lupori had the right idea with moving up the archsw block (I did it in u-boot/main.c).
Great, thanks!!
Before sending a patch or similar, and to be sure, I will do some netboot tests from U-Boot, also on aarch64, with different world builds, while considering whether people could have had netboot problems with other PXE environments.
Comment 11 Klaus Küchemann 2021-09-22 23:14:05 UTC
Hm, a clean, untouched CURRENT from tonight's git pull,
cross-built on amd64 for riscv, is also netbootable on riscv.

So it seems that cross-building from aarch64 to riscv, and building on riscv itself, caused the issue,
unless I overlooked a patch from you in current/head from today.

If you would like me to do a few more tests, I am at your disposal.

But 27 minutes with -j16 on amd64, compared to 13 hours on an overclocked HiFive board, gave me something to think about :-). I won't do that again of my own accord, lol (I don't like buildworld with NO_CLEAN).

Thanks all!
Comment 12 Mitchell Horne 2022-02-02 19:17:48 UTC
(In reply to Klaus Küchemann from comment #11)

So, is any further action needed at this time, or can we close this issue?
Comment 13 Klaus Küchemann 2022-02-02 22:13:49 UTC
(In reply to Mitchell Horne from comment #12)

Do you think I should check this again from scratch?
If so, I'm happy to do that; it would take a day or so.
(I'm asking because I may be the only one using this board, and I'm currently no longer cross-compiling from aarch64 to riscv.)
If you think it's not worth the effort, you can close this.
Comment 14 Mitchell Horne 2022-02-03 15:40:22 UTC
(In reply to Klaus Küchemann from comment #13)

I found that net-booting riscv works for me with a recent main, and it seems this was the case for you in September.

I don't think we need to be too concerned about the cross-compilation from arm64 or riscv case, but if you are inclined to try it again and it does fail then please re-open the PR.
Comment 15 Klaus Küchemann 2022-02-03 15:51:32 UTC
(In reply to Mitchell Horne from comment #14)
installworld on aarch64 (not riscv) failed with current src from today on my RPi4,
so yes, better not to confuse topics; closing this issue is OK. Thanks!
And yes: netboot worked fine last week, also with your new OpenSBI 1.0, when
cross-compiled on amd64 and netbooted from the RPi4.