Bug 252891

Summary: RISC-V: HiFive Unleashed kernel entry panic: Memory access exception
Product: Base System
Reporter: Klaus Küchemann <maciphone2>
Component: riscv
Assignee: Mitchell Horne <mhorne>
Status: In Progress
Severity: Affects Only Me
CC: br, emaste, maciphone2, mhorne
Priority: ---
Version: CURRENT
Hardware: riscv
OS: Any

Description Klaus Küchemann 2021-01-21 17:12:55 UTC
Tested in various boot environments (OpenSBI, BBL, different U-Boot versions); also discussed on freebsd-riscv@freebsd.org:

OK boot -v
Loading kernel...
/boot/kernel/kernel text=0x58f35c text=0xeebec data=0xcd000 data=0x96c+0x66cec syms=[0x8+0xb1ca8+0x8+0xd4422]
Loading configured modules...
can't find '/boot/entropy'
can't find '/etc/hostid'
Using DTB provided by EFI at 0x87f00000.
Kernel entry at 0xf660002e...
Kernel args: -v
---<<BOOT>>---
KDB: debugger backends: ddb
KDB: current backend: ddb
Physical memory chunk(s):
  0x80000000 - 0x27fffffff,  8192 MB (2097152 pages)
Excluded memory regions:
  0x80000000 - 0x801fffff,     2 MB (    512 pages) NoAlloc NoDump
  0xf6600000 - 0xf6f63fff,     9 MB (   2404 pages) NoAlloc 
Found 4 CPUs in the device tree
Copyright (c) 1992-2021 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-ALPHA1 #4 5bd565855a9: Thu Jan 21 13:53:57 UTC 2021
    root@freebsd:/usr/obj/usr/riscv-src/freebsd-src/riscv.riscv64/sys/GENERIC-NODEBUG riscv
FreeBSD clang version 11.0.1 (git@github.com:llvm/llvm-project.git llvmorg-11.0.1-0-g43ff75f2c3fe)
t[0] == 0xffffffd1f358b000
t[1] == 0x000000000000000c
t[2] == 0xffffffc0007ab188
t[3] == 0xffffffd1f358b000
t[4] == 0x0000000000000000
t[5] == 0x0000000000000240
t[6] == 0x0000000000000000
s[0] == 0xffffffc000003b40
s[1] == 0xffffffd000800000
s[2] == 0x0000000000000101
s[3] == 0x0000000000000000
s[4] == 0xffffffc0005863b4
s[5] == 0x0000000000000002
s[6] == 0xffffffd076f642c0
s[7] == 0xffffffd076f64580
s[8] == 0x0000000000000000
s[9] == 0x0000000000000002
s[10] == 0x0000000000000002
s[11] == 0x0000000001000000
a[0] == 0xffffffd000800000
a[1] == 0x0000000000000000
a[2] == 0x0000000000000fff
a[3] == 0xffffffd000800000
a[4] == 0xffffffd000800001
a[5] == 0x00000000000000c0
a[6] == 0x0000000000000001
a[7] == 0x0000000080800000
ra == 0xffffffc0005864e0
sp == 0xffffffc000003b30
gp == 0x0000000000000001
tp == 0x000000000000000a
sepc == 0xffffffc000571dd6
sstatus == 0x8000000200006100
panic: Memory access exception at 0xffffffc000571dd6

cpuid = 0
time = 1
KDB: stack backtrace:
(null)() at 0xffffffc000575cb6
(null)() at 0xffffffc0001165f2
(null)() at 0xffffffc000326e6c
(null)() at 0xffffffc0002db8ea
(null)() at 0xffffffc0002db7a2
(null)() at 0xffffffc000585270
(null)() at 0xffffffc0005762e8
(null)() at 0xffffffc0005864dc
(null)() at 0xffffffc00053494e
(null)() at 0xffffffc00053602c
(null)() at 0xffffffc0005330ec
(null)() at 0xffffffc00053294c
(null)() at 0xffffffc00053184c
(null)() at 0xffffffc00053d490
(null)() at 0xffffffc00026a9b6
(null)() at 0xffffffc0000001ba
KDB: enter: panic
[ thread pid 0 tid 0 ]
Stopped at      0xffffffc000326c32
db> show reg
ra          0xffffffc000326c26
sp          0xffffffc000003920
gp          0xffffffc000003820
tp                        0x80
t0                           0
t1                           0
t2                           0
t3                           0
t4                           0
t5                           0
t6                           0
s0          0xffffffc000003940
s1          0xffffffc0005d6d49
s2          0xffffffc0000039a8
s3          0xffffffc0005e01d1
s4          0xffffffc000770380
s5                       0x100
s6                           0
s7          0xffffffd076f64580
s8                           0
s9                         0x2
s10                        0x2
s11                  0x1000000
a0          0xffffffc00074da78
a1                           0
a2          0xffffffc000003820
a3                         0xa
a4          0xffffffc7fffff000
a5                  0x80808080
a6          0xfefefefefefefeff
a7                         0x1
sepc        0xffffffc000326c32
sstatus     0x8000000200006100
stval                        0
scause                     0x3
0xffffffc000326c32
Comment 1 Klaus Küchemann 2021-01-21 17:41:46 UTC
This happens even with the original U-Boot: https://github.com/Microsemi-SoC-IP/HiFive_U-Boot

U-Boot 2018.09-g6f6e014 (Jan 20 2021 - 09:19:19 -0500)

DRAM:  2 GiB
MMC:   
In:    serial
Out:   serial
Err:   serial
Net:   gmac0
RISC-V # setenv ipaddr xxxxx
RISC-V # setenv serverip xxxxxxx
RISC-V #  bootp
gmac0: PHY present at 0
gmac0: Starting autonegotiation...
gmac0: Autonegotiation complete
gmac0: link up, 1000Mbps full-duplex (lpa: 0x3c00)
BOOTP broadcast 1
DHCP client bound to address xxxxxxxx (4 ms)
Using gmac0 device
TFTP from server xxxxxxx; our IP address is xxxxxxxx
Filename 'boot/loader.efi'.
Load address: 0x80000000
Loading: #################################################################
	 ################################
	 1.4 MiB/s
done
Bytes transferred = 1415772 (159a5c hex)
RISC-V # go 0x80000000
## Starting application at 0x80000000 ...
exception code: 2 , Illegal instruction , epc 80000002 , ra fffa3d72
exception code: 2 , Illegal instruction , epc 80000002 , ra fffa3d72
exception code: 2 , Illegal instruction , epc 80000002 , ra fffa3d72
exception code: 2 , Illegal instruction , epc 80000002 , ra fffa3d72
exception code: 2 , Illegal instruction , epc 80000002 , ra fffa3d72
...endless loop...
Comment 2 Mitchell Horne freebsd_committer 2021-01-21 19:21:55 UTC
(In reply to Klaus Küchemann from comment #1)

This second log in particular looks like a different issue, so I will focus on the original report for now.

Based on previous discussion, this is believed to be caused by a hardware erratum in the FU540, which is vaguely described here: https://github.com/riscv/opensbi/issues/103

In particular, the issue is believed to be triggered by FreeBSD's construction of the direct map, which maps the PMP-protected area with a 1 GB page mapping. This triggers the access exception even though the kernel never actually accesses the area. I've wanted to address this issue with the dmap for a while, but it will take a bit of time to finish the patch.
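
To illustrate the overlap described above, here is a minimal sketch using the addresses from the boot log in the description. It assumes that the excluded 2 MB region at 0x80000000 (NoAlloc NoDump) is the PMP-protected firmware area, and that a direct map built from smaller 2 MB pages could simply skip that region; neither assumption is confirmed in this report, and the helper names are hypothetical rather than FreeBSD functions.

```python
GIB = 1 << 30        # 1 GB superpage size
MIB2 = 2 << 20       # 2 MB page size

# Values copied from the boot log (end addresses are inclusive).
PHYS_START, PHYS_END = 0x80000000, 0x27fffffff   # physical memory chunk
FW_START, FW_END = 0x80000000, 0x801fffff        # assumed PMP-protected area


def pages_covering(start, end, page_size):
    """Return base addresses of aligned pages that cover [start, end]."""
    base = start - (start % page_size)
    bases = []
    while base <= end:
        bases.append(base)
        base += page_size
    return bases


def overlaps(base, page_size, lo, hi):
    """True if the page [base, base + page_size) intersects [lo, hi]."""
    return base <= hi and base + page_size - 1 >= lo


# A direct map built from 1 GB pages: which mappings touch the firmware area?
gib_pages = pages_covering(PHYS_START, PHYS_END, GIB)
bad_gib = [p for p in gib_pages if overlaps(p, GIB, FW_START, FW_END)]

# Hypothetical alternative: cover only the affected gigabyte with 2 MB pages
# and leave out the pages that intersect the firmware area.
first_gib = pages_covering(bad_gib[0], bad_gib[0] + GIB - 1, MIB2)
safe_2m = [p for p in first_gib if not overlaps(p, MIB2, FW_START, FW_END)]

print("1 GB mappings:", len(gib_pages))
print("1 GB mappings touching the firmware area:", [hex(p) for p in bad_gib])
print("2 MB mappings usable in that gigabyte:", len(safe_2m), "of", len(first_gib))
```

With these numbers, only the first 1 GB mapping covers the protected range, so in this sketch only that gigabyte would need a finer-grained mapping.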

kp@ has described the workaround he used for this issue last year, which is to disable the PMP protection of the SBI firmware area. I've once again tried to compile u-boot and OpenSBI with this workaround applied, in hopes that we can apply this solution for the time being.

If you can, please attempt to boot with the following u-boot binaries flashed to the SD card:
https://people.freebsd.org/~mhorne/tmp/u-boot-spl-test3.bin
https://people.freebsd.org/~mhorne/tmp/u-boot-test3.itb
Comment 3 Klaus Küchemann 2021-01-21 21:28:37 UTC
(In reply to Mitchell Horne from comment #2)

Really good job you are doing, thank you very much!


https://dmesgd.nycbug.org/index.cgi?do=view&id=5888
Comment 4 Mitchell Horne freebsd_committer 2021-01-22 13:34:14 UTC
(In reply to Klaus Küchemann from comment #3)

You're welcome, I'm glad it worked this time.

We'll keep the PR open, since it helps track the issue. I am prepping an update to the sysutils/opensbi port, and will make sure the workaround is included. I'll tag you in the review once it is posted.