Bug 273752

Summary: 6926e26: arm: Add support for using VFP in kernel broke my kernel
Product: Base System Reporter: crb <crb>
Component: armAssignee: Mark Johnston <markj>
Status: Closed FIXED    
Severity: Affects Some People CC: Andrew, bsd, markj, thoma555-bsd
Priority: --- Flags: markj: mfc-stable13-
Version: 14.0-STABLE   
Hardware: arm   
OS: Any   
Attachments:
Description Flags
Possible fix to 273752 none

Description crb 2023-09-13 06:24:09 UTC
It looks to me like commit 6926e26 broke my kernel with the log at the bottom.

I am trying to migrate to 14.0 in perpetration for the release.  I've been running a 13.0 kernel for a long time now that's been very stable for me.

I'm running on a Diligent ARTYZ7 board which is a AMD/Xilinx Zynq based board (ARMv7) very similar to the zibo or zedboard.

Happy to help debug this if someone can give some direction.


=======

Using DTB provided by EFI at 0x7ef8000.


Kernel entry at 0x17a00200...


Kernel args: (null)


WARNING: Cannot find freebsd,dts-version property, cannot check DTB compliance
---<<BOOT>>---
GDB: debug ports: uart
GDB: current port: uart
KDB: debugger backends: ddb gdb
KDB: current backend: ddb
Copyright (c) 1992-2023 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 14.0-CURRENT #28 n260570-6926e2699ae5-dirty: Tue Sep 12 23:04:05 PDT 2023
    crb@eclipse.ChrisBowman.com:/usr/obj/tmp/src/arm.armv7/sys/ARTYZ7 arm
FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c)
WARNING: WITNESS option enabled, expect reduced performance.
CPU: ARM Cortex-A9 r3p0 (ECO: 0x00000000)
CPU Features: 
  Multiprocessing, Thumb2, Security, VMSAv7, Coherent Walk
Optional instructions: 
  UMULL, SMULL, SIMD(ext)
LoUU:2 LoC:2 LoUIS:2 
Cache level 1:
 32KB/32B 4-way data cache WB Read-Alloc Write-Alloc
 32KB/32B 4-way instruction cache Read-Alloc
real memory  = 536866816 (511 MB)
avail memory = 511766528 (488 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
arc4random: WARNING: initial seeding bypassed the cryptographic random device because it was not yet seeded and the knob 'bypass_before_seeding' was enabled.
random: entropy device external interface
ofwbus0: <Open Firmware Device Tree>
regfix0: <Fixed Regulator> on ofwbus0
simplebus0: <Flattened device tree simple bus> on ofwbus0
simple_mfd0: <Simple MFD (Multi-Functions Device)> mem 0xf8000000-0xf8000fff on simplebus0
l2cache0: <PL310 L2 cache controller> mem 0xf8f02000-0xf8f02fff irq 8 on simplebus0
l2cache0: cannot allocate IRQ, not using interrupt
l2cache0: Part number: 0x3, release: 0x8
l2cache0: L2 Cache enabled: 512KB/32B 8 ways
gic0: <ARM Generic Interrupt Controller> mem 0xf8f01000-0xf8f01fff,0xf8f00100-0xf8f001ff on simplebus0
gic0: pn 0x39, arch 0x1, rev 0x2, implementer 0x43b irqs 96
mp_tmr0: <ARM MPCore Timers> mem 0xf8f00200-0xf8f0021f irq 29 on simplebus0
Timecounter "MPCore" frequency 50000000 Hz quality 800
mp_tmr1: <ARM MPCore Timers> mem 0xf8f00600-0xf8f0061f irq 36 on simplebus0
Event timer "MPCore" frequency 50000000 Hz quality 1000
cpulist0: <Open Firmware CPU Group> on ofwbus0
cpu0: <Open Firmware CPU> on cpulist0
cpu1: <Open Firmware CPU> on cpulist0
uart0: <Cadence UART> mem 0xe0000000-0xe0000fff irq 9 on simplebus0
uart0: console (-1,n,8,1)
zy7_qspi0: <Zynq Quad-SPI Flash Controller> mem 0xe000d000-0xe000dfff irq 13 on simplebus0
zy7_qspi0: must have ref-clock property
device_attach: zy7_qspi0 attach returned 6
cgem0: <Cadence CGEM Gigabit Ethernet Interface> mem 0xe000b000-0xe000bfff irq 14 on simplebus0
cgem0: could not retrieve reference clock.
miibus0: <MII bus> on cgem0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 0 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto
rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy1:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto
cgem0: Ethernet address: 02:1d:18:e6:2c:2d
sdhci_fdt0: <Zynq-7000 generic fdt SDHCI controller> mem 0xe0100000-0xe0100fff irq 17 on simplebus0
sdhci_fdt0: 1 slot(s) allocated
mmc0: <MMC/SD bus> on sdhci_fdt0
zy7_devcfg0: <Zynq devcfg block> mem 0xf8007000-0xf80070ff irq 28 on simplebus0
Timecounters tick every 1.000 msec
[nl_generic] genl_register_family: Registered family nlctrl id 16
[nl_generic] genl_register_cmds: Adding cmd GETFAMILY(3) to family nlctrl
panic: Storing an invalid VFP state
cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self
	 pc = 0xc04f4a10  lr = 0xc00720b4 (db_trace_self_wrapper+0x30)
	 sp = 0xc0a14af8  fp = 0xc0a14c10
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
	 pc = 0xc00720b4  lr = 0xc0202080 (vpanic+0x140)
	 sp = 0xc0a14c18  fp = 0xc0a14c38
	 r4 = 0x00000100  r5 = 0x00000000
	 r6 = 0xc054a726  r7 = 0xc07202a8
vpanic() at vpanic+0x140
	 pc = 0xc0202080  lr = 0xc0201e60 (dump_savectx)
	 sp = 0xc0a14c40  fp = 0xc0a14c44
	 r4 = 0x2015f010  r5 = 0xc06e8639
	 r6 = 0xcfc1e000  r7 = 0xcf0b7e90
	 r8 = 0xc0793880  r9 = 0xc0a050c0
	r10 = 0xc070fd00
dump_savectx() at dump_savectx
	 pc = 0xc0201e60  lr = 0xc051890c (fpu_kern_enter)
	 sp = 0xc0a14c4c  fp = 0xc0a14c58
fpu_kern_enter() at fpu_kern_enter
	 pc = 0xc051890c  lr = 0xc0515cc8 (cpu_switch+0x5c)
	 sp = 0xc0a14c60  fp = 0xcfc1e000
	 r4 = 0xc051890c  r5 = 0xc0a14e90
	 r6 = 0x00000000 r10 = 0xc0793880
cpu_switch() at cpu_switch+0x5c
	 pc = 0xc0515cc8  lr = 0xc0515cc8 (cpu_switch+0x5c)
	 sp = 0xc0a14c60  fp = 0xcfc1e000
Unwind failure (no registers changed)
KDB: enter: panic
[ thread pid 0 tid 100000 ]
Stopped at      kdb_enter+0x54: ldrb    r15, [r15, r15, ror r15]!
db>
Comment 1 Thomas Skibo 2023-09-13 16:15:15 UTC
I have been looking at this.  The problem seems to be the code in sys/arm/arm/vfp.c expects the floating-point unit to be disabled on boot and expects the first floating-point operation to trap (?).  Zynq chips boot with VFP enabled.  If I disable the VFP by pausing in u-boot and writing 0 to the fpexc register (using Xilinx debugging tools), the kernel boots.

I'm surprised this hasn't tripped up other armv7 boards.
Comment 2 crb 2023-09-13 18:48:14 UTC
Thomas,
This is super helpful, thank you!
I'm not sure there is a good way to get u-boot to write 0 to the fpexc register but maybe I could hack that in and it would get me running at least.  Is there a way to make FreeBSD not make the assumption or handle correctly the case where it's wrong?  What do you think is the right way to fix this?
Thanks,
Christopher
Comment 3 Thomas Skibo 2023-09-14 21:14:26 UTC
(In reply to crb from comment #2)

The arm64 version of vfp.c disables the vfp with a call to vfp_disable() in vfp_init() with the comment "/* Disable to be enabled when it's used */".

Doing the same in the armv7 version fixes the problem but I am concerned that vfp_disable() would be called again for each CPU when it appears the fpexc register is not a per-CPU register.  In other words, I am not confident that I know how this is supposed to work.
Comment 4 crb 2023-09-15 05:07:07 UTC
Thomas,
Again, thank you for looking at all this, I'm not sure I would have figured it out on my own.  I wonder if u-boot for other ARMv7 board isn't turning this off before handing over to the kernel?  Is there any reason not to simply clear this bit in lowcore.s or some other similar early boot place?
Thanks again,
Christopher
Comment 5 Andrew Turner freebsd_committer freebsd_triage 2023-09-15 18:52:38 UTC
(In reply to Thomas Skibo from comment #3)
Why do you think it's not per-CPU?

vfp_disable will need to work per-CPU otherwise the current VFP handling will be broken as we disable the VFP in the same way at the end of vfp_store.
Comment 6 Thomas Skibo 2023-09-15 21:22:06 UTC
Created attachment 244904 [details]
Possible fix to 273752
Comment 7 Thomas Skibo 2023-09-15 21:26:37 UTC
(In reply to Andrew Turner from comment #5)
I think I was using the Xilinx debug tool (xsdb) incorrectly and it led me to believe fpexc only existed on CPU #0.  I posted a possible fix in which vfp_disable() is called in vfp_init() for each CPU.  Let me know what you think.
Comment 8 bsd 2023-09-16 05:17:08 UTC
(In reply to Thomas Skibo from comment #6)
For me, this patch seems to solve the issue, I can boot now with kernel based on sources from September 1st, 2023 on Zybo Z7 board. Thanks.
Comment 9 crb 2023-09-17 02:27:34 UTC
This also allows me to build and boot a recent kernel.
Thank you so much Thomas!
Comment 10 crb 2023-09-22 04:01:35 UTC
Is there any possibility we can get this in for 14.0?

14 may be the last version that supports 32 machines (at least if my kernel warning is to be believed) and it would be nice to have that last release working on 32 machines before we abondon them.

Thanks
Christopher
Comment 11 Mark Johnston freebsd_committer freebsd_triage 2023-10-02 12:20:02 UTC
(In reply to Andrew Turner from comment #5)
Do you see any reason not to commit the patch from comment 6?
Comment 12 Andrew Turner freebsd_committer freebsd_triage 2023-10-02 15:21:15 UTC
The patch looks correct so should be committed.
Comment 13 Mark Johnston freebsd_committer freebsd_triage 2023-10-02 16:02:32 UTC
(In reply to Thomas Skibo from comment #7)
In the absence of a commit message, I wrote one.  If this looks ok to you I'll push to main:

commit 4c93213d5345306810a5f2ac272cd8a2703684af (HEAD -> main)
Author: Thomas Skibo <thomas-bsd@skibo.net>
Date:   Mon Oct 2 11:58:54 2023 -0400

    arm: Disable the VFP during boot
    
    The VFP code expects the kernel to boot with VFP disabled, but some
    boards will boot with it enabled.  Make sure that vfp_init() disables
    the VFP on each CPU during boot.
    
    PR:             273752
    Reviewed by:    andrew, markj
    MFC after:      1 week

diff --git a/sys/arm/arm/vfp.c b/sys/arm/arm/vfp.c
index 40a3491c1cf9..d043f89a7e34 100644
--- a/sys/arm/arm/vfp.c
+++ b/sys/arm/arm/vfp.c
@@ -180,6 +180,9 @@ vfp_init(void)
                                elf_hwcap |= HWCAP_VFPv4;
                }
 
+               /* Disable to be enabled when it's used */
+               vfp_disable();
+
                /* initialize the coprocess 10 and 11 calls
                 * These are called to restore the registers and enable
                 * the VFP hardware.
Comment 14 commit-hook freebsd_committer freebsd_triage 2023-12-11 14:16:43 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ce2f34ade8b787b068085fa8a8ddd295b06c2737

commit ce2f34ade8b787b068085fa8a8ddd295b06c2737
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-12-11 14:08:49 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-12-11 14:08:49 +0000

    arm: Disable the VFP during boot

    The VFP code expects the kernel to boot with VFP disabled, but some
    boards will boot with it enabled.  Make sure that vfp_init() disables
    the VFP on each CPU during boot.

    PR:             273752
    Reviewed by:    andrew
    Diagnosed by:   Thomas Skibo <thomas-bsd@skibo.net>
    MFC after:      1 week

 sys/arm/arm/vfp.c | 2 ++
 1 file changed, 2 insertions(+)
Comment 15 commit-hook freebsd_committer freebsd_triage 2023-12-18 02:12:16 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f5ae760cfe2658d79f12c1b7ac9dc577379e5d1c

commit f5ae760cfe2658d79f12c1b7ac9dc577379e5d1c
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2023-12-11 14:08:49 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2023-12-18 02:07:49 +0000

    arm: Disable the VFP during boot

    The VFP code expects the kernel to boot with VFP disabled, but some
    boards will boot with it enabled.  Make sure that vfp_init() disables
    the VFP on each CPU during boot.

    PR:             273752
    Reviewed by:    andrew
    Diagnosed by:   Thomas Skibo <thomas-bsd@skibo.net>
    MFC after:      1 week

    (cherry picked from commit ce2f34ade8b787b068085fa8a8ddd295b06c2737)

 sys/arm/arm/vfp.c | 2 ++
 1 file changed, 2 insertions(+)
Comment 16 Mark Linimon freebsd_committer freebsd_triage 2024-01-14 00:00:04 UTC
^Triage: assign to committer that resolved and fiddle with metadata.
Comment 17 Mark Johnston freebsd_committer freebsd_triage 2024-01-14 00:02:23 UTC
I believe there's nothing left to commit here, please re-open if I'm mistaken.