It looks to me like commit 6926e26 broke my kernel with the log at the bottom. I am trying to migrate to 14.0 in perpetration for the release. I've been running a 13.0 kernel for a long time now that's been very stable for me. I'm running on a Diligent ARTYZ7 board which is a AMD/Xilinx Zynq based board (ARMv7) very similar to the zibo or zedboard. Happy to help debug this if someone can give some direction. ======= Using DTB provided by EFI at 0x7ef8000. Kernel entry at 0x17a00200... Kernel args: (null) WARNING: Cannot find freebsd,dts-version property, cannot check DTB compliance ---<<BOOT>>--- GDB: debug ports: uart GDB: current port: uart KDB: debugger backends: ddb gdb KDB: current backend: ddb Copyright (c) 1992-2023 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 14.0-CURRENT #28 n260570-6926e2699ae5-dirty: Tue Sep 12 23:04:05 PDT 2023 crb@eclipse.ChrisBowman.com:/usr/obj/tmp/src/arm.armv7/sys/ARTYZ7 arm FreeBSD clang version 14.0.5 (https://github.com/llvm/llvm-project.git llvmorg-14.0.5-0-gc12386ae247c) WARNING: WITNESS option enabled, expect reduced performance. CPU: ARM Cortex-A9 r3p0 (ECO: 0x00000000) CPU Features: Multiprocessing, Thumb2, Security, VMSAv7, Coherent Walk Optional instructions: UMULL, SMULL, SIMD(ext) LoUU:2 LoC:2 LoUIS:2 Cache level 1: 32KB/32B 4-way data cache WB Read-Alloc Write-Alloc 32KB/32B 4-way instruction cache Read-Alloc real memory = 536866816 (511 MB) avail memory = 511766528 (488 MB) FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs arc4random: WARNING: initial seeding bypassed the cryptographic random device because it was not yet seeded and the knob 'bypass_before_seeding' was enabled. random: entropy device external interface ofwbus0: <Open Firmware Device Tree> regfix0: <Fixed Regulator> on ofwbus0 simplebus0: <Flattened device tree simple bus> on ofwbus0 simple_mfd0: <Simple MFD (Multi-Functions Device)> mem 0xf8000000-0xf8000fff on simplebus0 l2cache0: <PL310 L2 cache controller> mem 0xf8f02000-0xf8f02fff irq 8 on simplebus0 l2cache0: cannot allocate IRQ, not using interrupt l2cache0: Part number: 0x3, release: 0x8 l2cache0: L2 Cache enabled: 512KB/32B 8 ways gic0: <ARM Generic Interrupt Controller> mem 0xf8f01000-0xf8f01fff,0xf8f00100-0xf8f001ff on simplebus0 gic0: pn 0x39, arch 0x1, rev 0x2, implementer 0x43b irqs 96 mp_tmr0: <ARM MPCore Timers> mem 0xf8f00200-0xf8f0021f irq 29 on simplebus0 Timecounter "MPCore" frequency 50000000 Hz quality 800 mp_tmr1: <ARM MPCore Timers> mem 0xf8f00600-0xf8f0061f irq 36 on simplebus0 Event timer "MPCore" frequency 50000000 Hz quality 1000 cpulist0: <Open Firmware CPU Group> on ofwbus0 cpu0: <Open Firmware CPU> on cpulist0 cpu1: <Open Firmware CPU> on cpulist0 uart0: <Cadence UART> mem 0xe0000000-0xe0000fff irq 9 on simplebus0 uart0: console (-1,n,8,1) zy7_qspi0: <Zynq Quad-SPI Flash Controller> mem 0xe000d000-0xe000dfff irq 13 on simplebus0 zy7_qspi0: must have ref-clock property device_attach: zy7_qspi0 attach returned 6 cgem0: <Cadence CGEM Gigabit Ethernet Interface> mem 0xe000b000-0xe000bfff irq 14 on simplebus0 cgem0: could not retrieve reference clock. miibus0: <MII bus> on cgem0 rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 0 on miibus0 rgephy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto rgephy1: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0 rgephy1: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT-FDX, 1000baseT-FDX-master, auto cgem0: Ethernet address: 02:1d:18:e6:2c:2d sdhci_fdt0: <Zynq-7000 generic fdt SDHCI controller> mem 0xe0100000-0xe0100fff irq 17 on simplebus0 sdhci_fdt0: 1 slot(s) allocated mmc0: <MMC/SD bus> on sdhci_fdt0 zy7_devcfg0: <Zynq devcfg block> mem 0xf8007000-0xf80070ff irq 28 on simplebus0 Timecounters tick every 1.000 msec [nl_generic] genl_register_family: Registered family nlctrl id 16 [nl_generic] genl_register_cmds: Adding cmd GETFAMILY(3) to family nlctrl panic: Storing an invalid VFP state cpuid = 0 time = 1 KDB: stack backtrace: db_trace_self() at db_trace_self pc = 0xc04f4a10 lr = 0xc00720b4 (db_trace_self_wrapper+0x30) sp = 0xc0a14af8 fp = 0xc0a14c10 db_trace_self_wrapper() at db_trace_self_wrapper+0x30 pc = 0xc00720b4 lr = 0xc0202080 (vpanic+0x140) sp = 0xc0a14c18 fp = 0xc0a14c38 r4 = 0x00000100 r5 = 0x00000000 r6 = 0xc054a726 r7 = 0xc07202a8 vpanic() at vpanic+0x140 pc = 0xc0202080 lr = 0xc0201e60 (dump_savectx) sp = 0xc0a14c40 fp = 0xc0a14c44 r4 = 0x2015f010 r5 = 0xc06e8639 r6 = 0xcfc1e000 r7 = 0xcf0b7e90 r8 = 0xc0793880 r9 = 0xc0a050c0 r10 = 0xc070fd00 dump_savectx() at dump_savectx pc = 0xc0201e60 lr = 0xc051890c (fpu_kern_enter) sp = 0xc0a14c4c fp = 0xc0a14c58 fpu_kern_enter() at fpu_kern_enter pc = 0xc051890c lr = 0xc0515cc8 (cpu_switch+0x5c) sp = 0xc0a14c60 fp = 0xcfc1e000 r4 = 0xc051890c r5 = 0xc0a14e90 r6 = 0x00000000 r10 = 0xc0793880 cpu_switch() at cpu_switch+0x5c pc = 0xc0515cc8 lr = 0xc0515cc8 (cpu_switch+0x5c) sp = 0xc0a14c60 fp = 0xcfc1e000 Unwind failure (no registers changed) KDB: enter: panic [ thread pid 0 tid 100000 ] Stopped at kdb_enter+0x54: ldrb r15, [r15, r15, ror r15]! db>
I have been looking at this. The problem seems to be the code in sys/arm/arm/vfp.c expects the floating-point unit to be disabled on boot and expects the first floating-point operation to trap (?). Zynq chips boot with VFP enabled. If I disable the VFP by pausing in u-boot and writing 0 to the fpexc register (using Xilinx debugging tools), the kernel boots. I'm surprised this hasn't tripped up other armv7 boards.
Thomas, This is super helpful, thank you! I'm not sure there is a good way to get u-boot to write 0 to the fpexc register but maybe I could hack that in and it would get me running at least. Is there a way to make FreeBSD not make the assumption or handle correctly the case where it's wrong? What do you think is the right way to fix this? Thanks, Christopher
(In reply to crb from comment #2) The arm64 version of vfp.c disables the vfp with a call to vfp_disable() in vfp_init() with the comment "/* Disable to be enabled when it's used */". Doing the same in the armv7 version fixes the problem but I am concerned that vfp_disable() would be called again for each CPU when it appears the fpexc register is not a per-CPU register. In other words, I am not confident that I know how this is supposed to work.
Thomas, Again, thank you for looking at all this, I'm not sure I would have figured it out on my own. I wonder if u-boot for other ARMv7 board isn't turning this off before handing over to the kernel? Is there any reason not to simply clear this bit in lowcore.s or some other similar early boot place? Thanks again, Christopher
(In reply to Thomas Skibo from comment #3) Why do you think it's not per-CPU? vfp_disable will need to work per-CPU otherwise the current VFP handling will be broken as we disable the VFP in the same way at the end of vfp_store.
Created attachment 244904 [details] Possible fix to 273752
(In reply to Andrew Turner from comment #5) I think I was using the Xilinx debug tool (xsdb) incorrectly and it led me to believe fpexc only existed on CPU #0. I posted a possible fix in which vfp_disable() is called in vfp_init() for each CPU. Let me know what you think.
(In reply to Thomas Skibo from comment #6) For me, this patch seems to solve the issue, I can boot now with kernel based on sources from September 1st, 2023 on Zybo Z7 board. Thanks.
This also allows me to build and boot a recent kernel. Thank you so much Thomas!
Is there any possibility we can get this in for 14.0? 14 may be the last version that supports 32 machines (at least if my kernel warning is to be believed) and it would be nice to have that last release working on 32 machines before we abondon them. Thanks Christopher
(In reply to Andrew Turner from comment #5) Do you see any reason not to commit the patch from comment 6?
The patch looks correct so should be committed.
(In reply to Thomas Skibo from comment #7) In the absence of a commit message, I wrote one. If this looks ok to you I'll push to main: commit 4c93213d5345306810a5f2ac272cd8a2703684af (HEAD -> main) Author: Thomas Skibo <thomas-bsd@skibo.net> Date: Mon Oct 2 11:58:54 2023 -0400 arm: Disable the VFP during boot The VFP code expects the kernel to boot with VFP disabled, but some boards will boot with it enabled. Make sure that vfp_init() disables the VFP on each CPU during boot. PR: 273752 Reviewed by: andrew, markj MFC after: 1 week diff --git a/sys/arm/arm/vfp.c b/sys/arm/arm/vfp.c index 40a3491c1cf9..d043f89a7e34 100644 --- a/sys/arm/arm/vfp.c +++ b/sys/arm/arm/vfp.c @@ -180,6 +180,9 @@ vfp_init(void) elf_hwcap |= HWCAP_VFPv4; } + /* Disable to be enabled when it's used */ + vfp_disable(); + /* initialize the coprocess 10 and 11 calls * These are called to restore the registers and enable * the VFP hardware.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=ce2f34ade8b787b068085fa8a8ddd295b06c2737 commit ce2f34ade8b787b068085fa8a8ddd295b06c2737 Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2023-12-11 14:08:49 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-12-11 14:08:49 +0000 arm: Disable the VFP during boot The VFP code expects the kernel to boot with VFP disabled, but some boards will boot with it enabled. Make sure that vfp_init() disables the VFP on each CPU during boot. PR: 273752 Reviewed by: andrew Diagnosed by: Thomas Skibo <thomas-bsd@skibo.net> MFC after: 1 week sys/arm/arm/vfp.c | 2 ++ 1 file changed, 2 insertions(+)
A commit in branch stable/14 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=f5ae760cfe2658d79f12c1b7ac9dc577379e5d1c commit f5ae760cfe2658d79f12c1b7ac9dc577379e5d1c Author: Mark Johnston <markj@FreeBSD.org> AuthorDate: 2023-12-11 14:08:49 +0000 Commit: Mark Johnston <markj@FreeBSD.org> CommitDate: 2023-12-18 02:07:49 +0000 arm: Disable the VFP during boot The VFP code expects the kernel to boot with VFP disabled, but some boards will boot with it enabled. Make sure that vfp_init() disables the VFP on each CPU during boot. PR: 273752 Reviewed by: andrew Diagnosed by: Thomas Skibo <thomas-bsd@skibo.net> MFC after: 1 week (cherry picked from commit ce2f34ade8b787b068085fa8a8ddd295b06c2737) sys/arm/arm/vfp.c | 2 ++ 1 file changed, 2 insertions(+)
^Triage: assign to committer that resolved and fiddle with metadata.
I believe there's nothing left to commit here, please re-open if I'm mistaken.