Using the latest package (same behaviour on the version from the stable branch) on AArch64:
```
$ uname -a
FreeBSD freebsd 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC arm64
$ pkg info git
git-2.41.0
Name : git
Version : 2.41.0
Installed on : Sat Jun 24 12:14:59 2023 UTC
Origin : devel/git
Architecture : FreeBSD:13:aarch64
Prefix : /usr/local
Categories : devel
Licenses : GPLv2
Maintainer : garga@FreeBSD.org
WWW : https://git-scm.com/
Comment : Distributed source code management tool
Options :
	CONTRIB : on
	CURL : on
	GITWEB : on
	HTMLDOCS : off
	ICONV : on
	NLS : on
	PCRE2 : on
	PERL : on
	SEND_EMAIL : on
	SUBTREE : on
Shared Libs required:
	libpcre2-8.so.0
	libintl.so.8
	libexpat.so.1
	libcurl.so.4
Annotations :
	FreeBSD_version: 1301000
	cpe : cpe:2.3:a:git-scm:git:2.41.0:::::freebsd13:aarch64
	flavor : default
	repo_type : binary
	repository : FreeBSD
Flat size : 33.4MiB
Description :
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.

WWW: https://git-scm.com/
$ git
Bus error (core dumped)
$ lldb git
(lldb) target create "git"
Current executable set to 'git' (aarch64).
(lldb) r
Process 6542 launched: '/usr/local/bin/git' (aarch64)
This version of LLDB has no plugin for the language "assembler". Inspection of frame variables will be limited.
Process 6542 stopped
* thread #1, name = 'git', stop reason = signal SIGBUS: hardware error
    frame #0: 0x00003dcda641e08c ld-elf.so.1`memset at memset.S:136
(lldb) bt
* thread #1, name = 'git', stop reason = signal SIGBUS: hardware error
  * frame #0: 0x00003dcda641e08c ld-elf.so.1`memset at memset.S:136
    frame #1: 0x00003dcda64188f8 ld-elf.so.1`map_object(fd=3, path="/usr/local/lib/libpcre2-8.so.0", sb=0x00000000811450e0) at map_object.c:262:3
    frame #2: 0x00003dcda64133f8 ld-elf.so.1`load_object [inlined] do_load_object(fd=3, name="libpcre2-8.so.0", path=<unavailable>, sbp=0x00000000811450e0, flags=0) at rtld.c:2833:11
    frame #3: 0x00003dcda64133a8 ld-elf.so.1`load_object(name="libpcre2-8.so.0", fd_u=<unavailable>, refobj=<unavailable>, flags=0) at rtld.c:2805:11
    frame #4: 0x00003dcda640d200 ld-elf.so.1`_rtld [inlined] process_needed(obj=0x00000000820ac008, needed=0x00000000820a9028, flags=0) at rtld.c:2638:23
    frame #5: 0x00003dcda640d1e0 ld-elf.so.1`_rtld [inlined] load_needed_objects(first=<unavailable>, flags=0) at rtld.c:2659:6
    frame #6: 0x00003dcda640d1c0 ld-elf.so.1`_rtld(sp=<unavailable>, exit_proc=0x0000000081146cc0, objp=<unavailable>) at rtld.c:861:9
    frame #7: 0x00003dcda640b0d8 ld-elf.so.1`.rtld_start at rtld_start.S:41
(lldb) disas
ld-elf.so.1`memset:
    0x3dcda641e000 <+0>: dup v0.16b, w1
    0x3dcda641e004 <+4>: add x4, x0, x2
    0x3dcda641e008 <+8>: cmp x2, #0x60
    0x3dcda641e00c <+12>: b.hi 0x3dcda641e084 ; <+132>
    0x3dcda641e010 <+16>: cmp x2, #0x10
    0x3dcda641e014 <+20>: b.hs 0x3dcda641e054 ; <+84>
    0x3dcda641e018 <+24>: mov x1, v0.d[0]
    0x3dcda641e01c <+28>: tbz w2, #0x3, 0x3dcda641e030 ; <+48>
    0x3dcda641e020 <+32>: str x1, [x0]
    0x3dcda641e024 <+36>: stur x1, [x4, #-0x8]
    0x3dcda641e028 <+40>: ret
    0x3dcda641e02c <+44>: nop
    0x3dcda641e030 <+48>: tbz w2, #0x2, 0x3dcda641e040 ; <+64>
    0x3dcda641e034 <+52>: str w1, [x0]
    0x3dcda641e038 <+56>: stur w1, [x4, #-0x4]
    0x3dcda641e03c <+60>: ret
    0x3dcda641e040 <+64>: cbz x2, 0x3dcda641e050 ; <+80>
    0x3dcda641e044 <+68>: strb w1, [x0]
    0x3dcda641e048 <+72>: tbz w2, #0x1, 0x3dcda641e050 ; <+80>
    0x3dcda641e04c <+76>: sturh w1, [x4, #-0x2]
    0x3dcda641e050 <+80>: ret
    0x3dcda641e054 <+84>: str q0, [x0]
    0x3dcda641e058 <+88>: tbnz w2, #0x6, 0x3dcda641e070 ; <+112>
    0x3dcda641e05c <+92>: stur q0, [x4, #-0x10]
    0x3dcda641e060 <+96>: tbz w2, #0x5, 0x3dcda641e06c ; <+108>
    0x3dcda641e064 <+100>: str q0, [x0, #0x10]
    0x3dcda641e068 <+104>: stur q0, [x4, #-0x20]
    0x3dcda641e06c <+108>: ret
    0x3dcda641e070 <+112>: str q0, [x0, #0x10]
    0x3dcda641e074 <+116>: stp q0, q0, [x0, #0x20]
    0x3dcda641e078 <+120>: stp q0, q0, [x4, #-0x20]
    0x3dcda641e07c <+124>: ret
    0x3dcda641e080 <+128>: nop
    0x3dcda641e084 <+132>: and w1, w1, #0xff
    0x3dcda641e088 <+136>: and x3, x0, #0xfffffffffffffff0
->  0x3dcda641e08c <+140>: str q0, [x0]
    0x3dcda641e090 <+144>: cmp x2, #0x100
(lldb) register read x0
      x0 = 0x0000000082c1ea40
$ procstat -v 6542
PID START END PRT RES PRES REF SHD FLAG TP PATH
6542 0x200000 0x2b3000 r-- 179 840 5 1 CN--- vn /usr/local/bin/git
6542 0x2c2000 0x53b000 r-x 633 840 5 1 CN--- vn /usr/local/bin/git
6542 0x54a000 0x54b000 rw- 1 0 1 0 C---- vn /usr/local/bin/git
6542 0x55a000 0x56b000 rw- 17 840 5 1 CN--- vn /usr/local/bin/git
6542 0x56b000 0x592000 rw- 1 1 1 0 ----- df
6542 0x41148000 0x81128000 --- 0 0 0 0 ----- gd
6542 0x81128000 0x81148000 rw- 4 4 1 0 ---D- df
6542 0x820a9000 0x820ca000 rw- 7 7 1 0 ----- df
6542 0x82b48000 0x82b70000 r-- 8 8 5 1 CN--- vn /usr/local/lib/libpcre2-8.so.0.11.2
6542 0x82b70000 0x82b7f000 --- 0 0 0 0 CN--- gd
6542 0x82b7f000 0x82bff000 r-x 0 8 5 1 CN--- vn /usr/local/lib/libpcre2-8.so.0.11.2
6542 0x82bff000 0x82c0e000 --- 0 0 0 0 CN--- gd
6542 0x82c0e000 0x82c0f000 rw- 0 8 5 1 CN--- vn /usr/local/lib/libpcre2-8.so.0.11.2
6542 0x82c0f000 0x82c1e000 --- 0 0 0 0 CN--- gd
6542 0x82c1e000 0x82c1f000 rw- 0 0 1 0 C---- vn /usr/local/lib/libpcre2-8.so.0.11.2
6542 0x83aca000 0x83acb000 r-- 1 8 5 1 CN--- vn /usr/local/lib/libpcre2-8.so.0.11.2
6542 0x3dcda63f5000 0x3dcda63fc000 r-- 7 28 109 51 CN--- vn /libexec/ld-elf.so.1
6542 0x3dcda640b000 0x3dcda6420000 r-x 21 0 1 0 C---- vn /libexec/ld-elf.so.1
6542 0x3dcda642f000 0x3dcda6430000 r-- 1 0 1 0 C---- vn /libexec/ld-elf.so.1
6542 0x3dcda643f000 0x3dcda6440000 rw- 1 0 1 0 C---- vn /libexec/ld-elf.so.1
6542 0x3dcda6440000 0x3dcda6441000 rw- 1 1 1 0 ----- df
6542 0xfffffffff000 0x1000000000000 r-x 1 1 32 0 ----- ph
```

I don't believe this is a bug in git itself, since it appears to be triggered before any user code runs.

If I'm reading the disassembly correctly, it's slightly dubious that the str instruction appears to be using the same register for both the address and the value stored. This appears to be from the Linaro string routines, which are [unchanged in CURRENT](https://github.com/freebsd/freebsd-src/blob/main/contrib/arm-optimized-routines/string/aarch64/memset.S#L55). I am probably missing some understanding of Arm assembly here, but it at least looks like a store that shouldn't fault.

The memset appears to fault while writing into a region that is mapped read-write, and the address is strongly aligned, so I'm not sure what's causing the bus error.

This is on QEMU with Hypervisor.framework on an M2 MacBook Pro (virtualised AArch64).
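The claim that the faulting store is aligned and lands in a read-write mapping can be checked mechanically against the `register read` and `procstat -v` output above. The following is a minimal sketch only: the region rows are copied by hand from that output (just the ones around the address), and the helper name `find_region` is made up for illustration.

```python
# Values copied from the lldb and procstat output above.
FAULT_ADDR = 0x82C1EA40  # x0 at the faulting `str q0, [x0]`
STORE_SIZE = 16          # a q register is 128 bits wide

# (start, end, protection) rows from `procstat -v` around the address.
REGIONS = [
    (0x82C0E000, 0x82C0F000, "rw-"),
    (0x82C0F000, 0x82C1E000, "---"),  # guard region
    (0x82C1E000, 0x82C1F000, "rw-"),
]


def find_region(addr, regions):
    """Return the (start, end, prot) row containing addr, or None."""
    for start, end, prot in regions:
        if start <= addr < end:
            return (start, end, prot)
    return None


region = find_region(FAULT_ADDR, REGIONS)
aligned = FAULT_ADDR % STORE_SIZE == 0
fits = region is not None and FAULT_ADDR + STORE_SIZE <= region[1]
print(region, aligned, fits)
```

Under these inputs the address is 16-byte aligned and the whole 16-byte store stays inside the `rw-` row for libpcre2, which is consistent with the observation that neither alignment nor page protection explains the SIGBUS.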
I have also tested with hypervisor support disabled (QEMU in pure emulation mode, which is much slower) and can confirm that the problem persists.
I'm not able to reproduce this on 14-CURRENT, 13-STABLE, or 13.2-RELEASE. The store instruction is using q0 as the data, and x0 as the address. These registers don't alias. There are a few cases where the kernel can raise a SIGBUS that are not directly from the trap, e.g. vm_fault returns KERN_RESOURCE_SHORTAGE or KERN_OUT_OF_BOUNDS.
(In reply to Andrew Turner from comment #2)
> I'm not able to reproduce this on 14-CURRENT, 13-STABLE, or 13.2-RELEASE.

For me, it is 100% deterministic on 13.2-RELEASE (with and without running FreeBSD update).

> The store instruction is using q0 as the data, and x0 as the address.

Yup, Renato showed me that; it looks as if this spelling of the Neon store is barely documented.

> These registers don't alias. There are a few cases where the kernel can raise a SIGBUS that are not directly from the trap, e.g. vm_fault returns KERN_RESOURCE_SHORTAGE or KERN_OUT_OF_BOUNDS.

This made me wonder if the problem was the VirtIO balloon driver responding too slowly, but disabling the balloon driver doesn't fix it. A clean boot with minimal things running (sshd is about the only thing) shows 31G free in top and still crashes at this point. I also wondered if it was a problem with the wrong flavour of memory, so I tried reducing the total memory in the VM from 32 GiB to 768 MiB, but that didn't make a difference.

What would be helpful for me to try? I now have sources installed, which makes the debugger a bit more useful (though rtld is somewhat too optimised for it to be very useful). The fault in rtld is in map_object.c:262 (the call to memset in the middle of BSS setup).

I vaguely remember from some snmalloc debugging a while ago that libpcre2 hits some rtld paths on x86 that almost nothing else (except tcl?) does. The value of mapbase at that point (according to lldb) is 0x000000008178c000, which looks sensible (the base address of /usr/local/lib/libpcre2-8.so.0.11.2), but unfortunately clear_vaddr is optimised away.

The procstat output for the region containing the address has C but not N in the flags, which I believe means that it is a CoW page that has already been copied (REF is 1, which supports this?).
I'm not sure if this was the first write to that page; if so, it looks as if the CoW is proceeding correctly but we're then receiving a signal on the way back anyway.

Setting a breakpoint in rtld doesn't seem to work here for me (maybe lldb is using hardware breakpoints and they are not working in the VM? Switching to emulation mode doesn't make them work either, so maybe they just don't work on AArch64?).

Not sure if there's anything useful in these bits of dmesg that might help isolate anything CPU-model specific:
```
CPU 0: Unknown Implementer (midr: 00000000) affinity: 0
 Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
 Instruction Set Attributes 0 = <TLBI-OSR,CondM-8.5,FHM,DP,SHA3,RDM,Atomic,CRC32,SHA2+SHA512,SHA1,AES+PMULL>
 Instruction Set Attributes 1 = <PredInv,SB,FRINTTS,GPI,RCPC-8.4,FCMA,JSCVT,Impl PAuth+FPAC,DCCVADP>
 Instruction Set Attributes 2 = <>
 Processor Features 0 = <CSV3,CSV2,PSTATE.DIT,RAS,AdvSIMD+HP,FP+HP,EL1,EL0>
 Processor Features 1 = <>
 Memory Model Features 0 = <ExS,TGran4,TGran16,8bit ASID,4TB PA,0x100000000000000>
 Memory Model Features 1 = <XNX,SpecSEI,PAN+ATS1E1,LO,HPD+TTPBHA,8bit VMID>
 Memory Model Features 2 = <E0PD,TTL,IDS,AT,32bit CCIDX,48bit VA,IESB,UAO,CnP>
 Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,Debugv8>
 Debug Features 1 = <>
 Auxiliary Features 0 = <>
 Auxiliary Features 1 = <>
```

I tried replacing the memset in rtld with a trivial C one. This still faults in roughly the same place. I added a debug printf in front of the memset and see (ASLR changes the exact numbers each run):
```
Clearing 1848 bytes from 0x56b8c8
Clearing 1472 bytes from 0x20428e53ca40
```
And then the crash. When I run clang, I see that this code path is hit multiple times.
This address is in the same place as before (from procstat -v):
```
41103 0x20428e53c000 0x20428e53d000 rw- 0 0 1 0 C---- vn /usr/local/lib/libpcre2-8.so.0.11.2
```
With the C version, I can see that it is faulting on the *first* byte write. Not sure what's happening here.
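A quick arithmetic check on the two `Clearing` lines from the debug printf: both ranges end exactly on a page boundary, which fits the description of this memset as clearing the tail of a data page during BSS setup. This is only a sketch; the 4 KiB page size is an assumption read off the `procstat -v` region granularity, and `clear_end` is a made-up helper name.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, matching the procstat rows

# (start address, byte count) pairs from the debug printf output.
CLEARS = [
    (0x56B8C8, 1848),
    (0x20428E53CA40, 1472),
]

# End of the libpcre2 rw- region reported by `procstat -v` for that run.
REGION_END = 0x20428E53D000


def clear_end(start, length):
    """Return the first address past the cleared range."""
    return start + length


ends = [clear_end(start, length) for start, length in CLEARS]
for (start, length), end in zip(CLEARS, ends):
    print(hex(start), length, hex(end), end % PAGE_SIZE == 0)
```

The second range ends at 0x20428e53d000, the end of the one-page `rw-` mapping, so the clear does not run past the mapping; the fault on the first byte written is therefore not an overrun of the region.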
Can you print the signal info? In gdb it's in $_siginfo, I'm not sure what it is in lldb. There is some extra information in it that will help narrow down what is causing the SIGBUS, e.g. trapno has the exception number.
(In reply to Andrew Turner from comment #4)
Unfortunately, gdb crashes on start with the same error while loading libiconv. I can see some commits for LLDB to support siginfo but I can't figure out how it is exposed in the UI.

Ktrace shows:
```
 1489 git CALL mmap(0x83057000,0x1000,0x3<PROT_READ|PROT_WRITE>,0x40012<MAP_PRIVATE|MAP_FIXED|MAP_PREFAULT_READ>,0x3,0xa6000)
 1489 git RET mmap 2198171648/0x83057000
 1489 git PFLT 0x83057a40 0x2<VM_PROT_WRITE>
 1489 git PRET KERN_OUT_OF_BOUNDS
 1489 git PSIG SIGBUS SIG_DFL code=BUS_OBJERR
```
Not sure if that helps? It looks like the page fault is failing for some reason (not sure what KERN_OUT_OF_BOUNDS means in this context).
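To make the ktrace records easier to cross-check, here is a small sketch that re-derives the decoded `mmap` arguments and locates the faulting address relative to the mapping. The flag values are the ones I believe FreeBSD's `<sys/mman.h>` defines (they reproduce the decoding ktrace itself printed); all other numbers are copied from the records above, and the variable names are illustrative only.

```python
# Constants as defined in FreeBSD's <sys/mman.h> (consistent with the
# symbolic decoding that ktrace printed).
PROT_READ = 0x1
PROT_WRITE = 0x2
MAP_PRIVATE = 0x0002
MAP_FIXED = 0x0010
MAP_PREFAULT_READ = 0x00040000

# Values copied from the ktrace CALL/RET/PFLT records.
MAP_ADDR = 0x83057000
MAP_LEN = 0x1000
MAP_PROT = 0x3
MAP_FLAGS = 0x40012
FILE_OFFSET = 0xA6000
FAULT_ADDR = 0x83057A40

prot_ok = MAP_PROT == (PROT_READ | PROT_WRITE)
flags_ok = MAP_FLAGS == (MAP_PRIVATE | MAP_FIXED | MAP_PREFAULT_READ)
in_mapping = MAP_ADDR <= FAULT_ADDR < MAP_ADDR + MAP_LEN

# Offset of the faulting byte within the page and within the file.
page_offset = FAULT_ADDR - MAP_ADDR
fault_file_offset = FILE_OFFSET + page_offset

print(prot_ok, flags_ok, in_mapping, hex(page_offset), hex(fault_file_offset))
```

The write fault is inside the page that was just mapped read-write at file offset 0xa6000, and its in-page offset (0xa40) is the same as in the faulting addresses from the earlier runs (0x82c1ea40 and 0x20428e53ca40).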
It looks like vm_fault can return KERN_OUT_OF_BOUNDS in a few places. Here it could be coming from vm_fault_allocate or vm_fault_getpages. Was there anything printed to the console? In the vm_fault_getpages case it could print "vm_fault: pager read error, pid <pid> (<proc name>)" in one of the failure cases that would lead to an out of bounds error.
(In reply to Andrew Turner from comment #6) Nothing in the console or dmesg.
I don’t mind trashing this VM and it’s very fast to build a kernel in it, so if there are places in the kernel that you want me to stick some printfs, let me know where they are. I’ll try upgrading to CURRENT soon as well.
Created attachment 243072
Extra debugging for releng/13.2

Something like the attached patch.
Testing with the 20230622 -CURRENT VM image does not reproduce this failure. It seems to have been fixed in something that has not been MFC'd.
With that patch on 13.2-RELEASE, I see this in my dmesg / console: ``` vm_fault_allocate:1152 FAULT_OUT_OF_BOUNDS: a6 60 vm_fault:1566 KERN_OUT_OF_BOUNDS b vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 
KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault:1587 KERN_OUT_OF_BOUNDS c 4 vm_fault_allocate:1152 FAULT_OUT_OF_BOUNDS: a6 60 vm_fault:1566 KERN_OUT_OF_BOUNDS b vm_fault_allocate:1152 FAULT_OUT_OF_BOUNDS: a6 60 vm_fault:1566 KERN_OUT_OF_BOUNDS b ``` This is from a single run of git.
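One way to read the `vm_fault_allocate:1152 FAULT_OUT_OF_BOUNDS: a6 60` line, sketched below. The attached patch's print format is not shown in this thread, so treating the two numbers as a hexadecimal page index followed by the backing object's size in pages is purely an assumption on my part. The first number does match the page index implied by the `mmap` file offset (0xa6000) seen in the earlier ktrace output, assuming 4 KiB pages.

```python
import re

PAGE_SHIFT = 12        # assumption: 4 KiB pages
MMAP_OFFSET = 0xA6000  # file offset from the ktrace mmap record
DEBUG_LINE = "vm_fault_allocate:1152 FAULT_OUT_OF_BOUNDS: a6 60"


def parse_debug_line(line):
    """Return the two trailing hex fields of the debug line as integers."""
    m = re.fullmatch(
        r"vm_fault_allocate:(\d+) FAULT_OUT_OF_BOUNDS: ([0-9a-f]+) ([0-9a-f]+)",
        line,
    )
    if m is None:
        raise ValueError("unrecognised debug line: " + line)
    return int(m.group(2), 16), int(m.group(3), 16)


first, second = parse_debug_line(DEBUG_LINE)
pindex_from_offset = MMAP_OFFSET >> PAGE_SHIFT

# Hypothesis only: first = faulting page index, second = object size in pages.
matches_offset = first == pindex_from_offset
beyond_object = first >= second
print(hex(first), hex(second), matches_offset, beyond_object)
```

If that reading is right, the fault is for page 0xa6 of an object that is only 0x60 pages long, which would explain an out-of-bounds result; confirming it needs the actual format string from the patch.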
Assigning to David, since this is not a problem in the git port and I'm not working on this issue at all.