Bug 269487

Summary: 13.2-PRERELEASE csh dumps core
Product: Base System    Reporter: Joerg Wunsch <joerg>
Component: arm    Assignee: freebsd-arm (Nobody) <freebsd-arm>
Status: Closed Not A Bug
Severity: Affects Only Me    CC: jfc
Priority: ---    
Version: 13.1-STABLE   
Hardware: arm64   
OS: Any   

Description Joerg Wunsch freebsd_committer freebsd_triage 2023-02-11 08:47:18 UTC
System:

13.2-PRERELEASE-arm64-aarch64-ROCKPRO64-20230204-d07eb716f35d-254363

Hardware:

RockPi 4, 4 GiB RAM, SD-card (above image patched with platform u-boot)

System boots fine, updates partition layout.

Upon logging in as root, it produces a coredump.
Comment 1 Joerg Wunsch freebsd_committer freebsd_triage 2023-02-11 08:50:41 UTC
I tried attaching the (compressed) coredump, but get an HTTP error (multipart/form truncated).

Trying to examine the dump on my amd64 system yields:

Reading symbols from /mnt/bin/csh...
(No debugging symbols found in /mnt/bin/csh)

warning: core file may not match specified executable file.
[New LWP 100152]

warning: `/lib/libcrypt.so.5': Shared library architecture i386:x86-64 is not compatible with target architecture aarch64.

warning: .dynamic section for "/lib/libcrypt.so.5" is not at the expected address (wrong library or version mismatch?)

warning: `/lib/libc.so.7': Shared library architecture i386:x86-64 is not compatible with target architecture aarch64.

warning: .dynamic section for "/lib/libc.so.7" is not at the expected address (wrong library or version mismatch?)

warning: `/usr/lib/i18n/libiconv_std.so.4': Shared library architecture i386:x86-64 is not compatible with target architecture aarch64.

warning: .dynamic section for "/usr/lib/i18n/libiconv_std.so.4" is not at the expected address (wrong library or version mismatch?)

warning: `/usr/lib/i18n/libUTF8.so.4': Shared library architecture i386:x86-64 is not compatible with target architecture aarch64.

warning: .dynamic section for "/usr/lib/i18n/libUTF8.so.4" is not at the expected address (wrong library or version mismatch?)

warning: `/usr/lib/i18n/libmapper_none.so.4': Shared library architecture i386:x86-64 is not compatible with target architecture aarch64.

warning: .dynamic section for "/usr/lib/i18n/libmapper_none.so.4" is not at the expected address (wrong library or version mismatch?)

warning: `/libexec/ld-elf.so.1': Shared library architecture i386:x86-64 is not compatible with target architecture aarch64.

warning: .dynamic section for "/libexec/ld-elf.so.1" is not at the expected address (wrong library or version mismatch?)

warning: Could not load shared library symbols for /lib/libncursesw.so.9.
Do you need "set solib-search-path" or "set sysroot"?
Core was generated by `-csh'.
Program terminated with signal SIGILL, Illegal instruction.
Illegal trap.
#0  0x000016f199ba7444 in ?? ()
(gdb) bt
#0  0x000016f199ba7444 in ?? ()
#1  0x000016f199bd5990 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
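The warnings above show gdb on the amd64 host resolving /lib and /usr/lib to the host's own x86-64 libraries while analysing an aarch64 core. An ELF file records its target architecture in the e_machine field of its header, so checking which architecture a given library is built for can be done as in the minimal sketch below. `elf_machine` and `file_machine` are hypothetical helper names, and the table covers only a few e_machine values.

```python
import struct

# A few e_machine values from the ELF specification
EM_NAMES = {3: "i386", 40: "arm", 62: "x86-64", 183: "aarch64"}


def elf_machine(header: bytes) -> str:
    """Return the architecture named by the e_machine field of an ELF header."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    endian = "<" if header[5] == 1 else ">"
    # e_machine is the 16-bit field at offset 18, after e_ident (16 bytes) and e_type (2 bytes)
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return EM_NAMES.get(machine, f"unknown ({machine})")


def file_machine(path: str) -> str:
    """Read just enough of a file to report its ELF target architecture."""
    with open(path, "rb") as f:
        return elf_machine(f.read(20))
```

On the amd64 host, `file_machine("/lib/libc.so.7")` should report x86-64, while the copy on the mounted card (presumably under /mnt/lib) should report aarch64. Pointing gdb at the card's root with something like `set sysroot /mnt` (the mount point is an assumption) is what lets it load the matching aarch64 libraries, as done in comment 4.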
Comment 2 Joerg Wunsch freebsd_committer freebsd_triage 2023-02-11 17:57:30 UTC
Finally got a debugger that can disassemble aarch64.

Since it's an illegal instruction trap, the instruction in question might be of interest:

(gdb) disas $pc-16, $pc+16
Dump of assembler code from 0x16f199ba7434 to 0x16f199ba7454:
   0x000016f199ba7434:    cbz       w23, 0x16f199c275c0
   0x000016f199ba7438:    mov       w20, w2
   0x000016f199ba743c:    mov       x19, x1
   0x000016f199ba7440:    str       x24, [x0]
=> 0x000016f199ba7444:    cmpge     p2.h, p0/z, z10.h, #0
   0x000016f199ba7448:    ldrb      w8, [x19]
   0x000016f199ba744c:    cmp       w8, #0x35
   0x000016f199ba7450:    b.ne      0x16f199ba746c  // b.any
End of assembler dump.
Comment 3 John F. Carr 2023-02-11 20:19:20 UTC
(In reply to Joerg Wunsch from comment #2)

Can you find out what file the instruction is in?

I suspect data corruption in your executable or shared libraries.  That instruction does not make sense in context.

The instruction trapped because it is an SVE instruction.  Your CPU does not implement SVE.  Neither do my systems, but csh does not crash for me.  Unless somebody turned on a non-default flag, the compiler should not be generating SVE instructions.
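To make the SVE point concrete: the A64 top-level decode looks at bits [28:25] (op0) of each 32-bit instruction word, and op0 = 0b0010 selects the SVE encoding space, which a CPU without SVE treats as undefined. The sketch below classifies a word by that field only; the grouping is a simplified reading of the Arm Architecture Reference Manual's top-level table, and newer extensions are lumped into "other". 0x35000142 is the cbnz from the rebuilt csh in comment 5. 0x25400142 is an assumption: it is re-encoded by hand from gdb's `cmpge p2.h, p0/z, z10.h, #0`, because the raw bytes of the trapping instruction were not posted.

```python
def encoding_group(word: int) -> str:
    """Classify an A64 instruction word by its top-level op0 field, bits [28:25]."""
    op0 = (word >> 25) & 0xF
    if op0 == 0b0010:
        return "SVE"
    if op0 in (0b1000, 0b1001):  # 100x
        return "data processing (immediate)"
    if op0 in (0b1010, 0b1011):  # 101x
        return "branches, exceptions, system"
    if (op0 & 0b0101) == 0b0100:  # x1x0
        return "loads and stores"
    if (op0 & 0b0111) == 0b0101:  # x101
        return "data processing (register)"
    if (op0 & 0b0111) == 0b0111:  # x111
        return "data processing (scalar FP and SIMD)"
    return "other (reserved/unallocated)"


print(encoding_group(0x35000142))  # cbnz w2, ... from the rebuilt csh
print(encoding_group(0x25400142))  # assumed encoding of the trapping cmpge
```

The first call prints "branches, exceptions, system" and the second prints "SVE". The two op0 values, 0b1010 and 0b0010, differ only in bit 28 of the word, so clearing that single bit is enough to turn this branch into an SVE instruction.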
Comment 4 Joerg Wunsch freebsd_committer freebsd_triage 2023-02-11 20:37:48 UTC
(In reply to John F. Carr from comment #3)

Since the (pre)release has only stripped binaries, it's pretty hard to find out the respective source file.

When setting sysroot in GDB so it finds the actual shared libs from the SD card, it seems the PC is in the csh binary itself, rather than in a shared lib:

Local exec file:
        `/mnt/bin/csh', file type elf64-littleaarch64.
        Entry point: 0x16f199b672bc
        0x000016f199b402a8 - 0x000016f199b402bd is .interp
        0x000016f199b402c0 - 0x000016f199b40308 is .note.tag
        0x000016f199b40308 - 0x000016f199b412c8 is .dynsym
        0x000016f199b412c8 - 0x000016f199b41418 is .gnu.version
        0x000016f199b41418 - 0x000016f199b41468 is .gnu.version_r
        0x000016f199b41468 - 0x000016f199b41494 is .gnu.hash
        0x000016f199b41494 - 0x000016f199b419dc is .hash
        0x000016f199b419dc - 0x000016f199b41fc5 is .dynstr
        0x000016f199b41fc8 - 0x000016f199b48718 is .rela.dyn
        0x000016f199b48718 - 0x000016f199b49600 is .rela.plt
        0x000016f199b49600 - 0x000016f199b4ef8a is .rodata
        0x000016f199b4ef8c - 0x000016f199b507a0 is .eh_frame_hdr
        0x000016f199b507a0 - 0x000016f199b5702c is .eh_frame
        0x000016f199b6702c - 0x000016f199bb1bd4 is .text
        0x000016f199bb1be0 - 0x000016f199bb1bf4 is .init
        0x000016f199bb1c00 - 0x000016f199bb1c14 is .fini

0x000016f199ba7444 lies within section .text (0x000016f199b6702c - 0x000016f199bb1bd4).
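Finding the section is a plain range lookup over gdb's `info files` listing. The sketch below copies the ranges from the listing above and treats each as half-open [start, end), which matches how gdb prints them (the .interp range is exactly the 21 bytes of "/libexec/ld-elf.so.1" plus its NUL). `SECTIONS` and `section_for` are illustrative names, not gdb features.

```python
from typing import Optional

# (start, end, name) from gdb's "info files" output above; end is exclusive.
SECTIONS = [
    (0x000016f199b402a8, 0x000016f199b402bd, ".interp"),
    (0x000016f199b402c0, 0x000016f199b40308, ".note.tag"),
    (0x000016f199b40308, 0x000016f199b412c8, ".dynsym"),
    (0x000016f199b412c8, 0x000016f199b41418, ".gnu.version"),
    (0x000016f199b41418, 0x000016f199b41468, ".gnu.version_r"),
    (0x000016f199b41468, 0x000016f199b41494, ".gnu.hash"),
    (0x000016f199b41494, 0x000016f199b419dc, ".hash"),
    (0x000016f199b419dc, 0x000016f199b41fc5, ".dynstr"),
    (0x000016f199b41fc8, 0x000016f199b48718, ".rela.dyn"),
    (0x000016f199b48718, 0x000016f199b49600, ".rela.plt"),
    (0x000016f199b49600, 0x000016f199b4ef8a, ".rodata"),
    (0x000016f199b4ef8c, 0x000016f199b507a0, ".eh_frame_hdr"),
    (0x000016f199b507a0, 0x000016f199b5702c, ".eh_frame"),
    (0x000016f199b6702c, 0x000016f199bb1bd4, ".text"),
    (0x000016f199bb1be0, 0x000016f199bb1bf4, ".init"),
    (0x000016f199bb1c00, 0x000016f199bb1c14, ".fini"),
]


def section_for(addr: int) -> Optional[str]:
    """Return the name of the section containing addr, or None if it falls in a gap."""
    for start, end, name in SECTIONS:
        if start <= addr < end:
            return name
    return None


print(section_for(0x16f199ba7444))
```

This prints ".text"; the .eh_frame range already ends at 0x16f199b5702c. Subtracting 0x16f199b40000, assumed here to be the load base (the page containing .interp), gives 0x67444, the same address the rebuilt csh in comment 5 uses for this instruction.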

Well, I mounted the original distribution image, and when I compare /bin/csh from the SD card and the original image, they indeed differ:

# sha1 /tmp/csh /tmp/csh-orig
SHA1 (/tmp/csh) = 1511de5519ff7619e2577c9fabf849f2e78acec1
SHA1 (/tmp/csh-orig) = 7ff43d30fda3e2a8272c9cde41a1c0507d1b51dc

Strange.
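A SHA-1 mismatch only says that the two files differ, not where. Listing the differing bytes, roughly what `cmp -l` reports (though `cmp -l` counts offsets from 1 and prints bytes in octal), narrows that down. A minimal sketch with the hypothetical helpers `sha1_file`, `differing_bytes` and `report`:

```python
import hashlib


def sha1_file(path: str) -> str:
    """Hex SHA-1 of a file, read in 64 KiB chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def differing_bytes(a: bytes, b: bytes) -> list[tuple[int, int, int]]:
    """(offset, byte_in_a, byte_in_b) for each differing byte within the common length."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def report(path_a: str, path_b: str) -> None:
    """Print both digests, any size mismatch, and every differing byte (0-based offsets)."""
    for p in (path_a, path_b):
        print(f"SHA1 ({p}) = {sha1_file(p)}")
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    if len(a) != len(b):
        print(f"sizes differ: {len(a)} vs {len(b)}")
    for off, x, y in differing_bytes(a, b):
        print(f"offset {off:#x}: {x:#04x} vs {y:#04x}")
```

Called as `report("/tmp/csh", "/tmp/csh-orig")`, it lists each changed byte. A handful of isolated single-bit changes scattered through an otherwise identical file would suggest corruption in storage or transfer rather than a different build.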
Comment 5 John F. Carr 2023-02-11 20:49:50 UTC
(In reply to Joerg Wunsch from comment #4)

I disassembled the csh I just built from stable/13.  Here is the corresponding set of instructions in my build:

0x67434: 0x34000c77       cbz    w23, 0x675c0
0x67438: 0x2a0203f4       mov    w20, w2
0x6743c: 0xaa0103f3       mov    x19, x1
0x67440: 0xf9400018       ldr    x24, [x0]    # str in bug report
0x67444: 0x35000142       cbnz   w2, 0x6746c  # nonsense SVE instruction in bug report
0x67448: 0x39400268       ldrb   w8, [x19]
0x6744c: 0x7100d51f       cmp    w8, #0x35
0x67450: 0x540000e1       b.ne   0x6746c

Maybe it's just a difference of two flipped bits.
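The bit-flip idea can be checked by XORing each good instruction word with its corrupted counterpart and listing the bit positions that differ. The good words are from the listing above. The corrupted words are an assumption: they are re-encoded by hand from gdb's mnemonics in comment 2, since the raw bytes from the card were not posted. The cbz row is included because gdb's branch target in comment 2 (0x16f199c275c0, i.e. 0x8018c past the instruction) lies 0x80000 further than in this listing (0x675c0, 0x18c past it).

```python
def flipped_bits(good: int, bad: int) -> list[int]:
    """Bit positions (0 = least significant) where two 32-bit words differ."""
    diff = (good ^ bad) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]


# (offset, word from the rebuilt csh, corrupted word re-encoded from comment 2)
PAIRS = [
    (0x67434, 0x34000C77, 0x34400C77),  # cbz w23: branch offset 0x18c vs. 0x8018c
    (0x67440, 0xF9400018, 0xF9000018),  # ldr x24, [x0] vs. str x24, [x0]
    (0x67444, 0x35000142, 0x25400142),  # cbnz w2, ... vs. SVE cmpge p2.h, p0/z, z10.h, #0
]

for offset, good, bad in PAIRS:
    print(f"{offset:#x}: bits {flipped_bits(good, bad)}")
```

Under these assumed encodings, bit 22 differs in all three words and the cbnz/cmpge word additionally differs in bit 28. The same bit position going wrong in nearby words fits a storage problem better than a build difference.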
Comment 6 Joerg Wunsch freebsd_committer freebsd_triage 2023-02-11 23:43:48 UTC
I just started over with the image, getting different errors.

I'm starting to suspect the SD card. Will try another one.
Comment 7 Joerg Wunsch freebsd_committer freebsd_triage 2023-02-12 00:04:40 UTC
I'd say it was the SD card. It was a brand-new one, fresh out of the blister pack, but apparently bad. I replaced it with another one, and now everything seems to work, even much faster than with the other card.