| Summary: | -Stable 14.1 on ARM compiler failure not seen in 14.1-RELEASE Pi3 | | |
|---|---|---|---|
| Product: | Base System | Reporter: | karl |
| Component: | arm | Assignee: | freebsd-arm (Nobody) <freebsd-arm> |
| Status: | Open --- | | |
| Severity: | Affects Some People | CC: | dim, markj, marklmi26-fbsd |
| Priority: | --- | | |
| Version: | 14.1-STABLE | | |
| Hardware: | arm64 | | |
| OS: | Any | | |
| Attachments: |
|
||||||||
"exit code 139" means that cc is dying with SIGSEGV, so at a glance this seems like a compiler bug of some kind. As you note, it only appears on RPi3 though, so there is something platform dependent about the problem. Does the generated reproducer actually work? It might be worth posting that here to start. Can you check dmesg for any out-of-memory errors? Or test your RAM? If the crashes are fairly random, as they seem here, it is almost always due to memory errors: either corruption or failure to allocate something. The backtrace to libthr.so.3 is most likely a red herring. The top few stack entries are interesting for diagnosing any crash, not the bottom ones (which are typically of the form "libthr is starting a thread here"). (In reply to Dimitry Andric from comment #2) That error was from the console, so clearly there was no out-of-memory condition as it would have been logged either on that execution or through the console since a kernel complaint would show up there. That was my first thought as well and in fact that S0-encryption.c file was split off from hd-mcp.c specifically to attempt to isolate that possibility because that particular source file is relatively large (although it has built on everything from a Pi2 forward on FreeBSD for close to 10 years across various revisions.) 
As for hardware-based corruption I'm skeptical and believe it is an OS (either compiler or kernel) problem specific to something different between releng/14.1 and current (yesterday) stable/14 because (1) releng/14.1 does not exhibit it on the SAME HARDWARE and in fact the precise same build using Crochet, altering ONLY the FreeBSD source worktree so all the RPI-firmware files and u-boot are identical between the builds, (2) hardware is almost-certainly excluded because two entirely-different Pi3s from different generations (an early one without the header for POE and a new one that does have it) both behave identically and (3) a Pi4 which I own does not exhibit the problem on either stable/14 or releng/14.1.

Before posting this I went back through git log on the tree to see if I could find something that looked like it might be involved between the time when releng/14.1 was branched and today so I could perform a bisection to test that, but found nothing that stood out at me as likely to be related.

(In reply to Mark Johnston from comment #1) Yep.

    root@rpi:/data/karl/HD-MCP # make
    cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c S0-encryption.c -o S0-encryption.o
    PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace, preprocessed source, and associated run script.
    Stack dump:
    0. Program arguments: cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c S0-encryption.c -o S0-encryption.o
    1. <eof> parser at end of file
    2. Code generation
    3. Running pass 'Function Pass Manager' on module 'S0-encryption.c'.
    4. Running pass 'AArch64O0PreLegalizerCombiner' on function '@generate_mac'
    #0 0x0000000004b17588 (/usr/bin/cc+0x4b17588)
    #1 0x0000000004b15650 (/usr/bin/cc+0x4b15650)
    #2 0x0000000004ae16a0 (/usr/bin/cc+0x4ae16a0)
    #3 0x000000008a3a9eb8 (/lib/libthr.so.3+0x2aeb8)
    cc: error: clang frontend command failed with exit code 139 (use -v to see invocation)
    FreeBSD clang version 18.1.6 (https://github.com/llvm/llvm-project.git llvmorg-18.1.6-0-g1118c2e05e67)
    Target: aarch64-unknown-freebsd14.1
    Thread model: posix
    InstalledDir: /usr/bin
    cc: note: diagnostic msg:
    ********************
    PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
    Preprocessed source(s) and associated run script(s) are located at:
    cc: note: diagnostic msg: /tmp/S0-encryption-9c36d0.c
    cc: note: diagnostic msg: /tmp/S0-encryption-9c36d0.sh
    cc: note: diagnostic msg:
    ********************
    *** Error code 1
    Stop.
    make: stopped in /data/karl/HD-MCP
    root@rpi:/data/karl/HD-MCP # mkdir REPRO
    root@rpi:/data/karl/HD-MCP # cp /tmp/S0* REPRO
    root@rpi:/data/karl/HD-MCP # cd REPRO
    root@rpi:/data/karl/HD-MCP/REPRO # ls
    S0-encryption-9c36d0.c S0-encryption-9c36d0.sh
    root@rpi:/data/karl/HD-MCP/REPRO # sh S0*.sh
    PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace, preprocessed source, and associated run script.
    Stack dump:
    0. Program arguments: /usr/bin/cc -cc1 -triple aarch64-unknown-freebsd14.1 -emit-obj -mrelax-all -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name S0-encryption.c -mrelocation-model static -mframe-pointer=non-leaf -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu generic -target-feature +v8a -target-feature +fp-armv8 -target-feature +neon -target-abi aapcs -debug-info-kind=standalone -dwarf-version=4 -debugger-tuning=gdb -fdebug-compilation-dir=/data/karl/HD-MCP -fcoverage-compilation-dir=/data/karl/HD-MCP -D VERSION=\"8.0.0-LocalAuth\" -Wstrict-prototypes -ferror-limit 19 -fno-signed-char -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcolor-diagnostics -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -x c S0-encryption-9c36d0.c
    1. <eof> parser at end of file
    2. Code generation
    3. Running pass 'Function Pass Manager' on module 'S0-encryption-9c36d0.c'.
    4. Running pass 'AArch64O0PreLegalizerCombiner' on function '@generate_mac'
    #0 0x0000000004b17588 (/usr/bin/cc+0x4b17588)
    #1 0x0000000004b15650 (/usr/bin/cc+0x4b15650)
    #2 0x0000000004b17cb0 (/usr/bin/cc+0x4b17cb0)
    #3 0x000000008b94ceb8 (/lib/libthr.so.3+0x2aeb8)
    Segmentation fault (core dumped)
    root@rpi:/data/karl/HD-MCP/REPRO #

Those two reproducer files are now on the SD card (out of tmp, which is a tmpfs) so now I halt that machine and place the same card in a Pi4, boot it and try running the reproducer there:

    root@rpi:/data/karl/HD-MCP/REPRO # ls
    S0-encryption-9c36d0.c S0-encryption-9c36d0.sh cc.core
    root@rpi:/data/karl/HD-MCP/REPRO # ls -al
    total 25100
    drwxr-xr-x 2 root wheel 512 Jun 28 18:30 .
    drwxr-xr-x 5 root wheel 1024 Jun 28 18:30 ..
    -rw-r--r-- 1 root wheel 2578055 Jun 28 18:30 S0-encryption-9c36d0.c
    -rw-r--r-- 1 root wheel 2198 Jun 28 18:30 S0-encryption-9c36d0.sh
    -rw------- 1 root wheel 33755136 Jun 28 18:30 cc.core
    root@rpi:/data/karl/HD-MCP/REPRO # sh S0*.sh
    root@rpi:/data/karl/HD-MCP/REPRO #

No crash.

    root@rpi:/data/karl/HD-MCP/REPRO # uname -v
    FreeBSD 14.1-STABLE stable/14-n268036-9a53391b601d GENERIC
    root@rpi:/data/karl/HD-MCP/REPRO #
    root@rpi:/data/karl/HD-MCP/REPRO # dmesg|grep ARM
    psci0: <ARM Power State Co-ordination Interface Driver> on ofwbus0
    gic0: <ARM Generic Interrupt Controller> mem 0x40041000-0x40041fff,0x40042000-0x40043fff,0x40044000-0x40045fff,0x40046000-0x40047fff irq 30 on simplebus0
    generic_timer0: <ARMv8 Generic Timer> irq 4,5,6,7 on ofwbus0
    Timecounter "ARM MPCore Timecounter" frequency 54000000 Hz quality 1000
    Event timer "ARM MPCore Eventtimer" frequency 54000000 Hz quality 1000
    bcm2835_cpufreq0: ARM 600MHz, Core 200MHz, SDRAM 400MHz, Turbo OFF
    CPU 0: ARM Cortex-A72 r0p3 affinity: 0
    CPU 1: ARM Cortex-A72 r0p3 affinity: 1
    CPU 2: ARM Cortex-A72 r0p3 affinity: 2
    CPU 3: ARM Cortex-A72 r0p3 affinity: 3
    root@rpi:/data/karl/HD-MCP/REPRO #

Same SD card, but in the "4". Again if I run this same compile on releng/14.1 it succeeds on BOTH.

(In reply to karl from comment #4) I expect that markj might have been asking for you to provide copies of files like the ones mentioned in:

    Preprocessed source(s) and associated run script(s) are located at:
    cc: note: diagnostic msg: /tmp/S0-encryption-9c36d0.c
    cc: note: diagnostic msg: /tmp/S0-encryption-9c36d0.sh

if they reproduce the problem for you, so others could attempt their own replications of the problem, possibly even with clang built with debug information so that the backtrace is more useful. As it stands, you are the only one with a context to explore beyond exactly what you have written.
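The "exit code 139" in the crash output above follows the usual shell convention of 128 plus the signal number, and signal 11 is SIGSEGV on FreeBSD. A minimal sketch of that arithmetic follows; `describe_status` is a hypothetical helper name, not something from this report.

```sh
#!/bin/sh
# describe_status: hypothetical helper that decodes a shell exit status.
# By convention a status above 128 means "terminated by signal (status - 128)".
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        echo "killed by signal $sig"
    else
        echo "exited with code $status"
    fi
}
```

For example, `describe_status 139` prints `killed by signal 11`, which matches the SIGSEGV reading given earlier in the thread.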
The only difference I see in the logs digging around in the clang-related things is this one commit, found in the lib/clang directory, that is in 14-STABLE and NOT in releng/14.1:

    commit f1e3279983d6db1001af5fc9fb3a9821a1c353ef
    Author: Dimitry Andric <dim@FreeBSD.org>
    Date: Fri May 24 17:51:19 2024 +0200

        Merge llvm-project release/18.x llvmorg-18.1.6-0-g1118c2e05e67

        This updates llvm, clang, compiler-rt, libc++, libunwind, lld, lldb and openmp to llvm-project release/18.x llvmorg-18.1.6-0-g1118c2e05e67.

        PR: 276104
        MFC after: 3 days

        (cherry picked from commit 3a0793336edfc21cb6d4c8c5c5d7f1665f3e6c5a)

The last commit in releng/14.1 is from 4 May; that commit bumps that from 18.1.5.0 to 18.1.6. This implicates the following PR which was marked closed as fixing a problem with compilation of something else, particularly against PowerPC targets.....

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=276104

I have not run this all the way down yet on a bisect but bisecting between those two good/bad and then running it down to here works:

    FreeBSD 14.1-STABLE n267672-ddabe1d3c515 GENERIC

So I suspect that commit in lib/clang is the bad one. Where I am right now is here:

    root@NewFS:/usr/src.14-STABLE/lib/clang # git bisect bad
    Bisecting: 49 revisions left to test after this (roughly 6 steps)
    [ddabe1d3c51556c84f830b0203204c55b495e57b] mlx5en: add diagnostic in one more case of failed eeprom read preparation

This bisection (after marking the previous one bad as it resulted in a missing include file) built and does NOT reproduce the problem. Will continue, but I'm reasonably sure which commit it is that causes the problem at this point.

Update -- that specific commit IS NOT the problem. Doing a full bisect now from where my tree is on stable/14 to there; the build with that commit in it that was suspect works (this is a bit slow because Crochet will not properly build partial changes, so I have to make clean and rebuild world and kernel on each.)
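The manual good/bad marking described here could in principle be automated with `git bisect run`, which treats exit 0 as good, 1 to 127 (except 125) as bad, 125 as "skip this revision", and aborts on anything else. A crash status such as 139 therefore has to be mapped into that range. The sketch below is only an illustration: `bisect_verdict` and `bisect-test.sh` are hypothetical names, and it assumes the build, install and reproducer run could all be driven from one script, which is not the Crochet setup described in this thread.

```sh
#!/bin/sh
# bisect_verdict: hypothetical helper that maps the exit status of the
# reproducer script itself (not of make, which reports 1 on a crash) to an
# exit code that "git bisect run" understands.
bisect_verdict() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo 0      # compile succeeded: mark revision good
    elif [ "$status" -gt 128 ]; then
        echo 1      # compiler killed by a signal (e.g. 139): mark bad
    else
        echo 125    # some other failure, e.g. tree did not build: skip
    fi
}

# Assumed usage, with the good and bad hashes quoted elsewhere in this report:
#   git bisect start
#   git bisect good f1e3279983d6db1001af5fc9fb3a9821a1c353ef
#   git bisect bad 939f5a7b2bfbd7ba3b23ddc691e12e8a332623f4
#   git bisect run sh ./bisect-test.sh
# where the hypothetical bisect-test.sh ends with:
#   exit "$(bisect_verdict "$status")"
```

Mapping build failures to 125 rather than "bad" keeps an unrelated breakage, such as the missing include file mentioned above, from steering the bisection.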
Update: We're down to somewhere in between these....

    root@NewFS:/usr/src.14-STABLE # git bisect good
    Bisecting: 27 revisions left to test after this (roughly 5 steps)
    [dd8575e19ae02e7e8a10abd8592ac764263e9176] qlnx: Use device_set_descf()
    root@NewFS:/usr/src.14-STABLE # git bisect good
    Bisecting: 13 revisions left to test after this (roughly 4 steps)
    [a3f7b81fdd2205207085ba5d038b402aba748c6e] mbuf: provide m_freemp()
    root@NewFS:/usr/src.14-STABLE # git bisect bad
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [6e345bea25d476baf6de7fb3b60127d39b464837] makefs/zfs: Add a helper function for adding ZAP entries

DING DING DING Winner winner chicken dinner -- this is the bad commit:

    root@rpi:/data/HD-MCP # make clean
    rm -f *.o hd-mcp hd-mcp.freeware license-server hd-commit
    root@rpi:/data/HD-MCP # make
    cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c config.c -o config.o
    cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c funcs.c -o funcs.o
    cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c hd-mcp.c -o hd-mcp.o
    PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace, preprocessed source, and associated run script.
    Stack dump:
    0. Program arguments: cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c hd-mcp.c -o hd-mcp.o
    1. <eof> parser at end of file
    2. Code generation
    3. Running pass 'Function Pass Manager' on module 'hd-mcp.c'.
    4. Running pass 'AArch64O0PreLegalizerCombiner' on function '@process_unit_get_response'
    #0 0x0000000004b17588 (/usr/bin/cc+0x4b17588)
    #1 0x0000000004b15650 (/usr/bin/cc+0x4b15650)
    #2 0x0000000004ae16a0 (/usr/bin/cc+0x4ae16a0)
    #3 0x0000000089daaeb8 (/lib/libthr.so.3+0x2aeb8)
    cc: error: clang frontend command failed with exit code 139 (use -v to see invocation)
    FreeBSD clang version 18.1.6 (https://github.com/llvm/llvm-project.git llvmorg-18.1.6-0-g1118c2e05e67)
    Target: aarch64-unknown-freebsd14.1
    Thread model: posix
    InstalledDir: /usr/bin
    cc: note: diagnostic msg:
    ********************
    PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
    Preprocessed source(s) and associated run script(s) are located at:
    cc: note: diagnostic msg: /tmp/hd-mcp-200d21.c
    cc: note: diagnostic msg: /tmp/hd-mcp-200d21.sh
    cc: note: diagnostic msg:
    ********************
    *** Error code 1
    Stop.
    make: stopped in /data/HD-MCP

    root@NewFS:/usr/src.14-STABLE # git bisect start
    status: waiting for both good and bad commits
    root@NewFS:/usr/src.14-STABLE # git bisect good f1e3279983d6db1001af5fc9fb3a9821a1c353ef
    status: waiting for bad commit, 1 good commit known
    root@NewFS:/usr/src.14-STABLE # git bisect bad 939f5a7b2bfbd7ba3b23ddc691e12e8a332623f4
    Bisecting: 111 revisions left to test after this (roughly 7 steps)
    [7ad7453748e2adafa1e1a3e44b02fc852d4c5301] LinuxKPI: 802.11: change teardown order to avoid iwlwifi firmware crashes
    root@NewFS:/usr/src.14-STABLE # git branch
    * (no branch, bisect started on stable/14)
    + main
    + releng/14.1
    + stable/12
    + stable/13
      stable/14
    root@NewFS:/usr/src.14-STABLE # git bisect bad
    Bisecting: 55 revisions left to test after this (roughly 6 steps)
    [ac658a7c760d9db9fcd11cdeb3b858411dedf754] rc: Set var_run_enable to enable by default
    root@NewFS:/usr/src.14-STABLE # git bisect good
    Bisecting: 27 revisions left to test after this (roughly 5 steps)
    [dd8575e19ae02e7e8a10abd8592ac764263e9176] qlnx: Use device_set_descf()
    root@NewFS:/usr/src.14-STABLE # git bisect good
    Bisecting: 13 revisions left to test after this (roughly 4 steps)
    [a3f7b81fdd2205207085ba5d038b402aba748c6e] mbuf: provide m_freemp()
    root@NewFS:/usr/src.14-STABLE # git bisect bad
    Bisecting: 6 revisions left to test after this (roughly 3 steps)
    [6e345bea25d476baf6de7fb3b60127d39b464837] makefs/zfs: Add a helper function for adding ZAP entries
    root@NewFS:/usr/src.14-STABLE # git bisect good
    Bisecting: 3 revisions left to test after this (roughly 2 steps)
    [a40287d6312e598fc65c5a7bbdefe6f9e15b7a5f] simd(7): add missing aarch64 SIMD functions
    root@NewFS:/usr/src.14-STABLE # git bisect good
    Bisecting: 1 revision left to test after this (roughly 1 step)
    [3562d64e794b2614f15728c8b7e3a6dff0b644a7] sqlite3: Vendor import of sqlite3 3.46.0
    root@NewFS:/usr/src.14-STABLE # git bisect good
    Bisecting: 0 revisions left to test after this (roughly 0 steps)
    [55c5dad2f305f74d1ff5ca85c453635511aab9b2] Merge commit 382f70a877f0 from llvm-project (by Louis Dionne):
    root@NewFS:/usr/src.14-STABLE # git bisect bad
    55c5dad2f305f74d1ff5ca85c453635511aab9b2 is the first bad commit
    commit 55c5dad2f305f74d1ff5ca85c453635511aab9b2 (HEAD)
    Author: Dimitry Andric <dim@FreeBSD.org>
    Date: Fri Jun 7 20:42:53 2024 +0200

        Merge commit 382f70a877f0 from llvm-project (by Louis Dionne):

        [libc++][NFC] Rewrite function call on two lines for clarity (#79141)

        Previously, there was a ternary conditional with a less-than comparison appearing inside a template argument, which was really confusing because of the <...> of the function template. This patch rewrites the same statement on two lines for clarity.

        Merge commit d129ea8d2fa3 from llvm-project (by Vitaly Buka):

        [libcxx] Align `__recommend() + 1` by __endian_factor (#90292)

        This is detected by asan after #83774

        Allocation size will be divided by `__endian_factor` before storing. If it's not aligned, we will not be able to recover allocation size to pass into `__alloc_traits::deallocate`.

        we have code like this
        ```
        auto __allocation = std::__allocate_at_least(__alloc(), __recommend(__sz) + 1);
        __p = __allocation.ptr;
        __set_long_cap(__allocation.count);

        void __set_long_cap(size_type __s) _NOEXCEPT {
          __r_.first().__l.__cap_ = __s / __endian_factor;
          __r_.first().__l.__is_long_ = true;
        }

        size_type __get_long_cap() const _NOEXCEPT {
          return __r_.first().__l.__cap_ * __endian_factor;
        }

        inline ~basic_string() {
          __annotate_delete();
          if (__is_long())
            __alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap());
        }
        ```
        1. __recommend() -> even size
        2. `std::__allocate_at_least(__alloc(), __recommend(__sz) + 1)` - > not even size
        3. ` __set_long_cap() `- > lose one bit of size for __endian_factor == 2 (see `/ __endian_factor`)
        4. `__alloc_traits::deallocate(__alloc(), __get_long_pointer(), __get_long_cap())` -> uses even size (see `__get_long_cap`)

        This should fix incorrect deallocation sizes for some instances of std::string. Memory profiling or debugging tools like AddressSanitizer, LeakSanitizer or TCMalloc could then complain about the the size passed to a deallocation not matching the size originally passed to the allocation.

        Reported by: Aliaksei Kandratsenka <alkondratenko@gmail.com>
        PR: 279560
        MFC after: 3 days

        (cherry picked from commit ead8e4c081e5c4de4d508fc353f381457b058ca6)

     contrib/llvm-project/libcxx/include/string | 6 +++---
     1 file changed, 3 insertions(+), 3 deletions(-)
    root@NewFS:/usr/src.14-STABLE #

(In reply to karl from comment #10) How does anyone else but you test for the status of the problem? Does anyone have the source code that you are compiling so that they have a chance of testing? Some other source code known to reproduce the problem in your context? An example might be a smaller subset of the original source code.
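The size mismatch that the quoted commit message describes is plain integer arithmetic: the capacity is stored divided by `__endian_factor` and multiplied back when read, so an odd allocation count loses one on the round trip. The sketch below only models that arithmetic with an assumed factor of 2; the function names are hypothetical, it is not libc++ code, and it says nothing about why the compiler crashes on the Pi3.

```sh
#!/bin/sh
# Arithmetic model only. ENDIAN_FACTOR=2 is the case called out in the
# commit message; both function names are hypothetical.
ENDIAN_FACTOR=2

# Capacity as it would be read back after being stored as (count / factor).
round_trip_cap() {
    echo $(( $1 / $ENDIAN_FACTOR * $ENDIAN_FACTOR ))
}

# Round a count up to the next multiple of the factor before storing it.
align_to_factor() {
    echo $(( ($1 + $ENDIAN_FACTOR - 1) / $ENDIAN_FACTOR * $ENDIAN_FACTOR ))
}
```

With these definitions `round_trip_cap 23` yields 22, so a deallocation would be given a different size than the allocation, while `align_to_factor 23` yields 24 and `round_trip_cap 24` returns 24 unchanged.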
(In reply to Mark Millard from comment #11) It won't allow me to attach it -- even the "limited" case, S0-encryption.c that the compiler dumps as a reproducer, is ~2.5MB as a result of the preprocessor includes. The file itself is a whopping 458 lines.

I am now building with that single commit reverted on my local stable/14 tree as another test.

With that commit reverted (it's 3 lines of change) the build reliably completes. I will take the reversion back out and see if I can remove enough includes which aren't necessary for that specific routine to get it under the attach limit and see if it still crashes. Unfortunately the crash message does not tell me which line of the ~2.5MB dump of the code with the preprocessor includes caused it to blow up.

Breaking up the S0-encryption file into constituent functions (specifically breaking two of the functions in that short file into separate files) makes the fault intermittent. That is, it will frequently SEGV, but then if I type "make" again it might complete -- or it might blow up again, even down to the single routine level. Here's one of the routines that *sometimes* blows with the same fault:

    root@rpi:/data/HD-MCP # make
    cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c S0-mac.c -o S0-mac.o
    PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace, preprocessed source, and associated run script.
    Stack dump:
    0. Program arguments: cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c S0-mac.c -o S0-mac.o
    1. <eof> parser at end of file
    2. Code generation
    3. Running pass 'Function Pass Manager' on module 'S0-mac.c'.
    4. Running pass 'AArch64O0PreLegalizerCombiner' on function '@generate_mac'
    #0 0x0000000004b17588 (/usr/bin/cc+0x4b17588)
    #1 0x0000000004b15650 (/usr/bin/cc+0x4b15650)
    #2 0x0000000004ae16a0 (/usr/bin/cc+0x4ae16a0)
    #3 0x000000008adfceb8 (/lib/libthr.so.3+0x2aeb8)
    cc: error: clang frontend command failed with exit code 139 (use -v to see invocation)
    FreeBSD clang version 18.1.6 (https://github.com/llvm/llvm-project.git llvmorg-18.1.6-0-g1118c2e05e67)
    Target: aarch64-unknown-freebsd14.1
    Thread model: posix
    InstalledDir: /usr/bin
    cc: note: diagnostic msg: Error generating preprocessed source(s).
    *** Error code 1
    Stop.
    make: stopped in /data/HD-MCP

This is the source of that file -- I pulled this single routine out. Yet, it does not blow up *always* when I separate things out (I also pulled out the "decrypt_packet" routine into a different file as well) - after that blow up if I execute make again...

    root@rpi:/data/HD-MCP # make
    cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c S0-mac.c -o S0-mac.o
    cc -g -o hd-mcp hd-mcp.o www.o config.o slave.o amcrest.o license.o funcs.o z-wave.o malloc.o S0-encryption.o S0-decrypt.o S0-mac.o root-include.o boot-include.o -lm -lcrypt -lssl -lpthread -lcrypto -lgpio

The build completes. But again, with the above commit reverted the compiler *never* blows up on the Pi3, nor does it blow up with the commit in on a Pi4 or on an AMD64 box, all of which are on the same revision of 14/stable.

    /*
     * HomeDaemon-MCP - S0-generate_mac.c
     * Copyright 2016/2017/2024 Karl Denninger (karl@denninger.net);
     * all rights reserved.
     * Unauthorized reproduction or distribution of this file, a component
     * of HomeDaemon MCP, is expressly prohibited.
     */

    #include <stdio.h>
    #include <ctype.h>
    #include <sys/types.h>
    #include <errno.h>
    #include <stdlib.h>
    #include <syslog.h>
    #include <stdarg.h>
    #include <string.h>

    #include "defs.h"

    #ifndef OPENSSL
    #undef GPIO
    #undef ANALOG
    #undef IIC
    #endif // OPENSSL

    #include "forwards.h"
    #include "externs.h"

    #ifdef OPENSSL

    #include <openssl/conf.h>
    #include <openssl/x509v3.h>
    #include <openssl/ssl.h>
    #include <openssl/evp.h>
    #include <openssl/rsa.h>
    #include <openssl/engine.h>

    /*
     * generate_mac -- generate the MAC for the packet we are about to send.
     * This only actually returns 8 bytes of MAC, but is a bit more complex
     * than that to compute.
     */
    void generate_mac(int unit, int mode, int from, int to, unsigned char *msgbuf, unsigned char *iv, int len, unsigned char *out)
    //int unit; /* The unit to get the temp key from */
    //int mode; /* The command subclass we're encapsulating */
    //int from; /* The unit we are sending from */
    //int to; /* The unit we are sending to */
    //unsigned char *msgbuf;
    //unsigned char *iv;
    //int len; /* Length of msgbuf */
    //unsigned char *out; /* Output in binary form, 8 byte buffer assumed */
    {
        int enclen;
        unsigned char cipher[16]; /* Our encrypted IV */
        unsigned char cipher2[16]; /* Temporary holding area */
        unsigned char tmphold[16]; /* Holding area for computation */
        unsigned char auth[16]; /* Where the MAC goes temporarily */
        unsigned char buffer[256];
        int bufsize;
        int loc;
        int x, y;
        EVP_CIPHER_CTX *EVP_auth_ctx;

        bzero(buffer, 256);
        bzero(tmphold, 16);
        bzero(cipher, 16);
        bzero(cipher2, 16);

        /*
         * Build a buffer to be turned into the MAC
         */
        buffer[0] = (unsigned char) mode;
        buffer[1] = (unsigned char) from;
        buffer[2] = (unsigned char) to;
        buffer[3] = (unsigned char) len; /* Length of the encrypted data */
        memcpy(&buffer[4], msgbuf, len); /* Copy it in */
        bufsize = len + 4; /* Initialize the buffer size */

        /*
         * Now encrypt our IV with the auth key using ecb.
         */
        EVP_auth_ctx = EVP_CIPHER_CTX_new();
        EVP_EncryptInit_ex(EVP_auth_ctx, EVP_aes_128_ecb(), NULL, units[unit].AuthKey, NULL);
        EVP_EncryptUpdate(EVP_auth_ctx, cipher, &enclen, iv, 16);
        if (enclen != 16) { /* Something's wrong */
            panic("Wrong encryption return length!");
        }
        bzero(tmphold, 16); /* Zero the temporary holding area */
        loc = 0; /* And reset our location */
        for (x = 0; x < bufsize; x++) { /* Iterate over the buffer */
            tmphold[loc++] = buffer[x]; /* Hold it */
            if (loc == 16) { /* Encrypt when filled */
                for (y = 0; y < 16; y++) {
                    cipher[y] = tmphold[y] ^ cipher[y];
                    tmphold[y] = 0; /* And clear it... */
                }
                loc = 0; /* Clear counter */
                EVP_EncryptUpdate(EVP_auth_ctx, cipher2, &enclen, cipher, 16);
                if (enclen != 16) { /* Something's wrong */
                    panic("Wrong encryption return length!");
                }
                memcpy(cipher, cipher2, 16); /* Copy back */
            }
        }
        if (loc > 0) { /* If there's a partial block encrypt it too */
            for (y = 0; y < 16; y++) { /* Do the last piece */
                cipher[y] = tmphold[y] ^ cipher[y];
            }
            EVP_EncryptUpdate(EVP_auth_ctx, cipher2, &enclen, cipher, 16);
            if (enclen != 16) { /* Something's wrong */
                panic("Wrong encryption return length!");
            }
            memcpy(cipher, cipher2, 16); /* Copy back */
        }
        EVP_CIPHER_CTX_free(EVP_auth_ctx);
        memcpy(out, cipher, 8);
        return;
    }
    #endif // OPENSSL

(In reply to karl from comment #12) Compress the file and submit that. Text files generally compress well.

(In reply to karl from comment #14) This should allow someone that has built the system clang without stripping symbols to get a backtrace if they manage to reproduce any examples of the failures.

(In reply to karl from comment #14) defs.h is missing from what is presented. I was only checking on completeness of the supplied source code:

    cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c S0-mac.c -o S0-mac.o
    S0-mac.c:20:17: fatal error: 'defs.h' file not found
       20 | #include "defs.h"
          |          ^~~~~~~~
    1 error generated.

Can you please upload the original reproducer files somewhere in full?
One thing that I can think of is that your clang binary is somehow bad, or not recompiled against the fixed libc++. I didn't bump the clang internal version for this, but maybe that was wrong. In any case, I cannot help if I do not have a reproduction scenario.

Created attachment 251827: Reproducer binary

Created attachment 251828: Reproducer script
(In reply to Dimitry Andric from comment #18) Reproducer binary and script, compressed, uploaded. That's a minimalist case from the code and does not blow up every time on the Pi3, and invoking it on my AMD64 primary build box, or the Pi4 booted from the same media as the 3, never blows up.

The .sh has:

    "-fdebug-compilation-dir=/data/HD-MCP"
    "-fcoverage-compilation-dir=/data/HD-MCP"

that presume some local context and might need to be adjusted by folks looking to use the script.

(In reply to Mark Millard from comment #22) That is the directory in which the "make" command was given.

(In reply to Mark Millard from comment #22) IMHO the correct line of inquiry here, given that I've identified the commit that causes this to happen, is for someone with sufficient knowledge of both the llvm internals *and* the ARM Cortex processor differences between the Pi3 and 4 to take a critical look at that delta, particularly the "endian" element of it. Padding an alignment should never burn you (it might waste RAM on a temporary basis but if I need 3 bytes and allocate 4 that doesn't do harm other than the extra byte used) but if you wind up overflowing a pointer or other allocation somewhere on the next reference you're essentially certain to take a SEGV.

I couldn't reproduce any crash on the only arm-like machine I have, which is an arm64 VM on my Mac. Even running the test case 1000 times shows no hiccups. I tried this on both 14.1-RELEASE and 14.1-RELEASE-p2 (which includes the libc++ fix). Does this crash really only occur on a 32-bit arm host?

(In reply to Dimitry Andric from comment #25) Yes. On stable/14 with the commit I bisected to in, if I boot the same SD card on a Pi4, the reproducer does *not* crash irrespective of how many times. It also does not crash on an AMD64 box (my build machine), again irrespective of how many attempts are made.

On a Pi3, with the commit in on stable/14, it *does* crash quite-reliably (not every single time, but most of the time; if I do not pull out routines from that original file then it is 1 out of 100, perhaps, that file will complete but even just that 100-line bit of source blows up most of the time, and the reproducer it dumps does so as well.) On releng/14.1, which does not have the commit, the reproducer does not crash. If I execute a git revert on *just that commit* on stable/14, leaving everything else alone, it does not crash no matter how many times I attempt to compile just the reproducer I posted, again, on the same Pi3 hardware.

Note that the build (and running kernel) is aarch64, not aarch32.
From the boot dmesg:
FreeBSD 14.1-STABLE stable/14-n268044-939f5a7b2bfb GENERIC arm64
.....
CPU 0: ARM Cortex-A53 r0p4 affinity: 0
Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG>
Instruction Set Attributes 0 = <CRC32>
Instruction Set Attributes 1 = <>
Instruction Set Attributes 2 = <>
Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32>
Processor Features 1 = <>
Trying to mount root from ufs:/dev/mmcsd0s2a [ro]...
Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,1TB PA>
Memory Model Features 1 = <8bit VMID>
Memory Model Features 2 = <32bit CCIDX,48bit VA>
Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
Debug Features 1 = <>
Auxiliary Features 0 = <>
Auxiliary Features 1 = <>
AArch32 Instruction Set Attributes 5 = <CRC32,SEVL>
AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD>
AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ>
CPU 1: ARM Cortex-A53 r0p4 affinity: 1
CPU 2: ARM Cortex-A53 r0p4 affinity: 2
CPU 3: ARM Cortex-A53 r0p4 affinity: 3
Release APs...done
root@rpi:~ # sysctl -a|grep hw.machine_arch
hw.machine_arch: aarch64
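Because the crash is intermittent on the Pi3 and absent elsewhere, it can help to put a number on "most of the time" by running the reproducer repeatedly and counting the runs that die from a signal. A minimal sketch, assuming a POSIX shell; `count_crashes` is a hypothetical helper name, not something used in this thread.

```sh
#!/bin/sh
# count_crashes N cmd [args...]
# Hypothetical helper: run a command N times and print how many runs ended
# with an exit status above 128, i.e. were terminated by a signal.
count_crashes() {
    runs=$1
    shift
    crashes=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        if "$@" >/dev/null 2>&1; then
            status=0
        else
            status=$?
        fi
        if [ "$status" -gt 128 ]; then
            crashes=$((crashes + 1))
        fi
        i=$((i + 1))
    done
    echo "$crashes"
}
```

For instance, `count_crashes 100 sh S0-encryption-9c36d0.sh` would report how many of 100 runs of the dumped reproducer script were killed by a signal, which makes results from the Pi3, the Pi4 and other hosts directly comparable.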
|
This is a rather odd problem and I'm uncertain of the scope. Context: Pi3, checked multiple physical devices including one of the "newest" ones with connections for POE HATs as well as an older unit with identical results. On a Pi4 4Gb, booting the SAME SD card there is no problem. The package that fails is code I've had running for quite some time, and on 13.x-STABLE it works perfectly well. I build using Crochet; -RELEASE, however, was checked both with my own worktree for releng/14.1 and stable/14. The exact place in the source (which function it is compiling at the time) the compiler blows up varies to some degree but the crash is the same in all instances. Whether I have the source on a UFS+Su/J filesystem on the SD card or I copy it to a tempfs (Ramdisk) doesn't matter so I surmise this is something that has changed either in the kernel or clang -- and it may be thread related. That it never occurs on the Pi4 is troublesome as that implies its local to either the CPU on the "3" or its RAM architecture .vs. the 4, given that I am literally plugging the same SD card into each. There is no evidence of RAM exhaustion or similar. Here's an example of the crash; I am running this on the physical (serial) console so if there was a kernel complaint about memory or similar it would be embedded in this output. First, the boot message from "dmesg" on the Pi3 (some elided but including the ARM CPU info): ---<<BOOT>>--- WARNING: Cannot find freebsd,dts-version property, cannot check DTB compliance Copyright (c) 1992-2023 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 14.1-STABLE stable/14-n268036-9a53391b601d GENERIC arm64 FreeBSD clang version 18.1.6 (https://github.com/llvm/llvm-project.git llvmorg-18.1.6-0-g1118c2e05e67) VT(efifb): resolution 656x416 module scmi already present! 
real memory = 994041856 (947 MB) avail memory = 945451008 (901 MB) Starting CPU 1 (1) Starting CPU 2 (2) Starting CPU 3 (3) FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs arc4random: WARNING: initial seeding bypassed the cryptographic random device because it was not yet seeded and the knob 'bypass_before_seeding' was enabled. random: entropy device external interface kbd0 at kbdmux0 ofwbus0: <Open Firmware Device Tree> simplebus0: <Flattened device tree simple bus> on ofwbus0 ofw_clkbus0: <OFW clocks bus> on ofwbus0 regfix0: <Fixed Regulator> on ofwbus0 clk_fixed2: clock-fixed has no clock-frequency .... CPU 0: ARM Cortex-A53 r0p4 affinity: 0 Cache Type = <64 byte D-cacheline,64 byte I-cacheline,VIPT ICache,64 byte ERG,64 byte CWG> Instruction Set Attributes 0 = <CRC32> Instruction Set Attributes 1 = <> Instruction Set Attributes 2 = <> Processor Features 0 = <AdvSIMD,FP,EL3 32,EL2 32,EL1 32,EL0 32> Processor Features 1 = <> Memory Model Features 0 = <TGran4,TGran64,SNSMem,BigEnd,16bit ASID,1TB PA> Trying to mount root from ufs:/dev/mmcsd0s2a [ro]... Memory Model Features 1 = <8bit VMID> Memory Model Features 2 = <32bit CCIDX,48bit VA> Debug Features 0 = <DoubleLock,2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8> Debug Features 1 = <> Auxiliary Features 0 = <> Auxiliary Features 1 = <> AArch32 Instruction Set Attributes 5 = <CRC32,SEVL> AArch32 Media and VFP Features 0 = <FPRound,FPSqrt,FPDivide,DP VFPv3+v4,SP VFPv3+v4,AdvSIMD> AArch32 Media and VFP Features 1 = <SIMDFMAC,FPHP DP Conv,SIMDHP SP Conv,SIMDSP,SIMDInt,SIMDLS,FPDNaN,FPFtZ> CPU 1: ARM Cortex-A53 r0p4 affinity: 1 CPU 2: ARM Cortex-A53 r0p4 affinity: 2 CPU 3: ARM Cortex-A53 r0p4 affinity: 3 Release APs...done And then.... 
root@rpi:/data/karl/HD-MCP # make clean
rm -f *.o hd-mcp hd-mcp.freeware license-server hd-commit
root@rpi:/data/karl/HD-MCP # make
cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c config.c -o config.o
cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c funcs.c -o funcs.o
cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c hd-mcp.c -o hd-mcp.o
PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c hd-mcp.c -o hd-mcp.o
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module 'hd-mcp.c'.
4. Running pass 'AArch64O0PreLegalizerCombiner' on function '@process_unit_get_response'
#0 0x0000000004b17588 (/usr/bin/cc+0x4b17588)
#1 0x0000000004b15650 (/usr/bin/cc+0x4b15650)
#2 0x0000000004ae16a0 (/usr/bin/cc+0x4ae16a0)
#3 0x000000008a02eeb8 (/lib/libthr.so.3+0x2aeb8)
cc: error: clang frontend command failed with exit code 139 (use -v to see invocation)
FreeBSD clang version 18.1.6 (https://github.com/llvm/llvm-project.git llvmorg-18.1.6-0-g1118c2e05e67)
Target: aarch64-unknown-freebsd14.1
Thread model: posix
InstalledDir: /usr/bin
cc: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
cc: note: diagnostic msg: /tmp/hd-mcp-720fdc.c
cc: note: diagnostic msg: /tmp/hd-mcp-720fdc.sh
cc: note: diagnostic msg:
********************
*** Error code 1
Stop.
make: stopped in /data/karl/HD-MCP
root@rpi:/data/karl/HD-MCP #

The crash is always in libthr.so.3 and at the same offset (+0x2aeb8); SOMETIMES if I re-execute the "make" command it will get past this file but then blows up in another one:

root@rpi:/data/karl/HD-MCP # make
cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c hd-mcp.c -o hd-mcp.o
cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c www.c -o www.o
cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c slave.c -o slave.o
cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c amcrest.c -o amcrest.o
cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c license.c -o license.o
cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c z-wave.c -o z-wave.o
cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c malloc.c -o malloc.o
cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c S0-encryption.c -o S0-encryption.o
PLEASE submit a bug report to https://bugs.freebsd.org/submit/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: cc -g -Wstrict-prototypes -DVERSION=\"8.0.0-LocalAuth\" -c S0-encryption.c -o S0-encryption.o
1. <eof> parser at end of file
2. Code generation
3. Running pass 'Function Pass Manager' on module 'S0-encryption.c'.
4. Running pass 'AArch64O0PreLegalizerCombiner' on function '@generate_mac'
#0 0x0000000004b17588 (/usr/bin/cc+0x4b17588)
#1 0x0000000004b15650 (/usr/bin/cc+0x4b15650)
#2 0x0000000004ae16a0 (/usr/bin/cc+0x4ae16a0)
#3 0x000000008a39feb8 (/lib/libthr.so.3+0x2aeb8)
cc: error: clang frontend command failed with exit code 139 (use -v to see invocation)
FreeBSD clang version 18.1.6 (https://github.com/llvm/llvm-project.git llvmorg-18.1.6-0-g1118c2e05e67)
Target: aarch64-unknown-freebsd14.1
Thread model: posix
InstalledDir: /usr/bin
cc: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
cc: note: diagnostic msg: /tmp/S0-encryption-2a4545.c
cc: note: diagnostic msg: /tmp/S0-encryption-2a4545.sh
cc: note: diagnostic msg:
********************
*** Error code 1
Stop.
make: stopped in /data/karl/HD-MCP

That file almost NEVER completes -- but once in a great while it will (!!) and when it does the executable that gets produced runs as expected. The crash in the compiler, when it occurs, always has the same traceback to the same place in libthr.so.3, irrespective of which function is the one referenced in the compiler crash itself.

Clearing the object directory and re-running the build does not change the outcome. But if I build releng/14.1 (same Crochet, just changing the source to releng/14.1 from stable/14) and boot THAT, it never crashes. It also never crashes during build on *either* version of the OS if I am running on a Pi4 -- only on the 3.

S0-encryption.c is an unremarkable file that contains a handful of functions, all related to using OpenSSL routines to perform encryption and decryption, along with computing (and checking) a MAC against data packets; it is only some 400 lines of C code.
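Since the failure is intermittent, it may help anyone trying to reproduce or bisect this to put a number on how often a single compile dies. Below is a minimal sketch, not something I ran as part of this report, and it assumes Python 3 is installed on the board. It re-runs one compile command, looks for the "clang frontend command failed with exit code N" line shown above, and tallies the outcomes. The helper names (classify, tally) are made up for illustration, and reading codes above 128 as 128 + signal number is an assumption that matches 139 being signal 11 (SIGSEGV).

```python
import re
import signal
import subprocess
import sys
from collections import Counter

# Message format copied from the crash output quoted in this report.
FRONTEND_FAIL = re.compile(r"clang frontend command failed with exit code (\d+)")


def classify(returncode, stderr_text):
    """Label one compiler run from its exit status and captured stderr."""
    match = FRONTEND_FAIL.search(stderr_text)
    if match is not None:
        code = int(match.group(1))
        # Assumption: codes above 128 follow the 128 + signal-number convention.
        if code > 128:
            try:
                return signal.Signals(code - 128).name
            except ValueError:
                pass  # not a known signal number; fall through to the raw code
        return f"frontend exit code {code}"
    return "ok" if returncode == 0 else f"failed ({returncode})"


def tally(cmd, runs=20):
    """Run cmd `runs` times and count how often each outcome occurs."""
    counts = Counter()
    for _ in range(runs):
        proc = subprocess.run(cmd, capture_output=True, text=True, errors="replace")
        counts[classify(proc.returncode, proc.stderr)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is taken as the command to repeat.
    for outcome, count in tally(sys.argv[1:]).most_common():
        print(f"{count:4d}  {outcome}")
```

Saved under a hypothetical name such as tally_cc.py, it would be invoked with the failing command from the log as its arguments, e.g. python3 tally_cc.py cc -g -Wstrict-prototypes '-DVERSION="8.0.0-LocalAuth"' -c S0-encryption.c -o S0-encryption.o. Comparing the tallies between a stable/14 boot and a releng/14.1 boot on the same Pi3 would give a per-run failure rate rather than an impression.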
If the crash disappears on future updates to stable/14 I'll withdraw this report as OBE, but since this implies there's a potential problem with thread handling on the Pi3 under stable/14 I wanted to put it out there.