We're seeing a php segfault building devel/pear for armv7 on amd64 under qemu-user-static-3.1.0_9. I repro'ed with a stable/12 armv7 jail and current ports tree on head. It appears to be crashing in libpcre but I don't have a lot more info yet. Just logging this here as I debug in case anyone else can provide a clue or a "me too".
Ah, I missed the part where you already engaged in debugging -- feel free to take this back. :-)
(In reply to Kyle Evans from comment #1) I have zero experience debugging qemu-user-static, so pointers would be appreciated. I'm waiting for gdb to cross-compile still. :) I take it I won't be able to run php under emulated gdb, but it should be possible to attach to qemu's gdb stub? Also this is apparently a regression, not sure what the known good version numbers are yet. It might be easier to bisect qemu.
(In reply to Mark Johnston from comment #2) Right, no live debugging, but the gdb stub worked the last time I tried it (maybe a year or two ago? before we rebased to 3.1). It's been helpful at times to leave the jail running and launch qemu-arm-static manually outside using -L ${sysroot} + dumping cpu state (-d) for faster execution tracing, but that's probably a non-issue for someone much more comfortable with gdb. :-)
FYI, the same problem doesn't happen on a system running 12.2-STABLE with qemu-user-static version 3.1.0_1. `FreeBSD buildbot1-nyi.netgate.com 12.2-STABLE FreeBSD 12.2-STABLE acaac0eefa1(stable/12) GENERIC amd64` Also, there is an open issue on the qemu-bsd-user GitHub page: https://github.com/qemu-bsd-user/qemu-bsd-user/issues/5
(In reply to Renato Botelho from comment #4) Given the timing, I'd suspect the recent elfload hacks that I did to try and fix kyua and direct-exec rtld.
(In reply to Kyle Evans from comment #5) Luiz (loos@) also mentioned he built some aarch64 stuff on that box and he saw issues with lots of other binaries, it's not only PHP. We also see some messages like this one: Qemu unsupported ioctl: cmd=0xc0306365 dir=INOUT 'c' 101 48
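For anyone reading that "unsupported ioctl" message: the trailing fields are just the command word split into its parts. Here is a minimal sketch of the decoding, assuming the standard BSD ioctl encoding from FreeBSD's <sys/ioccom.h>. The helper name `decode_ioctl` is made up for illustration and is not part of qemu.

```python
# BSD ioctl command layout, constants as defined in FreeBSD's <sys/ioccom.h>.
IOCPARM_SHIFT = 13
IOCPARM_MASK = (1 << IOCPARM_SHIFT) - 1  # parameter length field (13 bits)
IOC_VOID = 0x20000000  # no parameters
IOC_OUT = 0x40000000   # copy parameters out
IOC_IN = 0x80000000    # copy parameters in
IOC_DIRMASK = IOC_VOID | IOC_OUT | IOC_IN

DIRECTIONS = {
    IOC_VOID: "VOID",
    IOC_OUT: "OUT",
    IOC_IN: "IN",
    IOC_IN | IOC_OUT: "INOUT",
}


def decode_ioctl(cmd):
    """Split an ioctl command word into direction, group, number, length.

    Hypothetical helper for illustration only.
    """
    return {
        "dir": DIRECTIONS.get(cmd & IOC_DIRMASK, "?"),
        "group": chr((cmd >> 8) & 0xFF),
        "num": cmd & 0xFF,
        "len": (cmd >> 16) & IOCPARM_MASK,
    }


print(decode_ioctl(0xC0306365))
```

Group 'c', number 101 with a 48-byte in/out argument looks like the /dev/crypto session ioctl whose noise is suppressed in the later port update, but that mapping is an inference rather than something confirmed in this thread.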
We're crashing on a write to 0xf4215a70. Shortly before, we had mmapped a region containing that address:

71585 mmap(0,65536,7,4098,-1,0) = 0xf4206000
71585 mprotect(0xf4206000,0x10000,7) = 0

and I can't see any subsequent system calls that would modify that mapping, but procstat -v shows:

71585 0xf4206000 0xf4215000 rwx 1 2 2 0 ----- df
71585 0xf4215000 0xf4216000 r-x 1 2 2 0 ----- df

so indeed the last page is not writeable. I'm not sure why libpcre is mprotect()ing a region to set the permissions specified by the preceding mmap() call.
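The numbers in that trace can be checked mechanically. This is a rough sketch, assuming 4 KiB pages and the FreeBSD <sys/mman.h> constant values. The helper `names` is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# Values from FreeBSD's <sys/mman.h>.
PROT_BITS = [("PROT_READ", 0x1), ("PROT_WRITE", 0x2), ("PROT_EXEC", 0x4)]
MAP_BITS = [("MAP_PRIVATE", 0x0002), ("MAP_ANON", 0x1000)]


def names(value, table):
    """Return the names of the bits from `table` that are set in `value`."""
    return [name for name, bit in table if value & bit]


# Arguments taken from the mmap() line in the trace.
prot = names(7, PROT_BITS)
flags = names(4098, MAP_BITS)

base, length = 0xF4206000, 65536
fault_addr = 0xF4215A70

end = base + length                          # first address past the mapping
fault_page = fault_addr & ~(PAGE_SIZE - 1)   # page containing the bad write

print(prot, flags)
print(hex(end), hex(fault_page), fault_page + PAGE_SIZE == end)
```

So the mapping is a private anonymous rwx region ending at 0xf4216000, and the faulting address sits in its final page, which is exactly the page that procstat reports as r-x.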
Using truss on the host I can see that we are mprotecting the last page (containing the address in question) of that range to PROT_READ | PROT_EXEC. It doesn't show up in qemu's strace output, so presumably this is something internal to qemu. The only syscall which looks relevant here is a sysarch(ARM_SYNC_ICACHE), but it looks like qemu treats that as a no-op...
qemu is doing the mprotect here:

Thread 1 hit Catchpoint 1 (call to syscall mprotect), 0x000000006049f48a in ?? ()
(gdb) bt
#0 0x000000006049f48a in ?? ()
#1 0x00000000602b413a in page_find_alloc (index=5, alloc=1) at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/accel/tcg/translate-all.c:497
#2 page_lock_pair (ret_p1=<optimized out>, phys1=4095827272, ret_p2=<optimized out>, phys2=4294967295, alloc=1) at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/accel/tcg/translate-all.c:882
#3 tb_link_page (tb=0x60598280 <static_code_gen_buffer+166752>, phys_pc=4095827272, phys_page2=4294967295) at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/accel/tcg/translate-all.c:1628
#4 tb_gen_code (cpu=<optimized out>, pc=<optimized out>, cs_base=0, flags=1626480128, cflags=<optimized out>) at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/accel/tcg/translate-all.c:1831
#5 0x00000000602b2a95 in cpu_loop_exit_restore (cpu=0xf4215000, pc=4096) at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/accel/tcg/cpu-exec-common.c:72
#6 0x00000000602c2ff1 in target_cpu_loop (env=0x0) at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/bsd-user/arm/target_arch_cpu.h:259
#7 0x00000000602c2f89 in target_cpu_loop (env=0x860933c00)

In tb_page_add() I see:

1560 /* force the host page as non writable (writes will have a
1561 page fault + mprotect overhead) */

but it looks like something's not implementing that...?
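The decimal arguments in frames #2 and #3 are easier to read in hex. This small sketch assumes 4 KiB pages and uses the faulting address from the earlier comment; `page_base` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

phys_pc = 4095827272      # from tb_link_page() in the backtrace
phys_page2 = 4294967295   # from tb_link_page() in the backtrace
fault_addr = 0xF4215A70   # address of the crashing write


def page_base(addr):
    """Round an address down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)


print(hex(phys_pc), hex(page_base(phys_pc)))
print(hex(phys_page2))
print(page_base(phys_pc) == page_base(fault_addr))
```

The code being translated lives at 0xf4215d48, on the same page as the write that later faults, which fits the picture of qemu write-protecting that page because it now holds translated code. phys_page2 is 0xffffffff, i.e. -1 as a 32-bit value, which presumably means the block does not spill onto a second page.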
The problem appears to be with this commit: https://github.com/qemu-bsd-user/qemu-bsd-user/commit/63d5d4f649f44f8e3d9105dec40a354d92a19550 That check is indeed needed. qemu relies on delivery of SIGSEGV to detect self-modifying code so that it can update its translation cache accordingly. This will manifest as a page fault, so ksi_trapno is T_PAGEFLT == 0xc on amd64.
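To make the mechanism concrete, here is a toy model of the scheme described above. It is a simulation only, not qemu's code: the class, its methods, and the `recognize_trap` switch (standing in for the trap-number check in the signal handler) are all invented for illustration.

```python
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def page_base(addr):
    return addr & ~(PAGE_SIZE - 1)


class ToyEmulator:
    """Toy model of write-protecting pages that hold translated code."""

    def __init__(self):
        self.wanted = {}  # page -> protection the guest asked for
        self.prot = {}    # page -> protection currently set on the host
        self.tbs = {}     # page -> guest PCs with cached translations

    def mmap(self, base, length, prot):
        for page in range(base, base + length, PAGE_SIZE):
            self.wanted[page] = prot
            self.prot[page] = prot

    def translate(self, pc):
        # Caching a translation makes the page read-only on the host, so
        # that later guest writes to it are noticed via a fault.
        page = page_base(pc)
        self.tbs.setdefault(page, set()).add(pc)
        self.prot[page] &= ~PROT_WRITE

    def write(self, addr, recognize_trap=True):
        page = page_base(addr)
        if self.prot[page] & PROT_WRITE:
            return "ok"
        # The host would deliver SIGSEGV at this point.
        if not recognize_trap or not (self.wanted[page] & PROT_WRITE):
            return "crash"  # fault is passed to the guest as a real segfault
        self.tbs.pop(page, None)             # drop stale translations
        self.prot[page] = self.wanted[page]  # restore write access
        return "ok after fault"


emu = ToyEmulator()
emu.mmap(0xF4206000, 65536, PROT_READ | PROT_WRITE | PROT_EXEC)
emu.translate(0xF4215D48)
print(emu.write(0xF4215A70, recognize_trap=False))
print(emu.write(0xF4215A70, recognize_trap=True))
```

With the handler failing to recognize the fault, the write to a page the guest mapped rwx is reported as a crash, matching what php hit. With the check in place the stale translations are discarded and the write goes through.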
(In reply to Mark Johnston from comment #10) Heh, I just arrived at the same conclusion, but hadn't yet found the logs where we were talking about this. IMO we should reapply the change, but correctly (drop T_ALIGNFLT, that seems completely wrong) and with an accurate commit message.
(In reply to Kyle Evans from comment #11) (I suspect I was somehow looking at the wrong trap type values and steered the previous discussion amiss. :-()
(In reply to Kyle Evans from comment #12) Yeah, I couldn't really understand the T_ALIGNFLT check. I can submit a patch tomorrow, but feel free to fix it if you prefer. I have another qemu-user-static bug to look at tomorrow. :)
(In reply to Mark Johnston from comment #10) I confirmed that reverting this commit makes it work again.
https://github.com/qemu-bsd-user/qemu-bsd-user/pull/6
A commit references this bug:

Author: kevans
Date: Fri Feb 26 04:53:22 UTC 2021
New revision: 566578
URL: https://svnweb.freebsd.org/changeset/ports/566578

Log:
  emulators/qemu-user-static: update to f7fd10d7677c

  This features a number of fixes; highlights:
  - Handle aarch64 YIELD instructions
  - Bump ARG_MAX to match the FreeBSD default on LP64 platforms
  - Implement __specialfd(2) and copy_file_range(2)
  - Style fixes
  - Fix an issue with binary execution[0]
  - Fix page fault handling for self-modifying binaries[1]
  - Suppress noise from CIOGSESSION usage and restore CRIOGET handling
  - Patch _umtx_op(2) through to the kernel where possible[2]

  [0] Attempting to execute a binary by name was broken when there was an unrelated entry by the same name in PWD. The report below observed it in the cluster while building games/dobutsu, which tried to execute `xz` in a directory that had an `xz` directory inside of it.

  [1] From the fixing commit, qemu mprotect()s pages contained translated code to PROT_READ | PROT_EXEC and upgrades protections as needed upon page fault. This was broken in a previous commit that misidentified by the trap # that should have been observed. The observed issue a broken JIT compiler in libpcre.

  [2] _umtx_op can now be handled by the kernel in cases where the target long size is not longer than the host, and the target and host are the same endianness. This is much more reliable than our previous emulation of these operations, and should reduce hangs sometimes observed in threaded applications. Note that this requires a recent stable/12 or 13.x/-CURRENT.

  PR: 253375 [0]
  PR: 253335 [1]
  MFH: 2021Q1

Changes:
  head/emulators/qemu-user-static/Makefile
  head/emulators/qemu-user-static/distinfo
A commit references this bug:

Author: kevans
Date: Fri Feb 26 04:54:03 UTC 2021
New revision: 566579
URL: https://svnweb.freebsd.org/changeset/ports/566579

Log:
  MFH: r566578

  emulators/qemu-user-static: update to f7fd10d7677c

  This features a number of fixes; highlights:
  - Handle aarch64 YIELD instructions
  - Bump ARG_MAX to match the FreeBSD default on LP64 platforms
  - Implement __specialfd(2) and copy_file_range(2)
  - Style fixes
  - Fix an issue with binary execution[0]
  - Fix page fault handling for self-modifying binaries[1]
  - Suppress noise from CIOGSESSION usage and restore CRIOGET handling
  - Patch _umtx_op(2) through to the kernel where possible[2]

  [0] Attempting to execute a binary by name was broken when there was an unrelated entry by the same name in PWD. The report below observed it in the cluster while building games/dobutsu, which tried to execute `xz` in a directory that had an `xz` directory inside of it.

  [1] From the fixing commit, qemu mprotect()s pages contained translated code to PROT_READ | PROT_EXEC and upgrades protections as needed upon page fault. This was broken in a previous commit that misidentified by the trap # that should have been observed. The observed issue a broken JIT compiler in libpcre.

  [2] _umtx_op can now be handled by the kernel in cases where the target long size is not longer than the host, and the target and host are the same endianness. This is much more reliable than our previous emulation of these operations, and should reduce hangs sometimes observed in threaded applications. Note that this requires a recent stable/12 or 13.x/-CURRENT.

  PR: 253375 [0]
  PR: 253335 [1]

Changes:
  _U branches/2021Q1/
  branches/2021Q1/emulators/qemu-user-static/Makefile
  branches/2021Q1/emulators/qemu-user-static/distinfo
Thanks for the report and patch!