Bug 253335 - emulators/qemu-user-static php segfault building devel/pear for armv7
Summary: emulators/qemu-user-static php segfault building devel/pear for armv7
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Mark Johnston
URL: https://github.com/qemu-bsd-user/qemu...
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-08 03:19 UTC by Mark Johnston
Modified: 2021-02-26 04:55 UTC
CC List: 5 users

See Also:


Description Mark Johnston freebsd_committer freebsd_triage 2021-02-08 03:19:32 UTC
We're seeing a php segfault building devel/pear for armv7 on amd64 under qemu-user-static-3.1.0_9.  I repro'ed with a stable/12 armv7 jail and current ports tree on head.  It appears to be crashing in libpcre but I don't have a lot more info yet.  Just logging this here as I debug in case anyone else can provide a clue or a "me too".
Comment 1 Kyle Evans freebsd_committer freebsd_triage 2021-02-08 03:26:58 UTC
Ah, I missed the part where you already engaged in debugging -- feel free to take this back. :-)
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2021-02-08 03:36:18 UTC
(In reply to Kyle Evans from comment #1)
I have zero experience debugging qemu-user-static, so pointers would be appreciated.  I'm waiting for gdb to cross-compile still. :)

I take it I won't be able to run php under emulated gdb, but it should be possible to attach to qemu's gdb stub?

Also this is apparently a regression, not sure what the known good version numbers are yet.  It might be easier to bisect qemu.
Comment 3 Kyle Evans freebsd_committer freebsd_triage 2021-02-08 03:54:08 UTC
(In reply to Mark Johnston from comment #2)

Right, no live debugging, but the gdb stub worked the last time I tried it (maybe a year or two ago? before we rebased to 3.1).

It's been helpful at times to leave the jail running and launch qemu-arm-static manually outside using -L ${sysroot} + dumping cpu state (-d) for faster execution tracing, but that's probably a non-issue for someone much more comfortable with gdb. :-)
Comment 4 Renato Botelho freebsd_committer freebsd_triage 2021-02-08 11:03:16 UTC
FYI,

The same problem doesn't happen on a system running 12.2-STABLE with qemu-user-static version 3.1.0_1.

`FreeBSD buildbot1-nyi.netgate.com 12.2-STABLE FreeBSD 12.2-STABLE acaac0eefa1(stable/12) GENERIC  amd64`

Also, there is an issue open on the qemu-bsd-user GitHub page:

https://github.com/qemu-bsd-user/qemu-bsd-user/issues/5
Comment 5 Kyle Evans freebsd_committer freebsd_triage 2021-02-08 14:06:22 UTC
(In reply to Renato Botelho from comment #4)

Given the timing, I'd suspect the recent elfload hacks that I did to try and fix kyua and direct-exec rtld.
Comment 6 Renato Botelho freebsd_committer freebsd_triage 2021-02-08 20:03:06 UTC
(In reply to Kyle Evans from comment #5)
Luiz (loos@) also mentioned that he built some aarch64 stuff on that box and saw issues with lots of other binaries; it's not only PHP.

We also see some messages like this one:

Qemu unsupported ioctl: cmd=0xc0306365 dir=INOUT 'c' 101 48
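
For readers unfamiliar with the format of that message, the fields after `cmd=` are just the decoded parts of the ioctl command word. The sketch below restates the FreeBSD bit layout (direction in the top three bits, a 13-bit parameter length, a group byte, a number byte) with local `DEMO_` constants so that it builds on any host. All `demo_` names are hypothetical and are not taken from qemu.

```c
#include <stdio.h>

/* Bit layout of a FreeBSD ioctl command word, restated here so the sketch
 * builds anywhere. The DEMO_ names are local to this example. */
#define DEMO_IOCPARM_MASK 0x1fffUL     /* 13-bit parameter length field */
#define DEMO_IOC_VOID     0x20000000UL /* no parameters */
#define DEMO_IOC_OUT      0x40000000UL /* parameters copied out */
#define DEMO_IOC_IN       0x80000000UL /* parameters copied in */
#define DEMO_IOC_DIRMASK  (DEMO_IOC_VOID | DEMO_IOC_OUT | DEMO_IOC_IN)

struct demo_ioctl_fields {
    unsigned long dir;  /* direction bits */
    unsigned long len;  /* parameter size in bytes */
    char group;         /* group letter */
    unsigned num;       /* command number within the group */
};

/* Split a command word into its four fields. */
struct demo_ioctl_fields demo_decode_ioctl(unsigned long cmd)
{
    struct demo_ioctl_fields f;

    f.dir = cmd & DEMO_IOC_DIRMASK;
    f.len = (cmd >> 16) & DEMO_IOCPARM_MASK;
    f.group = (char)((cmd >> 8) & 0xff);
    f.num = (unsigned)(cmd & 0xff);
    return f;
}

/* Human-readable name for the direction bits. */
const char *demo_dir_name(unsigned long dir)
{
    if (dir == (DEMO_IOC_IN | DEMO_IOC_OUT))
        return "INOUT";
    if (dir == DEMO_IOC_IN)
        return "IN";
    if (dir == DEMO_IOC_OUT)
        return "OUT";
    if (dir == DEMO_IOC_VOID)
        return "VOID";
    return "?";
}

/* Print the fields in the same order as the message quoted above. */
void demo_print_ioctl(unsigned long cmd)
{
    struct demo_ioctl_fields f = demo_decode_ioctl(cmd);

    printf("cmd=0x%lx dir=%s '%c' %u %lu\n",
        cmd, demo_dir_name(f.dir), f.group, f.num, f.len);
}
```

Calling `demo_print_ioctl(0xc0306365UL)` yields direction INOUT, group 'c', number 101 and a 48-byte parameter, matching the message above.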
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2021-02-08 22:45:55 UTC
We're crashing on a write to 0xf4215a70.  Shortly before, we had mmapped a region containing that address:

71585 mmap(0,65536,7,4098,-1,0) = 0xf4206000
71585 mprotect(0xf4206000,0x10000,7) = 0

and I can't see any subsequent system calls that would modify that mapping, but procstat -v shows:

71585         0xf4206000         0xf4215000 rwx    1    2   2   0 ----- df
71585         0xf4215000         0xf4216000 r-x    1    2   2   0 ----- df

so indeed the last page is not writeable.  I'm not sure why libpcre is mprotect()ing a region to set the permissions specified by the preceding mmap() call.
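
The arithmetic behind that observation can be checked with a small sketch. It assumes 4 KiB pages (as the procstat split suggests) and FreeBSD's values for the mmap constants, under which prot 7 decodes to read, write and exec, and flags 4098 (0x1002) to a private anonymous mapping. The `DEMO_` and `demo_` names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE   0x1000u /* 4 KiB, matching the procstat split */

/* FreeBSD values, restated so the sketch builds on any host. */
#define DEMO_PROT_READ   0x1
#define DEMO_PROT_WRITE  0x2
#define DEMO_PROT_EXEC   0x4
#define DEMO_MAP_PRIVATE 0x0002
#define DEMO_MAP_ANON    0x1000

/* Start address of the page containing addr. */
uint32_t demo_page_base(uint32_t addr)
{
    return addr & ~(uint32_t)(DEMO_PAGE_SIZE - 1);
}

/* Nonzero if addr lies within [base, base + len). */
int demo_in_range(uint32_t addr, uint32_t base, uint32_t len)
{
    return addr >= base && addr - base < len;
}
```

With the values from the trace, `demo_page_base(0xf4215a70)` is 0xf4215000, the start of the one page that procstat reports as r-x, and the address is at offset 0xfa70 of the 64 KiB mapping at 0xf4206000, that is, in its last page.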
Comment 8 Mark Johnston freebsd_committer freebsd_triage 2021-02-08 23:06:00 UTC
Using truss on the host I can see that we are mprotecting the last page (containing the address in question) of that range to PROT_READ | PROT_EXEC.  It doesn't show up in qemu's strace output, so presumably this is something internal to qemu.  The only syscall which looks relevant here is a sysarch(ARM_SYNC_ICACHE), but it looks like qemu treats that as a no-op...
Comment 9 Mark Johnston freebsd_committer freebsd_triage 2021-02-09 00:17:00 UTC
qemu is doing the mprotect here:

Thread 1 hit Catchpoint 1 (call to syscall mprotect), 0x000000006049f48a in ?? ()
(gdb) bt
#0  0x000000006049f48a in ?? ()
#1  0x00000000602b413a in page_find_alloc (index=5, alloc=1)
    at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/accel/tcg/translate-all.c:497
#2  page_lock_pair (ret_p1=<optimized out>, phys1=4095827272, ret_p2=<optimized out>, phys2=4294967295, alloc=1)
    at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/accel/tcg/translate-all.c:882
#3  tb_link_page (tb=0x60598280 <static_code_gen_buffer+166752>, phys_pc=4095827272, phys_page2=4294967295)
    at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/accel/tcg/translate-all.c:1628
#4  tb_gen_code (cpu=<optimized out>, pc=<optimized out>, cs_base=0, flags=1626480128, cflags=<optimized out>)
    at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/accel/tcg/translate-all.c:1831
#5  0x00000000602b2a95 in cpu_loop_exit_restore (cpu=0xf4215000, pc=4096)
    at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/accel/tcg/cpu-exec-common.c:72
#6  0x00000000602c2ff1 in target_cpu_loop (env=0x0)
    at /usr/home/markj/src/freebsd-ports/emulators/qemu-user-static/work/qemu-bsd-user-39244526c0af/bsd-user/arm/target_arch_cpu.h:259
#7  0x00000000602c2f89 in target_cpu_loop (env=0x860933c00)

In tb_page_add() I see:

        /* force the host page as non writable (writes will have a
           page fault + mprotect overhead) */

but it looks like something's not implementing that...?
Comment 10 Mark Johnston freebsd_committer freebsd_triage 2021-02-09 05:27:35 UTC
The problem appears to be with this commit: https://github.com/qemu-bsd-user/qemu-bsd-user/commit/63d5d4f649f44f8e3d9105dec40a354d92a19550

That check is indeed needed.  qemu relies on delivery of SIGSEGV to detect self-modifying code so that it can update its translation cache accordingly.  This will manifest as a page fault, so ksi_trapno is T_PAGEFLT == 0xc on amd64.
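
The general technique described here (write-protect a page, catch the fault on the first write, react, then restore write access so the faulting instruction can be retried) can be sketched in plain POSIX C. This is a standalone illustration under stated assumptions, not qemu's code: every `demo_` name is hypothetical, the handler merely counts faults where an emulator would invalidate its translations, and calling mprotect() from a signal handler is common practice but not guaranteed async-signal-safe by POSIX.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANON on glibc under strict -std modes */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *demo_page;          /* the single tracked page */
static size_t demo_page_size;
static volatile sig_atomic_t demo_faults; /* write faults handled so far */

/* Fault handler: if the fault hit the tracked page, note it and make the
 * page writable again so the faulting store is retried on return. */
static void demo_on_fault(int sig, siginfo_t *info, void *uctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)demo_page;

    (void)sig;
    (void)uctx;
    if (addr < base || addr - base >= demo_page_size)
        _exit(2); /* unrelated fault: give up rather than loop */
    demo_faults++; /* an emulator would drop stale translations here */
    if (mprotect(demo_page, demo_page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(3);
}

/* Returns the number of faults taken by one write to a read-only page,
 * or -1 if setup failed or the write did not land. */
int demo_write_protect_roundtrip(void)
{
    struct sigaction sa, old_segv, old_bus;
    volatile unsigned char *vp;
    int faults;

    demo_page_size = (size_t)sysconf(_SC_PAGESIZE);
    demo_page = mmap(NULL, demo_page_size, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANON, -1, 0);
    if (demo_page == MAP_FAILED)
        return -1;
    demo_faults = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = demo_on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Some systems report this kind of fault as SIGBUS, so catch both. */
    if (sigaction(SIGSEGV, &sa, &old_segv) != 0 ||
        sigaction(SIGBUS, &sa, &old_bus) != 0)
        return -1;

    /* Drop write access, as is done for pages holding translated code. */
    if (mprotect(demo_page, demo_page_size, PROT_READ) != 0)
        return -1;

    vp = demo_page;
    vp[0] = 42; /* faults once; the handler restores write access */

    faults = (vp[0] == 42) ? (int)demo_faults : -1;
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    munmap(demo_page, demo_page_size);
    return faults;
}
```

If the handler wrongly decides that a fault is not one of its own, which is roughly what a bad trap-number check amounts to, the write is never allowed to proceed and the guest sees a crash like the one in this report.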
Comment 11 Kyle Evans freebsd_committer freebsd_triage 2021-02-09 05:41:35 UTC
(In reply to Mark Johnston from comment #10)

Heh, I just arrived at the same conclusion, but hadn't yet found the logs where we were talking about this.

IMO we should reapply the change, but correctly (drop T_ALIGNFLT, that seems completely wrong) and with an accurate commit message.
Comment 12 Kyle Evans freebsd_committer freebsd_triage 2021-02-09 05:44:06 UTC
(In reply to Kyle Evans from comment #11)

(I suspect I was somehow looking at the wrong trap type values and steered the previous discussion amiss. :-()
Comment 13 Mark Johnston freebsd_committer freebsd_triage 2021-02-09 06:35:04 UTC
(In reply to Kyle Evans from comment #12)
Yeah, I couldn't really understand the T_ALIGNFLT check.  I can submit a patch tomorrow, but feel free to fix it if you prefer.  I have another qemu-user-static bug to look at tomorrow. :)
Comment 14 Renato Botelho freebsd_committer freebsd_triage 2021-02-09 11:35:53 UTC
(In reply to Mark Johnston from comment #10)
I confirmed that reverting this commit makes it work again.
Comment 16 commit-hook freebsd_committer freebsd_triage 2021-02-26 04:53:31 UTC
A commit references this bug:

Author: kevans
Date: Fri Feb 26 04:53:22 UTC 2021
New revision: 566578
URL: https://svnweb.freebsd.org/changeset/ports/566578

Log:
  emulators/qemu-user-static: update to f7fd10d7677c

  This features a number of fixes; highlights:
  - Handle aarch64 YIELD instructions
  - Bump ARG_MAX to match the FreeBSD default on LP64 platforms
  - Implement __specialfd(2) and copy_file_range(2)
  - Style fixes
  - Fix an issue with binary execution[0]
  - Fix page fault handling for self-modifying binaries[1]
  - Suppress noise from CIOGSESSION usage and restore CRIOGET handling
  - Patch _umtx_op(2) through to the kernel where possible[2]

  [0] Attempting to execute a binary by name was broken when there was an
  unrelated entry by the same name in PWD.  The report below observed it in the
  cluster while building games/dobutsu, which tried to execute `xz` in a directory
  that had an `xz` directory inside of it.

  [1] From the fixing commit, qemu mprotect()s pages containing translated code
  to PROT_READ | PROT_EXEC and upgrades protections as needed upon page fault.
  This was broken in a previous commit that misidentified the trap # that
  should have been observed.  The observed issue was a broken JIT compiler in
  libpcre.

  [2] _umtx_op can now be handled by the kernel in cases where the target long
  size is not longer than the host, and the target and host are the same
  endianness.  This is much more reliable than our previous emulation of these
  operations, and should reduce hangs sometimes observed in threaded applications.
  Note that this requires a recent stable/12 or 13.x/-CURRENT.

  PR:		253375 [0]
  PR:		253335 [1]
  MFH:		2021Q1

Changes:
  head/emulators/qemu-user-static/Makefile
  head/emulators/qemu-user-static/distinfo
Comment 17 commit-hook freebsd_committer freebsd_triage 2021-02-26 04:54:34 UTC
A commit references this bug:

Author: kevans
Date: Fri Feb 26 04:54:03 UTC 2021
New revision: 566579
URL: https://svnweb.freebsd.org/changeset/ports/566579

Log:
  MFH: r566578

  emulators/qemu-user-static: update to f7fd10d7677c

  This features a number of fixes; highlights:
  - Handle aarch64 YIELD instructions
  - Bump ARG_MAX to match the FreeBSD default on LP64 platforms
  - Implement __specialfd(2) and copy_file_range(2)
  - Style fixes
  - Fix an issue with binary execution[0]
  - Fix page fault handling for self-modifying binaries[1]
  - Suppress noise from CIOGSESSION usage and restore CRIOGET handling
  - Patch _umtx_op(2) through to the kernel where possible[2]

  [0] Attempting to execute a binary by name was broken when there was an
  unrelated entry by the same name in PWD.  The report below observed it in the
  cluster while building games/dobutsu, which tried to execute `xz` in a directory
  that had an `xz` directory inside of it.

  [1] From the fixing commit, qemu mprotect()s pages containing translated code
  to PROT_READ | PROT_EXEC and upgrades protections as needed upon page fault.
  This was broken in a previous commit that misidentified the trap # that
  should have been observed.  The observed issue was a broken JIT compiler in
  libpcre.

  [2] _umtx_op can now be handled by the kernel in cases where the target long
  size is not longer than the host, and the target and host are the same
  endianness.  This is much more reliable than our previous emulation of these
  operations, and should reduce hangs sometimes observed in threaded applications.
  Note that this requires a recent stable/12 or 13.x/-CURRENT.

  PR:		253375 [0]
  PR:		253335 [1]

Changes:
_U  branches/2021Q1/
  branches/2021Q1/emulators/qemu-user-static/Makefile
  branches/2021Q1/emulators/qemu-user-static/distinfo
Comment 18 Kyle Evans freebsd_committer freebsd_triage 2021-02-26 04:55:43 UTC
Thanks for the report and patch!