Bug 265284 - java/openjdk11: won't run or build on VMware on M1 Mac Mini (aarch64)
Summary: java/openjdk11: won't run or build on VMware on M1 Mac Mini (aarch64)
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-java (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-07-17 23:46 UTC by Miguel Arroz
Modified: 2023-02-13 10:05 UTC
CC List: 4 users

See Also:
bugzilla: maintainer-feedback? (java)


Attachments
Error report (27.33 KB, text/plain)
2022-07-17 23:50 UTC, Miguel Arroz
no flags

Description Miguel Arroz 2022-07-17 23:46:49 UTC
Hardware: Apple Mac Mini M1
VMware Fusion: 19431034 (Preview 21H1)
Pkg: openjdk11-11.0.15+10.1

Java (openjdk11) crashes when launched on FreeBSD 13.1 running as a virtual machine under VMware on an M1 Mac. Simply running "java" triggers the crash. Building from ports also appears to be impossible (I tried openjdk11 and openjdk18) because the bootstrap JDKs themselves crash, which breaks the configure scripts.

Steps:
1. Create a virtual machine and install FreeBSD 13.1 aarch64.
2. Install openjdk11 using "pkg install openjdk11".
3. Run "java".

Result:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x000000004ad9b190, pid=90863, tid=113952
#
# JRE version:  (11.0.15+10) (build )
# Java VM: OpenJDK 64-Bit Server VM (11.0.15+10-1, mixed mode, tiered, compressed oops, g1 gc, bsd-aarch64)
# Problematic frame:
# v  ~BufferBlob::native signature handlers
#
# Core dump will be written. Default location: /usr/ports/java/openjdk18/java.core
#
# An error report file with more information is saved as:
# /usr/ports/java/openjdk18/hs_err_pid90863.log
Could not load hsdis-aarch64.so; library not loadable; PrintAssembly is disabled
#
#
Abort (core dumped)

I tried to install the same versions of FreeBSD and OpenJDK on QEMU running under emulation and on a Raspberry Pi 3 Model B. Both work fine ("java" launches and prints the long "usage" message).
Comment 1 Miguel Arroz 2022-07-17 23:50:50 UTC
Created attachment 235322
Error report
Comment 2 Miguel Arroz 2022-07-17 23:54:37 UTC
Apologies for the title swap; this is about openjdk11. I can't test openjdk18 because there's no pre-built pkg, and I can't compile it locally because the bootstrap JDKs also crash.
Comment 3 Miguel Arroz 2022-07-18 05:11:59 UTC
I may have a theory that explains this. From what I could understand, Apple Silicon chips enforce W^X at the hardware level, and it's not optional. OpenJDK on FreeBSD requires memory that is both writable and executable (that is, W^X enforcement must be turned off), even with -Xint (which disables the JIT). This means the CPU would prevent OpenJDK from using such memory even if FreeBSD assumes it's going to work.

If I set kern.elf64.allow_wx to 0, I get the error, already reported in #256477:

> Error occurred during initialization of VM
> Could not reserve enough space in CodeCache (2496K)

When I set it to 1, I get the error I originally reported. With LLDB, I can see it eventually crashes with SIGSEGV, which supports this theory.

This would also explain #260872, which seems similar to this bug but in openjdk8.
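
One way to look at the kernel-side half of this theory without involving the JVM is to request an anonymous mapping that is readable, writable and executable at the same time. The following is only a sketch and is not from the report: it uses Python's standard mmap module, and the helper name rwx_mapping_allowed is made up for illustration. I would expect it to report a refusal when kern.elf64.allow_wx is 0, but that is an assumption. It only shows whether the kernel grants the mapping; it says nothing about whether code written into such a mapping would execute correctly.

```python
import mmap


def rwx_mapping_allowed(size=mmap.PAGESIZE):
    """Return True if the kernel grants a read+write+execute anonymous mapping."""
    prot = mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS
    try:
        region = mmap.mmap(-1, size, flags=flags, prot=prot)
    except OSError:
        # The request was refused, for example by a W^X policy.
        return False
    region.close()
    return True


if __name__ == "__main__":
    print("RWX mapping allowed:", rwx_mapping_allowed())
```

Running it once with the sysctl set to 0 and once with it set to 1 would show whether the setting changes the outcome for an ordinary process.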
Comment 4 Miguel Arroz 2022-07-18 18:02:23 UTC
Some more data: I'm not sure my theory holds. Today I tried:

FreeBSD running on QEMU (UTM) under virtualization (not emulation): same result, java crashes.

However…

Debian running on QEMU (UTM) also under virtualization: openjdk11 seems to work fine, at least it builds and runs a simple Hello World. I could not test on VMware because the Debian installer won't boot for some reason.

Either OpenJDK on Linux is already dealing with W^X (I couldn't find any info that makes me believe so; I think OpenJDK only supports that on macOS for now since it's required there) or the whole W^X idea is a red herring and is not what's causing the problem here.
Comment 5 Mikael Urankar freebsd_committer freebsd_triage 2022-07-19 13:39:01 UTC
If you're the same guy as the one on the forum (https://forums.freebsd.org/threads/openjdk-and-freebsd-13-1-on-aarch64.85762/), we are still waiting for your feedback regarding the faulty instruction.
Comment 6 Miguel Arroz 2022-07-19 16:43:08 UTC
I am not the person from the forum.

Here's the output of gdb and lldb. Let me know if you need additional info.

GDB:

# gdb --args /usr/local/openjdk11/bin/java
GNU gdb (GDB) 12.1 [GDB v12.1 for FreeBSD]
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-portbld-freebsd13.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/openjdk11/bin/java...
(gdb) run
Starting program: /usr/local/openjdk11/bin/java 
[New LWP 100657 of process 7260]
[New LWP 100658 of process 7260]

Thread 2 received signal SIGILL, Illegal instruction.
Illegal trap.
[Switching to LWP 100657 of process 7260]
0x000000004ad9b1b8 in ?? ()
(gdb) disassemble
No function contains program counter for selected frame.



LLDB:

# lldb /usr/local/openjdk11/bin/java
(lldb) target create "/usr/local/openjdk11/bin/java"
Current executable set to '/usr/local/openjdk11/bin/java' (aarch64).
(lldb) run
Process 7265 launched: '/usr/local/openjdk11/bin/java' (aarch64)
Process 7265 stopped
* thread #2, name = 'java', stop reason = signal SIGILL: illegal trap
    frame #0: 0x00003bf396630250
->  0x3bf396630250: add    x1, x24, #0x0             ; =0x0 
    0x3bf396630254: sub    x0, x24, #0x8             ; =0x8 
    0x3bf396630258: mov    x2, #0x0
    0x3bf39663025c: ldr    x8, [x0]
(lldb) bt
* thread #2, name = 'java', stop reason = signal SIGILL: illegal trap
  * frame #0: 0x00003bf396630250
(lldb) disassemble --pc
->  0x3bf396630250: add    x1, x24, #0x0             ; =0x0 
    0x3bf396630254: sub    x0, x24, #0x8             ; =0x8 
    0x3bf396630258: mov    x2, #0x0
    0x3bf39663025c: ldr    x8, [x0]
(lldb) x/i ($pc-4)
    0x3bf39663024c: 0xd65f03c0   ret
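
As a cross-check of the lldb output above, the word printed just before the faulting pc, 0xd65f03c0, can be compared against the A64 encoding of ret. The sketch below is an addition, not part of the original report, and the helper names are hypothetical. The word for the add instruction is computed here from the disassembly text; lldb did not print it. Both turn out to be ordinary, valid encodings, which suggests (but does not prove) that the SIGILL is not caused by an opcode the CPU lacks.

```python
def encode_ret(rn=30):
    """Encode A64 'ret Xn'; the default register is the link register x30."""
    if not 0 <= rn <= 31:
        raise ValueError("register number out of range")
    return 0xD65F0000 | (rn << 5)


def encode_add_imm64(rd, rn, imm12):
    """Encode A64 64-bit 'add Xd, Xn, #imm12' with no shift applied."""
    if not (0 <= rd <= 31 and 0 <= rn <= 31):
        raise ValueError("register number out of range")
    if not 0 <= imm12 < 4096:
        raise ValueError("immediate does not fit in 12 bits")
    return 0x91000000 | (imm12 << 10) | (rn << 5) | rd


if __name__ == "__main__":
    # Word shown by 'x/i ($pc-4)' in the lldb session.
    print(hex(encode_ret()), encode_ret() == 0xD65F03C0)
    # 'add x1, x24, #0x0' is the instruction at the faulting pc.
    print(hex(encode_add_imm64(1, 24, 0)))
```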
Comment 7 Miguel Arroz 2022-07-20 15:05:35 UTC
Some more info: this may actually not be as deterministic as it seemed.

On an M1 chip (Mac Mini, the one I have direct access to), very occasionally I can run “java” (no arguments) and see the usage message. But I can never do that twice in a row: even if it works the first time, it crashes the second time. I could never run a program (ranging in complexity from Hello World to Jenkins), and I could never successfully launch javac.

Another person who kindly offered to test this, on an M1 Pro chip, has the opposite problem: he can run java/javac (including Hello World and Jenkins) most of the time, but occasionally it fails (same error as described here, SIGILL on BufferBlob). I don’t have direct access to an M1 Pro (or any other M* chip) so I can’t grab much data on those.
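
Since the failure is intermittent, it may help to count outcomes over many launches instead of relying on single runs. The sketch below is an addition, not something used in this report: the function names are hypothetical, and it relies only on Python's subprocess module, where a negative return code means the child was killed by that signal. Note that the JVM catches the SIGILL itself and then aborts ("Abort (core dumped)" above), so I would expect crashes to be counted as SIGABRT rather than SIGILL.

```python
import signal
import subprocess
from collections import Counter


def describe_exit(returncode):
    """Turn a subprocess return code into a short label."""
    if returncode < 0:
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def tally_runs(cmd, attempts=20):
    """Run cmd repeatedly and count how each attempt ended."""
    results = Counter()
    for _ in range(attempts):
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        results[describe_exit(proc.returncode)] += 1
    return results
```

For example, tally_runs(["/usr/local/openjdk11/bin/java", "-version"], attempts=50) would give a rough crash rate that can be compared between the M1 and the M1 Pro. The -version argument is my choice here; plain "java" prints the usage message and exits with a non-zero status, which works just as well for counting.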
Comment 8 Miguel Arroz 2022-07-21 23:18:56 UTC
So we confirmed this crashes on code generated by the JIT compiler:

The output of the following command when running java:

> dtrace -n 'inline string process = "java"; ::mmap:entry /execname == process && (arg2 & 0x7) == 0x7/ { this->follow=1; printf("addr=%p size=%p prot=%p", arg0, arg1, arg2) } ::mmap:return /this->follow/ { this->follow = 0; printf("addr=%p", arg0) }'

is:

CPU     ID                    FUNCTION:NAME
  1  54289                       mmap:entry addr=7607c2687000 size=270000 prot=7
  1  54290                      mmap:return addr=ffffffffc2687000
  1  54289                       mmap:entry addr=7607c2c16000 size=270000 prot=7
  1  54290                      mmap:return addr=ffffffffc2c16000
  1  54289                       mmap:entry addr=7607ca14e000 size=270000 prot=7
  1  54290                      mmap:return addr=ffffffffca14e000


I was running java in lldb, and the address where it crashed is inside the first block: ->  0x7607c26ce190: mov    x0, #0x43c

We also confirmed the dtrace output is similar on the rare occasions where "java" runs successfully.

Not exactly sure what this proves aside from the fact that the crash happens in generated code. I'm trying to find more info on how W^X protection on M1s works under hypervisors, but it's not easy to find anything about that.
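
The claim that the crash address lies inside the first block can be rechecked from the dtrace lines above. The sketch below is an addition with hypothetical helper names. It assumes each mapping ended up at the address passed to mmap: the mmap:return values look truncated to 32 bits, but their low 32 bits match the requested addresses.

```python
import re

# mmap:entry / mmap:return lines copied from the dtrace output above.
DTRACE_OUTPUT = """\
  1  54289                       mmap:entry addr=7607c2687000 size=270000 prot=7
  1  54290                      mmap:return addr=ffffffffc2687000
  1  54289                       mmap:entry addr=7607c2c16000 size=270000 prot=7
  1  54290                      mmap:return addr=ffffffffc2c16000
  1  54289                       mmap:entry addr=7607ca14e000 size=270000 prot=7
  1  54290                      mmap:return addr=ffffffffca14e000
"""

CRASH_PC = 0x7607C26CE190  # faulting address shown by lldb

ENTRY_RE = re.compile(
    r"mmap:entry addr=([0-9a-f]+) size=([0-9a-f]+) prot=([0-9a-f]+)"
)
PROT_BITS = {1: "READ", 2: "WRITE", 4: "EXEC"}  # PROT_READ, PROT_WRITE, PROT_EXEC


def parse_regions(text):
    """Return (addr, size, prot) tuples for every mmap:entry line."""
    return [tuple(int(f, 16) for f in m) for m in ENTRY_RE.findall(text)]


def describe_prot(prot):
    """Spell out the protection bits, e.g. 7 becomes READ|WRITE|EXEC."""
    return "|".join(name for bit, name in PROT_BITS.items() if prot & bit)


def find_region(regions, pc):
    """Return (index, offset) of the region containing pc, or None."""
    for index, (addr, size, _prot) in enumerate(regions):
        if addr <= pc < addr + size:
            return index, pc - addr
    return None


if __name__ == "__main__":
    regions = parse_regions(DTRACE_OUTPUT)
    hit = find_region(regions, CRASH_PC)
    if hit is None:
        print("crash pc is outside all traced mappings")
    else:
        index, offset = hit
        prot = describe_prot(regions[index][2])
        print(f"region {index}, offset {offset:#x}, prot {prot}")
```

With the values from this comment, the crash address falls in the first mapping (index 0) at offset 0x47190, and prot=7 decodes to read, write and execute.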
Comment 9 Stephen Wall 2022-08-11 18:18:24 UTC
I am also experiencing a SIGILL with VMware Fusion on a Mac Mini, but with openjdk8. I have not tried openjdk11. I have seen the java usage message only twice; every other attempt ended in SIGILL. I can upload my core and/or error log if it will be of any use.