Bug 271081 - www/firefox: crashes on arm64 with ASLR enabled
Summary: www/firefox: crashes on arm64 with ASLR enabled
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: arm64 Any
Importance: --- Affects Only Me
Assignee: freebsd-gecko (Nobody)
URL: https://www.freshports.org/www/firefox/
Keywords: crash
Depends on:
Blocks: 259968
Reported: 2023-04-26 16:18 UTC by Mark Johnston
Modified: 2024-01-26 08:11 UTC (History)

See Also:
grahamperrin: maintainer-feedback? (gecko)


Attachments
backouts breakage on aarch64 freebsd (577 bytes, text/plain)
2023-12-16 20:38 UTC, Jesper Schmitz Mouridsen
prepared patch to use in www/firefox/files/ (8.48 KB, patch)
2023-12-17 12:17 UTC, Nuno Teixeira
pretends always w^x something might be wrong with allow_wx ? (574 bytes, text/plain)
2023-12-17 13:56 UTC, Jesper Schmitz Mouridsen
allows running with aslr and fixes jit.. (1.69 KB, patch)
2023-12-18 15:39 UTC, Jesper Schmitz Mouridsen
align mmap (4.85 KB, patch)
2024-01-21 21:19 UTC, Jesper Schmitz Mouridsen
use MAP_ALIGNED instead of relocating (595 bytes, patch)
2024-01-22 21:47 UTC, Jesper Schmitz Mouridsen

Description Mark Johnston freebsd_committer freebsd_triage 2023-04-26 16:18:58 UTC
firefox crashes very readily on arm64, whereas it seems to be fine on amd64. The backtrace looks like this; we appear to be segfaulting in js::gc::MapAlignedPages():

(gdb) bt
#0  thr_kill () at thr_kill.S:4
#1  0x00005c284cb2cbc8 in __raise (s=11) at /root/freebsd/lib/libc/gen/raise.c:52
#2  0x00005c2884a6e2b4 in nsProfileLock::FatalSignalHandler(int, __siginfo*, void*) () at /usr/local/lib/firefox/libxul.so
#3  0x00005c288536b408 in WasmTrapHandler(int, __siginfo*, void*) () at /usr/local/lib/firefox/libxul.so
#4  0x00005c284d6397e8 in handle_signal (actp=actp@entry=0x5c295553a460, sig=sig@entry=11, info=info@entry=0x5c295553a4d0, ucp=ucp@entry=0x5c295553a520) at /root/freebsd/lib/libthr/thread/thr_sig.c:301
#5  0x00005c284d638f14 in thr_sighandler (sig=11, info=0x5c295553a4d0, _ucp=0x5c295553a520) at /root/freebsd/lib/libthr/thread/thr_sig.c:246
#6  0x00005c28490ad1a8 in <signal handler called> ()
#7  0x00005c2884f3e27c in js::gc::MapAlignedPages(unsigned long, unsigned long) () at /usr/local/lib/firefox/libxul.so
#8  0x00005c2884f18f28 in js::gc::GCRuntime::getOrAllocChunk(js::AutoLockGCBgAlloc&) () at /usr/local/lib/firefox/libxul.so
#9  0x00005c2884f45a04 in js::Nursery::initFirstChunk(js::AutoLockGCBgAlloc&) () at /usr/local/lib/firefox/libxul.so
#10 0x00005c2884f22d78 in js::gc::GCRuntime::init(unsigned int) () at /usr/local/lib/firefox/libxul.so
#11 0x00005c2884c9e4f8 in JSRuntime::init(JSContext*, unsigned int) () at /usr/local/lib/firefox/libxul.so
#12 0x00005c2884c01764 in js::NewContext(unsigned int, JSRuntime*) () at /usr/local/lib/firefox/libxul.so
#13 0x00005c2880f67d08 in mozilla::CycleCollectedJSContext::Initialize(JSRuntime*, unsigned int) () at /usr/local/lib/firefox/libxul.so
#14 0x00005c288386bb24 in mozilla::dom::workerinternals::(anonymous namespace)::WorkerThreadPrimaryRunnable::Run() () at /usr/local/lib/firefox/libxul.so
#15 0x00005c28810105d8 in nsThread::ProcessNextEvent(bool, bool*) () at /usr/local/lib/firefox/libxul.so
#16 0x00005c2881014cac in NS_ProcessNextEvent(nsIThread*, bool) () at /usr/local/lib/firefox/libxul.so
#17 0x00005c288156a6b8 in mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) () at /usr/local/lib/firefox/libxul.so
#18 0x00005c288152029c in MessageLoop::Run() () at /usr/local/lib/firefox/libxul.so
#19 0x00005c288100df98 in nsThread::ThreadFunc(void*) () at /usr/local/lib/firefox/libxul.so
#20 0x00005c2889bd5dd8 in  () at /usr/local/lib/libnspr4.so
#21 0x00005c284d62f5ec in thread_start (curthread=0x5c2941fced00) at /root/freebsd/lib/libthr/thread/thr_create.c:292
#22 0x00005c284d62f148 in _pthread_create (thread=0x5c2848f3c2a8, attr=<optimized out>, start_routine=<optimized out>, arg=<optimized out>) at /root/freebsd/lib/libthr/thread/thr_create.c:187
Comment 1 Nuno Teixeira freebsd_committer freebsd_triage 2023-06-15 07:03:35 UTC
(In reply to Mark Johnston from comment #0)

Hello Mark,

I'm testing firefox-114.0.1 on 13-STABLE with:

$ proccontrol -m aslr -s disable firefox

and it runs fine without problems.

Recent discussion on:
https://lists.freebsd.org/archives/dev-commits-ports-all/2023-June/067113.html

What should be done here?

- Disable aslr at build time via an option
- Show a message at install time about disabling aslr at runtime with proccontrol on aarch64, and let users apply it if they are having problems
- Fix it upstream, optimal solution :)

I'm new to aarch64 and had some difficulty understanding the aslr connection.
I remember some users on the forums reporting the same problem and getting no answer at all.

Thanks
Comment 2 Nuno Teixeira freebsd_committer freebsd_triage 2023-10-04 07:36:45 UTC
Firefox 118.0.1 crashes with and without aslr.
Any clues?
Comment 3 Graham Perrin 2023-10-07 06:42:43 UTC
(In reply to Nuno Teixeira from comment #2)

Would a backtrace from a debug build help? 

(If you attempt a debug build, is result truly a debug build? <https://github.com/freebsd/poudriere/discussions/1077>)
Comment 4 Nuno Teixeira freebsd_committer freebsd_triage 2023-10-14 12:37:41 UTC
(In reply to Graham Perrin from comment #3)

118.0.1 15.0-CURRENT: procconf -m aslr -s disable firefox

https://people.freebsd.org/~eduardo/logs/firefox/error-sync-1697099126473_118.0.1%2C2.txt

I have never tried building firefox with aslr off; should it make any difference?

Thanks
Comment 5 Nuno Teixeira freebsd_committer freebsd_triage 2023-10-14 12:39:28 UTC
(In reply to Nuno Teixeira from comment #4)
(...)

**proccontrol
Comment 6 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2023-11-15 20:19:32 UTC
(In reply to Nuno Teixeira from comment #5)

lldb attached to the tab process (use MOZ_DEBUG_CHILD_PROCESS=20; the number is in seconds).
The output is also in [1].

 frame #0: 0x000008a54e146010
->  0x8a54e146010: sub    sp, x28, #0x8
    0x8a54e146014: str    x30, [x28, #-0x8]!
    0x8a54e146018: sub    sp, x28, #0x8
    0x8a54e14601c: str    x29, [x28, #-0x8]!
(lldb) bt
* thread #1, name = 'WebExtensions', stop reason = signal SIGILL: illegal trap
  * frame #0: 0x000008a54e146010
    frame #1: 0x000008a54e1167a0
    frame #2: 0x000000004a0b6db4 libxul.so`js::jit::MaybeEnterJit(JSContext*, js::RunState&) + 576
    frame #3: 0x0000000049a0e4c4 libxul.so`js::RunScript(JSContext*, js::RunState&) + 604
    frame #4: 0x0000000049a0e8e4 libxul.so`js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) + 1036
    frame #5: 0x0000000049a0ed80 libxul.so`js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) + 212
    frame #6: 0x0000000049a698a0 libxul.so`JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>) + 236
    frame #7: 0x0000000046f8f254 libxul.so`mozilla::dom::MessageListener::ReceiveMessage(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::dom::ReceiveMessageArgument const&, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&) + 788
    frame #8: 0x000000004874d3fc libxul.so`mozilla::dom::JSActor::CallReceiveMessage(JSContext*, mozilla::dom::JSActorMessageMeta const&, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>, mozilla::ErrorResult&) + 496
    frame #9: 0x000000004874d688 libxul.so`mozilla::dom::JSActor::ReceiveMessage(JSContext*, mozilla::dom::JSActorMessageMeta const&, JS::Handle<JS::Value>, mozilla::ErrorResult&) + 252
    frame #10: 0x000000004874fb50 libxul.so`mozilla::dom::JSActorManager::ReceiveRawMessage(mozilla::dom::JSActorMessageMeta const&, mozilla::Maybe<mozilla::dom::ipc::StructuredCloneData>&&, mozilla::Maybe<mozilla::dom::ipc::StructuredCloneData>&&) + 704
    frame #11: 0x0000000048652e00 libxul.so`mozilla::dom::WindowGlobalChild::RecvRawMessage(mozilla::dom::JSActorMessageMeta const&, mozilla::Maybe<mozilla::dom::ClonedMessageData> const&, mozilla::Maybe<mozilla::dom::ClonedMessageData> const&) + 368
    frame #12: 0x000000004873d910 libxul.so`mozilla::dom::PWindowGlobalChild::OnMessageReceived(IPC::Message const&) + 4140
    frame #13: 0x00000000486bfaa0 libxul.so`mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) + 700
    frame #14: 0x00000000464cdf50 libxul.so`mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) + 200
    frame #15: 0x00000000464cd090 libxul.so`mozilla::ipc::MessageChannel::DispatchMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::UniquePtr<IPC::Message, mozilla::DefaultDelete<IPC::Message>>) + 324
    frame #16: 0x00000000464cd408 libxul.so`mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::ActorLifecycleProxy*, mozilla::ipc::MessageChannel::MessageTask&) + 264
    frame #17: 0x00000000464cd8fc libxul.so`mozilla::ipc::MessageChannel::MessageTask::Run() + 148
    frame #18: 0x0000000045f607f4 libxul.so`mozilla::RunnableTask::Run() + 32
    frame #19: 0x0000000045f5dc14 libxul.so`mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) + 1600
    frame #20: 0x0000000045f5cc20 libxul.so`mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) + 52
    frame #21: 0x0000000045f5ced0 libxul.so`mozilla::TaskController::ProcessPendingMTTask(bool) + 80
    frame #22: 0x0000000045f62dc8 libxul.so`mozilla::detail::RunnableFunction<mozilla::TaskController::TaskController()::$_4>::Run() + 24
    frame #23: 0x0000000045f7060c libxul.so`nsThread::ProcessNextEvent(bool, bool*) + 968
    frame #24: 0x0000000045f74d2c libxul.so`NS_ProcessNextEvent(nsIThread*, bool) + 92
    frame #25: 0x00000000464d092c libxul.so`mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) + 232
    frame #26: 0x00000000464874d8 libxul.so`MessageLoop::Run() + 92
    frame #27: 0x0000000048ab4aa8 libxul.so`nsBaseAppShell::Run() + 48
    frame #28: 0x0000000049941c64 libxul.so`XRE_RunAppShell() + 100
    frame #29: 0x00000000464874d8 libxul.so`MessageLoop::Run() + 92
    frame #30: 0x0000000049941a20 libxul.so`XRE_InitChildProcess(int, char**, XREChildData const*) + 1184
    frame #31: 0x0000000000132574 firefox-bin`main + 780
    frame #32: 0x0000000040500578 libc.so.7`__libc_start1(argc=22, argv=0x0000ffffffffe5d8, env=0x0000ffffffffe690, cleanup=<unavailable>, mainX=(firefox-bin`main)) at libc_start1.c:157:7
    frame #33: 0x0000000000132134 firefox-bin`_start at crt1_s.S:60

[1] https://gist.github.com/jsm222/e6199a03142f5716921c82c3d2f3ddc5
Comment 7 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2023-12-16 20:38:13 UTC
Created attachment 247087 [details]
backouts breakage on aarch64 freebsd

Only for testing; a better patch should be made. Still run it with noaslr.
Comment 8 Nuno Teixeira freebsd_committer freebsd_triage 2023-12-17 09:40:13 UTC
(In reply to Jesper Schmitz Mouridsen from comment #7)

Hello Jesper,

It seems that firefox 121 is already patched: when I tried to apply the uploaded patch, I got "reverse (or previously applied) patch detected".

I will test 121.0,2 (11 Dec 2023) and share results.

Cheers
Comment 9 Nuno Teixeira freebsd_committer freebsd_triage 2023-12-17 12:17:38 UTC
Created attachment 247095 [details]
prepared patch to use in www/firefox/files/

reverse applied patch prepared to use in www/firefox/files/
Comment 10 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2023-12-17 13:52:45 UTC
(In reply to Nuno Teixeira from comment #9)
Try this one 

 /*
  * The original fdlibm code used statements like:
diff -r d80eefe94738 modules/libpref/init/StaticPrefList.yaml
--- a/modules/libpref/init/StaticPrefList.yaml  Tue Nov 28 21:01:37 2023 +0000
+++ b/modules/libpref/init/StaticPrefList.yaml  Sun Dec 17 14:50:57 2023 +0100
@@ -7662,7 +7662,7 @@
 # or executable but never both at the same time. OpenBSD defaults to W^X.
 - name: javascript.options.content_process_write_protect_code
   type: bool
-#if defined(XP_OPENBSD)
+#if defined(XP_OPENBSD) || defined(XP_FREEBSD)
   value: true
 #else
   value: false
Comment 11 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2023-12-17 13:56:28 UTC
Created attachment 247100 [details]
pretends always w^x something might be wrong with allow_wx ?

This runs with both kern.elf64.allow_wx=0 and kern.elf64.allow_wx=1.
But unpatched, it does not run with kern.elf64.allow_wx=1 (which it should, if I have not mixed up the values).
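
For context on what the two allow_wx values are expected to change: the sysctl is understood to control whether a process may hold a mapping that is writable and executable at the same time. A minimal C sketch of a probe for that follows. `wx_mapping_allowed` is a hypothetical helper written for this illustration, not code from firefox or FreeBSD, and it cannot tell a policy refusal apart from other mmap failures such as memory limits.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical probe: returns 1 if this process can map one page that is
 * writable and executable at once, 0 if the request is refused.
 */
int wx_mapping_allowed(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return 0;

    void *p = mmap(NULL, (size_t)page,
                   PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return 0; /* W+X refused, or some unrelated mmap failure */

    munmap(p, (size_t)page);
    return 1;
}
```

Running such a probe once with each sysctl value would show whether the setting has the effect one expects before drawing conclusions about firefox itself.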
Comment 12 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2023-12-18 03:58:46 UTC
Further investigation:

allow_wx=1 (i.e. W^X not enforced) might misbehave

https://gist.github.com/jsm222/38279218adf608b48985c174cedad014
Comment 13 Kyle Evans freebsd_committer freebsd_triage 2023-12-18 05:13:30 UTC
The problem with these last reproducers seems to be insufficient barrier between writes to the mapped page and executing code out of it; it seems like it'd be unlikely for a project like firefox to get it wrong, but maybe they're doing the same (or perhaps some other kind of caching)
Comment 14 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2023-12-18 07:17:40 UTC
diff -r d80eefe94738 js/src/jit/arm64/vixl/MozCpu-vixl.cpp
--- a/js/src/jit/arm64/vixl/MozCpu-vixl.cpp     Tue Nov 28 21:01:37 2023 +0000
+++ b/js/src/jit/arm64/vixl/MozCpu-vixl.cpp     Mon Dec 18 08:06:04 2023 +0100
@@ -110,7 +110,7 @@
   FlushInstructionCache(GetCurrentProcess(), address, length);
 #elif defined(XP_DARWIN)
   sys_icache_invalidate(address, length);
-#elif defined(__aarch64__) && (defined(__linux__) || defined(__android__))
+#elif defined(__aarch64__) && (defined(__linux__) || defined(__android__) || defined(__FreeBSD__))
   // Implement the cache synchronisation for all targets where AArch64 is the
   // host, even if we're building the simulator for an AAarch64 host. This
   // allows for cases where the user wants to simulate code as well as run it

This one works for me as a single patch (i.e. all other attempts can be disregarded). I only did a sparse test on an incremental build, but all cache-related code was guarded out. Still needs +noaslr.
Comment 15 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2023-12-18 15:39:27 UTC
Created attachment 247141 [details]
allows running with aslr and fixes jit..

The memory part (MAP_FIXED) is there to avoid (desired != region); I do not know how that relates to aslr, but with it firefox runs with aslr. The jit code has to do some cache syncing in order to execute the write+exec pages; that code was guarded out, and the problem was masked under w^x because mprotect also does some cache syncing. Thanks, Kyle Evans.
Hopefully someone will pick this up for a test. Do not forget bug #275247. Only tested on aarch64 (rpi4 and rk3399).
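
The cache-sync step described above can be sketched in C as follows. This is an illustration only, not the vixl code touched by the patch: `write_code` is a hypothetical helper, and `__builtin___clear_cache` is the GCC/Clang builtin, assumed here as a stand-in for whatever firefox does inside the guarded block.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy freshly generated instructions into dst and
 * then synchronize the instruction cache with the data cache for that
 * range. On arm64 the two caches are not kept coherent automatically, so
 * skipping the second step can leave the CPU executing stale bytes.
 */
void write_code(void *dst, const void *code, size_t len)
{
    memcpy(dst, code, len);

    /* GCC/Clang builtin; it compiles to nothing on x86, where no explicit sync is needed. */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
}
```

When a page stays write+exec, nothing else performs this sync between the write and the jump into the buffer, which matches the observation that the problem only disappeared under w^x.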
Comment 16 Nuno Teixeira freebsd_committer freebsd_triage 2023-12-19 08:12:42 UTC
(In reply to Jesper Schmitz Mouridsen from comment #15)

Excellent news!

- wirple.com (webgl benchmarking): OK
- mail.google.com: OK
- facebook.com: OK
- youtube.com: OK
- freebsd.org: OK
- sync (accounts.firefox.com): passwords and bookmarks: OK

- Very fast. I have the impression that it responds faster than qutebrowser and chromium.
- Ram: OK
- cpu: OK

I can run more tests if needed.

Very happy!
Thanks!
Comment 17 Nuno Teixeira freebsd_committer freebsd_triage 2023-12-19 08:19:36 UTC
(In reply to Nuno Teixeira from comment #16)
(...)

aslr on :)
Comment 18 Nuno Teixeira freebsd_committer freebsd_triage 2023-12-21 11:40:40 UTC
Suggestion: add the uploaded patch https://bugs.freebsd.org/bugzilla/attachment.cgi?id=247141 as EXTRA_PATCHES for the aarch64 ARCH only, and bump PORTREVISION.

This way we prevent any side effects on archs other than aarch64.

Any thoughts?
Comment 19 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2023-12-22 15:02:42 UTC
(In reply to Nuno Teixeira from comment #18)
The MAP_FIXED part needs some testing and thought; the other one already applies only to aarch64.
Comment 20 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2023-12-26 15:18:33 UTC
(In reply to Jesper Schmitz Mouridsen from comment #19)
upstream bug is here for the jit part
https://bugzilla.mozilla.org/show_bug.cgi?id=1871969
Comment 21 Nuno Teixeira freebsd_committer freebsd_triage 2023-12-29 15:41:38 UTC
(In reply to Jesper Schmitz Mouridsen from comment #20)
Nice!
Comment 22 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2024-01-21 21:19:14 UTC
Created attachment 247827 [details]
align mmap

From https://reviews.freebsd.org/rS343964 I concluded that MAP_FIXED simply turns off randomization(?) with aslr enabled.

With aslr enabled and without the MAP_FIXED hack, not all pages were aligned, causing calls to TryToAlignChunk, which somehow cause failures. I cannot yet explain why TryToAlignChunk fails. It did not seem to ever get called with aslr disabled. Thus a second hackish attempt to fix running with aslr enabled on aarch64 is attached; it tries to ensure alignment to the desired alignment, if I did not misread the mmap man page. I marked the other attachment obsolete because the jit part hopefully gets accepted by upstream.
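
For readers who have not seen the alignment problem before, a common portable way to get an aligned anonymous mapping is to over-allocate and trim. The C sketch below shows that general technique under stated assumptions; it is not the attached patch and not firefox's TryToAlignChunk, and `map_aligned_by_trimming` is a hypothetical name. It assumes `align` is a power of two and that both `size` and `align` are multiples of the page size.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map size bytes at an address that is a multiple of
 * align. Returns NULL on failure.
 */
void *map_aligned_by_trimming(size_t size, size_t align)
{
    if (size == 0 || align == 0 || (align & (align - 1)) != 0)
        return NULL;

    size_t span = size + align; /* enough room to slide to a boundary */
    if (span < size)
        return NULL; /* size_t overflow */

    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANON, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    /* Round the start up to the next multiple of align. */
    uintptr_t start = ((uintptr_t)raw + align - 1) & ~((uintptr_t)align - 1);
    char *aligned = (char *)start;
    size_t head = (size_t)(aligned - raw);
    size_t tail = span - head - size;

    /* Give back the unused pages before and after the aligned region. */
    if (head > 0)
        munmap(raw, head);
    if (tail > 0)
        munmap(aligned + size, tail);

    return aligned;
}
```

Because the kernel picks the base address of the larger mapping, this approach does not depend on a second mmap call landing at a specific address, which is the step that address randomization can defeat.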
Comment 23 Nuno Teixeira freebsd_committer freebsd_triage 2024-01-22 08:33:10 UTC
(In reply to Jesper Schmitz Mouridsen from comment #22)

Until the jit patch is included upstream, we need to apply:

https://bugs.freebsd.org/bugzilla/attachment.cgi?id=247827

+

patch-js_src_jit_arm64_vixl_MozCpu-vixl.cpp

--- js/src/jit/arm64/vixl/MozCpu-vixl.cpp.orig	2023-12-11 20:42:06 UTC
+++ js/src/jit/arm64/vixl/MozCpu-vixl.cpp

-#elif defined(__aarch64__) && (defined(__linux__) || defined(__android__))
+#elif defined(__aarch64__) && (defined(__linux__) || defined(__android__)|| defined(__FreeBSD__))

Right?

Cheers
Comment 24 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2024-01-22 19:19:11 UTC
(In reply to Nuno Teixeira from comment #23)
Perfectly right. I was hoping for a quick upstream commit.
Comment 25 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2024-01-22 21:47:24 UTC
Created attachment 247860 [details]
use MAP_ALIGNED instead of relocating

I still cannot find the difference between amd64 and arm64, but TryToAlignChunk very often fails on both platforms, apparently because aslr randomizes the requested aligned addresses so that they are no longer aligned as desired. My idea is to use MAP_ALIGNED instead of trying to relocate. Thoughts?
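
The MAP_ALIGNED idea can be sketched in C as below. This is a hedged illustration, not the attached patch: `map_aligned_by_kernel` and `log2_exact` are hypothetical names. On FreeBSD, `MAP_ALIGNED(n)` takes the binary logarithm of the desired alignment; the flag does not exist on every platform, so the sketch simply reports failure where it is missing.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANON on glibc; harmless elsewhere */
#include <stddef.h>
#include <sys/mman.h>

/* Returns log2(v) when v is a power of two, -1 otherwise. */
int log2_exact(size_t v)
{
    if (v == 0 || (v & (v - 1)) != 0)
        return -1;

    int n = 0;
    while (v > 1) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: ask the kernel for a mapping aligned to align
 * bytes. Returns NULL on failure or where MAP_ALIGNED is unavailable.
 */
void *map_aligned_by_kernel(size_t size, size_t align)
{
    int shift = log2_exact(align);
    if (size == 0 || shift < 0)
        return NULL;

#ifdef MAP_ALIGNED
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON | MAP_ALIGNED(shift), -1, 0);
    return p == MAP_FAILED ? NULL : p;
#else
    (void)shift;
    return NULL; /* no MAP_ALIGNED here; the caller needs another strategy */
#endif
}
```

The design point is that the kernel chooses an address that already satisfies the alignment, so no unmap-and-retry step is needed afterwards.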
Comment 26 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2024-01-25 22:10:34 UTC
(In reply to Jesper Schmitz Mouridsen from comment #25)
OK, the difference between amd64 and arm64/aarch64, in my understanding: on amd64 the address pointer is always in a valid firefox jit region, i.e. the 47th bit and above are never set. This is not true for arm64, so MapAlignedPagesRandom returns null when the last attempts are not in a valid range. Furthermore, I cannot confirm that Linux randomizes mmap'ed virtual addresses, so TryToAlignChunk works better there.
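
The range condition described above can be written down as a small C check. This is a sketch under an assumption: it reads "47th bit and above are never set" as "the whole region lies below 2^47". The name `in_assumed_valid_range` and the constant are hypothetical, not firefox's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for this sketch: usable addresses are those below 2^47. */
#define ASSUMED_ADDRESS_BITS 47

/*
 * Hypothetical check: true when the region [addr, addr + len) lies
 * entirely below 2^ASSUMED_ADDRESS_BITS.
 */
bool in_assumed_valid_range(uint64_t addr, uint64_t len)
{
    uint64_t limit = UINT64_C(1) << ASSUMED_ADDRESS_BITS;

    /* Written this way so that addr + len cannot overflow. */
    return addr < limit && len <= limit - addr;
}
```

With this reading, an address such as 0x00005c2884f3e27c from the gdb backtrace in comment #0 passes, while 0x0000ffffffffe5d8 (the argv pointer in the arm64 lldb backtrace in comment #6) has bit 47 set and is rejected.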
Comment 27 Jesper Schmitz Mouridsen freebsd_committer freebsd_triage 2024-01-25 22:29:18 UTC
https://bugzilla.mozilla.org/show_bug.cgi?id=1876632 upstream bug
Comment 28 Nuno Teixeira freebsd_committer freebsd_triage 2024-01-26 08:11:12 UTC
(In reply to Jesper Schmitz Mouridsen from comment #27)

Running firefox for 3 days without any kind of issues.
Thanks for the awesome work!

Really curious about upstream feedback :)

Cheers