Bug 240213 - www/firefox compilation fails on powerpc64 due to node.js crashing (SIGBUS) due to a HMI event
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: powerpc Any
Importance: --- Affects Some People
Assignee: Bugmeister
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-30 16:47 UTC by Gustavo Romero
Modified: 2023-08-22 20:54 UTC

See Also:


Attachments
Error log from poudriere (68.50 KB, application/gzip)
2019-08-30 16:50 UTC, Gustavo Romero
dmesg on HMI and sigbus (1012 bytes, text/plain)
2019-08-30 16:51 UTC, Gustavo Romero
OPAL logs (205.53 KB, text/plain)
2019-08-30 16:52 UTC, Gustavo Romero
kernel version (2.23 KB, text/plain)
2019-08-30 16:53 UTC, Gustavo Romero
node.js core dump list (409 bytes, text/plain)
2019-08-30 16:55 UTC, Gustavo Romero

Description Gustavo Romero 2019-08-30 16:47:24 UTC
Currently Firefox compilation inside poudriere is failing (please see the full log attached as firefox-68.0.2_1,1.log) due to a SIGBUS raised in the node.js executable 'node'.

ports head used is:
[root@p9 //usr/obj/tree2]# git log --oneline -1
f47d97e0e832 (HEAD -> master, origin/master, origin/HEAD) Get rid of the deprecated @exec and @unexec

dmesg shows something like:
Hypervisor Maintenance Event received(Severity 0, type 1, HMER: 2040000000000000).
pid 2551 (node), jid 479, uid 65534: exited on signal 10 (core dumped)

'node' itself appears to crash at different instructions, mostly loads and stores. In one case, however, the crash was at a trivial compare instruction, which does not access storage at all, and which I nevertheless believe generated the HMI event:

(gdb) x/i $pc
=> 0x109d9258 <._ZN2v88internal9Scavenger14ScavengeObjectINS0_18FullHeapObjectSlotEEENS0_18SlotCallbackResultET_NS0_10HeapObjectE+1048>:	std     r29,0(r9)
(gdb) x/i $pc
=> 0x109d8e68 <._ZN2v88internal9Scavenger14ScavengeObjectINS0_18FullHeapObjectSlotEEENS0_18SlotCallbackResultET_NS0_10HeapObjectE+40>:	ld      r9,0(r26)
(gdb) x/pi $pc
=> 0x812d646c8:	cmpdi   r3,0
(gdb) x/i $pc
=> 0x109debcc <._ZN2v88internal9Scavenger7ProcessEPNS0_14OneshotBarrierE+5228>:	ld      r9,8(r9)

On inspecting the OPAL ring buffer the following can be found:
[213443.350925782,7] HMI: Received HMI interrupt: HMER = 0x2040000000000000
[213443.350954904,7] HMI: Core WOF = 0x0000000020000000 recovered error:
[213443.350956412,7] HMI: LSU - SLB multi hit
(please see attached logs for all details, including how to grab the log using 'pdbg')
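
As a reading aid, the HMER value reported in dmesg and in the OPAL log can be broken down into its set bits. The following is a minimal sketch in C, assuming the IBM (MSB-0) bit numbering used for Power registers, where bit 0 is the most significant bit. The function name hmer_set_bits is hypothetical; this is not code from FreeBSD or OPAL, and the meaning of each bit still has to be looked up in the Power ISA or the firmware headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Collect the positions of all set bits in a 64-bit register value,
 * using IBM (MSB-0) numbering: bit 0 is the most significant bit.
 * Positions are written to out[] in ascending order.
 * Returns the number of set bits.
 * hmer_set_bits is a hypothetical helper name used for illustration.
 */
size_t hmer_set_bits(uint64_t hmer, int out[64])
{
    size_t n = 0;

    assert(out != NULL);
    for (int bit = 0; bit < 64; bit++) {
        /* IBM bit N corresponds to the mask 1 << (63 - N). */
        if (hmer & (UINT64_C(1) << (63 - bit)))
            out[n++] = bit;
    }
    return n;
}
```

For HMER = 0x2040000000000000 this reports two set bits, at IBM bit positions 2 and 9.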

The HMI severity is 0 (see above), meaning it is not a critical CPU event and could otherwise be safely ignored. However, one of the causes of "HMI: LSU - SLB multi hit" is duplicate entries in the SLB, which are hit when an EA (Effective Address) is looked up in the SLB, so this apparently points to an MM issue in the FreeBSD kernel.
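
To illustrate what a multi hit means, here is a simplified software model in C of an SLB lookup. It is only a sketch under stated assumptions: 256 MB segments (so the ESID is the EA shifted right by 28 bits), a flat array standing in for the SLB, and hypothetical names (struct slb_entry, slb_match_count). It does not reflect the actual hardware lookup or the FreeBSD kernel's SLB management code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 256 MB segments, so ESID = EA >> 28. */
#define SEGMENT_SHIFT 28

/* Hypothetical, simplified SLB entry: only the fields the model needs. */
struct slb_entry {
    uint64_t esid;  /* effective segment ID */
    uint64_t vsid;  /* virtual segment ID */
    bool     valid; /* entry is in use */
};

/*
 * Count how many valid entries translate the given effective address.
 * 0 means an SLB miss, 1 is the normal case, and anything above 1 is
 * the "multi hit" condition: duplicate entries for the same segment.
 */
size_t slb_match_count(const struct slb_entry *slb, size_t n, uint64_t ea)
{
    uint64_t esid = ea >> SEGMENT_SHIFT;
    size_t hits = 0;

    assert(slb != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (slb[i].valid && slb[i].esid == esid)
            hits++;
    }
    return hits;
}
```

In this model, a kernel that installs a second entry for a segment that is already present would make slb_match_count return 2 for any address in that segment, which corresponds to the condition the hardware reports.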

I'll proceed to create (hopefully) a test case that hits the issue outside of node.js.
Comment 1 Gustavo Romero 2019-08-30 16:50:10 UTC
Created attachment 207027
Error log from poudriere

Firefox build error log from poudriere:
# poudriere testport -i -j thejail -p head -o www/firefox
Comment 2 Gustavo Romero 2019-08-30 16:51:28 UTC
Created attachment 207028
dmesg on HMI and sigbus

dmesg as seen after node.js crashes
Comment 3 Gustavo Romero 2019-08-30 16:52:22 UTC
Created attachment 207029
OPAL logs

OPAL logs as seen after node.js crashes
Comment 4 Gustavo Romero 2019-08-30 16:53:46 UTC
Created attachment 207030
kernel version

kernel version
Comment 5 Gustavo Romero 2019-08-30 16:55:03 UTC
Created attachment 207031
node.js core dump list

List of core dumps generated by node.js (might differ from run to run)
Comment 6 Mark Linimon freebsd_committer freebsd_triage 2023-08-22 20:47:16 UTC
With bugmeister hat, assign to bugmeister.  (Current assignee is
non-committer.)

To submitter: is this aging PR still relevant?
Comment 7 Justin Hibbits freebsd_committer freebsd_triage 2023-08-22 20:54:43 UTC
This was fixed 3 years ago with 81962477fc.