Bug 237960 - lang/erlang does not build on arm with the NATIVE option
Summary: lang/erlang does not build on arm with the NATIVE option
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Erlang FreeBSD Mailing List
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-05-18 01:30 UTC by Bertrand Petit
Modified: 2019-05-28 15:48 UTC
CC List: 4 users

See Also:
koobs: maintainer-feedback? (erlang)


Attachments
Build log with Hipe and NATIVE (694.90 KB, text/plain)
2019-05-18 01:32 UTC, Bertrand Petit
Build log without Hipe and without NATIVE (448.81 KB, text/plain)
2019-05-18 01:34 UTC, Bertrand Petit
Path for arm disabling optimization and replacing linux syscall with builtins (2.82 KB, patch)
2019-05-22 05:07 UTC, Bertrand Petit
Patch fixing erlang build on arm by using clang80 and no NATIVE (3.42 KB, patch)
2019-05-28 14:21 UTC, Bertrand Petit
Patch fixing erlang build on arm by using clang80 and no NATIVE (3.53 KB, patch)
2019-05-28 15:48 UTC, Bertrand Petit

Description Bertrand Petit 2019-05-18 01:30:19 UTC
The Erlang/OTP port (lang/erlang) fails to build on an RPi when the NATIVE
option is selected in addition to the preselected ones. Compilation halts on
a bad system call:

[...]
erlc -W  -DMERL_NO_TRANSFORM +debug_info -pa ../ebin -pa ./ -I../include +native +nowarn_shadow_vars +warn_unused_import  -o ./ merl_transform.erl
erlc -W +debug_info -pa ../ebin -pa ./ -I../include +native +nowarn_shadow_vars +warn_unused_import  -o../ebin merl_transform.erl
gmake[5]: *** [Makefile:74: ../ebin/merl_transform.beam] Bad system call (core dumped)
gmake[5]: Leaving directory '/usr/obj/ports/usr/ports/lang/erlang/work/otp-OTP-19.3.6.13/lib/syntax_tools/src'


This build was attempted on an RPi2 running an 11-STABLE base system built from svn revision 345570. The core dump does not look readily informative:

# gdb ../../../bin/armv6-portbld-freebsd11.2/beam.smp beam.smp.core
[...]
(gdb) bt
#0  0x001b1e30 in try_alloc ()
#1  0x001b1cf0 in hipe_alloc_code ()
#2  0x001aab04 in hipe_bifs_enter_code_2 ()
#3  0x00043e3c in $a.18 ()
#4  0x00043e3c in $a.18 ()


Since I'm new to BEAM/Erlang/OTP, I have no idea how to investigate this issue further. Having BEAM machines on inexpensive RPis would be extremely useful for experimenting with the distribution features of Erlang.

As I suspected that something might be amiss with the HiPE compiler on arm,
I also attempted a build without it and without NATIVE. That build also failed,
for a different reason, which is outlined below:

erlc -W  +debug_info -I../include -I../../kernel/include -Werror -o../ebin dets_utils.erl
gmake[5]: *** [/usr/obj/ports/usr/ports/lang/erlang/work/otp-OTP-19.3.6.13/make/armv6-portbld-freebsd11.2/otp.mk:119: ../ebin/dets_utils.beam] Segmentation fault (core dumped)
gmake[5]: Leaving directory '/usr/obj/ports/usr/ports/lang/erlang/work/otp-OTP-19.3.6.13/lib/stdlib/src'

Again, the stack trace is uninformative:

# gdb ../../../bin/armv6-portbld-freebsd11.2/beam.smp beam.smp.core
[...]
(gdb) bt
#0  0x00000004 in ?? ()


The only build that completed and eventually installed was the one with HiPE and without NATIVE. However, at this point I'm left confused: should I trust this virtual machine? I'm inclined not to.
Comment 1 Bertrand Petit 2019-05-18 01:32:58 UTC
Created attachment 204437
Build log with Hipe and NATIVE
Comment 2 Bertrand Petit 2019-05-18 01:34:35 UTC
Created attachment 204438
Build log without Hipe and without NATIVE
Comment 3 mikael.urankar 2019-05-19 15:06:10 UTC
(In reply to Bertrand Petit from comment #1)
erlang uses a linux syscall to flush cache [1] :/
you can use __clear_cache instead.

also, armv6 doesn't have the smp extension (at least on the boards we support), so you should uncheck the SMP option.

I had to use CFLAGS+= -O0 (or -O3) to build erlang; -O1 and -O2 lead to random crashes.

[1] https://github.com/erlang/otp/blob/master/erts/emulator/hipe/hipe_arm.c#L40
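
For illustration only, here is a minimal C sketch of the idea of flushing the instruction cache through the compiler builtin rather than a Linux-specific syscall. It is not the code from hipe_arm.c nor from any patch attached to this bug; the helper names flush_icache_range and install_code are hypothetical. It assumes a GCC or clang toolchain, where __builtin___clear_cache is available (on targets that need no explicit flush it compiles to nothing).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an OTP function): make freshly written machine
 * code in [start, start + len) visible to instruction fetch. The compiler
 * builtin is used instead of issuing an OS-specific syscall directly.
 */
static void flush_icache_range(void *start, size_t len)
{
    char *begin = (char *)start;
    char *end = begin + len;

    __builtin___clear_cache(begin, end);
}

/*
 * Hypothetical helper: copy code bytes into a destination buffer, then
 * flush the instruction cache for that range. Returns dst.
 * Making the buffer executable is outside the scope of this sketch.
 */
static void *install_code(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    memcpy(dst, src, len);
    flush_icache_range(dst, len);
    return dst;
}
```

The point of going through the builtin is that the compiler and its runtime library pick the mechanism appropriate for the target OS, so the same source avoids the bad system call seen in the build log above.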
Comment 4 Bertrand Petit 2019-05-22 05:05:41 UTC
(In reply to mikael.urankar from comment #3)
Thank you, Mikael, for your valuable feedback. As suggested, I replaced the linux syscall with rt builtins, which fixed the SIGSYS issue. I confirm there is an issue with the base clang when optimization is enabled; I had to disable it. I suppose that using llvm80 may help resolve this issue, but I can't test it as I don't have enough storage on that host.

Please see the attached patch, which permits an installation with both HIPE and NATIVE.
Comment 5 Bertrand Petit 2019-05-22 05:07:27 UTC
Created attachment 204533
Path for arm disabling optimization and replacing linux syscall with builtins
Comment 6 Bertrand Petit 2019-05-24 04:28:52 UTC
Please see also this related PR:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237465
Comment 7 Bertrand Petit 2019-05-24 05:11:55 UTC
I eventually managed to compile lang/erlang on armv6 with the HiPE and NATIVE options using clang80. The build terminated without errors, which again confirms an issue with base clang on 11.2. I launched the test suite to exercise the build; I suppose this could take some days to complete on an RPi2. Meanwhile I will update the proposed patch.
Comment 8 mikael.urankar 2019-05-24 11:00:47 UTC
(In reply to Bertrand Petit from comment #4)
armv7 supports SMP
Comment 9 Dave Cottlehuber freebsd_committer 2019-05-24 12:52:21 UTC
Super bug hunting!

I'd strongly recommend submitting this upstream directly (patch or no patch),
they're very receptive to FreeBSD in general and IIRC do test continuously
internally.

https://bugs.erlang.org/

BTW my general advice for both HiPE and native is not to use them unless
you're hitting limits and you can see that your workload benefits from
them. In some cases, they don't help, and their broad community usage is
much less than "plain BEAM". OTP 22, for example, adds new opcodes to the VM
that are not supported by HiPE.
Comment 10 Bertrand Petit 2019-05-28 14:18:43 UTC
(In reply to Dave Cottlehuber from comment #9)
> BTW my general advice for both HiPE and native is not to use them unless
> you're hitting limits 

I concur, and I would add that HiPE is unusable on armv6. I interrupted the test suite prematurely when I observed countless beam segfaults in the system log.

After exploring part of the port configuration space on an RPi, I came to the conclusion that base clang is unsuitable to build OTP and that NATIVE should not be used at all.

I propose a patch which includes:
- a fix for the linux-only system call in HiPE
- constraints on option availability on armv6
- building OTP with clang80 on armv6
- a test target that builds and runs the full test suite

There is still an issue I've not addressed: the test suite directly executes make(1) but it should execute gmake(1) instead.
Comment 11 Bertrand Petit 2019-05-28 14:21:40 UTC
Created attachment 204672
Patch fixing erlang build on arm by using clang80 and no NATIVE
Comment 12 Bertrand Petit 2019-05-28 15:17:20 UTC
The upstream bug report is https://bugs.erlang.org/projects/ERL/issues/ERL-958
Comment 13 mikael.urankar 2019-05-28 15:31:21 UTC
(In reply to Bertrand Petit from comment #11)
can you also disable NATIVE on armv7? thanks.
Comment 14 Bertrand Petit 2019-05-28 15:48:00 UTC
Created attachment 204673
Patch fixing erlang build on arm by using clang80 and no NATIVE

The native option is excluded for both armv6 and armv7.