Bug 207096

Summary: archivers/lzo2: on arm cortex-a7, ./lzotest/lzotest -mlzo -n2 -q ./COPYING gets "Signal 10" and stops the build
Product: Ports & Packages Reporter: Mark Millard <marklmi26-fbsd>
Component: Individual Port(s) Assignee: Matthias Andree <mandree>
Status: Closed Overcome By Events    
Severity: Affects Only Me CC: ian, marklmi26-fbsd
Priority: --- Flags: mandree: maintainer-feedback+
Version: Latest   
Hardware: Any   
OS: Any   

Description Mark Millard 2016-02-11 02:20:58 UTC
[Context basics: projects/clang380-import -r295351 for buildworld/buildkernel that targeted an rpi2 (armv7-a/cortex-a7). This context requires strict alignment: FreeBSD 11.0-CURRENT has SCTLR bit[1]==1 for such contexts.]

When I attempt portinstall/powermaster builds for an arm rpi2 context that depend on lzo2 I get the following.

. . .

   LZO configuration summary
   LZO version                : 2.09
   configured for host        : armv6-portbld-freebsd11.0
   source code location       : .
   compiler                   : /usr/bin/clang
   preprocessor definitions   : -DLZO_HAVE_CONFIG_H=1
   preprocessor flags         : 
   compiler flags             : -O -pipe -target armv6--freebsd11.0-gnueabi -march=armv7-a -mcpu=cortex-a7 -mfloat-abi=softfp -mno-unaligned-access -mfloat-abi=softfp  -fno-strict-aliasing
   build static library       : yes
   build shared library       : yes
   enable i386 assembly code  : no
. . .
===>  Running self-tests for lzo2-2.09 (can take a few minutes)
cd /usr/obj/portswork/usr/ports/archivers/lzo2/work/lzo-2.09 && /usr/bin/env MALLOC_OPTIONS=jz make check test SHELL="/bin/sh -x"
make  check-local
./lzotest/lzotest -mlzo -n2 -q ./COPYING
*** Signal 10

make[3]: stopped in /usr/obj/portswork/usr/ports/archivers/lzo2/work/lzo-2.09
*** Error code 1
. . .

(which stops the overall build).

Stop reading the description here if you do not care about supporting details at this point.

Other context details:

# freebsd-version -ku; uname -aKU
FreeBSD rpi2 11.0-CURRENT FreeBSD 11.0-CURRENT #14 r295351M: Sun Feb  7 03:23:24 PST 2016     markmi@FreeBSDx64:/usr/obj/clang/arm.armv6/usr/src/sys/RPI2-NODBG  arm 1100097 1100097

# more /etc/make.conf 
CFLAGS+=-target ${TO_TYPE}--freebsd${VERSION_CONTEXT}-gnueabi -march=armv7-a -mcpu=cortex-a7 -mfloat-abi=softfp -mno-unaligned-access
.if ${.MAKE.LEVEL} == 0
.export CC
.export CXX
.export CPP

# gdb /usr/obj/portswork/usr/ports/archivers/lzo2/work/lzo-2.09/lzotest/.libs/lzotest /var/crash/lzotest.90450.core 
GNU gdb 6.1.1 [FreeBSD]
. . .
#0  _lzo_config_check () at src/lzo_init.c:117
117	    r &= UA_GET_NE16(p) == 0;
(gdb) bt
#0  _lzo_config_check () at src/lzo_init.c:117
#1  0x2007de68 in __lzo_init_v2 (v=8336, s1=2, s2=4, s3=4, s4=4, s5=4, s6=4, s7=4, s8=4, s9=24) at src/lzo_init.c:226
#2  0x0000b958 in main (argc=5, argv=0xbfbfe928) at lzotest/lzotest.c:1916

For reference. . .

(gdb) info reg
r0             0xbfbfe581	-1077942911
. . .
lr             0x2007d8b0	537385136
pc             0x2007d8b4	537385140
. . .
0x2007d8ac <_lzo_config_check+192>:	bl	0x2007dc08 <u2p>
0x2007d8b0 <_lzo_config_check+196>:	str	r0, [r11, #-36]
0x2007d8b4 <_lzo_config_check+200>:	ldrh	r0, [r0]
. . .
    u.a[0] = u.a[1] = 0;
    u.b[0] = 1; u.b[3] = 2;
    p = u2p(&u, 1);
    r &= UA_GET_NE16(p) == 0;
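
For illustration only: the faulting instruction above is a halfword load (ldrh) from the odd address held in r0 (0xbfbfe581). A minimal sketch of an alignment-safe native-endian 16-bit read follows. The helper name load_ne16 is hypothetical and not part of LZO; the sketch copies bytes with memcpy instead of issuing a single 16-bit load, which leaves the compiler free to use byte loads on targets that require strict alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of LZO): read a native-endian 16-bit value
 * from an address that may be misaligned. memcpy has no alignment
 * requirement on its arguments, so no misaligned halfword load is implied
 * by the source code. */
static uint16_t load_ne16(const void *p)
{
    uint16_t v;

    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

With a buffer laid out like the union in the check above (byte 0 set to 1, byte 3 set to 2, the rest zero), load_ne16(buf + 1) reads bytes 1 and 2 and so yields 0 on either byte order.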
Comment 1 Mark Millard 2016-02-11 02:33:23 UTC
(In reply to Mark Millard from comment #0)

Yea, right, "powermaster".

"portmaster" would be more like it.
Comment 2 Mark Millard 2016-02-11 02:38:07 UTC
(In reply to Mark Millard from comment #0)

As for what vintage of the port was involved:

# svnlite info archivers/lzo2
Path: archivers/lzo2
Working Copy Root Path: /usr/ports
URL: https://svn0.us-west.freebsd.org/ports/head/archivers/lzo2
Relative URL: ^/head/archivers/lzo2
Repository Root: https://svn0.us-west.freebsd.org/ports
Repository UUID: 35697150-7ecd-e111-bb59-0022644237b5
Revision: 408464
Node Kind: directory
Schedule: normal
Last Changed Author: bapt
Last Changed Rev: 380298
Last Changed Date: 2015-03-02 23:01:26 +0000 (Mon, 02 Mar 2015)
Comment 3 Matthias Andree freebsd_committer 2016-03-01 00:16:42 UTC
1a. How would that be a port bug, rather than a compiler bug? 
1b. Do we need to change compiler flags for ARM, or certain types of ARM?
2. Out of curiosity, does the problem occur on formal FreeBSD releases such as 10.2?
Comment 4 Mark Millard 2016-03-01 01:17:52 UTC
(In reply to Matthias Andree from comment #3)

[Not in order.]

> 2. Out of curiosity, does the problem occur on formal FreeBSD releases such as 10.2?

10.x does not support/have /usr/src/sys/arm/conf/RPI2. And before clang++ 3.8.0, clang++ itself had alignment problems that caused the compiler to get Signal 10 in a -march=armv7-a and/or -mcpu=cortex-a7 context, which kept it from running in the context in question. I had to switch to the clang 3.8.0 context.

[So in some ways the submittal is sort of an early warning for when 11.0 becomes official.]

For -march=armv7-a and/or -mcpu=cortex-a7 FreeBSD configures things to require strict alignment: no misaligned accesses allowed. /usr/src/sys/arm/conf/RPI2 configures the kernel for armv7-a. C/C++ code and compiler options must be set up to produce only strict alignment if the produced code is to be used under a /usr/src/sys/arm/conf/RPI2 FreeBSD build.

This is not necessarily true of all other arm contexts, such as armv6 without the armv7 targeting: FreeBSD may not configure for strict alignment requirements. (The SCTLR bit[1] == 1 setting used for armv7 to cause strict alignment need not even be defined there.)

[Note: As I understand it, the kernel would require a lot of work to support SCTLR bit[1] == 0 on armv7. Linux chose to do that extra work: if I understand right, Linux does not require strict alignment for armv7.]
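
As a small illustration of what "SCTLR bit[1]" refers to: in the ARMv7-A System Control Register, bit 1 is the A (alignment check enable) bit. The sketch below only tests that bit in a register value passed in as a plain integer; reading the real register takes a privileged instruction and is not shown. The names SCTLR_A_BIT and sctlr_requires_strict_alignment are hypothetical, chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* SCTLR.A, the alignment check enable bit, is bit 1 of the register. */
#define SCTLR_A_BIT (UINT32_C(1) << 1)

/* Hypothetical helper: returns 1 when the given SCTLR value has alignment
 * checking enabled (strict alignment), 0 otherwise. */
static int sctlr_requires_strict_alignment(uint32_t sctlr)
{
    return (sctlr & SCTLR_A_BIT) != 0;
}
```

A value with bit 1 set (for example 0x2) reports strict alignment; a value with only bit 0 set (0x1) does not.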

So testing the wrong type of context is not appropriate: a certain amount of context is required for the test results to apply.

> 1a. How would that be a port bug, rather than a compiler bug?

Even clang++ 3.7.1 (written in C++) itself had to be fixed (in 3.8.0) to avoid generating misaligned accesses during compiler operation. This was done in order to allow the clang++ to be used in armv7 SCTLR bit[1] == 1 (strict alignment) contexts and on (some?) sparc's that also require strict alignment for some operating systems.

The C/C++ source code and the compiler options have to be used together to avoid misalignment. It is not an automatic compiler result for arbitrary C/C++ code.

While options like -mno-unaligned-access will make the compiled code avoid adding new misalignments as "optimizations" when the original code does not misalign, they will not repair code that directly generates misalignments. (The alignment fixes to clang++ were all source code fixes, not compiler option changes.)
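
To make the source-code side of this concrete, here is a minimal sketch of a runtime alignment check. The helper name is_aligned is hypothetical (not from the port or from clang), and it assumes the usual flat address model in which converting a pointer to uintptr_t preserves the low address bits. Code that forms a pointer failing such a check and then dereferences it as a wider type has requested a misaligned access in the source itself, and a compiler option is not expected to undo that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when p is a multiple of `align`,
 * where `align` must be a nonzero power of two (e.g. 2 for a 16-bit
 * access, 4 for a 32-bit access). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}
```

Applied to the address reported in r0 in the description (0xbfbfe581), a 2-byte alignment check fails, which matches the Signal 10 on the halfword load.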

> 1b. Do we need to change compiler flags for ARM, or certain types of ARM?

A kernel built using /usr/src/sys/arm/conf/RPI2 configures for armv7-a and strict alignment. I also happen to be experimenting with buildworld targeting armv7-a explicitly and so have extra compiler options for that. I also explicitly used -mno-unaligned-access. (So that option was not enough to fix things but should be used.)

Using /usr/src/sys/arm/conf/RPI-B would not configure the kernel for armv7-a (or for strict alignment). The older RPI's are not armv7 based. This older context would not show the problem.

If you want things to always work with fewer variations in configuration you can target strict alignment all the time.

-mno-unaligned-access would be involved for correct compilation targeting strict alignment but is not sufficient by itself.
Comment 5 Matthias Andree freebsd_committer 2016-05-23 22:18:13 UTC
Mark, how do we go forward with this?  

You wrote a lot about the kernel, but we're talking about the lzo2 port, no?

So, is there anything I can check from archivers/lzo2/Makefile to get it to compile a working executable on a Raspberry Pi 2?  Such as: Change CFLAGS, force a certain compiler for ARM, anything?  

I don't have such hardware nor FreeBSD 11, I'm stuck here - waiting for concrete proposals as to how to fix this.

Is there anything that has been, or needs to be, forwarded to the upstream maintainer, Markus Oberhumer?
Comment 6 Mark Millard 2016-05-23 23:35:38 UTC
(In reply to Matthias Andree from comment #5)

The kernel controls that strict alignment is required in user space programs. That is why I referenced its details for such. The port's failure was an alignment failure. A compiler option is involved in avoiding misalignment but is insufficient to force everything into alignment: the port's own source code is also involved in guaranteeing alignment.

11.0-CURRENT just changed armv6 (and later/related 32 bit arm variants) over from softfloat to hardfloat and it may be a while before I get to such a modernized rpi2 build. As I understand it, 11.0-CURRENT starts its code freeze June 10, working towards a release projected for Sept. 2.

I'll re-run the attempted port build once I have a modernized rpi2 build. That will confirm or deny the current status is the same at that time without jumping over a significant change to how parts of the armv6 code generation works (ABI change).

But I'm not sure just when I'll get that far given other things going on. Once I have, I'll post further comments based on the results.
Comment 7 Mark Millard 2016-05-24 22:05:50 UTC
(In reply to Matthias Andree from comment #5)

https://lists.freebsd.org/pipermail/freebsd-arm/2016-May/013925.html has reported that the FreeBSD requirement for strict alignment may be going away for armv6/armv7.

If so, then after testing it may be that this defect should be marked as no longer applying.
Comment 8 Mark Millard 2016-05-25 16:52:29 UTC
(In reply to Matthias Andree from comment #5)

The various gcc and llvm ports have not yet been adjusted to deal with 11.0-CURRENT armv6 now being implicitly hard float instead of soft float. (No more armv6hf.)

So armv6 builds of ports that depend on these other ports will wait for the messy transition to be cleaned up before they can be done.



and what it in turn points to.

And there is also the pending kernel change to allow misaligned memory accesses in more places as well. (Some instructions never allow misaligned accesses, but hopefully the compilers will avoid them when not forced to avoid adding any additional misaligned accesses.)

With my own personal delays after these happen it may be a while before I do any lzo2 testing on a modernized 11.0-CURRENT armv6 context.
Comment 9 Matthias Andree freebsd_committer 2016-05-26 16:38:14 UTC
ISTR SPARC architectures also barf on unaligned access, so is it worth bothering the upstream author?
Comment 10 Ian Lepore freebsd_committer 2016-05-27 18:52:01 UTC
I just had a look.  I think the cause of this is probably found on line 2544 of lzodefs.h:

#  elif 1 && defined(__TARGET_ARCH_ARM) && ((__TARGET_ARCH_ARM)+0 >= 7)

A similar check on line 2551 assumes armv6-a and -r profiles also support unaligned.

Our freebsd clang would normally not define __ARM_FEATURE_UNALIGNED (checked on line 2528 of lzodefs.h) unless someone had specifically added the -munaligned-access option; in the PR we see it specifically has -mno-unaligned-access.  But it also has -march=armv7 (our default is v6 due to the rpi and the ongoing stupidity that we pretend v6 and v7 are "the same enough" to not need separate names).

So with __ARM_FEATURE_UNALIGNED not defined and arch = armv7, the check on line 2544 makes the assumption (incorrect until a few days ago) that if the arch is v7, we must have support for unaligned access.  I think that assumption is right for every major OS, but there could be special embedded environments where it's incorrect.  (In fact, a highly specialized embedded system is pretty much the ONLY place you'd expect someone to legitimately disable unaligned accesses, now that freebsd gets it right).

I think the right thing to do is: if __ARM_ARCH is defined, that means the current compiler properly supports the ACLE feature symbols[1] and thus only __ARM_FEATURE_UNALIGNED should be consulted.  If __ARM_ARCH is not defined, then __ARM_FEATURE_UNALIGNED can't be used, and a fallback to guessing based on arch might be valid.

[1] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
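
The selection logic proposed above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual lzodefs.h code and not a tested patch: the macro EXAMPLE_UNALIGNED_OK and the function example_unaligned_ok are hypothetical names, and only __ARM_ARCH, __ARM_FEATURE_UNALIGNED and __TARGET_ARCH_ARM come from the discussion.

```c
#include <assert.h>

#if defined(__ARM_ARCH)
   /* The compiler provides ACLE macros: consult only the feature macro. */
#  if defined(__ARM_FEATURE_UNALIGNED)
#    define EXAMPLE_UNALIGNED_OK 1
#  else
#    define EXAMPLE_UNALIGNED_OK 0
#  endif
#elif defined(__TARGET_ARCH_ARM) && ((__TARGET_ARCH_ARM) + 0 >= 7)
   /* No ACLE macros: fall back to guessing from the architecture level. */
#  define EXAMPLE_UNALIGNED_OK 1
#else
   /* Unknown target: take the conservative choice in this sketch. */
#  define EXAMPLE_UNALIGNED_OK 0
#endif

/* Hypothetical accessor so the decision can be inspected at run time. */
static int example_unaligned_ok(void)
{
    /* Sanity check that one of the branches above defined a 0/1 flag. */
    assert(EXAMPLE_UNALIGNED_OK == 0 || EXAMPLE_UNALIGNED_OK == 1);
    return EXAMPLE_UNALIGNED_OK;
}
```

With this ordering, a compiler that defines __ARM_ARCH but is run with -mno-unaligned-access (so that __ARM_FEATURE_UNALIGNED is left undefined) would select the strict-alignment path even when the architecture level is 7.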
Comment 11 Mark Millard 2016-08-02 10:47:11 UTC
As of 11.0-BETA3 (without libsoft or softfp) and -r419343 of /usr/ports the Signal 10 no longer happens for armv6 and the build/install runs to completion, passing all tests.

FYI for the new test context that I used:

   LZO configuration summary
   LZO version                : 2.09
   configured for host        : armv6-portbld-freebsd11.0
   source code location       : .
   compiler                   : cc
   preprocessor definitions   : -DLZO_HAVE_CONFIG_H=1
   preprocessor flags         : -mcpu=cortex-a7
   compiler flags             : -pipe -mcpu=cortex-a7  -g -fno-strict-aliasing
   build static library       : yes
   build shared library       : yes
   enable i386 assembly code  : no

It would take some other FreeBSD/processor architecture combination that still requires a more strict alignment to potentially show the problem.

I do not have such a processor architecture directly available. And so far I only run FreeBSD on slow contexts (for example an rpi2) or under VirtualBox used under a different OS on the faster hardware. I've not experimented with QEMU yet.

So I'm still not ready to test this in the kind of context required, such as QEMU running sparc64 code.
Comment 12 Matthias Andree freebsd_committer 2016-10-07 20:57:22 UTC
I'm closing this report as "overcome by events".
Someone finding a bug on a different FreeBSD version is free to reopen it as long as they are running a fully patched and supported FreeBSD release (which also means that the given release must formally support the given architecture). 

When reopening, please specify the affected architecture and OS version and propose a solution, best in the form of a patch.