Bug 258156 - databases/mysql80-server: 8.0.27 routerfuzz_router_uri crashes (SIGILL) on Penryn CPU (missing popcnt)
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Jochen Neumeister
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2021-08-29 23:05 UTC by Eric Rucker
Modified: 2022-01-03 17:54 UTC

See Also:
bugzilla: maintainer-feedback? (joneum)
koobs: maintainer-feedback+


Attachments
mysql80-server 8.0.26 build output (579.62 KB, text/plain)
2021-08-29 23:05 UTC, Eric Rucker
Requested information from affected system (8.88 KB, text/plain)
2021-12-26 23:55 UTC, Eric Rucker
Avoid hardcoded popcnt if not available for the target CPU (626 bytes, patch)
2021-12-27 16:38 UTC, Dimitry Andric

Description Eric Rucker 2021-08-29 23:05:56 UTC
Created attachment 227543 [details]
mysql80-server 8.0.26 build output

I'm finding that on my FreeBSD 12.2-RELEASE-p7 amd64 system with dual Xeon L5420s (Harpertown, which is Penryn microarchitecture), I'm unable to build mysql80-server, as a SIGILL is thrown running routerfuzz_router_uri. I've tried MAKE_JOBS_UNSAFE=yes with no changes, as well as adding CPUTYPE?=penryn to /etc/make.conf.

I've attached the output from make trying to build the port.

After rebuilding the routerfuzz_router_uri binary that's deleted, and running gdb on the core dump, I receive the following:

root@uncannyvalley:/usr/ports/databases/mysql80-server/work/.build/router/tests/fuzzers # gdb core routerfuzz_router_u.core
GNU gdb (GDB) 10.2 [GDB v10.2 for FreeBSD]
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd12.2".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
core: No such file or directory.
[New LWP 101492]
Core was generated by `./routerfuzz_router_uri -merge=1 -verbosity=0 -merge_control_file=/usr/ports/dat'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00000000002554f0 in ?? ()
(gdb) file routerfuzz_router_uri
warning: core file may not match specified executable file.
Reading symbols from routerfuzz_router_uri...
(gdb) disassemble
Dump of assembler code for function __sanitizer_cov_trace_const_cmp1(uint8_t, uint8_t):
   0x00000000002554e0 <+0>:     push   %rbp
   0x00000000002554e1 <+1>:     mov    %rsp,%rbp
   0x00000000002554e4 <+4>:     mov    0x8(%rbp),%rax
   0x00000000002554e8 <+8>:     mov    %esi,%ecx
   0x00000000002554ea <+10>:    xor    %dil,%cl
   0x00000000002554ed <+13>:    movzbl %cl,%ecx
=> 0x00000000002554f0 <+16>:    popcnt %rcx,%rcx
   0x00000000002554f5 <+21>:    cmp    %sil,%dil
   0x00000000002554f8 <+24>:    jne    0x2554ff <__sanitizer_cov_trace_const_cmp1(uint8_t, uint8_t)+31>
   0x00000000002554fa <+26>:    xor    %r9d,%r9d
   0x00000000002554fd <+29>:    jmp    0x255516 <__sanitizer_cov_trace_const_cmp1(uint8_t, uint8_t)+54>
   0x00000000002554ff <+31>:    movzbl %dil,%edx
   0x0000000000255503 <+35>:    movzbl %sil,%esi
   0x0000000000255507 <+39>:    sub    %rsi,%rdx
   0x000000000025550a <+42>:    bsr    %rdx,%r9
   0x000000000025550e <+46>:    xor    $0x3f,%r9
   0x0000000000255512 <+50>:    add    $0x1,%r9
   0x0000000000255516 <+54>:    mov    %rax,%rsi
   0x0000000000255519 <+57>:    shl    $0x7,%rsi
   0x000000000025551d <+61>:    add    %eax,%eax
   0x000000000025551f <+63>:    and    $0x3fe,%eax
   0x0000000000255524 <+68>:    lea    0x4dad5(%rip),%r8        # 0x2a3000 <_ZN6fuzzer3TPCE>
   0x000000000025552b <+75>:    mov    $0x1,%edi
   0x0000000000255530 <+80>:    mov    $0x1,%edx
   0x0000000000255535 <+85>:    shl    %cl,%rdx
   0x0000000000255538 <+88>:    or     %rdx,0x31800(%r8,%rax,8)
   0x0000000000255540 <+96>:    lea    (%r9,%rsi,1),%rcx
   0x0000000000255544 <+100>:   add    $0x40,%rcx
   0x0000000000255548 <+104>:   mov    %rcx,%rax
   0x000000000025554b <+107>:   shr    $0x3,%rax
   0x000000000025554f <+111>:   shl    %cl,%rdi
   0x0000000000255552 <+114>:   and    $0x1ff8,%eax
   0x0000000000255557 <+119>:   or     %rdi,0x31800(%rax,%r8,1)
   0x000000000025555f <+127>:   pop    %rbp
   0x0000000000255560 <+128>:   ret
End of assembler dump.
(gdb)

popcnt is a Nehalem instruction, which obviously my Penryn CPU wouldn't have. Looks like __sanitizer_cov_trace_const_cmp1 is a clang function - this may actually be a clang bug (at least as FreeBSD supplies it), but we'll go with mysql80-server as being the problem right now because that's the only thing I can't build.
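
One way to confirm whether a given CPU advertises popcnt is to query CPUID directly. The following is only a sketch: the function name cpu_has_popcnt is made up for this example, and it assumes a GCC or clang toolchain that ships <cpuid.h>. It reads CPUID leaf 1 and tests ECX bit 23, which is the POPCNT feature flag.

```cpp
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Hypothetical helper, not part of any FreeBSD or LLVM API.
// Returns 1 if CPUID reports POPCNT, 0 if it does not,
// and -1 if the CPU is not x86 or leaf 1 cannot be queried.
int cpu_has_popcnt() {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    // CPUID.01H:ECX bit 23 is the POPCNT feature flag.
    return static_cast<int>((ecx >> 23) & 1u);
#else
    return -1;
#endif
}
```

On a Penryn-class CPU such as the one described above, this would be expected to return 0, while Nehalem and later would be expected to return 1.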

And my current /etc/make.conf:

WITH_PKGNG=     yes
DEFAULT_VERSIONS+=ssl=openssl
MAKE_JOBS_NUMBER=9
#MAKE_JOBS_UNSAFE=yes
OPTIONS_UNSET=  GSSAPI_BASE
OPTIONS_SET=    GSSAPI_MIT
CPUTYPE?=       penryn

And, for completeness, compiler info:

root@uncannyvalley:/usr/ports/databases/mysql80-server/work/.build/router/tests/fuzzers # c++ --version
FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
Target: x86_64-unknown-freebsd12.2
Thread model: posix
InstalledDir: /usr/bin
root@uncannyvalley:/usr/ports/databases/mysql80-server/work/.build/router/tests/fuzzers # which c++
/usr/bin/c++
Comment 1 Andrey Bushev 2021-09-02 09:27:06 UTC
Hi
I have the same problem on my test machine with an Intel(R) Celeron(R) CPU 900 @ 2.20GHz (Penryn).
I tried CPUTYPE?=penryn, CPUTYPE?=core2 and CPUTYPE?=native, with and without MAKE_JOBS_UNSAFE=yes in /etc/make.conf, but nothing changed. I still get the same error when building mysql80-server-8.0.26.

FreeBSD 12.2-RELEASE-p7 GENERIC  amd64

root@test:/home/test # c++ --version
FreeBSD clang version 10.0.1 (git@github.com:llvm/llvm-project.git llvmorg-10.0.1-0-gef32c611aa2)
Target: x86_64-unknown-freebsd12.2
Thread model: posix
InstalledDir: /usr/bin



Preparing corpus for routerfuzz_router_uri
cd /usr/ports/databases/mysql80-server/work/.build/router/tests/fuzzers && ./routerfuzz_router_uri -merge=1 -verbosity=0 -merge_control_file="/usr/ports/databases/mysql80-server/work/.build/router/tests/fuzzers/routerfuzz_router_uri.control" /usr/ports/databases/mysql80-server/work/mysql-8.0.26/router/tests/fuzzers/corpus/fuzz_router_uri /usr/ports/databases/mysql80-server/work/.build/router/tests/fuzzers/corpus/routerfuzz_router_uri 2> /dev/null
Illegal instruction (core dumped)
*** [router/tests/fuzzers/routerfuzz_router_uri] Error code 132
make[4]: *** router/tests/fuzzers/routerfuzz_router_uri removed

make[4]: stopped in /usr/ports/databases/mysql80-server/work/.build
1 error

make[4]: stopped in /usr/ports/databases/mysql80-server/work/.build
*** [router/tests/fuzzers/CMakeFiles/routerfuzz_router_uri.dir/all] Error code 2

make[3]: stopped in /usr/ports/databases/mysql80-server/work/.build
1 error

make[3]: stopped in /usr/ports/databases/mysql80-server/work/.build
*** [all] Error code 2

make[2]: stopped in /usr/ports/databases/mysql80-server/work/.build
1 error

make[2]: stopped in /usr/ports/databases/mysql80-server/work/.build
===> Compilation failed unexpectedly.
Try to set MAKE_JOBS_UNSAFE=yes and rebuild before reporting the failure to
the maintainer.
*** Error code 1

Stop.
make[1]: stopped in /usr/ports/databases/mysql80-server
*** Error code 1

Stop.
make: stopped in /usr/ports/databases/mysql80-server

===>>> make build failed for databases/mysql80-server
===>>> Aborting update


===>>> You can restart from the point of failure with this command line:
       portmaster <flags> databases/mysql80-server

This command has been saved to ~/portmasterfail.txt
Comment 2 Jochen Neumeister freebsd_committer freebsd_triage 2021-10-17 10:40:25 UTC
I cannot reproduce the problem.

I tested it on 12.2 and 13.0. On both i386 and amd64, the build ran through without problems.
Comment 3 Eric Rucker 2021-10-17 11:02:20 UTC
(In reply to Jochen Neumeister from comment #2)
Just tested again on 12.2, still failing in the same place.

What processor are you testing this on? Anything Core i series or newer, or AMD Phenom or newer, won't exhibit this problem, because they have the popcnt instruction.
Comment 4 Eric Rucker 2021-10-29 10:04:51 UTC
Now that 8.0.26_1 is out, I have confirmed that the issue still occurs with it.
Comment 5 Andrey Bushev 2021-10-30 13:47:47 UTC
Hi,
With 8.0.26_1 I still get the same error when building with 'make DISABLE_VULNERABILITIES=yes'.
Comment 6 Sascha Klauder 2021-12-13 14:50:24 UTC
I can confirm occurrence of this problem on 13.0-RELEASE-p4 amd64, Xeon E5420 (Harpertown).
Comment 7 Kubilay Kocak freebsd_committer freebsd_triage 2021-12-26 23:19:14 UTC
For those that are affected or able to reproduce this issue:

- Is the issue reproducible with an empty /etc/make.conf?
- Please include (as a single attachment):

   - uname -a output
   - /var/run/dmesg.boot output
   - /etc/src.conf contents if not empty
   - kernel configuration if not GENERIC

^Triage: Request feedback from dim @ toolchain
Comment 8 Kubilay Kocak freebsd_committer freebsd_triage 2021-12-26 23:23:47 UTC
(In reply to Kubilay Kocak from comment #7)

Additionally in your attachment:

  - /etc/make.conf contents if not empty when reproducible
Comment 9 Eric Rucker 2021-12-26 23:55:51 UTC
Created attachment 230433 [details]
Requested information from affected system

(In reply to Kubilay Kocak from comment #8)

Issue is reproducible with empty /etc/make.conf (I normally run one, though, and received some warnings from this build about OpenSSL as a result).

/etc/src.conf does not exist on this system, and kernel is GENERIC, so it's just uname -a and /var/run/dmesg.boot in this file.
Comment 10 Viktor Štujber 2021-12-27 00:58:58 UTC
Hi, I looked into this a bit and asked around. In my case at least, I concluded that the machine code in /usr/lib/clang/11.0.1/lib/freebsd/libclang_rt.fuzzer-x86_64.a contains multiple 'popcnt' instructions. The related llvm source code is pure C++ (no inline asm) and involves simple arithmetic, so this is most likely a micro-optimization meant for applicable CPUs. This implies that buildworld mis-compiled contrib/llvm-project/compiler-rt/lib/fuzzer, and who knows what else.
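
To illustrate the kind of "simple arithmetic" meant here: a bit count can be written in plain C++ with no inline asm, so whether a popcnt instruction appears in the output depends on what the compiler is told about the target. This is a hypothetical sketch only; the name popcount_portable is invented for the example and is not taken from the llvm sources.

```cpp
#include <cstdint>

// Hypothetical example, not code from compiler-rt.
// Counts set bits by repeatedly clearing the lowest set bit.
// It uses only ordinary integer operations, so it is valid on any CPU;
// a compiler may still choose to lower it to popcnt when that
// feature is enabled for the target.
unsigned popcount_portable(std::uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}
```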

My CPU is an old Intel Atom. My target is x86_64-unknown-freebsd13.0. Kernel and world are 13-release from Aug 20. Invoking "clang -### -x c -march=native - -E </dev/null" gives an argument list that includes "-popcnt", so this instruction is supposed to be avoided. However, for some reason, it wasn't avoided for this fuzzer library. The rest of the OS runs fine, so the issue might only exist in specific clang support libraries. I am not sure how to investigate further. I considered escalating this to the maintainer of contrib/llvm-project (dim@). CURRENT is already at llvm-13.0, so this might have been silently fixed already.

For the moment, it is possible to complete the build of mysql80-server by doing 'make -i' on the port directory, to resume the aborted source build and skip over all build errors.
Comment 11 Dimitry Andric freebsd_committer freebsd_triage 2021-12-27 16:38:41 UTC
Created attachment 230461 [details]
Avoid hardcoded popcnt if not available for the target CPU

Please try this patch. Either build and install world, or rebuild and reinstall the lib/libclang_rt/fuzzer and lib/libclang_rt/fuzzer_no_main directories. Then attempt to build the mysql80 port again.
Comment 12 Viktor Štujber 2021-12-27 20:36:10 UTC
Thank you very much, ATTRIBUTE_TARGET_POPCNT is indeed the root cause, and I should have noticed it when first looking at the llvm sources - it even has popcnt in the name. At that time I did not know what I was looking at, since the macro is obscuring the true meaning.
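
For readers unfamiliar with target attributes, here is a rough sketch of the mechanism. It is not the contents of FuzzerPlatform.h or of the attached patch; the macro SKETCH_TARGET_POPCNT and the function count_bits are hypothetical. It assumes GCC or clang, which define __POPCNT__ when popcnt is enabled for the target CPU. A function marked with target("popcnt") may use the instruction regardless of -march, so the sketch only applies the attribute when the compiler already knows the target supports it.

```cpp
#include <cstdint>

// Hypothetical macro, for illustration only.
// Request the popcnt target feature only if it is already enabled
// for the target CPU (GCC and clang define __POPCNT__ in that case).
#if defined(__x86_64__) && defined(__POPCNT__)
#  define SKETCH_TARGET_POPCNT __attribute__((target("popcnt")))
#else
#  define SKETCH_TARGET_POPCNT
#endif

// With the attribute, the builtin may compile to a single popcnt.
// Without it, the compiler falls back to a generic bit-counting sequence.
SKETCH_TARGET_POPCNT
unsigned count_bits(std::uint64_t x) {
    return static_cast<unsigned>(__builtin_popcountll(x));
}
```

Applying the attribute unconditionally on x86_64 would let the compiler emit popcnt even when building for a CPU that lacks it, which matches the SIGILL seen in this report.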

I have discussed this in #llvm and wrote up an issue report at https://github.com/llvm/llvm-project/issues/52893 with all the supporting info. Your patch probably works, but depending on what exactly it does, it might be simpler to just remove the whole thing. Let's see what the llvm devs have to say about it.

This bad code has been in there since at least 2016. Apparently no FreeBSD port used -fsanitize=fuzzer (or something like it) until MySQL resolved an issue that had prevented a bunch of their tests from running. That fix was included in release 8.0.26, and it finally revealed this long-dormant LLVM bug.
Comment 13 commit-hook freebsd_committer freebsd_triage 2021-12-30 09:58:38 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=133180557479cd9676758e6f3f93a9d3e1c6b532

commit 133180557479cd9676758e6f3f93a9d3e1c6b532
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2021-12-30 09:53:25 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2021-12-30 09:55:49 +0000

    Avoid emitting popcnt in libclang_rt.fuzzer*.a if unsupported

    Since popcnt is only supported by CPUTYPE=nehalem and later, ensure that
    this instruction is only emitted when appropriate. Otherwise, programs
    using the library can abort with SIGILL.

    See also: https://github.com/llvm/llvm-project/issues/52893

    PR:             258156
    Reported by:    Eric Rucker <bhtooefr@bhtooefr.org>
    MFC after:      3 days

 contrib/llvm-project/compiler-rt/lib/fuzzer/FuzzerPlatform.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 14 Kubilay Kocak freebsd_committer freebsd_triage 2022-01-01 02:10:58 UTC
Is there anything that can be done in ports to work around this until all supported FreeBSD versions have this fix/change?
Comment 15 commit-hook freebsd_committer freebsd_triage 2022-01-02 12:02:46 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=802ff7fcee24cb224ea430ac45bece8f8743791f

commit 802ff7fcee24cb224ea430ac45bece8f8743791f
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2021-12-30 09:53:25 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2022-01-02 12:02:19 +0000

    Avoid emitting popcnt in libclang_rt.fuzzer*.a if unsupported

    Since popcnt is only supported by CPUTYPE=nehalem and later, ensure that
    this instruction is only emitted when appropriate. Otherwise, programs
    using the library can abort with SIGILL.

    See also: https://github.com/llvm/llvm-project/issues/52893

    PR:             258156
    Reported by:    Eric Rucker <bhtooefr@bhtooefr.org>
    MFC after:      3 days

    (cherry picked from commit 133180557479cd9676758e6f3f93a9d3e1c6b532)

 contrib/llvm-project/compiler-rt/lib/fuzzer/FuzzerPlatform.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 16 commit-hook freebsd_committer freebsd_triage 2022-01-02 12:03:49 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=50b0f99010bc485e7f60e948ff3bc590fbbeea43

commit 50b0f99010bc485e7f60e948ff3bc590fbbeea43
Author:     Dimitry Andric <dim@FreeBSD.org>
AuthorDate: 2021-12-30 09:53:25 +0000
Commit:     Dimitry Andric <dim@FreeBSD.org>
CommitDate: 2022-01-02 12:02:14 +0000

    Avoid emitting popcnt in libclang_rt.fuzzer*.a if unsupported

    Since popcnt is only supported by CPUTYPE=nehalem and later, ensure that
    this instruction is only emitted when appropriate. Otherwise, programs
    using the library can abort with SIGILL.

    See also: https://github.com/llvm/llvm-project/issues/52893

    PR:             258156
    Reported by:    Eric Rucker <bhtooefr@bhtooefr.org>
    MFC after:      3 days

    (cherry picked from commit 133180557479cd9676758e6f3f93a9d3e1c6b532)

 contrib/llvm-project/compiler-rt/lib/fuzzer/FuzzerPlatform.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 17 Dimitry Andric freebsd_committer freebsd_triage 2022-01-02 12:05:43 UTC
(In reply to Kubilay Kocak from comment #14)
Disable the fuzzer-based components, maybe? Unfortunately the hardcoded popcnt instructions ended up in a system library (libclang_rt.fuzzer-*.a), so the only way to avoid them is to stop linking against this library.
Comment 18 Sascha Klauder 2022-01-03 10:26:41 UTC
(In reply to Kubilay Kocak from comment #14)
To unbreak the build, you can disable MySQL unit tests by adding -DWITH_UNIT_TESTS=OFF to CMAKE_ARGS in the port's Makefile.
Comment 19 Viktor Štujber 2022-01-03 17:54:06 UTC
(In reply to Sascha Klauder from comment #18)
If this toggle works and disables tests for both individual subcomponents as well as the huge mysql-test/ directory, it would be worthwhile to add a TEST port option, for speeding up the build, as well as working around this temporary issue.