Bug 246537

Summary: lib.libexecinfo.backtrace_test.backtrace_fmt_basic starts failing on amd64 after r360915
Product: Base System Reporter: Li-Wen Hsu <lwhsu>
Component: testsAssignee: freebsd-testing (Nobody) <testing>
Status: Open ---    
Severity: Affects Only Me CC: cem, dim, emaste, kan, kib
Priority: --- Keywords: regression
Version: CURRENT   
Hardware: amd64   
OS: Any   

Description Li-Wen Hsu freebsd_committer 2020-05-18 12:13:18 UTC
This test case starts failing since this build https://ci.freebsd.org/job/FreeBSD-head-amd64-test/15169/
Still need to check which commit caused this.

Error Message

/usr/src/contrib/netbsd-tests/lib/libexecinfo/t_backtrace.c:93: nptrs >= ncalls + 2 + min_frames not met

Standard Output

got nptrs=17 ncalls=12 (min_frames: 4, max_frames: 9)
backtrace is:
#0: myfunc3
#1: myfunc2
#2: myfunc1
#3: myfunc1
#4: myfunc1
#5: myfunc1
#6: myfunc1
#7: myfunc1
#8: myfunc1
#9: myfunc1
#10: myfunc1
#11: myfunc1
#12: myfunc1
#13: myfunc1
#14: myfunc
#15: atfu_backtrace_fmt_basic_body
#16: atf_tc_run
Comment 1 commit-hook freebsd_committer 2020-05-18 12:37:21 UTC
A commit references this bug:

Author: lwhsu
Date: Mon May 18 12:36:29 UTC 2020
New revision: 361210
URL: https://svnweb.freebsd.org/changeset/base/361210

  Temporarily disable failing case in CI of amd64:

  - lib.libexecinfo.backtrace_test.backtrace_fmt_basic

  PR:		246537
  Sponsored by:	The FreeBSD Foundation

Comment 2 Li-Wen Hsu freebsd_committer 2020-05-18 15:09:21 UTC
After more testing, it starts failing on amd64 after https://svnweb.freebsd.org/changeset/base/360915
i386 is not affected.
Comment 3 Conrad Meyer freebsd_committer 2020-05-18 15:51:32 UTC
I suspect r360915 is just wrong and should be reverted.  We're missing the frame below /above atf_tc_run (probably "_start" and "main").
Comment 4 Conrad Meyer freebsd_committer 2020-05-18 15:56:58 UTC
$ nm crt1.o:
0000000000000000 T _start

If we had one more frame (_start or main), the test would be:

18 >= 12 + 2 + 4 (pass).
Comment 5 Conrad Meyer freebsd_committer 2020-05-18 15:58:35 UTC
(Our backtrace / libexecinfo (llvm-libunwind) relies on asynchronous unwind tables and does not know about frame pointers.  Sure, maybe it should!  That would be nice.)
Comment 6 Dimitry Andric freebsd_committer 2020-05-18 16:02:50 UTC
(In reply to Conrad Meyer from comment #3)
This was done to fix bug 246322. As explained there, it is to work around a BFD ld assertion or bug. I'm fine with reverting, as long as we either fix BFD ld, or ban the use of it. :)
Comment 7 Dimitry Andric freebsd_committer 2020-05-18 16:04:17 UTC
(In reply to Dimitry Andric from comment #6)
> work around a BFD ld assertion or bug.

For more explanation see base r209294. It still hits the assertion with binutils master as of last week.
Comment 8 Conrad Meyer freebsd_committer 2020-05-18 16:05:14 UTC
We switched to both ld.lld and llvm-libunwind in CURRENT, so maybe leave it be in stable/ and just revert in CURRENT?
Comment 9 Conrad Meyer freebsd_committer 2020-05-18 16:07:52 UTC
> places the special CIE into the .eh_frame indicating the end of section,
> that is located before generated unwind table. New ld has assertion that
> verifies that closing CIE is indeed the last CIE,

Would it be possible to (?)fix LLVM to not mark the CIE as end of section?  I guess I'm unclear on if ld.bfd's assertion is reasonable or what.  Why doesn't this happen for other objects built with async unwind tables?
Comment 10 Dimitry Andric freebsd_committer 2020-05-18 16:12:22 UTC
(In reply to Conrad Meyer from comment #9)
Maybe, but I'm not sure that is the right way to go. In base r209294 the problem in BFD ld was first spotted, and at that time it was likely triggered by newer versions of gcc. But kan@ may know a bit more about this.

E.g. with the information I have now, I think this is a BFD ld bug/misfeature, and it should be fixed there.
Comment 11 Conrad Meyer freebsd_committer 2020-05-20 14:38:17 UTC
Any idea why this trips for crt1.o but not any other .o?
Comment 12 Dimitry Andric freebsd_committer 2020-05-20 19:54:00 UTC
(In reply to Conrad Meyer from comment #11)
You mean the ld.bfd assertion? Maybe it is because crt1.o is linked using -r from crt1_c.o and crt1_s.o?
Comment 13 Conrad Meyer freebsd_committer 2020-05-20 20:04:01 UTC
(In reply to Dimitry Andric from comment #12)
Yep, the ld.bfd assertion.
Comment 14 Konstantin Belousov freebsd_committer 2020-05-20 20:15:52 UTC
Trips exactly what ?
crt1.o is specially added to the linker command line to create the final binary.  It mist go first among all object files supplied to the linker.
Comment 15 Conrad Meyer freebsd_committer 2020-05-20 20:21:04 UTC
(In reply to Konstantin Belousov from comment #14)
Trips bug 246322:
> /usr/bin/ld: error in /usr/lib/crt1.o(.eh_frame); no .eh_frame_hdr table will be created.
Comment 16 Conrad Meyer freebsd_committer 2020-12-02 00:33:21 UTC
I would like to fix this for 13.0.  I think we can just revert the change that broke this now, some discussion over in another bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246322#c14
Comment 17 Konstantin Belousov freebsd_committer 2020-12-02 12:43:30 UTC
(In reply to Conrad Meyer from comment #16)
So if reverting, does gcc+binutils toolchain work still ?

If yes, then there should be no reason to not revert.  Might be an exp run
for amd64+i386 is due.
Comment 18 Conrad Meyer freebsd_committer 2020-12-02 18:01:07 UTC
Yeah, in the other bug we are unable to reproduce the original issue with current binutils (comments #15 and 16).  And the original reporter has not responded for ~4 months.