Bug 234201 - Regression in LLVM libunwind: Apache Tomcat web application crashes on 12.0 (but not on 11.2)
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 12.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-toolchain mailing list
URL:
Keywords: crash, regression, toolchain
Depends on:
Blocks:
 
Reported: 2018-12-20 09:19 UTC by Marie Helene Kvello-Aune
Modified: 2019-06-10 18:58 UTC
CC List: 9 users

See Also:


Attachments
Jar of demo code to produce the libunwind crash (60.04 KB, application/x-java-archive)
2019-05-20 22:11 UTC, Debby Johnson

Description Marie Helene Kvello-Aune 2018-12-20 09:19:54 UTC
When the port devel/jakarta-commons-daemon is built with LLVM in base on 12.0-RELEASE (default configuration), our Tomcat webapp "coffeehouse" fails with the message "libunwind: getEncodedP /usr/src/contrib/llvm/projects/libunwind/src/AddressSpace.hpp:280 - unknown pointer encoding"

The offending section of code:
(...)
inline LocalAddressSpace::pint_t
LocalAddressSpace::getEncodedP(pint_t &addr, pint_t end, uint8_t encoding,
                               pint_t datarelBase) {
(...)
  switch (encoding & 0x0F) {
(...)
  default:
    _LIBUNWIND_ABORT("unknown pointer encoding");
(...)
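
For readers trying to interpret the `encoding` byte that reaches the `default:` branch, the following is a minimal sketch of how a DWARF exception-handling pointer encoding (the `DW_EH_PE_*` constants) splits into a value format (low nibble), an application mode (bits 4 to 6) and an indirect flag (bit 7). The function names are hypothetical and are not part of libunwind; the sketch only names the fields and makes no claim about which of them any particular libunwind version accepts.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical helpers for illustration; not libunwind code.

// Low nibble: how the value is stored (DW_EH_PE value formats).
constexpr std::string_view valueFormatName(std::uint8_t encoding) {
  switch (encoding & 0x0F) {
  case 0x00: return "absptr";
  case 0x01: return "uleb128";
  case 0x02: return "udata2";
  case 0x03: return "udata4";
  case 0x04: return "udata8";
  case 0x09: return "sleb128";
  case 0x0A: return "sdata2";
  case 0x0B: return "sdata4";
  case 0x0C: return "sdata8";
  default:   return "unknown";
  }
}

// Bits 4 to 6: what the stored value is relative to.
constexpr std::string_view applicationName(std::uint8_t encoding) {
  switch (encoding & 0x70) {
  case 0x00: return "absolute";
  case 0x10: return "pcrel";
  case 0x20: return "textrel";
  case 0x30: return "datarel";
  case 0x40: return "funcrel";
  case 0x50: return "aligned";
  default:   return "unknown";
  }
}

// Bit 7: the decoded value is the address of the real pointer.
constexpr bool isIndirect(std::uint8_t encoding) {
  return encoding != 0xFF && (encoding & 0x80) != 0;
}

// 0xFF (DW_EH_PE_omit) means no value is present at all.
constexpr bool isOmitted(std::uint8_t encoding) { return encoding == 0xFF; }
```

For example, an encoding byte of 0x1B would be reported as "sdata4" with application "pcrel", while a low nibble such as 0x05 has no defined value format and would be reported as "unknown".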

This error does not occur when the port is built with GCC, nor when it's built & run on 11.2 (it works fine with 11.2 world on top of 12.0 kernel).

We've applied a workaround internally which builds it with GCC, but we think the correct approach is to fix the regression in base, so that we won't pull in GCC8 just for this.

The web application can be provided upon request.
Comment 1 Dimitry Andric freebsd_committer 2018-12-20 19:43:57 UTC
Hm, the problem is to figure out what the value of 'encoding' is at that point.  In libunwind trunk I see no change in the getEncodedP() function, so it's definitely not supported by a newer version of libunwind either.

Another possibility is that the unwind information gets mangled somehow (maybe by the linker, or by stripping?) causing libunwind to become confused.

Ed, any ideas?
Comment 2 David Chisnall freebsd_committer 2018-12-21 10:37:10 UTC
When I looked at that code a couple of years back, I seem to recall that not all of the DWARF encodings were supported.  I believe only the ones that LLVM emits are well tested (I also vaguely remember adding a couple that were missing in the CHERI branch).  The good news is that they're all pretty trivial (value plus some base address), so if someone can figure out what the value of `encoding` is in the failing case, I can probably give you a patch to fix it quite easily.
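
To illustrate the "value plus some base address" shape described above, here is a minimal sketch of decoding a few fixed-width encodings. It is not the libunwind implementation and not the patch mentioned in this comment: `decodeEncodedPointer`, `readLE`, `fieldAddr` and the little-endian assumption (matching amd64) are all introduced here for illustration, the indirect bit (0x80) is ignored, and the LEB128 formats and remaining application modes are deliberately left unhandled.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper: assemble nbytes little-endian bytes into an integer.
constexpr std::uint64_t readLE(const unsigned char *p, int nbytes) {
  std::uint64_t v = 0;
  for (int i = 0; i < nbytes; ++i)
    v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
  return v;
}

// Hypothetical sketch, not libunwind's getEncodedP().
// p         : bytes of the encoded field
// fieldAddr : address the field was read from (used for pcrel)
// Returns std::nullopt for anything this sketch does not handle.
constexpr std::optional<std::uint64_t>
decodeEncodedPointer(const unsigned char *p, std::uint8_t encoding,
                     std::uint64_t fieldAddr, std::uint64_t datarelBase) {
  std::uint64_t value = 0;
  switch (encoding & 0x0F) {   // step 1: read the stored value
  case 0x00:                   // absptr (assumed 8 bytes, 64-bit target)
  case 0x04:                   // udata8
  case 0x0C:                   // sdata8 (same bits as udata8)
    value = readLE(p, 8);
    break;
  case 0x03:                   // udata4
    value = readLE(p, 4);
    break;
  case 0x0B:                   // sdata4: sign-extend from 32 bits
    value = readLE(p, 4);
    if (value & 0x80000000ull)
      value |= 0xFFFFFFFF00000000ull;
    break;
  default:                     // not handled in this sketch
    return std::nullopt;
  }
  switch (encoding & 0x70) {   // step 2: add the base address
  case 0x00:                   // absolute: nothing to add
    break;
  case 0x10:                   // pcrel: relative to the field itself
    value += fieldAddr;
    break;
  case 0x30:                   // datarel: relative to the data base
    value += datarelBase;
    break;
  default:                     // not handled in this sketch
    return std::nullopt;
  }
  return value;
}
```

Returning `std::nullopt` instead of aborting is a choice made for this sketch so that a caller could log the offending `encoding` value; it does not describe how libunwind behaves.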
Comment 3 Michael Osipov 2019-03-14 22:36:55 UTC
I'll migrate our Tomcat-based apps to a 12-RELEASE jail on top of a 12-STABLE host and will report in a week or two whether I see the same failures.
Comment 4 Michael Osipov 2019-03-14 22:37:38 UTC
(In reply to Marie Helene Kvello-Aune from comment #0)

Can you please tell us exactly when these failures happen?
Comment 5 Michael Osipov 2019-04-12 15:45:58 UTC
I have not had any Daemon crashes in the jail since it was started several weeks ago.
Comment 6 Dave Baukus 2019-05-17 23:10:16 UTC
We are seeing this same issue on Release-12.0 in a large Java application; one of our Java developers created a small, standalone test that emits the libunwind error message and core dumps:

Core was generated by `/usr/local/openjdk8/bin/java -cp .:fast-md5-2.7.1.jar MD5Demo'.
Program terminated with signal SIGABRT, Aborted.
#0  0x000000080045230a in thr_kill () from /lib/libc.so.7
[Current thread is 1 (LWP 101996)]
(gdb) bt
#0  0x000000080045230a in thr_kill () from /lib/libc.so.7
#1  0x00000008004506f4 in raise () from /lib/libc.so.7
#2  0x00000008003c3079 in abort () from /lib/libc.so.7
#3  0x00000008007f1f3e in ?? () from /lib/libgcc_s.so.1
#4  0x00000008007f2e49 in ?? () from /lib/libgcc_s.so.1
#5  0x00000008007f2d31 in ?? () from /lib/libgcc_s.so.1
#6  0x000000080020f1ec in dl_iterate_phdr () from /libexec/ld-elf.so.1
#7  0x00000008007f0422 in ?? () from /lib/libgcc_s.so.1
#8  0x00000008007f02a0 in ?? () from /lib/libgcc_s.so.1
#9  0x00000008007ee410 in ?? () from /lib/libgcc_s.so.1
#10 0x00000008007ee6f3 in _Unwind_ForcedUnwind () from /lib/libgcc_s.so.1
#11 0x000000080028cfdc in ?? () from /lib/libthr.so.3
#12 0x000000080028cfa0 in ?? () from /lib/libthr.so.3
#13 0x000000080028cdfb in pthread_exit () from /lib/libthr.so.3
#14 0x000000080027f77e in ?? () from /lib/libthr.so.3
#15 0x0000000000000000 in ?? ()

The problem does NOT occur on FreeBSD-12.0-STABLE-amd64-20190517-r347885. Is anyone aware of a specific fix?
Comment 7 Dimitry Andric freebsd_committer 2019-05-18 16:15:49 UTC
(In reply to Dave Baukus from comment #6)
> We are seeing this same issue on Release-12.0 in a large Java application;
> one of our Java developers created a small, standalone test that emits the
> libunwind error message and core dumps

If it is possible, it would be nice to attach the jar.


> The problem does NOT occur on FreeBSD-12.0-STABLE-amd64-20190517-r347885. Is
> anyone aware of a specific fix ?

12.0-RELEASE shipped with clang 6.0.1 and a fairly old version of llvm-libunwind. While I updated llvm and clang a few times, I didn't handle llvm-libunwind until the 8.0.0 import in base r346168.  There, I upgraded llvm-libunwind to the same upstream revision as the rest of llvm and clang, i.e. upstream 8.0.0 final r356365.

Unfortunately that is a rather huge commit, also for the libunwind part, so it is not easy to pinpoint one exact upstream revision that fixes this particular issue.
Comment 8 Debby Johnson 2019-05-20 22:11:30 UTC
Created attachment 204496
Jar of demo code to produce the libunwind crash