234201 – Regression in LLVM libunwind: Apache Tomcat web application crashes on 12.0 (but not on 11.2)

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	bin
Version:	12.0-RELEASE
Hardware:	amd64 Any

Importance:	--- Affects Some People
Assignee:	Dimitry Andric

URL:	https://reviews.llvm.org/rL316224
Keywords:	crash, regression, toolchain

Depends on:
Blocks:

Reported:	2018-12-20 09:19 UTC by Marie Helene Kvello-Aune
Modified:	2019-09-03 14:31 UTC
CC List:	12 users

See Also:

Flags:	koobs: mfc-stable12+ koobs: mfc-stable11+

Attachments
Jar of demo code to produce the libunwind crash (60.04 KB, application/x-java-archive) 2019-05-20 22:11 UTC, Debby Johnson

Description Marie Helene Kvello-Aune 2018-12-20 09:19:54 UTC

When the port devel/jakarta-commons-daemon is built with LLVM in base on 12.0-RELEASE (default configuration), our tomcat webapp "coffeehouse" fails with the message "libunwind: getEncodedP /usr/src/contrib/llvm/projects/libunwind/src/AddressSpace.hpp:280 - unknown pointer encoding"

The offending section of code:
(...)
inline LocalAddressSpace::pint_t
LocalAddressSpace::getEncodedP(pint_t &addr, pint_t end, uint8_t encoding,
                               pint_t datarelBase) {
(...)
 switch (encoding & 0x0F) {
(...)
  default:
    _LIBUNWIND_ABORT("unknown pointer encoding");
(...)

This error does not occur when the port is built with GCC, nor when it's built & run on 11.2 (it works fine with 11.2 world on top of 12.0 kernel).

We've applied a workaround internally which builds it with GCC, but we think the correct approach is to fix the regression in base, so that we don't have to pull in GCC8 just for this.

The web application can be provided upon request.

Comment 1 Dimitry Andric freebsd_committer 2018-12-20 19:43:57 UTC

Hm, the problem is to figure out what the value of 'encoding' is at that point.  In libunwind trunk I see no change in the getEncodedP() function, so it's definitely not supported by a newer version of libunwind either.

Another possibility is that the unwind information gets mangled somehow (maybe by the linker, or by stripping?) causing libunwind to become confused.

Ed, any ideas?

Comment 2 David Chisnall freebsd_committer 2018-12-21 10:37:10 UTC

When I looked at that code a couple of years back, I seem to recall that not all of the DWARF encodings were supported.  I believe only the ones that LLVM emits are well tested (I also vaguely remember adding a couple that were missing in the CHERI branch).  The good news is that they're all pretty trivial (value plus some base address), so if someone can figure out what the value of `encoding` is in the failing case, I can probably give you a patch to fix it quite easily.
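To make the "value plus some base address" remark concrete, here is a rough sketch of how a DWARF exception-handling pointer encoding byte is interpreted. It is not libunwind's implementation: the function names are made up for illustration, a 64-bit little-endian target (amd64, as in this report) is assumed, and DW_EH_PE_indirect, textrel, funcrel and aligned are left out. The constant values are the standard DW_EH_PE_* ones. Note that DW_EH_PE_omit (0xFF) has the low nibble 0x0F, which names no value format, so it would fall through to the default branch of a switch like the one quoted in the description; nothing in this thread establishes that this was the value seen here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Standard DW_EH_PE_* values: low nibble = value format, bits 4-6 = base.
enum : uint8_t {
  DW_EH_PE_absptr = 0x00, DW_EH_PE_uleb128 = 0x01, DW_EH_PE_udata2 = 0x02,
  DW_EH_PE_udata4 = 0x03, DW_EH_PE_udata8 = 0x04, DW_EH_PE_sleb128 = 0x09,
  DW_EH_PE_sdata2 = 0x0A, DW_EH_PE_sdata4 = 0x0B, DW_EH_PE_sdata8 = 0x0C,
  DW_EH_PE_pcrel = 0x10, DW_EH_PE_datarel = 0x30, DW_EH_PE_omit = 0xFF
};

// Hypothetical helper: read a fixed-size integer in host byte order and
// advance the cursor.
template <typename T> static T readRaw(const uint8_t *&p) {
  T v;
  std::memcpy(&v, p, sizeof(T));
  p += sizeof(T);
  return v;
}

// Hypothetical helper: read the LEB128 payload bits; reports the shift and
// the last byte so the caller can sign-extend if needed.
static uint64_t readLEB128(const uint8_t *&p, unsigned &shift, uint8_t &last) {
  uint64_t result = 0;
  shift = 0;
  do {
    last = *p++;
    result |= static_cast<uint64_t>(last & 0x7F) << shift;
    shift += 7;
  } while ((last & 0x80) && shift < 64);
  return result;
}

// Decodes one encoded pointer. fieldAddr is the runtime address of the
// encoded field (used for pcrel). Returns false where the code quoted in
// the description would abort with "unknown pointer encoding".
bool decodePointer(const uint8_t *&p, uint8_t encoding, uint64_t fieldAddr,
                   uint64_t datarelBase, uint64_t &out) {
  assert(p != nullptr);
  unsigned shift = 0;
  uint8_t last = 0;
  uint64_t value = 0;
  switch (encoding & 0x0F) {
  case DW_EH_PE_absptr: value = readRaw<uint64_t>(p); break; // 64-bit target
  case DW_EH_PE_uleb128: value = readLEB128(p, shift, last); break;
  case DW_EH_PE_udata2: value = readRaw<uint16_t>(p); break;
  case DW_EH_PE_udata4: value = readRaw<uint32_t>(p); break;
  case DW_EH_PE_udata8: value = readRaw<uint64_t>(p); break;
  case DW_EH_PE_sleb128:
    value = readLEB128(p, shift, last);
    if (shift < 64 && (last & 0x40))
      value |= ~static_cast<uint64_t>(0) << shift; // sign-extend
    break;
  case DW_EH_PE_sdata2: value = static_cast<uint64_t>(readRaw<int16_t>(p)); break;
  case DW_EH_PE_sdata4: value = static_cast<uint64_t>(readRaw<int32_t>(p)); break;
  case DW_EH_PE_sdata8: value = static_cast<uint64_t>(readRaw<int64_t>(p)); break;
  default: return false; // e.g. DW_EH_PE_omit: low nibble 0x0F
  }
  switch (encoding & 0x70) {
  case DW_EH_PE_absptr: break;                        // no base added
  case DW_EH_PE_pcrel: value += fieldAddr; break;     // relative to the field
  case DW_EH_PE_datarel: value += datarelBase; break; // relative to data base
  default: return false;                              // not handled in this sketch
  }
  out = value;
  return true;
}
```

For example, the bytes `10 00 00 00` read with encoding `DW_EH_PE_pcrel | DW_EH_PE_sdata4` at field address 0x1000 decode to 0x1010, while any call with `DW_EH_PE_omit` is rejected.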

Comment 3 Michael Osipov 2019-03-14 22:36:55 UTC

I'll migrate our Tomcat-based apps to a 12-RELEASE jail on top of a 12-STABLE host and will report in a week or two whether I see the same failures.

Comment 4 Michael Osipov 2019-03-14 22:37:38 UTC

(In reply to Marie Helene Kvello-Aune from comment #0)

Can you please tell us exactly when these failures happen?

Comment 5 Michael Osipov 2019-04-12 15:45:58 UTC

Did not have any Daemon crashes in the jail since its start several weeks ago.

Comment 6 Dave Baukus 2019-05-17 23:10:16 UTC

We are seeing this same issue on Release-12.0 in a large Java application; one of our Java developers created a small, standalone test that emits the libunwind error message and core dumps:

Core was generated by `/usr/local/openjdk8/bin/java -cp .:fast-md5-2.7.1.jar MD5Demo'.
Program terminated with signal SIGABRT, Aborted.
#0  0x000000080045230a in thr_kill () from /lib/libc.so.7
[Current thread is 1 (LWP 101996)]
(gdb) bt
#0  0x000000080045230a in thr_kill () from /lib/libc.so.7
#1  0x00000008004506f4 in raise () from /lib/libc.so.7
#2  0x00000008003c3079 in abort () from /lib/libc.so.7
#3  0x00000008007f1f3e in ?? () from /lib/libgcc_s.so.1
#4  0x00000008007f2e49 in ?? () from /lib/libgcc_s.so.1
#5  0x00000008007f2d31 in ?? () from /lib/libgcc_s.so.1
#6  0x000000080020f1ec in dl_iterate_phdr () from /libexec/ld-elf.so.1
#7  0x00000008007f0422 in ?? () from /lib/libgcc_s.so.1
#8  0x00000008007f02a0 in ?? () from /lib/libgcc_s.so.1
#9  0x00000008007ee410 in ?? () from /lib/libgcc_s.so.1
#10 0x00000008007ee6f3 in _Unwind_ForcedUnwind () from /lib/libgcc_s.so.1
#11 0x000000080028cfdc in ?? () from /lib/libthr.so.3
#12 0x000000080028cfa0 in ?? () from /lib/libthr.so.3
#13 0x000000080028cdfb in pthread_exit () from /lib/libthr.so.3
#14 0x000000080027f77e in ?? () from /lib/libthr.so.3
#15 0x0000000000000000 in ?? ()

The problem does NOT occur on FreeBSD-12.0-STABLE-amd64-20190517-r347885. Is anyone aware of a specific fix ?
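Frames #10 to #13 of the backtrace above show the unwinder being entered from pthread_exit() through _Unwind_ForcedUnwind(), not from a thrown exception. A minimal sketch of that code path outside the JVM could look like the following. All names are hypothetical, and this program is not expected to reproduce the abort on its own, since that depends on the unwind information of the objects loaded into the process. Whether the destructor runs during pthread_exit() is implementation behaviour, so the sketch only records it.

```cpp
#include <cassert>
#include <pthread.h>

// Set by the destructor below if the thread library unwinds the worker's
// stack when the thread exits.
static bool g_destructor_ran = false;

struct Guard {
  ~Guard() { g_destructor_ran = true; }
};

static void *worker(void *) {
  Guard guard;           // gives this frame cleanup work for the unwinder
  pthread_exit(nullptr); // does not return; may trigger a forced unwind
}

// Starts the worker, waits for it, and returns true once it has been joined.
bool runForcedUnwindDemo() {
  pthread_t tid;
  int rc = pthread_create(&tid, nullptr, worker, nullptr);
  assert(rc == 0 && "pthread_create failed");
  if (rc != 0)
    return false;
  rc = pthread_join(tid, nullptr);
  assert(rc == 0 && "pthread_join failed");
  return rc == 0;
}
```

After runForcedUnwindDemo() returns, g_destructor_ran tells whether the stack was actually unwound on the platform at hand.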

Comment 7 Dimitry Andric freebsd_committer 2019-05-18 16:15:49 UTC

(In reply to Dave Baukus from comment #6)
> We are seeing this same issue on Release-12.0 in a large Java application;
> one of our Java developers created a small, standalone test that emits the
> libunwind error message and core dumps

If it is possible, it would be nice to attach the jar.


> The problem does NOT occur on FreeBSD-12.0-STABLE-amd64-20190517-r347885. Is
> anyone aware of a specific fix ?

12.0-RELEASE shipped with clang 6.0.1 and a fairly old version of llvm-libunwind. While I updated llvm and clang a few times, I didn't handle llvm-libunwind until the 8.0.0 import in base r346168.  There, I upgraded llvm-libunwind to the same upstream revision as the rest of llvm and clang, i.e. upstream 8.0.0 final r356365.

Unfortunately that is rather a huge commit, also for the libunwind part, so it is not easy to pinpoint one exact upstream revision that fixes this particular issue.

Comment 8 Debby Johnson 2019-05-20 22:11:30 UTC

Created attachment 204496
Jar of demo code to produce the libunwind crash

Comment 9 Eirik Oeverby 2019-07-29 10:18:56 UTC

Is there any hope of having this fixed? An alternative is to update the jakarta-commons-daemon port to USE_GCC=yes if building for 12.

Comment 10 Dimitry Andric freebsd_committer 2019-07-29 17:31:32 UTC

(In reply to Eirik Oeverby from comment #9)
> Is there any hope of having this fixed? An alternative is to update the
> jakarta-commons-daemon port to USE_GCC=yes if building for 12.

We recently had a rather large update for llvm-libunwind, so maybe the experiment can be retried with a recent -CURRENT (after r345345, 2019-03-20), -STABLE12 (after r346168, 2019-04-12), or -STABLE11 (after r346296, 2019-04-16)?

I have belatedly tried to reproduce the issue with the jarfile from comment 8, and it appears to work just fine on -CURRENT r350371 (as of 2019-07-27):

$ uname -v -m
FreeBSD 13.0-CURRENT r350371 GENERIC  amd64

$ pwd
/share/dim/bugs/bug234201

$ ls -l
total 64
-rw-rw-rw-  1 dim  dim  61481 2019-07-29 19:20:35 md5demo2.jar

$ mkdir md5demo2

$ unzip -d md5demo2 md5demo2.jar
Archive:  md5demo2.jar
   creating: md5demo2/META-INF/
 extracting: md5demo2/META-INF/MANIFEST.MF
 extracting: md5demo2/fast-md5-2.7.1.jar
   creating: md5demo2/fast-md5-native-2.7.1/
   creating: md5demo2/fast-md5-native-2.7.1/freebsd_x86/
   creating: md5demo2/fast-md5-native-2.7.1/linux_x86/
   creating: md5demo2/fast-md5-native-2.7.1/win32_x86/
   creating: md5demo2/fast-md5-native-2.7.1/win_amd64/
   creating: md5demo2/fast-md5-native-2.7.1/darwin_x86_64/
   creating: md5demo2/fast-md5-native-2.7.1/linux_amd64/
   creating: md5demo2/fast-md5-native-2.7.1/freebsd_amd64/
   creating: md5demo2/fast-md5-native-2.7.1/darwin_x86/
   creating: md5demo2/fast-md5-native-2.7.1/darwin_ppc/
 extracting: md5demo2/fast-md5-native-2.7.1.jar
   creating: md5demo2/lib/
 extracting: md5demo2/MD5Demo2.class
 extracting: md5demo2/MD5Demo2.java
 extracting: md5demo2/MD5.so
 extracting: md5demo2/README.txt

$ cd md5demo2

$ java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)

$ java -cp .:fast-md5-2.7.1.jar MD5Demo2
MD5 Demo
MD5 value: d464064618e61b35dca3e5dee84c7b56

$ echo $?
0

Comment 11 Michael Osipov 2019-07-29 19:37:47 UTC

I can reproduce this in a Jail. 

Host:
$ uname -a
FreeBSD deblndw011x.ad001.siemens.net 12.0-STABLE FreeBSD 12.0-STABLE #9 r350322: Fri Jul 26 08:55:01 CEST 2019     root@deblndw011x.ad001.siemens.net:/usr/obj/usr/src/amd64.amd64/sys/DEBLNDW011X  amd64

Jail:
$ freebsd-version
12.0-RELEASE-p5
$ java -version
openjdk version "1.8.0_202"
OpenJDK Runtime Environment (build 1.8.0_202-b08)
OpenJDK 64-Bit Server VM (build 25.202-b08, mixed mode)
$ java -cp .:fast-md5-2.7.1.jar MD5Demo2
MD5 Demo
MD5 value: d464064618e61b35dca3e5dee84c7b56
libunwind: getEncodedP /usr/src/contrib/llvm/projects/libunwind/src/AddressSpace.hpp:280 - unknown pointer encoding
Abort trap (core dumped)

Comment 12 Dimitry Andric freebsd_committer 2019-07-29 21:25:54 UTC

(In reply to Michael Osipov from comment #11)
> I can reproduce this in a Jail. 
> 
> Host:
> $ uname -a
> FreeBSD deblndw011x.ad001.siemens.net 12.0-STABLE FreeBSD 12.0-STABLE #9
> r350322: Fri Jul 26 08:55:01 CEST 2019    
> root@deblndw011x.ad001.siemens.net:/usr/obj/usr/src/amd64.amd64/sys/
> DEBLNDW011X  amd64
> 
> Jail:
> $ freebsd-version
> 12.0-RELEASE-p5

Sure, but the version in the jail is older than stable/12 r346168, which I mentioned.  It does not have the libunwind update.

There is not much chance a huge commit like the clang/llvm 8.0.0 update is going to be issued as a patch release, or an Errata Notice.  I guess you either have to run a snapshot, or wait for 12.1-RELEASE.

Comment 13 Michael Osipov 2019-07-30 07:41:11 UTC

(In reply to Dimitry Andric from comment #12)

I actually do not suffer from the issue (fingers crossed). I just wanted to reproduce it. The jail will stay on 12.0-RELEASE until the host is updated to 12.1-STABLE. I'd still opt for the errata notice for 12.0-RELEASE.

Comment 14 Sergey Mutin 2019-07-31 10:43:14 UTC

net-im/jitsi is also affected


% jitsi 
13:35:40.107 SEVERE: [13] org.jitsi.impl.neomedia.device.DeviceConfiguration.error() Failed to register custom Renderer org.jitsi.impl.neomedia.jmfext.media.renderer.video.JAWTRenderer with JMF.
java.lang.UnsatisfiedLinkError: no jnawtrenderer in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.jitsi.impl.neomedia.jmfext.media.renderer.video.JAWTRenderer.<clinit>(JAWTRenderer.java:90)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.jitsi.impl.neomedia.device.DeviceConfiguration.registerCustomRenderers(DeviceConfiguration.java:1036)
        at org.jitsi.impl.neomedia.device.DeviceConfiguration.<init>(DeviceConfiguration.java:355)
        at org.jitsi.impl.neomedia.MediaServiceImpl.<init>(MediaServiceImpl.java:150)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at java.lang.Class.newInstance(Class.java:442)
        at org.jitsi.impl.libjitsi.LibJitsiImpl.getService(LibJitsiImpl.java:142)
        at org.jitsi.impl.libjitsi.LibJitsiOSGiImpl.getService(LibJitsiOSGiImpl.java:86)
        at org.jitsi.service.libjitsi.LibJitsi.invokeGetServiceOnImpl(LibJitsi.java:163)
        at org.jitsi.service.libjitsi.LibJitsi.getMediaService(LibJitsi.java:115)
        at net.java.sip.communicator.impl.neomedia.NeomediaActivator.start(NeomediaActivator.java:380)
        at org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:645)
        at org.apache.felix.framework.Felix.activateBundle(Felix.java:2152)
        at org.apache.felix.framework.Felix.startBundle(Felix.java:2070)
        at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1297)
        at org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:304)
        at java.lang.Thread.run(Thread.java:748)
libunwind: getEncodedP /usr/src/contrib/llvm/projects/libunwind/src/AddressSpace.hpp:280 - unknown pointer encoding
Abort (core dumped)



% uname -v -m
FreeBSD 12.0-RELEASE r341666 GENERIC  amd64

% java -version
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-b04)
OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)

% jitsi --version
Jitsi 2.8.0.build.by.SVN

Comment 15 Dimitry Andric freebsd_committer 2019-07-31 17:13:23 UTC

Okay, I went back and forth through the LLVM libunwind history, and this is the particular change that fixes the issue: https://reviews.llvm.org/rL316224

It is a pretty small fix, which should be suitable for a releng branch.  I have no idea what kind of paperwork is needed for an EN, though.

Comment 16 Ed Maste freebsd_committer 2019-07-31 17:23:24 UTC

(In reply to Dimitry Andric from comment #15)
There's some (hard to find) documentation on the errata update process at https://www.freebsd.org/doc/en_US.ISO8859-1/articles/freebsd-releng/releng-wrapup.html

Comment 17 Dimitry Andric freebsd_committer 2019-08-07 06:01:27 UTC

Fixed in 12.0-RELEASE-p9 and 11.2-RELEASE-p13 with an Errata Notice:

https://www.freebsd.org/security/advisories/FreeBSD-EN-19:15.libunwind.asc
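As a quick way to tell whether a given release install already carries this erratum, the patch levels named above can be compared against the output of `freebsd-version`. The sketch below is only an illustration: the function name is invented, it knows about nothing but the two releases mentioned in this comment, and a `false` result for any other version string means "not covered by this check", not "affected".

```cpp
#include <cassert>
#include <string>

// Returns true if `version` (as printed by freebsd-version, for example
// "12.0-RELEASE-p9") is one of the two releases named in the Errata Notice
// at or above the patch level that carries the fix.
bool isPatchedRelease(const std::string &version) {
  assert(!version.empty() && "expected the output of freebsd-version");
  struct Fixed {
    const char *prefix; // release name without patch level
    int patch;          // first patch level that includes the fix
  };
  static const Fixed fixed[] = {{"12.0-RELEASE", 9}, {"11.2-RELEASE", 13}};
  for (const Fixed &f : fixed) {
    const std::string prefix(f.prefix);
    if (version.compare(0, prefix.size(), prefix) != 0)
      continue;
    // After the release name comes either nothing (patch level 0) or "-pN".
    const std::string rest = version.substr(prefix.size());
    int patch = 0;
    if (rest.size() > 2 && rest.compare(0, 2, "-p") == 0)
      patch = std::stoi(rest.substr(2)); // throws on a malformed patch level
    return patch >= f.patch;
  }
  return false; // a version this sketch does not know about
}
```

With this, the jail from comment 11 (`12.0-RELEASE-p5`) is reported as not yet patched, while `12.0-RELEASE-p9` and `11.2-RELEASE-p13` are reported as patched.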

Comment 18 Kubilay Kocak freebsd_committer 2019-08-07 11:00:31 UTC

Merged/committed to releng/* branches in base r350642

I'm assuming this has been resolved in CURRENT with recent upstream version updates being brought in. Was this also merged to stable/*?

If so, can we get 'base rXXXXX' references for those?

Comment 19 Dimitry Andric freebsd_committer 2019-08-07 13:32:30 UTC

(In reply to Kubilay Kocak from comment #18)
> Merged/committed to releng/* branches in base r350642
> 
> I'm assuming this has been resolved in CURRENT with recent upstream version
> updates being brought in. Was this also merged to stable/*?
> 
> If so, can we get 'base rXXXXX' references for those?

For head, it got fixed in base r345018 ("Merge LLVM libunwind trunk r351319, from just before upstream's release_80 branch point"), on 2019-03-11 18:45:36 UTC.

For stable/12, it got fixed in base r346168 ("Merge llvm, clang, compiler-rt, libc++, libunwind, lld, lldb and openmp 8.0.0 final release"), on 2019-04-12 20:03:27 UTC.

For stable/11, it got fixed in base r346296 ("Merge llvm, clang, compiler-rt, libc++, libunwind, lld, lldb and openmp 8.0.0 final release"), on 2019-04-16 20:05:24 UTC.

Comment 20 Kubilay Kocak freebsd_committer 2019-08-23 03:45:28 UTC

Thanks Dim!