Created attachment 174065 [details]
We have an application running Tomcat on several machines. They crash on a more or less daily basis.
They are running openjdk8-8.102.14 or openjdk8-8.92.14_2; both experience similar problems.
Some of the "Problematic frame"s:
# J 26736 C2 java.util.AbstractMap.equals(Ljava/lang/Object;)Z (145 bytes) @ 0x0000000808907030 [0x0000000808907000+0x30]
# j org.hibernate.type.descriptor.java.AbstractTypeDescriptor.getJavaTypeClass()Ljava/lang/Class;+0
# J 9985 C1 org.hibernate.type.descriptor.java.AbstractTypeDescriptor.getJavaTypeClass()Ljava/lang/Class; (5 bytes) @ 0x000000080401afe0 [0x000000080401afa0+0x40]
# J 16010 C2 org.hibernate.internal.SessionImpl.getEntityUsingInterceptor(Lorg/hibernate/engine/spi/EntityKey;)Ljava/lang/Object; (51 bytes) @ 0x0000000806392210 [0x00000008063921e0+0x30]
# J 31297 C2 org.hibernate.type.descriptor.java.AbstractTypeDescriptor.areEqual(Ljava/lang/Object;Ljava/lang/Object;)Z (6 bytes) @ 0x00000008082f95d0 [0x00000008082f95a0+0x30]
# J 8453 C1 org.hibernate.engine.spi.CascadeStyle$2.doCascade(Lorg/hibernate/engine/spi/CascadingAction;)Z (2 bytes) @ 0x0000000805237e20 [0x0000000805237de0+0x40]
# J 67410 C2 java.lang.String.hashCode()I (55 bytes) @ 0x0000000806ed9af0 [0x0000000806ed9ae0+0x10]
# J 37130 C2 org.hibernate.proxy.pojo.javassist.JavassistLazyInitializer.invoke(Ljava/lang/Object;Ljava/lang/reflect/Method;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object; (214 bytes) @ 0x00000008082648b0 [0x0000000808264880+0x30]
# J 20663 C2 org.hibernate.engine.spi.CascadeStyle.reallyDoCascade(Lorg/hibernate/engine/spi/CascadingAction;)Z (6 bytes) @ 0x00000008076e5e50 [0x00000008076e5e20+0x30]
# J 6799 C1 org.hibernate.type.EntityType.isAssociationType()Z (2 bytes) @ 0x0000000804e952a0 [0x0000000804e95260+0x40]
# J 24608 C2 org.hibernate.event.internal.DefaultSaveOrUpdateEventListener.reassociateIfUninitializedProxy(Ljava/lang/Object;Lorg/hibernate/engine/spi/SessionImplementor;)Z (13 bytes) @ 0x00000008080937b0 [0x0000000808093780+0x30]
# J 8330 C1 org.hibernate.engine.spi.CascadeStyle$2.doCascade(Lorg/hibernate/engine/spi/CascadingAction;)Z (2 bytes) @ 0x00000008050b4160 [0x00000008050b4120+0x40]
The servers endure high load, and so far we cannot see any clear pattern beyond load itself (nothing extreme; systems under higher load are simply more susceptible to crashing).
I've been working with some folks over at OpenNMS who were experiencing similar crashes. Increasing the stack size to 8M seems to have fixed their issue, and I wonder if it would also resolve yours.
Increase the stack size by adding the -Xss8m startup option.
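For a Tomcat deployment, one way to apply this (the file name and any existing options are assumptions about your particular setup) is via CATALINA_OPTS in bin/setenv.sh:

```shell
# bin/setenv.sh (hypothetical deployment file).
# -Xss8m raises each Java thread's stack size to 8 MB.
CATALINA_OPTS="$CATALINA_OPTS -Xss8m"
```

Any mechanism that passes -Xss8m to the java command line should be equivalent.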
I haven't had the time to research why increasing the stack size on FreeBSD was necessary yet, but I hope it helps you nonetheless. Please report back whether it helps or not.
(In reply to openjdk from comment #1)
Thank you for the prompt reply!
We tried this the same night you reported it, and two weeks later we are still running with no crashes. Thumbs up! :)
This is obviously a workaround; there is a bug in there somewhere that fails to throw a proper exception or otherwise remedy the problem.
-Xss8m does help
As a side note, we first tried turning off compressed 64-bit pointers (-XX:-UseCompressedOops), but that did not stop the crashes. It did, however, help jmap create heap dumps from huge (>2^32 bytes) core dumps.
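For reference, JDK 8's jmap can extract a heap dump from a core file when given the matching java executable. A sketch of the invocation (the executable and core paths below are assumptions, not from the original report):

```shell
# Extract a binary heap dump from a core file with JDK 8's jmap.
# Both paths are hypothetical; use the java binary that produced the core.
jmap -dump:format=b,file=heap.hprof /usr/local/openjdk8/bin/java java.core
```

The resulting heap.hprof can then be inspected with tools such as jhat or Eclipse MAT.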
I'm glad it helped. I agree that this is just a workaround and not a fix. I took a peek at libthr and I don't believe there is a way to detect a native stack overflow. I also looked into the default stack sizing logic: the JVM default is identical to Linux, 1MB for 64-bit architectures, whereas the default non-initial thread stack size on FreeBSD is 2MB for 64-bit architectures, so I wonder if 1MB is just too low for FreeBSD.
It looks like the stack sizing logic was ported directly from Linux, so maybe the solution would be to bump up the defaults. Does anybody have any clue why we would need a bigger native stack on FreeBSD than on Linux?
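To check what default a given JVM actually settles on, HotSpot's final flag values can be printed and filtered (a diagnostic sketch; ThreadStackSize is HotSpot's flag name and is reported in KB, so 1024 corresponds to the 1MB default mentioned above):

```shell
# Print HotSpot's resolved flag values and filter for the thread stack size.
# A value of 1024 (KB) would mean a 1MB per-thread default.
java -XX:+PrintFlagsFinal -version | grep ThreadStackSize
```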
We are seeing new crashes, not as frequent as before, but occasional crashes anyway.
We are running with -Xss8m
Is it possible to see from my original report that it was the stack, or was it just a hunch?
Just for reference, the intermittent crashes we saw were due to infinite loops or recursion. When we bumped the stack size, we got stack traces and could finally track down the root cause.
The bug in OpenJDK is still there, though: even with the larger stack size, we still saw occasional segfaults where Java should really throw a StackOverflowError.
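To illustrate the expected behavior: on a working JVM, exhausting the Java thread stack raises a catchable StackOverflowError rather than crashing the process with a segfault. A minimal sketch (class and method names are mine, not from the original report):

```java
// Minimal sketch: a JVM should turn stack exhaustion into a catchable
// StackOverflowError instead of a native crash (SIGSEGV).
public class StackProbe {
    static long depth = 0;

    static void recurse() {
        depth++;
        recurse(); // unbounded recursion eventually exhausts the stack
    }

    public static void main(String[] args) {
        try {
            recurse();
        } catch (StackOverflowError e) {
            // Expected behavior: the error is thrown and the VM survives.
            System.out.println("StackOverflowError caught at depth " + depth);
        }
    }
}
```

The crashes described in this report are cases where, on FreeBSD with the default 1MB stack, the overflow was not converted into this error and the process died instead.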
(In reply to Palle Girgensohn from comment #5)
Can you elaborate on the root cause, and whether this has been reported to Oracle?