Bug 239894

Summary: security.bsd.stack_guard_page default causes Java to crash
Product: Base System Reporter: Greg Lewis <glewis>
Component: kernAssignee: freebsd-bugs mailing list <bugs>
Status: New ---    
Severity: Affects Many People CC: kib, kurt, markj, shawn.webb
Priority: ---    
Version: 12.0-RELEASE   
Hardware: Any   
OS: Any   

Description Greg Lewis freebsd_committer 2019-08-16 00:02:51 UTC
The default setting of the security.bsd.stack_guard_page sysctl on FreeBSD causes issues for Java thread stack handling.

The most obvious of these is bug #222146 - a Java program which includes infinite recursion will crash rather than throwing StackOverflowError.  While there is a workaround, the workaround is fragile and causes other issues.

A fairly lengthy discussion of the problem from the Java side can be found here:

https://github.com/battleblow/openjdk-jdk11u/issues/51

Briefly, Java attempts to create its own guard pages in the thread stack, but when security.bsd.stack_guard_page is non-zero (by default it is 1), there is an extra guard page which the JVM doesn't anticipate.  This causes SIGSEGV fault addresses to be outside the JVM guard region, meaning that the JVM crashes rather than handling the signal as a StackOverflowError.

The initial naive workarounds discussed (the first of which is still in the ports tree) centre around adjusting for the extra default page of guard.  These are fragile for a couple of reasons.  The obvious one is that the sysctl can be a different value than the default, in which case the workaround is wrong.  This is easy enough to adjust to be reading the value, but that is still fragile since the value can change while the program is running (there are potential additional complications in the pthread library potentially caching stacks as well).  The other problem this causes is that the JVM will use some of its guard region (the reserved region) to allow critical sections to complete so that data is not left in an inconsistent state when a StackOverflowError is thrown.  You can read about that at https://openjdk.java.net/jeps/270.  This facility is completely circumvented by the automated guard page(s) put in place by the kernel since the JVM can't remove the protection on them.

Given that, the only reliable "fix" that works for all the Java functionality is to remove the current workaround from the ports tree and instruct users to set the value of the sysctl to zero.  This is not a great solution for a couple of reasons, since Java is broken by default, and the fix is to remove protection put in place due to the Stack Clash vulnerability.

This bug is basically a request to see if there are any suggestions from the kernel developer side as to how we can have both a fully functional version of Java and the sysctl default still in place.

Note that despite the FreeBSD version selection above, this affects all FreeBSD versions and architectures since https://svnweb.freebsd.org/base?view=revision&revision=320317
Comment 1 Kurt Miller 2019-08-16 13:24:11 UTC
I'd like to additionally point out the current implementation of kernel placed guard pages for MAP_STACK has issues that result in three bugs not specifically related to the jvm:

1) pthread_attr_setguardsize(3) is not working when its size != security.bsd.stack_guard_page size.

2) pthread_create(3) will fail if security.bsd.stack_guard_page is set to number that is greater than pthread_attr_setstacksize(3) size in pages.

3) mprotect(2) of MAP_STACK pages has no effect unless the pages have been touched previously.

Test programs for #1 and #2 can be found here:

https://github.com/battleblow/openjdk-jdk11u/issues/51#issuecomment-522005954

#3 is demonstrated by test the program pthread_attr_setguardsize_test.c when security.bsd.stack_guard_page is set to 1. libthr uses mprotect(3) to set its guard pages and is the reason this test fails.
Comment 2 Konstantin Belousov freebsd_committer 2019-08-20 10:41:44 UTC
(In reply to Greg Lewis from comment #0)
Can you switch from checking that the faulted address belong strongly to the guard pages created explicitly, to the mere fact that the faulted address falls into the (guard pages + stack proper) region ?
Comment 3 Greg Lewis freebsd_committer 2019-08-20 17:23:24 UTC
Thanks for the response Konstantin!

I can see a couple of problems with that approach.

The biggest problem is that not all SIGSEGV should be interpreted as a stack overflow.  With the possibility of whatever homegrown JNI code the user may have running and the internal native code in JDK itself, a SIGSEGV does not necessarily mean that a stack overflow is occurring, and interpreting it that way will lead to other incorrect behaviour.

The second problem is that it seems like that still leaves the JVM unable to allow access to the reserved pages at the top of the guard zone for critical sections to complete.  See the reference I posted earlier to JEP 270 for why that is important (there is another internal test that checks for that).

Kurt may have other concerns, but those would be mine.
Comment 4 Kurt Miller 2019-08-20 17:48:06 UTC
Yes, that's the root of the problem. The JVM needs to be able to deterministically manage its own guard pages independently from both the kernel placed ones and the pthread placed ones. Where 'manage' means change the protection of the pages to PROT_NONE and back to PROT_READ|PROT_WRITE.

I think if the issues mentioned in comment #1 that kernel placed guard pages cause pthreads are corrected, it is very likely the JVM will be able to use the same method as libthr to place its guard pages (currently with mprotect(2)).
Comment 5 Konstantin Belousov freebsd_committer 2019-08-20 18:04:46 UTC
(In reply to Greg Lewis from comment #3)
I am not proposing to interpret any SIGSEGV as a stack overflow.  I propose to consider a SIGSEGV as the stack overflow if it occured in the range of [guard_start; stack_end) for the given stack, instead of the current implementation which considers the range [guard_start; guard_end) (at least this is how I understand what is going on by your explanation).

I can also add e.g. procctl(PROC_STACKGUARD) but to convince me to do that you would need to explain why the above approach is not sufficient.
Comment 6 Greg Lewis freebsd_committer 2019-08-20 21:26:51 UTC
Hi Konstantin,

I think my explanation hasn't been clear enough.  So let me try and include a few more links and some diagrams.

Here is a diagram for what the Java thread stack looks like from https://github.com/battleblow/openjdk-jdk11u/blob/bsd-port/src/hotspot/os/bsd/os_bsd.cpp#L4262

   Low memory addresses
    +------------------------+
    |                        |\  Java thread created by VM does not have
    |   pthread guard page   | - pthread guard, attached Java thread usually
    |                        |/  has 1 pthread guard page.
 P1 +------------------------+ Thread::stack_base() - Thread::stack_size()
    |                        |\
    |  HotSpot Guard Pages   | - red, yellow and reserved pages
    |                        |/
    +------------------------+ JavaThread::stack_reserved_zone_base()
    |                        |\
    |      Normal Stack      | -
    |                        |/
 P2 +------------------------+ Thread::stack_base()

When the JVM is creating the HotSpot guard pages, the kernel, based on the security.bsd.stack_guard_page setting will create some extra guarded pages that extend into the normal stack region.  This causes the SIGSEGV to have a fault address in the normal stack region.

There are two initial problems with this.

The first is that the definition of StackOverflowError is an error that is thrown "Thrown when a stack overflow occurs because an application recurses too deeply." (see https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/StackOverflowError.html).  However, there are other reasons a SIGSEGV could occur in the normal stack region (e.g. a buffer overflow).  The JVM uses the guard pages to be able to detect that it is clearly a stack overflow that is causing the SIGSEGV rather than any other possible cause.  You can observe this in the JVM source itself.  See https://github.com/battleblow/openjdk-jdk11u/blob/bsd-port/src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp#L510 where it checks for the fault address being in the guard zone (first in the reserved + yellow zones, which is tries to handle gracefully, and then in the red zone, which is less graceful).  The code on Linux is very similar (see https://github.com/battleblow/openjdk-jdk11u/blob/bsd-port/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp#L356).  I'll note that the continuation of the code provides for some different handling if the fault address doesn't occur within the guard pages.

The second is that you'll note in that code that when a stack overflow does occur, the JVM will often unprotect portions of the guard zone it has set up.  E.g. at https://github.com/battleblow/openjdk-jdk11u/blob/bsd-port/src/hotspot/os_cpu/bsd_x86/os_bsd_x86.cpp#L525.  This is because a StackOverflowError is something the Java program can catch and ignore, if it so chooses.  The reserved pages provide an area the JVM can unprotect to allow a critical code section to complete so that a Java program which catches StackOverflowError and continues execution will not be left in a condition where, for example, it is deadlocked due to the fault occurring during the critical section of changing a lock state.  The pages created by the security.bsd.stack_guard_page setting create problems with doing this.  We're not in the reserved section for starters, but in the normal stack, so unprotecting it won't help.  Also, it was the kernel which protected the pages, the JVM can't unprotect them.  This means the critical section can't complete, meaning that data structures may be in an inconsistent state, which may include a deadlock as above.  The JEP (https://openjdk.java.net/jeps/270) goes into a lot more detail around this and the motivation for introducing reserved pages.

There are some other problems here as well.  E.g., the JVM can't predictably determine which pages might have been protected by the kernel, since the sysctl can be changed dynamically but libthr can cache thread stacks.  These are less likely but still problematic.

Hopefully that has provided some clarification.  I'd also like to draw your attention to Kurt's comment that this doesn't just impact the JVM but the interaction with libthr in general.  This is something to consider in terms of a proposed fix.

I'm also curious about how Linux (and other OSes) went about fixing the Stack Clash vulnerability and whether there is an approach there that might not cause application issues like this.
Comment 7 Konstantin Belousov freebsd_committer 2019-08-21 15:15:32 UTC
(In reply to Greg Lewis from comment #6)
First, I am not sure what you mean by 'ther reasons a SIGSEGV could occur in the normal stack region (e.g. a buffer overflow)'.  If the region is mapped rw, then SIGSEGV cannot occur on accesses.

Second, does JVM use pthread stacks, or is it managing stacks directly by mmap(MAP_STACK) ?  If the later, just remove MAP_STACK from the mmap(2) call and see if it is enough.

Third, how the guard pages are created ?  Do you use mmap(2) or mprotect(2) ?  If the later, try to call mmap(MAP_ANON) before doing mprotect(2) on it.
Comment 8 Konstantin Belousov freebsd_committer 2019-08-21 15:34:17 UTC
I also wrote the promised procctl(PROC_STACKGAP_CTL), see https://reviews.freebsd.org/D21352.  But I do not think that it would alone help with the JVM use of guards.  It might be needed in addition to e.g. use mmap(2) before mprotect(2).
Comment 9 Kurt Miller 2019-08-21 15:54:53 UTC
(In reply to Konstantin Belousov from comment #7)

Answering your questions:
Attempting execute code on the stack I believe with raise a SIGSEGV.

The JVM uses pthread stacks.

Currently the jvm does an mmap(MAP_ANON) before an mprotect(PROT_NONE) on the Hotspot Guard pages at the end of the stack as reported by pthread_attr_getstack(). On 13-current, the kernel moves the kernel managed guard page as follows:

   Low memory addresses
    +------------------------+
    |                        |\  Java thread created by VM does not have
    |   pthread guard page   | - pthread guard, attached Java thread usually
    |                        |/  has 1 pthread guard page.
 P1 +------------------------+ Thread::stack_base() - Thread::stack_size()
    |                        |\
    |  HotSpot Guard Pages   | - red, yellow and reserved pages
    |                        |/
    +------------------------+ JavaThread::stack_reserved_zone_base()
    |                        |\
    |   kernel guard page(s) | - This page is the problem
    |                        |/
    +------------------------+ JavaThread::stack_reserved_zone_base()
    |                        |\
    |      Normal Stack      | -
    |                        |/
 P2 +------------------------+ Thread::stack_base()


The kernel managed guard page follows the HotSpot Guard pages which poses two problems. 

1) SIGSEGV raised before hitting the HotSpot Guard Pages (executing code on stack?)

2) The inability to change the protection on the kernel guard pages so that the thread that hit it can be allowed to temporarily use that space so it can unwind safely.

Please see test program for this case in this comment:

https://github.com/battleblow/openjdk-jdk11u/issues/51#issuecomment-520602973

I encourage you to look at this issue from the perspective of the issues in libthr I outlined in comment #1 instead of from the JVM's needs. If the issues in libthr are corrected, it is very likely that the JVM can use the same approach as libthr to place guard pages at the end of the stack.
Comment 10 Kurt Miller 2019-08-22 00:58:45 UTC
(In reply to Konstantin Belousov from comment #8)

Thank you for writing this. However, it doesn't address the interaction between kernel placed guard pages and pthreads. Consider for example setting security.bsd.stack_guard_page to 513 will call all future pthread_create(2) calls to fail that use default pthread attr (default stack size (512 pages) and guard page (1 page)). An administrator attempting to be more secure will cause most multithreaded applications to fail to work entirely.
Comment 11 Kurt Miller 2019-08-22 01:23:30 UTC
(In reply to Kurt Miller from comment #10)

Sorry, typo in previous comment. It should have read "...setting security.bsd.stack_guard_page to 513 will cause all future pthread_create(2) calls to fail..."

Along the same lines the following command:

sudo sysctl -w security.bsd.stack_guard_page=262144

prevents all programs to start.
Comment 12 Shawn Webb 2019-08-22 12:51:06 UTC
HardenedBSD has noticed spurious issues when setting security.bsd.stack_guard_page greater than 1, especially when building packages. As such, we prevent it from being set to values other than 0 or 1.
Comment 13 Kurt Miller 2019-08-22 13:36:09 UTC
(In reply to Shawn Webb from comment #12)

The authors of the Stack Clash advisory indicate a 4096 byte guard region is not difficult to jump over and avoid. Their recommendation is a 1MB guard region. Restricting the value at 1 leaves this issue unaddressed.

https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt

I think the security.bsd.stack_guard_page mechanism needs a hard look at how it is implemented. I see the following problems with the current approach.

The kernel placed guard pages are placed *within* the stack space requested by mmap(MAP_STACK). This is the primary reason setting this value high causes the issues I pointed out. Currently it not possible for libthr or the JVM to know the number of pages the kernel placed due to TOCTOU and since these pages are kernel placed, userland should not have to do this. The kernel placed guard region should be additional pages outside of the the requested stack size.

mmap on top of a MAP_STACK region causes the kernel to move the kernel managed guard pages into the stack region further. I believe it uses the current value of  security.bsd.stack_guard_page for that but would need to test this to be sure. Nevertheless, moving the kernel managed guard pages means mmap on top of the MAP_STACK region as a precursor to using mprotect means that both libthr and the JVM do not have a way to set their guard pages predictably. The kernel circumvents this by placing additional guard pages that interfere with userland guard pages as described in previous comments and test programs.

Simply using mprotect on top of MAP_STACK pages currently fails to work as well. It appears that for this work the pages need to be previously accessed.

IMO, to address the needs of system security and pthreads guard pages and the JVM some changes need to be made to make things work that consistent with how mmap and mprotect are expected to work.  At a minimum, kernel placed guard pages need to be additional pages that do not invade the space returned from mmap(MAP_STACK). Ideally mprotect on the usable MAP_STACK space would be made to work. This is most straightforward way libthr and the JVM can place their guard pages and is consistent with how these interferences are generally expected to work.
Comment 14 Shawn Webb 2019-08-22 13:51:02 UTC
(In reply to Kurt Miller from comment #13)
> The authors of the Stack Clash advisory indicate a 4096 byte guard region is not difficult to jump over and avoid. Their recommendation is a 1MB guard region. Restricting the value at 1 leaves this issue unaddressed.

Fully agreed. However, the implementation in FreeBSD has shown to be unstable at values greater than 1. For extra fun, set it to a negative value. :)
Comment 15 Greg Lewis freebsd_committer 2019-08-27 17:26:12 UTC
Hi Konstantin,

It looks like you have been working on the procctl approach based on https://reviews.freebsd.org/D21352.  Thanks for doing that!  A couple of questions/comments though.

Can we help test this approach?  I.e., verify that fixes the problem for the JDK?  I'm happy to compile a kernel with the changes and see if that resolves the issue if that would help.

A concern though is that this doesn't address the other problems outside of the JDK that have been raised in this thread.  Those are the issues raced by Kurt and Shawn in the comments here including problems with some values of this sysctl making the system unstable or unusable, whether it should be placing pages within the stack at all, and whether stable values (i.e., 0 or 1) actually prevent the stack clash issues at all.  Do you see what you're doing as a stop gap prior to longer term work here?
Comment 16 Konstantin Belousov freebsd_committer 2019-08-28 10:35:26 UTC
(In reply to Greg Lewis from comment #15)
To check that PROC_STACKGAP_CTL helps, please build kernel and world (or just usr.sbin/proccontrol) with D21352 applied.  Then you can run unmodified java binary like that:
  $ proccontrol -m stackgap -s disable java <args>
and stack overflow detection should work same as before stack clash fixes.

It if works, the best route would be to add procctl(PROC_STACKGAP_CTL, DISABLE)
call at the beginning of the java vm initialization.

WRT future work, I might add some code to outguess the need of the procctl(2), disabling gaps if mappings/mprotects are detected to fall into the gap area to
probably catch overflows.  I am not big fan of this approach because it
effectively disables clash protection, which is the reason why I did not
implemented that already.
Comment 17 Greg Lewis freebsd_committer 2019-08-30 22:58:04 UTC
I got a VM set up with a recent -CURRENT, applied the patches from the review and rebuilt world.

I can confirm that after that I can run

proccontrol -m stackgap -s disable java -cp . InfiniteRecursion

and get a StackOverflowError as expected.  Running without the proccontrol line yields a crash as it did previously.
Comment 18 commit-hook freebsd_committer 2019-09-03 18:57:29 UTC
A commit references this bug:

Author: kib
Date: Tue Sep  3 18:56:27 UTC 2019
New revision: 351773
URL: https://svnweb.freebsd.org/changeset/base/351773

Log:
  Add procctl(PROC_STACKGAP_CTL)

  It allows a process to request that stack gap was not applied to its
  stacks, retroactively.  Also it is possible to control the gaps in the
  process after exec.

  PR:	239894
  Reviewed by:	alc
  Sponsored by:	The FreeBSD Foundation
  Differential revision:	https://reviews.freebsd.org/D21352

Changes:
  head/lib/libc/sys/procctl.2
  head/sys/compat/freebsd32/freebsd32_misc.c
  head/sys/kern/kern_exec.c
  head/sys/kern/kern_fork.c
  head/sys/kern/kern_procctl.c
  head/sys/sys/proc.h
  head/sys/sys/procctl.h
  head/sys/vm/vm_map.c
Comment 19 commit-hook freebsd_committer 2019-09-03 18:59:31 UTC
A commit references this bug:

Author: kib
Date: Tue Sep  3 18:58:48 UTC 2019
New revision: 351774
URL: https://svnweb.freebsd.org/changeset/base/351774

Log:
  Add stackgap control mode to proccontrol(1).

  PR:	239894
  Reviewed by:	alc
  Sponsored by:	The FreeBSD Foundation
  MFC after:	1 week
  Differential revision:	https://reviews.freebsd.org/D21352

Changes:
  head/usr.bin/proccontrol/proccontrol.c
Comment 20 Konstantin Belousov freebsd_committer 2019-09-04 20:39:47 UTC
After r351773, you can add the following fragment at the beginning of the jvm initialization.  It is safe to ignore errors from procctl(2), which means that the kernel is old and stack overflow detection would be still broken.

	int arg = PROC_STACKGAP_DISABLE | PROC_STACKGAP_ENABLE_EXEC;
	procctl(P_PID, getpid(), PROC_STACKGAP_CTL, &arg);
Comment 21 Greg Lewis freebsd_committer 2019-09-06 03:25:41 UTC
Thanks for the update Konstantin.  Can I take it from the MFC timeline that you are intending to merge this back to 12-STABLE prior to the branch for 12.1?
Comment 22 Konstantin Belousov freebsd_committer 2019-09-06 07:45:50 UTC
(In reply to Greg Lewis from comment #21)
Yes, it should be in stable/12 before 12.1.

I think it will be merged to 11 as well, unless it appeared to be too complicated.
Comment 23 Greg Lewis freebsd_committer 2019-09-06 15:04:12 UTC
Thanks, that is good to hear.
Comment 24 commit-hook freebsd_committer 2019-09-10 06:46:39 UTC
A commit references this bug:

Author: kib
Date: Tue Sep 10 06:45:46 UTC 2019
New revision: 352117
URL: https://svnweb.freebsd.org/changeset/base/352117

Log:
  MFC r351773:
  Add procctl(PROC_STACKGAP_CTL).

  PR:	239894

Changes:
_U  stable/12/
  stable/12/lib/libc/sys/procctl.2
  stable/12/sys/compat/freebsd32/freebsd32_misc.c
  stable/12/sys/kern/kern_exec.c
  stable/12/sys/kern/kern_fork.c
  stable/12/sys/kern/kern_procctl.c
  stable/12/sys/sys/proc.h
  stable/12/sys/sys/procctl.h
  stable/12/sys/vm/vm_map.c
Comment 25 commit-hook freebsd_committer 2019-09-10 06:48:42 UTC
A commit references this bug:

Author: kib
Date: Tue Sep 10 06:47:40 UTC 2019
New revision: 352118
URL: https://svnweb.freebsd.org/changeset/base/352118

Log:
  MFC r351774:
  Add stackgap control mode to proccontrol(1).

  PR:	239894

Changes:
_U  stable/12/
  stable/12/usr.bin/proccontrol/proccontrol.c
Comment 26 commit-hook freebsd_committer 2019-09-10 07:29:47 UTC
A commit references this bug:

Author: kib
Date: Tue Sep 10 07:29:23 UTC 2019
New revision: 352125
URL: https://svnweb.freebsd.org/changeset/base/352125

Log:
  MFC r351773:
  Add procctl(PROC_STACKGAP_CTL).

  PR:	239894

Changes:
_U  stable/11/
  stable/11/lib/libc/sys/procctl.2
  stable/11/sys/compat/freebsd32/freebsd32_misc.c
  stable/11/sys/kern/kern_exec.c
  stable/11/sys/kern/kern_fork.c
  stable/11/sys/kern/kern_procctl.c
  stable/11/sys/sys/proc.h
  stable/11/sys/sys/procctl.h
  stable/11/sys/vm/vm_map.c
Comment 27 commit-hook freebsd_committer 2019-09-10 09:57:59 UTC
A commit references this bug:

Author: kib
Date: Tue Sep 10 09:57:25 UTC 2019
New revision: 352133
URL: https://svnweb.freebsd.org/changeset/base/352133

Log:
  MFC r351774:
  Add stackgap control mode to proccontrol(1).

  PR:	239894

Changes:
_U  stable/11/
  stable/11/usr.bin/proccontrol/proccontrol.c