Bug 241905 - SSP setup is not thread-safe ?
Summary: SSP setup is not thread-safe ?
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: misc (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: Kyle Evans
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-11-12 05:20 UTC by Kyle Evans
Modified: 2019-11-25 03:51 UTC (History)
5 users (show)

See Also:
kevans: mfc-stable12+
kevans: mfc-stable11+


Attachments
minimal reproducer-ish (656 bytes, text/plain)
2019-11-12 14:35 UTC, Kyle Evans
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Kyle Evans freebsd_committer freebsd_triage 2019-11-12 05:20:40 UTC
I think this is what it boils down to- qemu creates detached threads in __constructor__ functions. When compiled statically with clang, maybe 40% of the time I hit an SSP failure in one of these threads.

My current theory is that these threads spawn and race against __guard_setup. I suspect that the SSP cookie is setup as the default, then later compared after __guard_setup has run and the cookie has actually been initialized, then we get a "failure" and qemu dies.

LLVM does a load from __stack_chk_guard, and that seems to just be a plain ol' global, so the race seems plausible.
Comment 1 Kyle Evans freebsd_committer freebsd_triage 2019-11-12 05:23:50 UTC
CC a couple of people that may be able to assist- kib@ and jilles@ being generally knowledgeable about libc stuff.
Comment 2 Conrad Meyer freebsd_committer freebsd_triage 2019-11-12 06:51:00 UTC
OpenBSD has an (?)interesting solution to this problem: they hack in an ".openbsd.randomdata" section to their ELF objects and initialize it before an object's ctors run.  (Of course, __stack_chk_guard is placed in that section.)

I think this could also be solved by ensuring the __guard_setup ctor runs first, in some implementation-defined way.
Comment 3 Konstantin Belousov freebsd_committer freebsd_triage 2019-11-12 12:45:34 UTC
What is the object that contains constructors creating the threads ?  Rtld runs user initializers in the reverse-dep order, i.e. the libraries not having dependencies are initialized first, then libs that directly depend on already initialized libraries, and the main object is processed last.

libc normally must be linked as the dependency for everything, so it naturally appears at the root of the dependency graph and initialized first.  If not, it probably means that the build system underlinks.
Comment 4 Kyle Evans freebsd_committer freebsd_triage 2019-11-12 12:48:49 UTC
Ah, I left out an important detail- this only occurs in the statically-linked binaries we produce to make qemu-bsd-user useful/easy to use in jails. Sorry about that. =-(
Comment 5 Konstantin Belousov freebsd_committer freebsd_triage 2019-11-12 13:05:00 UTC
(In reply to Kyle Evans from comment #4)
There is a priority for constructors.  Try to specify some for app-provided constructors.
Comment 6 Kyle Evans freebsd_committer freebsd_triage 2019-11-12 13:12:04 UTC
(In reply to Konstantin Belousov from comment #5)

Huh, today I learned about __constructor__ priority, too. =-)

It looks like it's really only one __constructor__ creating a thread, and that constructor is using whatever the toolchain default is: https://github.com/qemu/qemu/blob/master/util/rcu.c#L376

I guess this is the same as __guard_setup, https://svnweb.freebsd.org/base/head/lib/libc/secure/stack_protector.c?revision=332940&view=markup#l47 -> I can bump the priority on qemu's to 200 and that sufficiently works around the problem, but it seems like __guard_setup should be specifically lower than the default.
Comment 7 Kyle Evans freebsd_committer freebsd_triage 2019-11-12 14:35:16 UTC
Created attachment 209102 [details]
minimal reproducer-ish

Attaching a minimal producer- obviously your mileage may vary, compiled with " cc -fstack-protector-all -O0 -static -pthread repro.c" for optimal effect, then ran in a loop `for i in $(seq 1 100); do ./a.out; done` because I wasn't sure how to tune it more effectively when I'm trying to get it to race against a constructor in libc. Adding more runtime to the misnamed ssp_canary_killer would also help reproduce, but that takes more energy.
Comment 8 commit-hook freebsd_committer freebsd_triage 2019-11-13 02:15:10 UTC
A commit references this bug:

Author: kevans
Date: Wed Nov 13 02:14:17 UTC 2019
New revision: 354669
URL: https://svnweb.freebsd.org/changeset/base/354669

Log:
  ssp: add a priority to the __stack_chk_guard constructor

  First, this commit is a NOP on GCC <= 4.x; this decidedly doesn't work
  cleanly on GCC 4.2, and it will be gone soon anyways so I chose not to dump
  time into figuring out if there's a way to make it work. xtoolchain-gcc,
  clocking in as GCC6, can cope with it just fine and later versions are also
  generally ok with the syntax. I suspect very few users are running GCC4.2
  built worlds and also experiencing potential fallout from the status quo.

  For dynamically linked applications, this change also means very little.
  rtld will run libc ctors before most others, so the situation is
  approximately a NOP for these as well.

  The real cause for this change is statically linked applications doing
  almost questionable things in their constructors. qemu-user-static, for
  instance, creates a thread in a global constructor for their async rcu
  callbacks. In general, this works in other places-

  - On OpenBSD, __stack_chk_guard is stored in an .openbsd.randomdata section
    that's initialized by the kernel in the static case, or ld.so in the
    dynamic case
  - On Linux, __stack_chk_guard is apparently stored in TLS and such a problem
    is circumvented there because the value is presumed stable in the new
    thread.

  On FreeBSD, the rcu thread creation ctor and __guard_setup are both unmarked
  priority. qemu-user-static spins up the rcu thread prior to __guard_setup
  which starts making function calls- some of these are sprinkled with the
  canary. In the middle of one of these functions, __guard_setup is invoked in
  the main thread and __stack_chk_guard changes- qemu-user-static is promptly
  terminated for an SSP violation that didn't actually happen.

  This is not an all-too-common problem. We circumvent it here by giving the
  __stack_chk_guard constructor a solid priority. 200 was chosen because that
  gives static applications ample range (down to 101) for working around it
  if they really need to. I suspect most applications will "just work" as
  expected- the default/non-prioritized flavor of __constructor__ functions
  run last, and the canary is generally not expected to change as of this
  point at the very least.

  This took approximately three weeks of spare time debugging to pin down.

  PR:		241905

Changes:
  head/lib/libc/secure/stack_protector.c
Comment 9 Kubilay Kocak freebsd_committer freebsd_triage 2019-11-25 01:10:59 UTC
^Triage:

* Assign to committer resolving
* Merging?
Comment 10 commit-hook freebsd_committer freebsd_triage 2019-11-25 03:50:17 UTC
A commit references this bug:

Author: kevans
Date: Mon Nov 25 03:49:39 UTC 2019
New revision: 355080
URL: https://svnweb.freebsd.org/changeset/base/355080

Log:
  MFC r354669, r354672, r354689: move __stack_chk_guard constructor

  r354669:
  ssp: add a priority to the __stack_chk_guard constructor

  First, this commit is a NOP on GCC <= 4.x; this decidedly doesn't work
  cleanly on GCC 4.2, and it will be gone soon anyways so I chose not to dump
  time into figuring out if there's a way to make it work. xtoolchain-gcc,
  clocking in as GCC6, can cope with it just fine and later versions are also
  generally ok with the syntax. I suspect very few users are running GCC4.2
  built worlds and also experiencing potential fallout from the status quo.

  For dynamically linked applications, this change also means very little.
  rtld will run libc ctors before most others, so the situation is
  approximately a NOP for these as well.

  The real cause for this change is statically linked applications doing
  almost questionable things in their constructors. qemu-user-static, for
  instance, creates a thread in a global constructor for their async rcu
  callbacks. In general, this works in other places-

  - On OpenBSD, __stack_chk_guard is stored in an .openbsd.randomdata section
    that's initialized by the kernel in the static case, or ld.so in the
    dynamic case
  - On Linux, __stack_chk_guard is apparently stored in TLS and such a problem
    is circumvented there because the value is presumed stable in the new
    thread.

  On FreeBSD, the rcu thread creation ctor and __guard_setup are both unmarked
  priority. qemu-user-static spins up the rcu thread prior to __guard_setup
  which starts making function calls- some of these are sprinkled with the
  canary. In the middle of one of these functions, __guard_setup is invoked in
  the main thread and __stack_chk_guard changes- qemu-user-static is promptly
  terminated for an SSP violation that didn't actually happen.

  This is not an all-too-common problem. We circumvent it here by giving the
  __stack_chk_guard constructor a solid priority. 200 was chosen because that
  gives static applications ample range (down to 101) for working around it
  if they really need to. I suspect most applications will "just work" as
  expected- the default/non-prioritized flavor of __constructor__ functions
  run last, and the canary is generally not expected to change as of this
  point at the very least.

  This took approximately three weeks of spare time debugging to pin down.

  r354672:
  ssp: rework the logic to use priority=200 on clang builds

  The preproc logic was added at the last minute to appease GCC 4.2, and
  kevans@ did clearly not go back and double-check that the logic worked out
  for clang builds to use the new variant.

  It turns out that clang defines __GNUC__ == 4. Flip it around and check
  __clang__ as well, leaving a note to remove it later.

  r354689:
  ssp: further refine the conditional used for constructor priority

  __has_attribute(__constructor__) is a better test for clang than
  defined(__clang__). Switch to it instead.

  While we're already here and touching it, pfg@ nailed down when GCC actually
  introduced the priority argument -- 4.3. Use that instead of our
  hammer-guess of GCC >= 5 for the sake of correctness.

  PR:		241905

Changes:
_U  stable/11/
  stable/11/lib/libc/secure/stack_protector.c
_U  stable/12/
  stable/12/lib/libc/secure/stack_protector.c