I think this is what it boils down to- qemu creates detached threads in __constructor__ functions. When compiled statically with clang, maybe 40% of the time I hit an SSP failure in one of these threads. My current theory is that these threads spawn and race against __guard_setup. I suspect that the SSP cookie is setup as the default, then later compared after __guard_setup has run and the cookie has actually been initialized, then we get a "failure" and qemu dies. LLVM does a load from __stack_chk_guard, and that seems to just be a plain ol' global, so the race seems plausible.
CC a couple of people that may be able to assist- kib@ and jilles@ being generally knowledgeable about libc stuff.
OpenBSD has an (?)interesting solution to this problem: they hack in an ".openbsd.randomdata" section to their ELF objects and initialize it before an object's ctors run. (Of course, __stack_chk_guard is placed in that section.) I think this could also be solved by ensuring the __guard_setup ctor runs first, in some implementation-defined way.
What is the object that contains constructors creating the threads ? Rtld runs user initializers in the reverse-dep order, i.e. the libraries not having dependencies are initialized first, then libs that directly depend on already initialized libraries, and the main object is processed last. libc normally must be linked as the dependency for everything, so it naturally appears at the root of the dependency graph and initialized first. If not, it probably means that the build system underlinks.
Ah, I left out an important detail- this only occurs in the statically-linked binaries we produce to make qemu-bsd-user useful/easy to use in jails. Sorry about that. =-(
(In reply to Kyle Evans from comment #4) There is a priority for constructors. Try to specify some for app-provided constructors.
(In reply to Konstantin Belousov from comment #5) Huh, today I learned about __constructor__ priority, too. =-) It looks like it's really only one __constructor__ creating a thread, and that constructor is using whatever the toolchain default is: https://github.com/qemu/qemu/blob/master/util/rcu.c#L376 I guess this is the same as __guard_setup, https://svnweb.freebsd.org/base/head/lib/libc/secure/stack_protector.c?revision=332940&view=markup#l47 -> I can bump the priority on qemu's to 200 and that sufficiently works around the problem, but it seems like __guard_setup should be specifically lower than the default.
Created attachment 209102 [details] minimal reproducer-ish Attaching a minimal producer- obviously your mileage may vary, compiled with " cc -fstack-protector-all -O0 -static -pthread repro.c" for optimal effect, then ran in a loop `for i in $(seq 1 100); do ./a.out; done` because I wasn't sure how to tune it more effectively when I'm trying to get it to race against a constructor in libc. Adding more runtime to the misnamed ssp_canary_killer would also help reproduce, but that takes more energy.
A commit references this bug: Author: kevans Date: Wed Nov 13 02:14:17 UTC 2019 New revision: 354669 URL: https://svnweb.freebsd.org/changeset/base/354669 Log: ssp: add a priority to the __stack_chk_guard constructor First, this commit is a NOP on GCC <= 4.x; this decidedly doesn't work cleanly on GCC 4.2, and it will be gone soon anyways so I chose not to dump time into figuring out if there's a way to make it work. xtoolchain-gcc, clocking in as GCC6, can cope with it just fine and later versions are also generally ok with the syntax. I suspect very few users are running GCC4.2 built worlds and also experiencing potential fallout from the status quo. For dynamically linked applications, this change also means very little. rtld will run libc ctors before most others, so the situation is approximately a NOP for these as well. The real cause for this change is statically linked applications doing almost questionable things in their constructors. qemu-user-static, for instance, creates a thread in a global constructor for their async rcu callbacks. In general, this works in other places- - On OpenBSD, __stack_chk_guard is stored in an .openbsd.randomdata section that's initialized by the kernel in the static case, or ld.so in the dynamic case - On Linux, __stack_chk_guard is apparently stored in TLS and such a problem is circumvented there because the value is presumed stable in the new thread. On FreeBSD, the rcu thread creation ctor and __guard_setup are both unmarked priority. qemu-user-static spins up the rcu thread prior to __guard_setup which starts making function calls- some of these are sprinkled with the canary. In the middle of one of these functions, __guard_setup is invoked in the main thread and __stack_chk_guard changes- qemu-user-static is promptly terminated for an SSP violation that didn't actually happen. This is not an all-too-common problem. We circumvent it here by giving the __stack_chk_guard constructor a solid priority. 200 was chosen because that gives static applications ample range (down to 101) for working around it if they really need to. I suspect most applications will "just work" as expected- the default/non-prioritized flavor of __constructor__ functions run last, and the canary is generally not expected to change as of this point at the very least. This took approximately three weeks of spare time debugging to pin down. PR: 241905 Changes: head/lib/libc/secure/stack_protector.c
^Triage: * Assign to committer resolving * Merging?
A commit references this bug: Author: kevans Date: Mon Nov 25 03:49:39 UTC 2019 New revision: 355080 URL: https://svnweb.freebsd.org/changeset/base/355080 Log: MFC r354669, r354672, r354689: move __stack_chk_guard constructor r354669: ssp: add a priority to the __stack_chk_guard constructor First, this commit is a NOP on GCC <= 4.x; this decidedly doesn't work cleanly on GCC 4.2, and it will be gone soon anyways so I chose not to dump time into figuring out if there's a way to make it work. xtoolchain-gcc, clocking in as GCC6, can cope with it just fine and later versions are also generally ok with the syntax. I suspect very few users are running GCC4.2 built worlds and also experiencing potential fallout from the status quo. For dynamically linked applications, this change also means very little. rtld will run libc ctors before most others, so the situation is approximately a NOP for these as well. The real cause for this change is statically linked applications doing almost questionable things in their constructors. qemu-user-static, for instance, creates a thread in a global constructor for their async rcu callbacks. In general, this works in other places- - On OpenBSD, __stack_chk_guard is stored in an .openbsd.randomdata section that's initialized by the kernel in the static case, or ld.so in the dynamic case - On Linux, __stack_chk_guard is apparently stored in TLS and such a problem is circumvented there because the value is presumed stable in the new thread. On FreeBSD, the rcu thread creation ctor and __guard_setup are both unmarked priority. qemu-user-static spins up the rcu thread prior to __guard_setup which starts making function calls- some of these are sprinkled with the canary. In the middle of one of these functions, __guard_setup is invoked in the main thread and __stack_chk_guard changes- qemu-user-static is promptly terminated for an SSP violation that didn't actually happen. This is not an all-too-common problem. We circumvent it here by giving the __stack_chk_guard constructor a solid priority. 200 was chosen because that gives static applications ample range (down to 101) for working around it if they really need to. I suspect most applications will "just work" as expected- the default/non-prioritized flavor of __constructor__ functions run last, and the canary is generally not expected to change as of this point at the very least. This took approximately three weeks of spare time debugging to pin down. r354672: ssp: rework the logic to use priority=200 on clang builds The preproc logic was added at the last minute to appease GCC 4.2, and kevans@ did clearly not go back and double-check that the logic worked out for clang builds to use the new variant. It turns out that clang defines __GNUC__ == 4. Flip it around and check __clang__ as well, leaving a note to remove it later. r354689: ssp: further refine the conditional used for constructor priority __has_attribute(__constructor__) is a better test for clang than defined(__clang__). Switch to it instead. While we're already here and touching it, pfg@ nailed down when GCC actually introduced the priority argument -- 4.3. Use that instead of our hammer-guess of GCC >= 5 for the sake of correctness. PR: 241905 Changes: _U stable/11/ stable/11/lib/libc/secure/stack_protector.c _U stable/12/ stable/12/lib/libc/secure/stack_protector.c