Bug 262192 - Crashes at boot with kern.random.initial_seeding.bypass_before_seeding=0 in randomdev_wait_until_seeded()
Summary: Crashes at boot with kern.random.initial_seeding.bypass_before_seeding=0 in randomdev_wait_until_seeded()
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: Any
OS: Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2022-02-25 13:20 UTC by Olivier Certner
Modified: 2022-03-18 03:55 UTC
CC List: 3 users

See Also:
Flags: koobs: mfc-stable13?


Description Olivier Certner 2022-02-25 13:20:50 UTC
If kern.random.initial_seeding.bypass_before_seeding is set to 0, I get an instant panic when booting 13-STABLE (0add00229d540ef8a98b) in VirtualBox.

Here is part of the message (copied by hand; if you want more details, please ask):

panic: page fault

--- trap 0xc
_thread_lock()
sleepq_add()
_sleep()
randomdev_wait_until_seeded()
read_random()
chacha20_randomstir()
arc4rand()
__stack_chk_init()
mi_startup()
btext()
Comment 1 Olivier Certner 2022-02-25 13:23:51 UTC
Correction: It is on a modified kernel on top of the above-mentioned revision, with kern.random.initial_seeding.bypass_before_seeding set to 1 in the code directly (random_bypass_before_seeding set to true in sys/dev/random/random_infra.c).
Comment 2 Olivier Certner 2022-02-25 13:37:02 UTC
Sorry again, too tired...

kern.random.initial_seeding.bypass_before_seeding set to *0* (random_bypass_before_seeding set to *false* in sys/dev/random/random_infra.c)
Comment 3 Robert Wing 2022-02-25 17:57:12 UTC
For what it's worth, on stable/13 commit 0add00229d54, I wasn't able to reproduce this in a bhyve VM when setting kern.random.initial_seeding.bypass_before_seeding=0 in /boot/loader.conf.
Comment 4 Conrad Meyer 2022-02-26 00:39:12 UTC
I'm not really sure why tsleep() is panicking at that point.  Presumably that initialization happens later in mi_startup.

Your virtual hardware plus operating system configuration is resulting in a system that is not seeded by the time mi_startup hits __stack_chk_init (SI_SUB_RANDOM:SI_ORDER_ANY).  This means that the system did not have any bootloader entropy (SI_SUB_RANDOM:SI_ORDER_FOURTH).

__stack_chk_init happens very early, before random_kthread can run.  So the system hasn't polled any "devices" for entropy, including RDRAND.

First order fix is to figure out why you don't have /boot/entropy (or whatever) and fix that.  (Most VMs don't have this problem.)  Second order workaround is to revert e199792d23341b0a887bf54c262147b213edd556 and set security.stack_protect.permit_nonrandom_cookies=1.  This is practically similar to not compiling in stack cookies at all.  Third option is to set kern.random.initial_seeding.bypass_before_seeding=1.
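For concreteness, a minimal /boot/loader.conf sketch of the knobs involved (the entropy_cache_* lines are the stock defaults from /boot/defaults/loader.conf, so for the first option the actual fix is creating/refreshing the file itself):

    # First option: make sure the loader has an entropy file to hand the kernel.
    entropy_cache_load="YES"
    entropy_cache_name="/boot/entropy"

    # Third option: allow the CSPRNG to proceed unseeded before first seeding.
    kern.random.initial_seeding.bypass_before_seeding=1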

Could this be better?  Sure.  chacha20_randomstir() and/or read_random() could attempt to manually poll available random sources (assuming any are loaded and available!) during early boot if not yet seeded, or fetch jitter entropy (maybe not this early, though -- if tsleep is broken, jitter might be as well).  Kyle Evans has expressed some interest in working on the latter.  tsleep() could detect early boot and not panic, instead of taking whatever uninitialized lock it's taking.  I don't plan to do any of that.
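To sketch the tsleep idea (hypothetical and untested; `cold' is the kernel's existing early-boot flag, while the function name and wait channel below are placeholders I invented for illustration):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/kernel.h>
    #include <sys/random.h>

    static int random_wait_chan;	/* placeholder wait channel */

    /*
     * Hypothetical sketch, not a patch: refuse to sleep while the kernel is
     * still cold (scheduler not running yet), so the failure mode is a clear
     * panic message rather than a page fault in the sleepqueue code.
     */
    static void
    wait_until_seeded_sketch(void)
    {
    	while (!is_random_seeded()) {
    		if (cold)
    			panic("randomdev: unseeded before scheduler; provide "
    			    "loader entropy or set kern.random."
    			    "initial_seeding.bypass_before_seeding=1");
    		tsleep(&random_wait_chan, PCATCH, "randseed", hz / 10);
    	}
    }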
Comment 5 Olivier Certner 2022-02-28 14:59:34 UTC
(In reply to Conrad Meyer from comment #4)

Hi Conrad,

> First order fix is to figure out why you don't have /boot/entropy
> (or whatever) and fix that.  (Most VMs don't have this problem.)

Thanks for pointing this out. The image is entirely custom, and indeed I didn't generate an entropy file for boot.

I could create "/boot/entropy" to avoid the panic, but I fear that this "solution" could actually threaten security in the following use case: a read-only image (VM, CD-ROM, or USB) that is always used to boot a machine.

In this case, the same "/boot/entropy" will be used over and over, leading to the same guard at each boot (similar to the static guard used in the code before e199792d23341b0a; the only slight advantage is that, unlike a static guard, it would not be known in advance by simple source-code inspection). Worse, the guard consumes only part of that entropy, so later subsystems using arc4rand() will also draw the same bytes boot after boot, until no more entropy is available and the generator has to be reseeded.

If my reasoning is correct, this means that providing a constant "/boot/entropy" will effectively, for the first routines needing random numbers, bypass the security offered by kern.random.initial_seeding.bypass_before_seeding=0. I don't know the practical extent of the problem (who requests random bytes at boot, and for what), but it is a priori worrying.

>Second order workaround is to revert e199792d23341b0a887bf54c262147b213edd556
> and set security.stack_protect.permit_nonrandom_cookies=1.
> This is practically similar to not compiling in stack cookies at all.

At least, this solution avoids these concerns, at the price of a static guard. I agree that this means no stack cookies at all, but only from the perspective of a deliberate attack (the mechanism stays useful for uncovering stack-overflow bugs), and only provided I don't change the hardcoded value myself (otherwise the canary would still have to be discovered once, just as when the same "/boot/loader.conf" is used over and over; this is weaker than the current vanilla randomization).

> Third option is to set kern.random.initial_seeding.bypass_before_seeding=1.

Yes, but that's precisely what I've been trying to avoid from the start. I want the random source to always be seeded; if it is not then the system has to wait until there is enough entropy.

> Could this be better?  Sure. (snip)

Interesting possibilities to look at. Thanks. I may look at that (not before several weeks, unfortunately) if Kyle doesn't in the meantime.

Don't know the inner details of SSP, but is the same guard used for all processes/threads? I assume that __stack_chk_guard is salted with specific process/thread info, or some other random source. So it might be possible to live with a static guard initially (affecting only some kernel processes) and then change it once entropy is available (for all later processes/threads). Do you think this is a viable idea?
Comment 6 Conrad Meyer 2022-02-28 17:37:08 UTC
(In reply to Olivier Certner from comment #5)

Hi Olivier,

Yes, the CSPRNG subsystem is not really designed to be usable from very early in boot with a read-only image, and that leads to the various problems you accurately describe.

As far as uncovering stack overflow bugs: doesn't a system without stack cookies also work to uncover stack overflow bugs?  Most of the time, accidental corruption of the return address will also crash the process.

> Don't know the inner details of SSP, but is the same guard used for all processes/threads?

The initialization described in this bug is only for the kernel's stack cookies.  The kernel is essentially a privileged process that lives for the entire boot.  As far as I know, there is no way to safely change the stack guard cookie values of the running kernel.  (I imagine you would have to suspend all cores, including interrupts, and walk all thread stacks, rewriting the cookies.  Or add a layer of indirection to stack check failures.)

Userspace initializes __stack_chk_guard in lib/libc/secure/stack_protector.c, from the AT_CANARY auxinfo.  Auxinfo is initialized in sys/kern/imgact_elf.c from imgp->canary.  For FreeBSD processes (Linuxemul differs), canary is initialized in sys/kern/kern_exec.c by arc4rand(9).

In short, userspace processes are seeded with their own stack guards based on the best random available when they are started -- not a clone of the kernel's stack guards.  (Intuitively, leaking the kernel stack guards to userspace processes would kind of defeat the point of having unpredictable kernel stack guards.  And shared userspace stack guards between processes would also somewhat defeat the point of having unpredictable stack guards.)
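For reference, a from-memory paraphrase of libc's side (hedged; consult the real lib/libc/secure/stack_protector.c for the exact logic, which uses the internal _elf_aux_info rather than the public wrapper shown here):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <sys/auxv.h>
    #include <elf.h>
    #include <string.h>

    /* From-memory paraphrase of lib/libc/secure/stack_protector.c. */
    extern long __stack_chk_guard[8];

    static void guard_setup(void) __attribute__((__constructor__));

    static void
    guard_setup(void)
    {
    	static const int mib[2] = { CTL_KERN, KERN_ARND };
    	size_t len;

    	if (__stack_chk_guard[0] != 0)
    		return;
    	/* Preferred: the per-process canary the kernel put in auxinfo. */
    	if (elf_aux_info(AT_CANARY, __stack_chk_guard,
    	    sizeof(__stack_chk_guard)) == 0 && __stack_chk_guard[0] != 0)
    		return;
    	/* Fallback: pull random bytes straight from the kernel. */
    	len = sizeof(__stack_chk_guard);
    	if (sysctl(mib, 2, __stack_chk_guard, &len, NULL, 0) == -1)
    		memset(__stack_chk_guard, 0xff, sizeof(__stack_chk_guard));
    }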

I think the most satisfying directions for you to pursue are likely going to be (1) static kernel stack guards, if you can live with that and if that is the only early random request blocking boot or (2) implementing early on-demand seeding in one of the ways discussed in comment #4.

Best,
Conrad
Comment 7 Olivier Certner 2022-03-04 11:16:30 UTC
(In reply to Conrad Meyer from comment #6)

Hi Conrad,

> As far as uncovering stack overflow bugs: doesn't a system without stack
> cookies also work to uncover stack overflow bugs?  Most of the time,
> accidental corruption of the return address will also crash the process.

Yes, you're right most of the time, but more subtle forms of corruption could occur and create later problems or crashes, which are more difficult to debug. So yes, I agree it's a small advantage, but it is one nonetheless.

> The initialization described in this bug is only for the kernel's stack
> cookies.  The kernel is essentially a privileged process that lives for
> the entire boot.  As far as I know, there is no way to safely change the
> stack guard cookie values of the running kernel.

Yes. I was just thinking about starting with whatever canary (static or not) and then, later on, having new kernel threads use a new random one. That's just speculation at this point; I'd have to dive into the code to see whether it's realistic. I'm now not sure it's even worth it; fixing the original limitation may be a better investment anyway.

Thanks a lot for the info you provided, it gives some starting points to better understand SSP in FreeBSD.

Indeed, I'll probably choose (1) for now, since it is so easy to do, and come back to (2) when I have enough time. Leaving this bug open for now.
Comment 8 Olivier Certner 2022-03-09 16:54:10 UTC
(In reply to Olivier Certner from comment #7)

Unfortunately, (1) is not enough... I only get "random: randomdev_wait_until_seeded unblock wait" messages forever. Indeed, network domain initialization happens at level SI_SUB_PROTO_DOMAINS, whereas the "random_harvestq" thread is launched later (at level SI_SUB_KICK_SCHEDULER). This is true in CURRENT as well.
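For reference, the launch point looks roughly like this in the tree (a from-memory sketch; treat the exact identifiers as approximate):

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/kthread.h>
    #include <sys/proc.h>

    static void random_kthread(void);	/* the harvesting loop (body elsewhere) */
    static struct proc *random_kthread_proc;

    /*
     * From-memory sketch, identifiers approximate: the harvest kthread is
     * only kicked off at SI_SUB_KICK_SCHEDULER, so anything that blocks on
     * seeding during domain init (SI_SUB_PROTO_DOMAINS) waits on entropy
     * that nobody is gathering yet.
     */
    static struct kproc_desc random_proc_kp = {
    	"rand_harvestq",
    	random_kthread,
    	&random_kthread_proc,
    };
    SYSINIT(random_device_h_proc, SI_SUB_KICK_SCHEDULER, SI_ORDER_ANY,
        kproc_start, &random_proc_kp);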

So I guess I must live with constant boot entropy for now.
Comment 9 Conrad Meyer 2022-03-09 17:09:07 UTC
> Indeed, network domain initialization happens at level SI_SUB_PROTO_DOMAINS, whereas the "random_harvestq" thread is launched later (at level SI_SUB_KICK_SCHEDULER).

Yeah, that's unfortunate.  I don't think domain registration actually needs random?  But there is a lot of crap in SI_SUB_PROTO_DOMAIN that isn't DOMAIN_SET().  If you want to pursue it, identifying the stack(s) blocking on random and moving them after KICK_SCHEDULER would be a valuable contribution to FreeBSD.
Comment 10 Olivier Certner 2022-03-09 21:35:21 UTC
(In reply to Conrad Meyer from comment #9)

Forgot to mention the example of domain init causing a call to arc4rand() that I stumbled upon: ip_init => ip_reass, which initializes a hash seed used to hash fragments. I suspect the goal here is to make it hard for an attacker to predict which fragments end up in which bucket, so that they cannot degrade the hash table's access performance without a more involved attack. This could probably be avoided by using another, more complex data structure. Maybe simply delaying this seed's initialization is possible.
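To illustrate the idea, a hypothetical sketch of a keyed fragment hash (not the actual ipreass code; the function, fields, and mixing constant are all invented for illustration):

    #include <sys/types.h>

    /*
     * Hypothetical sketch, not the in-tree ipreass code: mixing a boot-time
     * random seed into the bucket computation keeps an attacker from
     * precomputing fragments that all land in one bucket.  The constant is
     * Knuth's multiplicative-hash constant, chosen arbitrarily here.
     */
    static uint32_t ipq_hashseed;	/* filled once from arc4random() */

    static u_int
    ipq_bucket(uint32_t src, uint32_t dst, uint16_t id, u_int nbuckets)
    {
    	uint32_t h;

    	h = ipq_hashseed;
    	h = (h ^ src) * 2654435761U;
    	h = (h ^ dst) * 2654435761U;
    	h = (h ^ id) * 2654435761U;
    	return (h % nbuckets);
    }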

> If you want to pursue it, identifying the stack(s) blocking on random and
> moving them after KICK_SCHEDULER would be a valuable contribution to FreeBSD.

I'll try to pursue that, indeed, by recompiling a kernel with a deterministic frag seed and seeing what other calls to random exist. In the end, it might not be possible to easily push calls to random after KICK_SCHEDULER without more involved changes. We'll see.

I don't have much time now, but expect to have a lot in roughly two months. Then, the ability to boot without an entropy seed file should be one of my main priorities. In the meantime, I'll report about experiments here.

Thanks.
Comment 11 Conrad Meyer 2022-03-09 22:37:05 UTC
(In reply to Olivier Certner from comment #10)
> ip_reass_init

Cool.  Yeah, I suspect the hash seed could be initialized later in boot.

> I'll try to pursue that

Awesome!

> I'll report about experiments here.

Thanks.  Here is fine, or feel free to reach out to me or csprng@ directly.  Either way.
Comment 12 Olivier Certner 2022-03-17 23:10:49 UTC
FYI, here are the stacks of all the calls to arc4rand(om) (the arc4rand frames themselves omitted) that I see at boot on a custom kernel (which resembles GENERIC, but notably excludes IPv6):

__stack_chk_init
vnet_register_sysinit -> domain_init -> ip_init -> ipreass_init
vnet_register_sysinit -> domain_init -> tcp_init -> syncache_init
vnet_register_sysinit -> domain_init -> tcp_init -> tcp_hc_init
vnet_register_sysinit -> domain_init -> tcp_init -> tcp_fastopen_init
vnet_register_sysinit -> ipid_sysinit
fork_trampoline -> fork_exit -> start_init -> vfs_mountroot -> vfs_mount_alloc
fork_trampoline -> fork_exit -> start_init -> kern_execve -> exec_copyout_strings

After patching all these occurrences, the system finishes booting and seems functional.
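All the patches follow roughly the same lazy pattern; an illustrative sketch (not the actual diff; the seed variable and accessor are invented for illustration, while is_random_seeded() and arc4random() are the stock kernel interfaces):

    #include <sys/param.h>
    #include <sys/libkern.h>
    #include <sys/random.h>

    /*
     * Illustrative sketch of the deferral pattern, not the actual diff:
     * take the seed lazily, and only once the CSPRNG reports that real
     * entropy has arrived.
     */
    static uint32_t hashseed;
    static bool hashseed_valid;

    static uint32_t
    get_hashseed(void)
    {
    	if (!hashseed_valid && is_random_seeded()) {
    		hashseed = arc4random();
    		hashseed_valid = true;
    	}
    	return (hashseed);	/* deterministic 0 until seeded */
    }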
Comment 13 Olivier Certner 2022-03-18 00:06:18 UTC
This is on a relatively recent 13-STABLE. I'll repeat this process with CURRENT's GENERIC kernel when possible.

I'm wondering whether it is possible, or even desirable, to initialize the seeds later on. I haven't yet taken the time to figure out at which point in the boot TCP connections may be established, and I'm not sure it isn't before KICK_SCHEDULER. Moreover, splitting the initialization code is a cognitive burden, so it would be best to avoid it if possible.

The last two stacks above correspond to random values that are generated each time a new FS is mounted or a process is started. They happen after KICK_SCHEDULER. However, in my tests, I introduced code to make the kernel panic if, on the first call to the random device, seeding doesn't happen within 10s (to catch the earlier stacks, where no harvesting takes place anyway, so the random calls block indefinitely). And I did get panics for these two stacks as well, so entropy isn't accumulating "fast enough" (it would be interesting to see how long the calls would block before enough entropy is available; waiting tens of seconds might be tolerable at boot in some scenarios, but probably not much more).

If some entropy source could be made available very early, all these considerations and problems would be avoided.
Comment 14 Conrad Meyer 2022-03-18 03:55:05 UTC
Hey Olivier, this is great.  I think it's unlikely we need TCP connections before the scheduler is up.  Splitting the initialization code is not especially burdensome either.

> If some entropy source could be made available very early, all these considerations and problems would be avoided.

As mentioned briefly in comment #4, we could attempt to poll available random sources on demand, before the scheduler comes up.  This only works for built-in sources or drivers loaded by the loader, though.
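A very rough sketch of what that on-demand polling could look like (the FOREACH_RANDOM_SOURCE iterator is invented for illustration; the rs_read and random_harvest_direct interfaces are from memory, so treat them as approximate):

    #include <sys/param.h>
    #include <sys/random.h>
    #include <dev/random/randomdev.h>

    /*
     * Hypothetical sketch, not a patch: walk the registered "pure" sources
     * (RDRAND and friends) before the scheduler exists and feed whatever
     * they return into the pool.  FOREACH_RANDOM_SOURCE is an invented
     * stand-in for however the source list would be iterated.
     */
    static void
    random_early_poll(void)
    {
    	struct random_source *rs;
    	uint8_t buf[32];
    	u_int n;

    	FOREACH_RANDOM_SOURCE(rs) {	/* hypothetical iterator */
    		n = rs->rs_read(buf, sizeof(buf));
    		if (n > 0)
    			random_harvest_direct(buf, n, rs->rs_source);
    	}
    }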