|Summary:||Revisit decision to not block read_random(9) on being seeded|
|Product:||Base System||Reporter:||Conrad Meyer <cem>|
|Component:||kern||Assignee:||Conrad Meyer <cem>|
|Severity:||Affects Only Me||CC:||emaste, jmg, markm, secteam|
Description Conrad Meyer 2018-08-25 00:47:18 UTC
read_random() is used, mostly without error checking, in a lot of very sensitive places in the kernel — including seeding the even more widely used arc4random(9). I don't think this is a sane default. I am not really convinced it makes sense for any use, but at least most uses should block until the system RNG is seeded. Currently none do, and very few check that any random data was actually returned. That is a problem. We should change read_random's return type to 'void' to match most use, and have it block until the RNG is seeded. This is the most expedient way to make the behavior match the name. Optionally, if it is really needed, add a second API, read_random_dangerous_I_know_what_I_am_doing() that allows non-blocking reads, *and tag the return value __result_use_check*. For those few consumers that truly need nonblocking access to RNG during early boot, *and are prepared to deal with not having random data available*, update to use this dangerous API. The tradeoffs here are: Pro: 1. (Especially if the 2nd API is not added,) we have rigorous assurance that no sensitive keying information generated by these APIs is highly predictable. Cons: 1. May block availability of some subsystems if the system does not have boot-time entropy. Counter arguments: I think this is solved in two parts: one, per the current comment in chacha20_randomstir: "The answer is to make sure there is an entropy stash at shutdown time." But that is a bit glib and assumes nothing goes wrong with the filesystem, for example. The second prong is that core system services need to be able to start without RNG available, and initialize keys later when the RNG is available. We should at least be able to proceed to userspace init + rc with a blocked RNG. This should be easy to test with a tunable + sysctl.
Comment 1 Mark Murray 2018-08-25 09:02:42 UTC
(In reply to Conrad Meyer from comment #0) Good analysis. I suspect the second "prong" (start system services) without a seeded CSPRNG is going to be the more difficult, as there is a tendency for lockups when unwise consumers of random(4).
Comment 2 Mark Murray 2018-08-25 11:04:19 UTC
(In reply to Conrad Meyer from comment #0) Give your two-prong analysis, it may be worth the trouble to split this PR into two sub-PRs, or make this depend on two more. The in-kernel API and the startup scripts will need separate treatment.
Comment 3 Conrad Meyer 2018-08-25 21:40:30 UTC
Let's track just the kernel pieces in this PR. We can raise other PRs for intolerant userspace services on an as-needed basis.
Comment 5 commit-hook 2019-04-15 18:41:10 UTC
A commit references this bug: Author: cem Date: Mon Apr 15 18:40:38 UTC 2019 New revision: 346250 URL: https://svnweb.freebsd.org/changeset/base/346250 Log: random(4): Block read_random(9) on initial seeding read_random() is/was used, mostly without error checking, in a lot of very sensitive places in the kernel -- including seeding the widely used arc4random(9). Most uses, especially arc4random(9), should block until the device is seeded rather than proceeding with a bogus or empty seed. I did not spy any obvious kernel consumers where blocking would be inappropriate (in the sense that lack of entropy would be ok -- I did not investigate locking angle thoroughly). In many instances, arc4random_buf(9) or that family of APIs would be more appropriate anyway; that work was done in r345865. A minor cleanup was made to the implementation of the READ_RANDOM function: instead of using a variable-length array on the stack to temporarily store all full random blocks sufficient to satisfy the requested 'len', only store a single block on the stack. This has some benefit in terms of reducing stack usage, reducing memcpy overhead and reducing devrandom output leakage via the stack. Additionally, the stack block is now safely zeroed if it was used. One caveat of this change is that the kern.arandom sysctl no longer returns zero bytes immediately if the random device is not seeded. This means that FreeBSD-specific userspace applications which attempted to handle an unseeded random device may be broken by this change. If such behavior is needed, it can be replaced by the more portable getrandom(2) GRND_NONBLOCK option. On any typical FreeBSD system, entropy is persisted on read/write media and used to seed the random device very early in boot, and blocking is never a problem. This change primarily impacts the behavior of /dev/random on embedded systems with read-only media that do not configure "nodevice random". We toggle the default from 'charge on blindly with no entropy' to 'block indefinitely.' This default is safer, but may cause frustration. Embedded system designers using FreeBSD have several options. The most obvious is to plan to have a small writable NVRAM or NAND to persist entropy, like larger systems. Early entropy can be fed from any loader, or by writing directly to /dev/random during boot. Some embedded SoCs now provide a fast hardware entropy source; this would also work for quickly seeding Fortuna. A 3rd option would be creating an embedded-specific, more simplistic random module, like that designed by DJB in  (this design still requires a small rewritable media for forward secrecy). Finally, the least preferred option might be "nodevice random", although I plan to remove this in a subsequent revision. To help developers emulate the behavior of these embedded systems on ordinary workstations, the tunable kern.random.block_seeded_status was added. When set to 1, it blocks the random device. I attempted to document this change in random.4 and random.9 and ran into a bunch of out-of-date or irrelevant or inaccurate content and ended up rototilling those documents more than I intended to. Sorry. I think they're in a better state now. PR: 230875 Reviewed by: delphij, markm (earlier version) Approved by: secteam(delphij), devrandom(markm) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D19744 Changes: head/share/man/man4/random.4 head/share/man/man9/random.9 head/sys/dev/random/fortuna.c head/sys/dev/random/random_harvestq.c head/sys/dev/random/random_infra.c head/sys/dev/random/randomdev.c head/sys/dev/random/randomdev.h head/sys/kern/kern_mib.c head/sys/libkern/arc4random.c head/sys/sys/random.h
Comment 6 Conrad Meyer 2019-04-15 19:38:45 UTC
I have drafted a patch to remove !DEV_RANDOM entirely and am tinderboxing it now. I'll send that out for review to the same audience later. I think this PR is addressed.
Comment 7 commit-hook 2019-04-18 20:49:19 UTC
A commit references this bug: Author: cem Date: Thu Apr 18 20:48:56 UTC 2019 New revision: 346358 URL: https://svnweb.freebsd.org/changeset/base/346358 Log: random(4): Restore availability tradeoff prior to r346250 As discussed in that commit message, it is a dangerous default. But the safe default causes enough pain on a variety of platforms that for now, restore the prior default. Some of this is self-induced pain we should/could do better about; for example, programmatic CI systems and VM managers should introduce entropy from the host for individual VM instances. This is considered a future work item. On modern x86 and Power9 systems, this may be wholly unnecessary after D19928 lands (even in the non-ideal case where early /boot/entropy is unavailable), because they have fast hardware random sources available early in boot. But D19928 is not yet landed and we have a host of architectures which do not provide fast random sources. This change adds several tunables and diagnostic sysctls, documented thoroughly in UPDATING and sys/dev/random/random_infra.c. PR: 230875 (reopens) Reported by: adrian, jhb, imp, and probably others Reviewed by: delphij, imp (earlier version), markm (earlier version) Discussed with: adrian Approved by: secteam(delphij) Relnotes: yeah Security: related Differential Revision: https://reviews.freebsd.org/D19944 Changes: head/UPDATING head/sys/dev/random/random_infra.c head/sys/dev/random/randomdev.c head/sys/dev/random/randomdev.h head/sys/libkern/arc4random.c head/sys/mips/conf/PB92 head/sys/sys/param.h
Comment 8 Conrad Meyer 2019-04-18 20:49:48 UTC
Reopening per logical revert in r346358.
Comment 9 Conrad Meyer 2019-04-18 20:57:07 UTC
So, lessons learned: 1. We have difficulty providing /boot/entropy in a variety of scenarios, such as: a. CI images b. Embedded systems that do not use loader (MIPS, RISCV, PPC, ARM, ARM64, ...) c. Release VM images and installers 2. There is potential concern about availability of /boot/entropy even in systems where it would normally be provided, such as "anything bad happens to the root filesystem" on x86. 3. There is concern about the implied blocking semantic as it relates to deadlocks. 4. Stack cookies initializes itself very early after random initializes (much less is seeded); this is maybe problematic, because they are static for the rest of the kernel runtime. Perhaps they could be initialized later? It is unclear. For now we have worked around it by using static cookies if random isn't seeded. For (2) and some subcases of (1), I have a review out to make fast random sources available sooner and use them to become fully seeded early in boot: https://reviews.freebsd.org/D19928 For (3), I have a review out to add WITNESS warnings when potentially blocking random APIs are accessed with locks held: https://reviews.freebsd.org/D19948 . It is somewhat noisy, because it does not seem to track already-hit instances. There are a couple instances identified already by that change that can be addressed in a further change.