Bug 230875 - Revisit decision to not block read_random(9) on being seeded
Summary: Revisit decision to not block read_random(9) on being seeded
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-25 00:47 UTC by Conrad Meyer
Modified: 2024-10-20 16:55 UTC (History)
5 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Conrad Meyer freebsd_committer freebsd_triage 2018-08-25 00:47:18 UTC
read_random() is used, mostly without error checking, in a lot of very sensitive places in the kernel — including seeding the even more widely used arc4random(9).

I don't think this is a sane default.  I am not really convinced it makes sense for any use, but at least most uses should block until the system RNG is seeded.  Currently none do, and very few check that any random data was actually returned.  That is a problem.

We should change read_random's return type to 'void' to match most use, and have it block until the RNG is seeded.  This is the most expedient way to make the behavior match the name.

Optionally, if it is really needed, add a second API, read_random_dangerous_I_know_what_I_am_doing() that allows non-blocking reads, *and tag the return value __result_use_check*.  For those few consumers that truly need nonblocking access to RNG during early boot, *and are prepared to deal with not having random data available*, update to use this dangerous API.

The tradeoffs here are:

Pro:

1. (Especially if the 2nd API is not added,) we have rigorous assurance that no sensitive keying information generated by these APIs is highly predictable.

Cons:

1. May block availability of some subsystems if the system does not have boot-time entropy.

Counter arguments: I think this is solved in two parts: one, per the current comment in chacha20_randomstir: "The answer is to make sure there is an entropy stash at shutdown time."  But that is a bit glib and assumes nothing goes wrong with the filesystem, for example.

The second prong is that core system services need to be able to start without RNG available, and initialize keys later when the RNG is available.  We should at least be able to proceed to userspace init + rc with a blocked RNG.  This should be easy to test with a tunable + sysctl.
Comment 1 Mark Murray freebsd_committer freebsd_triage 2018-08-25 09:02:42 UTC
(In reply to Conrad Meyer from comment #0)

Good analysis.

I suspect the second "prong" (start system services) without a seeded CSPRNG is going to be the more difficult, as there is a tendency for lockups when unwise consumers of random(4).
Comment 2 Mark Murray freebsd_committer freebsd_triage 2018-08-25 11:04:19 UTC
(In reply to Conrad Meyer from comment #0)

Give your two-prong analysis, it may be worth the trouble to split this PR into two sub-PRs, or make this depend on two more. The in-kernel API and the startup scripts will need separate treatment.
Comment 3 Conrad Meyer freebsd_committer freebsd_triage 2018-08-25 21:40:30 UTC
Let's track just the kernel pieces in this PR.  We can raise other PRs for intolerant userspace services on an as-needed basis.
Comment 4 Mark Murray freebsd_committer freebsd_triage 2018-08-25 22:01:02 UTC
(In reply to Conrad Meyer from comment #3)

Agreed!

M
Comment 5 commit-hook freebsd_committer freebsd_triage 2019-04-15 18:41:10 UTC
A commit references this bug:

Author: cem
Date: Mon Apr 15 18:40:38 UTC 2019
New revision: 346250
URL: https://svnweb.freebsd.org/changeset/base/346250

Log:
  random(4): Block read_random(9) on initial seeding

  read_random() is/was used, mostly without error checking, in a lot of
  very sensitive places in the kernel -- including seeding the widely used
  arc4random(9).

  Most uses, especially arc4random(9), should block until the device is seeded
  rather than proceeding with a bogus or empty seed.  I did not spy any
  obvious kernel consumers where blocking would be inappropriate (in the
  sense that lack of entropy would be ok -- I did not investigate locking
  angle thoroughly).  In many instances, arc4random_buf(9) or that family
  of APIs would be more appropriate anyway; that work was done in r345865.

  A minor cleanup was made to the implementation of the READ_RANDOM function:
  instead of using a variable-length array on the stack to temporarily store
  all full random blocks sufficient to satisfy the requested 'len', only store
  a single block on the stack.  This has some benefit in terms of reducing
  stack usage, reducing memcpy overhead and reducing devrandom output leakage
  via the stack.  Additionally, the stack block is now safely zeroed if it was
  used.

  One caveat of this change is that the kern.arandom sysctl no longer returns
  zero bytes immediately if the random device is not seeded.  This means that
  FreeBSD-specific userspace applications which attempted to handle an
  unseeded random device may be broken by this change.  If such behavior is
  needed, it can be replaced by the more portable getrandom(2) GRND_NONBLOCK
  option.

  On any typical FreeBSD system, entropy is persisted on read/write media and
  used to seed the random device very early in boot, and blocking is never a
  problem.

  This change primarily impacts the behavior of /dev/random on embedded
  systems with read-only media that do not configure "nodevice random".  We
  toggle the default from 'charge on blindly with no entropy' to 'block
  indefinitely.'  This default is safer, but may cause frustration.  Embedded
  system designers using FreeBSD have several options.  The most obvious is to
  plan to have a small writable NVRAM or NAND to persist entropy, like larger
  systems.  Early entropy can be fed from any loader, or by writing directly
  to /dev/random during boot.  Some embedded SoCs now provide a fast hardware
  entropy source; this would also work for quickly seeding Fortuna.  A 3rd
  option would be creating an embedded-specific, more simplistic random
  module, like that designed by DJB in [1] (this design still requires a small
  rewritable media for forward secrecy).  Finally, the least preferred option
  might be "nodevice random", although I plan to remove this in a subsequent
  revision.

  To help developers emulate the behavior of these embedded systems on
  ordinary workstations, the tunable kern.random.block_seeded_status was
  added.  When set to 1, it blocks the random device.

  I attempted to document this change in random.4 and random.9 and ran into a
  bunch of out-of-date or irrelevant or inaccurate content and ended up
  rototilling those documents more than I intended to.  Sorry.  I think
  they're in a better state now.

  PR:		230875
  Reviewed by:	delphij, markm (earlier version)
  Approved by:	secteam(delphij), devrandom(markm)
  Relnotes:	yes
  Differential Revision:	https://reviews.freebsd.org/D19744

Changes:
  head/share/man/man4/random.4
  head/share/man/man9/random.9
  head/sys/dev/random/fortuna.c
  head/sys/dev/random/random_harvestq.c
  head/sys/dev/random/random_infra.c
  head/sys/dev/random/randomdev.c
  head/sys/dev/random/randomdev.h
  head/sys/kern/kern_mib.c
  head/sys/libkern/arc4random.c
  head/sys/sys/random.h
Comment 6 Conrad Meyer freebsd_committer freebsd_triage 2019-04-15 19:38:45 UTC
I have drafted a patch to remove !DEV_RANDOM entirely and am tinderboxing it now.  I'll send that out for review to the same audience later.  I think this PR is addressed.
Comment 7 commit-hook freebsd_committer freebsd_triage 2019-04-18 20:49:19 UTC
A commit references this bug:

Author: cem
Date: Thu Apr 18 20:48:56 UTC 2019
New revision: 346358
URL: https://svnweb.freebsd.org/changeset/base/346358

Log:
  random(4): Restore availability tradeoff prior to r346250

  As discussed in that commit message, it is a dangerous default.  But the
  safe default causes enough pain on a variety of platforms that for now,
  restore the prior default.

  Some of this is self-induced pain we should/could do better about; for
  example, programmatic CI systems and VM managers should introduce entropy
  from the host for individual VM instances.  This is considered a future work
  item.

  On modern x86 and Power9 systems, this may be wholly unnecessary after
  D19928 lands (even in the non-ideal case where early /boot/entropy is
  unavailable), because they have fast hardware random sources available early
  in boot.  But D19928 is not yet landed and we have a host of architectures
  which do not provide fast random sources.

  This change adds several tunables and diagnostic sysctls, documented
  thoroughly in UPDATING and sys/dev/random/random_infra.c.

  PR:		230875 (reopens)
  Reported by:	adrian, jhb, imp, and probably others
  Reviewed by:	delphij, imp (earlier version), markm (earlier version)
  Discussed with:	adrian
  Approved by:	secteam(delphij)
  Relnotes:	yeah
  Security:	related
  Differential Revision:	https://reviews.freebsd.org/D19944

Changes:
  head/UPDATING
  head/sys/dev/random/random_infra.c
  head/sys/dev/random/randomdev.c
  head/sys/dev/random/randomdev.h
  head/sys/libkern/arc4random.c
  head/sys/mips/conf/PB92
  head/sys/sys/param.h
Comment 8 Conrad Meyer freebsd_committer freebsd_triage 2019-04-18 20:49:48 UTC
Reopening per logical revert in r346358.
Comment 9 Conrad Meyer freebsd_committer freebsd_triage 2019-04-18 20:57:07 UTC
So, lessons learned:

1. We have difficulty providing /boot/entropy in a variety of scenarios, such as:
   a. CI images
   b. Embedded systems that do not use loader (MIPS, RISCV, PPC, ARM, ARM64, ...)
   c. Release VM images and installers

2. There is potential concern about availability of /boot/entropy even in systems where it would normally be provided, such as "anything bad happens to the root filesystem" on x86.

3. There is concern about the implied blocking semantic as it relates to deadlocks.

4. Stack cookies initializes itself very early after random initializes (much less is seeded); this is maybe problematic, because they are static for the rest of the kernel runtime.  Perhaps they could be initialized later?  It is unclear.  For now we have worked around it by using static cookies if random isn't seeded.

For (2) and some subcases of (1), I have a review out to make fast random sources available sooner and use them to become fully seeded early in boot: https://reviews.freebsd.org/D19928

For (3), I have a review out to add WITNESS warnings when potentially blocking random APIs are accessed with locks held: https://reviews.freebsd.org/D19948 .  It is somewhat noisy, because it does not seem to track already-hit instances.  There are a couple instances identified already by that change that can be addressed in a further change.
Comment 10 commit-hook freebsd_committer freebsd_triage 2022-02-13 01:23:51 UTC
A commit in branch stable/12 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=e9c023a47aedb678c7fdb470f05cfed8dba2586e

commit e9c023a47aedb678c7fdb470f05cfed8dba2586e
Author:     Conrad Meyer <cem@FreeBSD.org>
AuthorDate: 2019-04-15 18:40:36 +0000
Commit:     David E. O'Brien <obrien@FreeBSD.org>
CommitDate: 2022-02-13 00:32:39 +0000

    random(4): Block read_random(9) on initial seeding

    read_random() is/was used, mostly without error checking, in a lot of
    very sensitive places in the kernel -- including seeding the widely used
    arc4random(9).

    Most uses, especially arc4random(9), should block until the device is seeded
    rather than proceeding with a bogus or empty seed.  I did not spy any
    obvious kernel consumers where blocking would be inappropriate (in the
    sense that lack of entropy would be ok -- I did not investigate locking
    angle thoroughly).  In many instances, arc4random_buf(9) or that family
    of APIs would be more appropriate anyway; that work was done in r345865.

    A minor cleanup was made to the implementation of the READ_RANDOM function:
    instead of using a variable-length array on the stack to temporarily store
    all full random blocks sufficient to satisfy the requested 'len', only store
    a single block on the stack.  This has some benefit in terms of reducing
    stack usage, reducing memcpy overhead and reducing devrandom output leakage
    via the stack.  Additionally, the stack block is now safely zeroed if it was
    used.

    One caveat of this change is that the kern.arandom sysctl no longer returns
    zero bytes immediately if the random device is not seeded.  This means that
    FreeBSD-specific userspace applications which attempted to handle an
    unseeded random device may be broken by this change.  If such behavior is
    needed, it can be replaced by the more portable getrandom(2) GRND_NONBLOCK
    option.

    On any typical FreeBSD system, entropy is persisted on read/write media and
    used to seed the random device very early in boot, and blocking is never a
    problem.

    This change primarily impacts the behavior of /dev/random on embedded
    systems with read-only media that do not configure "nodevice random".  We
    toggle the default from 'charge on blindly with no entropy' to 'block
    indefinitely.'  This default is safer, but may cause frustration.  Embedded
    system designers using FreeBSD have several options.  The most obvious is to
    plan to have a small writable NVRAM or NAND to persist entropy, like larger
    systems.  Early entropy can be fed from any loader, or by writing directly
    to /dev/random during boot.  Some embedded SoCs now provide a fast hardware
    entropy source; this would also work for quickly seeding Fortuna.  A 3rd
    option would be creating an embedded-specific, more simplistic random
    module, like that designed by DJB in [1] (this design still requires a small
    rewritable media for forward secrecy).  Finally, the least preferred option
    might be "nodevice random", although I plan to remove this in a subsequent
    revision.

    To help developers emulate the behavior of these embedded systems on
    ordinary workstations, the tunable kern.random.block_seeded_status was
    added.  When set to 1, it blocks the random device.

    I attempted to document this change in random.4 and random.9 and ran into a
    bunch of out-of-date or irrelevant or inaccurate content and ended up
    rototilling those documents more than I intended to.  Sorry.  I think
    they're in a better state now.

    PR:             230875
    Reviewed by:    delphij, markm (earlier version)
    Approved by:    secteam(delphij), devrandom(markm)
    Relnotes:       yes
    Differential Revision:  https://reviews.freebsd.org/D19744

    (cherry picked from commit 13774e82285e8d5eb3afeff63760725f747f8581)

 share/man/man4/random.4          | 345 ++++++++++++++-------------------------
 share/man/man9/random.9          | 197 +++++++++++-----------
 sys/dev/random/fortuna.c         |  16 +-
 sys/dev/random/random_harvestq.c |   5 -
 sys/dev/random/random_infra.c    |  20 ++-
 sys/dev/random/randomdev.c       | 113 ++++++++-----
 sys/dev/random/randomdev.h       |   2 +-
 sys/kern/kern_mib.c              |  11 +-
 sys/libkern/arc4random.c         |  32 +---
 sys/sys/random.h                 |   6 +-
 10 files changed, 329 insertions(+), 418 deletions(-)
Comment 11 Mark Linimon freebsd_committer freebsd_triage 2024-10-10 20:41:22 UTC
^Triage: committer's bit was taken into safekeeping some time ago.

To commenters: is there still any action needed to be taken on this PR?
Comment 12 Ed Maste freebsd_committer freebsd_triage 2024-10-11 14:41:55 UTC
AFAICT there's no significant change from the state documented in comment #9
Comment 13 Mark Murray freebsd_committer freebsd_triage 2024-10-20 16:54:30 UTC
In the general case - there will always be need for early unblocked randomness, while generating this will be more difficult the earlier it is required.

I think this PR can be closed as it has run its course. If any new improvements come to light, then a new PR can/should be opened with the specifics.
Comment 14 Mark Murray freebsd_committer freebsd_triage 2024-10-20 16:55:29 UTC
In the general case - there will always be need for early unblocked randomness, while generating this will be more difficult the earlier it is required.

I think this PR can be closed as it has run its course. If any new improvements come to light, then a new PR can/should be opened with the specifics.