Bug 230808

Summary: rand_harvestq high CPU utilization
Product: Base System Reporter: Danilo Egea Gondolfo <danilo>
Component: kernAssignee: Mark Murray <markm>
Status: Closed FIXED    
Severity: Affects Many People CC: ali.abdallah, cem, delphij, eadler, eugen, jmg, markj, markm, rwmaillists
Priority: --- Flags: eugen: mfc-stable12?
eugen: mfc-stable11?
Version: 11.2-STABLE   
Hardware: Any   
OS: Any   
Description Flags
Patch to limit fast entropy none

Description Danilo Egea Gondolfo freebsd_committer 2018-08-21 19:20:42 UTC
I'm seeing a high CPU utilization by the thread rand_harvestq.

It's easily reproducible in my system, just reading from /dev/urandom for 2 seconds is enough to trigger this issue. The rand_harvestq start eating CPU for minutes and the system (at least the graphical interface) start freezing for few seconds.

After some minutes the thread stop the high CPU consumption and the system stop freezing.

   19 root       -16    -      0    16K CPU3     3   6:31 100.00% rand_harvestq
   11 root       155 ki31      0    64K CPU2     2 327:17  98.30% idle{idle: cpu2}
   11 root       155 ki31      0    64K RUN      3 323:58  69.61% idle{idle: cpu3}
   11 root       155 ki31      0    64K CPU0     0 325:32  69.01% idle{idle: cpu0}
   11 root       155 ki31      0    64K RUN      1 327:07  58.61% idle{idle: cpu1}

   19 root       -16    -      0    16K CPU1     1  27:50  99.98% rand_harvestq
   11 root       155 ki31      0    64K CPU2     2 359:58  98.65% idle{idle: cpu2}
   11 root       155 ki31      0    64K RUN      3 351:16  98.41% idle{idle: cpu3}
   11 root       155 ki31      0    64K CPU0     0 357:59  94.45% idle{idle: cpu0}
   11 root       155 ki31      0    64K RUN      1 345:23   4.51% idle{idle: cpu1}

   19 root       -16    -      0    16K CPU3     3  28:16 100.00% rand_harvestq
   11 root       155 ki31      0    64K CPU0     0 358:24  97.82% idle{idle: cpu0}
   11 root       155 ki31      0    64K RUN      2 360:23  97.78% idle{idle: cpu2}
   11 root       155 ki31      0    64K RUN      3 351:36  70.94% idle{idle: cpu3}
   11 root       155 ki31      0    64K CPU1     1 345:31  25.92% idle{idle: cpu1}

More information about my setup

$ uname -a
FreeBSD capeta 12.0-ALPHA2 FreeBSD 12.0-ALPHA2 #14 r337973M: Fri Aug 17 13:03:01 -03 2018     danilo@capeta:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG  amd64

$ sysctl kern.random
kern.random.fortuna.minpoolsize: 64
kern.random.harvest.mask_bin: 00000010000000111111111
kern.random.harvest.mask: 66047
kern.random.random_sources: 'Intel Secure Key RNG'
Comment 1 Conrad Meyer freebsd_committer 2018-08-21 20:34:46 UTC
The problem seems to be very excessive attempts to gather more entropy at runtime long after initial seeding.  The problem was introduced in 2015 if not earlier and was even pointed out during code review by jmg@, but never addressed by the submitter (markm@).

Each random_read updates a "read_rate" counter.  This counter tracks some linear function of the number of bytes read out of the random devices since the last random_sources_feed (1 read_rate for every 4 bytes of output, plus a Y intercept of 32).

random_kthread ("rand_harvestq") -> random_sources_feed atomically pulls the read_rate value as local_read_rate and zeroes the global counter.  It then multiplies the value it found, plus one, by the number of pools (32 for Fortuna, 2 for Yarrow, per a comment directly above) and loops that number of times, polling 8 bytes at a time from the configured "fast random source" (RDRAND on most modern amd64 computers) and feeding it into the random implementation (Fortuna by default on FreeBSD for at least 11.0+).

The major problem is that 32 * read_rate is totally stupidly large.  At the limit, for every additional *4 bytes* of /dev/random output, we attempt to read *256 bytes* out of RDRAND (for Fortuna) in 32 separate invocations.  RDRAND may be fast, but this is an insane and unnecessary amplification.

The secondary problem is that each random_sources_feed loop iteration drops and retakes the global Fortuna mutex, which will contest with the read path, especially on WITNESS builds.  Again, at the limit, for every *four bytes* of randomdev output, we will call into Fortuna (via random_harvest_direct()) an additional *32 times*, each time taking the global Fortuna mutex.

(This could be refactored to reduce the number of individual invocations even without changing the entropy collection rate, but that would be unnecessary if we can just reduce the number of loop iterations.)

There is perhaps a little value in continuing to harvest entropy after seeding, but to the best of my knowledge there is no reason it must be scaled by the number of bytes read, and even if it must, the scale between random source and output (64:1) seems to be very far off.  I have not read the Fortuna paper yet and would at least skim it before making any major change.
Comment 2 Conrad Meyer freebsd_committer 2018-08-22 04:37:47 UTC
(At least) one other oddity here:

random_fortuna_pre_read attempts to limit reseeds to once every 100ms (per Fortuna paper).  However, it grabs the current time in sbintime_t units and compares against hz/10.  hz is one second in units of ticks, which are a function of softclock frequency...

The right comparison would be:

    (now - fortuna_state.fs_lasttime > SBT_1S/10)

Since hz is commonly 1000 and SBT_1S/10 is ~400 million, this check will nearly always pass when it should not (~99.99998% false positive).  That's more or less broken.
Comment 3 Conrad Meyer freebsd_committer 2018-08-22 16:06:54 UTC
(In reply to Conrad Meyer from comment #2)
This feature is supposed to restrict attackers from forcing overly frequent reseeds, so this actually seems like a security weakness of the implementation:

"So far we've ignored the fact that we only have 32 pools and that maybe pool P_31 does not collect enough randomness between reseeds to recover from a compromise.  This could happen if the attacker injected so many random events that 2^32 reseeds would occur before the random sources that the attacker has no knowledge about have generated 2^13 bits of entropy.  **This is unlikely, but to stop the attacker from even trying, we will limit the speed of the reseeds.  A reseed will only be performed if the previous reseed was more than 100 ms ago.** This limits the reseed rate to 10 reseeds per second, **so it will take more than 13 years before P_32 would ever have been used, had it existed."

Meanwhile, it seems like we only need collect 8192 bits of entropy — total, 256 bits per pool — to recover from a compromise.  So any collection above 32*32 bytes seems excessive for Fortuna.  (All of this seems covered in §9.5.2, "Pools" of the fortuna.pdf available from Schneier's website.)
Comment 4 Conrad Meyer freebsd_committer 2018-08-22 17:56:18 UTC
I can't promise any immediate time to work on this.  We should prioritize fixing the 100 ms Fortuna reseed timer as a security issue (not exploitable, but a broken defense-in-depth mitigation).  The other stuff is "just" performance optimization.
Comment 5 Mark Murray freebsd_committer 2018-08-22 18:11:14 UTC
I can do this if you like. I have the time, having just finished a very large catch-up.

Comment 6 Mark Murray freebsd_committer 2018-08-22 18:16:54 UTC
I concur on the read-rate. It was me being conservative in the beginning when "bit-distilling" was fashionable, but I am persuaded by the arguments presented to not over-do it.

The timing is just a stupid mistake of mine. I can fix that.

Comment 7 Conrad Meyer freebsd_committer 2018-08-22 18:26:28 UTC
(In reply to Mark Murray from comment #6)
Sure, please go ahead.  I'm happy to help review.
Comment 8 Mark Murray freebsd_committer 2018-08-22 19:26:48 UTC
I'm testing a partial fix now.
Comment 9 rwmaillists 2018-08-22 19:38:32 UTC
Created attachment 196446 [details]
Patch to limit fast entropy

I wrote a simple patch to limit the entropy if anyone is interested.
Comment 10 Conrad Meyer freebsd_committer 2018-08-22 21:35:09 UTC
(In reply to rwmaillists from comment #9)
Numerically, this looks like a correct, conservative limit to me (for Fortuna)[1].

We get 8 loop iterations per pool and read 8 bytes from the configured fast random source each iteration.  That's 64 bytes of raw random source data, per pool, per feed.  64B = 512 bits.

The Fortuna algorithm needs 256 bits of entropy per pool to securely reseed against an attacker that has managed to leak a previous internal state, so 512 bits is nominally overkill by a factor of two.  However, this gives us a 2x margin that allows RDRAND and similar fast entropy sources to be slightly less than perfectly random while still maintaining Fortuna's 128-bit ish unpredictability guarantees for future results *even after compromise of a previous internal state*.

I think the 2x margin is a reasonable amount.  The assumption with a 2x margin is that fast random sources must provide at least 4 bits of entropy per byte.  If our fast random source provides less than 4 bits per byte of entropy, that is kind of a problem and we should revisit that source's eligibility[2].


That said, I think the Fortuna paper's discussion of 8192 bits is assuming those 8192 bits of entropy slowly collect over time from less speedy random source while an attacker is continually reading from the device and forcing reseeds every 100 ms.  If we know we have a fast, high-quality source of real entropy, I think we could get away with just feeding pool zero, perhaps with a small 2x margin on raw random source bits.  Certainly if the attacker controls our fast random source today it does not matter whether we use it to fill just P_0 or all 32 Fortuna pools.

[1]: Fortuna was the default for 11.x, right?  Can we remove Yarrow for 12.0?

[2]: (Monitoring of entropy sources' real entropy over time is a separate topic and one that might be interesting to pursue eventually.)
Comment 11 Mark Murray freebsd_committer 2018-08-23 10:07:13 UTC
Answering CEM's footnotes.

[1] Correct; Fortuna is the "new" default. I'm not sure we can remove Yarrow without more deprecation warning. It is still smaller than Fortuna, and folks with SoCs may want to keep it. I'm happy to mark Yarrow as deprecated and kill in 13, though.

[2] Yarrow proved this to be very hard indeed, which is one of the reasons that Fortuna was invented.
Comment 12 Mark Murray freebsd_committer 2018-08-23 10:10:18 UTC
(In reply to Conrad Meyer from comment #10)

I don't like the idea of only filling pool#0 with he fast entropy. Fortuna's main point is recovery from compromise by always having "old" entropy available, and only filling pool#0 breaks this.
Comment 13 Mark Murray freebsd_committer 2018-08-23 10:29:54 UTC
(In reply to rwmaillists from comment #9)

I like your fix more than I like mine! Thanks!
Comment 14 John-Mark Gurney freebsd_committer 2018-08-23 15:45:04 UTC
Sorry, the rate of collection needs to be measured in bits per 100s of bytes if not KBs of data read from /dev/random.  We already are collecting millions of bits of entropy per second, we don't need to force additional collection based upon the read rate.  All this collection is just a waste of energy and CPU cycles, and provides no significant improvement in security.

FreeBSD need to spend time educating people who use FreeBSD on proper ways to get the initial seed strong.  That we don't issue warnings about reseeding VM and ISO images is a bigger issue than recovering from a pool compromise.  That we don't provide guidance for embedded systems, and warnings on systems that don't have a TRNG driver needs to be addressed.
Comment 15 Conrad Meyer freebsd_committer 2018-08-23 16:30:29 UTC
(In reply to Mark Murray from comment #12)
> I don't like the idea of only filling pool#0 with he fast entropy. Fortuna's
> main point is recovery from compromise by always having "old" entropy
> available, and only filling pool#0 breaks this.

Yes, I agree we should continue filling all pools.  The cost is something like 4 microseconds per invocation of refeed to gather 2048 bytes from RDRANG, on IVB, if we are using the 64-bit instruction.  If we can limit refeeds to once every 100 ms, the 4 us is definitely de minimis.

(In reply to Mark Murray from comment #11)
It's not too late to mark Yarrow gone_in(12) in stable/11 :-).  When you say "bigger," can you quantify that?  It seems to me the additional state is only: 30 additional SHA-256 contexts (960 B), and a sbintime_t (2 B) to track last reseed?  That seems so de minimis to not really worry about it for embedded systems.  The scale of small system FreeBSD runs on at all is in the range of 32-64 *megabytes* of RAM.  962 bytes extra to have a decent random algorithm does not seem burdensome.  Even on 32MB that's +0.003%.

(In reply to John-Mark Gurney from comment #14)
I think one refeed (2048 B from fast random source) per reseed interval (100ms) is fairly reasonable.  It may still be overkill, but is only 20 kB/s instead of GB/s.

The other issues you raise are very important, don't get me wrong, but IMO they are orthogonal to the /dev/random implementation issues raised in this PR.
Comment 16 commit-hook freebsd_committer 2018-08-24 14:54:03 UTC
A commit references this bug:

Author: markm
Date: Fri Aug 24 14:53:43 UTC 2018
New revision: 338292
URL: https://svnweb.freebsd.org/changeset/base/338292

  Fix braino of mine where the reseeds would happen far too often,
  making the kernel process way too busy.

  PR:             230808
  Submitted by:   Conrad Meyer <cem@FreeBSD.org>
  Reported by:    Danilo Egea Gondolfo <danilo@FreeBSD.org>
  Reviewed by:	cem,delphij
  Approved by:	re(rgrimes)
  Approved by:	so(delphij)
  MFC after:      1 Month
  Security:	Yes
  Differential Revision:	https://reviews.freebsd.org/D16872

Comment 17 commit-hook freebsd_committer 2018-08-24 14:54:06 UTC
A commit references this bug:

Author: markm
Date: Fri Aug 24 14:53:47 UTC 2018
New revision: 338293
URL: https://svnweb.freebsd.org/changeset/base/338293

  Limit the amount of "fast" entropy. We don't need nearly as much
  for security, and the excess just slows things down badly.

  PR:             230808
  Submitted by:   rwmaillists@googlemail.com, but tweeked by me
  Reported by:    Danilo Egea Gondolfo <danilo@FreeBSD.org>
  Reviewed by:	cem,delphij
  Approved by:	re(rgrimes)
  Approved by:	so(delphij)
  MFC after:      1 Month
  Differential Revision:	https://reviews.freebsd.org/D16873

Comment 18 Danilo Egea Gondolfo freebsd_committer 2018-08-24 17:45:24 UTC
I'm running r338294 and it seems the problem is gone now.
Comment 19 Mark Murray freebsd_committer 2018-08-24 17:47:01 UTC
(In reply to Danilo Egea Gondolfo from comment #18)

Thank you, closing the ticket.

Comment 20 Conrad Meyer freebsd_committer 2018-08-24 20:24:47 UTC
Thanks for the quick discussion and turnaround on this, Mark.
Comment 21 Mark Murray freebsd_committer 2018-08-24 22:55:14 UTC
(In reply to Conrad Meyer from comment #20)

You are most welcome, Conrad!
Comment 22 Eugene Grosbein freebsd_committer 2019-04-05 04:10:58 UTC
The problem is not fixed for STABLE branches yet. For example, FreeBSD 11.2-RELEASE suffers from this problem badly while running as VM-guest due to simple command:

$ dd if=/dev/urandom bs=1m of=random.bin count=1024

The system becomes almost unresponsible:

    7 root       -16    -     0K    16K -       12:57  99.13% [rand_harvestq]

Please consider merging the fix before 11.3-RELEASE that is already scheduled.
Comment 23 Eugene Grosbein freebsd_committer 2019-04-05 04:14:02 UTC
I'm ready to test patches for stable/11 if needed.

Also, here are (default) settings of my system in the VM:

# sysctl kern.random
kern.random.fortuna.minpoolsize: 64
kern.random.harvest.mask_bin: 00111111111
kern.random.harvest.mask: 511
kern.random.random_sources: 'Intel Secure Key RNG'
Comment 24 Mark Murray freebsd_committer 2019-04-05 10:46:27 UTC
> The problem is not fixed for STABLE branches yet.

Please could you apply the change in https://reviews.freebsd.org/D16872 by hand - it's really small!. When you confirm that it works in your environment, I'll merge.
Comment 25 Mark Murray freebsd_committer 2019-04-05 10:56:41 UTC
Apologies - please also add the change in https://reviews.freebsd.org/D16873. It's not quite so trivial as the previous, but still shot enough of a quick paste.

Comment 26 Eugene Grosbein freebsd_committer 2019-04-05 22:59:56 UTC
I've updated my system to stable/11 with both patches applied cleanly and problem's gone. Please merge them.
Comment 27 commit-hook freebsd_committer 2019-04-06 09:00:39 UTC
A commit references this bug:

Author: markm
Date: Sat Apr  6 09:00:06 UTC 2019
New revision: 345981
URL: https://svnweb.freebsd.org/changeset/base/345981

  Backport fixes from FreeBSD-12 to help the random(4) device thread
  not overwhelm the OS:

  a) Use the correct symbolic constant when calculating 10'ths of a
  second. This means that expensive reseeds happen at ony 1/10 Hz,
  not some kHz.

  b) Rate limit internal high-rate harveting efforts. This stops the
  harvesting thread from total overkilling the high-grade entropy-
  gathering work, while still being very conservatively safe.

  PR:		230808
  Reported by:	danilo,eugen
  Tested by:	eugen
  Approved by:	so (blanket permission granted as I am the authour of this code)
  Relnotes:	Yes

Comment 28 Mark Murray freebsd_committer 2019-04-06 09:02:19 UTC
Fix committed.