I'm seeing a high CPU utilization by the thread rand_harvestq. It's easily reproducible in my system, just reading from /dev/urandom for 2 seconds is enough to trigger this issue. The rand_harvestq start eating CPU for minutes and the system (at least the graphical interface) start freezing for few seconds. After some minutes the thread stop the high CPU consumption and the system stop freezing. PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND 19 root -16 - 0 16K CPU3 3 6:31 100.00% rand_harvestq 11 root 155 ki31 0 64K CPU2 2 327:17 98.30% idle{idle: cpu2} 11 root 155 ki31 0 64K RUN 3 323:58 69.61% idle{idle: cpu3} 11 root 155 ki31 0 64K CPU0 0 325:32 69.01% idle{idle: cpu0} 11 root 155 ki31 0 64K RUN 1 327:07 58.61% idle{idle: cpu1} PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND 19 root -16 - 0 16K CPU1 1 27:50 99.98% rand_harvestq 11 root 155 ki31 0 64K CPU2 2 359:58 98.65% idle{idle: cpu2} 11 root 155 ki31 0 64K RUN 3 351:16 98.41% idle{idle: cpu3} 11 root 155 ki31 0 64K CPU0 0 357:59 94.45% idle{idle: cpu0} 11 root 155 ki31 0 64K RUN 1 345:23 4.51% idle{idle: cpu1} PID USERNAME PRI NICE SIZE RES STATE C TIME WCPU COMMAND 19 root -16 - 0 16K CPU3 3 28:16 100.00% rand_harvestq 11 root 155 ki31 0 64K CPU0 0 358:24 97.82% idle{idle: cpu0} 11 root 155 ki31 0 64K RUN 2 360:23 97.78% idle{idle: cpu2} 11 root 155 ki31 0 64K RUN 3 351:36 70.94% idle{idle: cpu3} 11 root 155 ki31 0 64K CPU1 1 345:31 25.92% idle{idle: cpu1} More information about my setup $ uname -a FreeBSD capeta 12.0-ALPHA2 FreeBSD 12.0-ALPHA2 #14 r337973M: Fri Aug 17 13:03:01 -03 2018 danilo@capeta:/usr/obj/usr/src/amd64.amd64/sys/GENERIC-NODEBUG amd64 $ sysctl kern.random kern.random.fortuna.minpoolsize: 64 kern.random.harvest.mask_symbolic: PURE_RDRAND,[UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED kern.random.harvest.mask_bin: 00000010000000111111111 kern.random.harvest.mask: 66047 kern.random.random_sources: 'Intel Secure Key RNG'
The problem seems to be very excessive attempts to gather more entropy at runtime long after initial seeding. The problem was introduced in 2015 if not earlier and was even pointed out during code review by jmg@, but never addressed by the submitter (markm@). Each random_read updates a "read_rate" counter. This counter tracks some linear function of the number of bytes read out of the random devices since the last random_sources_feed (1 read_rate for every 4 bytes of output, plus a Y intercept of 32). random_kthread ("rand_harvestq") -> random_sources_feed atomically pulls the read_rate value as local_read_rate and zeroes the global counter. It then multiplies the value it found, plus one, by the number of pools (32 for Fortuna, 2 for Yarrow, per a comment directly above) and loops that number of times, polling 8 bytes at a time from the configured "fast random source" (RDRAND on most modern amd64 computers) and feeding it into the random implementation (Fortuna by default on FreeBSD for at least 11.0+). The major problem is that 32 * read_rate is totally stupidly large. At the limit, for every additional *4 bytes* of /dev/random output, we attempt to read *256 bytes* out of RDRAND (for Fortuna) in 32 separate invocations. RDRAND may be fast, but this is an insane and unnecessary amplification. The secondary problem is that each random_sources_feed loop iteration drops and retakes the global Fortuna mutex, which will contest with the read path, especially on WITNESS builds. Again, at the limit, for every *four bytes* of randomdev output, we will call into Fortuna (via random_harvest_direct()) an additional *32 times*, each time taking the global Fortuna mutex. (This could be refactored to reduce the number of individual invocations even without changing the entropy collection rate, but that would be unnecessary if we can just reduce the number of loop iterations.) There is perhaps a little value in continuing to harvest entropy after seeding, but to the best of my knowledge there is no reason it must be scaled by the number of bytes read, and even if it must, the scale between random source and output (64:1) seems to be very far off. I have not read the Fortuna paper yet and would at least skim it before making any major change.
(At least) one other oddity here: random_fortuna_pre_read attempts to limit reseeds to once every 100ms (per Fortuna paper). However, it grabs the current time in sbintime_t units and compares against hz/10. hz is one second in units of ticks, which are a function of softclock frequency... The right comparison would be: (now - fortuna_state.fs_lasttime > SBT_1S/10) Since hz is commonly 1000 and SBT_1S/10 is ~400 million, this check will nearly always pass when it should not (~99.99998% false positive). That's more or less broken.
(In reply to Conrad Meyer from comment #2) This feature is supposed to restrict attackers from forcing overly frequent reseeds, so this actually seems like a security weakness of the implementation: "So far we've ignored the fact that we only have 32 pools and that maybe pool P_31 does not collect enough randomness between reseeds to recover from a compromise. This could happen if the attacker injected so many random events that 2^32 reseeds would occur before the random sources that the attacker has no knowledge about have generated 2^13 bits of entropy. **This is unlikely, but to stop the attacker from even trying, we will limit the speed of the reseeds. A reseed will only be performed if the previous reseed was more than 100 ms ago.** This limits the reseed rate to 10 reseeds per second, **so it will take more than 13 years before P_32 would ever have been used, had it existed." Meanwhile, it seems like we only need collect 8192 bits of entropy — total, 256 bits per pool — to recover from a compromise. So any collection above 32*32 bytes seems excessive for Fortuna. (All of this seems covered in §9.5.2, "Pools" of the fortuna.pdf available from Schneier's website.)
I can't promise any immediate time to work on this. We should prioritize fixing the 100 ms Fortuna reseed timer as a security issue (not exploitable, but a broken defense-in-depth mitigation). The other stuff is "just" performance optimization.
I can do this if you like. I have the time, having just finished a very large catch-up. markm@freebsd.org.
I concur on the read-rate. It was me being conservative in the beginning when "bit-distilling" was fashionable, but I am persuaded by the arguments presented to not over-do it. The timing is just a stupid mistake of mine. I can fix that. markm@
(In reply to Mark Murray from comment #6) Sure, please go ahead. I'm happy to help review.
I'm testing a partial fix now.
Created attachment 196446 [details] Patch to limit fast entropy I wrote a simple patch to limit the entropy if anyone is interested.
(In reply to rwmaillists from comment #9) Numerically, this looks like a correct, conservative limit to me (for Fortuna)[1]. We get 8 loop iterations per pool and read 8 bytes from the configured fast random source each iteration. That's 64 bytes of raw random source data, per pool, per feed. 64B = 512 bits. The Fortuna algorithm needs 256 bits of entropy per pool to securely reseed against an attacker that has managed to leak a previous internal state, so 512 bits is nominally overkill by a factor of two. However, this gives us a 2x margin that allows RDRAND and similar fast entropy sources to be slightly less than perfectly random while still maintaining Fortuna's 128-bit ish unpredictability guarantees for future results *even after compromise of a previous internal state*. I think the 2x margin is a reasonable amount. The assumption with a 2x margin is that fast random sources must provide at least 4 bits of entropy per byte. If our fast random source provides less than 4 bits per byte of entropy, that is kind of a problem and we should revisit that source's eligibility[2]. === That said, I think the Fortuna paper's discussion of 8192 bits is assuming those 8192 bits of entropy slowly collect over time from less speedy random source while an attacker is continually reading from the device and forcing reseeds every 100 ms. If we know we have a fast, high-quality source of real entropy, I think we could get away with just feeding pool zero, perhaps with a small 2x margin on raw random source bits. Certainly if the attacker controls our fast random source today it does not matter whether we use it to fill just P_0 or all 32 Fortuna pools. [1]: Fortuna was the default for 11.x, right? Can we remove Yarrow for 12.0? [2]: (Monitoring of entropy sources' real entropy over time is a separate topic and one that might be interesting to pursue eventually.)
Answering CEM's footnotes. [1] Correct; Fortuna is the "new" default. I'm not sure we can remove Yarrow without more deprecation warning. It is still smaller than Fortuna, and folks with SoCs may want to keep it. I'm happy to mark Yarrow as deprecated and kill in 13, though. [2] Yarrow proved this to be very hard indeed, which is one of the reasons that Fortuna was invented.
(In reply to Conrad Meyer from comment #10) I don't like the idea of only filling pool#0 with he fast entropy. Fortuna's main point is recovery from compromise by always having "old" entropy available, and only filling pool#0 breaks this.
(In reply to rwmaillists from comment #9) I like your fix more than I like mine! Thanks!
Sorry, the rate of collection needs to be measured in bits per 100s of bytes if not KBs of data read from /dev/random. We already are collecting millions of bits of entropy per second, we don't need to force additional collection based upon the read rate. All this collection is just a waste of energy and CPU cycles, and provides no significant improvement in security. FreeBSD need to spend time educating people who use FreeBSD on proper ways to get the initial seed strong. That we don't issue warnings about reseeding VM and ISO images is a bigger issue than recovering from a pool compromise. That we don't provide guidance for embedded systems, and warnings on systems that don't have a TRNG driver needs to be addressed.
(In reply to Mark Murray from comment #12) > I don't like the idea of only filling pool#0 with he fast entropy. Fortuna's > main point is recovery from compromise by always having "old" entropy > available, and only filling pool#0 breaks this. Yes, I agree we should continue filling all pools. The cost is something like 4 microseconds per invocation of refeed to gather 2048 bytes from RDRANG, on IVB, if we are using the 64-bit instruction. If we can limit refeeds to once every 100 ms, the 4 us is definitely de minimis. (In reply to Mark Murray from comment #11) It's not too late to mark Yarrow gone_in(12) in stable/11 :-). When you say "bigger," can you quantify that? It seems to me the additional state is only: 30 additional SHA-256 contexts (960 B), and a sbintime_t (2 B) to track last reseed? That seems so de minimis to not really worry about it for embedded systems. The scale of small system FreeBSD runs on at all is in the range of 32-64 *megabytes* of RAM. 962 bytes extra to have a decent random algorithm does not seem burdensome. Even on 32MB that's +0.003%. (In reply to John-Mark Gurney from comment #14) I think one refeed (2048 B from fast random source) per reseed interval (100ms) is fairly reasonable. It may still be overkill, but is only 20 kB/s instead of GB/s. The other issues you raise are very important, don't get me wrong, but IMO they are orthogonal to the /dev/random implementation issues raised in this PR.
A commit references this bug: Author: markm Date: Fri Aug 24 14:53:43 UTC 2018 New revision: 338292 URL: https://svnweb.freebsd.org/changeset/base/338292 Log: Fix braino of mine where the reseeds would happen far too often, making the kernel process way too busy. PR: 230808 Submitted by: Conrad Meyer <cem@FreeBSD.org> Reported by: Danilo Egea Gondolfo <danilo@FreeBSD.org> Reviewed by: cem,delphij Approved by: re(rgrimes) Approved by: so(delphij) MFC after: 1 Month Security: Yes Differential Revision: https://reviews.freebsd.org/D16872 Changes: head/sys/dev/random/fortuna.c
A commit references this bug: Author: markm Date: Fri Aug 24 14:53:47 UTC 2018 New revision: 338293 URL: https://svnweb.freebsd.org/changeset/base/338293 Log: Limit the amount of "fast" entropy. We don't need nearly as much for security, and the excess just slows things down badly. PR: 230808 Submitted by: rwmaillists@googlemail.com, but tweeked by me Reported by: Danilo Egea Gondolfo <danilo@FreeBSD.org> Reviewed by: cem,delphij Approved by: re(rgrimes) Approved by: so(delphij) MFC after: 1 Month Differential Revision: https://reviews.freebsd.org/D16873 Changes: head/sys/dev/random/random_harvestq.c
I'm running r338294 and it seems the problem is gone now.
(In reply to Danilo Egea Gondolfo from comment #18) Thank you, closing the ticket. M
Thanks for the quick discussion and turnaround on this, Mark.
(In reply to Conrad Meyer from comment #20) You are most welcome, Conrad!
The problem is not fixed for STABLE branches yet. For example, FreeBSD 11.2-RELEASE suffers from this problem badly while running as VM-guest due to simple command: $ dd if=/dev/urandom bs=1m of=random.bin count=1024 The system becomes almost unresponsible: PID USERNAME PRI NICE SIZE RES STATE TIME WCPU COMMAND 7 root -16 - 0K 16K - 12:57 99.13% [rand_harvestq] Please consider merging the fix before 11.3-RELEASE that is already scheduled.
I'm ready to test patches for stable/11 if needed. Also, here are (default) settings of my system in the VM: # sysctl kern.random kern.random.fortuna.minpoolsize: 64 kern.random.harvest.mask_symbolic: [UMA],[FS_ATIME],SWI,INTERRUPT,NET_NG,NET_ETHER,NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED kern.random.harvest.mask_bin: 00111111111 kern.random.harvest.mask: 511 kern.random.random_sources: 'Intel Secure Key RNG'
> The problem is not fixed for STABLE branches yet. Please could you apply the change in https://reviews.freebsd.org/D16872 by hand - it's really small!. When you confirm that it works in your environment, I'll merge.
Apologies - please also add the change in https://reviews.freebsd.org/D16873. It's not quite so trivial as the previous, but still shot enough of a quick paste. Thanks!
I've updated my system to stable/11 with both patches applied cleanly and problem's gone. Please merge them.
A commit references this bug: Author: markm Date: Sat Apr 6 09:00:06 UTC 2019 New revision: 345981 URL: https://svnweb.freebsd.org/changeset/base/345981 Log: Backport fixes from FreeBSD-12 to help the random(4) device thread not overwhelm the OS: a) Use the correct symbolic constant when calculating 10'ths of a second. This means that expensive reseeds happen at ony 1/10 Hz, not some kHz. b) Rate limit internal high-rate harveting efforts. This stops the harvesting thread from total overkilling the high-grade entropy- gathering work, while still being very conservatively safe. PR: 230808 Reported by: danilo,eugen Tested by: eugen Approved by: so (blanket permission granted as I am the authour of this code) Relnotes: Yes Changes: stable/11/sys/dev/random/fortuna.c stable/11/sys/dev/random/random_harvestq.c
Fix committed.