Bug 254513 - virtio_random: random_harvestq spinning on a CPU with Q35 virtio random device
Summary: virtio_random: random_harvestq spinning on a CPU with Q35 virtio random device
Status: Closed DUPLICATE of bug 253175
Alias: None
Product: Base System
Classification: Unclassified
Component: misc (show other bugs)
Version: 13.0-STABLE
Hardware: Any Any
: --- Affects Many People
Assignee: Kubilay Kocak
Keywords: performance, regression
Depends on:
Reported: 2021-03-23 20:48 UTC by Mina Galić
Modified: 2023-04-23 04:33 UTC (History)
35 users (show)

See Also:


Note You need to log in before you can comment on or make changes to this bug.
Description Mina Galić freebsd_triage 2021-03-23 20:48:17 UTC
This virtual machine is running @ Hetzner on their CPX series, which uses the Qemu Q35 chipset. Support for this has been recently added to FreeBSD and is in FreeBSD 13.0.

The problem manifests as a load average of 1 on a 2 CPU system. `top -S` shows:

    8 root          1 -16    -     0B    16K   0B CPU0     0  10:42  99.95% rand_harvestq

So lets dig into the random device.
The device is identified like this:

grep -i entropy /var/run/dmesg.boot
vtrnd0: <VirtIO Entropy Adapter> on virtio_pci4
random: registering fast source VirtIO Entropy Adapter

meena@websrv2-hel1 ~> devinfo -p vtrnd0 -v
vtrnd0 pnpinfo vendor=0x00001af4 device=0x1044 subvendor=0x1af4 device_type=0x00000004
 virtio_pci4 pnpinfo vendor=0x1af4 device=0x1044 subvendor=0x1af4 subdevice=0x1100 class=0x00ff00 at slot=0 function=0 dbsf=pci0:5:0:0
 pcib5 pnpinfo vendor=0x1b36 device=0x000c subvendor=0x1b36 subdevice=0x0000 class=0x060400 at slot=2 function=4 dbsf=pci0:0:2:4                                       pci0
 pcib0 pnpinfo _HID=PNP0A08 _UID=1 _CID=PNP0A03 at handle=\_SB_.PCI0
meena@websrv2-hel1 ~>

procstat kstack on rand_harvestq process looks like this:

    8 100097 rand_harvestq       -                   vtrnd_read random_kthread fork_exit fork_trampoline 

suggesting this is virtio_related

After adding


to rc.conf, and rebooting, the CPU spinning issue goes away.
Comment 1 Mina Galić freebsd_triage 2021-03-23 21:01:02 UTC
n.b.: according to the code in vtrnd_harvest()


	KASSERT(virtqueue_empty(vq), ("%s: non-empty queue", __func__));

when running this under a DEBUG kernel, this isn't asserting, so we *are* getting some data.
Comment 2 Evilham 2021-04-14 11:50:30 UTC
\o Hello, just wanted to bump this a bit and provide some more information:

This is happening in 13.0-RELEASE also virtualised under kvm/qemu, which I didn't catch until now due to only having tested on hardware.

With CPU being busy with rand_harvestq, things were incredibly slow, so, as a non-reboot work around, I did:

mv /boot/kernel/virtio_random.ko{,.disabled}
kldunload virtio_random

(Mina's suggestion is better as it survives a reboot)

Which actually allows me to use the VM meanwhile.
Comment 3 Mina Galić freebsd_triage 2021-04-14 21:53:26 UTC
(In reply to Mina Galić from comment #1)

it has been pointed out to me by two people (jrtc27 and RhodiumToad) that my reading of that assert is wrong

this assert is about whether the queue has been drained
i mean, it's right there in the panic message!
Comment 4 Justin Coffman 2021-04-15 13:22:23 UTC
I can confirm this behavior at Vultr as well (QEMU/KVM). My initial experience with this bug was noticing that, when given a reboot instruction, the system would bring everything down, then freeze after the following:

All buffers sync.
Uptime: 37s
uhid0: detached
uhub0: detached

Blacklisting virtio_random via the rc.conf fixed and restarting solved the issue immediately.
Comment 5 deltatux 2021-05-18 16:16:50 UTC
Just wanted to chime in that I have been experiencing the same issue as well, it looks like this issue may be more widespread, I found someone writing about this issue while trying to research a workaround for this problem:


I tried applying the workaround as suggested in this bug report by blacklisting virtio_random in rc.conf and it appears to be working, random_harvestq is no longer eating up 1 of my VM cores.

However, I'm not sure what the impact is for blacklisting virtio_random...
Comment 6 Matthias Andree freebsd_committer 2021-05-29 10:07:36 UTC
looping in release engineering to get their attention, this - for me, anyways - was a regression from 12.2-RELEASE on the same hardware.
Comment 7 Allan Jude freebsd_committer 2021-05-29 13:55:22 UTC
It might be useful to share the output of `sysctl kern.random` from an impacted machine.

I also wonder if adjusting kern.random.harvest.mask might mitigate the issue
Comment 8 Mina Galić freebsd_triage 2021-05-29 15:04:24 UTC
(In reply to Allan Jude from comment #7)
setting kern.random.harvest.mask had no effect, which is why i resorted to blacklisting the module

i guess i should have documented that in the issue
Comment 9 Matthias Andree freebsd_committer 2021-05-29 15:07:03 UTC
without module:
ern.random.fortuna.concurrent_read: 1
kern.random.fortuna.minpoolsize: 128
kern.random.rdrand.rdrand_independent_seed: 0
kern.random.use_chacha20_cipher: 1
kern.random.block_seeded_status: 0
kern.random.harvest.mask_bin: 000000000000000101011111
kern.random.harvest.mask: 351
kern.random.initial_seeding.disable_bypass_warnings: 0
kern.random.initial_seeding.arc4random_bypassed_before_seeding: 0
kern.random.initial_seeding.read_random_bypassed_before_seeding: 0
kern.random.initial_seeding.bypass_before_seeding: 1

with module:
kern.random.fortuna.concurrent_read: 1
kern.random.fortuna.minpoolsize: 128
kern.random.rdrand.rdrand_independent_seed: 0
kern.random.use_chacha20_cipher: 1
kern.random.block_seeded_status: 0
kern.random.random_sources: 'VirtIO Entropy Adapter'
kern.random.harvest.mask_bin: 000001000000000101011111
kern.random.harvest.mask: 262495
kern.random.initial_seeding.disable_bypass_warnings: 0
kern.random.initial_seeding.arc4random_bypassed_before_seeding: 0
kern.random.initial_seeding.read_random_bypassed_before_seeding: 0
kern.random.initial_seeding.bypass_before_seeding: 1

The system does not respond to setting kern.random.harvest.mask:
# sysctl kern.random.harvest.mask=351
kern.random.harvest.mask: 262495 -> 262495
Comment 10 Matthias Andree freebsd_committer 2021-05-29 15:25:23 UTC
since my virtual server is rented with just one core, it goes down to a crawl.
More information:

# kldload virtio_random ; devinfo -p vtrnd0 -v ; kldunload virtio_random
vtrnd0 pnpinfo vendor=0x00001af4 device=0x1005 subvendor=0x1af4 device_type=0x00000004
 virtio_pci4 pnpinfo vendor=0x1af4 device=0x1005 subvendor=0x1af4 subdevice=0x0004 class=0x00ff00 at slot=7 function=0 dbsf=pci0:0:7:0 handle=\_SB_.PCI0.S38_
 pcib0 pnpinfo _HID=PNP0A03 _UID=1 _CID=none at handle=\_SB_.PCI0

# pciconf -l -v
virtio_pci4@pci0:0:7:0: class=0x00ff00 rev=0x00 hdr=0x00 vendor=0x1af4 device=0x1005 subvendor=0x1af4 subdevice=0x0004
    vendor     = 'Red Hat, Inc.'
    device     = 'Virtio RNG'
    class      = old

# dmesg

CPU: Intel Xeon E312xx (Sandy Bridge) (2600.04-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206a1  Family=0x6  Model=0x2a  Stepping=1
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  XSAVE Features=0x1<XSAVEOPT>

virtio_pci0: <VirtIO PCI (legacy) Network adapter> port 0xc0a0-0xc0bf mem 0xfebf1000-0xfebf1fff,0xfe000000-0xfe003fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: 00:1a:4a:d0:17:bf
vtnet0: netmap queues/slots: TX 1/256, RX 1/128
000.000760 [ 450] vtnet_netmap_attach       vtnet attached txq=1, txd=256 rxq=1, rxd=128
virtio_pci1: <VirtIO PCI (legacy) Block adapter> port 0xc000-0xc03f mem 0xfebf2000-0xfebf2fff,0xfe004000-0xfe007fff irq 11 at device 4.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci1
vtblk0: 20480MB (41943040 512 byte sectors)
virtio_pci2: <VirtIO PCI (legacy) Block adapter> port 0xc040-0xc07f mem 0xfebf3000-0xfebf3fff,0xfe008000-0xfe00bfff irq 10 at device 5.0 on pci0
vtblk1: <VirtIO Block Adapter> on virtio_pci2
vtblk1: 1024MB (2097152 512 byte sectors)
virtio_pci3: <VirtIO PCI (legacy) Balloon adapter> port 0xc0c0-0xc0df mem 0xfe00c000-0xfe00ffff irq 10 at device 6.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci3
virtio_pci4: <VirtIO PCI (legacy) Entropy adapter> port 0xc0e0-0xc0ff mem 0xfe010000-0xfe013fff irq 11 at device 7.0 on pci0
vtrnd0: <VirtIO Entropy Adapter> on virtio_pci4
random: registering fast source VirtIO Entropy Adapter
[... kldunload here ...]
vtrnd0: detached

# kldload virtio_random ; sleep 5 ; procstat -w 1 kstack 7 ; kldunload virtio_random
  PID    TID COMM                TDNAME              KSTACK                       
    7 100061 rand_harvestq       -                   mi_switch sleepq_timedwait _sleep random_kthread fork_exit fork_trampoline 
    7 100061 rand_harvestq       -                   mi_switch sleepq_timedwait _sleep random_kthread fork_exit fork_trampoline 
    7 100061 rand_harvestq       -                   mi_switch sleepq_timedwait _sleep random_kthread fork_exit fork_trampoline 
    7 100061 rand_harvestq       -                   mi_switch sleepq_timedwait _sleep random_kthread fork_exit fork_trampoline
Comment 11 Billy 2021-07-23 20:06:44 UTC
Can we change the Importance of this bug from "Affects Only Me" to something higher?  Just to reflect the fact that this bug makes it impossible in install FreeBSD 13 from the install CD ISO on several major cloud providers like Vultr.

Further, has anyone found a workaround for this?  I know that devmatch_blacklist can be used if you have access to an existing install.  But if you're using the install CD, I haven't found a way to prevent virtio_random.ko from autoloading.  I tried single user.  I tried to use the "disable-module" command on the loader prompt.  None of that worked, so I am kind of out of ideas.
Comment 12 Matthias Andree freebsd_committer 2021-07-25 20:24:02 UTC
(In reply to Billy from comment #11)
If you can boot into single user mode, the mount your root FS r/w, you can add this devmatch_blacklist to the configuration, the devmatch autoloads happen on full boot from /etc/rc.d.

Not sure how this works with the install CD though, if the install itself already slows down to a hog before you can get your hands on the module loading, can you get a root shell to kldunload virtio_random?

I was upgrading from 12.2 to 13.0 so I don't know how the install CD behaves.
Comment 13 Jamie Landeg-Jones 2021-07-25 21:16:14 UTC
(In reply to Matthias Andree from comment #12)

I was doing a new install on a new vm instance when I got bitten by this. I couldn't find anyway of stopping virtio_random loading, whilst also letting me continue the install.

Tellingly, modifying module_blacklist at the boot prompt didn't work (Presumably because it's loaded by devmatch not the normal kernel booting?? This should be addressed)

I ended up modifying and recreating the install ISO by deleting the virtio_random.ko file from the distribution, and using that, which was a bit convoluted.

Comment 14 Jamie Landeg-Jones 2021-07-25 21:42:33 UTC
(In reply to Billy from comment #11)

Billy, the workaround for vultr is to create a new instance using the vultr provided FreeBSD pre-installed option.

Once that is done, go in to the config and "attach ISO", then reboot, and that lets you do a customised install as normal. This works because vultr have disabled it in the qemu config for this profile on the host side. (They've also kept the FreeBSD profile with pc-q35 instead of pc-i440fx due to problems)

You'll see the install icon in the webpanel will now be the FreeBSD logo rather than the "generic install" icon.

I've checked with Vultr that this won't bite me in the future, and it won't - it's a known workaround there.

Moving forward, I intend on switching the profiles of all my instances so we don't hit similar issues in the future.

You can get a machines profile changed without requiring a reinstall by raising a ticket with vultr.

If you want, I can email you the contents of my discussion.

Cheers, Jamie
Comment 15 Billy 2021-07-26 01:52:13 UTC
(In reply to Jamie Landeg-Jones from comment #14)

Thanks, Jamie -- that workaround worked.  Nice of Vultr to provide that support, and hopefully this bug gets fixed soon.
Comment 16 Chris Collins 2021-09-12 16:29:42 UTC
Just to add a me too to this thread.

I actually do have control of my qemu in proxmox, and sure enough removing the virtio rng device fixed the problem, am running multiple 12-STABLE Guests that do not have this problem.

I looked at kern.random and compared to the 12-STABLE VM's, and the key difference is that the virtio device is available and activated in 13, whilst it isn't even listed as available in 12.  So it works on 12 because it isn't used.
Comment 17 Kyle Evans freebsd_committer 2021-09-13 06:21:15 UTC
(In reply to Chris Collins from comment #16)

Can you provide details on the host here? I'm a little curious if qemu is configure to use a non-default backend for entropy, and what kernel version it's running.

According to https://wiki.qemu.org/Features/VirtIORNG, the default is /dev/random which may block until more recent linux releases. I'm wondering if we're depleting the host's /dev/random with our (10hz?) polling, frequently leaving us stuck in a tight spin here: https://cgit.freebsd.org/src/tree/sys/dev/virtio/virtqueue.c?h=releng/13.0#n605 -- CC'ing some csprng@ folks and some virtio folks.

I wonder if it would make sense to add an optional timeout parameter to virtqueue_poll() so that rng doesn't get stuck if vtrnd can't currently contribute to the pool.
Comment 18 Chris Collins 2021-09-13 16:24:34 UTC
I believe the default on qemu is /dev/random, but I am using /dev/urandom as is the default on proxmox, they state the reason for using /dev/unrandom is its non blocking, I could retry on /dev/random to see if it fixes it.

The kernel on the host is 5.4.128-1-pve, proxmox do have 5.11 available for use as well, although I wont be able to reboot on a new kernel as I have VM's running uptime critical stuff.
Comment 19 Chris Collins 2021-09-13 16:28:32 UTC
Sorry just to add, I am testing this on a local proxmox, so I can reboot and the actual kernel version is the 5.11 version, specifically 5.11.22-2-pve.

so using /dev/urandom proxmox default with 1024 limit and 1000ms period.
Comment 20 Conrad Meyer freebsd_committer 2021-09-13 16:52:59 UTC
It would be nice to get line numbers on that stack in vtrnd_read.

I wonder if this (misguided) Qemu feature is in play:

> Optional parameters to limit the rate of data sent to the guest are inlucded:
> -device virtio-rng-pci,max-bytes=1024,period=1000
> This restricts the data sent to the guest at 1KB per second. This is useful to not > let a guest starve the host of entropy.
Comment 21 Conrad Meyer freebsd_committer 2021-09-13 17:08:14 UTC
Current qemu default appears to be infinity: https://github.com/qemu/qemu/blob/master/hw/virtio/virtio-rng.c#L251
Comment 22 Conrad Meyer freebsd_committer 2021-09-13 17:16:54 UTC
This arbitrary low limit seems to be imposed by proxmox?

> By default, the limit is set to 1024 bytes per 1000 ms (1 KiB/s). It is recommended to always use a limiter to avoid guests using too many host resources. If desired, a value of 0 for max_bytes can be used to disable all limits.

Comment 23 Conrad Meyer freebsd_committer 2021-09-13 17:36:40 UTC
We read a lot from random sources we expect to be “fast”, like virtio-rng.


10Hz x 4B x HARVESTSIZE x (max) RANDOM_KEYSIZE_WORDS x 32 pools



Up to about 20 kB/s.
Comment 24 Conrad Meyer freebsd_committer 2021-09-13 17:42:35 UTC
Some related history: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230808#c1
Comment 25 Conrad Meyer freebsd_committer 2021-09-13 18:23:42 UTC
(Min rate is about 2.5 kB/s, also in excess of the 1kB/s proxmox default.)
Comment 26 Mina Galić freebsd_triage 2021-09-14 08:31:51 UTC
n.b.: It appears as tho Hetzner has caught on to this problem, and disabled the virtio RNG.
New instances of CPX series VMs come with the Intel RNG only.

I might still be able to reproduce this on my /other/ laptop and get us the info we need.
Comment 27 Chris Collins 2021-09-17 15:59:31 UTC
I just added the device back with a 40kB/sec limit, and cpu usage looks normal.

I will see how low I can get it.
Comment 28 Chris Collins 2021-09-17 17:02:19 UTC
Ok I tested the following byte limits

0 (unlimited)

In terms of reported cpu usage on proxmox graphs, 2048 and upwards are all the same, also same with unlimited, so if proxmox even went a bit higher people probably would not have noticed.

The peak usage reported inside the FreeBSD guest is higher on 2048 and vs 4096, and likewise 4096 vs 40960 and unlimited.  But they seem to average out the same.

I will report this to proxmox.

Also even with unlimited, The idle system cpu usage is still about 3x vs without virtio rng (0.3% vs 0.1%), so seems to be less efficient but does drop to a level where most people wont notice it.
Comment 29 Chris Collins 2021-09-17 18:45:35 UTC
I just checked the qemu documentation and this is worth noting.

"Using RDRAND on the host, if available, as a source of entropy to pass on to the guest via virtio-rng is better than the current /dev/random source."

This likely explains why I am seeing virtio rng been less efficient, so on intel chips its probably a good thing anyway to not use virtio rng.

Regardless I have made a request to proxmox to raise the default value.
Comment 30 Conrad Meyer freebsd_committer 2021-09-18 19:43:15 UTC
Thanks for that research, Chris.
Comment 31 Patrik Lundquist 2021-11-16 10:13:16 UTC
Also hangs on reboot when using QEMU's builtin RNG; -object rng-builtin,id=id

<rng model="virtio">
  <backend model="builtin"/>
Comment 32 Dennis Noordsij 2022-01-12 07:42:32 UTC
(Coming here via #261127 - "ifconfig wg0 destroy" also triggering this problem)

Can confirm this is (still) happening on Hetzner (CPX) VPS's with 13.0-RELEASE-p6

Blacklisting virtio_random fixes (works around) the issue.

Regarding ready installation images; I usually boot the rescue mode and install manually by creating partitions and untarring the base system, so perhaps Hetzner/Vultr provide installation media where virtio_random is already pre-blacklisted).

Have a system available for any testing/debugging, or even just try to detect if the issue would occur and not load the driver, if someone wants to point me in the right direction.
Comment 33 Kyle Evans freebsd_committer 2022-01-12 16:46:48 UTC
(In reply to Dennis Noordsij from comment #32)

OK, I'll spin up a Hetzner VM and poke you if I can't reproduce it. It'll probably be ~day or two before I can take a look, though.
Comment 34 Jamie Landeg-Jones 2022-01-12 20:10:40 UTC
(In reply to Dennis Noordsij from comment #32)

> Regarding ready installation images; I usually boot the rescue mode and install manually by creating partitions and untarring the base system, so perhaps Hetzner/Vultr provide installation media where virtio_random is already pre-blacklisted).

The vultr install does indeed disable virtio-random, but that wasn't the only issue: I created my own ISO without virtio-random , and whilst it stopped the boot hang, it still caused problems (hanging on reboot being one)

I talked with the vultr guy and he confirmed - he had to disable virtio-random AND change the KVM profile to boot :

> "2021-07-14 16:37:43
> We've found that v12 and older of FreeBSD do not actually support q35 - so when any version of FreeBSD is selected from our control panel we default that back to i440fx. We do have a few customers using the iso-over-image trick, and we didn't want to break them if they're just selecting the default version (13) and then trying to install 12 over it.

> We have not done extensive testing of FreeBSD on q35 yet - it seems like support for this is very new, so I wouldn't be surprised if there were still issues causing it to hang at reboot.

>Brian, Vultr.com Support
Comment 35 Dennis Noordsij 2022-01-12 20:32:23 UTC
(In reply to Jamie Landeg-Jones from comment #34)

Ah OK. For root servers Hetzner provides an additional special "AX" (AMD line) flavored FreeBSD 13 rescue image - though I am not actually sure what the difference is. 

For the VPS's there is only one FreeBSD rescue image, which includes only a few necessary modules but not virtio and which works without issue.

They do not actually have a FreeBSD installation option for the VPS's - you can only install FreeBSD via the rescue image (either manually or running bsdinstall[image]) - but mostly what I want to say is that they don't know what OS you're actually running and there therefore shouldn't be any special KVM settings and it should be the same thing a Linux OS sees. At most the provided bsdinstall blacklists virtio.

As far as I have been able to tell, this is the only issue with this (virtual) hardware, it even performs surprisingly well.

(For reference, this is a CPX11 in Nuremburg)
Comment 36 Jamie Landeg-Jones 2022-01-12 21:24:12 UTC
(In reply to Dennis Noordsij from comment #35)

One of the reasons I chose Vultr is that they provide FreeBSD images. Even though I've always rolled my own, I wanted to show support, and also, if there was some non-OS issue, I wouldn't get tickets closed with "We don't support FreeBSD!"

As for installation, Vultr allow you to provide your own ISO to boot from, as well as their "customised OS installs" and also their "ISO library of standard releases"

Up until I had this issue with FreeBSD 13, I always used the default Vultr KVM profile rather than the FreeBSD profile - indeed my non-FreeBSD 13 instances are still using it. It was only FreeBSD 13 that got stuck. If Hetzner only provides the older "i440fx" profile, you'd not have noticed the difference.

The other issues with FreeBSD (boot on hang) may have been fixed - I've noticed there has been a bit of activity on this bug and the other bug related to this issue, but I haven't been following closely.

[Note, you'd need to try a STABLE rather than RELEASE image)

I'd never heard of Hetzner until recently, so it will be interesting to compare.

(In reply to Kyle Evans from comment #33)

If you want, I can help from the Vultr end too. Though I'm assuming they're still running their "free first $100" on signup, so you can actually test it yourself without cost if that's easier.

P.S. I've no affiliation with Vultr, I've just been a customer for many years.
Comment 37 Kyle Evans freebsd_committer 2022-01-13 04:03:28 UTC
What's going on with Hetzner, at least, apparently has nothing to do with the polling rate. we make it through exactly 0 iterations of vtrnd_harvest, completely blocked in virtqueue_poll for the first time.

On this machine, vtrnd0 is hanging off of virtio_pci4:

virtio_pci4: <VirtIO PCI (modern) Entropy adapter> mem 0xfc800000-0xfc803fff irq 22 at device 0.0 on pci5
vtrnd0: <VirtIO Entropy Adapter> on virtio_pci4
Comment 38 Kubilay Kocak freebsd_committer freebsd_triage 2022-03-11 00:09:14 UTC
^Triage: Closing duplicate of earlier bug 253175 (see bug 253175 comment 27).

*** This bug has been marked as a duplicate of bug 253175 ***