Bug 254513

Summary: virtio_random: random_harvestq spinning on a CPU with Q35 virtio random device
Product: Base System
Component: misc
Status: New
Severity: Affects Many People
Priority: ---
Keywords: performance, regression
Reporter: Mina Galić <me>
Assignee: freebsd-virtualization (Nobody) <virtualization>
CC: allanjude, bryanv, cem, chrysalis, contact, cws, deltatux, elij+freebsd, emaste, execve, georgewilliamfoster, grahamperrin, j.kelly.hays, jamie, jeffrey, jmg, jrtc27, kevans, lwhsu, mandree, markm, nchevsky, onno, re, rilindo, russian, virtualization, zlei.huang
Version: 13.0-STABLE   
Hardware: Any   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253175

Description Mina Galić 2021-03-23 20:48:17 UTC
This virtual machine is running @ Hetzner on their CPX series, which uses the QEMU Q35 chipset. Support for Q35 was recently added to FreeBSD and is included in FreeBSD 13.0.

The problem manifests as a load average of 1 on a 2 CPU system. `top -S` shows:

  PID USERNAME    THR PRI NICE   SIZE    RES   SWAP STATE    C   TIME    WCPU COMMAND
    8 root          1 -16    -     0B    16K   0B CPU0     0  10:42  99.95% rand_harvestq


So let's dig into the random device.
The device is identified like this:

grep -i entropy /var/run/dmesg.boot
vtrnd0: <VirtIO Entropy Adapter> on virtio_pci4
random: registering fast source VirtIO Entropy Adapter

meena@websrv2-hel1 ~> devinfo -p vtrnd0 -v
vtrnd0 pnpinfo vendor=0x00001af4 device=0x1044 subvendor=0x1af4 device_type=0x00000004
 virtio_pci4 pnpinfo vendor=0x1af4 device=0x1044 subvendor=0x1af4 subdevice=0x1100 class=0x00ff00 at slot=0 function=0 dbsf=pci0:5:0:0
 pci5
 pcib5 pnpinfo vendor=0x1b36 device=0x000c subvendor=0x1b36 subdevice=0x0000 class=0x060400 at slot=2 function=4 dbsf=pci0:0:2:4
 pci0
 pcib0 pnpinfo _HID=PNP0A08 _UID=1 _CID=PNP0A03 at handle=\_SB_.PCI0
 acpi0
 nexus0
meena@websrv2-hel1 ~>


procstat kstack on the rand_harvestq process looks like this:

    8 100097 rand_harvestq       -                   vtrnd_read random_kthread fork_exit fork_trampoline 

suggesting this is virtio-related.

After adding

devmatch_blacklist="virtio_random.ko"

to rc.conf, and rebooting, the CPU spinning issue goes away.
Comment 1 Mina Galić 2021-03-23 21:01:02 UTC
n.b.: according to the code in vtrnd_harvest()

https://github.com/freebsd/freebsd-src/blob/633218ee4615854702fea05ba6e17c2a1876bb6b/sys/dev/virtio/random/virtio_random.c#L244

	KASSERT(virtqueue_empty(vq), ("%s: non-empty queue", __func__));

when running this under a DEBUG kernel, this assertion doesn't fire, so we *are* getting some data.
Comment 2 Evilham 2021-04-14 11:50:30 UTC
\o Hello, just wanted to bump this a bit and provide some more information:

This is also happening on 13.0-RELEASE virtualised under KVM/QEMU, which I didn't catch until now because I had only tested on hardware.

With the CPU busy with rand_harvestq, things were incredibly slow, so, as a non-reboot workaround, I did:

mv /boot/kernel/virtio_random.ko{,.disabled}
kldunload virtio_random

(Mina's suggestion is better as it survives a reboot)

This lets me use the VM in the meantime.
Comment 3 Mina Galić 2021-04-14 21:53:26 UTC
(In reply to Mina Galić from comment #1)

It has been pointed out to me by two people (jrtc27 and RhodiumToad) that my reading of that assert is wrong.

The assert is about whether the queue has been drained.
I mean, it's right there in the panic message!
Comment 4 Connor Sheridan 2021-04-15 13:22:23 UTC
I can confirm this behavior at Vultr as well (QEMU/KVM). My initial experience with this bug was noticing that, when given a reboot instruction, the system would bring everything down, then freeze after the following:

All buffers sync.
Uptime: 37s
uhid0: detached
uhub0: detached

Blacklisting virtio_random via rc.conf and restarting solved the issue immediately.
Comment 5 deltatux 2021-05-18 16:16:50 UTC
Just wanted to chime in that I have been experiencing the same issue as well. It looks like this issue may be more widespread; I found someone writing about it while researching a workaround:

https://decomo.info/wiki/en/blog/2021/2021-05-02

I tried the workaround suggested in this bug report, blacklisting virtio_random in rc.conf, and it appears to be working: random_harvestq is no longer eating up one of my VM cores.

However, I'm not sure what the impact of blacklisting virtio_random is...
Comment 6 Matthias Andree freebsd_committer 2021-05-29 10:07:36 UTC
Looping in release engineering to get their attention; this - for me, anyway - was a regression from 12.2-RELEASE on the same hardware.
Comment 7 Allan Jude freebsd_committer 2021-05-29 13:55:22 UTC
It might be useful to share the output of `sysctl kern.random` from an impacted machine.

I also wonder if adjusting kern.random.harvest.mask might mitigate the issue
Comment 8 Mina Galić 2021-05-29 15:04:24 UTC
(In reply to Allan Jude from comment #7)
Setting kern.random.harvest.mask had no effect, which is why I resorted to blacklisting the module.

I guess I should have documented that in the issue.
Comment 9 Matthias Andree freebsd_committer 2021-05-29 15:07:03 UTC
without module:
kern.random.fortuna.concurrent_read: 1
kern.random.fortuna.minpoolsize: 128
kern.random.rdrand.rdrand_independent_seed: 0
kern.random.use_chacha20_cipher: 1
kern.random.block_seeded_status: 0
kern.random.random_sources: 
kern.random.harvest.mask_symbolic: [UMA],[FS_ATIME],SWI,[INTERRUPT],NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
kern.random.harvest.mask_bin: 000000000000000101011111
kern.random.harvest.mask: 351
kern.random.initial_seeding.disable_bypass_warnings: 0
kern.random.initial_seeding.arc4random_bypassed_before_seeding: 0
kern.random.initial_seeding.read_random_bypassed_before_seeding: 0
kern.random.initial_seeding.bypass_before_seeding: 1

with module:
kern.random.fortuna.concurrent_read: 1
kern.random.fortuna.minpoolsize: 128
kern.random.rdrand.rdrand_independent_seed: 0
kern.random.use_chacha20_cipher: 1
kern.random.block_seeded_status: 0
kern.random.random_sources: 'VirtIO Entropy Adapter'
kern.random.harvest.mask_symbolic: PURE_VIRTIO,[UMA],[FS_ATIME],SWI,[INTERRUPT],NET_NG,[NET_ETHER],NET_TUN,MOUSE,KEYBOARD,ATTACH,CACHED
kern.random.harvest.mask_bin: 000001000000000101011111
kern.random.harvest.mask: 262495
kern.random.initial_seeding.disable_bypass_warnings: 0
kern.random.initial_seeding.arc4random_bypassed_before_seeding: 0
kern.random.initial_seeding.read_random_bypassed_before_seeding: 0
kern.random.initial_seeding.bypass_before_seeding: 1

The system does not respond to setting kern.random.harvest.mask:
# sysctl kern.random.harvest.mask=351
kern.random.harvest.mask: 262495 -> 262495
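
Working that out by hand (my arithmetic, based on the values above): 262495 - 351 = 262144 = 2^18, i.e. the only bit the module adds to the mask is bit 18, which mask_symbolic shows as PURE_VIRTIO. The sysctl above was trying to clear exactly that bit, and the kernel kept the mask at 262495 - presumably because the mask handler never lets the "pure" source bits be cleared (an assumption on my part, I have not checked the handler).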
Comment 10 Matthias Andree freebsd_committer 2021-05-29 15:25:23 UTC
Since my virtual server is rented with just one core, it slows down to a crawl.
More information:

# kldload virtio_random ; devinfo -p vtrnd0 -v ; kldunload virtio_random
vtrnd0 pnpinfo vendor=0x00001af4 device=0x1005 subvendor=0x1af4 device_type=0x00000004
 virtio_pci4 pnpinfo vendor=0x1af4 device=0x1005 subvendor=0x1af4 subdevice=0x0004 class=0x00ff00 at slot=7 function=0 dbsf=pci0:0:7:0 handle=\_SB_.PCI0.S38_
 pci0
 pcib0 pnpinfo _HID=PNP0A03 _UID=1 _CID=none at handle=\_SB_.PCI0
 acpi0
 nexus0



# pciconf -l -v
[...]
virtio_pci4@pci0:0:7:0: class=0x00ff00 rev=0x00 hdr=0x00 vendor=0x1af4 device=0x1005 subvendor=0x1af4 subdevice=0x0004
    vendor     = 'Red Hat, Inc.'
    device     = 'Virtio RNG'
    class      = old



# dmesg

CPU: Intel Xeon E312xx (Sandy Bridge) (2600.04-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206a1  Family=0x6  Model=0x2a  Stepping=1
  Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
  Features2=0x9fba2203<SSE3,PCLMULQDQ,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,HV>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  XSAVE Features=0x1<XSAVEOPT>
...

virtio_pci0: <VirtIO PCI (legacy) Network adapter> port 0xc0a0-0xc0bf mem 0xfebf1000-0xfebf1fff,0xfe000000-0xfe003fff irq 11 at device 3.0 on pci0
vtnet0: <VirtIO Networking Adapter> on virtio_pci0
vtnet0: Ethernet address: 00:1a:4a:d0:17:bf
vtnet0: netmap queues/slots: TX 1/256, RX 1/128
000.000760 [ 450] vtnet_netmap_attach       vtnet attached txq=1, txd=256 rxq=1, rxd=128
virtio_pci1: <VirtIO PCI (legacy) Block adapter> port 0xc000-0xc03f mem 0xfebf2000-0xfebf2fff,0xfe004000-0xfe007fff irq 11 at device 4.0 on pci0
vtblk0: <VirtIO Block Adapter> on virtio_pci1
vtblk0: 20480MB (41943040 512 byte sectors)
virtio_pci2: <VirtIO PCI (legacy) Block adapter> port 0xc040-0xc07f mem 0xfebf3000-0xfebf3fff,0xfe008000-0xfe00bfff irq 10 at device 5.0 on pci0
vtblk1: <VirtIO Block Adapter> on virtio_pci2
vtblk1: 1024MB (2097152 512 byte sectors)
virtio_pci3: <VirtIO PCI (legacy) Balloon adapter> port 0xc0c0-0xc0df mem 0xfe00c000-0xfe00ffff irq 10 at device 6.0 on pci0
vtballoon0: <VirtIO Balloon Adapter> on virtio_pci3
virtio_pci4: <VirtIO PCI (legacy) Entropy adapter> port 0xc0e0-0xc0ff mem 0xfe010000-0xfe013fff irq 11 at device 7.0 on pci0
...
vtrnd0: <VirtIO Entropy Adapter> on virtio_pci4
random: registering fast source VirtIO Entropy Adapter
[... kldunload here ...]
vtrnd0: detached



# kldload virtio_random ; sleep 5 ; procstat -w 1 kstack 7 ; kldunload virtio_random
  PID    TID COMM                TDNAME              KSTACK                       
    7 100061 rand_harvestq       -                   mi_switch sleepq_timedwait _sleep random_kthread fork_exit fork_trampoline 
    7 100061 rand_harvestq       -                   mi_switch sleepq_timedwait _sleep random_kthread fork_exit fork_trampoline 
    7 100061 rand_harvestq       -                   mi_switch sleepq_timedwait _sleep random_kthread fork_exit fork_trampoline 
    7 100061 rand_harvestq       -                   mi_switch sleepq_timedwait _sleep random_kthread fork_exit fork_trampoline
Comment 11 Billy 2021-07-23 20:06:44 UTC
Can we change the Importance of this bug from "Affects Only Me" to something higher?  Just to reflect the fact that this bug makes it impossible to install FreeBSD 13 from the install CD ISO on several major cloud providers like Vultr.

Further, has anyone found a workaround for this?  I know that devmatch_blacklist can be used if you have access to an existing install.  But if you're using the install CD, I haven't found a way to prevent virtio_random.ko from autoloading.  I tried single user.  I tried to use the "disable-module" command on the loader prompt.  None of that worked, so I am kind of out of ideas.
Comment 12 Matthias Andree freebsd_committer 2021-07-25 20:24:02 UTC
(In reply to Billy from comment #11)
If you can boot into single-user mode, then mount your root FS r/w, you can add this devmatch_blacklist to the configuration; the devmatch autoloads happen during the full boot from /etc/rc.d.
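
Roughly, from the single-user shell (an untested sketch; this assumes a UFS root mounted read-only and that sysrc is on the root filesystem - a ZFS root needs its own remount step):

# remount the root filesystem read/write
mount -u -o rw /
# append the module to the devmatch blacklist in /etc/rc.conf
sysrc devmatch_blacklist+="virtio_random.ko"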

Not sure how this works with the install CD though. If the install itself already slows to a crawl before you can get your hands on the module loading, can you get a root shell to kldunload virtio_random?

I was upgrading from 12.2 to 13.0 so I don't know how the install CD behaves.
Comment 13 Jamie Landeg-Jones 2021-07-25 21:16:14 UTC
(In reply to Matthias Andree from comment #12)

I was doing a new install on a new VM instance when I got bitten by this. I couldn't find any way of stopping virtio_random from loading while still letting me continue the install.

Tellingly, modifying module_blacklist at the loader prompt didn't work (presumably because the module is loaded by devmatch rather than during normal kernel boot?? This should be addressed).

I ended up modifying and recreating the install ISO by deleting the virtio_random.ko file from the distribution and using that, which was a bit convoluted.

Cheers
Comment 14 Jamie Landeg-Jones 2021-07-25 21:42:33 UTC
(In reply to Billy from comment #11)

Billy, the workaround for Vultr is to create a new instance using the Vultr-provided FreeBSD pre-installed option.

Once that is done, go into the config and "attach ISO", then reboot, and that lets you do a customised install as normal. This works because Vultr have disabled the virtio RNG device in the QEMU config for this profile on the host side. (They've also kept the FreeBSD profile with pc-q35 instead of pc-i440fx due to problems.)

You'll see the install icon in the webpanel will now be the FreeBSD logo rather than the "generic install" icon.

I've checked with Vultr that this won't bite me in the future, and it won't - it's a known workaround there.

Moving forward, I intend to switch the profiles of all my instances so we don't hit similar issues in the future.

You can get a machine's profile changed without requiring a reinstall by raising a ticket with Vultr.

If you want, I can email you the contents of my discussion.

Cheers, Jamie
Comment 15 Billy 2021-07-26 01:52:13 UTC
(In reply to Jamie Landeg-Jones from comment #14)

Thanks, Jamie -- that workaround worked.  Nice of Vultr to provide that support, and hopefully this bug gets fixed soon.
Comment 16 Chris Collins 2021-09-12 16:29:42 UTC
Just to add a me too to this thread.

I actually do have control of my QEMU configuration in Proxmox, and sure enough, removing the virtio RNG device fixed the problem. I am running multiple 12-STABLE guests that do not have this problem.

I looked at kern.random and compared it to the 12-STABLE VMs; the key difference is that the virtio device is available and activated on 13, whilst it isn't even listed as available on 12. So it works on 12 because the device isn't used.
Comment 17 Kyle Evans freebsd_committer 2021-09-13 06:21:15 UTC
(In reply to Chris Collins from comment #16)

Can you provide details on the host here? I'm a little curious whether QEMU is configured to use a non-default backend for entropy, and what kernel version the host is running.

According to https://wiki.qemu.org/Features/VirtIORNG, the default is /dev/random, which may block on all but fairly recent Linux releases. I'm wondering if we're depleting the host's /dev/random with our (10Hz?) polling, frequently leaving us stuck in a tight spin here: https://cgit.freebsd.org/src/tree/sys/dev/virtio/virtqueue.c?h=releng/13.0#n605 -- CC'ing some csprng@ folks and some virtio folks.

I wonder if it would make sense to add an optional timeout parameter to virtqueue_poll() so that rng doesn't get stuck if vtrnd can't currently contribute to the pool.
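
For illustration, a rough sketch of what such a bounded poll might look like (purely hypothetical, not committed code; the function name, the max_spins parameter and the NULL-on-timeout convention are my assumptions, and bus-level polling details are omitted):

/*
 * Hypothetical sketch: a variant of virtqueue_poll() that gives up after a
 * bounded number of spins instead of busy-waiting forever when the host
 * rate-limits the entropy queue.  Imagined as living next to the existing
 * poll routine in sys/dev/virtio/virtqueue.c.
 */
static void *
virtqueue_poll_bounded(struct virtqueue *vq, uint32_t *len, u_int max_spins)
{
	void *cookie;
	u_int spins;

	for (spins = 0; (cookie = virtqueue_dequeue(vq, len)) == NULL; spins++) {
		if (max_spins != 0 && spins >= max_spins)
			return (NULL);	/* vtrnd_harvest() would then skip this round */
		cpu_spinwait();
	}
	return (cookie);
}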
Comment 18 Chris Collins 2021-09-13 16:24:34 UTC
I believe the default on QEMU is /dev/random, but I am using /dev/urandom, which is the default on Proxmox; they state the reason for using /dev/urandom is that it is non-blocking. I could retry with /dev/random to see if it fixes it.

The kernel on the host is 5.4.128-1-pve; Proxmox does have 5.11 available for use as well, although I won't be able to reboot onto a new kernel as I have VMs running uptime-critical stuff.
Comment 19 Chris Collins 2021-09-13 16:28:32 UTC
Sorry, just to add: I am testing this on a local Proxmox host, so I can reboot, and the actual kernel version is 5.11, specifically 5.11.22-2-pve.

So this is using the Proxmox default of /dev/urandom with a 1024-byte limit and a 1000 ms period.
Comment 20 Conrad Meyer freebsd_committer 2021-09-13 16:52:59 UTC
It would be nice to get line numbers on that stack in vtrnd_read.

I wonder if this (misguided) Qemu feature is in play:

> Optional parameters to limit the rate of data sent to the guest are included:
> -device virtio-rng-pci,max-bytes=1024,period=1000
> This restricts the data sent to the guest at 1KB per second. This is useful to not let a guest starve the host of entropy.
Comment 21 Conrad Meyer freebsd_committer 2021-09-13 17:08:14 UTC
Current qemu default appears to be infinity: https://github.com/qemu/qemu/blob/master/hw/virtio/virtio-rng.c#L251
Comment 22 Conrad Meyer freebsd_committer 2021-09-13 17:16:54 UTC
This arbitrarily low limit seems to be imposed by Proxmox?

> By default, the limit is set to 1024 bytes per 1000 ms (1 KiB/s). It is recommended to always use a limiter to avoid guests using too many host resources. If desired, a value of 0 for max_bytes can be used to disable all limits.

https://pve.proxmox.com/wiki/Qemu/KVM_Virtual_Machines
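
(If I read that wiki page right, the corresponding line in the VM config would be something like "rng0: source=/dev/urandom,max_bytes=0" to lift the limit - a hypothetical example, not tested here.)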
Comment 23 Conrad Meyer freebsd_committer 2021-09-13 17:36:40 UTC
We read a lot from random sources we expect to be “fast”, like virtio-rng.

https://github.com/freebsd/freebsd-src/blob/main/sys/dev/random/random_harvestq.c#L231

10Hz x 4B x HARVESTSIZE x (max) RANDOM_KEYSIZE_WORDS x 32 pools

http://fxr.watson.org/fxr/source/dev/random/random_harvestq.h#L32

http://fxr.watson.org/fxr/source/dev/random/hash.h#L37

Up to about 20 kB/s.
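
Spelling that multiplication out (my arithmetic, assuming HARVESTSIZE is 2 and RANDOM_KEYSIZE_WORDS is 8 in those headers): 10 x 4 B x 2 x 8 x 32 = 20,480 B/s, i.e. roughly 20 kB/s.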
Comment 24 Conrad Meyer freebsd_committer 2021-09-13 17:42:35 UTC
Some related history: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=230808#c1
Comment 25 Conrad Meyer freebsd_committer 2021-09-13 18:23:42 UTC
(Min rate is about 2.5 kB/s, also in excess of the 1 kB/s Proxmox default.)
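
(With the same assumed constants, that would be 10 x 4 B x 2 x 32 = 2,560 B/s when the RANDOM_KEYSIZE_WORDS factor is dropped, which matches the ~2.5 kB/s figure.)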
Comment 26 Mina Galić 2021-09-14 08:31:51 UTC
n.b.: It appears as though Hetzner has caught on to this problem and disabled the virtio RNG.
New instances of CPX series VMs come with the Intel RNG only.

I might still be able to reproduce this on my /other/ laptop and get us the info we need.
Comment 27 Chris Collins 2021-09-17 15:59:31 UTC
I just added the device back with a 40 kB/sec limit, and CPU usage looks normal.

I will see how low I can get it.
Comment 28 Chris Collins 2021-09-17 17:02:19 UTC
OK, I tested the following byte limits:

1024
2048
4096
40960
0 (unlimited)

In terms of reported CPU usage on the Proxmox graphs, 2048 and upwards are all the same, including unlimited, so if Proxmox had set the default even a bit higher, people probably would not have noticed.

The peak usage reported inside the FreeBSD guest is higher at 2048 than at 4096, and likewise at 4096 versus 40960 and unlimited, but they seem to average out the same.

I will report this to proxmox.

Also, even with unlimited, the idle system CPU usage is still about 3x what it is without virtio RNG (0.3% vs 0.1%), so it seems to be less efficient, but it does drop to a level where most people won't notice it.
Comment 29 Chris Collins 2021-09-17 18:45:35 UTC
I just checked the QEMU documentation, and this is worth noting:

"Using RDRAND on the host, if available, as a source of entropy to pass on to the guest via virtio-rng is better than the current /dev/random source."

This likely explains why I am seeing virtio RNG being less efficient, so on Intel chips it's probably a good thing anyway not to use virtio RNG.

Regardless, I have made a request to Proxmox to raise the default value.
Comment 30 Conrad Meyer freebsd_committer 2021-09-18 19:43:15 UTC
Thanks for that research, Chris.