Bug 253175 - virtio_random(4): Hangs after shutdown, reboot, halt commands on Vultr / Hetzner / ARP Networks (Qemu)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Eugene Grosbein
Keywords: needs-patch, performance, regression
Duplicates: 254513
Reported: 2021-02-02 04:22 UTC by danskoya
Modified: 2022-08-14 07:40 UTC

See Also:
koobs: maintainer-feedback? (jrtc27)
koobs: maintainer-feedback? (bryanv)
koobs: mfc-stable13?
koobs: mfc-stable12-


Attachments
error after installer (38.67 KB, image/png)
2021-02-02 04:22 UTC, danskoya
error after completing installer (31.93 KB, image/png)
2021-02-02 04:22 UTC, danskoya
dmesg.boot of 13.1-STABLE/amd64 (61.24 KB, text/plain)
2022-03-10 23:53 UTC, Eugene Grosbein
pciconf -lv of 13.1-STABLE/amd64 (2.31 KB, text/plain)
2022-03-10 23:54 UTC, Eugene Grosbein
hackish-aplha-preliminary-proof-of-concept patch (680 bytes, patch)
2022-03-11 00:51 UTC, Eugene Grosbein
patch-virtio_random.c (2.43 KB, patch)
2022-03-11 02:12 UTC, Eugene Grosbein

Description danskoya 2021-02-02 04:22:00 UTC
Created attachment 222088 [details]
error after installer

Attempted to install FreeBSD 13.0-ALPHA1 and ALPHA3 on Vultr, but the system hangs after a successful run of the installer when issuing "shutdown -r now" after booting into the installed base.
Comment 1 danskoya 2021-02-02 04:22:45 UTC
Created attachment 222089 [details]
error after completing installer
Comment 2 danskoya 2021-02-02 13:22:17 UTC
My apologies, I forgot to mention these:

Images I used - 

FreeBSD-13.0-ALPHA1-amd64-20210114-7ae27c2d6c4-255938-bootonly.iso
FreeBSD-13.0-ALPHA3-amd64-20210129-40cb0344eb2-256214-bootonly.iso

- formatted target storage as UFS/GPT

- booting into Live CD (instead of running installer) renders the same issue (hanging)

- while stuck, CPU usage jumps up to 100% (Vultr)
Comment 3 danskoya 2021-02-13 05:30:34 UTC
The problem still exists on BETA2, unfortunately.
Comment 4 Dennis Clarke 2021-02-24 11:38:52 UTC
This is a 13.0-BETA3 fresh install:

# uname -apKU
FreeBSD europa 13.0-BETA3 FreeBSD 13.0-BETA3 #0 releng/13.0-n244525-150b4388d3b: Fri Feb 19 04:04:34 UTC 2021     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64 amd64 1300139 1300139

After the install of 13.0-BETA3 and the initial creation of a few users
and a reboot or two I may utter "shutdown -p 'now'" which then leaves
the machine still powered on. This is what I see on the console : 

Waiting for PIDS: 856.
Stopping devd.
Waiting for PIDS: 587.
Writing entropy file: .
Writing early boot entropy file: .
.
Terminated
Feb 23 11:49:53 europa syslogd: exiting on signal 15
Waiting (max 60 seconds) for system process `vnlru' to stop... done
Waiting (max 60 seconds) for system process `syncer' to stop... done
Syncing disks, vnodes remaining... 0 0 0 0 0 done
Waiting (max 60 seconds) for system process `bufdaemon' to stop... done
Waiting (max 60 seconds) for system process `bufspacedaemon-0' to stop... done
Waiting (max 60 seconds) for system process `bufspacedaemon-3' to stop... done
Waiting (max 60 seconds) for system process `bufspacedaemon-1' to stop... done
Waiting (max 60 seconds) for system process `bufspacedaemon-2' to stop... done
All buffers synced.
Uptime: 7h44m23s
uhub6: detached
uhub5: detached
re0: link state changed to DOWN
re0: link state changed to UP
uhub1: detached
uhub7: detached
uhub3: detached
ums0: detached
uhub4: detached
uhub2: detached
uhub0: detached


Then we have the machine stuck at that point still powered on and there
is even a mouse cursor/arrow pointer on the screen. I may try to switch
over to a serial console just to see if that is any different. However
perhaps this has something to do with having an old PS/2-style keyboard
plugged in? I am at a loss as to why the machine will not power off. 

This is not fascinating hardware in the least. It is just an old ACER
workstation pulled out of a pile and a few disks connected to it. The
hardware list looks like :

root@europa:~ # pciconf -lv
hostb0@pci0:0:0:0:      class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x9601 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'RS880 Host Bridge'
    class      = bridge
    subclass   = HOST-PCI
pcib1@pci0:0:1:0:       class=0x060400 rev=0x00 hdr=0x01 vendor=0x1025 device=0x9602 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Acer Incorporated [ALI]'
    device     = 'AMD RS780/RS880 PCI to PCI bridge (int gfx)'
    class      = bridge
    subclass   = PCI-PCI
ahci0@pci0:0:17:0:      class=0x010400 rev=0x40 hdr=0x00 vendor=0x1002 device=0x4393 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB7x0/SB8x0/SB9x0 SATA Controller [RAID5 mode]'
    class      = mass storage
    subclass   = RAID
ohci0@pci0:0:18:0:      class=0x0c0310 rev=0x00 hdr=0x00 vendor=0x1002 device=0x4397 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB7x0/SB8x0/SB9x0 USB OHCI0 Controller'
    class      = serial bus
    subclass   = USB
ehci0@pci0:0:18:2:      class=0x0c0320 rev=0x00 hdr=0x00 vendor=0x1002 device=0x4396 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB7x0/SB8x0/SB9x0 USB EHCI Controller'
    class      = serial bus
    subclass   = USB
ohci1@pci0:0:19:0:      class=0x0c0310 rev=0x00 hdr=0x00 vendor=0x1002 device=0x4397 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB7x0/SB8x0/SB9x0 USB OHCI0 Controller'
    class      = serial bus
    subclass   = USB
ehci1@pci0:0:19:2:      class=0x0c0320 rev=0x00 hdr=0x00 vendor=0x1002 device=0x4396 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB7x0/SB8x0/SB9x0 USB EHCI Controller'
    class      = serial bus
    subclass   = USB
intsmb0@pci0:0:20:0:    class=0x0c0500 rev=0x42 hdr=0x00 vendor=0x1002 device=0x4385 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SBx00 SMBus Controller'
    class      = serial bus
    subclass   = SMBus
hdac1@pci0:0:20:2:      class=0x040300 rev=0x40 hdr=0x00 vendor=0x1002 device=0x4383 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SBx00 Azalia (Intel HDA)'
    class      = multimedia
    subclass   = HDA
isab0@pci0:0:20:3:      class=0x060100 rev=0x40 hdr=0x00 vendor=0x1002 device=0x439d subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB7x0/SB8x0/SB9x0 LPC host controller'
    class      = bridge
    subclass   = PCI-ISA
pcib2@pci0:0:20:4:      class=0x060401 rev=0x40 hdr=0x01 vendor=0x1002 device=0x4384 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SBx00 PCI to PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
ohci2@pci0:0:20:5:      class=0x0c0310 rev=0x00 hdr=0x00 vendor=0x1002 device=0x4399 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB7x0/SB8x0/SB9x0 USB OHCI2 Controller'
    class      = serial bus
    subclass   = USB
pcib3@pci0:0:21:0:      class=0x060400 rev=0x00 hdr=0x01 vendor=0x1002 device=0x43a0 subvendor=0x1002 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB700/SB800/SB900 PCI to PCI bridge (PCIE port 0)'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:21:1:      class=0x060400 rev=0x00 hdr=0x01 vendor=0x1002 device=0x43a1 subvendor=0x1002 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB700/SB800/SB900 PCI to PCI bridge (PCIE port 1)'
    class      = bridge
    subclass   = PCI-PCI
pcib5@pci0:0:21:3:      class=0x060400 rev=0x00 hdr=0x01 vendor=0x1002 device=0x43a3 subvendor=0x1002 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB900 PCI to PCI bridge (PCIE port 3)'
    class      = bridge
    subclass   = PCI-PCI
ohci3@pci0:0:22:0:      class=0x0c0310 rev=0x00 hdr=0x00 vendor=0x1002 device=0x4397 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB7x0/SB8x0/SB9x0 USB OHCI0 Controller'
    class      = serial bus
    subclass   = USB
ehci2@pci0:0:22:2:      class=0x0c0320 rev=0x00 hdr=0x00 vendor=0x1002 device=0x4396 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'SB7x0/SB8x0/SB9x0 USB EHCI Controller'
    class      = serial bus
    subclass   = USB
hostb1@pci0:0:24:0:     class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1600 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Family 15h Processor Function 0'
    class      = bridge
    subclass   = HOST-PCI
hostb2@pci0:0:24:1:     class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1601 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Family 15h Processor Function 1'
    class      = bridge
    subclass   = HOST-PCI
hostb3@pci0:0:24:2:     class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1602 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Family 15h Processor Function 2'
    class      = bridge
    subclass   = HOST-PCI
hostb4@pci0:0:24:3:     class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1603 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Family 15h Processor Function 3'
    class      = bridge
    subclass   = HOST-PCI
hostb5@pci0:0:24:4:     class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1604 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Family 15h Processor Function 4'
    class      = bridge
    subclass   = HOST-PCI
hostb6@pci0:0:24:5:     class=0x060000 rev=0x00 hdr=0x00 vendor=0x1022 device=0x1605 subvendor=0x0000 subdevice=0x0000
    vendor     = 'Advanced Micro Devices, Inc. [AMD]'
    device     = 'Family 15h Processor Function 5'
    class      = bridge
    subclass   = HOST-PCI
vgapci0@pci0:1:5:0:     class=0x030000 rev=0x00 hdr=0x00 vendor=0x1002 device=0x9715 subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'RS880 [Radeon HD 4250]'
    class      = display
    subclass   = VGA
hdac0@pci0:1:5:1:       class=0x040300 rev=0x00 hdr=0x00 vendor=0x1002 device=0x970f subvendor=0x1025 subdevice=0x0591
    vendor     = 'Advanced Micro Devices, Inc. [AMD/ATI]'
    device     = 'RS880 HDMI Audio [Radeon HD 4200 Series]'
    class      = multimedia
    subclass   = HDA
xhci0@pci0:4:0:0:       class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x1b6f device=0x7023 subvendor=0x1025 subdevice=0x8030
    vendor     = 'Etron Technology, Inc.'
    device     = 'EJ168 USB 3.0 Host Controller'
    class      = serial bus
    subclass   = USB
re0@pci0:5:0:0: class=0x020000 rev=0x06 hdr=0x00 vendor=0x10ec device=0x8168 subvendor=0x1025 subdevice=0x8000
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller'
    class      = network
    subclass   = ethernet
root@europa:~ # 

So this is bone stock off the shelf boring hardware.
Nothing interesting to see here at all.

-- 
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken
GreyBeard and suspenders optional
Comment 5 Dennis Clarke 2021-02-24 12:19:09 UTC
It may be reasonable to change the version field here to 13.0 BETA3.

Dennis
Comment 6 danskoya 2021-02-24 14:45:33 UTC
(In reply to Dennis Clarke from comment #5)

all set, good Sir. Thank you for providing more details.
Comment 7 Dennis Clarke 2021-02-27 16:54:31 UTC
Problem seems to be fixed in BETA4.  I will do a few more tests but for
the moment a shutdown -p 'now' does what it says it does.
Comment 8 danskoya 2021-02-28 05:19:52 UTC
Still the same issue with shutdown -p/-r "now" on Vultr - just stuck. None of my FreeBSD 12.2-RELEASE instances have this particular issue.

VirtualBox under macOS 10.13.6 (host) doesn't have this issue, so it must be QEMU-related (which is what Vultr uses, I believe).
Comment 9 Eric Benner 2021-03-01 22:42:27 UTC
Vultr is indeed using QEMU, version 5.x+. Were any other platforms experiencing this, or has there been any headway in discovering the root cause? If there is something we can do on our end to smooth over this release, you can contact me directly and I can get it implemented.
Comment 10 Dennis Clarke 2021-03-01 22:54:08 UTC
    One moment please. 

    Do we care about an emulation program here?
    In my mind this problem is solved.

Dennis Clarke
Comment 11 danskoya 2021-03-03 22:13:04 UTC
(In reply to Eric Benner from comment #9)

Hello Sir,

David Finster (@dfinr), who works for Vultr, responded to my Tweet and is also following this report. I filed a ticket with Vultr but was told it will take time and they are looking into it. I hope they can get it fixed before the final release.

Thank you.
Comment 12 danskoya 2021-03-07 03:21:15 UTC
Same issue with Vultr & RC1
Comment 13 danskoya 2021-03-19 01:00:42 UTC
Same issue with Vultr & RC3
Comment 14 danskoya 2021-03-31 04:15:54 UTC
Same issue with Vultr & RC4
Comment 15 danskoya 2021-04-03 16:17:03 UTC
Same issue with Vultr & RC5 using FreeBSD-13.0-RC5-amd64-bootonly.iso
Comment 16 elij 2021-04-13 22:54:23 UTC
Just encountered this with 13.0-RELEASE (also on Vultr).
Note: This did not happen on 12R2 or any version prior that I can recall.
Comment 17 elij 2021-04-13 22:55:45 UTC
Forgot to mention -- this was after an in-place upgrade from 12.2 (with latest patches) to 13.0-RELEASE.
Comment 18 Glen Barber freebsd_committer freebsd_triage 2021-04-14 00:35:24 UTC
We are very much aware of the issue.  Once a fix is in place, a post-release EN is planned.  Until then, it remains an open issue.
Comment 19 Joseph Fierro 2021-05-12 15:09:52 UTC
Hello,

We have done some additional testing on our end and this problem seems to be with virtio_random(4). 

We had a customer report a similar issue with FreeBSD 12.2 after enabling virtio_random with virtio_random_load="YES" in /boot/loader.conf. With 12.2, the system would intermittently hang after detaching all devices and reaching "rebooting...". In 12.2 this does not occur on every reboot--it seems to occur about 30% of the time. Removing that line from /boot/loader.conf resolved this problem.

On a 13.0-RELEASE system in Vultr, issuing shutdown, reboot, or halt commands will cause the system to hang at "detaching uhub0" and the CPU ramps up to max. This happens on every reboot in 13.0, rather than intermittently (using the "-n" flag to skip the filesystem cache flush will actually avoid the problem, and the system will reboot without hanging).

Simply manually unloading the virtio_random module with kldunload after boot is not sufficient to resolve the problem, as it will hang on reboot even after doing this. If the virtio_random module is never loaded in the first place, reboots, shutdowns, and halts will work properly.
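
The loader.conf check described in this comment can be sketched as a small shell function. This is only a minimal sketch: the function name has_virtio_random_load is hypothetical, and it does nothing more than look for an uncommented virtio_random_load="YES" line in the file it is given.

```shell
# Hypothetical helper: report whether a loader.conf-style file
# explicitly loads virtio_random at boot.
# Usage: has_virtio_random_load /boot/loader.conf
has_virtio_random_load() {
    conf="${1:-/boot/loader.conf}"
    [ -r "$conf" ] || return 1
    # Match an uncommented line such as virtio_random_load="YES"
    # (quotes optional, value matched case-insensitively).
    grep -Eiq '^[[:space:]]*virtio_random_load="?yes"?' "$conf"
}
```

A zero exit status means the line is present. On 13.0 the module can apparently get loaded even without that line, so a negative result here does not by itself show the module is absent.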
Comment 20 danskoya 2021-05-14 02:33:35 UTC
per https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254513

devmatch_blacklist="virtio_random.ko" added to /etc/rc.conf enabled me to reboot several times after upgrading my instances to 13.0-RELEASE
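
For anyone scripting this workaround, a minimal sketch of adding the line idempotently follows. The function name blacklist_virtio_random is hypothetical, and the plain append is an assumption made to keep the sketch self-contained; on a real system sysrc(8) is the usual way to edit rc.conf.

```shell
# Hypothetical helper: append the devmatch blacklist entry from this
# comment to an rc.conf-style file, unless it is already present.
# Usage: blacklist_virtio_random /etc/rc.conf
blacklist_virtio_random() {
    rcconf="${1:-/etc/rc.conf}"
    entry='devmatch_blacklist="virtio_random.ko"'
    # -x: whole-line match, -F: fixed string, so the dot is literal.
    if grep -qxF "$entry" "$rcconf" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$rcconf"
}
```

If the file already sets devmatch_blacklist to other modules, the appended assignment would override the earlier one, so in that case the values should be merged by hand instead.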
Comment 21 Kelly Hays 2021-07-14 03:35:22 UTC
(In reply to danskoya from comment #20)
This is also an issue on ARP Networks. 
Blacklisting virtio_random.ko works there too.
Comment 22 Ben Woods freebsd_committer freebsd_triage 2021-09-27 22:37:09 UTC
Hi Conrad (cem) / Jessica (jrtc27) / Bryan (bryanv) - looks like you have touched the virtio_random code in the past year - just wanted to check if you have any ideas what could be causing this issue?
Comment 23 Jessica Clarke freebsd_committer freebsd_triage 2021-09-27 22:38:49 UTC
Is this just the same underlying problem as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254513?
Comment 24 Eugene Grosbein freebsd_committer freebsd_triage 2022-03-10 23:51:09 UTC
I have the same problem with my small FreeBSD 13.1-STABLE/amd64 guest at Vultr.com. Attaching verbose dmesg.boot and pciconf -lv.
Comment 25 Eugene Grosbein freebsd_committer freebsd_triage 2022-03-10 23:53:04 UTC
Created attachment 232376 [details]
dmesg.boot of 13.1-STABLE/amd64
Comment 26 Eugene Grosbein freebsd_committer freebsd_triage 2022-03-10 23:54:12 UTC
Created attachment 232377 [details]
pciconf -lv of 13.1-STABLE/amd64

Note that "kldunload virtio_random" eliminates hang for my case, too.
Comment 27 Kubilay Kocak freebsd_committer freebsd_triage 2022-03-11 00:07:51 UTC
Thanks for that Eugene.

Your info is:

vtrnd0 pnpinfo vendor=0x00001af4 device=0x1005 subvendor=0x1af4 device_type=0x00000004
virtio_pci3 pnpinfo vendor=0x1af4 device=0x1005 subvendor=0x1af4 subdevice=0x0004 class=0x00ff00

The device information in bug 254513 is

vtrnd0 pnpinfo vendor=0x00001af4 device=0x1044 subvendor=0x1af4 device_type=0x00000004
 virtio_pci4 pnpinfo vendor=0x1af4 device=0x1044 subvendor=0x1af4 subdevice=0x1100 class=0x00ff00

The difference is the device ID: 0x1005 vs 0x1044 (source of the delta unknown as yet)

^Triage: Closing bug 254513 (later created issue) as duplicate. Loop in committers mentioned in comment 22. 13.1 is on its way.
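
One possible reading of the 0x1005 vs 0x1044 delta comes from the PCI ID ranges in the virtio specification, where 0x1000-0x103f are transitional devices and modern devices use 0x1040 plus the device type (4 for the entropy source). Whether that explains the behaviour on these hosts is an assumption, not something established in this report. A minimal sketch of classifying a pnpinfo line that way, with the hypothetical function name virtio_id_kind:

```shell
# Hypothetical helper: classify a virtio PCI device ID taken from a
# pnpinfo line. Ranges follow the virtio specification:
#   0x1000-0x103f  transitional (legacy-compatible) devices
#   0x1040-0x107f  modern devices, ID = 0x1040 + device type
virtio_id_kind() {
    # Pull the " device=0x...." field out of the pnpinfo line; the
    # leading space keeps "subdevice=" from matching.
    id=$(printf '%s\n' "$1" | sed -n 's/.* device=\(0x[0-9a-fA-F]*\).*/\1/p')
    [ -n "$id" ] || { echo unknown; return 1; }
    n=$(($id))
    if [ "$n" -ge $((0x1000)) ] && [ "$n" -le $((0x103f)) ]; then
        echo transitional
    elif [ "$n" -ge $((0x1040)) ] && [ "$n" -le $((0x107f)) ]; then
        echo modern
    else
        echo unknown
    fi
}

virtio_id_kind 'vtrnd0 pnpinfo vendor=0x00001af4 device=0x1005 subvendor=0x1af4 device_type=0x00000004'   # prints "transitional"
virtio_id_kind ' virtio_pci4 pnpinfo vendor=0x1af4 device=0x1044 subvendor=0x1af4 subdevice=0x1100 class=0x00ff00'   # prints "modern"
```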
Comment 28 Kubilay Kocak freebsd_committer freebsd_triage 2022-03-11 00:09:14 UTC
*** Bug 254513 has been marked as a duplicate of this bug. ***
Comment 29 Eugene Grosbein freebsd_committer freebsd_triage 2022-03-11 00:51:48 UTC
Created attachment 232378 [details]
hackish-aplha-preliminary-proof-of-concept patch

This patch for virtio_random solves the problem for my Vultr.com 13.1/amd64 guest.
Comment 30 Kyle Evans freebsd_committer freebsd_triage 2022-03-11 00:58:41 UTC
When I looked at this on Hetzner, it seemed that we were completely blocked in virtqueue_poll(), as if the host had just completely dropped it on the floor.
Comment 31 Eugene Grosbein freebsd_committer freebsd_triage 2022-03-11 02:12:37 UTC
Created attachment 232380 [details]
patch-virtio_random.c

A less hackish version of the patch doing the same thing: it deactivates virtio_random at shutdown time, preventing the CPU spinning and the hang.
Comment 32 Eugene Grosbein freebsd_committer freebsd_triage 2022-03-11 11:30:03 UTC
13.1-BETA1 is ready and we do not have much time to settle this before the release. I'd like to see some response from anyone who can test the patch.

One does not need to go through a long buildworld or buildkernel process; it is enough to rebuild a single kernel module and perform a single reboot to test it. You need kernel sources (or the full source tree) for any revision of 13.x. Assuming amd64:

fetch -o /tmp/patch-virtio_random.c https://bugs.freebsd.org/bugzilla/attachment.cgi?id=232380
cd /usr/src
patch < /tmp/patch-virtio_random.c
cd sys/modules/virtio/random
make obj depend && make all
kldunload virtio_random # if you did not blacklist it
kldload /usr/obj/usr/src/amd64.amd64/sys/modules/virtio/random/virtio_random.ko
shutdown -r now

If it reboots, please report. If it hangs, please report, too.

Also I'd like to see some response from authors of original code. Otherwise, I will commit it shortly after positive feedback from users.
Comment 33 Jamie Landeg-Jones 2022-03-11 13:01:59 UTC
(In reply to Eugene Grosbein from comment #24)

I posted this on the other thread:

The Vultr install does indeed disable virtio-random, but that wasn't the only issue: I created my own ISO without virtio-random, and whilst it stopped the boot hang, it still caused problems (hanging on reboot being one).

I talked with the vultr guy and he confirmed - he had to disable virtio-random AND change the KVM profile to boot :

> "2021-07-14 16:37:43
> We've found that v12 and older of FreeBSD do not actually support q35 - so when any version of FreeBSD is selected from our control panel we default that back to i440fx. We do have a few customers using the iso-over-image trick, and we didn't want to break them if they're just selecting the default version (13) and then trying to install 12 over it.


> We have not done extensive testing of FreeBSD on q35 yet - it seems like support for this is very new, so I wouldn't be surprised if there were still issues causing it to hang at reboot.


>Brian, Vultr.com Support
Comment 34 Jamie Landeg-Jones 2022-03-11 13:51:11 UTC
(In reply to Kubilay Kocak from comment #27)

I have a number of different Vultr instances, on different types of instance and hardware, ranging from host build dates of 6 years ago up to current.

They all give the same IDs as Eugene got. Filtering out dupes, I have:

class=0x00ff00 rev=0x00 hdr=0x00 vendor=0x1af4 device=0x1005 subvendor=0x1af4 subdevice=0x0004

System: QEMU - Standard PC (i440FX + PIIX, 1996) (Version: pc-i440fx-2.11)
System: Vultr - HFC (Version: pc-i440fx-4.1)
System: Vultr - HFC (Version: pc-i440fx-5.2)
System: Vultr - VC2 (Version: pc-i440fx-5.2)
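
Since the i440fx vs q35 distinction keeps coming up, here is a minimal sketch of telling the two apart from a version string like the ones listed above. The function name qemu_machine_family is hypothetical, and the pc-q35-* pattern is an assumption based on QEMU's usual machine naming; no q35 string appears in this listing.

```shell
# Hypothetical helper: extract the QEMU machine family from a system
# version string such as "pc-i440fx-5.2".
qemu_machine_family() {
    case "$1" in
        pc-i440fx-*) echo i440fx ;;
        pc-q35-*)    echo q35 ;;      # assumed naming, not seen in this report
        *)           echo unknown ;;
    esac
}

qemu_machine_family pc-i440fx-5.2   # prints "i440fx"
```

On a FreeBSD guest the string can probably be read with "kenv smbios.system.version", though how the listing above was produced is not stated.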
Comment 35 Eugene Grosbein freebsd_committer freebsd_triage 2022-03-11 22:22:34 UTC
Another workaround to disable virtio_random is /boot/device.hints file:

hint.vtrnd.0.disabled=1

This results in a message at boot time:

vtrnd0: disabled via hints entry
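
To confirm after a reboot that the hint took effect, the boot messages can be searched for that line. A minimal sketch, assuming the messages are available in a file such as /var/run/dmesg.boot; the function name vtrnd_disabled_by_hint is hypothetical:

```shell
# Hypothetical helper: succeed if the given boot log shows that vtrnd0
# was disabled by the device.hints entry.
# Usage: vtrnd_disabled_by_hint /var/run/dmesg.boot
vtrnd_disabled_by_hint() {
    # -F: fixed-string match, -q: exit status only.
    grep -qF 'vtrnd0: disabled via hints entry' "${1:-/var/run/dmesg.boot}"
}
```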
Comment 36 Tom 2022-03-15 23:04:58 UTC
(In reply to Eugene Grosbein from comment #32)

That patch works for me, 13.0-RELEASE-p8 now reboots correctly, I'm using Vultr.

I wasn't able to follow your steps exactly as written as the module seemed to immediately and automatically reload after the kldunload. I renamed the module in /boot/kernel and copied the newly compiled one from /usr/obj/usr/src/amd64.amd64/sys/modules/virtio/random/virtio_random.ko
Comment 37 commit-hook freebsd_committer freebsd_triage 2022-03-16 04:53:27 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=adbf7727b3a2aad3c2faa6e543ee7fa7a6c9a3d5

commit adbf7727b3a2aad3c2faa6e543ee7fa7a6c9a3d5
Author:     Eugene Grosbein <eugen@FreeBSD.org>
AuthorDate: 2022-03-16 04:41:51 +0000
Commit:     Eugene Grosbein <eugen@FreeBSD.org>
CommitDate: 2022-03-16 04:41:51 +0000

    virtio_random(8): avoid deadlock at shutdown time

    FreeBSD 13+ running as virtual guest may load virtio_random(8) driver
    by means of devd(8) unless the driver is blacklisted or disabled
    via device.hints(5). Currently, the driver may prevent
    the system from rebooting or shutting down correctly.

    This change deactivates virtio_random at very late stage
    during system shutdown sequence to avoid deadlock
    that results in kernel hang.

    PR:             253175
    Tested by:      tom
    MFC after:      3 days

 sys/dev/virtio/random/virtio_random.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
Comment 38 Eugene Grosbein freebsd_committer freebsd_triage 2022-03-16 05:08:03 UTC
(In reply to Tom from comment #36)

Thank you for taking time to test the patch.

Auto-reloading may be prevented by stopping the devd daemon, which should be unneeded in a virtual guest environment once the system has completed booting:

killall devd
sysctl hw.bus.devctl_queue=0 # optionally
Comment 39 commit-hook freebsd_committer freebsd_triage 2022-03-19 04:23:24 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4a11315a2c3fc55333772f48aaef32ae1eb11ceb

commit 4a11315a2c3fc55333772f48aaef32ae1eb11ceb
Author:     Eugene Grosbein <eugen@FreeBSD.org>
AuthorDate: 2022-03-16 04:41:51 +0000
Commit:     Eugene Grosbein <eugen@FreeBSD.org>
CommitDate: 2022-03-19 04:20:58 +0000

    virtio_random(8): MFC: avoid deadlock at shutdown time (regression fix)

    FreeBSD 13+ running as virtual guest may load virtio_random(8) driver
    by means of devd(8) unless the driver is blacklisted or disabled
    via device.hints(5). Currently, the driver may prevent
    the system from rebooting or shutting down correctly.

    This change deactivates virtio_random at very late stage
    during system shutdown sequence to avoid deadlock
    that results in kernel hang.

    PR:             253175
    Tested by:      tom
    Relnotes:       yes

    (cherry picked from commit adbf7727b3a2aad3c2faa6e543ee7fa7a6c9a3d5)

 sys/dev/virtio/random/virtio_random.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
Comment 40 commit-hook freebsd_committer freebsd_triage 2022-03-19 12:38:47 UTC
A commit in branch releng/13.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=fa67c45842bb5d34780a536b1bff1ac64f381562

commit fa67c45842bb5d34780a536b1bff1ac64f381562
Author:     Eugene Grosbein <eugen@FreeBSD.org>
AuthorDate: 2022-03-16 04:41:51 +0000
Commit:     Eugene Grosbein <eugen@FreeBSD.org>
CommitDate: 2022-03-19 12:36:26 +0000

    virtio_random(8): MFC: avoid deadlock at shutdown time (regression fix)

    FreeBSD 13+ running as virtual guest may load virtio_random(8) driver
    by means of devd(8) unless the driver is blacklisted or disabled
    via device.hints(5). Currently, the driver may prevent
    the system from rebooting or shutting down correctly.

    This change deactivates virtio_random at very late stage
    during system shutdown sequence to avoid deadlock
    that results in kernel hang.

    PR:             253175
    Tested by:      tom
    Relnotes:       yes
    Approved by:    re (gjb)

    (cherry picked from commit adbf7727b3a2aad3c2faa6e543ee7fa7a6c9a3d5)
    (cherry picked from commit 4a11315a2c3fc55333772f48aaef32ae1eb11ceb)

 sys/dev/virtio/random/virtio_random.c | 35 +++++++++++++++++++++++++++++++++++
 1 file changed, 35 insertions(+)
Comment 41 Eugene Grosbein freebsd_committer freebsd_triage 2022-03-19 12:44:44 UTC
The problem is believed to be worked around by the latest commit, which will be included in the upcoming 13.1-RELEASE.

For users, feel free to re-open the PR if you still have the problem.
For authors of original code, I encourage you to review and perhaps improve the fix.
Comment 42 Gian-Simon Purkert 2022-04-18 13:14:45 UTC
Testing FreeBSD 13.1RC3 ATM on Vultr and the problem still exists.
Comment 43 Eugene Grosbein freebsd_committer freebsd_triage 2022-04-18 13:23:47 UTC
(In reply to Gian-Simon Purkert from comment #42)

What is the problem observed with 13.1-RC3, exactly?
Comment 44 Gian-Simon Purkert 2022-04-18 13:30:53 UTC
(In reply to Eugene Grosbein from comment #43)

Booting from a uploaded ISO hangs at "registering fast entropy provider"

But now it's getting interesting: when installing with the provided Vultr image (FreeBSD13.0-RELENG) and then rebooting with the FreeBSD 13.1RC3 ISO, it works.
Comment 45 Eugene Grosbein freebsd_committer freebsd_triage 2022-04-18 14:37:03 UTC
(In reply to Gian-Simon Purkert from comment #44)

> Booting from a uploaded ISO hangs at "registering fast entropy provider"

This is a distinct problem. Please create a new PR and link this PR to the new one.
Comment 46 Gian-Simon Purkert 2022-04-18 15:08:28 UTC
(In reply to Eugene Grosbein from comment #45)

Found the difference from Image to Iso:

#cat /etc/rc.conf
devmatch_blacklist="virtio_random.ko"

I repeated the test, and if installed with the image and then booted with the 13.1RC3-ISO it boots even with virtio_random.ko loaded, so there is a special qemu-template for freebsd.

Comment33:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253175#c33

I think we can leave it at that.

Thanks for the fast response, and have a great Day.
Comment 47 Jamie Landeg-Jones 2022-04-19 11:25:01 UTC
(In reply to Gian-Simon Purkert from comment #46)

Yes, the FreeBSD profile Vultr uses is pc-i440fx rather than q35 - see the following bug report, specifically these 2 comments:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254513#c14

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=254513#c34

Also, see this bug report:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236922

cheers, Jamie
Comment 48 Bob Grant 2022-07-31 19:35:21 UTC
(In reply to Jamie Landeg-Jones from comment #47)

I too have run into this virtio_random hanging issue with the Vultr VMs. I know of the available workaround: use their image to get a VM profile that bypasses the virtio issue, and then reinstall from a mounted ISO.

However, I want to use uploaded raw images (customized by me), and those use the generic Q35 VM. I have reproduced the bootup hang on both FreeBSD-13.1-RELEASE and FreeBSD-14.0-CURRENT and have also worked around it by blacklisting virtio_random.

Is there a chance that the virtio_random and Q35 issue will be fixed in a future release? It would eliminate a bunch of workarounds people are having to come up with.
Comment 49 Eugene Grosbein freebsd_committer freebsd_triage 2022-07-31 20:11:07 UTC
(In reply to Bob Grant from comment #48)

A hang at boot time seems to be a different issue with virtio_random. My custom Vultr instance running 13.1-STABLE/amd64 has no such problem.

Please file a new PR providing details of your setup and link it to this one.
Comment 50 Bob Grant 2022-08-01 00:52:02 UTC
(In reply to Eugene Grosbein from comment #49)

I have created a new bug report as you recommended.

bug #265549

I'm very intrigued that you don't have an issue on Vultr. Are you perchance using the kludge of selecting one of their images and installing over the top of it, so you get the FreeBSD VM with special settings? Or perhaps it doesn't happen on larger/different systems.

I am using a Cloud Compute shared vCPU with Intel High Performance smallest server size 1vCPU 1G memory at the Los Angeles colo.  Perhaps the VMs are set with different parameters.

I honestly think the previous hang on shutdown is related and worked around by the modifications in the code for shutdown.
Comment 51 Kyle Evans freebsd_committer freebsd_triage 2022-08-01 01:47:48 UTC
(In reply to Bob Grant from comment #50)

I noted in the related PR 254513 (which was erroneously closed as a duplicate) that Hetzner (and, IIRC, it was the same problem I observed recently on my new Vultr VM) just spins forever at polling, but I wasn't able to get the attention of the very few virtio folks around yet.
Comment 52 Eugene Grosbein freebsd_committer freebsd_triage 2022-08-01 02:13:00 UTC
(In reply to Bob Grant from comment #50)

I only choose hosters that support both the FreeBSD OS and a FreeBSD rescue environment. Generally, I create a new virtual machine using the hoster's FreeBSD template to see how they configure the network in /etc/rc.conf, because different hosters have different ways of doing it.

But I never use the installed partitioning "as is" because it never satisfies me. Instead, I reboot the VM into the hoster's FreeBSD rescue environment and use the gpart command to destroy the existing partitioning and create it from scratch: for a small VM I create an MBR with a distinct swap partition and another partition for ZFS-on-root.

I install zfsboot(8) to the partition, then I create the ZFS pool, download kernel.txz and base.txz and extract them, then I re-create /boot/loader.conf and /etc/rc.conf and reboot the VM off the virtual HDD.
Comment 53 Bob Grant 2022-08-01 02:49:55 UTC
(In reply to Eugene Grosbein from comment #52)

From my reading on the various threads your initial installation using the Vultr FreeBSD install causes them to select a profile that bypasses the Q35 issue (i.e. not a standard Vultr VM configuration).  Thus when you reinstall that same VM with your custom install via a Rescue environment you do not trigger the bug.

While rereading your comment #49 I noticed you said you were running FreeBSD 13.1-STABLE, so I downloaded that image and loaded it, and it has the exact same hang for me -- with no preinstalled FreeBSD from Vultr first.

When you say FreeBSD rescue do you really mean booting a FreeBSD Installation ISO into live mode?  I don't see a formal FreeBSD rescue on their site.
Comment 54 Eugene Grosbein freebsd_committer freebsd_triage 2022-08-01 03:03:42 UTC
(In reply to Bob Grant from comment #53)

> From my reading on the various threads your initial installation using the Vultr FreeBSD install causes them to select a profile that bypasses the Q35 issue

Exactly.

> When you say FreeBSD rescue do you really mean booting a FreeBSD Installation ISO into live mode?

Yes.

I mean, some hosters do not allow booting the VM off "extra" ISO-image at all, so I avoid them. I use hosters that either provide FreeBSD rescue environment, or at least allow to upload FreeBSD ISO-image for rescue boot.
Comment 55 Vladyslav V. Prodan 2022-08-01 20:33:38 UTC
(In reply to Bob Grant from comment #53)

This is an instruction on how to write MfsBSD to a partition with Linux installed.
https://forums.freebsd.org/threads/installing-freebsd-in-hetzner.85399/post-575112

Further down in the forum thread there are instructions and a script for running a rescue FreeBSD on top of a running rescue Linux.