Bug 243063 - NVMe timeouts with bhyve
Summary: NVMe timeouts with bhyve
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: 12.1-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Chuck Tuffli
Keywords: bhyve
Duplicates: 271782
Reported: 2020-01-03 19:08 UTC by iron.udjin
Modified: 2024-01-15 09:54 UTC
Description iron.udjin 2020-01-03 19:08:49 UTC
Hello,

OS: 12.1-STABLE r356314

I have two different servers with NVMe disks. The first hosts a VM running CentOS 7, the second a VM running Debian 10. Both VMs use disk0_type="nvme". From time to time the disk subsystem in the VMs freezes, and I see the following in the guests' dmesg:

CentOS 7:
Dec 31 13:26:28 localhost kernel: nvme nvme0: I/O 676 QID 4 timeout, aborting
Dec 31 13:26:58 localhost kernel: nvme nvme0: I/O 676 QID 4 timeout, reset controller

Debian 10:
Jan  3 19:43:46 localhost kernel: [  472.062677] nvme nvme0: I/O 363 QID 1 timeout, completion polled
Jan  3 20:24:38 localhost kernel: [ 2925.514461] nvme nvme0: I/O 545 QID 3 timeout, completion polled
Jan  3 20:28:40 localhost kernel: [ 3167.351351] nvme nvme0: I/O 1012 QID 2 timeout, completion polled

...and nothing in the logs or dmesg of the host systems.

Why does this happen, and how can it be fixed?

Thank you!
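
For anyone collecting similar guest logs, the Linux messages quoted above can be tallied per queue and per recovery action with a short script. This is only a sketch: the regular expression is an assumption based on the five lines in this report, and the names parse_timeouts and count_by_action are hypothetical helpers, not part of any existing tool.

```python
import re
from collections import Counter

# Pattern inferred from the guest messages in this report, e.g.
#   nvme nvme0: I/O 676 QID 4 timeout, aborting
# Other kernel versions may word these messages differently (assumption).
TIMEOUT_RE = re.compile(
    r"nvme (?P<ctrl>nvme\d+): I/O (?P<tag>\d+) QID (?P<qid>\d+) "
    r"timeout, (?P<action>.+)$"
)

# The five lines quoted in the description.
SAMPLE = [
    "Dec 31 13:26:28 localhost kernel: nvme nvme0: I/O 676 QID 4 timeout, aborting",
    "Dec 31 13:26:58 localhost kernel: nvme nvme0: I/O 676 QID 4 timeout, reset controller",
    "Jan  3 19:43:46 localhost kernel: [  472.062677] nvme nvme0: I/O 363 QID 1 timeout, completion polled",
    "Jan  3 20:24:38 localhost kernel: [ 2925.514461] nvme nvme0: I/O 545 QID 3 timeout, completion polled",
    "Jan  3 20:28:40 localhost kernel: [ 3167.351351] nvme nvme0: I/O 1012 QID 2 timeout, completion polled",
]


def parse_timeouts(lines):
    """Return (controller, tag, qid, action) for every timeout line found."""
    events = []
    for line in lines:
        m = TIMEOUT_RE.search(line.rstrip())
        if m:
            events.append(
                (m.group("ctrl"), int(m.group("tag")), int(m.group("qid")), m.group("action"))
            )
    return events


def count_by_action(events):
    """Count how often each recovery action was reported."""
    return Counter(action for _, _, _, action in events)


if __name__ == "__main__":
    events = parse_timeouts(SAMPLE)
    for action, n in count_by_action(events).items():
        print(f"{n:3d}  {action}")
    print("queues affected:", sorted({qid for _, _, qid, _ in events}))
```

To use it on a real log, replace SAMPLE with the lines of the guest's dmesg or /var/log/messages.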
Comment 1 Mateusz Kwiatkowski 2020-01-22 21:38:02 UTC
I also performed some bhyve+nvme tests.
Host: FreeBSD 13.0 r356983, Opteron 6128, ZFS raid10 on 4 HDDs
Guests: FreeBSD 12.1-RELEASE

bhyveload -c /dev/nmdm-1_hv-3.1A -m 2048M -e autoboot_delay=3 -d /dev/zvol/zroot/vm/1_hv-3/disk0 1_hv-3
  [bhyve options: -c 2 -m 2048M -AHP -U 3805a422-c9c8-46c5-aa8d-31142e90ea89 -u]
  [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,nvme,/dev/zvol/zroot/vm/1_hv-3/disk0 -s 4:1,ahci-cd,/zroot/vm/1_hv-3/seed.iso -s 5:0,virtio-net,tap2,mac=58:9c:fc:0f:3e:27]
  [bhyve console: -l com1,/dev/nmdm-1_hv-3.1A]

# zfs get volblocksize zroot/vm/1_hv-3/disk0
NAME                   PROPERTY      VALUE     SOURCE
zroot/vm/1_hv-3/disk0  volblocksize  8K        default


Performance comparison of virtio-blk/ahci-hd/nvme:

Test command: fio --name=test --iodepth=4 --rw=randrw:2 --rwmixread=70 --rwmixwrite=30 --bs=8k --direct=0 --size=256m --numjobs=8
 
baremetal:
   READ: bw=456MiB/s (478MB/s), 57.3MiB/s-65.5MiB/s (60.1MB/s-68.6MB/s), io=1433MiB (1502MB), run=2719-3142msec
  WRITE: bw=196MiB/s (205MB/s), 24.1MiB/s-28.7MiB/s (25.3MB/s-30.1MB/s), io=615MiB (645MB), run=2719-3142msec
 
virtio-blk:
   READ: bw=115MiB/s (121MB/s), 14.4MiB/s-20.5MiB/s (15.1MB/s-21.5MB/s), io=1433MiB (1502MB), run=8773-12453msec
  WRITE: bw=49.4MiB/s (51.8MB/s), 6348KiB/s-8851KiB/s (6501kB/s-9063kB/s), io=615MiB (645MB), run=8773-12453msec
 
ahci-hd:
   READ: bw=110MiB/s (116MB/s), 13.8MiB/s-36.4MiB/s (14.4MB/s-38.1MB/s), io=1433MiB (1502MB), run=4940-12968msec
  WRITE: bw=47.4MiB/s (49.7MB/s), 6107KiB/s-15.5MiB/s (6254kB/s-16.2MB/s), io=615MiB (645MB), run=4940-12968msec
 
nvme:
   READ: bw=19.7MiB/s (20.7MB/s), 2520KiB/s-4811KiB/s (2580kB/s-4926kB/s), io=1433MiB (1502MB), run=38351-72662msec
  WRITE: bw=8671KiB/s (8879kB/s), 1083KiB/s-2025KiB/s (1109kB/s-2073kB/s), io=615MiB (645MB), run=38351-72662msec
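
To make the gap between the backends easier to read, the aggregate bw= figures from the fio summaries above can be normalised to MiB/s and divided by the baremetal value. The following is a minimal sketch, assuming the summary format shown above; aggregate_bw_mib and slowdown_vs are hypothetical helper names, and only the leading bw= field of each line is used.

```python
import re

# Matches the aggregate bandwidth at the start of a fio summary line,
# e.g. "bw=456MiB/s" or "bw=8671KiB/s".
BW_RE = re.compile(r"bw=(?P<value>[\d.]+)(?P<unit>KiB|MiB|GiB)/s")
TO_MIB = {"KiB": 1.0 / 1024, "MiB": 1.0, "GiB": 1024.0}

# Leading part of the summary lines reported above.
READ_LINES = {
    "baremetal": "READ: bw=456MiB/s (478MB/s)",
    "virtio-blk": "READ: bw=115MiB/s (121MB/s)",
    "ahci-hd": "READ: bw=110MiB/s (116MB/s)",
    "nvme": "READ: bw=19.7MiB/s (20.7MB/s)",
}
WRITE_LINES = {
    "baremetal": "WRITE: bw=196MiB/s (205MB/s)",
    "virtio-blk": "WRITE: bw=49.4MiB/s (51.8MB/s)",
    "ahci-hd": "WRITE: bw=47.4MiB/s (49.7MB/s)",
    "nvme": "WRITE: bw=8671KiB/s (8879kB/s)",
}


def aggregate_bw_mib(summary_line):
    """Return the aggregate bandwidth of one fio summary line in MiB/s."""
    m = BW_RE.search(summary_line)
    if m is None:
        raise ValueError(f"no bw= field in: {summary_line!r}")
    return float(m.group("value")) * TO_MIB[m.group("unit")]


def slowdown_vs(baseline, lines):
    """Map each backend to baseline bandwidth divided by its own bandwidth."""
    base = aggregate_bw_mib(lines[baseline])
    return {name: base / aggregate_bw_mib(line) for name, line in lines.items()}


if __name__ == "__main__":
    for label, lines in (("READ", READ_LINES), ("WRITE", WRITE_LINES)):
        for name, factor in slowdown_vs("baremetal", lines).items():
            print(f"{label:5s} {name:10s} {factor:5.1f}x slower than baremetal")
```

With the numbers above, nvme reads come out roughly 23 times slower than baremetal and a little under 6 times slower than virtio-blk on the same zvol.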


From the guest's log:

nvme0: Missing interrupt
nvme0: Missing interrupt
nvme0: nvme0: cpl does not map to outstanding cmd
Missing interrupt
cdw0:00000000 sqhd:0019 sqid:0002 cid:0066 p:1 sc:00 sct:0 m:0 dnr:0
nvme0: Resetting controller due to a timeout.
nvme0: resetting controller
nvme0: temperature threshold not supported
nvme0: aborting outstanding i/o
nvme0: resubmitting queued i/o
nvme0: WRITE sqid:2 cid:0 nsid:1 lba:8752079 len:8
Comment 2 Graham Perrin 2023-10-15 10:45:18 UTC
Sorry for the apparent lack of activity. 

Are symptoms reproducible with hosts and guests using branches of FreeBSD that are active? 

NB: see the release note about NVMe at <https://www.freebsd.org/releases/14.0R/>.

----

^Triage: 

* status
* keyword
* avoid tags in summary lines

<https://wiki.freebsd.org/Bugzilla/DosAndDonts#tags> (please, don't …)
Comment 3 chuck 2023-10-15 14:36:26 UTC
I'm not surprised that guests would see issues when the host was running 12.1, but the emulation in the 13.0 release contains improvements that should have fixed issues like this.

Let me see if I can reproduce this.
Comment 4 Yuri Dolgoruki 2024-01-15 07:30:17 UTC
Good day!

I have similar problems with the nvme disk type in a bhyve VM with Windows Server 2016 x64 as the guest. The VM halts with an exception in nvmestore.sys, which is caused by an I/O error. Switching the disk type to ahci-hd resolves the problem.
The host shows no errors or I/O-related messages in its logs. vm.log also shows that the guest just reboots, with exit 0.

OS: FreeBSD 13.2-STABLE stable/13-2e4ac696d8 amd64
Host ZFS:
--------------
        NAME           STATE     READ WRITE CKSUM
        mySSD          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            gpt/SSD_0  ONLINE       0     0     0
            gpt/SSD_1  ONLINE       0     0     0

SSDs in the mirror:
-------------------
ada0: <KINGSTON SA400S37480G> ACS-3 ATA SATA 3.x device
ada3: <AMD R5SL512G> ACS-4 ATA SATA 3.x device

vm.conf
--------------
loader="uefi"
graphics="yes"
xhci_mouse="yes"

cpu=4
cpu_sockets=1
cpu_cores=4

memory=16G

ahci_device_limit="8"

network0_type="virtio-net"
network0_switch="public"

disk0_type="ahci-hd"
disk0_name="disk0.img"

utctime="no"

uuid="39387cd6-e074-11ed-9e3c-d8bbc11c8171"
network0_mac="58:9c:fc:0c:3a:c7"
Comment 5 Yuri Dolgoruki 2024-01-15 09:54:26 UTC
*** Bug 271782 has been marked as a duplicate of this bug. ***