Bug 243063 - [bhyve] nvme timeouts
Summary: [bhyve] nvme timeouts
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: 12.1-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
Depends on:
Reported: 2020-01-03 19:08 UTC by iron.udjin
Modified: 2020-01-22 21:38 UTC
CC: 2 users

See Also:


Description iron.udjin 2020-01-03 19:08:49 UTC

OS: 12.1-STABLE r356314

I've got two different servers with NVMe disks. The first one runs a VM with CentOS 7, the second a VM with Debian 10. Both VMs use disk0_type="nvme". From time to time the disk subsystem in the VMs freezes, and I see in dmesg:
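For context, disk0_type="nvme" is a vm-bhyve guest option that selects bhyve's emulated NVMe controller as the disk backend. A minimal configuration along these lines might look as follows; the guest name, sizes, and switch name here are illustrative, not taken from this report:

```shell
# /vm/centos7/centos7.conf -- hypothetical vm-bhyve guest config
loader="uefi"
cpu=2
memory=2048M
disk0_type="nvme"            # emulated NVMe controller (the setup in question)
disk0_name="disk0"
disk0_dev="sparse-zvol"      # back the disk with a ZFS zvol
network0_type="virtio-net"
network0_switch="public"
```

Switching disk0_type to "virtio-blk" or "ahci-hd" is the usual way to test whether the timeouts are specific to the NVMe emulation.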

Centos 7:
Dec 31 13:26:28 localhost kernel: nvme nvme0: I/O 676 QID 4 timeout, aborting
Dec 31 13:26:58 localhost kernel: nvme nvme0: I/O 676 QID 4 timeout, reset controller

Debian 10:
Jan  3 19:43:46 localhost kernel: [  472.062677] nvme nvme0: I/O 363 QID 1 timeout, completion polled
Jan  3 20:24:38 localhost kernel: [ 2925.514461] nvme nvme0: I/O 545 QID 3 timeout, completion polled
Jan  3 20:28:40 localhost kernel: [ 3167.351351] nvme nvme0: I/O 1012 QID 2 timeout, completion polled

...and there is nothing in the logs or dmesg of the host systems.

Why does this happen, and how can it be fixed?

Thank you!
Comment 1 Mateusz Kwiatkowski 2020-01-22 21:38:02 UTC
I also performed some bhyve+nvme tests.
Host: FreeBSD 13.0 r356983, Opteron 6128, ZFS raid10 on 4 HDDs
Guests: FreeBSD 12.1-RELEASE

bhyveload -c /dev/nmdm-1_hv-3.1A -m 2048M -e autoboot_delay=3 -d /dev/zvol/zroot/vm/1_hv-3/disk0 1_hv-3
  [bhyve options: -c 2 -m 2048M -AHP -U 3805a422-c9c8-46c5-aa8d-31142e90ea89 -u]
  [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,nvme,/dev/zvol/zroot/vm/1_hv-3/disk0 -s 4:1,ahci-cd,/zroot/vm/1_hv-3/seed.iso -s 5:0,virtio-net,tap2,mac=58:9c:fc:0f:3e:27]
  [bhyve console: -l com1,/dev/nmdm-1_hv-3.1A]

# zfs get volblocksize zroot/vm/1_hv-3/disk0
NAME                   PROPERTY      VALUE     SOURCE
zroot/vm/1_hv-3/disk0  volblocksize  8K        default
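The zvol backing the guest disk uses the default 8K volblocksize, which matches the 8k block size used in the fio runs below. Note that volblocksize can only be set when the zvol is created; it cannot be changed afterwards. A sketch of creating a comparable zvol (the 20G size is hypothetical, and the commands require root and an existing pool):

```shell
# Create a zvol for a guest disk with an explicit volblocksize;
# the property is immutable after creation.
zfs create -V 20G -o volblocksize=8k zroot/vm/1_hv-3/disk0

# Verify the resulting block size.
zfs get volblocksize zroot/vm/1_hv-3/disk0
```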

Performance comparison of virtio-blk/ahci-hd/nvme:

Test command: fio --name=test --iodepth=4 --rw=randrw:2 --rwmixread=70 --rwmixwrite=30 --bs=8k --direct=0 --size=256m --numjobs=8
   READ: bw=456MiB/s (478MB/s), 57.3MiB/s-65.5MiB/s (60.1MB/s-68.6MB/s), io=1433MiB (1502MB), run=2719-3142msec
  WRITE: bw=196MiB/s (205MB/s), 24.1MiB/s-28.7MiB/s (25.3MB/s-30.1MB/s), io=615MiB (645MB), run=2719-3142msec

   READ: bw=115MiB/s (121MB/s), 14.4MiB/s-20.5MiB/s (15.1MB/s-21.5MB/s), io=1433MiB (1502MB), run=8773-12453msec
  WRITE: bw=49.4MiB/s (51.8MB/s), 6348KiB/s-8851KiB/s (6501kB/s-9063kB/s), io=615MiB (645MB), run=8773-12453msec

   READ: bw=110MiB/s (116MB/s), 13.8MiB/s-36.4MiB/s (14.4MB/s-38.1MB/s), io=1433MiB (1502MB), run=4940-12968msec
  WRITE: bw=47.4MiB/s (49.7MB/s), 6107KiB/s-15.5MiB/s (6254kB/s-16.2MB/s), io=615MiB (645MB), run=4940-12968msec

   READ: bw=19.7MiB/s (20.7MB/s), 2520KiB/s-4811KiB/s (2580kB/s-4926kB/s), io=1433MiB (1502MB), run=38351-72662msec
  WRITE: bw=8671KiB/s (8879kB/s), 1083KiB/s-2025KiB/s (1109kB/s-2073kB/s), io=615MiB (645MB), run=38351-72662msec

From guest's log:

nvme0: Missing interrupt
nvme0: Missing interrupt
nvme0: nvme0: cpl does not map to outstanding cmd
Missing interrupt
cdw0:00000000 sqhd:0019 sqid:0002 cid:0066 p:1 sc:00 sct:0 m:0 dnr:0
nvme0: Resetting controller due to a timeout.
nvme0: resetting controller
nvme0: temperature threshold not supported
nvme0: aborting outstanding i/o
nvme0: resubmitting queued i/o
nvme0: WRITE sqid:2 cid:0 nsid:1 lba:8752079 len:8