Bug 276575 - Host can cause a crash in bhyve nvme emulation
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: 14.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Chuck Tuffli
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-01-24 00:23 UTC by Duncan
Modified: 2024-02-04 07:09 UTC (History)
3 users

See Also:


Attachments
output of windows minidump analysis (6.74 KB, text/plain)
2024-01-24 00:23 UTC, Duncan

Description Duncan 2024-01-24 00:23:06 UTC
Created attachment 247908 [details]
output of windows minidump analysis

Hello,

OS: 14.0-RELEASE-p4

I have Windows 10 VMs that will BSOD under heavy disk load on either the guest or the host.

Two different cases, both VMs running three disks (all NVMe-emulated): one boot disk (C:) and two data disks (3 TB and 1 TB), backed by image files on ZFS. sync is set to always and a small Optane ZIL is used on the pool. I am using vm-bhyve to manage the VMs.

Pool setup:
# zpool list -v data_pool
NAME                          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
data_pool                    10.9T  3.95T  6.96T        -         -     1%    36%  1.00x    ONLINE  -
  mirror-0                   10.9T  3.95T  6.96T        -         -     1%  36.2%      -    ONLINE
    gpt/data_pool_00         10.9T      -      -        -         -      -      -      -    ONLINE
    gpt/data_pool_01         10.9T      -      -        -         -      -      -      -    ONLINE
logs                             -      -      -        -         -      -      -      -         -
  gpt/data_pool_zil            32G  3.17M  31.5G        -         -     0%  0.00%      -    ONLINE
cache                            -      -      -        -         -      -      -      -         -
  gpt/data_pool_cache_0_ssd   932G  55.9G   876G        -         -     0%  6.00%      -    ONLINE
  gpt/data_pool_cache_1_ssd   932G  53.9G   878G        -         -     0%  5.78%      -    ONLINE

Cache drives were replaced recently, so they haven't filled up yet.

First case: I could not complete a full backup of the 3 TB data drive (a Windows backup). The failure would occur after the machine had been running the backup for a while (2-3 minutes or more). These backups ran at 200+ MB/s (over a 10 Gb network) to the backup machine. A BSOD resulted, and the minidumps produced indicated NVMe issues. I can supply details if required.

Second case was a little worse, since the VM was fairly quiet, but I was testing an NFS connection (sending over a 10 Gb network) running at 400+ MB/s. In this case the second drive on the Windows VM just stopped working. Upon reboot, the disk image had been corrupted to the point of needing reformatting to function. (I rolled back to a previous snapshot and reapplied the day's transactions.)

The dataset has sync=always (Optane ZIL), and the VM dataset is on a raidz1 vdev (3+1).
Comment 1 Duncan 2024-01-24 00:45:57 UTC
Apologies, adding the attachment posted the first comment. I can't seem to be able to edit it, so this is an addition.

Host is an Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz (6 cores) with 128 GB RAM.

I have attached the minidump from the Windows VM crashing itself due to continuous high disk I/O.

As I mentioned, the case of a high disk load (reads) on the host resulting in a corrupted disk image (due to the disk unexpectedly disappearing from Windows) is the more worrying one for me.

I have reverted all my VMs back to virtio-blk.

On the weekend I will shut down the VM, snapshot it, change back to NVMe, turn on debug, and try some more scenarios.

Any suggestions would be welcome.

The vm-bhyve config file is as follows (both guests have the same setup):

#---------------------------------------------
loader="uefi"
bhyve_options="-A"
graphics="yes"
graphics_res="1600x900"
graphics_port="5915"
vnc_password="########"
xhci_mouse="yes"

cpu_sockets=1
cpu_cores=5
cpu=5

wired_memory="yes"
memory=20G

#debug="yes"

# put up to 8 disks on a single ahci controller.
# without this, adding a disk pushes the following network devices onto higher slot numbers,
# which causes windows to see them as a new interface
ahci_device_limit="8"

# ideally this should be changed to virtio-net and drivers installed in the guest
# e1000 works out-of-the-box
#network0_type="e1000"
network0_type="virtio-net"
network0_switch="lan"

#disk0_type="nvme"
disk0_type="virtio-blk"
disk0_name="disk0.img"

#disk1_type="nvme"
disk1_type="virtio-blk"
disk1_name="disk1.img"

#disk2_type="nvme"
disk2_type="virtio-blk"
disk2_name="disk2.img"

disk3_type="ahci-cd"
disk3_dev="custom"
disk3_name="/vm/.iso/virtio-win.iso"


# windows expects the host to expose localtime by default, not UTC
utctime="no"
uuid="291ee834-f125-11ec-8580-d05099d1a548"
network0_mac="#################"
#----------------------------------------------
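Since the conf above switches each disk between nvme and virtio-blk by hand-editing the commented pairs, a small script can make that toggle repeatable between test runs. A minimal sketch; the conf path and VM name are hypothetical, adjust them to your vm-bhyve datastore layout:

```shell
#!/bin/sh
# Toggle disk0..disk2 between virtio-blk and nvme in a vm-bhyve conf,
# instead of hand-editing the commented pairs.
# The conf path and VM name below are assumptions; adjust to your setup.
CONF="${1:-/vm/win10/win10.conf}"
TARGET="${2:-nvme}"     # or virtio-blk

if [ -r "$CONF" ]; then
    # Rewrite only the diskN_type lines; write to a sibling file so the
    # same command works with both BSD and GNU sed (-i syntax differs).
    sed -E "s/^(disk[0-2]_type=)\"(nvme|virtio-blk)\"/\1\"${TARGET}\"/" \
        "$CONF" > "${CONF}.new" && mv "${CONF}.new" "$CONF"
    # Show the resulting disk types for a quick sanity check.
    grep '^disk[0-2]_type=' "$CONF"
fi
```

Run it before `vm start`, e.g. `sh flip-disks.sh /vm/win10/win10.conf nvme`, then flip back to virtio-blk the same way.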

Regards

Duncan
Comment 2 Mark Johnston freebsd_committer freebsd_triage 2024-01-26 16:25:14 UTC
Chuck, would you be able to take a look at this?
Comment 3 Chuck Tuffli freebsd_committer freebsd_triage 2024-01-29 17:15:21 UTC
The minidump shows:

nt!WheaReportFatalHwErrorDeviceDriverEx+0xf5
storport!StorpWheaReportError+0x9d
storport!StorpMarkDeviceFailed+0x358
storport!StorPortNotification+0x91c
stornvme!NVMeControllerInitPart2+0x226
stornvme!NVMeControllerReset+0x124
stornvme!NVMeControllerAsyncResetWorker+0x3f
storport!StorPortWorkItemRoutine+0x46
nt!IopProcessWorkItem+0x135
nt!ExpWorkerThread+0x105
nt!PspSystemThreadStartup+0x55
nt!KiStartSystemThread+0x28

I am not a Windows developer, neither in real life nor on TV, but I _think_ what this shows matches your description. StorPortWorkItemRoutine looks to be a routine "... often used to handle I/O completions, DPC processing, and other deferred work in storage miniport drivers."

Following that call are calls to perform an NVMe Controller reset + initialization. One of the error handling strategies for I/O timeouts is to reset the NVMe Controller. So, perhaps an I/O timed out and the driver is resetting the NVMe device. After the reset / initialization, it appears to be reporting an error and BSoD'ing. What I don't know is if the "HwError" is reporting the I/O error/timeout or if there was a fatal error during device initialization.

Did bhyve emit any errors (debug="yes" in the vm-bhyve conf file)? Is there anything in dmesg that sheds some light (bhyve crash, I/O errors, etc.)?
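If it helps, this is roughly what I'd grep for on the host after a repro attempt. A sketch only; the per-guest log path follows vm-bhyve's defaults and is an assumption here:

```shell
#!/bin/sh
# After a repro attempt, pull anything NVMe/bhyve-related out of the
# host logs. The vm-bhyve per-guest log path is an assumption; adjust
# it to match your $vm_dir.
VMLOG="${1:-/vm/win10/vm-bhyve.log}"

for f in /var/log/messages "$VMLOG"; do
    if [ -r "$f" ]; then
        # grep exits non-zero when nothing matches; that's fine here.
        grep -Ei 'nvme|bhyve|vtbd|I/O error' "$f" || true
    fi
done
```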
Comment 4 Duncan 2024-02-04 07:09:37 UTC
Hi again,

I tried to reproduce the error today, and did struggle a little (I will have more time next weekend).

I found that while running a backup on the guest over the network (200-250 MB/s), if I then copied one of the disk images (the guest's D: drive) on the host, the virtual machine would lock up. It behaved as if there were some kind of file lock (my ignorance is showing, but I didn't think this should or would happen). If I stopped the copy (after 10-20 seconds of lockup), the backup stopped (with an error message), but the guest came back to life (RDP dropped, but VNC to the console stayed up, unresponsive until the copy stopped).

Copying any other file didn't cause a problem (including similar-sized disk images from other non-running VMs).

bhyve debug was on, but there were no messages, and there was also nothing in dmesg or /var/log/messages.

I will give it another go next weekend. At the very least I will see if an entire "full" backup can complete (5-6 hours).
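For those runs, the host-side trigger (a sequential read of the live disk image) can be scripted so each attempt is identical. A sketch under the assumption that the image lives at the path below, which is hypothetical:

```shell
#!/bin/sh
# Host-side trigger sketch: stream a copy of the guest's data-disk image
# while the guest runs its backup, mirroring the cp that caused the hang.
# The image path is an assumption; point it at the running guest's D: disk.
IMG="${1:-/vm/win10/disk1.img}"

if [ -r "$IMG" ]; then
    # bs=1048576 (1 MiB) is spelled out numerically so the same line
    # works with both FreeBSD dd (bs=1m) and GNU dd (bs=1M);
    # conv=sparse keeps the copy from ballooning on ZFS.
    dd if="$IMG" of="${IMG}.copy" bs=1048576 conv=sparse
fi
```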

regards

Duncan