Bug 271782 - VM crashes with: bhyve exited with status 0
Summary: VM crashes with: bhyve exited with status 0
Status: Closed DUPLICATE of bug 243063
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: 13.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-06-02 07:19 UTC by Yuri Dolgoruki
Modified: 2024-01-15 09:54 UTC
CC List: 2 users

See Also:


Description Yuri Dolgoruki 2023-06-02 07:19:36 UTC
Good day!

I have a machine with an AMD Ryzen 5 2600 Six-Core Processor and 32 GB of RAM.
OS: 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-525ecfdad 

In bhyve I run a Windows 2016 x64 guest, and it crashes randomly about once every 7-9 days. Please help me find the cause of the problem.

The Windows 2016 VM event log contains only a "critical problem possibly power fault" message.

/var/log/messages has no related messages.

In the bhyve log I see these messages:
--------------------------------------------------
Jun 02 10:12:10: bhyve exited with status 0
Jun 02 10:12:10: restarting
Jun 02 10:12:10:  [bhyve options: -c 4,sockets=1,cores=4 -m 12G -Hwl bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -U 39387cd6-e074-11ed-9e3c-d8bbc11c8171]
Jun 02 10:12:10:  [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,nvme,/mySSD/BHyVe/Win2016/disk0.img -s 5:0,virtio-net,tap1,mac=58:9c:fc:0c:3a:c7 -s 6:0,fbuf,tcp=0.0.0.0:5900 -s 7:0,xhci,tablet]
Jun 02 10:12:10:  [bhyve console: -l com1,/dev/nmdm-Win2016.1A]
Jun 02 10:12:10: starting bhyve (run 9)

Here is the config for that VM:
-------------------------------------------------
loader="uefi"
graphics="yes"
xhci_mouse="yes"
cpu=4
cpu_sockets=1
cpu_cores=4
memory=12G
ahci_device_limit="8"
network0_type="virtio-net"
network0_switch="public"
disk0_type="nvme"
disk0_name="disk0.img"
utctime="no"
uuid="39387cd6-e074-11ed-9e3c-d8bbc11c8171"
network0_mac="58:9c:fc:0c:3a:c7"

Here are the bhyve-related packages:
--------------------------------------------------
pkg info | grep -E "bhyve|vm"

bhyve-firmware-1.0_1           Collection of Firmware for bhyve
edk2-bhyve-g202202_10          EDK2 Firmware for bhyve
grub2-bhyve-0.40_10            Grub-emu loader for bhyve
uefi-edk2-bhyve-csm-0.2_4,1    UEFI EDK2 firmware for bhyve with CSM (16-bit BIOS)
vm-bhyve-1.5.0                 Management system for bhyve virtual machines

And the kldstat output:
-------------------------------------------------
Id Refs Address                Size Name
 1   38 0xffffffff80200000   d5ca28 kernel
 2    1 0xffffffff80f5d000   576280 vmm.ko
 3    1 0xffffffff814d4000   582850 zfs.ko
 4    2 0xffffffff81a57000     5c50 xdr.ko
 5    1 0xffffffff81ee5000     3378 acpi_wmi.ko
 6    1 0xffffffff81ee9000     3218 intpm.ko
 7    1 0xffffffff81eed000     2180 smbus.ko
 8    1 0xffffffff81ef0000     7638 if_bridge.ko
 9    1 0xffffffff81ef8000     50d8 bridgestp.ko
10    1 0xffffffff81efe000    21db8 ipfw.ko
11    1 0xffffffff81f20000     21cc nmdm.ko
12    1 0xffffffff81f23000     4700 nullfs.ko
13    1 0xffffffff81f28000     3530 fdescfs.ko

Thanks!
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2023-06-08 16:20:22 UTC
"bhyve exited with status 0" just means that the guest rebooted.  See the "EXIT STATUS" section of the bhyve man page.

Without some more diagnostics from the guest, or a reproducible test case, I don't see how this can be tracked down.
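For reference, the exit-status mapping that comment points to can be sketched as a small sh function. The function name `bhyve_exit_meaning` is a hypothetical helper, and the status values are taken from the EXIT STATUS section of bhyve(8) as of FreeBSD 13.x; check the man page on your own system before relying on them. The example parses a log line in the same format as the vm-bhyve log quoted in the description.

```sh
#!/bin/sh
# Hypothetical helper: translate a bhyve exit status into its meaning,
# following the EXIT STATUS section of bhyve(8) (FreeBSD 13.x).
bhyve_exit_meaning() {
    case "$1" in
        0) echo "rebooted" ;;
        1) echo "powered off" ;;
        2) echo "halted" ;;
        3) echo "triple fault" ;;
        4) echo "exited due to an error" ;;
        *) echo "unknown exit status: $1" ;;
    esac
}

# Example: interpret the status from a vm-bhyve log line such as the one
# in the description.
line="Jun 02 10:12:10: bhyve exited with status 0"
status=${line##* }   # keep everything after the last space
bhyve_exit_meaning "$status"
```

So a log full of "exited with status 0" followed by "restarting" only says the guest asked for a reboot (for example after a BSOD with automatic restart enabled); the cause has to be found inside the guest.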
Comment 2 Yuri Dolgoruki 2023-06-21 10:57:07 UTC
Mark, good day!

How can I help provide additional diagnostics?
Could the problem be in the datastore structure for BHyVe?

I have a ZFS file system at /mySSD/BHyVe, but the VM directory (/mySSD/BHyVe/Win2016) is a regular directory, not a separate dataset. There are also "dataset does not exist" messages in the vm info output:

zfs list | grep BHyVe
------------------
mySSD/BHyVe                               200G   226G      200G  /mySSD/BHyVe
mySSD/BHyVe/.templates                    100K   226G      100K  /mySSD/BHyVe/.templates

ls -la /mySSD/BHyVe
------------------
drwxr-xr-x  7 root       wheel  7 Jun  2 12:15 .
drwxr-xr-x  4 root       wheel  4 Apr 28 11:00 ..
drwxr-xr-x  2 root       wheel  4 May  4 22:34 .config
drwxr-xr-x  2 root       wheel  2 May  4 21:16 .img
drwxr-xr-x  2 root       wheel  2 May  4 21:16 .iso
drwxr-xr-x  2 root       wheel  3 May  4 21:16 .templates
drwxr-xr-x  2 root       wheel  7 Jun 20 08:11 Win2016

vm info Win2016
------------------
Virtual Machine: Win2016
------------------------
  state: running (81219)
  datastore: default
  loader: uefi
  uuid: 39387cd6-e074-11ed-9e3c-d8bbc11c8171
  cpu: 4
  cpu-topology: sockets=1, cores=4
  memory: 16G
  memory-resident: 17219084288 (16.036G)

  console-ports
    com1: /dev/nmdm-Win2016.1B
    vnc: 0.0.0.0:5900

  network-interface
    number: 0
    emulation: virtio-net
    virtual-switch: public
    fixed-mac-address: 58:9c:fc:0c:3a:c7
    fixed-device: -
    active-device: -
    desc: -
    mtu:
    bridge: bridge0

  virtual-disk
    number: 0
    device-type: file
    emulation: nvme
    options: -
    system-path: /mySSD/BHyVe/Win2016/disk0.img
    bytes-size: 214748364800 (200.000G)
    bytes-used: 214794060800 (200.042G)
cannot open 'mySSD/BHyVe/Win2016': dataset does not exist
cannot open 'mySSD/BHyVe/Win2016': dataset does not exist

  clone-origin
Comment 3 Yuri Dolgoruki 2023-06-22 07:44:29 UTC
I tried to get some debug output from the kernel dump, and there are messages related to the NVMe storage driver. Maybe there is some incompatibility with the bhyve nvme block device?
What can I do? Does the guest OS need special NVMe drivers?

Debug output from kernel.dmp
-----------------------
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000020, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff80f972f2075, address which referenced memory

Debugging Details:
------------------

Unable to load image \SystemRoot\System32\drivers\stornvme.sys, Win32 error 0n2
***** Kernel symbols are WRONG. Please fix symbols to do analysis.
Comment 4 Yuri Dolgoruki 2023-06-22 11:37:26 UTC
For note:

I will try to load the guest's I/O with CrystalDiskMark to see whether heavy I/O triggers the crash. I will post the results here.
Comment 5 Yuri Dolgoruki 2023-06-22 16:33:14 UTC
Confirmed: starting CrystalDiskMark in the guest is followed by a BSOD and a restart.

Why does this happen? At home I have a clone of the same guest VM, and there are some differences:

1. At work (where the problem exists): Ryzen 5 2600, bhyve VM on an SSD-based ZFS dataset. Heavy I/O in the guest crashes it.
OS: FreeBSD NEW-SITE 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-525ecfdad BSDKERN amd64 

2. At home: AMD Ryzen 5 5600G with Radeon Graphics, bhyve VM on an HDD-based ZFS dataset. Heavy I/O in the guest does NOT crash it.
OS: FreeBSD BSD-HOME 13.1-RELEASE-p6 FreeBSD 13.1-RELEASE-p6 GENERIC amd64
Comment 6 Yuri Dolgoruki 2023-06-22 17:37:15 UTC
Yes! I solved it myself!

At work I have a custom kernel config in which I had disabled:
device nvme
device nvd

I updated to 13-STABLE, uncommented those devices in the kernel config, rebuilt and reinstalled world and kernel, and voila! CrystalDiskMark now runs successfully and nothing crashes.
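A minimal sketch for checking whether a custom kernel config still has the two device lines above enabled before rebuilding. The helper name `kernconf_has_device` and the default config path are hypothetical placeholders; the check only inspects the config text (a line starting with `#` counts as disabled), not the kernel that is actually running.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the kernel config file $1 contains an
# uncommented "device $2" line. Commented-out lines ("#device nvme")
# do not match because the pattern only allows whitespace before "device".
kernconf_has_device() {
    _conf=$1
    _dev=$2
    grep -Eq "^[[:space:]]*device[[:space:]]+${_dev}([[:space:]]|#|\$)" "$_conf"
}

# Example: check a custom config (placeholder path, pass your own as $1)
# for the nvme and nvd drivers.
conf=${1:-/usr/src/sys/amd64/conf/MYKERNEL}
if [ -r "$conf" ]; then
    for dev in nvme nvd; do
        if kernconf_has_device "$conf" "$dev"; then
            echo "$dev: enabled"
        else
            echo "$dev: missing or commented out"
        fi
    done
else
    echo "cannot read $conf" >&2
fi
```

The trailing `([[:space:]]|#|$)` group keeps `nvme` from matching a longer device name that merely starts with the same letters.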

Thanks to all!
Comment 7 Yuri Dolgoruki 2024-01-15 09:54:26 UTC
Final update: the investigation showed that the problem is with the nvme disk type. An earlier bug already covers it, so I am linking to it.

*** This bug has been marked as a duplicate of bug 243063 ***