Good day!

I have a machine with an AMD Ryzen 5 2600 Six-Core Processor and 32 GB of RAM.
OS: 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-525ecfdad

Under bhyve I run a Windows 2016 x64 guest, and it crashes randomly about once every 7-9 days. Please help me find the cause of the problem.

The Windows 2016 VM event log has only a "critical problem possibly power fault" message.
/var/log/messages has no related messages.

In the bhyve log I see these messages:
--------------------------------------------------
Jun 02 10:12:10: bhyve exited with status 0
Jun 02 10:12:10: restarting
Jun 02 10:12:10: [bhyve options: -c 4,sockets=1,cores=4 -m 12G -Hwl bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -U 39387cd6-e074-11ed-9e3c-d8bbc11c8171]
Jun 02 10:12:10: [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,nvme,/mySSD/BHyVe/Win2016/disk0.img -s 5:0,virtio-net,tap1,mac=58:9c:fc:0c:3a:c7 -s 6:0,fbuf,tcp=0.0.0.0:5900 -s 7:0,xhci,tablet]
Jun 02 10:12:10: [bhyve console: -l com1,/dev/nmdm-Win2016.1A]
Jun 02 10:12:10: starting bhyve (run 9)

Here is the config for that VM:
-------------------------------------------------
loader="uefi"
graphics="yes"
xhci_mouse="yes"
cpu=4
cpu_sockets=1
cpu_cores=4
memory=12G
ahci_device_limit="8"
network0_type="virtio-net"
network0_switch="public"
disk0_type="nvme"
disk0_name="disk0.img"
utctime="no"
uuid="39387cd6-e074-11ed-9e3c-d8bbc11c8171"
network0_mac="58:9c:fc:0c:3a:c7"

These are the bhyve-related packages:
--------------------------------------------------
pkg info | grep -e "bhyve|vm"
bhyve-firmware-1.0_1           Collection of Firmware for bhyve
edk2-bhyve-g202202_10          EDK2 Firmware for bhyve
grub2-bhyve-0.40_10            Grub-emu loader for bhyve
uefi-edk2-bhyve-csm-0.2_4,1    UEFI EDK2 firmware for bhyve with CSM (16-bit BIOS)
vm-bhyve-1.5.0                 Management system for bhyve virtual machines

And kldstat:
-------------------------------------------------
Id Refs Address                Size Name
 1   38 0xffffffff80200000   d5ca28 kernel
 2    1 0xffffffff80f5d000   576280 vmm.ko
 3    1 0xffffffff814d4000   582850 zfs.ko
 4    2 0xffffffff81a57000     5c50 xdr.ko
 5    1 0xffffffff81ee5000     3378 acpi_wmi.ko
 6    1 0xffffffff81ee9000     3218 intpm.ko
 7    1 0xffffffff81eed000     2180 smbus.ko
 8    1 0xffffffff81ef0000     7638 if_bridge.ko
 9    1 0xffffffff81ef8000     50d8 bridgestp.ko
10    1 0xffffffff81efe000    21db8 ipfw.ko
11    1 0xffffffff81f20000     21cc nmdm.ko
12    1 0xffffffff81f23000     4700 nullfs.ko
13    1 0xffffffff81f28000     3530 fdescfs.ko

Thanks!
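For anyone reading a log like the one above, the exit status can be decoded mechanically. The following is only a rough sketch: the mapping is taken from the EXIT STATUS section of bhyve(8), the line format is assumed to match the vm-bhyve log excerpt in this report, and the function name `summarize_exits` is made up for illustration.

```python
import re
from collections import Counter

# Exit codes as listed in the EXIT STATUS section of bhyve(8).
EXIT_MEANINGS = {
    0: "rebooted",
    1: "powered off",
    2: "halted",
    3: "triple fault",
    4: "exited due to an error",
}

# Assumed line format, copied from the log excerpt in this report:
#   "Jun 02 10:12:10: bhyve exited with status 0"
EXIT_LINE = re.compile(
    r"^\w{3} \d{2} \d{2}:\d{2}:\d{2}: bhyve exited with status (?P<code>\d+)$"
)


def summarize_exits(log_text):
    """Count how often each kind of bhyve exit appears in the log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXIT_LINE.match(line.strip())
        if match:
            code = int(match.group("code"))
            counts[EXIT_MEANINGS.get(code, f"unknown ({code})")] += 1
    return dict(counts)


sample = """Jun 02 10:12:10: bhyve exited with status 0
Jun 02 10:12:10: restarting
Jun 02 10:12:10: starting bhyve (run 9)"""

print(summarize_exits(sample))
```

A status of 0 only says that the guest asked for a reboot; it does not distinguish a clean restart from a reboot after a guest-side bugcheck, so the guest's own crash dump is still needed.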
"bhyve exited with status 0" just means that the guest rebooted. See the "EXIT STATUS" section of the bhyve man page. Without some more diagnostics from the guest, or a reproducible test case, I don't see how this can be tracked down.
Mark, good day!

How can I help provide additional diagnostics?

Could the problem be in the datastore structure for bhyve? I have a ZFS file system at /mySSD/BHyVe, but the VM directory (/mySSD/BHyVe/Win2016) is not a file system of its own, just a regular directory. vm info also prints some "dataset does not exist" messages:

zfs list | grep BHyVe
------------------
mySSD/BHyVe              200G   226G   200G  /mySSD/BHyVe
mySSD/BHyVe/.templates   100K   226G   100K  /mySSD/BHyVe/.templates

ls -la /mySSD/BHyVe
------------------
drwxr-xr-x  7 root  wheel  7 Jun  2 12:15 .
drwxr-xr-x  4 root  wheel  4 Apr 28 11:00 ..
drwxr-xr-x  2 root  wheel  4 May  4 22:34 .config
drwxr-xr-x  2 root  wheel  2 May  4 21:16 .img
drwxr-xr-x  2 root  wheel  2 May  4 21:16 .iso
drwxr-xr-x  2 root  wheel  3 May  4 21:16 .templates
drwxr-xr-x  2 root  wheel  7 Jun 20 08:11 Win2016

vm info Win2016
------------------
Virtual Machine: Win2016
------------------------
  state: running (81219)
  datastore: default
  loader: uefi
  uuid: 39387cd6-e074-11ed-9e3c-d8bbc11c8171
  cpu: 4
  cpu-topology: sockets=1, cores=4
  memory: 16G
  memory-resident: 17219084288 (16.036G)

  console-ports
    com1: /dev/nmdm-Win2016.1B
    vnc: 0.0.0.0:5900

  network-interface
    number: 0
    emulation: virtio-net
    virtual-switch: public
    fixed-mac-address: 58:9c:fc:0c:3a:c7
    fixed-device: -
    active-device: -
    desc: -
    mtu:
    bridge: bridge0

  virtual-disk
    number: 0
    device-type: file
    emulation: nvme
    options: -
    system-path: /mySSD/BHyVe/Win2016/disk0.img
    bytes-size: 214748364800 (200.000G)
    bytes-used: 214794060800 (200.042G)
cannot open 'mySSD/BHyVe/Win2016': dataset does not exist
cannot open 'mySSD/BHyVe/Win2016': dataset does not exist
    clone-origin
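The "dataset does not exist" lines fit the listing above: the VM directory is a plain directory under mySSD/BHyVe rather than a dataset of its own. As a rough illustration only, the check can be written against saved `zfs list` output. The helper name `dataset_for_path` is hypothetical, and the sketch assumes the default five-column layout (NAME, USED, AVAIL, REFER, MOUNTPOINT) with no spaces in mountpoints.

```python
def dataset_for_path(zfs_list_text, path):
    """Return the dataset mounted exactly at `path`, or None if there is none.

    Assumes the default `zfs list` layout: NAME USED AVAIL REFER MOUNTPOINT,
    and mountpoints that contain no whitespace.
    """
    for line in zfs_list_text.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[4] == path:
            return fields[0]
    return None


# Output copied from the `zfs list | grep BHyVe` run in this report.
zfs_output = """mySSD/BHyVe 200G 226G 200G /mySSD/BHyVe
mySSD/BHyVe/.templates 100K 226G 100K /mySSD/BHyVe/.templates"""

print(dataset_for_path(zfs_output, "/mySSD/BHyVe"))
print(dataset_for_path(zfs_output, "/mySSD/BHyVe/Win2016"))
```

The second lookup finds nothing, which matches the messages printed by vm info. Whether that has any bearing on the crashes is a separate question.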
I tried to get some debug information from the kernel dump, and there are messages related to the NVMe storage driver. Could there be an incompatibility with the bhyve nvme block device? What can I do? Does the guest OS perhaps need special NVMe drivers?

Debug from kernel.dmp
-----------------------
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000020, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff80f972f2075, address which referenced memory

Debugging Details:
------------------

Unable to load image \SystemRoot\System32\drivers\stornvme.sys, Win32 error 0n2
***** Kernel symbols are WRONG. Please fix symbols to do analysis.
For the record: I will try loading the guest's I/O with CrystalDiskMark to see whether heavy I/O triggers the crash. I will report the results here.
Yes: starting CrystalDiskMark in the guest was followed by a BSOD and a restart. Why does this happen?

At home I have a clone of that guest VM, and there are differences:

1. At work (where the problem exists): Ryzen 5 2600, bhyve VM on an SSD-based ZFS dataset. Heavy I/O in the guest crashes it.
   OS: FreeBSD NEW-SITE 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-525ecfdad BSDKERN amd64

2. At home: AMD Ryzen 5 5600G with Radeon Graphics, bhyve VM on an HDD-based ZFS dataset. Heavy I/O in the guest does NOT crash it.
   OS: FreeBSD BSD-HOME 13.1-RELEASE-p6 FreeBSD 13.1-RELEASE-p6 GENERIC amd64
Yes! I solved it myself!

At work I have a custom kernel config in which I had disabled:

device nvme
device nvd

I updated to 13-STABLE, uncommented those devices in the kernel config, rebuilt and reinstalled world and kernel, and voila! CrystalDiskMark now runs successfully and nothing dies.

Thanks to all!
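For anyone with a similar custom kernel, a quick way to spot the same situation is to scan the kernel config for the two device lines. This is only a sketch under stated assumptions: it reads a single config file's text, treats `#` as starting a comment, honours `nodevice` lines in order, and does not follow `include` directives. The function name `missing_devices` is made up for illustration.

```python
import re

DEVICE_LINE = re.compile(r"^(?P<kind>device|nodevice)\s+(?P<name>\S+)$")


def missing_devices(kernel_config_text, wanted=("nvme", "nvd")):
    """Return the wanted devices that the config text does not enable."""
    enabled = set()
    for raw in kernel_config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        match = DEVICE_LINE.match(line)
        if not match:
            continue
        if match.group("kind") == "device":
            enabled.add(match.group("name"))
        else:  # a later "nodevice" line removes the device again
            enabled.discard(match.group("name"))
    return [name for name in wanted if name not in enabled]


# Hypothetical config fragments, not the reporter's actual BSDKERN file.
broken = "device\t\tahci\n#device\t\tnvme\t\t# NVM Express\n#device\t\tnvd\n"
fixed = "device\t\tahci\ndevice\t\tnvme\t\t# NVM Express\ndevice\t\tnvd\n"

print(missing_devices(broken))
print(missing_devices(fixed))
```

An empty list only means both lines are present in that one file; a config that pulls them in through `include GENERIC` would need the included file checked as well.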
Final finding: the problem is with the nvme disk type. A bug report for this already exists, so I am linking this one to it. *** This bug has been marked as a duplicate of bug 243063 ***