Bug 242666 - Stopped VM cant restart again
Summary: Stopped VM cant restart again
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: 12.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords: bhyve
Depends on:
Blocks:
 
Reported: 2019-12-16 16:35 UTC by Bernhard Berger
Modified: 2020-06-13 22:15 UTC
CC List: 4 users

See Also:


Attachments
fstat output (91.95 KB, text/plain)
2019-12-18 19:13 UTC, Bernhard Berger
Task list (5.36 KB, text/plain)
2019-12-18 19:14 UTC, Bernhard Berger
Truss file (10.03 KB, text/plain)
2019-12-18 19:15 UTC, Bernhard Berger

Description Bernhard Berger 2019-12-16 16:35:13 UTC
Hi, I have a long-standing problem with bhyve VMs.
Sometimes a VM that has been stopped cannot be started again. When I try to start it, an error message says that it is still running.
Currently the only way to solve this is to reboot the server.
This also happens with the current FreeBSD 12.1 release :-(

Here is my VM list:

root@superserver:~ # vm list
NAME       DATASTORE  LOADER     CPU  MEMORY  VNC  AUTOSTART  STATE
nextcloud  default    bhyveload  2    2G      -    Yes [3]    Running (86383)
samba      default    bhyveload  2    2G      -    Yes [2]    Running (15621)
test-pc    default    bhyveload  2    2G      -    No         Stopped
unifi      default    grub       1    2G      -    Yes [1]    Running (15601)


Here I try to start the VM "test-pc":

root@superserver:~ # vm start test-pc
Starting test-pc
  * found guest in /data/bhyve/test-pc
  ! guest appears to be running already


However, there is no process for it:
root@superserver:~ # ps xaf | grep test-pc
19050 14 S+ 0:00.00 grep test-pc

Compare this with samba:
root@superserver:~ # ps xaf | grep samba
15361 2- I 0:00.01 /bin/sh /usr/local/sbin/vm _run samba
15621 2- IC 102:54.67 bhyve: samba (bhyve)
19971 14 S+ 0:00.00 grep samba

But in /dev/vmm, test-pc still exists as if it were a running VM:

root@superserver:~ # ll /dev/vmm
total 0
crw------- 1 root wheel 0xb1 Dec 16 13:35 nextcloud
crw------- 1 root wheel 0xb3 Dec 15 23:25 samba
crw------- 1 root wheel 0xb2 Dec 16 15:26 test-pc
crw------- 1 root wheel 0x7d Dec 15 23:24 unifi

Translated with www.DeepL.com/Translator (free version)
Comment 1 Aleksandr Fedorov 2019-12-17 04:44:10 UTC
As I understand it, you are using the sysutils/vm-bhyve port. In that case, this may not be a bhyve problem.

If the bhyve process terminates unexpectedly with an error, or is killed with kill -9, the device file /dev/vmm/<vmname> remains, and any attempt to restart the virtual machine will fail until you run bhyvectl --destroy --vm <vmname>.

However, I don't know how well sysutils/vm-bhyve handles such situations.

Can you show us the output of the bhyvectl --get-exit-reason --vm test-pc command to determine the exit reason for the VM?
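The two recovery commands mentioned in this comment can be wrapped in a small helper, sketched below. This is a hypothetical Python wrapper, not part of bhyve or vm-bhyve: the function names `recovery_commands` and `run_recovery` are made up for illustration, and the commands themselves are exactly the ones quoted in the comment. Running them for real assumes root on a FreeBSD host with vmm.ko loaded; by default the helper only returns the command lines (dry run).

```python
import subprocess
from typing import List


def recovery_commands(vm_name: str) -> List[List[str]]:
    """bhyvectl invocations from this comment: inspect the VM, then destroy it."""
    return [
        ["bhyvectl", "--get-exit-reason", "--vm", vm_name],
        ["bhyvectl", "--destroy", "--vm", vm_name],
    ]


def run_recovery(vm_name: str, dry_run: bool = True) -> List[str]:
    """Return the command lines (dry run) or each command's combined output."""
    results = []
    for argv in recovery_commands(vm_name):
        if dry_run:
            results.append(" ".join(argv))
        else:
            # Needs root on a FreeBSD host; the exit status is not checked here.
            proc = subprocess.run(argv, capture_output=True, text=True)
            results.append(proc.stdout + proc.stderr)
    return results


if __name__ == "__main__":
    for line in run_recovery("test-pc"):
        print(line)
```

Keeping the default as a dry run means the destructive `--destroy` step is only executed when explicitly requested.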
Comment 2 Jason Tubnor 2019-12-17 05:21:46 UTC
Can you also check for a run.lock file? It is located in the guest VM's configuration directory:

/data/bhyve/test-pc

If it is present, vm-bhyve will complain. If you remove it, vm-bhyve will clean up and launch your VM again.
Comment 3 Bernhard Berger 2019-12-17 10:20:53 UTC
(In reply to Jason Tubnor from comment #2)

No, the file doesn't exist.
Comment 4 Bernhard Berger 2019-12-17 11:01:46 UTC
(In reply to Aleksandr Fedorov from comment #1)


root@superserver:/data/bhyve # bhyvectl --get-exit-reason --vm test-pc
VM:test-pc is not created.

root@superserver:/data/bhyve # bhyvectl --destroy --vm test-pc
VM:test-pc is not created.

root@superserver:/data/bhyve # vm start test-pc
Starting test-pc
  * found guest in /data/bhyve/test-pc
  ! guest appears to be running already

root@superserver:/data/bhyve #  ls /dev/vmm
nextcloud       samba           test-pc         unifi

root@superserver:/data/bhyve # ls test-pc
disk0.img       test-pc.conf

root@superserver:/data/bhyve # vm list
NAME       DATASTORE  LOADER     CPU  MEMORY  VNC  AUTOSTART  STATE
nextcloud  default    bhyveload  2    2G      -    Yes [3]    Running (86383)
samba      default    bhyveload  2    2G      -    Yes [2]    Running (15621)
test-pc    default    bhyveload  2    2G      -    No         Stopped
unifi      default    grub       1    2G      -    Yes [1]    Running (15601)


root@superserver:/data/bhyve # ps xaf | grep bhyve
86383  0- SC    450:08.40 bhyve: nextcloud (bhyve)
15601  2- SC     53:29.89 bhyve: unifi (bhyve)
15621  2- IC    207:39.88 bhyve: samba (bhyve)


root@superserver:/data/bhyve # ls /dev | grep nmdm
nmdm-nextcloud.1A
nmdm-nextcloud.1B
nmdm-samba.1A
nmdm-samba.1B
nmdm-unifi.1A
nmdm-unifi.1B

The VM "test-pc" is definitely not running, and it cannot be started because /dev/vmm/test-pc still exists.

There must be a way to remove the /dev/vmm/test-pc device.

To be clear, this is not the first time this problem has occurred; I have been working with bhyve since FreeBSD 11, and it happens very often.
It often happens after typing "reboot" in the (FreeBSD) guest. After "shutdown -p now" in the (FreeBSD) guest, the VM may also be impossible to start again.
I think it has to do with a console that was not closed.
 
In the past, I solved this by rebooting the host system. But that can't be the solution and that's why I opened this ticket.


vm list apparently only checks whether a "bhyve: <vm-name>" process exists.

bhyveload seems to check only whether the device /dev/vmm/<vm-name> exists.

For bhyvectl: whenever /dev/vmm/<vm-name> exists, a "bhyve: <vm-name>" process must also exist. If the process does not exist, the device should be removed. This should not be that difficult to program.
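The consistency check proposed here (every /dev/vmm entry should have a matching bhyve process) can be sketched in Python as follows. This is an illustration only, not how vm-bhyve or bhyvectl actually work: it assumes a running guest shows up in ps as "bhyve: <vm-name> (bhyve)", as in the ps output quoted in this bug, and the function names are hypothetical. It deliberately only reports candidates, because a VM also exists in the kernel without a bhyve process while bhyveload or grub-bhyve is still loading the guest, so destroying every unmatched entry could race with a VM that is just starting.

```python
import os
import re
import subprocess
from typing import Iterable, List, Set

# Assumption: a running guest appears in ps as "bhyve: <vm-name> (bhyve)",
# matching the output quoted in this bug; this is not a documented format.
_BHYVE_TITLE = re.compile(r"bhyve: (\S+)")


def running_vm_names(ps_lines: Iterable[str]) -> Set[str]:
    """Collect VM names from bhyve process titles."""
    names: Set[str] = set()
    for line in ps_lines:
        match = _BHYVE_TITLE.search(line)
        if match:
            names.add(match.group(1))
    return names


def stale_vmm_entries(vmm_entries: Iterable[str], ps_lines: Iterable[str]) -> List[str]:
    """Return /dev/vmm names that have no matching bhyve process.

    Only reports candidates: a VM also exists without a bhyve process
    while a loader such as bhyveload is still preparing the guest.
    """
    running = running_vm_names(ps_lines)
    return sorted(name for name in vmm_entries if name not in running)


if __name__ == "__main__":
    # Only meaningful on a FreeBSD host with vmm.ko loaded.
    if os.path.isdir("/dev/vmm"):
        ps = subprocess.run(["ps", "-ax", "-o", "command"],
                            capture_output=True, text=True)
        for name in stale_vmm_entries(os.listdir("/dev/vmm"), ps.stdout.splitlines()):
            print(f"possibly stale: /dev/vmm/{name}")
```

With the /dev/vmm listing and ps output from this report, only test-pc is reported.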

Yours sincerely


There is another problem related to vm-bhyve, but how do I open a ticket for it?
Comment 5 Jason Tubnor 2019-12-17 20:09:07 UTC
FWIW: we don't use anything except UEFI in bhyve now. Our platform is based on 11.3-RELEASE and follows the 11 tree.

Do you have any issues with UEFI guests, or is it purely bhyveload and/or grub-bhyve?

For issues with vm-bhyve, raise an issue on churchers' GitHub page for the project:

https://github.com/churchers/vm-bhyve/issues
Comment 6 Rodney W. Grimes freebsd_committer 2019-12-17 20:10:46 UTC
(In reply to Bernhard Berger from comment #4)
> root@superserver:/data/bhyve # bhyvectl --destroy --vm test-pc
> VM:test-pc is not created.

I believe bhyvectl is argument-order sensitive; try that command as:
bhyvectl --vm test-pc --destroy
Comment 7 Bernhard Berger 2019-12-18 15:27:37 UTC
(In reply to Rodney W. Grimes from comment #6)

How many times do you want me to try?
Do you read what I write?


It's a bug in the BHYVE system. 
Under certain circumstances, a VM that has not ended cleanly can no longer be started. 

In such cases I solve the problem by restarting the host server but that can't be a general solution. 

It shouldn't be hard to add a check or repair function to bhyvectl that detects and fixes such problems.


With kind regards 

Translated with www.DeepL.com/Translator (free version)
Comment 8 Bernhard Berger 2019-12-18 16:11:39 UTC
I'm going to describe what this is all about.

This is "not" about a fix for "my" server, but about a general solution for everyone who uses BHYVE.

My VM "test-pc", which can no longer be started, is just what finally prompted me to report a problem I have been running into for a long time, so that you know about it and can look for a solution.

I found out the following:

1. A stopped VM can no longer be started if /dev/nmdm-<vm-name>.1A/B still exists.

2. A terminated VM can no longer be started if the device /dev/vmm/<vm-name> still exists.
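The first observation in the list above can be checked in a similar way by looking for console devices whose VM is not running. The sketch below is hypothetical: it assumes the "nmdm-<vm-name>.<n>A/B" naming seen in the /dev listing earlier in this bug (apparently vm-bhyve's console convention), takes the set of running VM names as input, and the function name is made up. Whether a leftover nmdm device actually blocks a restart is the reporter's observation and is not confirmed elsewhere in this thread; in the listing from Comment 4, test-pc has no nmdm device left, so the `nmdm-test-pc` entries below are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Assumption: console devices follow the "nmdm-<vm-name>.<n>A/B" naming
# seen in the /dev listing quoted in this bug.
_NMDM = re.compile(r"^nmdm-(.+)\.\d+[AB]$")


def orphaned_nmdm(dev_entries: Iterable[str], running: Set[str]) -> List[str]:
    """Return nmdm device names whose VM is not in the running set."""
    orphans = []
    for entry in dev_entries:
        match = _NMDM.match(entry)
        if match and match.group(1) not in running:
            orphans.append(entry)
    return sorted(orphans)


if __name__ == "__main__":
    devs = ["null", "nmdm-samba.1A", "nmdm-samba.1B",
            "nmdm-test-pc.1A", "nmdm-test-pc.1B"]  # test-pc entries are hypothetical
    print(orphaned_nmdm(devs, {"samba"}))
```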

So please take the problem seriously and forward it to the appropriate developers.

I hope that this will be solved in the next version of BHYVE.


Yours sincerely
Comment 9 Aleksandr Fedorov 2019-12-18 17:15:48 UTC
Yes, this is a bhyve problem. But we need to know what happened.

Can you run: truss bhyvectl --destroy --vm test-pc

This will help us find the cause of the openat() system call error.

Also, please attach a complete list of the processes on the host.

The output of the fstat utility would also be very useful.
Comment 10 Bernhard Berger 2019-12-18 19:13:28 UTC
Created attachment 210041 [details]
fstat output
Comment 11 Bernhard Berger 2019-12-18 19:14:59 UTC
Created attachment 210042 [details]
Task list
Comment 12 Bernhard Berger 2019-12-18 19:15:22 UTC
Created attachment 210043 [details]
Truss file
Comment 13 Bernhard Berger 2019-12-18 19:16:59 UTC
Here is the output, as requested.

I hope it helps.

Yours sincerely
Comment 14 Bernhard Berger 2020-01-02 13:04:10 UTC
Is there any status update?

The error has now occurred again with another VM.