On one of our servers, there are three unkillable bhyve processes which stuck in 'STOP' state. All has the same symptoms: # procstat -kk 9277 | sort -k 4 PID TID COMM TDNAME KSTACK 9277 102504 bhyve blk-1:0:0-0 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102514 bhyve blk-1:0:0-1 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102752 bhyve blk-1:0:0-2 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102760 bhyve blk-1:0:0-3 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102761 bhyve blk-1:0:0-4 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102763 bhyve blk-1:0:0-5 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102764 bhyve blk-1:0:0-6 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102766 bhyve blk-1:0:0-7 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102767 bhyve blk-3:0-0 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102768 bhyve blk-3:0-1 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102769 bhyve blk-3:0-2 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102770 bhyve blk-3:0-3 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102771 bhyve blk-3:0-4 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102772 bhyve blk-3:0-5 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102773 bhyve blk-3:0-6 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102774 bhyve blk-3:0-7 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102775 bhyve blk-4:0:0-0 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102780 bhyve blk-4:0:0-1 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102887 bhyve blk-4:0:0-2 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102888 bhyve blk-4:0:0-3 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102889 bhyve blk-4:0:0-4 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102890 bhyve blk-4:0:0-5 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102891 bhyve blk-4:0:0-6 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102892 bhyve blk-4:0:0-7 mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102687 bhyve mevent mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102941 bhyve pci-devstat mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102983 bhyve pci-reconfig mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102937 bhyve rfb mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f 9277 102991 bhyve vcpu 0 mi_switch+0xe2 sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2 devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101 9277 100782 bhyve vcpu 1 mi_switch+0xe2 thread_suspend_switch+0xd4 thread_single+0x47b sigexit+0x57 postsig+0x2f8 ast+0x327 fast_syscall_common+0x198 9277 100788 bhyve vcpu 2 mi_switch+0xe2 sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2 devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101 9277 101385 bhyve vcpu 3 mi_switch+0xe2 sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2 devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101 9277 103402 bhyve vcpu 4 mi_switch+0xe2 sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2 devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101 9277 102206 bhyve vcpu 5 mi_switch+0xe2 sleepq_timedwait+0x2f msleep_spin_sbt+0xd8 vm_run+0x6c7 vmmdev_ioctl+0x9b2 devfs_ioctl+0xc7 VOP_IOCTL_APV+0x56 vn_ioctl+0x16a devfs_ioctl_f+0x1f kern_ioctl+0x27d sys_ioctl+0x15d amd64_syscall+0x3b0 fast_syscall_common+0x101 9277 102932 bhyve vtnet-5:0 tx mi_switch+0xe2 thread_suspend_check+0x297 ast+0x3b9 doreti_ast+0x1f # procstat threads 9277|grep vcpu 9277 100782 bhyve vcpu 1 -1 124 stop - 9277 100788 bhyve vcpu 2 -1 123 stop vmsusp 9277 101385 bhyve vcpu 3 -1 123 stop vmsusp 9277 102206 bhyve vcpu 5 -1 122 stop vmsusp 9277 102991 bhyve vcpu 0 -1 124 stop vmsusp 9277 103402 bhyve vcpu 4 -1 123 stop vmsusp As you can see, One of the VCPU threads exit, the rest are sleeping in kernel mode: https://svnweb.freebsd.org/base/head/sys/amd64/vmm/vmm.c?view=markup#l1515 The main problem is that if the bhyve process is killed, the threads driving those vCPUs will be gone, resulting in the remaining thread waiting in kernelspace forever: https://svnweb.freebsd.org/base/head/sys/amd64/vmm/vmm.c?view=markup#l1507 I found that the same bug was fixed in SmartOS: https://github.com/joyent/illumos-joyent/commit/dce228e4331f185347c3e0325cab8a3af72d6410#diff-721b4b2a86bbb8d11c6e01a2995868e6 Unfortunately the site https://smartos.org/bugview/OS-6888 is not available, so it is impossible to see the detailed description. Can we apply similar fix to our bhyve code?
Created attachment 210059 [details] procstat -k
Created attachment 210060 [details] prockstat threads
Created attachment 210061 [details] prockstat signals
Created attachment 210062 [details] procstat tsignals
Show line number for vm_run+0x6c7. What you wrote about vm_handle_rendezvous() might be reasonable but I am not sure that this is the case. Anyway, the line number information should confirm or deny. If indeed the issue is the infinite wait for rendezvous, it seems to be easy to handle it.
Try https://reviews.freebsd.org/D22881. I did not tested that.
https://smartos.org/bugview/OS-6888 yandex save :) https://yandexwebcache.net/yandbtm?lang=en&fmode=inject&tm=1576826734&tld=ru&la=1560740864&text=OS-6888%20bhyve%20wedged%20on%20vioapic%20write&url=https%3A%2F%2Fsmartos.org%2Fbugview%2FOS-6888&l10n=ru&mime=html&sign=0c8dee200fcb04690f50b361149607d8&keyno=0
(In reply to Andrey Fesenko from comment #7) I have no idea what did you tried to say. smartos link opens up as a blank page, and the yandex link returns 404.
(In reply to Konstantin Belousov from comment #8) hm... yandex cach not permanent link :( original page bug OS-6888 long time not work, now working, save https://web.archive.org/web/20191220195714/https://smartos.org/bugview/OS-6888 if smartos bugview down again.
A commit references this bug: Author: kib Date: Thu Jan 2 22:37:05 UTC 2020 New revision: 356296 URL: https://svnweb.freebsd.org/changeset/base/356296 Log: bhyve: terminate waiting loops if thread suspension is requested. PR: 242724 Reviewed by: markj Reported and tested by: Aleksandr Fedorov <aleksandr.fedorov@itglobal.com> (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D22881 Changes: head/sys/amd64/include/vmm.h head/sys/amd64/vmm/io/vioapic.c head/sys/amd64/vmm/vmm.c head/sys/amd64/vmm/vmm_dev.c
A commit references this bug: Author: kib Date: Thu Jan 9 08:29:32 UTC 2020 New revision: 356543 URL: https://svnweb.freebsd.org/changeset/base/356543 Log: MFC r356296: bhyve: terminate waiting loops if thread suspension is requested. PR: 242724 Changes: _U stable/12/ stable/12/sys/amd64/include/vmm.h stable/12/sys/amd64/vmm/io/vioapic.c stable/12/sys/amd64/vmm/vmm.c stable/12/sys/amd64/vmm/vmm_dev.c