Bug 267779

Summary: bhyve crashes host kernel: panic: rendezvous not in progress
Product: Base System
Component: bhyve
Reporter: Bjoern A. Zeeb <bz>
Assignee: Corvin Köhne <corvink>
Status: Closed FIXED
Severity: Affects Only Me
Priority: ---
CC: bz, corvink
Keywords: crash
Version: CURRENT
Hardware: Any
OS: Any

Description Bjoern A. Zeeb freebsd_committer freebsd_triage 2022-11-15 10:00:59 UTC
For testing purposes I am restarting a FreeBSD guest in a bhyve instance on a FreeBSD host more or less in a loop.  It has one PCI passthru device, in case that matters, and a local file-based disk.

During guest boot, often while the guest is printing CPU/TLB information, the host panics after a few iterations.  I updated main to last night's state and it still happens:

panic: rendezvous not in progress
cpuid = 3
time = 1668475188
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe011567d790
vpanic() at vpanic+0x151/frame 0xfffffe011567d7e0
panic() at panic+0x43/frame 0xfffffe011567d840
vm_exit_rendezvous() at vm_exit_rendezvous+0x6a/frame 0xfffffe011567d850
vmx_run() at vmx_run+0x276c/frame 0xfffffe011567d9a0
vm_run() at vm_run+0x223/frame 0xfffffe011567daa0
vmmdev_ioctl() at vmmdev_ioctl+0x507/frame 0xfffffe011567db40
devfs_ioctl() at devfs_ioctl+0xcd/frame 0xfffffe011567db90
vn_ioctl() at vn_ioctl+0x131/frame 0xfffffe011567dca0
devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfffffe011567dcc0
kern_ioctl() at kern_ioctl+0x202/frame 0xfffffe011567dd30
sys_ioctl() at sys_ioctl+0x12a/frame 0xfffffe011567de00
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe011567df30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe011567df30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x4802ffe533a, rsp = 0x480bf115e58, rbp = 0x480bf115f10 ---

#11 kdb_enter (why=<optimized out>, msg=<optimized out>)
    at /worktrees/wireless-dev/sys/kern/subr_kdb.c:509
#12 0xffffffff80bec6b2 in vpanic (fmt=<optimized out>,
    ap=ap@entry=0xfffffe011567d820)
    at /worktrees/wireless-dev/sys/kern/kern_shutdown.c:967
#13 0xffffffff80bec453 in panic (
    fmt=0xffffffff81e8de70 <cnputs_mtx> "p\336)\201\377\377\377\377")
    at /worktrees/wireless-dev/sys/kern/kern_shutdown.c:903
#14 0xffffffff82557f2a in vm_exit_rendezvous (vm=<optimized out>, vcpuid=0,
    vcpuid@entry=3, rip=18446744071581284439, rip@entry=0)
    at /worktrees/wireless-dev/sys/amd64/vmm/vmm.c:1699
#15 0xffffffff82570b1c in vmx_run (arg=0xfffffe0114abf000, vcpu=0,
    rip=-2128267177, pmap=0xfffffe00e0e3e530, evinfo=0x1)
    at /worktrees/wireless-dev/sys/amd64/vmm/intel/vmx.c:3070
#16 0xffffffff82558223 in vm_run (vm=0xfffffe00e0502000,
    vmrun=vmrun@entry=0xfffff8000437bb00)
    at /worktrees/wireless-dev/sys/amd64/vmm/vmm.c:1775
#17 0xffffffff8255b917 in vmmdev_ioctl (cdev=<optimized out>,
    cmd=<optimized out>, data=0xfffff8000437bb00 "\003",
    fflag=<optimized out>, td=<optimized out>)
    at /worktrees/wireless-dev/sys/amd64/vmm/vmm_dev.c:504
#18 0xffffffff80a7bd0d in devfs_ioctl (ap=0xfffffe011567dba8)
    at /worktrees/wireless-dev/sys/fs/devfs/devfs_vnops.c:933
#19 0xffffffff80cf6201 in vn_ioctl (fp=0xfffff8000a0cd140,
    com=<optimized out>, data=0xfffff8000437bb00,
    active_cred=0xfffff80428875b00, td=0x0)
    at /worktrees/wireless-dev/sys/kern/vfs_vnops.c:1699
#20 0xffffffff80a7c3be in devfs_ioctl_f (fp=0xffffffff81e8de70 <cnputs_mtx>,
    com=0, data=0xffffffff81253857, cred=0x1, td=0x0)
    at /worktrees/wireless-dev/sys/fs/devfs/devfs_vnops.c:864
#21 0xffffffff80c644a2 in fo_ioctl (fp=0xfffff8000a0cd140, com=3230692865,
    data=0x1c200001, active_cred=0x1, td=<optimized out>)
    at /worktrees/wireless-dev/sys/sys/file.h:365
#22 kern_ioctl (td=td@entry=0xfffffe0114aefe40, fd=<optimized out>,
    com=com@entry=3230692865,
    data=0x1c200001 <error: Cannot access memory at address 0x1c200001>,
    data@entry=0xfffff8000437bb00 "\003")
    at /worktrees/wireless-dev/sys/kern/sys_generic.c:803
#23 0xffffffff80c641ea in sys_ioctl (td=0xfffffe0114aefe40,
    uap=0xfffffe0114af0238)
    at /worktrees/wireless-dev/sys/kern/sys_generic.c:711
#24 0xffffffff810d33be in syscallenter (td=<optimized out>)
    at /worktrees/wireless-dev/sys/amd64/amd64/../../kern/subr_syscall.c:189
#25 amd64_syscall (td=0xfffffe0114aefe40, traced=0)
    at /worktrees/wireless-dev/sys/amd64/amd64/trap.c:1200
#26 <signal handler called>
#27 0x000004802ffe533a in ?? ()
Comment 1 Corvin Köhne freebsd_committer freebsd_triage 2022-11-15 11:04:46 UTC
Could you please check whether https://reviews.freebsd.org/D37390 solves your issue?
Comment 2 Bjoern A. Zeeb freebsd_committer freebsd_triage 2022-11-15 12:26:59 UTC
I am on it; should know by the end of the day or tomorrow morning.
Comment 3 Corvin Köhne freebsd_committer freebsd_triage 2022-11-15 12:45:26 UTC
I noticed that vmm doesn't reset vm->rendezvous_req_cpus at the end of a rendezvous. So, my patch shouldn't work. I'm working on a correct patch.
Comment 4 commit-hook freebsd_committer freebsd_triage 2022-11-21 07:20:40 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=fde8ce889201bf7fe86d7a3b3dfe2abf27cd2d73

commit fde8ce889201bf7fe86d7a3b3dfe2abf27cd2d73
Author:     Corvin Köhne <corvink@FreeBSD.org>
AuthorDate: 2022-11-17 06:51:51 +0000
Commit:     Corvin Köhne <corvink@FreeBSD.org>
CommitDate: 2022-11-21 07:19:36 +0000

    vmm: remove unneccessary rendezvous assertion

    When a vcpu sees that a rendezvous is in progress, it exits and tries to
    handle the rendezvous. The vcpu doesn't check if it's part of the
    rendezvous or not. If the vcpu isn't part of the rendezvous, the
    rendezvous could be done before it reaches the assertion. This will
    cause a panic.

    The assertion isn't needed at all because vm_handle_rendezvous properly
    handles a spurious rendezvous. So, we can just remove it.

    PR:                     267779
    Reviewed by:            jhb, markj
    Tested by:              bz
    Approved by:            manu (mentor)
    MFC after:              1 week
    Sponsored by:           Beckhoff Automation GmbH & Co. KG
    Differential Revision:  https://reviews.freebsd.org/D37417

 sys/amd64/vmm/vmm.c | 3 ---
 1 file changed, 3 deletions(-)
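
The race described in the commit message can be replayed deterministically in userland. The following is only a sketch under stated assumptions: `struct toy_vm` and every `toy_*` function are hypothetical stand-ins, not the vmm(4) data structures or API, and the real code uses locking and CPU sets that are omitted here. It shows a vcpu that is not part of the rendezvous noticing a pending rendezvous, the single requested vcpu completing it in the meantime, and the state that the removed "rendezvous not in progress" check would then have found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the rendezvous state of a VM. */
struct toy_vm {
	void (*rendezvous_func)(int vcpuid);	/* non-NULL while pending */
	uint32_t req_cpus;			/* vcpus that must take part */
	uint32_t done_cpus;			/* vcpus that already ran it */
};

static void
toy_noop(int vcpuid)
{
	(void)vcpuid;
}

/* Start a rendezvous for the vcpus in req_cpus (bit n = vcpu n). */
static void
toy_start(struct toy_vm *vm, uint32_t req_cpus)
{
	assert(req_cpus != 0);
	vm->req_cpus = req_cpus;
	vm->done_cpus = 0;
	vm->rendezvous_func = toy_noop;
}

/* What a running vcpu checks before deciding to exit the guest. */
static bool
toy_sees_pending(const struct toy_vm *vm)
{
	return (vm->rendezvous_func != NULL);
}

/* Handler that tolerates a spurious call: no pending rendezvous, no work. */
static void
toy_handle(struct toy_vm *vm, int vcpuid)
{
	uint32_t bit = 1u << vcpuid;

	if (vm->rendezvous_func == NULL)
		return;
	if ((vm->req_cpus & bit) != 0 && (vm->done_cpus & bit) == 0) {
		vm->rendezvous_func(vcpuid);
		vm->done_cpus |= bit;
	}
	if (vm->done_cpus == vm->req_cpus)
		vm->rendezvous_func = NULL;	/* rendezvous finished */
}

/* Replays the interleaving; true means the old check would have fired. */
static bool
toy_assertion_would_fire(void)
{
	struct toy_vm vm = { 0 };
	bool seen, fires;

	toy_start(&vm, 1u << 0);	/* only vcpu 0 is requested */
	seen = toy_sees_pending(&vm);	/* vcpu 3 notices it and exits */
	toy_handle(&vm, 0);		/* vcpu 0 completes it meanwhile */
	fires = seen && vm.rendezvous_func == NULL;
	toy_handle(&vm, 3);		/* the spurious call is harmless */
	return (fires);
}
```

In this model `toy_assertion_would_fire()` returns true for a bystander vcpu, while the later `toy_handle()` call simply returns. That mirrors the reasoning in the commit message: the handler already copes with a spurious rendezvous, so the extra check only adds a way to panic.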
Comment 5 commit-hook freebsd_committer freebsd_triage 2022-11-29 13:58:42 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=59339f3a16c0aacbdb789b8600365f576a6a6a31

commit 59339f3a16c0aacbdb789b8600365f576a6a6a31
Author:     Corvin Köhne <corvink@FreeBSD.org>
AuthorDate: 2022-11-17 06:51:51 +0000
Commit:     Corvin Köhne <corvink@FreeBSD.org>
CommitDate: 2022-11-29 13:53:09 +0000

    vmm: remove unneccessary rendezvous assertion

    When a vcpu sees that a rendezvous is in progress, it exits and tries to
    handle the rendezvous. The vcpu doesn't check if it's part of the
    rendezvous or not. If the vcpu isn't part of the rendezvous, the
    rendezvous could be done before it reaches the assertion. This will
    cause a panic.

    The assertion isn't needed at all because vm_handle_rendezvous properly
    handles a spurious rendezvous. So, we can just remove it.

    PR:                     267779
    Reviewed by:            jhb, markj
    Tested by:              bz
    Approved by:            manu (mentor)
    MFC after:              1 week
    Sponsored by:           Beckhoff Automation GmbH & Co. KG
    Differential Revision:  https://reviews.freebsd.org/D37417

    (cherry picked from commit fde8ce889201bf7fe86d7a3b3dfe2abf27cd2d73)

 sys/amd64/vmm/vmm.c | 2 --
 1 file changed, 2 deletions(-)