Bug 215972 - bhyve crashes with more than 1 CPU on AMD
Summary: bhyve crashes with more than 1 CPU on AMD
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: Andriy Gapon
URL:
Keywords:
Duplicates: 215377
Depends on:
Blocks:
 
Reported: 2017-01-11 15:48 UTC by ajschot
Modified: 2019-02-20 05:06 UTC
CC List: 13 users

See Also:
koobs: mfc-stable11+
koobs: mfc-stable10+


Attachments
Screenshot Terminal SSH (68.89 KB, image/png)
2017-01-11 15:48 UTC, ajschot
no flags
ktr capture of the problem (67.65 KB, text/plain)
2018-01-09 09:34 UTC, Andriy Gapon
no flags

Description ajschot 2017-01-11 15:48:57 UTC
Created attachment 178755
Screenshot Terminal SSH

Hi, there is a problem with bhyve: when using more than 1 CPU, bhyve crashes.
Trying Windows 10 x64 on an AMD A8 7600, ASRock FM2A88X, 32 GB DDR3-1600, 500 GB.
FreeBSD 11 was updated today before trying; bhyve-firmware was also updated to the latest version.

It only works with 1 CPU.


Using this to boot:

sudo bhyve -c 2 -m 4G -H -w \
  -s 0,hostbridge \
  -s 3,ahci-cd,virtio-win-0.1.126.iso \
  -s 4,ahci-hd,win10.img \
  -s 5,virtio-net,tap10 \
  -s 29,fbuf,tcp=0.0.0.0:5900,wait \
  -s 30,xhci,tablet \
  -s 31,lpc \
  -l com1,stdio \
  -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
  win10


It only works when changing '-c 2' to '-c 1'.

I tried it on an Intel i5 and it worked with 2 CPUs, so it looks like this is an AMD-related problem.
Comment 1 ajschot 2017-01-12 10:39:44 UTC
Also added following lines to /boot/loader.conf

hw.vmm.topology.cores_per_package=4
hw.vmm.topology.threads_per_core=4

but I still can only install/boot Windows in bhyve with 1 CPU; more than 1 will freeze the VM.
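A quick way to confirm whether those tunables actually took effect is to read them back after a reboot (the same sysctls show up in the output in comment 8 below); this assumes the vmm module is loaded:

  sysctl hw.vmm.topology.cores_per_package hw.vmm.topology.threads_per_core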
Comment 2 Peter Grehan freebsd_committer freebsd_triage 2017-01-17 18:59:43 UTC
The workaround is to install with 1, and then increase that post-install.

I can reproduce this. Looks like it needs some quality time in the Windows debugger to see where the CPUs start to spin.
Comment 3 ajschot 2017-01-18 22:21:42 UTC
(In reply to Peter Grehan from comment #2)
After the install I tried to start with 2 CPUs, but it hangs at the start screen of Windows 10 x64.
Comment 4 Peter Grehan freebsd_committer freebsd_triage 2017-01-19 08:17:37 UTC
You have to wait until the install is complete (i.e. the 3rd reboot, where you enter username etc). At that point, you should be able to power off and then restart with > 1 vCPU.
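For reference, a minimal sketch of that workaround based on the command in the description; the CD and network devices are dropped here for brevity (they would normally be kept in both invocations), and the bhyvectl --destroy step between the two runs is an assumption about how the VM instance gets reset rather than something stated in this bug:

  # phase 1: run the Windows installer and all of its reboots with one vCPU
  sudo bhyve -c 1 -m 4G -H -w \
    -s 0,hostbridge \
    -s 4,ahci-hd,win10.img \
    -s 29,fbuf,tcp=0.0.0.0:5900,wait \
    -s 30,xhci,tablet \
    -s 31,lpc \
    -l com1,stdio \
    -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
    win10

  # phase 2: after the final reboot (account set up, desktop reached), power
  # the guest off, clear the instance, and start again with more vCPUs
  sudo bhyvectl --destroy --vm=win10
  sudo bhyve -c 2 -m 4G -H -w \
    -s 0,hostbridge \
    -s 4,ahci-hd,win10.img \
    -s 29,fbuf,tcp=0.0.0.0:5900,wait \
    -s 30,xhci,tablet \
    -s 31,lpc \
    -l com1,stdio \
    -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
    win10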
Comment 5 ajschot 2017-01-19 09:16:54 UTC
(In reply to Peter Grehan from comment #4)
I did that.
With 1 CPU:
- Install
- Reboot
- Set up Windows
- Reboot
- Install virtio driver
- Reboot

When everything was set up, I booted with 2 vCPUs, but then it freezes. With 1 CPU it does boot. I can try again, but I tried with FreeBSD 12 and 11 and the same thing happens.
Comment 6 Peter Grehan freebsd_committer freebsd_triage 2017-01-19 16:14:20 UTC
I didn't add the virtio driver - maybe that was what did it. Also, I'm installing on an Opteron 6320.

The process was, with 1 vCPU

install
 - reboot
2nd phase
 - reboot
final phase (set up account, etc. Goes to desktop)
 - reboot.

Now restart with multiple vCPUs. Tried 2, and also 6 after setting hw.vmm.topology.cores_per_package.
Comment 7 ajschot 2017-01-19 16:19:03 UTC
(In reply to Peter Grehan from comment #6)
I did it almost the same way, only with the virtio driver, and tried 2 and 4 CPUs.

Also added hw.vmm.topology.cores_per_package="4" to /boot/loader.conf
I really have no idea; maybe there is something bhyve does not like about the A8?
Comment 8 Nils Beyer 2017-03-30 12:59:28 UTC
Same behaviour here on a Ryzen 1700 and "FreeBSD 12.0-CURRENT #0
334829e6c(drm-next)-dirty".

Setting vCPU count greater than 1 leads to random lock-ups of the Windows 10
VM. Two, sometimes three of the vCPUs are creating 100% load on the host
system. Keyboard input via VNC doesn't work at all.

"bhyve" itself writes:
-------------------------------------------------------------------------------
fbuf frame buffer base: 0xa43200000 [sz 16777216]
rdmsr to register 0xc0010114 on vcpu 0
rdmsr to register 0xc0010114 on vcpu 1
wrmsr to register 0x10(0) on vcpu 1
rdmsr to register 0xc0010114 on vcpu 2
wrmsr to register 0x10(0) on vcpu 2
rdmsr to register 0xc0010114 on vcpu 3
wrmsr to register 0x10(0) on vcpu 3
wrmsr to register 0x10(0xcc75fcd2078) on vcpu 3
wrmsr to register 0x10(0xcc75fcd2078) on vcpu 0
wrmsr to register 0x10(0xcc75fcd2078) on vcpu 1
wrmsr to register 0x10(0xcc75fcd2078) on vcpu 2
atkbd data buffer full
atkbd data buffer full
atkbd data buffer full
atkbd data buffer full
atkbd data buffer full
atkbd data buffer full
atkbd data buffer full
atkbd data buffer full
atkbd data buffer full
atkbd data buffer full
atkbd data buffer full
atkbd data buffer full
atkbd data buffer full
-------------------------------------------------------------------------------

sysctls:
-------------------------------------------------------------------------------
#sysctl hw.vmm
hw.vmm.npt.pmap_flags: 507
hw.vmm.svm.num_asids: 32768
hw.vmm.svm.disable_npf_assist: 0
hw.vmm.svm.features: 113919
hw.vmm.svm.vmcb_clean: 959
hw.vmm.vmx.vpid_alloc_failed: 0
hw.vmm.vmx.posted_interrupt_vector: -1
hw.vmm.vmx.cap.posted_interrupts: 0
hw.vmm.vmx.cap.virtual_interrupt_delivery: 0
hw.vmm.vmx.cap.invpcid: 0
hw.vmm.vmx.cap.monitor_trap: 0
hw.vmm.vmx.cap.unrestricted_guest: 0
hw.vmm.vmx.cap.pause_exit: 0
hw.vmm.vmx.cap.halt_exit: 0
hw.vmm.vmx.initialized: 0
hw.vmm.vmx.cr4_zeros_mask: 0
hw.vmm.vmx.cr4_ones_mask: 0
hw.vmm.vmx.cr0_zeros_mask: 0
hw.vmm.vmx.cr0_ones_mask: 0
hw.vmm.ept.pmap_flags: 0
hw.vmm.vrtc.flag_broken_time: 1
hw.vmm.ppt.devices: 0
hw.vmm.iommu.enable: 1
hw.vmm.iommu.initialized: 0
hw.vmm.bhyve_xcpuids: 136
hw.vmm.topology.cpuid_leaf_b: 1
hw.vmm.topology.cores_per_package: 4
hw.vmm.topology.threads_per_core: 1
hw.vmm.create: beavis
hw.vmm.destroy: beavis
hw.vmm.trace_guest_exceptions: 0
hw.vmm.ipinum: 251
hw.vmm.halt_detection: 1
-------------------------------------------------------------------------------

started "bhyve" with:
-------------------------------------------------------------------------------
bhyve -c 4 -m 8G \
        -w -H -A -P \
        -s 0,amd_hostbridge \
        -s 1,lpc \
        -s 2,ahci-cd,/mnt/ryzen/iso/Windows10-PRO.de.iso \
        -s 3,ahci-hd,/mnt/ryzen/vms/${NAME}/lun0.img \
        -s 9,e1000,tap${ID} \
        -s 29,fbuf,tcp=0.0.0.0:5901,w=1024,h=768,wait \
        -s 30,xhci,tablet \
        -l com1,/dev/nmdm0A \
        -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
        ${NAME}
-------------------------------------------------------------------------------

looking at the "ktrace -p" file, I see lots of:
-------------------------------------------------------------------------------
[...]
  3826 vcpu 0   CALL  ioctl(0x3,0xc0907601,0x7fffddbebe30)
  3826 vcpu 3   RET   ioctl 0
  3826 vcpu 0   RET   ioctl 0
  3826 vcpu 0   CALL  ioctl(0x3,0xc0907601,0x7fffddbebe30)
  3826 vcpu 3   CALL  ioctl(0x3,0xc0907601,0x7fffdd3e7e30)
  3826 vcpu 2   CALL  ioctl(0x3,0xc0907601,0x7fffdd5e8e30)
  3826 vcpu 3   RET   ioctl 0
  3826 vcpu 3   CALL  ioctl(0x3,0xc0907601,0x7fffdd3e7e30)
  3826 vcpu 2   RET   ioctl 0
  3826 vcpu 3   RET   ioctl 0
  3826 vcpu 3   CALL  ioctl(0x3,0xc0907601,0x7fffdd3e7e30)
  3826 vcpu 0   RET   ioctl 0
  3826 vcpu 3   RET   ioctl 0
  3826 vcpu 3   CALL  ioctl(0x3,0xc0907601,0x7fffdd3e7e30)
  3826 vcpu 0   CALL  ioctl(0x3,0xc0907601,0x7fffddbebe30)
  3826 vcpu 3   RET   ioctl 0
  3826 vcpu 0   RET   ioctl 0
  3826 vcpu 2   CALL  ioctl(0x3,0xc0907601,0x7fffdd5e8e30)
  3826 vcpu 2   RET   ioctl 0
  3826 vcpu 3   CALL  ioctl(0x3,0xc0907601,0x7fffdd3e7e30)
  3826 vcpu 3   RET   ioctl 0
  3826 vcpu 0   CALL  ioctl(0x3,0xc0907601,0x7fffddbebe30)
  3826 vcpu 2   CALL  ioctl(0x3,0xc0907601,0x7fffdd5e8e30)
  3826 vcpu 2   RET   ioctl 0
[...]
-------------------------------------------------------------------------------

Anything I can do to help debugging?
Comment 9 Peter Grehan freebsd_committer freebsd_triage 2017-03-30 22:32:37 UTC
Insta-repro for me on a Ryzen 1700. Happens almost immediately on install with >= 2 vCPUs, and the more configured, the faster the freeze. Single vCPU install is reliable, and I've been able to get occasional long uptimes with server SKUs and 2 vCPUs.

I also see cases where it's only some vCPUs that are stuck at 100% - sometimes 2, with the remainder idle. The RIPs of the spinning vCPUs are generally constant, indicating a lock-spin or similar.

To debug further with Windows, it probably needs the Windows kernel debugger to be hooked up, and then trapped into once the spin is seen.

However, I can repro this doing a FreeBSD buildworld with >= 12 vCPUs. It takes a lot longer (~20 mins) but seems to be reliable. Backtraces in ddb seem to show a missed IPI while holding a spinlock, which eventually blocks the entire system.
Comment 10 Nils Beyer 2017-03-31 12:52:24 UTC
Peter Grehan wrote in comment #9:
> However, I can repro this doing a FreeBSD buildworld with >= 12 vCPUs. It takes 
> a lot longer (~20 mins) but seems to be reliable. Backtraces in ddb seem to 
> show a missed IPI while holding a spinlock, which eventually blocks the entire 
> system.

is that a DDB from within the guest VM or the host?
Comment 11 Peter Grehan freebsd_committer freebsd_triage 2017-03-31 13:41:14 UTC
It's ddb from within the guest. The signature is:

1 vCPU will panic with a lock-spin timeout:
CPU 11, panic spin lock 0xffffffff81ea0480 (smp rendezvous) held by 0xfffff800079da000 (tid 100093) too long
vpanic() at vpanic+0x1b9/frame 0xfffffe02ba76f6f0
panic() at panic+0x43/frame 0xfffffe02ba76f750
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x328/frame 0xfffffe02ba76f7d0
__mtx_lock_spin_flags() at __mtx_lock_spin_flags+0xe0/frame 0xfffffe02ba76f810
smp_rendezvous_cpus() at smp_rendezvous_cpus+0xab/frame 0xfffffe02ba76f880
dtrace_sync() at dtrace_sync+0x77/frame 0xfffffe02ba76f8d0
dtrace_state_deadman() at dtrace_state_deadman+0x13/frame 0xfffffe02ba76f900

That spinlock is held by another vCPU that is waiting for an ack to its IPI:
CPU 5
--- trap 0x13, rip = 0xffffffff81033ac2, rsp = 0xfffffe02c8009860, rbp = 0xfffffe02c80098d0 ---
smp_targeted_tlb_shootdown() at smp_targeted_tlb_shootdown+0x352/frame 0xfffffe02c80098d0
smp_masked_invlpg() at smp_masked_invlpg+0x4c/frame 0xfffffe02c8009900
pmap_invalidate_page() at pmap_invalidate_page+0x191/frame 0xfffffe02c8009950
pmap_ts_referenced() at pmap_ts_referenced+0x7b3/frame 0xfffffe02c8009a00
vm_pageout() at vm_pageout+0xe04/frame 0xfffffe02c8009a70

... and all other vCPUs are waiting on the lock held by the vCPU awaiting
the ack.
--- trap 0x13, rip = 0xffffffff80a8d222, rsp = 0xfffffe02c8349600, rbp = 0xfffffe02c8349610 ---
lock_delay() at lock_delay+0x42/frame 0xfffffe02c8349610
__mtx_lock_sleep() at __mtx_lock_sleep+0x228/frame 0xfffffe02c83496a0
__mtx_lock_flags() at __mtx_lock_flags+0xe8/frame 0xfffffe02c83496f0
vm_page_enqueue() at vm_page_enqueue+0x6b/frame 0xfffffe02c8349720
vm_fault_hold() at vm_fault_hold+0x1ab9/frame 0xfffffe02c8349850
vm_fault() at vm_fault+0x75/frame 0xfffffe02c8349890
Comment 12 Nils Beyer 2017-03-31 14:31:41 UTC
I had this vCPU lock-up behaviour on a "Phenom II X6 1055T", too. So it seems 
that the desktop lines of AMD CPUs are generally unsupported in bhyve's SVM 
implementation.

OK, while studying https://en.wikipedia.org/wiki/Inter-processor_interrupt:

is there anything I can check/debug here on my system? I have no idea how to
remotely kernel-debug Windows...
Comment 13 jesper 2017-04-17 16:34:34 UTC
I appear to be running into the same problem with different circumstances.

I am running a Windows 2012R2 VM with a little help from chyves. It works perfectly well for 3-4 days, idling at about 1% CPU on my Xeon E5-2630v3. Then the VM goes unresponsive and bhyve starts consuming ~100% of a core, regardless of the number of vCPUs assigned to the VM. I've tested this with both one and four cores assigned to the VM.

One crash filled the screen with "atkbd data buffer full", but most don't. The VNC console is blank and unresponsive.

==================================================
Platform:
==================================================
White box server
Motherboard: Asrock X99/Extreme4
CPU: Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (2399.35-MHz K8-class CPU)
RAM: 8x 16 GB ECC (128GB)
Intel NIC, IBM/LSI HBA, a few other odds and ends I doubt would make much difference

===================================================
top reports one of 16 (8, hyperthreaded) cores in use:
===================================================
last pid: 41496;  load averages:  1.12,  1.13,  1.09                                           up 44+19:40:00  17:55:06
59 processes:  1 running, 58 sleeping
CPU:  0.0% user,  0.0% nice,  6.2% system,  0.0% interrupt, 93.8% idle
Mem: 12M Active, 1281M Inact, 121G Wired, 2684M Free
ARC: 92G Total, 39G MFU, 50G MRU, 300K Anon, 787M Header, 2850M Other
Swap:

  PID USERNAME      THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
29470 root           23  20    0 17482M  6750M kqread  3  21.9H 101.21% bhyve

==================================================
root@chef:~ # uname -a
FreeBSD chef.bofh 11.0-STABLE FreeBSD 11.0-STABLE #0: Fri Mar  3 04:28:46 CET 2017     root@chef.bofh:/usr/obj/usr/src/sys/CHEF  amd64
root@chef:~ # chyves ike get all
Getting all ike's properties...
bargs                                -A -H -P -S
bhyve_disk_type                      ahci-hd
bhyve_net_type                       e1000
bhyveload_flags
chyves_guest_version                 0300
cpu                                  4
creation                             Created on Fri Mar 24 20:26:53 CET 2017 by chyves v0.2.0 2016/09/11 using __create()
description                          -
eject_iso_on_n_reboot                3
loader                               uefi
net_ifaces                           tap51
notes                                -
os                                   windows
ram                                  16G
rcboot                               0
revert_to_snapshot
revert_to_snapshot_method            off
serial                               nmdm51
template                             no
uefi_console_output                  vnc
uefi_firmware                        BHYVE_UEFI.fd
uefi_vnc_client                      print
uefi_vnc_client_custom_cmd
uefi_vnc_ip                          0.0.0.0
uefi_vnc_mouse_type                  usb3
uefi_vnc_pause_until_client_connect  no
uefi_vnc_port                        5901
uefi_vnc_res                         800x600
uuid                                 d5302114-10c7-11e7-91c6-d05099803cdc

==================================================
I get the same kdump output as Nils Beyer:
==================================================

 29470 vcpu 1   CALL  ioctl(0x3,0xc0907601,0x7fffdd9eae30)
 29470 vcpu 2   CALL  ioctl(0x3,0xc0907601,0x7fffdd7e9e30)
 29470 vcpu 3   CALL  ioctl(0x3,0xc0907601,0x7fffdd5e8e30)
 29470 vcpu 1   RET   ioctl 0
 29470 vcpu 3   RET   ioctl 0
 29470 vcpu 1   CALL  ioctl(0x3,0xc0907601,0x7fffdd9eae30)
 29470 vcpu 3   CALL  ioctl(0x3,0xc0907601,0x7fffdd5e8e30)
 29470 vcpu 2   RET   ioctl 0
 29470 vcpu 2   CALL  ioctl(0x3,0xc0907601,0x7fffdd7e9e30)
 29470 vcpu 1   RET   ioctl 0
 29470 vcpu 1   CALL  ioctl(0x3,0xc0907601,0x7fffdd9eae30)
 29470 vcpu 3   RET   ioctl 0
 29470 vcpu 3   CALL  ioctl(0x3,0xc0907601,0x7fffdd5e8e30)
 29470 vcpu 2   RET   ioctl 0
 29470 vcpu 2   CALL  ioctl(0x3,0xc0907601,0x7fffdd7e9e30)
 29470 vcpu 0   RET   ioctl 0
 29470 vcpu 0   CALL  ioctl(0x3,0xc0907601,0x7fffddbebe30)
Comment 14 Peter Grehan freebsd_committer freebsd_triage 2017-04-17 18:37:19 UTC
>bhyve_net_type                       e1000

The lockup you are seeing is unrelated to the AMD one, and is a known issue with the e1000 under Windows.

I've created bug 218715 to track the e1000 issue.
Comment 15 Nils Beyer 2017-07-25 08:57:25 UTC
(In reply to Peter Grehan from comment #11)

Peter, do you have any news regarding that issue? The guest freezes still happen on 11.1-RELEASE. Sometimes the Windows 10 guest boots, I can login, but then it freezes after some time (all vcores 100% loaded). Sometimes it even freezes before the Windows login screen.
-------------------------------------------------------------------------------
hw.vmm.topology.cores_per_package: 16
hw.vmm.topology.threads_per_core: 1
-------------------------------------------------------------------------------

AMD SVM is not production-ready yet, is it?
Comment 16 Peter Grehan freebsd_committer freebsd_triage 2017-07-25 15:19:00 UTC
I've been working with Anish to narrow down the problem seen on the Ryzen with a FreeBSD guest. We are making (slow) progress on this.

>AMD SVM is not production-ready, yet, is it?

It depends on the guest. I've not seen any issues with Linux guests for example.
Comment 17 Nils Beyer 2017-07-26 13:18:22 UTC
(In reply to Peter Grehan from comment #16)

cool, thanks...
Comment 18 dgilbert 2017-07-27 22:45:51 UTC
I have been able to reproduce something like this:

FreeBSD-11.1-RC3 host, FreeBSD 11.1-RC3 guest.

Host: AMD 9590 (8 core), 32G RAM.
Guest: 4 cores, 4G RAM.

make -j4 buildworld on the guest.
Comment 19 Peter Grehan freebsd_committer freebsd_triage 2017-07-27 23:47:28 UTC
(In reply to dgilbert from comment #18)

Would you be able to try the same test, but with the guest vCPUs pinned? E.g. add the following bhyve parameters:

 -p 0:1 -p 1:2 -p 2:3 -p 3:4
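For a FreeBSD guest like the one in comment 18, a pinned run might look roughly like the sketch below; the disk image, memory size and VM name are placeholders, not the reporter's actual configuration:

  # load the FreeBSD guest kernel (placeholder image and VM name)
  bhyveload -m 4G -d guest.img fbsd-guest
  # run with 4 vCPUs, each pinned to a dedicated host CPU (1-4)
  bhyve -c 4 -m 4G -H -w \
    -p 0:1 -p 1:2 -p 2:3 -p 3:4 \
    -s 0,hostbridge \
    -s 4,ahci-hd,guest.img \
    -s 31,lpc \
    -l com1,stdio \
    fbsd-guest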
Comment 20 dgilbert 2017-07-28 04:41:16 UTC
(In reply to Peter Grehan from comment #19)

You asked me this in email on the list. I replied that this didn't seem to have any effect, i.e. it still hung.
Comment 21 Peter Grehan freebsd_committer freebsd_triage 2017-07-28 05:35:46 UTC
(In reply to dgilbert from comment #20)

Sorry, didn't know that was you.

There are 2 other things to try here:

- when the guest is hung, on the host issue

  bhyvectl --get-rip --cpu=0 --vm=<your vm name>
  bhyvectl --get-rip --cpu=1 --vm=<your vm name>
  bhyvectl --get-rip --cpu=2 --vm=<your vm name>
  bhyvectl --get-rip --cpu=3 --vm=<your vm name>

 You can look at what the resulting RIP values correspond to by restarting the guest, and within the guest,

   kgdb /boot/kernel/kernel
   x/i <rip value>

- Run the same test with a 12-CURRENT guest. With luck, it will panic and drop into ddb. If it hangs but doesn't panic, force the guest to drop into ddb from the host by issuing

  bhyvectl --inject-nmi --vm=<your vm name>

 From within ddb you can issue a backtrace.
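The per-vCPU RIP collection above is easy to script; a rough sketch, with the VM name and vCPU count as placeholders:

  #!/bin/sh
  # dump the current RIP of every vCPU of a hung guest
  VM=fbsd-guest
  for cpu in 0 1 2 3; do
      echo -n "vcpu ${cpu}: "
      bhyvectl --get-rip --cpu=${cpu} --vm=${VM}
  done
  # for a -CURRENT guest with ddb, an NMI drops it into the debugger so a
  # backtrace can be taken
  bhyvectl --inject-nmi --vm=${VM}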
Comment 22 domhauton 2017-12-05 21:40:14 UTC
Hey,

Is there a solution or has there been any progress on debugging this?

I'm getting the same issue with a 1100T on Win10 Pro / Win10 Education / Windows Server 2016 Datacenter.

I've been trying to set up surveillance software which unfortunately needs more than one core.

Many Thanks,
Dom
Comment 23 mikkel 2017-12-31 00:04:30 UTC
This exact problem also happens under bhyve in FreeNAS 11.1 when installing pfSense or OPNsense, so this is not limited to Windows guests - perhaps it is easier to debug with FreeBSD based guests?
Comment 24 Peter Grehan freebsd_committer freebsd_triage 2017-12-31 01:30:24 UTC
Yes, much easier with a FreeBSD(-based) guest. Some config questions: what version of pfSense/OPNsense, how many guest vCPUs, and what's the AMD h/w setup?
Comment 25 mikkel 2017-12-31 10:09:53 UTC
(In reply to Peter Grehan from comment #24)

OPNsense-17.7.5-OpenSSL-dvd-amd64.iso
pfSense-CE-2.4.2-RELEASE-amd64.iso

Latest FreeNAS 11.1

2,4,8 vCPU
4,8 vRAM

Threadripper 1950X
MSI X399 GAMING PRO CARBON AC (latest BIOS)
8x16GB 3200Mhz DDR4
3x 512GB NVMe in RAIDz1 - 40GB ZVOL per guest
Comment 26 Andriy Gapon freebsd_committer freebsd_triage 2018-01-09 09:34:12 UTC
Created attachment 189559
ktr capture of the problem

I am able to reproduce the problem with a FreeBSD guest on Phenom II X6 1090T.
The problem seems to be a guest IPI lost by vmm/svm.
The attached ktr demonstrates that.
Comment 27 Andriy Gapon freebsd_committer freebsd_triage 2018-01-09 13:58:04 UTC
Please see https://reviews.freebsd.org/D13780 for a potential fix.
Comment 28 Andriy Gapon freebsd_committer freebsd_triage 2018-01-15 08:59:41 UTC
And an alternative proposal: https://reviews.freebsd.org/D13828
Comment 29 Nils Beyer 2018-01-31 08:50:41 UTC
(In reply to Andriy Gapon from comment #27)

Thanks; with that patch (D13780) I am also able to use multiple vCPUs at every stage of Windows pleasure.

When will it go upstream?
Comment 30 Andriy Gapon freebsd_committer freebsd_triage 2018-01-31 09:04:26 UTC
(In reply to Nils Beyer from comment #29)

I still cannot decide between D13780 and D13828.
I have given some light testing to both; both seem to work.
Comment 31 Peter Grehan freebsd_committer freebsd_triage 2018-01-31 09:07:16 UTC
Please check in D13780 - I much prefer that one unless the later version can be shown to have better performance.
Comment 32 Nils Beyer 2018-01-31 09:46:59 UTC
(In reply to Andriy Gapon from comment #30)

well, performance-wise I did a Cinebench R15 (RC184115DEMO) benchmark (CPU) under Windows 10 (latest release) with both patch variants - here are the results:

D13780 - CB-Results: 484, 483, 484
D13828 - CB-Results: 481, 482, 479

Not much difference. Regarding stability (production quality), I can't say anything... yet.

For giggles, here's the Cinebench info panel's content:
--------------------------------------------------------------------------
Processor: AMD Ryzen 7 1700 Eight-Core Processor
Cores x GHz: 4 Cores, 4 Threads @3.00 GHz
OS: Windows 8, 64Bit, Professional Edition
GFX Board: <empty>
--------------------------------------------------------------------------
Comment 33 commit-hook freebsd_committer freebsd_triage 2018-01-31 11:14:44 UTC
A commit references this bug:

Author: avg
Date: Wed Jan 31 11:14:26 UTC 2018
New revision: 328622
URL: https://svnweb.freebsd.org/changeset/base/328622

Log:
  vmm/svm: post LAPIC interrupts using event injection, not virtual interrupts

  The virtual interrupt method uses V_IRQ, V_INTR_PRIO, and V_INTR_VECTOR
  fields of VMCB to inject a virtual interrupt into a guest VM.  This
  method has many advantages over the direct event injection as it
  offloads all decisions of whether and when the interrupt can be
  delivered to the guest.  But with a purely software emulated vAPIC the
  advantage is also a problem.  The problem is that the hypervisor does
  not have any precise control over when the interrupt is actually
  delivered to the guest (or a notification about that).  Because of that
  the hypervisor cannot update the interrupt vector in IRR and ISR in the
  same way as real hardware would.  The hypervisor becomes aware that the
  interrupt is being serviced only upon the first VMEXIT after the
  interrupt is delivered.  This creates a window between the actual
  interrupt delivery and the update of IRR and ISR.  That means that IRR
  and ISR might not be correctly set up to the point of the
  end-of-interrupt signal.

  The described deviation has been observed to cause an interrupt loss in
  the following scenario.  vCPU0 posts an inter-processor interrupt to
  vCPU1.  The interrupt is injected as a virtual interrupt by the
  hypervisor.  The interrupt is delivered to a guest and an interrupt
  handler is invoked.  The handler performs a requested action and
  acknowledges the request by modifying a global variable.  So far, there
  is no VMEXIT and the hypervisor is unaware of the events.  Then, vCPU0
  notices the acknowledgment and sends another IPI with the same vector.
  The IPI gets collapsed into the previous IPI in the IRR of vCPU1.  Only
  after that a VMEXIT of vCPU1 occurs.  At that time the vector is cleared
  in the IRR and is set in the ISR.  vCPU1 has vAPIC state as if the
  second IPI has never been sent.
  The scenario is impossible on the real hardware because IRR and ISR are
  updated just before the interrupt handler gets started.

  I saw several possibilities of fixing the problem.  One is to intercept
  the virtual interrupt delivery to update IRR and ISR at the right
  moment.  The other is to deliver the LAPIC interrupts using the event
  injection, same as legacy interrupts.  I opted to use the latter
  approach for several reasons.  It's equivalent to what VMM/Intel does
  (in !VMX case).  It appears to be what VirtualBox and KVM do.  The code
  is already there (to support legacy interrupts).

  Another possibility was to use a special intermediate state for a vector
  after it is injected using a virtual interrupt and before it is known
  whether it was accepted or is still pending.
  That approach was implemented in https://reviews.freebsd.org/D13828
  That method is more complex and does not have any clear advantage.

  Please see sections 15.20 and 15.21.4 of "AMD64 Architecture
  Programmer's Manual Volume 2: System Programming" (publication 24593,
  revision 3.29) for comparison between event injection and virtual
  interrupt injection.

  PR:		215972
  Reported by:	ajschot@hotmail.com, grehan
  Tested by:	anish, grehan,  Nils Beyer <nbe@renzel.net>
  Reviewed by:	anish, grehan
  MFC after:	2 weeks
  Differential Revision: https://reviews.freebsd.org/D13780

Changes:
  head/sys/amd64/vmm/amd/svm.c
Comment 34 Andriy Gapon freebsd_committer freebsd_triage 2018-01-31 11:40:02 UTC
(In reply to Nils Beyer from comment #32)
Thank you for testing!
I've just committed D13780 based on Peter's guidance and your testing.
Comment 35 Nils Beyer 2018-01-31 14:55:49 UTC
thank you very much. Any chance to get that in 11-STABLE as well?
Comment 36 Nils Beyer 2018-01-31 20:05:43 UTC
(In reply to Nils Beyer from comment #35)

sorry guys; please forget my last comment. Didn't see that MFC note...
Comment 37 commit-hook freebsd_committer freebsd_triage 2018-02-15 17:10:18 UTC
A commit references this bug:

Author: avg
Date: Thu Feb 15 17:09:48 UTC 2018
New revision: 329320
URL: https://svnweb.freebsd.org/changeset/base/329320

Log:
  MFC r328622: vmm/svm: post LAPIC interrupts using event injection

  PR:		215972

Changes:
_U  stable/11/
  stable/11/sys/amd64/vmm/amd/svm.c
Comment 38 commit-hook freebsd_committer freebsd_triage 2018-02-15 17:11:24 UTC
A commit references this bug:

Author: avg
Date: Thu Feb 15 17:10:42 UTC 2018
New revision: 329321
URL: https://svnweb.freebsd.org/changeset/base/329321

Log:
  MFC r328622: vmm/svm: post LAPIC interrupts using event injection

  PR:		215972

Changes:
_U  stable/10/
  stable/10/sys/amd64/vmm/amd/svm.c
Comment 39 Adam Jimerson 2018-05-10 11:14:51 UTC
It seems I'm still running into this issue, running FreeBSD 12.0-CURRENT as the guest and trying to run make buildworld.

Host: 11.1-RELEASE-p10
Guest: 12.0-CURRENT

Stacktrace
---
spin lock 0xffffffff81d42760 (smp rendezvous) held by 0xfffff800040c0560 (tid 100089) too long
panic: spin lock held too long
cpuid = 3
time = 1525935605
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe000046e570
vpanic() at vpanic+0x18d/frame 0xfffffe000046e5d0
panic() at panic+0x43/frame 0xfffffe000046e630
_mtx_lock_indefinite_check() at _mtx_lock_indefinite_check+0x8c/frame 0xfffffe000046e650
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0xd5/frame 0xfffffe000046e6c0
__mtx_lock_spin_flags() at __mtx_lock_spin_flags+0xd8/frame 0xfffffe000046e700
smp_targeted_tlb_shootdown() at smp_targeted_tlb_shootdown+0xd8/frame 0xfffffe000046e780
smp_masked_invlpg_range() at smp_masked_invlpg_range+0x42/frame 0xfffffe000046e7b0
pmap_invalidate_range() at pmap_invalidate_range+0x291/frame 0xfffffe000046e810
pmap_remove_ptes() at pmap_remove_ptes+0xae/frame 0xfffffe000046e870
pmap_remove() at pmap_remove+0x404/frame 0xfffffe000046e8f0
_kmem_unback() at _kmem_unback+0x43/frame 0xfffffe000046e930
kmem_free() at kmem_free+0x37/frame 0xfffffe000046e950
zone_drain_wait() at zone_drain_wait+0x374/frame 0xfffffe000046e9b0
arc_kmem_reap_now() at arc_kmem_reap_now+0xa4/frame 0xfffffe000046e9e0
arc_reclaim_thread() at arc_reclaim_thread+0x2e5/frame 0xfffffe000046ea70
fork_exit() at fork_exit+0x84/frame 0xfffffe000046eab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe000046eab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 8 tid 100056 ]
Stopped at      kdb_enter+0x3b: movq    $0,kdb_why

Sysctls
---
hw.vmm.npt.pmap_flags: 507
hw.vmm.svm.num_asids: 32768
hw.vmm.svm.disable_npf_assist: 0
hw.vmm.svm.features: 113919
hw.vmm.svm.vmcb_clean: 959
hw.vmm.vmx.vpid_alloc_failed: 0
hw.vmm.vmx.posted_interrupt_vector: -1
hw.vmm.vmx.cap.posted_interrupts: 0
hw.vmm.vmx.cap.virtual_interrupt_delivery: 0
hw.vmm.vmx.cap.invpcid: 0
hw.vmm.vmx.cap.monitor_trap: 0
hw.vmm.vmx.cap.unrestricted_guest: 0
hw.vmm.vmx.cap.pause_exit: 0
hw.vmm.vmx.cap.halt_exit: 0
hw.vmm.vmx.initialized: 0
hw.vmm.vmx.cr4_zeros_mask: 0
hw.vmm.vmx.cr4_ones_mask: 0
hw.vmm.vmx.cr0_zeros_mask: 0
hw.vmm.vmx.cr0_ones_mask: 0
hw.vmm.ept.pmap_flags: 0
hw.vmm.vrtc.flag_broken_time: 1
hw.vmm.ppt.devices: 0
hw.vmm.iommu.enable: 1
hw.vmm.iommu.initialized: 0
hw.vmm.bhyve_xcpuids: 8346
hw.vmm.topology.cpuid_leaf_b: 1
hw.vmm.topology.cores_per_package: 2
hw.vmm.topology.threads_per_core: 1
hw.vmm.create: beavis
hw.vmm.destroy: beavis
hw.vmm.trace_guest_exceptions: 0
hw.vmm.ipinum: 251
hw.vmm.halt_detection: 1

bhyve options (running bhyve using https://github.com/churchers/vm-bhyve as a frontend; if need be, I can see if I can get it to spit out the full command rather than just the options passed)
---
May 09 20:05:33:  [bhyve options: -c 4 -m 6G -AHPw -U 84b02223-f0d7-11e7-a8e5-1c1b0de910d7]
May 09 20:05:33:  [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,virtio-blk,/bhyve/fbsd-current/disk0.img -s 5:0,virtio-net,tap0,mac=58:9c:fc:0b:23:f9]
May 09 20:05:33:  [bhyve console: -l com1,stdio]

CPU info
---
hw.model: AMD Ryzen 7 1700 Eight-Core Processor
hw.machine: amd64
hw.ncpu: 16

My FreeBSD 12-CURRENT guest is the only one I have problems with so far (I also have a Linux guest and another BSD guest, but neither has done anything CPU-intensive).
Comment 40 Anish Gupta freebsd_committer freebsd_triage 2018-05-10 19:21:26 UTC
Can you provide the host's 11.1 revision number? Andriy's fix r328622 is in 11-STABLE (https://svnweb.freebsd.org/base/stable/11/sys/amd64/vmm/amd/svm.c?view=log); I just want to confirm.
Comment 41 Adam Jimerson 2018-05-14 13:16:36 UTC
Sorry, I didn't realize this was still only on the STABLE branch. As my host is currently on the RELEASE branch, I probably won't get the patch until 11.2.
Comment 42 Andriy Gapon freebsd_committer freebsd_triage 2018-05-31 21:00:03 UTC
*** Bug 215377 has been marked as a duplicate of this bug. ***
Comment 43 Rodney W. Grimes freebsd_committer freebsd_triage 2019-02-20 05:06:19 UTC
(In reply to Adam Jimerson from comment #41)
Should we try to push an EN for this issue?