Bug 250802 - bhyve exited with status 134 when GPU PCI passthrough
Status: Closed Unable to Reproduce
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: 12.2-STABLE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
Reported: 2020-11-02 09:10 UTC by mr.xanto
Modified: 2021-05-13 07:23 UTC

Attachments
debugging #2 (1.34 KB, patch)
2020-11-09 12:16 UTC, Konstantin Belousov
debugging #3 (660 bytes, patch)
2020-11-09 14:29 UTC, Konstantin Belousov
debugging #4 (408 bytes, patch)
2020-11-09 21:38 UTC, Konstantin Belousov

Description mr.xanto 2020-11-02 09:10:56 UTC
I am trying to pass a GPU through to a Windows 10 guest.

/boot/loader.conf:
vmm_load="YES"
pptdevs="1/0/0 1/0/1"

# pciconf -lv
ppt0@pci0:1:0:0:        class=0x030000 card=0x38991642 chip=0x0a6310de rev=0xa2 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'GT218 [GeForce 310]'
    class      = display
    subclass   = VGA
ppt1@pci0:1:0:1:        class=0x040300 card=0x38991642 chip=0x0be310de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA

bhyve options:
  [bhyve options: -c 1 -m 3G -Hwl bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -U 00dad61d-0d2e-11eb-936d-70f3951447ea -S]
  [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,ahci,hd:/mnt/Data/vm/win10/disk0.img -s 5:0,virtio-net,tap0,mac=58:9c:fc:01:00:a8 -s 6:0,passthru,1/0/0 -s 6:1,passthru,1/0/1 -s 7:0,xhci,tablet]
  [bhyve console: -l com1,stdio]
  [bhyve iso device: -s 3:0,ahci-cd,/mnt/Data/vm/.config/null.iso]

The Windows guest sees the NVIDIA GPU and tries to install the driver; after the reboot, bhyve stops working with this error:

bhyve exited with status 134
Unhandled ps2 keyboard command 0x02
Unhandled ps2 keyboard command 0x02
Assertion failed: (pi->pi_bar[baridx].type == PCIBAR_IO), function passthru_write, file /usr/src/usr.sbin/bhyve/pci_passthru.c, line 916.

# uname -v
FreeBSD 12.2-STABLE r366667 GENERIC
Comment 1 Konstantin Belousov freebsd_committer 2020-11-02 13:55:42 UTC
Show the output of pciconf -lvcb on the host, and preferably in the guest.  For the latter
you would need to start a FreeBSD VM with the same pass-through config once.
Comment 2 Robert Crowston 2020-11-02 23:31:56 UTC
Are you on AMD? I have had the same problem.

For some reason the processor is unable to handle accesses to BARs mapped above 4 GB.  The vm exits on such access and transfers control back to the vmm. But the vmm does not expect to handle memory passthrough -- the processor is supposed to handle that natively -- only i/o passthrough, tripping this assertion.

There is a hardcoded limit where we decide whether to allocate a bar above or below the 4 GB mark, in pci_emul.c, at pci_emul_alloc_pbar():
                                                                                      
/*
 * XXX
 * Some drivers do not work well if the 64-bit BAR is allocated
 * above 4GB. Allow for this by allocating small requests under
 * 4GB unless then allocation size is larger than some arbitrary
 * number (32MB currently).
 */
 if (size > 32 * 1024 * 1024) {

In the past I have found that by raising this limit, such that all bars are allocated in the lower 32 bit address space, I can start a GPU under Linux. I have not had success under Windows.

It's worth noting that most real BIOS or UEFIs preferentially allocate even large BARs in the lower 4 GB of the address space, so that configuration is much better tested for consumer devices.

    -- RHC.
Comment 3 mr.xanto 2020-11-03 07:02:33 UTC
(In reply to Konstantin Belousov from comment #1)

From host:
ppt0@pci0:1:0:0:        class=0x030000 card=0x38991642 chip=0x0a6310de rev=0xa2 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'GT218 [GeForce 310]'
    class      = display
    subclass   = VGA
    bar   [10] = type Memory, range 32, base rxfa000000, size 16777216, enabled
    bar   [14] = type Prefetchable Memory, range 64, base rxc0000000, size 268435456, enabled
    bar   [1c] = type Prefetchable Memory, range 64, base rxd0000000, size 33554432, enabled
    bar   [24] = type I/O Port, range 32, base rxe000, size 128, enabled
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit
    cap 10[78] = PCI-Express 2 endpoint max data 128(128) RO NS
                 max read 128
                 link x16(x16) speed 2.5(2.5) ASPM disabled(L0s/L1) ClockPM enabled
    cap 09[b4] = vendor (length 20)
    ecap 0002[100] = VC 1 max VC0
    ecap 0004[128] = Power Budgeting 1
    ecap 000b[600] = Vendor 1 ID 1
ppt1@pci0:1:0:1:        class=0x040300 card=0x38991642 chip=0x0be310de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA
    bar   [10] = type Memory, range 32, base rxfb080000, size 16384, enabled
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit
    cap 10[78] = PCI-Express 2 endpoint max data 128(128) NS
                 max read 128
                 link x16(x16) speed 2.5(2.5) ASPM L0s/L1(L0s/L1) ClockPM enabled

bhyve options:
  [bhyve options: -c 1 -m 3G -Hwl bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -U 7bd95998-1d4b-11eb-b522-70f3951447ea -S]
  [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,ahci,hd:/mnt/Data/vm/bsd_test/disk0.img -s 5:0,virtio-net,tap0,mac=58:9c:fc:0a:22:9a -s 6:0,passthru,1/0/0 -s 6:1,passthru,1/0/1 -s 7:0,xhci,tablet]
  [bhyve console: -l com1,stdio]
  [bhyve iso device: -s 3:0,ahci-cd,/mnt/Data/vm/.iso/FreeBSD-12.1-STABLE-amd64-20200827-r364849-bootonly.iso,ro]

FreeBSD guest:
root@:~ # pciconf -lvcb
hostb0@pci0:0:0:0:      class=0x060000 card=0x00000000 chip=0x12751275 rev=0x00 hdr=0x00
    vendor     = 'Network Appliance Corporation'
    class      = bridge
    subclass   = HOST-PCI
    cap 10[40] = PCI-Express 2 root port max data 128(128)
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s)
ahci0@pci0:0:3:0:       class=0x010601 card=0x00000000 chip=0x28218086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA Controller [AHCI mode]'
    class      = mass storage
    subclass   = SATA
    bar   [24] = type Memory, range 32, base rxc0000000, size 1024, enabled
    cap 05[40] = MSI supports 8 messages, 64 bit enabled with 8 messages
ahci1@pci0:0:4:0:       class=0x010601 card=0x00000000 chip=0x28218086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA Controller [AHCI mode]'
    class      = mass storage
    subclass   = SATA
    bar   [24] = type Memory, range 32, base rxc0000400, size 1024, enabled
    cap 05[40] = MSI supports 8 messages, 64 bit enabled with 8 messages
virtio_pci0@pci0:0:5:0: class=0x020000 card=0x00011af4 chip=0x10001af4 rev=0x00 hdr=0x00
    vendor     = 'Red Hat, Inc.'
    device     = 'Virtio network device'
    class      = network
    subclass   = ethernet
    bar   [10] = type I/O Port, range 32, base rx2000, size 32, enabled
    bar   [14] = type Memory, range 32, base rxc0002000, size 8192, enabled
    cap 11[40] = MSI-X supports 3 messages, enabled
                 Table in map 0x14[0x0], PBA in map 0x14[0x1000]
    cap 05[4c] = MSI supports 1 message, 64 bit
vgapci0@pci0:0:6:0:     class=0x030000 card=0x38991642 chip=0x0a6310de rev=0xa2 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'GT218 [GeForce 310]'
    class      = display
    subclass   = VGA
    bar   [10] = type Memory, range 32, base rxc1000000, size 16777216, enabled
    bar   [14] = type Prefetchable Memory, range 64, base rxd000000000, size 268435456, enabled
    bar   [1c] = type Prefetchable Memory, range 64, base rxc2000000, size 33554432, enabled
    bar   [24] = type I/O Port, range 32, base rx2080, size 128, enabled
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit
    cap 10[78] = PCI-Express 2 endpoint max data 128(128) RO NS
                 link x16(x16) speed 2.5(2.5) ASPM disabled(L0s/L1) ClockPM enabled
    cap 09[b4] = vendor (length 20)
hdac0@pci0:0:6:1:       class=0x040300 card=0x38991642 chip=0x0be310de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA
    bar   [10] = type Memory, range 32, base rxc4000000, size 16384, enabled
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit
    cap 10[78] = PCI-Express 2 endpoint max data 128(128) NS
                 link x16(x16) speed 2.5(2.5) ASPM L0s/L1(L0s/L1) ClockPM enabled
xhci0@pci0:0:7:0:       class=0x0c0330 card=0x00000000 chip=0x1e318086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '7 Series/C210 Series Chipset Family USB xHCI Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Memory, range 32, base rxc4004000, size 4096, enabled
    cap 05[40] = MSI supports 1 message, 64 bit enabled with 1 message
isab0@pci0:0:31:0:      class=0x060100 card=0x00000000 chip=0x70008086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82371SB PIIX3 ISA [Natoma/Triton II]'
    class      = bridge
    subclass   = PCI-ISA
root@:~ #
Comment 4 mr.xanto 2020-11-03 07:11:35 UTC
(In reply to Robert Crowston from comment #2)

If you mean the CPU, it is an Intel(R) Core(TM) i5. Have you looked at D26209?
As I understand it, that patch adds the ability to pass an integrated GPU through to a guest, and Windows guests are supported too. It may also rework the PCI BAR handling, but I haven't tried it yet.
Comment 5 Konstantin Belousov freebsd_committer 2020-11-04 20:36:56 UTC
Please try https://reviews.freebsd.org/D27092.  I am not completely sure this
will work, but it is worth an attempt.
Comment 6 Peter Grehan freebsd_committer 2020-11-04 21:04:20 UTC
I just commented on the review: that patch shouldn't be needed (a comment on why the assert is there could help I guess).

A possibility here is that the 64-bit PCI region that bhyve reserves is outside the physical address range of that CPU model. The fix for this is to make the base of this region dynamic, using CPUID to determine the number of maximum physical address lines.

Another fix that Robert mentioned is to relax the overly-small 32MB restriction on placing mmio bars in the PCI hole below 4G. It can be 256MB without any issues, and 512MB with sorting of BARs.
Comment 7 Konstantin Belousov freebsd_committer 2020-11-04 21:10:53 UTC
(In reply to Peter Grehan from comment #6)
I do not think we can get away with just increasing the space for BAR mapping below
4G in general.  Modern GPUs tend to increase the aperture size, and for something with
10G of VRAM together with coherent CPU/GPU memory, we simply cannot put the
aperture below 4G.

In comment #3 there is pciconf output from the guest; the BARs seem to be below 4G.
Comment 8 Peter Grehan freebsd_committer 2020-11-04 21:22:50 UTC
I don't mean to increase the aperture (it's 1G starting at 3G) - just to relax the check on whether BARs should be punted to the 64-bit area. That check is too restrictive.

For this case the bar is only 256MB so can fit there.

>pciconf output from the guest, bars seems to be below 4G.

 The bar in question was placed into the 64-bit area in the guest:

    bar   [14] = type Prefetchable Memory, range 64, base rxd000000000, size 268435456, enabled

... but was below 4G in the host:
    bar   [14] = type Prefetchable Memory, range 64, base rxc0000000, size 268435456, enabled

As mentioned, the 0xd000000000 area might be out of range on that host.
Comment 9 Konstantin Belousov freebsd_committer 2020-11-05 00:52:09 UTC
Indeed 0xd0_0000_0000 is above the max supported phys address on any i5.

I put together a patch https://reviews.freebsd.org/D27095 to select the starting
address for 64bit membars below the max phys address read from CPUID.

Please test.
Comment 10 mr.xanto 2020-11-05 09:20:40 UTC
(In reply to Konstantin Belousov from comment #9)

bhyve options:
  [bhyve options: -c 1 -m 3G -Hwl bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -U 7bd95998-1d4b-11eb-b522-70f3951447ea -S]
  [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,ahci,hd:/mnt/Data/vm/bsd_test/disk0.img -s 5:0,virtio-net,tap0,mac=58:9c:fc:0a:22:9a -s 6:0,passthru,1/0/0 -s 6:1,passthru,1/0/1 -s 7:0,xhci,tablet]
  [bhyve console: -l com1,stdio]
  [bhyve iso device: -s 3:0,ahci-cd,/mnt/Data/vm/.iso/FreeBSD-12.1-STABLE-amd64-20200827-r364849-bootonly.iso,ro]

log errors:
bhyve exited with status 4
bhyve: failed to initialize BARs for PCI 1/0/0
device emulation initialization error: No such file or directory

# pciconf -lvcb
ppt0@pci0:1:0:0:        class=0x030000 card=0x38991642 chip=0x0a6310de rev=0xa2 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'GT218 [GeForce 310]'
    class      = display
    subclass   = VGA
    bar   [10] = type Memory, range 32, base rxfa000000, size 16777216, enabled
    bar   [14] = type Prefetchable Memory, range 64, base rxc0000000, size 268435456, enabled
    bar   [1c] = type Prefetchable Memory, range 64, base rxd0000000, size 33554432, enabled
    bar   [24] = type I/O Port, range 32, base rxe000, size 128, enabled
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit
    cap 10[78] = PCI-Express 2 endpoint max data 128(128) RO NS
                 max read 128
                 link x16(x16) speed 2.5(2.5) ASPM disabled(L0s/L1) ClockPM enabled
    cap 09[b4] = vendor (length 20)
    ecap 0002[100] = VC 1 max VC0
    ecap 0004[128] = Power Budgeting 1
    ecap 000b[600] = Vendor 1 ID 1
ppt1@pci0:1:0:1:        class=0x040300 card=0x38991642 chip=0x0be310de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA
    bar   [10] = type Memory, range 32, base rxfb080000, size 16384, enabled
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit
    cap 10[78] = PCI-Express 2 endpoint max data 128(128) NS
                 max read 128
                 link x16(x16) speed 2.5(2.5) ASPM L0s/L1(L0s/L1) ClockPM enabled
Comment 11 Konstantin Belousov freebsd_committer 2020-11-05 21:46:19 UTC
(In reply to mr.xanto from comment #10)
Try with the Diff 79234 from the same review.
Comment 12 Robert Crowston 2020-11-05 22:26:39 UTC
With diff 79192 I have the same error on my AMD Ryzen system.

The (base + size) <= limit check in pci_emul_alloc_resource() fails.
Comment 13 Robert Crowston 2020-11-05 22:55:01 UTC
Likewise for 79234 (if I leave the size of the small allocation at 32 MB). Raising it to 256 MB "fixes" the problem.
Comment 14 Konstantin Belousov freebsd_committer 2020-11-06 00:02:15 UTC
(In reply to Robert Crowston from comment #13)
Please try Diff 79243.  I think I see the issue.
Comment 15 mr.xanto 2020-11-06 14:09:46 UTC
(In reply to Konstantin Belousov from comment #14)
I've applied the patch from https://pastebin.com/JJ80p7jf

FreeBSD guest:
root@:~ # pciconf -lvcb
hostb0@pci0:0:0:0:      class=0x060000 card=0x00000000 chip=0x12751275 rev=0x00 hdr=0x00
    vendor     = 'Network Appliance Corporation'
    class      = bridge
    subclass   = HOST-PCI
    cap 10[40] = PCI-Express 2 root port max data 128(128)
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s)
ahci0@pci0:0:3:0:       class=0x010601 card=0x00000000 chip=0x28218086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA Controller [AHCI mode]'
    class      = mass storage
    subclass   = SATA
    bar   [24] = type Memory, range 32, base rxc0000000, size 1024, enabled
    cap 05[40] = MSI supports 8 messages, 64 bit enabled with 8 messages
ahci1@pci0:0:4:0:       class=0x010601 card=0x00000000 chip=0x28218086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801HR/HO/HH (ICH8R/DO/DH) 6 port SATA Controller [AHCI mode]'
    class      = mass storage
    subclass   = SATA
    bar   [24] = type Memory, range 32, base rxc0000400, size 1024, enabled
    cap 05[40] = MSI supports 8 messages, 64 bit enabled with 8 messages
virtio_pci0@pci0:0:5:0: class=0x020000 card=0x00011af4 chip=0x10001af4 rev=0x00 hdr=0x00
    vendor     = 'Red Hat, Inc.'
    device     = 'Virtio network device'
    class      = network
    subclass   = ethernet
    bar   [10] = type I/O Port, range 32, base rx2000, size 32, enabled
    bar   [14] = type Memory, range 32, base rxc0002000, size 8192, enabled
    cap 11[40] = MSI-X supports 3 messages, enabled
                 Table in map 0x14[0x0], PBA in map 0x14[0x1000]
    cap 05[4c] = MSI supports 1 message, 64 bit
vgapci0@pci0:0:6:0:     class=0x030000 card=0x38991642 chip=0x0a6310de rev=0xa2 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'GT218 [GeForce 310]'
    class      = display
    subclass   = VGA
    bar   [10] = type Memory, range 32, base rxc1000000, size 16777216, enabled
    bar   [14] = type Prefetchable Memory, range 64, base rx800000000, size 268435456, enabled
    bar   [1c] = type Prefetchable Memory, range 64, base rxc2000000, size 33554432, enabled
    bar   [24] = type I/O Port, range 32, base rx2080, size 128, enabled
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit
    cap 10[78] = PCI-Express 2 endpoint max data 128(128) RO NS
                 link x16(x16) speed 2.5(2.5) ASPM disabled(L0s/L1) ClockPM enabled
    cap 09[b4] = vendor (length 20)
hdac0@pci0:0:6:1:       class=0x040300 card=0x38991642 chip=0x0be310de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA
    bar   [10] = type Memory, range 32, base rxc4000000, size 16384, enabled
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit
    cap 10[78] = PCI-Express 2 endpoint max data 128(128) NS
                 link x16(x16) speed 2.5(2.5) ASPM L0s/L1(L0s/L1) ClockPM enabled
xhci0@pci0:0:7:0:       class=0x0c0330 card=0x00000000 chip=0x1e318086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '7 Series/C210 Series Chipset Family USB xHCI Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Memory, range 32, base rxc4004000, size 4096, enabled
    cap 05[40] = MSI supports 1 message, 64 bit enabled with 1 message
isab0@pci0:0:31:0:      class=0x060100 card=0x00000000 chip=0x70008086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82371SB PIIX3 ISA [Natoma/Triton II]'
    class      = bridge
    subclass   = PCI-ISA
root@:~ #

But the Windows 10 guest still doesn't work:

 booting
  [bhyve options: -c 1 -m 3G -Hwl bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -U 00dad61d-0d2e-11eb-936d-70f3951447ea -S]
  [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,ahci,hd:/mnt/Data/vm/win10/disk0.img,cd:/mnt/Data/vm/.iso/virtio-win-0.1.189.iso -s 5:0,virtio-net,tap0,mac=58:9c:fc:01:00:a8 -s 6:0,passthru,1/0/0 -s 6:1,passthru,1/0/1 -s 7:0,xhci,tablet]
  [bhyve console: -l com1,stdio]
  [bhyve iso device: -s 3:0,ahci-cd,/mnt/Data/vm/.config/null.iso]
 starting bhyve (run 1)
 bhyve exited with status 134

Unhandled ps2 keyboard command 0x02
Unhandled ps2 keyboard command 0x02
Assertion failed: (pi->pi_bar[baridx].type == PCIBAR_IO), function passthru_write, file /usr/src/usr.sbin/bhyve/pci_passthru.c, line 916.
Comment 16 Konstantin Belousov freebsd_committer 2020-11-06 17:01:12 UTC
(In reply to mr.xanto from comment #15)
Can you install sysutils/x86info and paste the output of 'x86info -a' here, please?
Comment 17 Konstantin Belousov freebsd_committer 2020-11-06 17:46:10 UTC
(In reply to Konstantin Belousov from comment #16)
Also, I assume that the VM itself should still be around after the bhyve process has terminated.
Paste the output from 'bhyvectl --vm=<your vm> --get-all'
Comment 18 mr.xanto 2020-11-06 20:15:54 UTC
(In reply to Konstantin Belousov from comment #16)
x86info output: https://pastebin.com/LAkTvc9G

(In reply to Konstantin Belousov from comment #17)
As I understand it, the bhyvectl --get-all output differs from one moment to the next;
here is one sample: https://pastebin.com/wdV6dJ3v
Comment 19 Konstantin Belousov freebsd_committer 2020-11-06 20:57:15 UTC
(In reply to mr.xanto from comment #18)
The output from bhyvectl --get-all should be stable after the bhyve process has aborted,
since the VM is stopped and nothing is left to make it continue execution.
Was your paste obtained this way?

As far as I understand, 'exit_reason[0]  0x1' means that the VM exit was caused by an
external interrupt.  That should not result in bhyve usermode emulating a BAR
access.  So it is strange at least.
Comment 20 mr.xanto 2020-11-07 11:51:09 UTC
(In reply to Konstantin Belousov from comment #19)
I mean that once the bhyve process has aborted, bhyvectl returns "VM:<name> is not created".

I wrote a quick script to capture the bhyvectl output before the VM starts, while it runs, and after it exits:

#!/bin/sh

LOG=/mnt/Data/vm/win10/get.log
SEP="============================================================================="

# Append a timestamp, the current bhyvectl state, and a separator to the log.
snapshot() {
    date +"%Y.%m.%d - %X" >> "$LOG"
    bhyvectl --vm=win10 --get-all >> "$LOG"
    echo "$SEP" >> "$LOG"
}

# State before the VM is started.
echo "$SEP" >> "$LOG"
snapshot
vm start win10
sleep 1

# Poll while any bhyve process is running.
while pgrep bhyve >/dev/null; do
    snapshot
    sleep 0.1
done

# Final state after bhyve has exited.
sleep 1
snapshot

I've uploaded the output of this script to https://yadi.sk/d/Xn3AFSoW_wDdRA

bhyve log:
Unhandled ps2 keyboard command 0x02
Unhandled ps2 keyboard command 0x02
Assertion failed: (pi->pi_bar[baridx].type == PCIBAR_IO), function passthru_write, file /usr/src/usr.sbin/bhyve/pci_passthru.c, line 916.

bhyve options:
Nov 07 14:32:42: booting
Nov 07 14:32:42:  [bhyve options: -c 1 -m 3G -Hwl bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -U 00dad61d-0d2e-11eb-936d-70f3951447ea -S]
Nov 07 14:32:42:  [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,ahci,hd:/mnt/Data/vm/win10/disk0.img -s 5:0,virtio-net,tap0,mac=58:9c:fc:01:00:a8 -s 6:0,passthru,1/0/0 -s 6:1,passthru,1/0/1]
Nov 07 14:32:42:  [bhyve console: -l com1,stdio]
Nov 07 14:32:42:  [bhyve iso device: -s 3:0,ahci-cd,/mnt/Data/vm/.config/null.iso]
Nov 07 14:32:42: starting bhyve (run 1)
Nov 07 14:33:40: bhyve exited with status 0
Nov 07 14:33:40: restarting
Nov 07 14:33:40:  [bhyve options: -c 1 -m 3G -Hwl bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -U 00dad61d-0d2e-11eb-936d-70f3951447ea -S]
Nov 07 14:33:40:  [bhyve devices: -s 0,hostbridge -s 31,lpc -s 4:0,ahci,hd:/mnt/Data/vm/win10/disk0.img -s 5:0,virtio-net,tap0,mac=58:9c:fc:01:00:a8 -s 6:0,passthru,1/0/0 -s 6:1,passthru,1/0/1]
Nov 07 14:33:40:  [bhyve console: -l com1,stdio]
Nov 07 14:33:40:  [bhyve iso device: -s 3:0,ahci-cd,/mnt/Data/vm/.config/null.iso]
Nov 07 14:33:40: starting bhyve (run 2)
Nov 07 14:34:15: bhyve exited with status 134

Some explanation: the first start of the Windows guest after a crash goes into diagnostic mode.
Because I have no console access to the VM (as I understand it, GPU passthrough is not compatible
with the "fbuf" option), I reset the VM, and bhyve exited while the VM was loading.
Comment 21 Robert Crowston 2020-11-07 14:56:57 UTC
With the latest patches I now get a failure at the vm_map_pptdev_mmio() call.

(gdb) 
597                             error = vm_map_pptdev_mmio(ctx, sc->psc_sel.pc_bus,
=> 0x000000000023d178 <passthru_init+2904>:     41 0f b6 b4 24 ac 00 00 00      movzbl 0xac(%r12),%esi
(gdb) p/x *sc                                                                                                                                                                       
$14 = {psc_pi = 0x800ad4a00, psc_bar = {{type = 0x2, size = 0x1000000, addr = 0xf5000000}, {type = 0x3, size = 0x8000000, addr = 0xe0000000}, {type = 0x0, size = 0x0, addr = 0x0}, 
    {type = 0x0, size = 0x0, addr = 0x0}, {type = 0x0, size = 0x0, addr = 0x0}, {type = 0x0, size = 0x0, addr = 0x0}}, psc_msi = {capoff = 0x68, msgctrl = 0x80, emulated = 0x0}, 
  psc_msix = {capoff = 0x0}, psc_sel = {pc_domain = 0x0, pc_bus = 0x7, pc_dev = 0x0, pc_func = 0x0}}
(gdb) p i
$15 = 1
(gdb) p/x pi->pi_bar[1]
$16 = {type = 0x3, size = 0x8000000, addr = 0x800000000000}
Comment 22 Konstantin Belousov freebsd_committer 2020-11-07 18:39:27 UTC
(In reply to mr.xanto from comment #20)
OK, do the following: start your bhyve command under gdb, like this:
/usr/local/bin/gdb /usr/sbin/bhyve
(gdb) set args ...
(gdb) run
Now, when the assertion fires, the process should stop in the debugger instead of
terminating, and hopefully bhyvectl will find the VM still alive.


(In reply to Robert Crowston from comment #21)
What exactly is the 'failure'?  Is it SIGSEGV, another signal,
an error returned from the ioctl, etc.?
Comment 23 Robert Crowston 2020-11-07 20:27:45 UTC
(In reply to Konstantin Belousov from comment #22)

The ioctl fails with ENOMEM. 

(gdb) c
Continuing.
bhyve: failed to initialize BARs for PCI 7/0/0
device emulation initialization error: Cannot allocate memory
[LWP 102417 of process 27361 exited]
[LWP 102535 of process 27361 exited]
[LWP 102530 of process 27361 exited]
[LWP 102534 of process 27361 exited]
[LWP 102528 of process 27361 exited]
[LWP 102533 of process 27361 exited]
[LWP 102536 of process 27361 exited]
[LWP 102537 of process 27361 exited]
[LWP 102538 of process 27361 exited]
[Inferior 1 (process 27361) exited with code 04]

vm_map_pptdev_mmio() is called with (ctx=0x800299040, bus=7, slot=0, func=0, 
    gpa=140737488355328, len=134217728, hpa=3758096384).

I don't have a kernel debugger set up, but it looks like ENOMEM can only happen if vm_pager_allocate() fails.
Comment 24 Robert Crowston 2020-11-07 20:36:23 UTC
Correction: ENOMEM can also happen if vm_map_find() fails [which seems more likely given that vm_pager_allocate() in vmm_mmio_alloc() is not a function of the gpa].
Comment 25 mr.xanto 2020-11-07 20:42:33 UTC
(In reply to Konstantin Belousov from comment #22)
(gdb) run
Starting program: /usr/sbin/bhyve -c 1 -m 3G -Hwl bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -U 00dad61d-0d2e-11eb-936d-70f3951447ea -S -s 0,hostbridge -s 31,lpc -s 4:0,ahci,hd:/mnt/Data/vm/win10/disk0.img -s 5:0,virtio-net,tap0,mac=58:9c:fc:01:00:a8 -s 6:0,passthru,1/0/0 -s 6:1,passthru,1/0/1 win10
[New LWP 101371 of process 48097]
[New LWP 101372 of process 48097]
[New LWP 101373 of process 48097]
[New LWP 101374 of process 48097]
[New LWP 101375 of process 48097]
[New LWP 101376 of process 48097]
[New LWP 101377 of process 48097]
[New LWP 101378 of process 48097]
[New LWP 101379 of process 48097]
[New LWP 101380 of process 48097]
Unhandled ps2 keyboard command 0x02
Unhandled ps2 keyboard command 0x02
Assertion failed: (pi->pi_bar[baridx].type == PCIBAR_IO), function passthru_write, file /usr/src/usr.sbin/bhyve/pci_passthru.c, line 916.

Thread 11 "vcpu 0" received signal SIGABRT, Aborted.
[Switching to LWP 101380 of process 48097]
0x000000080080aafa in thr_kill () from /lib/libc.so.7
(gdb)

# bhyvectl --vm=win10 --get-all
https://pastebin.com/77mgyp9Q
Comment 26 Konstantin Belousov freebsd_committer 2020-11-07 21:21:23 UTC
(In reply to Robert Crowston from comment #24)
Please try Diff 79310 from the review.

I assume you are on HEAD. If you use stable/12, then replace
VM_MAXUSER_ADDRESS_LA47 with VM_MAXUSER_ADDRESS.
Comment 27 Robert Crowston 2020-11-07 21:46:33 UTC
(In reply to Konstantin Belousov from comment #26)

I am on stable/12, so I added
    if (cpu_maxphysaddr > VM_MAXUSER_ADDRESS)
            cpu_maxphysaddr = VM_MAXUSER_ADDRESS;

(I also needed #include <vm/pmap.h> and #include <machine/vmparam.h>.)

Now I am back to the original error:

wrmsr to register 0xc0011029(0x3) on vcpu 0
rdmsr to register 0xc00000e9 on vcpu 0
Unhandled ps2 mouse command 0xe1
Unhandled ps2 mouse command 0x88
Assertion failed: (pi->pi_bar[baridx].type == PCIBAR_IO), function passthru_write, file /usr/src/usr.sbin/bhyve/pci_passthru.c, line 916.
Comment 28 Robert Crowston 2020-11-07 21:49:27 UTC
(Sorry, I see there are more changes than just those two lines. I will apply the diff again from stable/12 to ensure there is no confusion.)
Comment 29 Robert Crowston 2020-11-07 22:08:33 UTC
OK, to be clear, I did not change vmm.c (because the LA47 isn't in 12.2, right?), and I used VM_MAXUSER_ADDRESS in place of VM_MAXUSER_ADDRESS_LA47. Aside from that I applied Diff 79310.

I also needed to add
#include <vm/pmap.h>
#include <machine/vmparam.h>
to pci_emul.c to find the VM_MAXUSER_ADDRESS macro constant.

I lowered the limit from 128 MB to 32 MB (because I want to test whether your patch fixes the 64-bit mapping, and my BAR is 128 MB).

Then I get:
wrmsr to register 0xc0011029(0x3) on vcpu 0
rdmsr to register 0xc00000e9 on vcpu 0
Unhandled ps2 mouse command 0xe1
Unhandled ps2 mouse command 0x88
Assertion failed: (pi->pi_bar[baridx].type == PCIBAR_IO), function passthru_write, file /usr/src/usr.sbin/bhyve/pci_passthru.c, line 916.
Comment 30 Konstantin Belousov freebsd_committer 2020-11-07 22:43:37 UTC
(In reply to Robert Crowston from comment #28)
If you see the same error, i.e. a vmexit to assist a memory access in a BAR, please
provide a *consistent* set of the following data for the problematic VM:
1. pciconf -lvcb from the guest (boot the FreeBSD installation ISO once)
2. bhyvectl --get-all --vm=VM (with the bhyve abort caught in gdb, see
   comment #22)
3. gdb 'p/x pi->pi_bar' output from the bhyve process stopped at the abort in item 2.
Comment 31 mr.xanto 2020-11-08 09:26:25 UTC
(In reply to Konstantin Belousov from comment #30)

# /usr/local/bin/gdb /usr/sbin/bhyve
GNU gdb (GDB) 9.2 [GDB v9.2 for FreeBSD]

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/bhyve...
(gdb) set args -c 1 -m 3G -Hwl bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd -U 00dad61d-0d2e-11eb-936d-70f3951447ea -S -s 0,hostbridge -s 31,lpc -s 4:0,ahci,hd:/mnt/Data/vm/win10/disk0.img -s 5:0,virtio-net,tap0,mac=58:9c:fc:01:00:a8 -s 6:0,passthru,1/0/0 -s 6:1,passthru,1/0/1 win10
(gdb) run
Assertion failed: (pi->pi_bar[baridx].type == PCIBAR_IO), function passthru_write, file /usr/src/usr.sbin/bhyve/pci_passthru.c, line 916.

Thread 11 "vcpu 0" received signal SIGABRT, Aborted.
[Switching to LWP 101424 of process 52943]
0x000000080080aafa in thr_kill () from /lib/libc.so.7
(gdb) backtrace
#0  0x000000080080aafa in thr_kill () from /lib/libc.so.7
#1  0x0000000800808f54 in raise () from /lib/libc.so.7
#2  0x000000080077f259 in abort () from /lib/libc.so.7
#3  0x00000008007f9161 in __assert () from /lib/libc.so.7
#4  0x000000000023d9e4 in passthru_write (ctx=0x800299080, vcpu=0, pi=0x800ad4780, baridx=1, offset=242044928, size=4, value=0)
    at /usr/src/usr.sbin/bhyve/pci_passthru.c:916
#5  0x000000000023685f in pci_emul_mem_handler (ctx=0x800299080, vcpu=6, dir=<optimized out>, addr=<optimized out>, size=<optimized out>, val=0x7fffdedf4d60,
    arg1=0x800ad4780, arg2=1) at /usr/src/usr.sbin/bhyve/pci_emul.c:411
#6  0x0000000000228c15 in mem_write (ctx=0x18c30, vcpu=6, gpa=34368170778, wval=0, size=0, arg=0x0) at /usr/src/usr.sbin/bhyve/mem.c:162
#7  0x000000000024ebb6 in emulate_mov (vm=<optimized out>, vcpuid=<optimized out>, gpa=<optimized out>, vie=<optimized out>, memread=<optimized out>,
    memwrite=0x228be0 <mem_write>, arg=0x800b76100) at /usr/src/sys/amd64/vmm/vmm_instruction_emul.c:600
#8  vmm_emulate_instruction (vm=<optimized out>, vcpuid=<optimized out>, gpa=<optimized out>, vie=<optimized out>, paging=<optimized out>, memread=<optimized out>,
    memwrite=0x228be0 <mem_write>, memarg=0x800b76100) at /usr/src/sys/amd64/vmm/vmm_instruction_emul.c:1697
#9  0x000000000022859f in emulate_mem_cb (ctx=0x18c30, vcpu=6, paddr=0, mr=0x0, arg=<optimized out>) at /usr/src/usr.sbin/bhyve/mem.c:238
#10 0x00000000002284c4 in access_memory (ctx=0x800299080, vcpu=0, paddr=3731705856, cb=0x228580 <emulate_mem_cb>, arg=0x7fffdedf4ee8)
    at /usr/src/usr.sbin/bhyve/mem.c:215
#11 0x00000000002283e9 in emulate_mem (ctx=0x18c30, vcpu=6, paddr=0, vie=<optimized out>, paging=<optimized out>) at /usr/src/usr.sbin/bhyve/mem.c:251
#12 0x000000000021e845 in vmexit_inst_emul (ctx=0x18c30, vmexit=0x256140 <vmexit>, pvcpu=<optimized out>) at /usr/src/usr.sbin/bhyve/bhyverun.c:716
#13 0x000000000021e29c in vm_loop (ctx=0x800299080, vcpu=0, startrip=65520) at /usr/src/usr.sbin/bhyve/bhyverun.c:853
#14 0x000000000021d5a3 in fbsdrun_start_thread (param=0x2569c0 <mt_vmm_info>) at /usr/src/usr.sbin/bhyve/bhyverun.c:427
#15 0x0000000800635fac in ?? () from /lib/libthr.so.3
#16 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffdedf5000
(gdb) f 4
#4  0x000000000023d9e4 in passthru_write (ctx=0x800299080, vcpu=0, pi=0x800ad4780, baridx=1, offset=242044928, size=4, value=0)
    at /usr/src/usr.sbin/bhyve/pci_passthru.c:916
916                     assert(pi->pi_bar[baridx].type == PCIBAR_IO);
(gdb) p/x pi->pi_bar
$1 = {{type = 0x2, size = 0x1000000, addr = 0xc1000000}, {type = 0x3, size = 0x10000000, addr = 0xd0000000}, {type = 0x4, size = 0x0, addr = 0x0}, {type = 0x3,
    size = 0x2000000, addr = 0xc2000000}, {type = 0x4, size = 0x0, addr = 0x0}, {type = 0x1, size = 0x80, addr = 0x2080}}
(gdb)

bhyvectl --get-all: https://pastebin.com/u1exMRin
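For readers following the dump above, the numeric type values can be decoded roughly as follows. This is a minimal sketch that assumes the enum order of pcibar_type in bhyve's pci_emul.h at the time (NONE, IO, MEM32, MEM64, MEMHI64). The helpers pcibar_type_name() and bar_is_io() and the gdb_bars table are made up for illustration; the table values are copied from the gdb output.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match enum pcibar_type in usr.sbin/bhyve/pci_emul.h. */
enum pcibar_type {
	PCIBAR_NONE,
	PCIBAR_IO,
	PCIBAR_MEM32,
	PCIBAR_MEM64,
	PCIBAR_MEMHI64
};

/* Simplified stand-in for bhyve's struct pcibar (raw type kept as int). */
struct pcibar {
	int type;
	uint64_t size;
	uint64_t addr;
};

/* Values copied from the 'p/x pi->pi_bar' output in comment 31. */
static const struct pcibar gdb_bars[6] = {
	{ 0x2, 0x1000000,  0xc1000000 },
	{ 0x3, 0x10000000, 0xd0000000 },
	{ 0x4, 0x0,        0x0 },
	{ 0x3, 0x2000000,  0xc2000000 },
	{ 0x4, 0x0,        0x0 },
	{ 0x1, 0x80,       0x2080 },
};

/* Hypothetical helper: printable name for a raw BAR type value. */
static const char *
pcibar_type_name(int type)
{
	switch (type) {
	case PCIBAR_NONE:	return ("NONE");
	case PCIBAR_IO:		return ("IO");
	case PCIBAR_MEM32:	return ("MEM32");
	case PCIBAR_MEM64:	return ("MEM64");
	case PCIBAR_MEMHI64:	return ("MEMHI64");
	default:		return ("unknown");
	}
}

/* Hypothetical helper: the condition the failing assertion tests. */
static bool
bar_is_io(const struct pcibar *bars, int baridx)
{
	return (bars[baridx].type == PCIBAR_IO);
}
```

Under that assumption baridx=1 is a 64-bit memory BAR (type 0x3), so the PCIBAR_IO assertion in passthru_write() cannot hold for it; only index 5 is an I/O BAR.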
Comment 32 Konstantin Belousov freebsd_committer 2020-11-08 12:49:58 UTC
(In reply to mr.xanto from comment #31)
Ok, my prolonged confusion comes from the fact that bhyvectl --get-all reports
EPT violation vmexit.  While your bhyve instance actually tries to handle
instruction emulation exit, and emulation assist requires access to the
membar.

So I restored the initial patch, that seems to do the right thing after all.
https://reviews.freebsd.org/D27138

Please apply both D27095 and D27138 and see if it helps.
Comment 33 Peter Grehan freebsd_committer 2020-11-08 12:58:58 UTC
I don't believe that change is required: it must be masking a bug somewhere else. There should never be instruction emulation for a pass-thru BAR - if there is an EPT fault for the region allocated in vmm_mmio_alloc(), the mapping should just be set up and the instruction retried.
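The distinction drawn here can be sketched as a small decision function. This is a simplified illustration, not the vmm code: struct gpa_seg, gpa_is_backed() and classify_ept_fault() are hypothetical names, and the real check also has to cover sysmem, devmem and other special cases.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one guest-physical range backed by a mapping. */
struct gpa_seg {
	uint64_t gpa;
	uint64_t len;
};

enum fault_action {
	FAULT_MAP_AND_RETRY,		/* set up the mapping, re-run the instruction */
	FAULT_EMULATE_INSTRUCTION	/* hand the access to userspace emulation */
};

/* True if 'gpa' lies inside any of the known backed ranges. */
static bool
gpa_is_backed(const struct gpa_seg *segs, size_t nsegs, uint64_t gpa)
{
	for (size_t i = 0; i < nsegs; i++) {
		if (gpa >= segs[i].gpa && gpa - segs[i].gpa < segs[i].len)
			return (true);
	}
	return (false);
}

/*
 * A fault inside a backed range (for example a pass-through BAR) should
 * only need the mapping established; anything else falls through to
 * instruction emulation.
 */
static enum fault_action
classify_ept_fault(const struct gpa_seg *segs, size_t nsegs, uint64_t gpa)
{
	if (gpa_is_backed(segs, nsegs, gpa))
		return (FAULT_MAP_AND_RETRY);
	return (FAULT_EMULATE_INSTRUCTION);
}
```

Seen this way, an emulation exit for an address that was supposed to be a pass-through BAR suggests the address was never registered as backed in the first place.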
Comment 34 mr.xanto 2020-11-08 14:43:52 UTC
(In reply to Konstantin Belousov from comment #32)

Applied patches: https://pastebin.com/fAEepFn6 and https://pastebin.com/ARYK653R.

Unhandled ps2 keyboard command 0x02
Unhandled ps2 keyboard command 0x02

Thread 11 "vcpu 0" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 101448 of process 54659]
write_mem (ctx=0x800299080, vcpu=0, gpa=3731705856, wval=0, size=4) at /usr/src/usr.sbin/bhyve/mem.c:287
287             rma.val = &wval;
(gdb) backtrace
#0  write_mem (ctx=0x800299080, vcpu=0, gpa=3731705856, wval=0, size=4) at /usr/src/usr.sbin/bhyve/mem.c:287
#1  0x000000000023d874 in passthru_write (ctx=0x800299080, vcpu=0, pi=0x800ad4780, baridx=1, offset=<optimized out>, size=4, value=0)
    at /usr/src/usr.sbin/bhyve/pci_passthru.c:930
#2  0x00000000002368ba in pci_emul_mem_handler (ctx=0x800299080, vcpu=0, dir=<optimized out>, addr=<optimized out>, size=4, val=0x7fffdebf6140, arg1=0x800ad4780,
    arg2=1) at /usr/src/usr.sbin/bhyve/pci_emul.c:416
#3  0x00000000002285f7 in rw_mem_cb (ctx=0x800299080, vcpu=0, paddr=0, mr=0x7fffdebf6000, arg=<optimized out>) at /usr/src/usr.sbin/bhyve/mem.c:267
#4  0x0000000000228494 in access_memory (ctx=0x800299080, vcpu=0, paddr=3731705856, cb=0x2285d0 <rw_mem_cb>, arg=0x7fffdebf6130) at /usr/src/usr.sbin/bhyve/mem.c:215
#5  0x0000000000228638 in write_mem (ctx=0x800299080, vcpu=0, gpa=3731705856, wval=<optimized out>, size=<optimized out>) at /usr/src/usr.sbin/bhyve/mem.c:290
#6  0x000000000023d874 in passthru_write (ctx=0x800299080, vcpu=0, pi=0x800ad4780, baridx=1, offset=<optimized out>, size=4, value=0)
    at /usr/src/usr.sbin/bhyve/pci_passthru.c:930
#7  0x00000000002368ba in pci_emul_mem_handler (ctx=0x800299080, vcpu=0, dir=<optimized out>, addr=<optimized out>, size=4, val=0x7fffdebf6140, arg1=0x800ad4780,
    arg2=1) at /usr/src/usr.sbin/bhyve/pci_emul.c:416
#8  0x00000000002285f7 in rw_mem_cb (ctx=0x800299080, vcpu=0, paddr=0, mr=0x7fffdebf6000, arg=<optimized out>) at /usr/src/usr.sbin/bhyve/mem.c:267
#9  0x0000000000228494 in access_memory (ctx=0x800299080, vcpu=0, paddr=3731705856, cb=0x2285d0 <rw_mem_cb>, arg=0x7fffdebf6270) at /usr/src/usr.sbin/bhyve/mem.c:215
#10 0x0000000000228638 in write_mem (ctx=0x800299080, vcpu=0, gpa=3731705856, wval=<optimized out>, size=<optimized out>) at /usr/src/usr.sbin/bhyve/mem.c:290
#11 0x000000000023d874 in passthru_write (ctx=0x800299080, vcpu=0, pi=0x800ad4780, baridx=1, offset=<optimized out>, size=4, value=0)
    at /usr/src/usr.sbin/bhyve/pci_passthru.c:930
#12 0x00000000002368ba in pci_emul_mem_handler (ctx=0x800299080, vcpu=0, dir=<optimized out>, addr=<optimized out>, size=4, val=0x7fffdebf6140, arg1=0x800ad4780,
    arg2=1) at /usr/src/usr.sbin/bhyve/pci_emul.c:416
and many other frames.

bhyvectl --get-all https://pastebin.com/fRmHDX4b
Comment 35 Konstantin Belousov freebsd_committer 2020-11-08 20:45:33 UTC
(In reply to mr.xanto from comment #34)
Ok, so D27138 cannot work because BARs are not mapped into userspace, and the patch
caused infinite recursion.  Not to mention Peter's objection.

Let's try to see why the EPT violation was translated to instruction assist instead
of being handled by vm_fault().

First, you use stable/12 and you do not have debugging turned on.  Please enable
at least INVARIANTS in your kernel.

Second, I do not see how to find the cause except by some debugging.  Apply
the following patch for a start and report whether the printf triggers:

diff --git a/sys/amd64/vmm/vmm.c b/sys/amd64/vmm/vmm.c
index 3a1d0d54bca..1e715d458a9 100644
--- a/sys/amd64/vmm/vmm.c
+++ b/sys/amd64/vmm/vmm.c
@@ -650,6 +650,7 @@ vm_mem_allocated(struct vm *vm, int vcpuid, vm_paddr_t gpa)
 			return (true);		/* 'gpa' is sysmem or devmem */
 	}
 
+if (gpa >= 0xd0000000 && gpa < 0xe0000000) printf("ppt_is_mmio %#lx %d\n", gpa, ppt_is_mmio(vm, gpa));
 	if (ppt_is_mmio(vm, gpa))
 		return (true);			/* 'gpa' is pci passthru mmio */
Comment 36 mr.xanto 2020-11-09 08:16:43 UTC
(In reply to Konstantin Belousov from comment #35)

I've removed D27138, set debug options for the kernel (https://pastebin.com/5YLGAG91), and applied the patch to vmm.c (https://pastebin.com/CSgmJ49g).
gdb trace - https://pastebin.com/A81z4g3p

From vmm I see the following: # ppt_is_mmio 0xde6d5000 0
Comment 37 Konstantin Belousov freebsd_committer 2020-11-09 12:16:02 UTC
Created attachment 219488
debugging #2

Please apply the attached debugging patch, rebuild kernel, and report what messages
you see from vmm.ko.
Comment 38 mr.xanto 2020-11-09 13:56:13 UTC
(In reply to Konstantin Belousov from comment #37)
Maybe I'm doing something wrong, but I don't see any new messages from vmm.ko except "ppt_is_mmio 0xde6d5000 0".
Should I run bhyve under gdb, or does it not matter?
Comment 39 Konstantin Belousov freebsd_committer 2020-11-09 14:08:54 UTC
(In reply to mr.xanto from comment #38)
No need for gdb, and the fact that you do not see any other messages
is probably telling.  I will update later.
Comment 40 Konstantin Belousov freebsd_committer 2020-11-09 14:29:44 UTC
Created attachment 219490
debugging #3

Please apply the attached patch to usr.sbin/bhyve; there is no need to revert the kernel
patch or rebuild the kernel.  Then I want to see the messages from bhyve during startup.
Comment 41 mr.xanto 2020-11-09 14:42:21 UTC
(In reply to Konstantin Belousov from comment #40)
vm_map_pptdev_mmio bar 0 err 0 pci0:1:0:0 addr 0xc1000000 sz 0x1000000 base 0xfa000000
vm_map_pptdev_mmio bar 1 err 0 pci0:1:0:0 addr 0x800000000 sz 0x10000000 base 0xc0000000
vm_map_pptdev_mmio bar 3 err 0 pci0:1:0:0 addr 0xc2000000 sz 0x2000000 base 0xd0000000
vm_map_pptdev_mmio bar 0 err 0 pci0:1:0:1 addr 0xc4000000 sz 0x4000 base 0xfb080000
Comment 42 Konstantin Belousov freebsd_committer 2020-11-09 17:31:09 UTC
(In reply to mr.xanto from comment #41)
I cannot make sense of this output, it seems to not align with your previous
data from gdb.  Also bugzilla adds formatting.

Or it could be the problem, but let's re-check.

Please re-run the patched bhyve once more, now under gdb.
I want bhyve output, plus printout from 'p/x pi->pi_bar'.
Also I want the 'bhyvectl --vm=VM --get-all' from the same moment.
Put this into some paste service or as a plain text attachment.

Sorry for the hassle.
Comment 43 mr.xanto 2020-11-09 19:18:57 UTC
(In reply to Konstantin Belousov from comment #42)
bhyve output - https://pastebin.com/2pG1y31v
bhyvectl output - https://pastebin.com/5wXAGjzK
vmm.ko output - https://pastebin.com/wNJJ1f5r
Comment 44 Konstantin Belousov freebsd_committer 2020-11-09 19:41:25 UTC
(In reply to mr.xanto from comment #43)
Thank you, this is useful.  Another request, please boot FreeBSD into this VM,
and paste the output from pciconf -lvbc again (please use pastebin again).
Comment 45 mr.xanto 2020-11-09 19:54:13 UTC
(In reply to Konstantin Belousov from comment #44)
pciconf -lvbc from FreeBSD guest: https://pastebin.com/EWrys8GL
Comment 46 Konstantin Belousov freebsd_committer 2020-11-09 20:24:47 UTC
(In reply to mr.xanto from comment #45)
Ok, so my current understanding after looking at all debug data:

1. The real hardware BAR base address is 0xd0000000.
2. We remap the BAR at 0x800000000 in guest.
3. The guest vmexit was due to EPT violation (exit reason 0x30) at GPA 
   0xde6d5000.

Indeed, we did not map anything at the guest physical address 0xd0000000.
I cannot explain it in any other way than to assume that the NVIDIA device exposes
its BAR bases somewhere beyond config space, and the NVIDIA driver knows about that.
Then it accesses (remaps in guest VA) something by that side-channel address.

So there is no bug in bhyve as is, instead I would say that we have an
incompatibility between hardware and virtualization.  A possible, but quite
work-intensive approach to *try* to fix it is to ensure that pass-through
devices get their BARs bases mapped with GPA identical to HPA (host physical
address).  Doing that requires a full rewrite of the BAR resource alloc code
in bhyve.

Any thoughts?
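The arithmetic behind this reading of the data can be checked with a small sketch. The values are taken from this thread (BAR 1 host base 0xd0000000 and size 0x10000000, guest base 0x800000000, fault GPA 0xde6d5000); struct range, range_contains() and range_offset() are hypothetical helpers, not bhyve code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one address window. */
struct range {
	uint64_t base;
	uint64_t size;
};

/* BAR 1 as the host sees it, and where the guest mapping was placed. */
static const struct range host_bar1  = { 0xd0000000ULL,  0x10000000ULL };
static const struct range guest_bar1 = { 0x800000000ULL, 0x10000000ULL };

/* Guest physical address of the faulting access. */
static const uint64_t fault_gpa = 0xde6d5000ULL;

/* True if 'addr' lies in [base, base + size). */
static bool
range_contains(struct range r, uint64_t addr)
{
	return (addr >= r.base && addr - r.base < r.size);
}

/* Offset of 'addr' from the start of the window; caller checks containment. */
static uint64_t
range_offset(struct range r, uint64_t addr)
{
	return (addr - r.base);
}
```

With these numbers the fault address is outside the guest window but inside the host window, at offset 0xe6d5000. That offset equals 242044928, the offset argument of passthru_write() in the backtrace from comment 31.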
Comment 47 Konstantin Belousov freebsd_committer 2020-11-09 21:38:34 UTC
Created attachment 219501
debugging #4

A hack to try my theory.  Revert everything except D27095, and also apply
'debugging #4' patch.  It should create 1:1 hpa:gpa mapping for that BAR.
Comment 48 Peter Grehan freebsd_committer 2020-11-09 21:43:33 UTC
I'm suspecting that this is due to lack of ROM support.

There is a Linux VFIO talk somewhere that discusses 1:1 mappings, and also some other quirk issues such as adapter host config space being available in MMIO regions.
Comment 49 Konstantin Belousov freebsd_committer 2020-11-09 22:34:10 UTC
(In reply to Peter Grehan from comment #48)
I remember that starting with SandyBridge, Intel GPUs have a window in their
membar that translates accesses to mchbar.  Quick look at the docs suggests
that Skylake still does this.

So do you think that a 1:1 gpa:hpa mapping for pass-through would be helpful?
Do you remember the title of the talk, or have a URL?
Comment 50 Peter Grehan freebsd_committer 2020-11-09 23:04:07 UTC
Forcing a 1:1 mapping could be useful for a number of reasons, though it complicates the resource allocation code, and also won't allow guest re-assignment of BARs.

I'll dig around for the presentation, but it was by Alex Williamson and might be linked somewhere in http://vfio.blogspot.com
Comment 52 mr.xanto 2020-11-10 07:19:33 UTC
(In reply to Konstantin Belousov from comment #47)

/usr/src # find . -name "*.orig"
./sys/amd64/vmm/io/ppt.c.orig
./sys/amd64/vmm/vmm.c.orig
./usr.sbin/bhyve/pci_emul.c.orig
root@host:/usr/src #

sys/amd64/vmm/io/ppt.c - https://pastebin.com/12zAc7tH
sys/amd64/vmm/vmm.c - https://pastebin.com/ucZ4TpMW
usr.sbin/bhyve/pci_emul.c - https://pastebin.com/fFAdTCY7

pciconf from FreeBSD guest - https://pastebin.com/SP2ePhsu
bhyve under gdb - https://pastebin.com/y1y3A23V
bhyvectl - https://pastebin.com/G85SDUHg

Running bhyve directly gives the same "Assertion failed" error.
Comment 53 Konstantin Belousov freebsd_committer 2020-11-10 21:12:18 UTC
(In reply to Peter Grehan from comment #51)
Peter, the pdf was interesting, thank you for the direct reference.

I considered 1:1, and I have a plan to produce something like libuvmmem.so
from kern/subr_vmem.c, then it should not be that hard to add proper resource
management for MEMBARs (and IOBARs as well, in fact).

But this would take time.  Peter, do you think that D27095 is worth committing
in the meantime?  IMO it is still an improvement.
Comment 54 Peter Grehan freebsd_committer 2020-11-10 22:35:33 UTC
On D27095 - yes, it's worth going in.
Comment 55 commit-hook freebsd_committer 2020-11-12 00:47:50 UTC
A commit references this bug:

Author: kib
Date: Thu Nov 12 00:46:53 UTC 2020
New revision: 367606
URL: https://svnweb.freebsd.org/changeset/base/367606

Log:
  bhyve: avoid allocating BARs above the end of supported physical addresses.

  Read CPUID leaf 0x80000008 to determine max supported phys address and
  create BAR region right below it, reserving 1/4 of the supported guest
  physical address space to the 64bit BARs mappings.

  PR:    250802 (although the issue from PR is not fixed by the change)
  Noted and reviewed by:	grehan
  Sponsored by:	The FreeBSD Foundation
  MFC after:	2 weeks
  Differential revision:	https://reviews.freebsd.org/D27095

Changes:
  head/usr.sbin/bhyve/pci_emul.c
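As a rough illustration of the commit log above (not the committed code), the 64-bit BAR window can be derived from the number of physical address bits. The sketch assumes the caller has already read that number from CPUID leaf 0x80000008, EAX bits 7:0; struct bar64_window and bar64_window_for() are hypothetical names, and the real pci_emul.c logic may place and clamp the window differently.

```c
#include <stdint.h>

/* Hypothetical result type: [base, limit) reserved for 64-bit BARs. */
struct bar64_window {
	uint64_t base;
	uint64_t limit;
};

/*
 * Reserve the top quarter of the supported physical address space.
 * 'phys_bits' is the physical address width reported by the CPU.
 * Widths that cannot be represented yield an empty window.
 */
static struct bar64_window
bar64_window_for(unsigned int phys_bits)
{
	struct bar64_window w = { 0, 0 };
	uint64_t max_phys, resv;

	if (phys_bits < 2 || phys_bits >= 64)
		return (w);

	max_phys = 1ULL << phys_bits;	/* first unsupported address */
	resv = max_phys / 4;		/* 1/4 of the address space */
	w.limit = max_phys;
	w.base = max_phys - resv;	/* window sits right below the end */
	return (w);
}
```

For example, a CPU reporting 39 physical address bits would get a window from 0x6000000000 up to 0x8000000000 under this scheme.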
Comment 56 mr.xanto 2021-02-26 15:16:45 UTC
I've tried to pass through another (newer) GPU:
ppt0@pci0:1:0:0:        class=0x030000 card=0x00007377 chip=0x128710de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'GK208B [GeForce GT 730]'
    class      = display
    subclass   = VGA
    bar   [10] = type Memory, range 32, base 0xfa000000, size 16777216, enabled
    bar   [14] = type Prefetchable Memory, range 64, base 0xf0000000, size 134217728, enabled
    bar   [1c] = type Prefetchable Memory, range 64, base 0xf8000000, size 33554432, enabled
    bar   [24] = type I/O Port, range 32, base 0xe000, size 128, enabled
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit
    cap 10[78] = PCI-Express 2 legacy endpoint max data 128(256) NS
                 max read 128
                 link x8(x8) speed 5.0(5.0) ASPM disabled(L0s/L1) ClockPM disabled
    ecap 0002[100] = VC 1 max VC0
    ecap 0004[128] = Power Budgeting 1
    ecap 000b[600] = Vendor 1 ID 1
ppt1@pci0:1:0:1:        class=0x040300 card=0x00007377 chip=0x0e0f10de rev=0xa1 hdr=0x00
    vendor     = 'NVIDIA Corporation'
    device     = 'GK208 HDMI/DP Audio Controller'
    class      = multimedia
    subclass   = HDA
    bar   [10] = type Memory, range 32, base 0xfb080000, size 16384, enabled
    cap 01[60] = powerspec 3  supports D0 D3  current D0
    cap 05[68] = MSI supports 1 message, 64 bit
    cap 10[78] = PCI-Express 2 endpoint max data 128(256) NS
                 max read 128
                 link x8(x8) speed 5.0(5.0) ASPM L0s/L1(L0s/L1) ClockPM disabled

and I get random traps. The host now runs FreeBSD 12.2-STABLE GENERIC (with debug):
#cat /usr/src/.gituprevision
stable/12:879039312
#
One of text-dump: https://disk.yandex.ru/d/l8PdtxrFKrlzSg

Any thoughts on what is going on?
Comment 57 mr.xanto 2021-05-13 07:22:56 UTC
Not reproducible after the upgrade to stable/13 and uefi-edk2-bhyve, and the hardware update to GK208B [GeForce GT 730].
But the Windows guest still can't work with the GPU - "Windows has stopped this device because it has reported problems. (Code 43)". Perhaps there should be a setting in bhyve to hide the vendor id from the guest, as can be done in KVM.