Bug 241581

Summary: PCIe passthrough is broken in QEMU 4 due to PCI Device ID conflict
Product: Base System
Reporter: kevo
Component: kern
Assignee: Vincenzo Maffione <vmaffione>
Status: In Progress
Severity: Affects Some People
CC: drum, freebsd, grehan, kevo, tommyhp2, vmaffione
Priority: ---
Version: 11.2-STABLE
Hardware: amd64
OS: Any
Attachments:
Data collection during tests (flags: none)

Description kevo 2019-10-30 01:51:20 UTC
I run a couple of FreeBSD-based VMs in Proxmox. Upgrading to Proxmox 6, which includes QEMU 4, broke the PCIe passthrough I had set up in both VMs.

On further investigation, someone on the Proxmox forums suggested that the problem is a device ID conflict.

https://forum.proxmox.com/threads/vm-w-pcie-passthrough-not-working-after-upgrading-to-6-0.56021/post-274339

Apparently the QEMU vendor and device IDs are being used for the ptnetmap-memdev device.

https://svnweb.freebsd.org/base/release/12.0.0/sys/net/netmap_virt.h?view=markup#l44

https://devicehunt.com/view/type/pci/vendor/1B36/device/000C

Any chance this can be fixed? I currently have to run my VMs with an older Q35 machine type in QEMU to make them work, and I don't see any way to upgrade to later revisions until this is fixed. I suspect there are quite a few people affected by this.
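To make the conflict concrete, here is a small illustration. The ID values come from the netmap_virt.h link above and QEMU's published device ID list (vendor 0x1b36 is the Red Hat/QEMU space); the shell check itself is just an illustration, not part of any FreeBSD or QEMU tool:

```shell
#!/bin/sh
# Vendor:device pair that sys/net/netmap_virt.h (before the fix)
# claims for the ptnetmap-memdev paravirtual device.
PTNETMAP_MEMDEV_ID="1b36:000c"

# The same pair is what QEMU assigns to its generic PCIe root port,
# which the Q35 4.0 machine type uses for every PCIe slot.
QEMU_PCIE_ROOT_PORT_ID="1b36:000c"

if [ "$PTNETMAP_MEMDEV_ID" = "$QEMU_PCIE_ROOT_PORT_ID" ]; then
    echo "conflict: FreeBSD's ptnetmap driver claims every QEMU PCIe root port"
fi
```

Inside an affected guest, the colliding device should be visible in `pciconf -lv` output as a device with chip ID 0x000c and vendor 0x1b36; since the ptnetmap driver matches on that pair, it grabs the root ports and the devices behind them never attach.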
Comment 1 John Hartley 2020-01-22 02:24:49 UTC
(In reply to kevo from comment #0)

Hi Kevo & Vincenzo,


I did the following tests:

QEMU Q35 V4.0 / OVMF with VirtIO SCSI & NIC == e1000 + vmxnet + netmap patched FreeBSD:

SCSI - VirtIO SCSI disk not found
NIC - em0 & vmx0 found

QEMU Q35 V4.0 / OVMF with SATA & NIC == e1000 + vmxnet + netmap patched FreeBSD:

SATA - disk found
NIC - em0 & vmx0 found

QEMU Q35 V4.0 OVMF with SATA & NIC == e1000e + vmxnet + netmap patched FreeBSD:

SATA - disk found
NIC - em0 not found, vmx0 found

So this appears to confirm that there are general issues with Q35 V4.0 PCIe devices (VirtIO PCIe and e1000e), as identified by Tommy P here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=241774#c77

Could this be confirmed by running QEMU Q35 PCIe passthrough with a netmap-patched FreeBSD VM (the patch addresses the PCI ID conflicts)?

I can run this test if required, as I have a test machine with an Intel PCIe NIC installed and should be able to try passthrough to it.

Cheers,


John Hartley.
Comment 2 Vincenzo Maffione freebsd_committer freebsd_triage 2020-01-22 23:08:25 UTC
(In reply to kevo from comment #0)
You are right. Sorry for that.
This has been fixed in r356805.
Comment 3 Tommy P 2020-01-26 08:07:48 UTC
From others' and my own investigations in BR 236922 & 241774, I doubt netmap is the cause.  If the configuration is imported from a pre-QEMU-4.0 environment, everything works as expected.  If the VM is created in the new QEMU 4.0 environment, all of the peripherals connected to the PCIe bridge stop working in FreeBSD because of the new:

    -device pcie-pci-bridge,id=pci.2,bus=pci.1,addr=0x0

instead of the older:

    -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e
    -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0
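
To illustrate the difference, the two topologies above differ only in the bridge devices handed to QEMU; everything else on the command line stays the same (the surrounding options are placeholders, not a tested configuration):

```shell
#!/bin/sh
# Pre-4.0 style: a DMI-to-PCI bridge plus a conventional PCI-PCI
# bridge, both hanging off the root complex.
OLD_BRIDGES="-device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e \
-device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0"

# QEMU 4.0 style: a single pcie-pci-bridge plugged into a PCIe root
# port (pci.1).
NEW_BRIDGE="-device pcie-pci-bridge,id=pci.2,bus=pci.1,addr=0x0"

# The rest of the invocation is unchanged, e.g.:
#   qemu-system-x86_64 -machine q35 ... $OLD_BRIDGES ...
echo "old: $OLD_BRIDGES"
echo "new: $NEW_BRIDGE"
```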

Initially I thought it was this configuration:

   -machine pc-q35-4.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off

But changing it to:

   -machine pc-q35-2.11,accel=kvm,usb=off,vmport=off,dump-guest-core=off

had no effect.  I then thought it was a limitation on IRQ numbers, based on this line from a verbose dmesg:

    Failed to allocate interrupt for PCI-e events

But further research [1] shows that it's not, since I cannot find that message in the code of 12.1 r356769.  I did find an error report of a similar issue [2], but no official documentation on the specifics of the PCIE-PCI bridge.  There is plenty of official documentation on the PCI-PCI bridge, including the recent addition of NEW_PCIB [3].  Two others in BR 236922 & 241774 have also experienced a similar issue with QEMU 4.0.


[1] https://lists.freebsd.org/pipermail/freebsd-current/2018-April/069203.html
[2] https://lists.freebsd.org/pipermail/freebsd-drivers/2007-December/000584.html
[3] https://wiki.freebsd.org/NEW_PCIB

===================================================================================

Here are the devices of my testing VM with different 12.1 kernels (the yes/no column indicates whether the device is operational):

**** Original kernel from ISO media ---  Operational
VirtIO
    PCIe - Console                       no
    PCIe - Memory Balloon                no
    PCIe - Network                       no
    PCIe - SCSI                          no
    PCIe - Serial                        no
NIC (non VirtIO NIC)
    PCI  - e1000 Intel 82545EM (em)      no
    PCIe - e1000e Intel 82574 (em ?)     no
    PCI  - rtl8139                       no
Mass Storage (non VirtIO)
    PCI  - SATA (AHCI)                   yes
    PCI  - SCSI (LSI/Symbios sym)        no

**** Custom r356769 kernel w/ applied patches [4] from BR 236922 with netmap
VirtIO
    PCIe - Console                       no
    PCIe - Memory Balloon                no
    PCIe - Network                       no
    PCIe - SCSI                          no
    PCIe - Serial                        no
NIC (non VirtIO NIC)
    PCI  - e1000 Intel 82545EM (em)      yes
    PCIe - e1000e Intel 82574 (em ?)     no
    PCI  - rtl8139                       yes
Mass Storage (non VirtIO)
    PCI  - SATA (AHCI)                   yes
    PCI  - SCSI (LSI/Symbios sym)        yes

**** Custom r356769 kernel w/ applied patches [4] from BR 236922 without netmap
VirtIO
    PCIe - Console                       no
    PCIe - Memory Balloon                no
    PCIe - Network                       no
    PCIe - SCSI                          no
    PCIe - Serial                        no
NIC (non VirtIO NIC)
    PCI  - e1000 Intel 82545EM (em)      yes
    PCIe - e1000e Intel 82574 (em ?)     no
    PCI  - rtl8139                       yes
Mass Storage (non VirtIO)
    PCI  - SATA (AHCI)                   yes
    PCI  - SCSI (LSI/Symbios sym)        yes

If the hard drive is attached to a configuration imported from pre-4.x QEMU, all of the VirtIO devices work, using:

    -device i82801b11-bridge,id=pci.1,bus=pcie.0,addr=0x1e
    -device pci-bridge,chassis_nr=2,id=pci.2,bus=pci.1,addr=0x0

instead of the newer:

    -device pcie-pci-bridge,id=pci.2,bus=pci.1,addr=0x0

-----------------------------------------------------------------------------------

[4] BR 236922 - VirtIO support for PCIe in Q35
        https://bugs.freebsd.org/bugzilla/attachment.cgi?id=210737
    BR 236922 & 241774 - Disable VirtIO + netmap interop, since netmap doesn't fully support PCIe at the moment.
        https://bugs.freebsd.org/bugzilla/attachment.cgi?id=210783
Comment 4 Tommy P 2020-01-26 08:11:47 UTC
Created attachment 211053 [details]
Data collection during tests

Custom kernel
QEMU 4.0 VM config
dmesg verbose
pciconf
sysctl of hw.pci
Comment 5 John Hartley 2020-01-26 09:02:06 UTC
(In reply to Tommy P from comment #3)

Hi Tommy,

I have also done testing on this:

Q35 / v4 / OVMF / SATA / e1000 / e1000e / vmxnet3 / PCI passthrough to Intel X550 10GbE / FreeBSD 12.1 with "device netmap" removed from the kernel config (sys/amd64/conf/GENERIC):

Results:

SATA - OK
e1000 - OK, comes up as em0
e1000e - fails, not detected (PCIe-connected virtual device)
vmxnet3 - OK, comes up as vmx0
X550 10GbE - fails, not detected (PCIe-connected physical device)

dmesg error (same as yours):

<<DMESG>>
...
pcib2: <PCI-PCI bridge> mem 0xc8b87000-0xc8b87fff irq 22 at device 2.1 on pci0
pcib2: Failed to allocate interrupt for PCI-e events
pcib3: <PCI-PCI bridge> mem 0xc8b86000-0xc8b86fff irq 22 at device 2.2 on pci0
pcib3: Failed to allocate interrupt for PCI-e events
pcib4: <PCI-PCI bridge> mem 0xc8b85000-0xc8b85fff irq 22 at device 2.3 on pci0
pcib4: Failed to allocate interrupt for PCI-e events
pcib5: <PCI-PCI bridge> mem 0xc8b84000-0xc8b84fff irq 22 at device 2.4 on pci0
pcib5: Failed to allocate interrupt for PCI-e events
...
<<END DMESG>>

ifconfig

<<IFCONFIG>>
# ifconfig -a
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 52:53:01:17:15:aa
	inet XX.XXX.XXX.53 netmask 0xffffff80 broadcast 203.XXX.XXX.127
	media: Ethernet autoselect
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
em0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=81209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER>
	ether 52:54:00:a4:13:df
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
<<END IFCONFIG>>


I then retested the same setup, except with QEMU Q35 V3.1:

SATA - OK
e1000 - OK, comes up as em0
e1000e - OK, comes up as em1
vmxnet3 - OK, comes up as vmx0
X550 10GbE - OK, comes up as ix0

ifconfig

<<IFCONFIG>>
# ifconfig -a
vmx0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 52:53:01:17:15:aa
	inet XXX.XXX.XXX.53 netmask 0xffffff80 broadcast XXX.XXX.XXX.127
	media: Ethernet autoselect
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
em0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=81209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWFILTER>
	ether 52:54:00:a4:13:df
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
ix0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=e53fbb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_UCAST,WOL_MCAST,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether b4:96:91:21:4a:ce
	media: Ethernet autoselect
	status: no carrier
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
em1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=81249b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,WOL_MAGIC,VLAN_HWFILTER>
	ether 52:54:00:f8:3b:94
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
<<END IFCONFIG>>

Here is PCI passthrough XML snippet:

<<LIBVIRT XML>>
virsh dumpxml test-freebsd-12.1 
<domain type='kvm' id='5'>
  <name>test-freebsd-12.1</name>
  <uuid>a50005d7-7425-435f-82e9-e76f18784693</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://freebsd.org/freebsd/12.0"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-q35-3.1'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/home/XXX/DIR/OVMF_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state='off'/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>Broadwell-IBRS</model>
    <vendor>Intel</vendor>
    <feature policy='require' name='vme'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='f16c'/>
    <feature policy='require' name='rdrand'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='arat'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='umip'/>
    <feature policy='require' name='md-clear'/>
    <feature policy='require' name='stibp'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='ssbd'/>
    <feature policy='require' name='xsaveopt'/>
    <feature policy='require' name='pdpe1gb'/>
    <feature policy='require' name='abm'/>
    <feature policy='disable' name='skip-l1dfl-vmentry'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/home/XXX/DIR/test-hd1-01.qcow2'/>
      <backingStore/>
      <target dev='sda' bus='sata'/>
      <boot order='1'/>
      <alias name='sata0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/XXX/DIR/FreeBSD-12.1-RELEASE-amd64-dvd1.iso'/>
      <backingStore/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <alias name='sata0-0-1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
...
...
    <controller type='sata' index='0'>
      <alias name='ide'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'>
      <alias name='pcie.0'/>
    </controller>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <alias name='pci.1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <alias name='pci.2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x11'/>
      <alias name='pci.3'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x12'/>
      <alias name='pci.4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x13'/>
      <alias name='pci.5'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x14'/>
      <alias name='pci.6'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:18:15:aa'/>
      <source bridge='br20'/>
      <target dev='vnet0'/>
      <model type='vmxnet3'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:a4:13:df'/>
      <source bridge='br20'/>
      <target dev='vnet1'/>
      <model type='e1000'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:f8:3b:94'/>
      <source bridge='br20'/>
      <target dev='vnet2'/>
      <model type='e1000e'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
...
...
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0000' bus='0x06' slot='0x00' function='0x1'/>
      </source>
      <alias name='hostdev0'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
...
...

</domain>

<<END LIBVIRT XML>>

So PCI passthrough is affected by the Q35 V4.0 issues, and netmap is not the root cause.

Can we get a FreeBSD PCI guru to look into the QEMU V4.0 issue?

Cheers,


John Hartley.
Comment 6 Vincenzo Maffione freebsd_committer freebsd_triage 2020-01-26 20:33:58 UTC
(In reply to Tommy P from comment #3)

Yes, it's a combination of two or more issues.
One of these issues is the netmap one, for which I provided a fix for head, stable/12 and stable/11.
Comment 7 Vincenzo Maffione freebsd_committer freebsd_triage 2020-01-26 20:35:53 UTC
Since this bug report is about the netmap PCI conflict, I think it would make sense to close it, since a fix has been provided for stable/11 (r356805).
Comment 8 John Hartley 2020-01-27 01:45:10 UTC
(In reply to Vincenzo Maffione from comment #7)

Hi Vincenzo,

Agreed. I'm not sure how to close this one, but I have opened a new, more specific bug report on PCIe device detection with QEMU Q35 V4.x:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=243640

Cheers,


John Hartley.
Comment 9 John Hartley 2020-01-28 22:17:10 UTC
(In reply to Vincenzo Maffione from comment #2)

Hi Vincenzo,

Not sure if you saw this blog post by Gerd Hoffmann: https://www.kraxel.org/blog/2020/01/qemu-pci-ids/

It outlines the process for getting QEMU PCI IDs assigned, to avoid conflicts like the one we experienced with netmap.

The process seems pretty straightforward and would ensure netmap gets long-term IDs that are managed as part of the QEMU code base.

Cheers,

John Hartley
Comment 10 Vincenzo Maffione freebsd_committer freebsd_triage 2020-01-29 19:47:31 UTC
I've seen it, thanks.
However I plan to allocate the ids once (and only if) bhyve gets support for the netmap virtual devices.
Comment 11 John Hartley 2020-07-06 06:14:31 UTC
(In reply to Vincenzo Maffione from comment #10)

Hi Vincenzo,

I have added Peter Grehan onto this bug based on your comment:

"I've seen it, thanks.
However I plan to allocate the ids once (and only if) bhyve gets support for the netmap virtual devices."

Peter might be able to comment on possibility of adding netmap support into bhyve.

Cheers,


John Hartley
Comment 12 Vincenzo Maffione freebsd_committer freebsd_triage 2020-07-06 20:23:24 UTC
Hi John,
  Bhyve already supports netmap(4): you can attach a bhyve VM to a vale(4) switch rather than to an if_bridge(4) through an if_tuntap(4) interface. No PCI IDs are necessary to support this.

However, netmap also supports a "passthrough" mode that allows you to make a netmap port (of the host OS) visible within a VM. This particular feature requires support within the libvmm API, and also requires the allocation of a PCI ID.
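
For reference, attaching a bhyve guest NIC to a vale(4) switch looks roughly like the sketch below. The VM name, disk image path, and vale port name are placeholders; the port is created implicitly when you name it as the backend of a virtio-net device:

```shell
# Plug the guest's virtio-net device into vale switch 0, port "vm0",
# instead of a tap interface bridged via if_bridge(4).
bhyve -c 2 -m 2G -A -H -P \
    -s 0,hostbridge \
    -s 2,virtio-net,vale0:vm0 \
    -s 3,virtio-blk,/vm/guest.img \
    -s 31,lpc -l com1,stdio \
    guestvm
```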