Bug 229852

Summary: [PATCH] bhyve: IOMMU (Intel VTd) PCI passthrough attempt locks up some systems
Product: Base System
Reporter: Callum <callum>
Component: kern
Assignee: freebsd-virtualization mailing list <virtualization>
Status: New ---
Severity: Affects Some People
CC: allanjude, araujo, callum, dexter, felix, js, mgrooms, niels=freebsd, rgrimes, t_uemura, virtualization
Priority: ---
Keywords: patch
Version: 11.2-RELEASE   
Hardware: amd64   
OS: Any   
Attachments:
Description: Patch for VT-d capability detection on chipsets that have multiple translation units with differing capabilities
Flags: none

Description Callum 2018-07-18 02:02:21 UTC
Created attachment 195225 [details]
Patch for VT-d capability detection on chipsets that have multiple translation units with differing capabilities

When an attempt is made to pass a PCI device through to a bhyve VM (which causes the IOMMU to be initialised) on certain Intel chipsets using VT-d, the PCI bus stops working entirely. This issue occurs on the E3-1275 v5 processor with the C236 chipset and has also been encountered by others on the forums with different hardware in the Skylake series.

The chipset has two VT-d translation units. The issue is caused by an attempt to use the VT-d device-IOTLB capability, which only the first unit supports, for devices attached to the second unit, which lacks it. Only the first unit's capabilities are checked, and they are assumed to be the same for all units.

Attached is a patch to rectify this issue by determining which unit is responsible for the device being added to a domain and then checking that unit's device-IOTLB capability. In addition to this a few fixes have been made to other instances where the first unit's capabilities are assumed for all units for domains they share. In these cases a mutual set of capabilities is determined. The patch should hopefully fix any bugs for current/future hardware with multiple translation units supporting different capabilities.
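
For readers who have not opened the attachment, the two ideas above can be sketched roughly as follows. This is not the patch itself: the struct, the function names and the requester-ID lookup are hypothetical simplifications, and the only detail taken from the VT-d specification is the position of the Device-TLB and coherency flags in the extended capability register (bits 2 and 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits in the VT-d Extended Capability Register (ECAP). */
#define ECAP_COHERENT   (UINT64_C(1) << 0)  /* page-walk coherency */
#define ECAP_DEVICE_TLB (UINT64_C(1) << 2)  /* Device-TLB (device-IOTLB) support */

/* Hypothetical, simplified model of one translation unit. */
struct vtd_unit {
	uint64_t ecap;          /* snapshot of this unit's ECAP register */
	bool include_all;       /* unit covers every device not listed elsewhere */
	const uint16_t *rids;   /* requester IDs in scope: bus << 8 | slot << 3 | func */
	size_t nrids;
};

/* Find the unit responsible for a device; an explicit scope entry wins. */
static const struct vtd_unit *
unit_for_device(const struct vtd_unit *units, size_t n, uint16_t rid)
{
	const struct vtd_unit *fallback = NULL;

	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < units[i].nrids; j++) {
			if (units[i].rids[j] == rid)
				return (&units[i]);
		}
		if (units[i].include_all)
			fallback = &units[i];
	}
	return (fallback);
}

/* Check the device-IOTLB flag of the device's own unit, not of unit 0. */
static bool
device_tlb_usable(const struct vtd_unit *units, size_t n, uint16_t rid)
{
	const struct vtd_unit *u = unit_for_device(units, n, rid);

	return (u != NULL && (u->ecap & ECAP_DEVICE_TLB) != 0);
}

/*
 * Mutual capability set for a domain shared by all units. ANDing is only
 * meaningful for single-bit feature flags, so mask to those first.
 */
static uint64_t
common_flags(const struct vtd_unit *units, size_t n)
{
	const uint64_t mask = ECAP_COHERENT | ECAP_DEVICE_TLB;
	uint64_t common;

	assert(n > 0);
	common = units[0].ecap & mask;
	for (size_t i = 1; i < n; i++)
		common &= units[i].ecap;
	return (common);
}
```

With a first unit that advertises Device-TLB and a second, catch-all unit that does not, device_tlb_usable() returns false for a device behind the second unit, which is the case the original code got wrong.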

A description is on the forums at https://forums.freebsd.org/threads/pci-passthrough-bhyve-usb-xhci.65235
The thread includes observations from other users who hit the bug, as well as a description and confirmation of the fix. I'd also like to thank Ordoban for their help.

The attached patch applies to 11.2-RELEASE and the current 11-STABLE. It will also apply to 12.0-CURRENT since the only difference in source at present is an extra 2 lines of licensing comment. Although I have personally only tested the patch on 11.2-RELEASE there's no reason results should differ on 12.0-CURRENT.
Comment 1 Niels Bakker 2018-09-21 16:42:41 UTC
I ran into this issue (on an Intel Celeron 3865U) with the exact symptoms described in the linked thread, and the patch resolved it for me as well.

I had tried the other workaround before - only pick devices on another PCI bus and IRQ line for passthru - but that did not help. Without this patch, any attempt to use a passthru device immediately crashes the whole computer by rendering all PCI devices like AHCI and USB controllers absent.
Comment 2 t_uemura 2018-10-09 12:25:47 UTC
I had the same issue on my Shuttle DS77U mini-PC (Intel Celeron 3865U;
Sunrise Point-LP chipset) and the patch fixes the issue perfectly. Both my
host and guest run 11.2-STABLE as of 28th Sep.

Someone please make sure there's no side effect and commit/MFC.
Comment 3 Felix Hanley 2019-01-20 06:22:52 UTC
The attached patch fixes the hanging system for me running 12.0-RELEASE-p2 on a Kaby Lake series i7-8550U.
Comment 4 js 2019-01-24 00:27:29 UTC
Patch works great on 12.0 with Skylake i7-6820HQ.  Please commit and MFC.
Comment 5 Marcelo Araujo freebsd_committer 2019-01-24 04:25:32 UTC
Thanks for the patch!!!

Could you guys share with me how you tested it? As an example:
1) bhyve command line
2) CPU Type
3) Guest OS USED
4) Device used via passthrough

Best,
Comment 6 Niels Bakker 2019-01-24 13:45:10 UTC
(In reply to Marcelo Araujo from comment #5)

> 1) bhyve command line
I'm not sure tbh - created and started it via vm-bhyve and it rewrites its cmdline.
Its config file contains these lines, plus others that deal with storage and vnet:
---
loader="bhyveload"
cpu=2
memory=4G
passthru0="0/31/6"
bhyve_options="-S"
---

> 2) CPU Type
CPU: Intel(R) Celeron(R) CPU 3865U @ 1.80GHz (1800.08-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x806e9  Family=0x6  Model=0x8e  Stepping=9

This is a Kaby Lake CPU (same class as 7th gen Core) from 2017.

> 3) Guest OS USED
guest# uname -srv
FreeBSD 11.2-RELEASE-p7 FreeBSD 11.2-RELEASE-p7 #0: Tue Dec 18 08:29:33 UTC 2018     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC 

> 4) Device used via passthrough
---
host# grep ^ppt /boot/loader.conf
pptdevs="0/31/6 2/0/0"

host# pciconf -lv ppt1@pci0:0:31:6
ppt1@pci0:0:31:6:	class=0x020000 card=0x00008086 chip=0x156f8086 rev=0x21 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection I219-LM'
    class      = network
    subclass   = ethernet
---
guest# pciconf -lv em0
em0@pci0:0:6:0:	class=0x020000 card=0x00008086 chip=0x156f8086 rev=0x21 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection I219-LM'
    class      = network
    subclass   = ethernet
---
(The second device, some WiFi chipset, isn't passed through to any VM, and there is no FreeBSD driver for it anyway)

As said, without the patch the system dies an immediate death as soon as the bhyve with passthrough is started.
Comment 7 Rodney W. Grimes freebsd_committer 2019-01-24 21:14:24 UTC
The patch has some formatting-only changes that should be reduced, but it looks ok overall. I would also like to hear some test results on systems that are NOT having this issue, to ensure it does not break anything there. I brought this review up in the bhyve every-other-week meeting to get some more eyes on it.
Comment 8 Marcelo Araujo freebsd_committer 2019-01-24 23:48:11 UTC
Sorry, I'm gonna put this bug report back to the pool, I'm sure Rodney will check it soon.
Comment 9 Rodney W. Grimes freebsd_committer 2019-01-25 00:37:02 UTC
(In reply to Callum from comment #0)
Do you have a phabricator account on reviews.freebsd.org?  If so would you put your patch up in a review over there?  If not would you be either willing to set up one, or have me copy your patch to a review so we can move forward with fixing this issue?
Comment 10 Callum 2019-01-28 12:02:44 UTC
(In reply to Marcelo Araujo from comment #5)

> 1) bhyve command line
bhyve -AHP -S -u -c 4 -p 0:6 -p 1:7 -p 2:4 -p 3:5 -m 2G \
-s 0:0,hostbridge \
-s 1:0,lpc \
-s 2:0,virtio-blk,/dev/zvol/zroot/bhyve/tv \
-s 4:0,virtio-net,tap8 \
-s 5:0,virtio-net,tap9 \
-s 8:0,passthru,4/0/0 \
-s 9:0,passthru,5/0/0 \
-s 10:0,passthru,6/0/0 \
-s 11:0,passthru,7/0/0 \
-l com1,/dev/nmdm0A \
-l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
tv

> 2) CPU Type
E3-1275 v5

> 3) Guest OS USED
OpenSUSE Leap 15.0

> 4) Device used via passthrough
4x
class=0x0c0330 card=0x00151912 chip=0x00151912 rev=0x02 hdr=0x00
    vendor     = 'Renesas Technology Corp.'
    device     = 'uPD720202 USB 3.0 Host Controller'
    class      = serial bus
    subclass   = USB
Comment 11 Callum 2019-01-28 12:07:44 UTC
(In reply to Rodney W. Grimes from comment #9)

Submitted for review - D19001 (https://reviews.freebsd.org/D19001)
Comment 12 Niels Bakker 2019-01-28 16:31:21 UTC
Tested the patch on an i5-4690K with no immediate adverse effects.
Comment 13 Rodney W. Grimes freebsd_committer 2019-01-28 17:32:51 UTC
(In reply to Niels Bakker from comment #12)
Are you passing through any devices?
Comment 14 Niels Bakker 2019-01-28 23:09:42 UTC
Yes, otherwise it wouldn't be a real test, would it? :-)

Specifically, I passed through an audio device which was recognised in the guest, both in the stock 12.0 kernel and in one with the patch attached to this PR applied on the host.