Bug 207602 - 11.0-RELEASE-p2 doesn't boot with VT-d enabled and vmm in loader.conf for Skylake CPUs
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: 10.3-BETA2
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs mailing list
Reported: 2016-02-29 23:47 UTC by ehrmann
Modified: 2017-11-22 20:14 UTC

Description ehrmann 2016-02-29 23:47:09 UTC
I installed FreeBSD 10.3-BETA2 on a system with a Z170 chipset and a Skylake CPU, with root on ZFS. When VT-d is enabled in the BIOS, vmm is loaded (for bhyve), and pptdevs="<any pci dev>" (for PCI passthrough) is set in loader.conf, my SATA hard drive hits an unrecoverable error at some point during boot. The drive is reattached, but by then the boot has been interrupted and the system can't find the ZFS root.

If any one of these conditions is removed (VT-d disabled, vmm not loaded in loader.conf, or no PCI passthrough device set), the system boots fine.

The cause seems to be an interrupt storm. It always happens on IRQ64, even when a different device claims it. It's the lowest IRQ listed by vmstat -i.
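
For anyone trying to confirm which source is storming on a system that still boots, here is a minimal Python sketch that picks the busiest row out of `vmstat -i` output. It assumes the usual three-column layout (interrupt name, total, rate). The sample text, including the device names and all numbers, is made up for illustration and is not taken from the affected machine.

```python
import re

# Matches data rows of `vmstat -i`, e.g. "irq264: xhci0   123456   7890".
# The header line has no numeric columns, so it never matches.
ROW = re.compile(r"^(?P<name>\S.*?)\s+(?P<total>\d+)\s+(?P<rate>\d+)\s*$")


def parse_vmstat_i(text):
    """Return (name, total, rate) tuples, one per interrupt source."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None or m.group("name") == "Total":
            continue  # skip the header line and the trailing Total row
        rows.append((m.group("name"), int(m.group("total")), int(m.group("rate"))))
    return rows


def busiest(rows):
    """Return the row with the highest interrupt rate, or None if empty."""
    return max(rows, key=lambda r: r[2], default=None)


# Hypothetical sample; device names and counts are invented for illustration.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                          12          0
irq264: xhci0                    4200000      35000
irq265: ahci0                       9000         75
Total                            4209012      35075
"""

if __name__ == "__main__":
    print(busiest(parse_vmstat_i(SAMPLE)))
```

On a live system the text could come from `subprocess.run(["vmstat", "-i"], capture_output=True, text=True).stdout` instead of `SAMPLE`.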

Here's my /boot/loader.conf:

    kern.geom.label.gptid.enable="0"
    net.link.tap.up_on_open="1"
    net.inet.ip.forwarding="1"
    zfs_load="YES"
    pptdevs="0/31/6"
    vmm_load="YES"
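
As a rough aid for checking other machines, the following Python sketch tests whether a loader.conf contains the two software-side triggers described in this report: `vmm_load="YES"` together with a non-empty `pptdevs`. The parser is a simplification that only handles plain `name="value"` lines, and the function names are hypothetical. It cannot tell whether VT-d is enabled in the BIOS.

```python
import re

# Simplified loader.conf line: name="value" (quotes optional here).
# Lines starting with '#' do not match and are ignored.
ASSIGN = re.compile(r'^\s*([A-Za-z0-9_.\-]+)\s*=\s*"?([^"#]*)"?')


def parse_loader_conf(text):
    """Return a dict of the settings found in loader.conf text."""
    settings = {}
    for line in text.splitlines():
        m = ASSIGN.match(line)
        if m:
            settings[m.group(1)] = m.group(2).strip()
    return settings


def passthrough_configured(settings):
    """True when vmm is loaded and at least one pptdevs entry is set."""
    vmm_loaded = settings.get("vmm_load", "").upper() == "YES"
    has_ppt = bool(settings.get("pptdevs", "").split())
    return vmm_loaded and has_ppt


# The loader.conf from this report.
LOADER_CONF = '''\
kern.geom.label.gptid.enable="0"
net.link.tap.up_on_open="1"
net.inet.ip.forwarding="1"
zfs_load="YES"
pptdevs="0/31/6"
vmm_load="YES"
'''

if __name__ == "__main__":
    print(passthrough_configured(parse_loader_conf(LOADER_CONF)))
```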

This is the device I'm trying to pass through:

    ppt0@pci0:0:31:6:       class=0x020000 card=0x86721043 chip=0x15b88086 rev=0x31 hdr=0x00
        vendor     = 'Intel Corporation'
        device     = 'Ethernet Connection (2) I219-V'
        class      = network
        subclass   = ethernet
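
The `pptdevs` value and the `pciconf` selector describe the same bus/slot/function triple in different notations. A small Python sketch of that mapping follows. The helper name is hypothetical, and PCI domain 0 is assumed because `pptdevs` entries carry no domain field.

```python
def pptdevs_to_selectors(pptdevs, domain=0):
    """Convert a pptdevs string ("bus/slot/func ...") to pciconf selectors.

    Assumes PCI domain 0 unless told otherwise.
    """
    selectors = []
    for entry in pptdevs.split():
        # Each entry is three decimal numbers separated by slashes.
        bus, slot, func = (int(part) for part in entry.split("/"))
        selectors.append(f"pci{domain}:{bus}:{slot}:{func}")
    return selectors


if __name__ == "__main__":
    print(pptdevs_to_selectors("0/31/6"))
```

For `pptdevs="0/31/6"` this gives `pci0:0:31:6`, matching the `ppt0@pci0:0:31:6` line above.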

Because this is a modern motherboard/chipset, it lacks legacy I/O such as RS-232, so I couldn't capture the log directly. The best I could do is this video: http://imgur.com/wiiU9Bf

Root mount waiting for: usbus1 usbus0
uhub1: 4 ports with 4 removable, self powered
uhub0: 26 ports with 26 removable, self powered
Root mount waiting for: usbus0
<another line?  it's not clear in the video>
xhci_interrupt: host controller halted
<repeated ~200 times>
interrupt storm detected on "irq264:"; throttling interrupt source
Trying to mount root from zfs:zroot/ROOT/default []...
uhub0: (ada0:ahcich0:0:0:0) READ_FPDMA_QUEUED. ACB: 60 00 00 00 00 40 00 00 00 01 00 00
(ada:ahcich0:0:0:0): CAM status: CCB request was invalid
at usbus0, port 1. addr 1 (disconnected)
xhci_interrupt: host controller halted
(ada0:ahcich0:0:0:0): Error 22. Unretryable error
xhci_interrupt: host controller halted
<missing lines?>
(aprobe0:ahcich0:0:0:0): CAM status: CCB request was invalid
(aprobe0:ahcich0:0:0:0): Error 22. Unretryable error.
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ... detached
Solaris: NOTICE: Cannot find the pool label for 'zroot'
Mounting from zfs:zroot/ROOT/default failed with error 5.
Comment 1 ehrmann 2016-03-06 00:25:31 UTC
The issue still exists in 10.3-RC1.
Comment 2 ehrmann 2016-03-06 02:42:51 UTC
Not surprisingly, it also happens with a UFS root.
Comment 3 jyoung15 2016-04-09 05:32:01 UTC
Same issue here on 10.3-RELEASE amd64 with the GENERIC kernel, on a Lenovo P50s laptop.

Setting pptdevs in /boot/loader.conf seems to trigger the issue:

vmm_load="YES"
pptdevs="4/0/0"

I see the same errors as in the original report. It doesn't seem to be related to ZFS, since my boot partition is UFS.
Comment 4 Vesselin Mirevsky 2016-04-25 14:01:51 UTC
Bug confirmed on 10.3 RELEASE, HP ML 350 G6 with Xeon 5675.

The device to pass through is an HP ioDrive Duo (Gen 1) 600281-B21 (pciconf -lv output):

none3@pci0:11:0:0: class=0x018000 card=0x178d103c chip=0x10051aed rev=0x01 hdr=0x00
    vendor     = 'Fusion-io'
    device     = 'ioDimm3'
    class      = mass storage
none4@pci0:12:0:0: class=0x018000 card=0x178d103c chip=0x10051aed rev=0x01 hdr=0x00
    vendor     = 'Fusion-io'
    device     = 'ioDimm3'
    class      = mass storage

/boot/loader.conf :

pptdevs="11/0/0 12/0/0"

The last line crashes the system with an "Interrupt trap", and the system starts to shut down the CPUs (which ends up in an ugly loop until it finally reboots).
Comment 5 borisboris@gmx.net 2016-06-11 02:36:42 UTC
Same issue here.

CPU: Intel(R) Core(TM) i5-5675C CPU @ 3.10GHz (3092.90-MHz K8-class CPU)

The Board: Gigabyte H97N WIFI

The problem happens on an up-to-date 10.3 and on 11-CURRENT; I tested both versions.

I tested with the onboard em0 and re0, and also with an em interface on PCI-X.

With only vmm_load="YES" set, everything runs fine.

As soon as I add any one of the three network interfaces to pptdevs in /boot/loader.conf, the same problem occurs; it is the same with all three NICs. The kernel boots, then suddenly detaches lots of devices, and is then unable to mount root (via ZFS in these cases, but I expect it would be the same on UFS, because the device is gone).

I also tested it on a Mac Mini Late 2014. There it works, except with OpenBSD as the guest OS, where I get interrupt mapping issues. But that's a different kind of problem and may be an issue with the OpenBSD drivers. With a FreeBSD guest I can use the NIC via PCI passthrough without problems.

So, the problem seems to be there with Broadwell and Skylake.
Comment 6 borisboris@gmx.net 2016-08-12 02:35:49 UTC
I replaced, or rather "downgraded", the CPU. Now it's working fine. It's a Haswell, not a Broadwell anymore:

CPU: Intel(R) Core(TM) i5-4570S CPU @ 2.90GHz (2893.36-MHz K8-class CPU)

Just FYI.
Comment 7 ehrmann 2016-12-18 23:31:58 UTC
I reproduced it with 11.0.
Comment 8 Marius Halden 2017-11-22 20:14:04 UTC
I see the same problem on 11.1-RELEASE-p4 with an Intel i3-8350K CPU.