Bug 245392 - bhyve: PCI passthru built-in rtl8168 to OpenBSD no packets (AMD Ryzen 3 1200)
Summary: bhyve: PCI passthru built-in rtl8168 to OpenBSD no packets (AMD Ryzen 3 1200)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: 12.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
URL:
Keywords: needs-qa
Depends on:
Blocks: 246647
Reported: 2020-04-06 07:29 UTC by Anatoli
Modified: 2020-07-08 20:09 UTC (History)

See Also:
koobs: mfc-stable12?
koobs: mfc-stable11?


Attachments
 * a dmesg from FreeBSD 12.1-RELEASE r354233 running on the machine (8.96 KB, text/plain), 2020-04-06 07:29 UTC, Anatoli
 * a dmesg from OpenBSD 6.6 running on the machine without virtualization (9.90 KB, text/plain), 2020-04-06 07:32 UTC, Anatoli
 * a dmesg from OpenBSD 6.6 running inside bhyve (5.09 KB, text/plain), 2020-04-06 07:33 UTC, Anatoli
 * pcidump at OpenBSD 6.6 running on the machine without virtualization (35.04 KB, text/plain), 2020-04-06 07:35 UTC, Anatoli
 * pcidump at OpenBSD 6.6 inside bhyve (4.15 KB, text/plain), 2020-04-06 07:36 UTC, Anatoli
 * vm-bhyve passthru right before launching an OpenBSD bhyve instance (2.98 KB, text/plain), 2020-04-06 07:38 UTC, Anatoli
 * OpenBSD crash boot sequence (1.87 KB, text/plain), 2020-04-06 07:39 UTC, Anatoli
 * The patch fixing the problem (1.12 KB, patch), 2020-05-22 02:02 UTC, Anatoli

Description Anatoli 2020-04-06 07:29:55 UTC
Created attachment 213114 [details]
a dmesg from FreeBSD 12.1-RELEASE r354233 running on the machine

I'm trying to PCI passthru a built-in NIC on 12.1-RELEASE r354233 to OpenBSD 6.6 inside bhyve on an AMD Ryzen 3 1200.

OpenBSD sees the NIC (re0 Realtek 8168), it can even detect the link state correctly, but it can't send/receive packets.

/boot/loader.conf contains hw.vmm.amdvi.enable="1".

I launch the instance this way:

sudo bhyve -c 4 -m 4G -wuHP \
-s 0,amd_hostbridge \
-S \
-s 2,passthru,5/0/0 \
-s 3,virtio-blk,/vm/ppt/disk.img \
-s 5,virtio-net,tap0 \
-s 31,lpc -l com1,/dev/nmdm0A \
-l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
ppt

When OpenBSD runs on the machine without virtualization, the NIC works correctly. When virtualized with passthru, it detects the link correctly but can't send or receive data on the NIC.

Another (possibly related) issue: if I execute the command above without -w (with or without PCI passthru), OpenBSD fails at initialization with "protection fault trap, code=0; Stopped at identifycpu+0xa5c: wrmsr".

Please find attached the following files:
 * dmesg.freebsd: a dmesg from FreeBSD 12.1-RELEASE r354233 running on the machine.
 * dmesg.openbsd.native: a dmesg from OpenBSD 6.6 running on the machine without virtualization.
 * dmesg.openbsd.bhyve: a dmesg from OpenBSD 6.6 running inside bhyve, launched from the FreeBSD host (the first dmesg) with the command mentioned above.
 * pcidump.native: a result of the pcidump command under OpenBSD 6.6 running on the machine without virtualization (5:0:0 is the device in question).
 * pcidump.bhyve: a result of the pcidump command under OpenBSD 6.6 inside bhyve (0:2:0).
 * vm-bhyve.passthru: a result of the vm-bhyve passthru command right before launching an OpenBSD bhyve instance.
 * boot.crash: the OpenBSD boot sequence that leads to a crash when bhyve is invoked without the -w option.
Comment 1 Anatoli 2020-04-06 07:32:25 UTC
Created attachment 213115 [details]
a dmesg from OpenBSD 6.6 running on the machine without virtualization
Comment 2 Anatoli 2020-04-06 07:33:39 UTC
Created attachment 213116 [details]
a dmesg from OpenBSD 6.6 running inside bhyve
Comment 3 Anatoli 2020-04-06 07:35:37 UTC
Created attachment 213118 [details]
pcidump at OpenBSD 6.6 running on the machine without virtualization
Comment 4 Anatoli 2020-04-06 07:36:26 UTC
Created attachment 213119 [details]
pcidump at OpenBSD 6.6 inside bhyve
Comment 5 Anatoli 2020-04-06 07:38:06 UTC
Created attachment 213120 [details]
vm-bhyve passthru right before launching an OpenBSD bhyve instance
Comment 6 Anatoli 2020-04-06 07:39:06 UTC
Created attachment 213121 [details]
OpenBSD crash boot sequence
Comment 7 Peter Grehan freebsd_committer 2020-04-06 10:45:51 UTC
This is a bit puzzling. The PCI dump from the OpenBSD guest is showing that MSI is enabled on the realtek device, and the OpenBSD driver will use MSI.

Would you be able to try a FreeBSD guest and pass the Realtek device through to that?
Comment 8 Anatoli 2020-04-08 05:11:48 UTC
Hi Peter,

Inside the FreeBSD 12.1-RELEASE guest the NIC works well.

Here's the output from lspci -vvv:

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 7432
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin ? routed to IRQ 255
	Capabilities: [40] Express (v2) Root Port (Slot-), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE-
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		RootCap: CRSVisible-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, LN System CLS Not Supported, TPHComp-, ExtTPHComp-, ARIFwd-
			 AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
			 AtomicOpsCtl: ReqEn- EgressBlck-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-

00:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
	Subsystem: ASUSTeK Computer Inc. Device 8677
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 23
	Region 0: I/O ports at 2000
	Region 2: Memory at c0000000 (64-bit, prefetchable)
	Region 4: Memory at c0004000 (64-bit, prefetchable)
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 01
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 26.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+
			 10BitTagComp-, 10BitTagReq-, OBFF Via message/WAKE#, ExtFmt-, EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, TPHComp-, ExtTPHComp-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
		Vector table: BAR=4 offset=00000000
		PBA: BAR=4 offset=00000800

00:03.0 SCSI storage controller: Red Hat, Inc. Virtio block device
	Subsystem: Red Hat, Inc. Device 0002
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: I/O ports at 2100
	Region 1: Memory at c0008000 (32-bit, non-prefetchable)
	Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
		Vector table: BAR=1 offset=00000000
		PBA: BAR=1 offset=00001000
	Capabilities: [4c] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000

00:05.0 Ethernet controller: Red Hat, Inc. Virtio network device
	Subsystem: Red Hat, Inc. Device 0001
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 18
	Region 0: I/O ports at 2140
	Region 1: Memory at c000a000 (32-bit, non-prefetchable)
	Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
		Vector table: BAR=1 offset=00000000
		PBA: BAR=1 offset=00001000
	Capabilities: [4c] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000

00:1f.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin ? routed to IRQ 255


Is that enough detail to investigate why it doesn't work on OpenBSD? If not, please let me know exactly what to execute.

Thanks
Comment 9 Anatoli 2020-04-14 01:27:04 UTC
Hi, is there anything else I can add to help investigate the issue?
Comment 10 Anatoli 2020-05-22 02:00:20 UTC
After a lot of triage, Peter Grehan identified the source of the problem and prepared a patch (attached).

His description of the cause: the problem was that OpenBSD issues 4-byte PCI configuration-space register reads/writes to consecutive 2-byte fields. In general this is benign, but it exposed 2 bugs in the bhyve PCI emulation where this wasn't being handled correctly.

This can be fixed by applying the attached patch and rebuilding user-space bhyve.
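
To illustrate what a 4-byte access to consecutive 2-byte fields looks like, here is a minimal sketch in C. It is not the bhyve code: cfg_read, has_caplist_via_dword and swap_halves are hypothetical helpers, and only the register offsets (Command at 0x04, Status at 0x06) and the capabilities-list bit in Status come from the standard PCI header layout. Config space is little-endian, so a 4-byte read at 0x04 should carry Command in the low 16 bits and Status in the high 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI header offsets and bits. */
#define CFG_COMMAND	0x04	/* 16-bit Command register */
#define CFG_STATUS	0x06	/* 16-bit Status register */
#define STATUS_CAPLIST	0x0010	/* Status bit 4: capabilities list present */

/*
 * Hypothetical helper: read 1, 2 or 4 bytes at 'off' from a
 * little-endian image of a device's config space.
 */
static uint32_t
cfg_read(const uint8_t *cfg, int off, int bytes)
{
	uint32_t val = 0;

	assert(bytes == 1 || bytes == 2 || bytes == 4);
	for (int i = 0; i < bytes; i++)
		val |= (uint32_t)cfg[off + i] << (8 * i);
	return (val);
}

/*
 * What a guest doing a single 4-byte read at 0x04 effectively does:
 * Status is expected in the upper 16 bits of the result.
 */
static int
has_caplist_via_dword(const uint8_t *cfg)
{
	uint32_t v = cfg_read(cfg, CFG_COMMAND, 4);
	uint16_t status = (uint16_t)(v >> 16);

	return ((status & STATUS_CAPLIST) != 0);
}

/* Models an emulation that returns the two 16-bit halves swapped. */
static uint32_t
swap_halves(uint32_t v)
{
	return ((v << 16) | (v >> 16));
}
```

If the two halves come back swapped, the guest tests the capabilities-list bit against the Command value instead of Status, so it may conclude that the device has no capability list at all.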
Comment 11 Anatoli 2020-05-22 02:02:31 UTC
Created attachment 214741 [details]
The patch fixing the problem

The bug can be fixed by applying the attached patch and rebuilding user-space bhyve.
Comment 12 Peter Grehan freebsd_committer 2020-05-22 12:10:36 UTC
Review at https://reviews.freebsd.org/D24951, slightly different from the patch.
Comment 13 commit-hook freebsd_committer 2020-05-25 06:26:34 UTC
A commit references this bug:

Author: grehan
Date: Mon May 25 06:25:32 UTC 2020
New revision: 361442
URL: https://svnweb.freebsd.org/changeset/base/361442

Log:
  Fix pci-passthru MSI issues with OpenBSD guests

  - Return 2 x 16-bit registers in the correct byte order
   for a 4-byte read that spans the CMD/STATUS register.
    This reversal was hiding the capabilities-list, which prevented
   the MSI capability from being found for XHCI passthru.

  - Reorganize MSI/MSI-x config writes so that a 4-byte write at the
   capability offset would have the read-only portion skipped.
    This prevented MSI interrupts from being enabled.

   Reported and extensively tested by Anatoli (me at anatoli dot ws)

  PR:	245392
  Reported by:	Anatoli (me at anatoli dot ws)
  Reviewed by:	jhb (bhyve)
  Approved by:	jhb, bz (mentor)
  MFC after:	1 week
  Differential Revision:	https://reviews.freebsd.org/D24951

Changes:
  head/usr.sbin/bhyve/pci_emul.c
  head/usr.sbin/bhyve/pci_emul.h
  head/usr.sbin/bhyve/pci_passthru.c
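
The second item in the log above (skipping the read-only portion of a 4-byte write at the capability offset) can be sketched as follows. This is an illustration under stated assumptions, not the committed change: msi_ctrl_write and the macro names are hypothetical, and only the MSI capability layout (ID and next pointer in the first two bytes, 16-bit Message Control at offset 2, enable in bit 0, multiple-message enable in bits 6:4) is taken from the PCI specification.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_CTRL_OFF	2	/* Message Control, relative to capability start */
#define MSI_ENABLE	0x0001	/* Message Control bit 0 */
#define MSI_MME_MASK	0x0070	/* Message Control bits 6:4 */
#define MSI_RWMASK	(MSI_ENABLE | MSI_MME_MASK)

/*
 * Hypothetical handler for a guest write of 'bytes' bytes at offset
 * 'rel' from the start of the MSI capability.  Returns 1 if Message
 * Control was updated, 0 if the write was ignored.
 */
static int
msi_ctrl_write(int rel, int bytes, uint32_t val, uint16_t *ctrl)
{
	assert(bytes == 1 || bytes == 2 || bytes == 4);

	if (rel == 0 && bytes == 4) {
		/*
		 * The low 16 bits cover the read-only ID and next
		 * pointer.  Skip them and treat the rest as a 2-byte
		 * write to Message Control.
		 */
		rel = MSI_CTRL_OFF;
		bytes = 2;
		val >>= 16;
	}

	if (rel == MSI_CTRL_OFF && bytes == 2) {
		/* Only the writable bits of Message Control change. */
		*ctrl = (uint16_t)((*ctrl & ~MSI_RWMASK) | (val & MSI_RWMASK));
		return (1);
	}

	return (0);	/* anything else is out of scope for this sketch */
}
```

A handler that simply drops every write starting at the read-only header would also drop the Message Control half of a 4-byte write, and the enable bit would never be set.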
Comment 14 commit-hook freebsd_committer 2020-06-01 05:14:37 UTC
A commit references this bug:

Author: grehan
Date: Mon Jun  1 05:14:02 UTC 2020
New revision: 361686
URL: https://svnweb.freebsd.org/changeset/base/361686

Log:
  MFC r361442
  Fix pci-passthru MSI issues with OpenBSD guests

  PR:	245392

Changes:
_U  stable/12/
  stable/12/usr.sbin/bhyve/pci_emul.c
  stable/12/usr.sbin/bhyve/pci_emul.h
  stable/12/usr.sbin/bhyve/pci_passthru.c
Comment 15 Anatoli 2020-06-07 01:01:07 UTC
I just tested the latest 13-CURRENT snapshot and the problem is fixed. Closing the issue.
Comment 16 Ed Maste freebsd_committer 2020-07-08 20:09:49 UTC
Author: gordon
Date: Wed Jul  8 19:56:34 2020
New Revision: 363022
URL: https://svnweb.freebsd.org/changeset/base/363022

Log:
  Fix host crash in bhyve with PCI device passthrough.

  Approved by:  so
  Security:     FreeBSD-EN-20:13.bhyve

Modified:
  releng/12.1/sys/amd64/vmm/intel/vtd.c
  releng/12.1/usr.sbin/bhyve/pci_emul.c
  releng/12.1/usr.sbin/bhyve/pci_emul.h
  releng/12.1/usr.sbin/bhyve/pci_passthru.c