Summary: | bhyve: PCI passthru built-in rtl8168 to OpenBSD no packets (AMD Ryzen 3 1200) | ||
---|---|---|---|
Product: | Base System | Reporter: | Anatoli <me> |
Component: | bhyve | Assignee: | freebsd-virtualization (Nobody) <virtualization> |
Status: | Closed FIXED | ||
Severity: | Affects Only Me | CC: | emaste, grehan, me, so |
Priority: | --- | Keywords: | needs-qa |
Version: | 12.1-RELEASE | Flags: | koobs: mfc-stable12?, koobs: mfc-stable11? |
Hardware: | amd64 | ||
OS: | Any | ||
Bug Depends on: | |||
Bug Blocks: | 246647 | ||
Attachments: |
Created attachment 213115 [details]
a dmesg from OpenBSD 6.6 running on the machine without virtualization
Created attachment 213116 [details]
a dmesg from OpenBSD 6.6 running inside bhyve
Created attachment 213118 [details]
pcidump at OpenBSD 6.6 running on the machine without virtualization
Created attachment 213119 [details]
pcidump at OpenBSD 6.6 inside bhyve
Created attachment 213120 [details]
vm-bhyve passthru right before launching an OpenBSD bhyve instance
Created attachment 213121 [details]
OpenBSD crash boot sequence
This is a bit puzzling. The PCI dump from the OpenBSD guest shows that MSI is enabled on the Realtek device, and the OpenBSD driver will use MSI. Would you be able to try a FreeBSD guest and pass the Realtek device through to that?

Hi Peter,

Inside the FreeBSD 12.1-RELEASE guest, the NIC works well. Here's the output from lspci -vvv:

    00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Device 7432
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin ? routed to IRQ 255
        Capabilities: [40] Express (v2) Root Port (Slot-), MSI 00
            DevCap: MaxPayload 128 bytes, PhantFunc 0 ExtTag- RBE-
            DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 128 bytes
            DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
            LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
            LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
            LnkSta: Speed 2.5GT/s (ok), Width x1 (ok) TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
            RootCap: CRSVisible-
            RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
            RootSta: PME ReqID 0000, PMEStatus- PMEPending-
            DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR- 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt-, EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS-, LN System CLS Not Supported, TPHComp-, ExtTPHComp-, ARIFwd- AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd- AtomicOpsCtl: ReqEn- EgressBlck-
            LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB
            LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-

    00:02.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
        Subsystem: ASUSTeK Computer Inc. Device 8677
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 23
        Region 0: I/O ports at 2000
        Region 2: Memory at c0000000 (64-bit, prefetchable)
        Region 4: Memory at c0004000 (64-bit, prefetchable)
        Capabilities: [40] Power Management version 3
            Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
            Status: D0 NoSoftRst+ PME-Enable+ DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
            Address: 0000000000000000 Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 01
            DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 26.000W
            DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq- RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop- MaxPayload 128 bytes, MaxReadReq 4096 bytes
            DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
            LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 <64us ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
            LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
            LnkSta: Speed 2.5GT/s (ok), Width x1 (ok) TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
            DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, NROPrPrP-, LTR+ 10BitTagComp-, 10BitTagReq-, OBFF Via message/WAKE#, ExtFmt-, EETLPPrefix- EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit- FRS-, TPHComp-, ExtTPHComp-
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled AtomicOpsCtl: ReqEn-
            LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=4 Masked-
            Vector table: BAR=4 offset=00000000
            PBA: BAR=4 offset=00000800

    00:03.0 SCSI storage controller: Red Hat, Inc. Virtio block device
        Subsystem: Red Hat, Inc. Device 0002
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 16
        Region 0: I/O ports at 2100
        Region 1: Memory at c0008000 (32-bit, non-prefetchable)
        Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
            Vector table: BAR=1 offset=00000000
            PBA: BAR=1 offset=00001000
        Capabilities: [4c] MSI: Enable- Count=1/1 Maskable- 64bit+
            Address: 0000000000000000 Data: 0000

    00:05.0 Ethernet controller: Red Hat, Inc. Virtio network device
        Subsystem: Red Hat, Inc. Device 0001
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 18
        Region 0: I/O ports at 2140
        Region 1: Memory at c000a000 (32-bit, non-prefetchable)
        Capabilities: [40] MSI-X: Enable+ Count=3 Masked-
            Vector table: BAR=1 offset=00000000
            PBA: BAR=1 offset=00001000
        Capabilities: [4c] MSI: Enable- Count=1/1 Maskable- 64bit+
            Address: 0000000000000000 Data: 0000

    00:1f.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin ? routed to IRQ 255

Is that enough detail to investigate why it doesn't work on OpenBSD? If not, please let me know what exactly to execute. Thanks

Hi, is there anything else I can add to help research the issue?

After a lot of triage, Peter Grehan identified the source of the problem and prepared a patch (attached). His description of the cause: the problem was that OpenBSD issues 4-byte PCI configuration-space register reads/writes to consecutive 2-byte fields. In general this is benign, but it exposed 2 bugs in the bhyve PCI emulation where this wasn't being handled correctly. This can be fixed by applying the attached patch and rebuilding user-space bhyve.

Created attachment 214741 [details]
The patch fixing the problem
The bug can be fixed by applying the attached patch and rebuilding user-space bhyve.
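One way a 4-byte access to two consecutive 2-byte fields can go wrong is byte order. PCI configuration space is little-endian, so a 4-byte read at offset 0x04 should return the 16-bit Command register in bits 15:0 and the 16-bit Status register in bits 31:16. The C sketch below is a hypothetical model, not the code in bhyve's pci_emul.c: `struct emul_cfg`, `read4_cmd_status()`, `read4_cmd_status_swapped()` and `guest_sees_caplist()` are illustrative names, and `PCIM_STATUS_CAPPRESENT` mirrors the name and value in FreeBSD's <dev/pci/pcireg.h>. It shows how reversing the two halves moves the Status "capabilities list" bit (bit 4) away from where a guest looks for it.

```c
#include <stdint.h>

/* Status bit 4: capability list present (value as in FreeBSD's pcireg.h). */
#define PCIM_STATUS_CAPPRESENT 0x0010

/* Hypothetical emulated device state: Command and Status kept as two
 * separate 16-bit fields. */
struct emul_cfg {
	uint16_t command;	/* config offset 0x04 */
	uint16_t status;	/* config offset 0x06 */
};

/* Correct 4-byte read at 0x04: config space is little-endian, so the
 * lower-addressed register (Command) lands in bits 15:0. */
uint32_t read4_cmd_status(const struct emul_cfg *c)
{
	return (uint32_t)c->command | ((uint32_t)c->status << 16);
}

/* Illustrative buggy variant with the two 16-bit halves reversed. */
uint32_t read4_cmd_status_swapped(const struct emul_cfg *c)
{
	return (uint32_t)c->status | ((uint32_t)c->command << 16);
}

/* Guest side: take Status from bits 31:16 of the dword and check whether
 * a capability list exists before following the pointer at offset 0x34. */
int guest_sees_caplist(uint32_t dword_at_0x04)
{
	uint16_t status = (uint16_t)(dword_at_0x04 >> 16);

	return (status & PCIM_STATUS_CAPPRESENT) != 0;
}
```

With Command = 0x0007 (I/O+ Mem+ BusMaster+) and Status = 0x0010 (Cap+), the correct read returns 0x00100007 and the capability bit is visible. The swapped read returns 0x00070010, so the guest decodes Status as 0x0007, concludes there is no capability list, and never finds the MSI capability inside it.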
Review at https://reviews.freebsd.org/D24951 (slightly different from the attached patch).

A commit references this bug:

Author: grehan
Date: Mon May 25 06:25:32 UTC 2020
New revision: 361442
URL: https://svnweb.freebsd.org/changeset/base/361442

Log:
  Fix pci-passthru MSI issues with OpenBSD guests

  - Return 2 x 16-bit registers in the correct byte order for a 4-byte read that spans the CMD/STATUS register. This reversal was hiding the capabilities-list, which prevented the MSI capability from being found for XHCI passthru.
  - Reorganize MSI/MSI-x config writes so that a 4-byte write at the capability offset would have the read-only portion skipped. This prevented MSI interrupts from being enabled.

  Reported and extensively tested by Anatoli (me at anatoli dot ws)

  PR: 245392
  Reported by: Anatoli (me at anatoli dot ws)
  Reviewed by: jhb (bhyve)
  Approved by: jhb, bz (mentor)
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D24951

Changes:
  head/usr.sbin/bhyve/pci_emul.c
  head/usr.sbin/bhyve/pci_emul.h
  head/usr.sbin/bhyve/pci_passthru.c

A commit references this bug:

Author: grehan
Date: Mon Jun 1 05:14:02 UTC 2020
New revision: 361686
URL: https://svnweb.freebsd.org/changeset/base/361686

Log:
  MFC r361442

  Fix pci-passthru MSI issues with OpenBSD guests

  PR: 245392

Changes:
_U  stable/12/
  stable/12/usr.sbin/bhyve/pci_emul.c
  stable/12/usr.sbin/bhyve/pci_emul.h
  stable/12/usr.sbin/bhyve/pci_passthru.c

I just tested the latest 13-CURRENT snapshot and the problem is fixed. Closing the issue.

Author: gordon
Date: Wed Jul 8 19:56:34 2020
New Revision: 363022
URL: https://svnweb.freebsd.org/changeset/base/363022

Log:
  Fix host crash in bhyve with PCI device passthrough.

  Approved by: so
  Security: FreeBSD-EN-20:13.bhyve

Modified:
  releng/12.1/sys/amd64/vmm/intel/vtd.c
  releng/12.1/usr.sbin/bhyve/pci_emul.c
  releng/12.1/usr.sbin/bhyve/pci_emul.h
  releng/12.1/usr.sbin/bhyve/pci_passthru.c
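The second change in r361442 concerns writes. An MSI capability starts with two read-only bytes (capability ID and next-capability pointer) followed by the 16-bit Message Control register, so a guest that sets MSI Enable with a 4-byte write at the capability offset also writes those two read-only bytes. The C sketch below illustrates the "skip the read-only portion" idea under stated assumptions: a 64-bit-capable MSI capability layout, and a simplified writable mask of MSI Enable plus Multiple Message Enable. `struct msi_cap`, `msi_cap_write()`, `msi_enabled()`, `set_byte32()` and `MSGCTRL_WRITABLE` are hypothetical and are not the code in the actual commit; only the two `PCIM_MSICTRL_*` values mirror FreeBSD's <dev/pci/pcireg.h>.

```c
#include <stdint.h>

/* Values as in FreeBSD's <dev/pci/pcireg.h>. */
#define PCIM_MSICTRL_MSI_ENABLE 0x0001
#define PCIM_MSICTRL_MME_MASK   0x0070

/* Simplified assumption: only these Message Control bits are guest-writable. */
#define MSGCTRL_WRITABLE (PCIM_MSICTRL_MSI_ENABLE | PCIM_MSICTRL_MME_MASK)

/* Hypothetical model of a 64-bit-capable MSI capability. */
struct msi_cap {
	uint8_t  cap_id;	/* +0: capability ID, read-only */
	uint8_t  next;		/* +1: next-capability pointer, read-only */
	uint16_t msgctrl;	/* +2: Message Control, partly writable */
	uint32_t addr_lo;	/* +4: Message Address */
	uint32_t addr_hi;	/* +8: Message Upper Address */
	uint16_t data;		/* +12: Message Data */
};

/* Replace byte `idx` (0 = least significant) of a 32-bit register. */
static void set_byte32(uint32_t *reg, int idx, uint8_t b)
{
	*reg = (*reg & ~(0xffu << (8 * idx))) | ((uint32_t)b << (8 * idx));
}

/*
 * Apply a guest config write of `bytes` (1, 2 or 4) at offset `off` from
 * the start of the capability. The value is split into bytes in
 * little-endian order; bytes that land on read-only fields are skipped
 * instead of the whole write being discarded.
 */
void msi_cap_write(struct msi_cap *cap, int off, int bytes, uint32_t val)
{
	for (int i = 0; i < bytes; i++) {
		int o = off + i;
		uint8_t b = (uint8_t)(val >> (8 * i));

		if (o <= 1) {
			continue;	/* capability ID / next pointer: read-only */
		} else if (o <= 3) {
			int shift = 8 * (o - 2);
			uint16_t mask = (uint16_t)(MSGCTRL_WRITABLE & (0xff << shift));

			cap->msgctrl = (uint16_t)((cap->msgctrl & ~mask) |
			    (((unsigned)b << shift) & mask));
		} else if (o <= 7) {
			set_byte32(&cap->addr_lo, o - 4, b);
		} else if (o <= 11) {
			set_byte32(&cap->addr_hi, o - 8, b);
		} else if (o <= 13) {
			int shift = 8 * (o - 12);

			cap->data = (uint16_t)((cap->data & ~(0xffu << shift)) |
			    ((unsigned)b << shift));
		}
		/* Offsets past the capability are ignored in this sketch. */
	}
}

int msi_enabled(const struct msi_cap *cap)
{
	return (cap->msgctrl & PCIM_MSICTRL_MSI_ENABLE) != 0;
}
```

With Message Control initially 0x0080 (64bit+), a 4-byte write of 0x00817005 at offset 0 leaves the ID (0x05) and next pointer (0x70) untouched and sets Message Control to 0x0081, i.e. MSI enabled. An emulation that mishandled such a write, for example by discarding it because it overlaps read-only bytes, would leave MSI disabled, which is consistent with the symptom described in the r361442 log.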
Created attachment 213114 [details]
a dmesg from FreeBSD 12.1-RELEASE r354233 running on the machine

Trying to PCI passthru a built-in NIC on 12.1-RELEASE r354233 to OpenBSD 6.6 inside bhyve on an AMD Ryzen 3 1200. OpenBSD sees the NIC (re0, Realtek 8168) and can even detect the link state correctly, but it can't send or receive packets.

/boot/loader.conf contains hw.vmm.amdvi.enable="1".

I launch the instance this way:

    sudo bhyve -c 4 -m 4G -wuHP \
        -s 0,amd_hostbridge \
        -S \
        -s 2,passthru,5/0/0 \
        -s 3,virtio-blk,/vm/ppt/disk.img \
        -s 5,virtio-net,tap0 \
        -s 31,lpc -l com1,/dev/nmdm0A \
        -l bootrom,/usr/local/share/uefi-firmware/BHYVE_UEFI.fd \
        ppt

When OpenBSD runs on the machine without virtualization, the NIC works correctly. When virtualized with passthru, it detects the link correctly but can't send or receive data on the NIC.

Another (possibly related) issue: if I execute the command above without -w (this happens with or without PCI passthru), OpenBSD fails during initialization with "protection fault trap, code=0; Stopped at identifycpu+0xa5c: wrmsr".

Please find attached the following files:

* dmesg.freebsd: a dmesg from FreeBSD 12.1-RELEASE r354233 running on the machine.
* dmesg.openbsd.native: a dmesg from OpenBSD 6.6 running on the machine without virtualization.
* dmesg.openbsd.bhyve: a dmesg from OpenBSD 6.6 running inside bhyve, launched from the FreeBSD host (the first dmesg) with the command mentioned above.
* pcidump.native: the output of the pcidump command under OpenBSD 6.6 running on the machine without virtualization (5:0:0 is the device in question).
* pcidump.bhyve: the output of the pcidump command under OpenBSD 6.6 inside bhyve (0:2:0).
* vm-bhyve.passthru: the output of the vm-bhyve passthru command right before launching an OpenBSD bhyve instance.
* boot.crash: the OpenBSD boot sequence that leads to a crash when bhyve is invoked without the -w option.