Bug 270966 - AMD IOMMU/bhyve/PCI passthru fails with "ivhd, ILLEGAL CMD, IO_PAGE_FAULT"
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: bhyve
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: freebsd-virtualization (Nobody)
Reported: 2023-04-20 19:59 UTC by Raúl
Modified: 2024-04-25 16:23 UTC (History)
Description Raúl 2023-04-20 19:59:23 UTC
I've seen this problem (or a related one) since 13.1 using passthru/SR-IOV and somewhat complex configs. Setting up a new server gave me the chance to reduce the problem to this scenario: reboot a bhyve guest that uses a ppt NIC about 30 times.

Fresh install on a new server, using passthru with an integrated NIC (bge).

With a script like:
[....]
#!/bin/sh
count=1
while :
do
  vm stop debian1
  sleep 5
  vm start debian1
  sleep 15
  echo "Reboots: $count"
  count=`expr $count + 1`
done
[....]
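
A possible refinement of the loop above, as a sketch only: stop as soon as the host kernel log gains a new ivhd error, so the failing iteration is recorded. The names `count_ivhd_errors`, `run_loop` and the `RUN_LOOP` guard variable are made up for illustration, and using `dmesg` as the log source is an assumption; the `vm stop`/`vm start` calls and the sleep values are taken from the script above.

```shell
#!/bin/sh
# Sketch: reboot loop that stops at the first new ivhd error.
# count_ivhd_errors, run_loop and RUN_LOOP are illustrative names,
# not part of vm-bhyve.

# Count log lines (read from stdin) that match the failure signatures
# quoted in this report. grep -c exits non-zero on zero matches, hence || true.
count_ivhd_errors() {
  grep -c -E 'ivhd[0-9]+: Error: completion failed|ILLEGAL CMD EVT|IO_PAGE_FAULT EVT' || true
}

run_loop() {
  guest=$1
  # Errors already present before the test starts are not counted as new.
  baseline=$(dmesg | count_ivhd_errors)
  count=1
  while :
  do
    vm stop "$guest"
    sleep 5
    vm start "$guest"
    sleep 15
    now=$(dmesg | count_ivhd_errors)
    if [ "$now" -gt "$baseline" ]; then
      echo "ivhd errors appeared after $count reboots"
      return 1
    fi
    echo "Reboots: $count"
    count=$((count + 1))
  done
}

# Only run the loop when explicitly asked, e.g.: RUN_LOOP=1 sh loop.sh debian1
if [ "${RUN_LOOP:-0}" = "1" ]; then
  run_loop "${1:-debian1}"
fi
```

The kernel message buffer is finite, so comparing counts is only approximate on a host that logs heavily; reading /var/log/messages instead would be an alternative.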

I see:
[....]
Sending ACPI shutdown to debian1                                                
Starting debian1                                                                
  * found guest in /usr/local/vm-bhyve/debian1                                  
  * booting...                                                                  
Reboots: 1                                                                      
Sending ACPI shutdown to debian1                                                
Starting debian1                                                                
  * found guest in /usr/local/vm-bhyve/debian1                                  
  * booting...                                                                  
Reboots: 2                                                                      
Sending ACPI shutdown to debian1                                                
Starting debian1                                                                
  * found guest in /usr/local/vm-bhyve/debian1                                  
  * booting...                         
  
... ... ...

Reboots: 27
Sending ACPI shutdown to debian1
Starting debian1
  * found guest in /usr/local/vm-bhyve/debian1
  ! guest appears to be running already
[....]

and /var/log/messages shows:
[....]
Apr 20 21:26:17 cache-D-2023 kernel: ivhd0: EVT INTR 0 Status:0xa EVT Head:0x0 Tail:0x10]
Apr 20 21:26:17 cache-D-2023 kernel:   [CMD Total 0x105] Tail:0x50, Head:0x30.
Apr 20 21:26:17 cache-D-2023 kernel: ivhd0:     [Event0: Head:0x0 Tail:0x10]
Apr 20 21:26:17 cache-D-2023 kernel:    [ILLEGAL CMD EVT]
Apr 20 21:26:17 cache-D-2023 kernel:    CMD opcode= 0x3 0xc009 0x14 0x7ffffffffffff003
Apr 20 21:26:18 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xe0, head:0x30.
Apr 20 21:26:18 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:26:18 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:26:20 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x190, head:0x30.
Apr 20 21:26:20 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:26:20 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:26:29 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x240, head:0x30.
Apr 20 21:26:29 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:26:29 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:26:31 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x2f0, head:0x30.
Apr 20 21:26:31 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:26:31 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:26:32 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x3a0, head:0x30.
Apr 20 21:26:32 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:26:32 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:26:40 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x450, head:0x30.
Apr 20 21:26:40 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:26:40 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:26:41 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x500, head:0x30.
Apr 20 21:26:41 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:26:41 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:26:42 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x5b0, head:0x30.
Apr 20 21:26:42 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:26:42 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:26:51 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x660, head:0x30.
Apr 20 21:26:51 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:26:51 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:26:52 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x710, head:0x30.
Apr 20 21:26:52 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:26:52 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:26:53 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x7c0, head:0x30.
Apr 20 21:26:53 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:26:53 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:01 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x870, head:0x30.
Apr 20 21:27:01 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:01 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:02 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x920, head:0x30.
Apr 20 21:27:02 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:02 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:03 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x9d0, head:0x30.
Apr 20 21:27:03 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:03 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:12 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xa80, head:0x30.
Apr 20 21:27:12 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:12 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:13 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xb30, head:0x30.
Apr 20 21:27:13 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:13 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:14 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xbe0, head:0x30.
Apr 20 21:27:14 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:14 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:22 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xc90, head:0x30.
Apr 20 21:27:22 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:22 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:23 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xd40, head:0x30.
Apr 20 21:27:23 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:23 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:24 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xdf0, head:0x30.
Apr 20 21:27:24 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:24 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:33 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xea0, head:0x30.
Apr 20 21:27:33 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:33 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:34 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xf50, head:0x30.
Apr 20 21:27:34 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:34 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:35 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x0, head:0x30.
Apr 20 21:27:35 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:35 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:38 cache-D-2023 kernel: ivhd0: EVT INTR 1 Status:0xa EVT Head:0x10 Tail:0x20]
Apr 20 21:27:38 cache-D-2023 kernel:   [CMD Total 0x200] Tail:0x0, Head:0x30.
Apr 20 21:27:38 cache-D-2023 kernel: ivhd0:     [Event0: Head:0x10 Tail:0x20]
Apr 20 21:27:38 cache-D-2023 kernel:    [IO_PAGE_FAULT EVT: devId:0xc301 DomId:0x17 Addr:0x101290000 0x0]
Apr 20 21:27:43 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xb0, head:0x30.
Apr 20 21:27:43 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:43 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:44 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x160, head:0x30.
Apr 20 21:27:44 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:44 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:45 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x210, head:0x30.
Apr 20 21:27:45 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:45 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:55 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x2c0, head:0x30.
Apr 20 21:27:55 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:55 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:56 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x370, head:0x30.
Apr 20 21:27:56 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:56 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:27:57 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x420, head:0x30.
Apr 20 21:27:57 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:27:57 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:05 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x4d0, head:0x30.
Apr 20 21:28:05 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:05 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:06 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x580, head:0x30.
Apr 20 21:28:06 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:06 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:07 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x630, head:0x30.
Apr 20 21:28:07 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:07 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:16 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x6e0, head:0x30.
Apr 20 21:28:16 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:16 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:17 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x790, head:0x30.
Apr 20 21:28:17 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:17 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:18 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x840, head:0x30.
Apr 20 21:28:18 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:18 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:26 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x8f0, head:0x30.
Apr 20 21:28:26 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:26 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:27 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x9a0, head:0x30.
Apr 20 21:28:27 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:27 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:28 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xa50, head:0x30.
Apr 20 21:28:28 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:28 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:37 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xb00, head:0x30.
Apr 20 21:28:37 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:37 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:38 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xbb0, head:0x30.
Apr 20 21:28:38 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:38 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:39 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xc60, head:0x30.
Apr 20 21:28:39 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:39 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:42 cache-D-2023 kernel: ivhd0: EVT INTR 2 Status:0xa EVT Head:0x20 Tail:0x30]
Apr 20 21:28:42 cache-D-2023 kernel:   [CMD Total 0x2c6] Tail:0xc60, Head:0x30.
Apr 20 21:28:42 cache-D-2023 kernel: ivhd0:     [Event0: Head:0x20 Tail:0x30]
Apr 20 21:28:42 cache-D-2023 kernel:    [IO_PAGE_FAULT EVT: devId:0xc301 DomId:0x1a Addr:0x1074a0000 0x0]
Apr 20 21:28:47 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xd10, head:0x30.
Apr 20 21:28:47 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:47 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:49 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xdc0, head:0x30.
Apr 20 21:28:49 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:49 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:49 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xe70, head:0x30.
Apr 20 21:28:49 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:49 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:28:59 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xf20, head:0x30.
Apr 20 21:28:59 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:28:59 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:29:00 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xfd0, head:0x30.
Apr 20 21:29:00 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:29:00 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
Apr 20 21:29:01 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x80, head:0x30.
Apr 20 21:29:01 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 20 21:29:01 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x50aebb1 0x0 0xa5a5
[....]
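
Two details in the log above can be read off with a little arithmetic. This is a sketch, assuming the `devId` field is the usual 16-bit PCI requester ID (bus in bits 15:8, device in bits 7:3, function in bits 2:0) and that each command-ring entry is 16 bytes, which matches the `off:` spacing in the command dumps; the function names are illustrative.

```shell
# decode_devid: print bus:device:function (decimal) for a 16-bit device ID.
decode_devid() {
  id=$(($1))
  printf '%d:%d:%d\n' $(( (id >> 8) & 0xff )) $(( (id >> 3) & 0x1f )) $(( id & 0x7 ))
}

# cmds_between: number of 16-byte command slots between two ring offsets.
cmds_between() {
  echo $(( ($2 - $1) / 16 ))
}

decode_devid 0xc301        # prints 195:0:1
cmds_between 0xe0 0x190    # prints 11
```

Read this way, the faulting device in the IO_PAGE_FAULT events is bus 195, slot 0, function 1, and each "completion failed" message corresponds to 11 more commands queued (tail advances by 0xb0) while head stays at 0x30. The tail wrapping from 0xf50 to 0x0 suggests a 4096-byte ring, that is, 256 entries.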

% uname -a
FreeBSD cache-D-2023 13.2-RELEASE FreeBSD 13.2-RELEASE releng/13.2-n254617-525ecfdad597 GENERIC amd64

% dmesg | grep CPU:
CPU: AMD EPYC 9124 16-Core Processor                 (3000.16-MHz K8-class CPU)

% dmesg | grep ivhd
ivhd0: <AMD-Vi/IOMMU ivhd with EFR> on acpi0
ivhd0: Flag:b0<IotlbSup,Coherent>
ivhd0: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
ivhd0: Extended features[31:0]:a2295afe<PPRSup,<b2>,NXSup,GTSup,<b5>,IASup,GASup,PCSup> HATS = 0x2 GATS = 0x1 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1 DualPortLogSup = 0x2 DualEventLogSup = 0x2
ivhd0: Extended features[62:32]:25bf732f<USSup,PprOvrflwEarlySup,PPRAutoRspSup,BlKStopMrkSup,PerfOptSup,MsiCapMmioSup,GIOSup,HASup,EPHSup,AttrFWSup,HDSup> Max PASID: 0x2f DevTblSegSup = 0x2 MarcSup = 0x0
ivhd0: supported paging level:7, will use only: 4
ivhd0: device [0xc003 - 0xfffe] config:0
ivhd0: PCI cap 0x190b640f@0x40 feature:19<IOTLB,EFR,CapExt>
ivhd1: <AMD-Vi/IOMMU ivhd with EFR> on acpi0
ivhd1: Flag:b0<IotlbSup,Coherent>
ivhd1: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
ivhd1: Extended features[31:0]:a2295afe<PPRSup,<b2>,NXSup,GTSup,<b5>,IASup,GASup,PCSup> HATS = 0x2 GATS = 0x1 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1 DualPortLogSup = 0x2 DualEventLogSup = 0x2
ivhd1: Extended features[62:32]:25bf732f<USSup,PprOvrflwEarlySup,PPRAutoRspSup,BlKStopMrkSup,PerfOptSup,MsiCapMmioSup,GIOSup,HASup,EPHSup,AttrFWSup,HDSup> Max PASID: 0x2f DevTblSegSup = 0x2 MarcSup = 0x0
ivhd1: supported paging level:7, will use only: 4
ivhd1: device [0x8003 - 0xbffe] config:0
ivhd1: PCI cap 0x190b640f@0x40 feature:19<IOTLB,EFR,CapExt>
ivhd2: <AMD-Vi/IOMMU ivhd with EFR> on acpi0
ivhd2: Flag:b0<IotlbSup,Coherent>
ivhd2: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
ivhd2: Extended features[31:0]:a2295afe<PPRSup,<b2>,NXSup,GTSup,<b5>,IASup,GASup,PCSup> HATS = 0x2 GATS = 0x1 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1 DualPortLogSup = 0x2 DualEventLogSup = 0x2
ivhd2: Extended features[62:32]:25bf732f<USSup,PprOvrflwEarlySup,PPRAutoRspSup,BlKStopMrkSup,PerfOptSup,MsiCapMmioSup,GIOSup,HASup,EPHSup,AttrFWSup,HDSup> Max PASID: 0x2f DevTblSegSup = 0x2 MarcSup = 0x0
ivhd2: supported paging level:7, will use only: 4
ivhd2: device [0x3 - 0x3ffe] config:0
ivhd2: device [0xff00 - 0xffff] config:0
ivhd2: PCI cap 0x190b640f@0x40 feature:19<IOTLB,EFR,CapExt>
ivhd3: <AMD-Vi/IOMMU ivhd with EFR> on acpi0
ivhd3: Flag:b0<IotlbSup,Coherent>
ivhd3: Features(type:0x11) MsiNumPPR = 0 PNBanks= 2 PNCounters= 0
ivhd3: Extended features[31:0]:a2295afe<PPRSup,<b2>,NXSup,GTSup,<b5>,IASup,GASup,PCSup> HATS = 0x2 GATS = 0x1 GLXSup = 0x1 SmiFSup = 0x1 SmiFRC = 0x2 GAMSup = 0x1 DualPortLogSup = 0x2 DualEventLogSup = 0x2
ivhd3: Extended features[62:32]:25bf732f<USSup,PprOvrflwEarlySup,PPRAutoRspSup,BlKStopMrkSup,PerfOptSup,MsiCapMmioSup,GIOSup,HASup,EPHSup,AttrFWSup,HDSup> Max PASID: 0x2f DevTblSegSup = 0x2 MarcSup = 0x0
ivhd3: supported paging level:7, will use only: 4
ivhd3: device [0x4003 - 0x7ffe] config:0
ivhd3: PCI cap 0x190b640f@0x40 feature:19<IOTLB,EFR,CapExt>

I'll try on 14.
Thanks in advance.
Comment 1 Raúl 2023-04-20 21:40:14 UTC
Same on 14.

FreeBSD cache-D-2023 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n262373-8e813d07c680: Thu Apr 20 23:41:49 CEST 2023

First try:
Reboots: 28
Sending ACPI shutdown to debian1
Starting debian1
  * found guest in /usr/local/vm-bhyve/debian1
  * booting...
Reboots: 29
/usr/local/sbin/vm: WARNING: debian1 doesn't appear to be a running virtual machine

Second try:
Reboots: 28
Sending ACPI shutdown to debian1
Starting debian1
  * found guest in /usr/local/vm-bhyve/debian1
  * booting...
Reboots: 29
/usr/local/sbin/vm: WARNING: debian1 doesn't appear to be a running virtual machine

In /var/log/messages:
[....]
Apr 21 00:27:13 cache-D-2023 kernel: ivhd0: EVT INTR 0 Status:0xa EVT Head:0x0 Tail:0x10]
Apr 21 00:27:13 cache-D-2023 kernel:   [CMD Total 0x105] Tail:0x50, Head:0x30.
Apr 21 00:27:13 cache-D-2023 kernel: ivhd0:     [Event0: Head:0x0 Tail:0x10]
Apr 21 00:27:13 cache-D-2023 kernel:    [ILLEGAL CMD EVT]
Apr 21 00:27:13 cache-D-2023 kernel:    CMD opcode= 0x3 0xc009 0x14 0x7ffffffffffff003
Apr 21 00:27:14 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0xe0, head:0x30.
Apr 21 00:27:14 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 21 00:27:14 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x37c8b31 0x0 0xa5a5
Apr 21 00:27:15 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x190, head:0x30.
Apr 21 00:27:15 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 21 00:27:15 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x37c8b31 0x0 0xa5a5
Apr 21 00:27:26 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x240, head:0x30.
Apr 21 00:27:26 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 21 00:27:26 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x37c8b31 0x0 0xa5a5
Apr 21 00:27:27 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x2f0, head:0x30.
Apr 21 00:27:27 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 21 00:27:27 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x37c8b31 0x0 0xa5a5
Apr 21 00:27:28 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x3a0, head:0x30.
Apr 21 00:27:28 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 21 00:27:28 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x37c8b31 0x0 0xa5a5
Apr 21 00:27:36 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x450, head:0x30.
Apr 21 00:27:36 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 21 00:27:36 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x37c8b31 0x0 0xa5a5
Apr 21 00:27:37 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x500, head:0x30.
Apr 21 00:27:37 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 21 00:27:37 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x37c8b31 0x0 0xa5a5
Apr 21 00:27:38 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x5b0, head:0x30.
Apr 21 00:27:38 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 21 00:27:38 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x37c8b31 0x0 0xa5a5
Apr 21 00:27:47 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x660, head:0x30.
Apr 21 00:27:47 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 21 00:27:47 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x37c8b31 0x0 0xa5a5
Apr 21 00:27:48 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x710, head:0x30.
Apr 21 00:27:48 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 21 00:27:48 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x37c8b31 0x0 0xa5a5
Apr 21 00:27:49 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x7c0, head:0x30.
Apr 21 00:27:49 cache-D-2023 kernel: ivhd0: Dump last 1 command(s):
Apr 21 00:27:49 cache-D-2023 kernel:   [CMD0, off:0x20] opcode= 0x1 0x37c8b31 0x0 0xa5a5
Apr 21 00:27:51 cache-D-2023 kernel: ivhd0: EVT INTR 1 Status:0xa EVT Head:0x10 Tail:0x20]
Apr 21 00:27:51 cache-D-2023 kernel:   [CMD Total 0x17c] Tail:0x7c0, Head:0x30.
Apr 21 00:27:51 cache-D-2023 kernel: ivhd0:     [Event0: Head:0x10 Tail:0x20]
Apr 21 00:27:51 cache-D-2023 kernel:    [IO_PAGE_FAULT EVT: devId:0xc301 DomId:0x15 Addr:0x109ea0000 0x0]
Apr 21 00:27:57 cache-D-2023 kernel: ivhd0: Error: completion failed tail:0x870, head:0x30.
[....]
Comment 2 Raúl 2023-07-03 12:59:03 UTC
Tried twice with today's CURRENT (main-n263930-2176c9ab71c8) and a FreeBSD 13.2 guest, with the latest firmware and microcode. Same behavior, but it fails after fewer reboots (20).

dmesg on host shows:

ivhd3: EVT INTR 0 Status:0xa EVT Head:0x0 Tail:0x10]
  [CMD Total 0x107] Tail:0x70, Head:0x50.
ivhd3:  [Event0: Head:0x0 Tail:0x10]
        [ILLEGAL CMD EVT]
        CMD opcode= 0x3 0x400b 0x14 0x7ffffffffffff003
ivhd3: Error: completion failed tail:0x100, head:0x50.
ivhd3: Dump last 1 command(s):
  [CMD0, off:0x40] opcode= 0x1 0x5041bb1 0x0 0xa5a5

and guest dmesg:

mlx5_core0: <mlx5_core> mem 0xc0000000-0xc00fffff at device 6.0 on pci0                                                                                                                                                                                      
mlx5: Mellanox Core driver 3.7.1 (November 2021)mlx5_core0: WARN: wait_func:967:(pid 0): ENABLE_HCA(0x104) timeout. Will cause a leak of a command resource                                                                                                  
mlx5_core0: ERR: mlx5_load_one:1083:(pid 0): enable hca failed                                                                                                                                                                                               
mlx5_core0: ERR: init_one:1646:(pid 0): mlx5_load_one failed -60                                                                                                                                                                                             
device_attach: mlx5_core0 attach returned 60
Comment 3 Raúl 2023-07-24 12:02:50 UTC
No changes on 'FreeBSD 14.0-CURRENT amd64 1400093 #0 main-n264293-e64fe029e9d3'
Comment 4 attila.kover 2023-08-05 17:30:17 UTC
I've experienced this "vm reboot instability" in 13.0-release.
Since 13.1-release, I can't even start a vm with passthru. Same in 13.2-release and 14.0-current

I opened a bug report almost a year ago and nothing has happened: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=266325
Comment 5 Corvin Köhne freebsd_committer freebsd_triage 2023-08-07 07:53:31 UTC
(In reply to attila.kover from comment #4)

Are you able to perform a bisect?
Comment 6 Raúl 2023-08-07 16:46:25 UTC
(In reply to attila.kover from comment #4)

I thought my problem was with SR-IOV, maybe motherboard related (ASUS B550-E), a small niche affecting only me. Seeing that it also happens with any PCI passthru (ivhd) on server-class hardware (PowerEdge R7625), that it is trivial to reproduce, and that it is present on CURRENT gives me hope. Although 20 to 30 reboots for a single guest looks like a lot, with multiple guests it means constant host reboots, which makes the host barely usable. Your problem report changed to Open, so maybe we can help before releng_14 ;).
Comment 7 Raúl 2023-08-07 16:56:51 UTC
(In reply to Corvin Köhne from comment #5)

Thanks for looking at this! I'm assembling a new AMD-based box for testing. It'll be available by the end of the month. I can stress test it, bisecting from releng 13.0.
Comment 8 attila.kover 2023-08-07 20:41:43 UTC
(In reply to Corvin Köhne from comment #5)

Unfortunately I don't know how to do that.
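
For readers in the same position, a bisect here would mean repeatedly building and booting kernels from revisions between a known good and a known bad one, and telling git the outcome each time. A rough sketch, assuming a FreeBSD src git checkout and that both a good and a bad revision exist (later comments suggest a good one may not exist for this bug). The `verdict` helper and the saved log file are hypothetical; the `git bisect` subcommands in the comments are standard git.

```shell
#!/bin/sh
# Sketch of a manual bisect step. Typical flow (run in a src checkout):
#   git bisect start
#   git bisect bad  <failing revision>
#   git bisect good <working revision>
#   ... build, install and boot the kernel that git checked out,
#   ... run the reboot loop, save the host log to a file, then:
#   git bisect "$(verdict /path/to/saved.log)"
# Repeat until git names the first bad commit.

# verdict: print "bad" if the saved log shows the ivhd failure, else "good".
verdict() {
  if grep -q -E 'ivhd[0-9]+: Error: completion failed|ILLEGAL CMD EVT|IO_PAGE_FAULT EVT' "$1"; then
    echo bad
  else
    echo good
  fi
}
```

Because the failure in this report only shows up after 20 to 30 guest reboots, a "good" verdict is only as reliable as the number of reboots the test ran for.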
Comment 9 attila.kover 2023-08-07 20:48:48 UTC
(In reply to Raúl from comment #7)


This same problem is already present in both 12.2 and 13.0 (fresh install, no OS update, no patch, only vm-bhyve installed with pkg).

This time I've used a desktop machine, Asus M5A99X EVO board with a 4-port Intel 82580 NIC.


Part of /var/log/messages in 13.0:

Aug  7 20:08:21 dlc kernel: ivhd0: Error: completion failed tail:0xbd0, head:0x30.
Aug  7 20:08:21 dlc kernel: ivhd0: Dump last 1 command(s):
Aug  7 20:08:21 dlc kernel:   [CMD0, off:0x20] opcode= 0x2 0x800 0x0 0xa5a5
Aug  7 20:08:25 dlc kernel: ivhd0: Error: completion failed tail:0xc80, head:0x30.
Aug  7 20:08:25 dlc kernel: ivhd0: Dump last 1 command(s):
Aug  7 20:08:25 dlc kernel:   [CMD0, off:0x20] opcode= 0x2 0x800 0x0 0xa5a5
Aug  7 20:08:25 dlc kernel: ivhd0: Error: completion failed tail:0xd30, head:0x30.
Aug  7 20:08:25 dlc kernel: ivhd0: Dump last 1 command(s):
Aug  7 20:08:25 dlc kernel:   [CMD0, off:0x20] opcode= 0x2 0x800 0x0 0xa5a5
Aug  7 20:08:25 dlc kernel: ivhd0: Error: completion failed tail:0xde0, head:0x30.
Aug  7 20:08:25 dlc kernel: ivhd0: Dump last 1 command(s):
Aug  7 20:08:25 dlc kernel:   [CMD0, off:0x20] opcode= 0x2 0x800 0x0 0xa5a5



Part of /var/log/messages in 12.2:

Aug  7 20:41:03 dlc kernel: ivhd0: Error: completion failed tail:0x9d0, head:0x30.
Aug  7 20:41:03 dlc kernel: ivhd0: Dump last 1 command(s):
Aug  7 20:41:03 dlc kernel:   [CMD0, off:0x20] opcode= 0x1 0x512b029 0x0 0xa5a5
Aug  7 20:41:06 dlc kernel: ivhd0: Error: completion failed tail:0xa80, head:0x30.
Aug  7 20:41:06 dlc kernel: ivhd0: Dump last 1 command(s):
Aug  7 20:41:06 dlc kernel:   [CMD0, off:0x20] opcode= 0x1 0x512b029 0x0 0xa5a5
Aug  7 20:41:06 dlc kernel: ivhd0: Error: completion failed tail:0xb30, head:0x30.
Aug  7 20:41:06 dlc kernel: ivhd0: Dump last 1 command(s):
Aug  7 20:41:06 dlc kernel:   [CMD0, off:0x20] opcode= 0x1 0x512b029 0x0 0xa5a5
Aug  7 20:41:06 dlc kernel: ivhd0: Error: completion failed tail:0xbe0, head:0x30.
Aug  7 20:41:06 dlc kernel: ivhd0: Dump last 1 command(s):
Aug  7 20:41:06 dlc kernel:   [CMD0, off:0x20] opcode= 0x1 0x512b029 0x0 0xa5a5
Comment 10 attila.kover 2023-08-07 21:04:28 UTC
(In reply to Raúl from comment #6)

I'd be eternally grateful if this were sorted out before 14.0 gets released. :)

At first I thought that this (270966) and "mine" (266325) were closely related. However, after looking into the error messages, I've realized we might have two separate issues. At first glance the two problems look somewhat similar, but the difference in the error messages suggests otherwise. It is still possible that the (yet unknown) root cause is the same.
Comment 11 Raúl 2023-08-24 09:33:08 UTC
Tested today's 'FreeBSD 14.0-ALPHA2 amd64 1400096 #1 main-n264998-d9fee1d02178: Thu Aug 24 10:02:38 CEST 2023': no changes.

Also tested on an Intel-based desktop running 13.2-RELEASE-p2; it doesn't exhibit the problem after more than a hundred reboots.

I'll try older releases on the EPYC server.
Comment 12 Raúl 2023-08-24 12:11:03 UTC
'13.1-RELEASE FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 GENERIC amd64' fails at boot 31. I'll try older versions, looking for a 'good' version to bisect from.
Comment 13 attila.kover 2023-08-24 16:47:52 UTC
(In reply to Raúl from comment #12)

You might want to try something pre-12.2. As I've mentioned earlier, this same problem is already present in both 12.2 and 13.0 (see comment #9).
Comment 14 Raúl 2023-08-25 08:26:27 UTC
(In reply to attila.kover from comment #13)

Given the circumstances, there is no easy way to install 12.1, 12.2 or 12.3 on this box; installation fails after boot. It looks like it cannot find the installation media on the virtual storage (a Dell iDRAC v7 on a PowerEdge R7615). I have no physical access or remote hands to plug in storage locally, so there is no quick way to test those versions.

On 12.4 release, passthru fails at reboot 32.

The Intel desktop has kept rebooting the VM with passthru without issue since yesterday: 2377 successful reboots so far, so no problem there.
Comment 15 attila.kover 2023-08-26 15:24:32 UTC
(In reply to Raúl from comment #14)

I've pxe booted a diskless box with a fresh 12.0 and installed vm-bhyve via pkg.
The problem is already present in 12.0


A short snippet from the serial console log of the host after about a dozen vm reboots:

  [CMD82, off:0x540] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD83, off:0x550] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD84, off:0x560] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD85, off:0x570] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD86, off:0x580] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD87, off:0x590] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD88, off:0x5a0] opcode= 0x1 0x3b72829 0x0 0xa5a5
ivhd0: Error: completion failed tail:0x660, head:0x30.
ivhd0: Dump all the commands:
  [CMD0, off:0x20] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD1, off:0x30] opcode= 0x3 0x10 0xe 0x7ffffffffffff003
  [CMD2, off:0x40] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD3, off:0x50] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD4, off:0x60] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD5, off:0x70] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD6, off:0x80] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD7, off:0x90] opcode= 0x1 0x3b72829 0x0 0xa5a5
  [CMD8, off:0xa0] opcode= 0x1 0x3b72829 0x0 0xa5a5
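
To make dumps like the one above easier to scan, the opcode values can be mapped to the command names used in the AMD IOMMU specification. A sketch only: the function names are illustrative, and the parsing assumes the `opcode= 0xN` layout printed in this report.

```shell
# opcode_name: map an AMD IOMMU command opcode to its specification name.
opcode_name() {
  case $(($1)) in
    1) echo COMPLETION_WAIT ;;
    2) echo INVALIDATE_DEVTAB_ENTRY ;;
    3) echo INVALIDATE_IOMMU_PAGES ;;
    4) echo INVALIDATE_IOTLB_PAGES ;;
    5) echo INVALIDATE_INTERRUPT_TABLE ;;
    *) echo UNKNOWN ;;
  esac
}

# summarize_dump: read a command dump on stdin, print a count per opcode name.
summarize_dump() {
  sed -n 's/.*opcode= \(0x[0-9a-fA-F]*\).*/\1/p' |
  while read -r op; do opcode_name "$op"; done |
  sort | uniq -c
}
```

With this mapping, the repeated `opcode= 0x1` entries in the dumps are COMPLETION_WAIT commands, and the `opcode= 0x3` entry reported with the ILLEGAL CMD event is an INVALIDATE_IOMMU_PAGES command. Usage could look like `dmesg | summarize_dump`.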
Comment 16 Santiago Martinez 2023-08-27 15:48:50 UTC
Hi Raul, 

I'm seeing the same issue on an AMD EPYC processor. Checking kernel.org (Linux), it seems they also had issues with AMD-Vi. In the Linux world, many people use iommu=pt to work around this. It is also a known bug in the Red Hat knowledge base.

I'm running a script similar to yours and the server behaves quite erratically.

My script does the following:

- Start and stop a VM with a PCI passthru device 200 times (in this case an SR-IOV VF, but it behaves the same without SR-IOV, or with any other device, including non-network devices).
- After those 200 iterations, reboot the server.
- When the server starts, run the script again.

Sometimes the script can start and stop the VM 200 times, even though I see IVHD errors (command not completed or cmd error). Other times it can only start and stop the VM once, and the server reboots after a few IO_PAGE_FAULT events (something gets corrupted, the NVMe stops responding, and the machine reboots after the command retry timeout).

The server showing the issue is a SuperMicro H12SSW-NT.
- AMD EPYC 7552 48-Core Processor                

I have updated the BIOS to the latest release, since issues with SP3 were mentioned on the Linux forum.

Michael Dexter and I also tried to replicate it on other AMD processors without any success.
- AMD EPYC 7702P 64-Core Processor
- AMD Ryzen 7 3700X 8-Core Processor 
- Ryzen 6800H
Comment 17 Raúl 2023-08-28 11:53:18 UTC
(In reply to attila.kover from comment #15)

So there is no known good version to bisect from.

Thinking about my initial statement, 'since 13.1', it's evident that it was wrong. What I think happened at that time is that we started moving from Xeon E3/E5/E7 to Ryzen/EPYC. Maybe passthru has never worked well on AMD :/
Comment 18 Raúl 2023-08-28 12:40:16 UTC
(In reply to Santiago Martinez from comment #16)

Hi Santiago,

I never thought it could be processor related; I've seen this problem on a 3700X.
Two hundred successful reboots? That looks like a lot to get through without problems.
I'll try on an 'AMD Ryzen 9 5950X 16-Core Processor'.

About strange behavior after a reboot: on that 3700X, after changing the sleep time between polls many times on a Mellanox mlx5 VF from a Linux guest (Ubuntu 22.04 with the NVIDIA OFED driver, DPDK 23.03), the adapter stopped working after a bhyve host reboot. That reboot was not related to the interface, which had been working fine. After the reboot there were no errors at all, on either the PF or the VF, but nothing could be pinged. Rebooting again didn't solve it; the reset button did. Something happened that needed a 'hardware' reset. NICs nowadays are too smart and alive XD. Maybe this is something more to consider for this problem?
Comment 19 Santiago Martinez 2023-08-29 12:20:48 UTC
Hi Raul, 

I got another AMD EPYC server failing with the same error. 

We definitely have something broken with AMD and IOMMU.


Santi
Comment 20 Michael Dexter 2023-10-07 20:25:37 UTC
Raúl,

Thank you for reporting and tracking this.

Perhaps adjust the title to include "AMD IOMMU" with something like:

AMD IOMMU/bhyve/PCI passthru fails with "ivhd, ILLEGAL CMD, IO_PAGE_FAULT"

All the best,

Michael
Comment 21 Raúl 2023-10-07 21:23:22 UTC
(In reply to Michael Dexter from comment #20)

Michael,

thanks a lot for your help.
This is the only thing preventing us from moving to Ryzen/EPYC-based servers, and they are great.
Comment 22 depeo 2024-04-25 16:23:37 UTC
Same issue here with 14.0-RELEASE-p6 + x710 sr-iov.

ivhd0: Error: completion failed tail:0xdc0, head:0x0.
ivhd0: Dump last 1 command(s):
  [CMD0, off:0xff0] opcode= 0x1 0x2696f31 0x0 0xa5a5
ivhd0: Error: completion failed tail:0xe70, head:0x0.
ivhd0: Dump last 1 command(s):
  [CMD0, off:0xff0] opcode= 0x1 0x2696f31 0x0 0xa5a5
ivhd0: Error: completion failed tail:0xf20, head:0x0.
ivhd0: Dump last 1 command(s):
  [CMD0, off:0xff0] opcode= 0x1 0x2696f31 0x0 0xa5a5
ivhd0: Error: completion failed tail:0xfd0, head:0x0.
ivhd0: Dump last 1 command(s):
  [CMD0, off:0xff0] opcode= 0x1 0x2696f31 0x0 0xa5a5
....