Testing on some experimental hardware I observed a panic with the subject above. With a patch to turn the panic into a printf I can see:

nvme0: <Generic NVMe Device> mem 0xb6300000-0xb6303fff at device 0.0 numa-domain 1 on pci19
nvme0: RECOVERY_START 12364819237 vs 8907038712
nvme0: Completions present in output without an interrupt
NVME polled command failed to complete within 10s. <- downgraded panic
nvme0: nvme_ctrlr_set_num_qpairs failed!
nvme0: failing queued i/0
nvme0: SET FEATURES (09) sqid:0 cid:0 nsid:0 cdw10:00000007 cdw11:007f007f
nvme0: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:0 cid:0 cdw0:0
The panic was caused by interrupts not being deliverable to APIC IDs > 255 (due to the lack of an AMD IOMMU). After a few fixes/workarounds, plus a hack to avoid assigning interrupts to CPUs with an APIC ID > 255, the issue is no longer reproducible. See the "See Also" PRs for the related fixes/workarounds.
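
For illustration only, here is a rough sketch of the kind of check the hack amounts to. It is not the actual patch, and the helper names (apic_id_msi_reachable, filter_intr_cpus) are made up for this example. It assumes the standard x86 MSI address format, where the destination ID field is 8 bits wide when interrupt remapping is not available, so a device cannot target an APIC ID above 255.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Largest APIC ID that fits in the 8-bit MSI destination field (assumption:
 * no interrupt remapping, plain x86 MSI address format). */
#define MSI_MAX_APIC_ID 255u

/* Hypothetical helper: can a non-remapped MSI be aimed at this APIC ID? */
static bool
apic_id_msi_reachable(uint32_t apic_id)
{
	return (apic_id <= MSI_MAX_APIC_ID);
}

/*
 * Hypothetical helper: given the APIC ID of each CPU, store the indices of
 * the CPUs that are safe interrupt targets in out[] and return how many
 * were stored. out[] must have room for ncpus entries.
 */
static size_t
filter_intr_cpus(const uint32_t *apic_ids, size_t ncpus, size_t *out)
{
	size_t n = 0;

	for (size_t i = 0; i < ncpus; i++) {
		if (apic_id_msi_reachable(apic_ids[i]))
			out[n++] = i;
	}
	return (n);
}
```

With something like this in place, interrupt assignment would only pick from the CPUs returned in out[], so the device never sends an interrupt to a CPU that cannot receive it.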