Bug 271888 - NVME polled command failed to complete within 10s.
Status: Closed Unable to Reproduce
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on: 273022
Blocks:
Reported: 2023-06-07 18:22 UTC by Ed Maste
Modified: 2023-08-22 21:45 UTC

See Also:


Description Ed Maste freebsd_committer freebsd_triage 2023-06-07 18:22:37 UTC
Testing on some experimental hardware, I observed a panic with the subject above. With a patch to turn the panic into a printf, I can see:

nvme0: <Generic NVMe Device> mem 0xb6300000-0xb6303fff at device 0.0 numa-domain 1 on pci19
nvme0: RECOVERY_START 12364819237 vs 8907038712
nvme0: Completions present in output without an interrupt
NVME polled command failed to complete within 10s. <- downgraded panic
nvme0: nvme_ctrlr_set_num_qpairs failed!
nvme0: failing queued i/o
nvme0: SET FEATURES (09) sqid:0 cid:0 nsid:0 cdw10:00000007 cdw11:007f007f
nvme0: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:0 cid:0 cdw0:0
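
For readers decoding the failed command above: opcode 09 is Set Features, and cdw10 = 0x07 selects the Number of Queues feature, which matches the nvme_ctrlr_set_num_qpairs failure. In that feature, cdw11 carries two zero-based 16-bit counts (submission queues in the low half, completion queues in the high half). A minimal sketch of the decode follows; the struct and function names are hypothetical and are not taken from the FreeBSD driver.

```c
#include <stdint.h>

/* Hypothetical container for the decoded request (not a driver type). */
struct num_queues_req {
	uint8_t  fid;	/* feature identifier from cdw10 bits 7:0 */
	uint32_t nsq;	/* I/O submission queues requested (1-based) */
	uint32_t ncq;	/* I/O completion queues requested (1-based) */
};

/*
 * Decode the cdw10/cdw11 pair of a Set Features (Number of Queues)
 * command. The on-wire counts are zero-based, so 1 is added to each.
 */
static struct num_queues_req
decode_set_num_queues(uint32_t cdw10, uint32_t cdw11)
{
	struct num_queues_req r;

	r.fid = (uint8_t)(cdw10 & 0xff);
	r.nsq = (cdw11 & 0xffff) + 1;
	r.ncq = ((cdw11 >> 16) & 0xffff) + 1;
	return (r);
}
```

Applied to the values in the log (cdw10:00000007 cdw11:007f007f), this yields feature 0x07 with 128 submission and 128 completion queues requested.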
Comment 1 Ed Maste freebsd_committer freebsd_triage 2023-08-22 21:45:47 UTC
This was caused by being unable to deliver interrupts to CPUs with APIC IDs > 255 (due to the lack of an AMD IOMMU). After a few fixes/workarounds, plus a hack to avoid assigning interrupts to CPUs with an APIC ID > 255, this issue is no longer reproducible (see the "See Also" PRs for related fixes/workarounds).
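
To illustrate the kind of workaround described, here is a minimal userland sketch of a round-robin interrupt CPU picker that skips CPUs whose APIC ID is above 255. It is not the actual FreeBSD change; the function names, the apic_ids array, and the cursor argument are all assumptions made for the example. The 255 limit is taken from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Highest APIC ID assumed reachable without interrupt remapping
 * (threshold taken from the bug comment, not from driver source). */
#define MAX_UNREMAPPED_APIC_ID 255u

/* Hypothetical helper: nonzero if interrupts can target this APIC ID. */
static int
apic_id_reachable(uint32_t apic_id)
{
	return (apic_id <= MAX_UNREMAPPED_APIC_ID);
}

/*
 * Hypothetical round-robin picker. apic_ids[i] is the APIC ID of CPU i.
 * *cursor is the CPU index to start searching from; on success it is
 * advanced past the chosen CPU. Returns the chosen CPU index, or -1 if
 * no CPU has a reachable APIC ID.
 */
static int
pick_intr_cpu(const uint32_t *apic_ids, size_t ncpus, size_t *cursor)
{
	assert(apic_ids != NULL && cursor != NULL);

	for (size_t n = 0; n < ncpus; n++) {
		size_t cpu = (*cursor + n) % ncpus;

		if (apic_id_reachable(apic_ids[cpu])) {
			*cursor = (cpu + 1) % ncpus;
			return ((int)cpu);
		}
	}
	return (-1);
}
```

With APIC IDs {0, 256, 2, 300} and a cursor starting at 0, successive calls pick CPU 0, then CPU 2, then CPU 0 again; CPUs 1 and 3 are never chosen. Returning -1 when nothing is reachable leaves the fallback policy to the caller.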