Bug 271888

Summary: NVME polled command failed to complete within 10s.
Product: Base System
Reporter: Ed Maste <emaste>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed (Unable to Reproduce)
Severity: Affects Only Me
CC: grahamperrin, imp, markj, thierry
Priority: ---
Version: CURRENT
Hardware: Any
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=272135
          https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=269572
          https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273022
Bug Depends on: 273022
Bug Blocks:

Description: Ed Maste, 2023-06-07 18:22:37 UTC
While testing on some experimental hardware, I hit a panic with the message in the summary. With a patch that turns the panic into a printf, I see the following:

nvme0: <Generic NVMe Device> mem 0xb6300000-0xb6303fff at device 0.0 numa-domain 1 on pci19
nvme0: RECOVERY_START 12364819237 vs 8907038712
nvme0: Completions present in output without an interrupt
NVME polled command failed to complete within 10s. <- downgraded panic
nvme0: nvme_ctrlr_set_num_qpairs failed!
nvme0: failing queued i/o
nvme0: SET FEATURES (09) sqid:0 cid:0 nsid:0 cdw10:00000007 cdw11:007f007f
nvme0: ABORTED - BY REQUEST (00/07) crd:0 m:0 dnr:0 sqid:0 cid:0 cdw0:0
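The summary message comes from a polled wait: the driver spins on a command's completion for up to 10 seconds and panics if none arrives. The sketch below illustrates that pattern in user-space C with the panic replaced by an error return, similar in spirit to the downgraded panic above. It is not the FreeBSD nvme(4) code; `struct poll_status`, `poll_for_completion()` and its `timeout_ms` parameter are hypothetical names chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() and nanosleep() */

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical completion flag.  In a driver, an interrupt handler or a
 * completion-queue scan would set it once the command's entry arrives.
 */
struct poll_status {
	atomic_bool done;
};

/* Current time in milliseconds from a monotonic clock. */
static long long
now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000);
}

/*
 * Spin until status->done is set or timeout_ms elapses.  Returns 0 on
 * completion and -1 on timeout, leaving it to the caller to decide
 * whether a timeout is fatal instead of panicking here.
 */
static int
poll_for_completion(struct poll_status *status, long long timeout_ms)
{
	struct timespec pause = { 0, 1000000 };	/* sleep 1 ms between checks */
	long long deadline;

	assert(status != NULL);
	deadline = now_ms() + timeout_ms;
	while (!atomic_load_explicit(&status->done, memory_order_acquire)) {
		if (now_ms() >= deadline)
			return (-1);
		nanosleep(&pause, NULL);
	}
	return (0);
}
```

Returning an error moves the policy to the caller, which could log the timeout and fail the controller; the log above shows that kind of fallout (`nvme_ctrlr_set_num_qpairs failed!`, queued I/O being failed) once the panic is downgraded.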
Comment 1: Ed Maste, 2023-08-22 21:45:47 UTC
This was caused by the inability to deliver interrupts to CPUs with APIC IDs greater than 255, due to the lack of an AMD IOMMU. After a few fixes and workarounds, plus a hack that avoids assigning interrupts to CPUs with APIC IDs greater than 255, the issue is no longer reproducible. See the PRs listed under "See Also" for the related fixes and workarounds.
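The 255 limit follows from how x86 MSI messages address a CPU: without interrupt remapping, the destination APIC ID occupies an 8-bit field (bits 19:12) of the MSI address, so higher IDs cannot be encoded. The sketch below shows that encoding in physical destination mode, plus a simple target-selection filter in the spirit of the hack described above. The macro and function names are hypothetical; the sketch ignores the redirection hint, logical destination mode and the extended destination ID scheme some hypervisors use, and it is neither the FreeBSD code nor the actual workaround.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSI_ADDR_BASE		0xFEE00000u	/* fixed x86 MSI address window */
#define MSI_ADDR_DEST_SHIFT	12		/* destination APIC ID, bits 19:12 */
#define MSI_MAX_DEST_ID		0xFFu		/* 8-bit field without remapping */

/*
 * Compose an MSI address (physical mode, no redirection hint) targeting
 * apic_id.  Returns 0 on success, or -1 if the ID does not fit in the
 * 8-bit destination field and would need interrupt remapping.
 */
static int
msi_compose_addr(uint32_t apic_id, uint32_t *addr)
{
	assert(addr != NULL);
	if (apic_id > MSI_MAX_DEST_ID)
		return (-1);
	*addr = MSI_ADDR_BASE | (apic_id << MSI_ADDR_DEST_SHIFT);
	return (0);
}

/*
 * Hypothetical stand-in for the workaround: starting at index 'start'
 * and wrapping around, return the first CPU whose APIC ID is reachable
 * through plain MSI, or -1 if no CPU qualifies.
 */
static int
pick_msi_target(const uint32_t *apic_ids, size_t ncpus, size_t start)
{
	for (size_t i = 0; i < ncpus; i++) {
		size_t cpu = (start + i) % ncpus;

		if (apic_ids[cpu] <= MSI_MAX_DEST_ID)
			return ((int)cpu);
	}
	return (-1);
}
```

For example, APIC ID 0x12 encodes as address 0xFEE12000, while APIC ID 256 is rejected; a filter like `pick_msi_target()` simply skips such CPUs, which keeps interrupts deliverable at the cost of never steering them to the high-numbered CPUs.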