Created attachment 207200 [details]
nvme resume patch
CURRENT r351747 fixes the 30s delay after resume for some users.
12.0-STABLE currently does not resume properly for me due to some other problem that I have not investigated, but unrelated to nvme (12.0-RELEASE works, but with the nvme timeout delay).
I applied the changes to 12.0-STABLE and then copied the entire sys/dev/nvme to my 12.0-RELEASE src tree. After rebuild and install, the nvme timeout problem is fixed. I assume this would work just as well on full 12.0-STABLE, but I can't test this just yet.
Note: base r351747 has "MFC After: 1 week" noted in the commit log message, so MFC is anticipated.
^Triage: Assign to base r351747 committer
A commit references this bug:
Date: Thu Sep 5 23:40:39 UTC 2019
New revision: 351914
Implement nvme suspend / resume for pci attachment
Note: this is merged ~9 hours early due to a desire to have it in before feature
freeze in 20 minutes. Several reports of it working in current (and one that it
worked in -stable after copying all of -current's nvme driver) gives me
confidence that bending the rules a little here is the right trade-off.
Merge to 12. Will not merge to 11. It's too much work: the bigendian changes made the driver too different and we can't MFC that change because it totally breaks compatibility. This change depends on other changes and I simply do not have the time to sort it all out.
(No relevance to 11, but)
Minor clarification: I am using -stable's nvme driver on 12-release, not -current's driver on -stable.
Great to see it has made it in in time for 12.1 release.
Theron: if you could test out the now-stable driver on 12.0, that would be great. The suspend / resume code follows the standard-recommended sequence now to shutdown the card and to restart it and rebuild all the host-initialized data structures in the card before continuing. 95% of the issues likely would heave been resolved by tossing a nvme_ctlr_hw_reset() in the resume path, but those other 5% of cases can lead to data corruption...
(In reply to Warner Losh from comment #5)
"test out the now-stable driver on 12.0"
I'm not exactly sure whether you meant to test on 12.0-RELEASE or the latest -stable (which is now 12.1).
Now I am running -stable (12.1-PRERELEASE #0 r352026M) with working resume (my previous problem was from foolishly leaving -release tree in /usr/src and rebuilding drm-kmod against it).
The nvme timeout problem appears fully fixed. No timeout after resume, and no nvme messages from kernel, even if suspending in the middle of something like "dd if=/dev/urandom of=temp bs=1".
Unfortunately the problem in 12-STABLE / 12.1-RELEASE is not actually fixed: I just tried again with (FreeBSD 12.1-PRERELEASE #6 r352026M) (old, but the one I said I tested before) and just had the problem. It is an intermittent issue, so I guess I just happened to not encounter it after several tests before.