Created attachment 207200 [details] nvme resume patch CURRENT r351747 fixes the 30s delay after resume for some users. 12.0-STABLE currently does not resume properly for me due to some other problem that I have not investigated, but unrelated to nvme (12.0-RELEASE works, but with the nvme timeout delay). I applied the changes to 12.0-STABLE and then copied the entire sys/dev/nvme to my 12.0-RELEASE src tree. After rebuild and install, the nvme timeout problem is fixed. I assume this would work just as well on full 12.0-STABLE, but I can't test this just yet.
Note: base r351747 has "MFC After: 1 week" noted in the commit log message, so MFC is anticipated. ^Triage: Assign to base r351747 committer
A commit references this bug: Author: imp Date: Thu Sep 5 23:40:39 UTC 2019 New revision: 351914 URL: https://svnweb.freebsd.org/changeset/base/351914 Log: MFC r351747: Implement nvme suspend / resume for pci attachment Note: this is merged ~9 hours early due to a desire to have it in before feature freeze in 20 minutes. Several reports of it working in current (and one that it worked in -stable after copying all of -current's nvme driver) gives me confidence that bending the rules a little here is the right trade-off. PR: 240340 Relnotes: Yes Changes: _U stable/12/ stable/12/sys/dev/nvme/nvme.c stable/12/sys/dev/nvme/nvme_ctrlr.c stable/12/sys/dev/nvme/nvme_pci.c stable/12/sys/dev/nvme/nvme_private.h
Merge to 12. Will not merge to 11. It's too much work: the bigendian changes made the driver too different and we can't MFC that change because it totally breaks compatibility. This change depends on other changes and I simply do not have the time to sort it all out.
(No relevance to 11, but) Minor clarification: I am using -stable's nvme driver on 12-release, not -current's driver on -stable. Great to see it has made it in in time for 12.1 release.
Theron: if you could test out the now-stable driver on 12.0, that would be great. The suspend / resume code follows the standard-recommended sequence now to shutdown the card and to restart it and rebuild all the host-initialized data structures in the card before continuing. 95% of the issues likely would heave been resolved by tossing a nvme_ctlr_hw_reset() in the resume path, but those other 5% of cases can lead to data corruption...
(In reply to Warner Losh from comment #5) "test out the now-stable driver on 12.0" I'm not exactly sure whether you meant to test on 12.0-RELEASE or the latest -stable (which is now 12.1). Now I am running -stable (12.1-PRERELEASE #0 r352026M) with working resume (my previous problem was from foolishly leaving -release tree in /usr/src and rebuilding drm-kmod against it). The nvme timeout problem appears fully fixed. No timeout after resume, and no nvme messages from kernel, even if suspending in the middle of something like "dd if=/dev/urandom of=temp bs=1".
Unfortunately the problem in 12-STABLE / 12.1-RELEASE is not actually fixed: I just tried again with (FreeBSD 12.1-PRERELEASE #6 r352026M) (old, but the one I said I tested before) and just had the problem. It is an intermittent issue, so I guess I just happened to not encounter it after several tests before.