Bug 240340 - MFC r351747 Implement nvme suspend / resume for pci attachment
Summary: MFC r351747 Implement nvme suspend / resume for pci attachment
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 12.0-STABLE
Hardware: amd64 Any
: --- Affects Some People
Assignee: Warner Losh
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-05 01:40 UTC by Theron Tarigo
Modified: 2019-11-27 06:11 UTC (History)
1 user (show)

See Also:
imp: mfc-stable12+
koobs: mfc-stable11-


Attachments
nvme resume patch (8.40 KB, patch)
2019-09-05 01:40 UTC, Theron Tarigo
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Theron Tarigo 2019-09-05 01:40:48 UTC
Created attachment 207200 [details]
nvme resume patch

CURRENT r351747 fixes the 30s delay after resume for some users.

12.0-STABLE currently does not resume properly for me due to some other problem that I have not investigated, but unrelated to nvme (12.0-RELEASE works, but with the nvme timeout delay).

I applied the changes to 12.0-STABLE and then copied the entire sys/dev/nvme to my 12.0-RELEASE src tree.  After rebuild and install, the nvme timeout problem is fixed.  I assume this would work just as well on full 12.0-STABLE, but I can't test this just yet.
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2019-09-05 12:01:28 UTC
Note: base r351747 has "MFC After: 1 week" noted in the commit log message, so MFC is anticipated.

^Triage: Assign to base r351747 committer
Comment 2 commit-hook freebsd_committer freebsd_triage 2019-09-05 23:40:57 UTC
A commit references this bug:

Author: imp
Date: Thu Sep  5 23:40:39 UTC 2019
New revision: 351914
URL: https://svnweb.freebsd.org/changeset/base/351914

Log:
  MFC r351747:

    Implement nvme suspend / resume for pci attachment

  Note: this is merged ~9 hours early due to a desire to have it in before feature
  freeze in 20 minutes. Several reports of it working in current (and one that it
  worked in -stable after copying all of -current's nvme driver) gives me
  confidence that bending the rules a little here is the right trade-off.

  PR: 240340
  Relnotes: Yes

Changes:
_U  stable/12/
  stable/12/sys/dev/nvme/nvme.c
  stable/12/sys/dev/nvme/nvme_ctrlr.c
  stable/12/sys/dev/nvme/nvme_pci.c
  stable/12/sys/dev/nvme/nvme_private.h
Comment 3 Warner Losh freebsd_committer freebsd_triage 2019-09-06 14:50:49 UTC
Merge to 12. Will not merge to 11. It's too much work: the bigendian changes made the driver too different and we can't MFC that change because it totally breaks compatibility. This change depends on other changes and I simply do not have the time to sort it all out.
Comment 4 Theron Tarigo 2019-09-06 14:59:41 UTC
(No relevance to 11, but)
Minor clarification: I am using -stable's nvme driver on 12-release, not -current's driver on -stable.

Great to see it has made it in in time for 12.1 release.
Comment 5 Warner Losh freebsd_committer freebsd_triage 2019-09-06 15:07:08 UTC
Theron: if you could test out the now-stable driver on 12.0, that would be great. The suspend / resume code follows the standard-recommended sequence now to shutdown the card and to restart it and rebuild all the host-initialized data structures in the card before continuing. 95% of the issues likely would heave been resolved by tossing a nvme_ctlr_hw_reset() in the resume path, but those other 5% of cases can lead to data corruption...
Comment 6 Theron Tarigo 2019-09-08 01:44:44 UTC
(In reply to Warner Losh from comment #5)
"test out the now-stable driver on 12.0"
I'm not exactly sure whether you meant to test on 12.0-RELEASE or the latest -stable (which is now 12.1).

Now I am running -stable (12.1-PRERELEASE #0 r352026M) with working resume (my previous problem was from foolishly leaving -release tree in /usr/src and rebuilding drm-kmod against it).

The nvme timeout problem appears fully fixed.  No timeout after resume, and no nvme messages from kernel, even if suspending in the middle of something like "dd if=/dev/urandom of=temp bs=1".
Comment 7 Theron Tarigo 2019-11-27 06:11:13 UTC
Unfortunately the problem in 12-STABLE / 12.1-RELEASE is not actually fixed: I just tried again with (FreeBSD 12.1-PRERELEASE #6 r352026M) (old, but the one I said I tested before) and just had the problem.  It is an intermittent issue, so I guess I just happened to not encounter it after several tests before.