Bug 232466 - nvme is not shut properly for suspend/resume
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-scsi (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-10-20 08:49 UTC by Poul-Henning Kamp
Modified: 2021-07-09 18:22 UTC
CC List: 10 users

See Also:


Attachments
Test patch for freezing devq (3.49 KB, patch)
2018-10-23 18:39 UTC, Ben Widawsky

Description Poul-Henning Kamp freebsd_committer freebsd_triage 2018-10-20 08:49:17 UTC
On a Thinkpad T480, trying to suspend/resume results in these messages on resume:

   nvme0: Resetting controller due to a timeout.
   nvme0: resetting controller
   nvme0: aborting outstanding i/o
   nvme0: READ sqid:1 cid:32 nsid:1 lba:75554599 len:8
   nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:32 cdw0:0
   nvme0: aborting outstanding i/o
   nvme0: WRITE sqid:1 cid:39 nsid:1 lba:43225663 len:64
   nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:39 cdw0:0
   nvme0: aborting outstanding i/o

Sometimes there is also filesystem corruption.

I also see increments to "Unsafe Shutdowns" in smartctl -a.

Various Google searches indicate that this may be a general problem for modern laptops.

Most recently seen on -current r339250M.
Comment 1 Ben Widawsky freebsd_committer freebsd_triage 2018-10-23 18:39:38 UTC
Created attachment 198507
Test patch for freezing devq

I don't think this is the right fix (according to Warner), but it looks like it should be needed in addition to the correct fix.
Comment 2 Warner Losh freebsd_committer freebsd_triage 2018-10-23 19:11:45 UTC
The devq freezing is racy.
Comment 3 Ben Widawsky freebsd_committer freebsd_triage 2018-10-23 20:15:41 UTC
Could you explain how, for my edification? (Not arguing, I just don't see it.)
Comment 4 Scott Long freebsd_committer freebsd_triage 2018-10-23 21:14:23 UTC
Calling cam_freeze_devq() doesn't guarantee that nothing is in-flight when it returns.  A lesser problem is that it (and xpt_freeze_simq()) also doesn't prevent sending CCBs to the sim that aren't flagged with XPT_FC_QUEUED.
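
To illustrate the first point, here is a minimal toy model in C. It is not the CAM code: `struct toy_devq` and every `toy_*` function are hypothetical names made up for this sketch. It only shows why a freeze flag that blocks new dispatches says nothing about requests that were already handed to the hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a device queue; these are NOT the CAM data structures. */
struct toy_devq {
    bool frozen;     /* set by freeze: no new requests are dispatched */
    int  in_flight;  /* requests already handed to the hardware */
};

/* Dispatch one request; returns false if the queue is frozen. */
static bool toy_submit(struct toy_devq *q)
{
    if (q->frozen)
        return false;
    q->in_flight++;
    return true;
}

/* Hardware completion of one previously dispatched request. */
static void toy_complete(struct toy_devq *q)
{
    assert(q->in_flight > 0);
    q->in_flight--;
}

/* Freezing only blocks new dispatches; it does not wait for old ones. */
static void toy_freeze(struct toy_devq *q)
{
    q->frozen = true;
}

/* Safe to power down only when frozen AND nothing is outstanding. */
static bool toy_safe_to_suspend(const struct toy_devq *q)
{
    return q->frozen && q->in_flight == 0;
}
```

In this model, calling `toy_freeze()` after two successful `toy_submit()` calls leaves `in_flight` at 2, so `toy_safe_to_suspend()` stays false until both completions arrive. A suspend path that only freezes would therefore have to wait for the outstanding requests as a separate step.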
Comment 5 Ivan 2018-12-29 12:30:41 UTC
Is this (racy) fix better than no fix at all? 
I get these messages on resume, but after a 10-second freeze the system resumes normal operation. That's OK for me as long as it doesn't damage ZFS.
Comment 6 Warner Losh freebsd_committer freebsd_triage 2021-07-09 18:22:05 UTC
This has been fixed in two ways. First, we properly shut down the controller for suspend / resume. This ensures that all I/O is drained. Second, before we get to the shutdown, the system now syncs all mounted filesystems, ensuring a stable point.
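
The ordering described above can be sketched with a small C model. This is an illustration under stated assumptions, not the actual kernel change: `struct toy_system` and the `toy_*` functions are hypothetical, and draining is shown as its own step for clarity even though the comment attributes it to the controller shutdown itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; field and function names are illustrative only. */
struct toy_system {
    int  dirty_buffers;   /* filesystem data not yet issued to the drive */
    int  in_flight;       /* commands the controller has not completed */
    bool ctrlr_shut_down; /* controller was told to shut down cleanly */
};

/* Step 1: sync filesystems, turning dirty buffers into issued writes. */
static void toy_sync_filesystems(struct toy_system *s)
{
    s->in_flight += s->dirty_buffers;
    s->dirty_buffers = 0;
}

/* Step 2: wait until every issued command has completed. */
static void toy_drain_io(struct toy_system *s)
{
    while (s->in_flight > 0)
        s->in_flight--;   /* stands in for waiting on one completion */
    assert(s->in_flight == 0);
}

/* Step 3: request a normal controller shutdown; refuses if I/O remains. */
static bool toy_shutdown_controller(struct toy_system *s)
{
    if (s->dirty_buffers != 0 || s->in_flight != 0)
        return false;     /* powering off here would be an unsafe shutdown */
    s->ctrlr_shut_down = true;
    return true;
}

/* Suspend path in the order described in the comment above. */
static bool toy_suspend(struct toy_system *s)
{
    toy_sync_filesystems(s);
    toy_drain_io(s);
    return toy_shutdown_controller(s);
}
```

In this model, calling `toy_shutdown_controller()` directly on a system with dirty buffers or outstanding commands is refused, which corresponds to the state that produced the aborted I/O in the original report. Going through `toy_suspend()` reaches the shutdown only after the sync and the drain.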