Bug 232466

Summary: nvme is not shut properly for suspend/resume
Product: Base System
Reporter: Poul-Henning Kamp <phk>
Component: kern
Assignee: freebsd-scsi (Nobody) <scsi>
Status: Closed FIXED
Severity: Affects Many People
CC: bsd, bsdimp, bwidawsk, cem, dch, freebsdbugs, imp, ports, rudolphfroger, scottl
Priority: ---    
Version: CURRENT   
Hardware: amd64   
OS: Any   
Attachments:
   Description: Test patch for freezing devq
   Flags: none

Description Poul-Henning Kamp freebsd_committer freebsd_triage 2018-10-20 08:49:17 UTC
On a Thinkpad T480, trying to suspend/resume results in these messages on resume:

   nvme0: Resetting controller due to a timeout.
   nvme0: resetting controller
   nvme0: aborting outstanding i/o
   nvme0: READ sqid:1 cid:32 nsid:1 lba:75554599 len:8
   nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:32 cdw0:0
   nvme0: aborting outstanding i/o
   nvme0: WRITE sqid:1 cid:39 nsid:1 lba:43225663 len:64
   nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:39 cdw0:0
   nvme0: aborting outstanding i/o

Sometimes there is also filesystem corruption.

I also see increments to "Unsafe Shutdowns" in smartctl -a.

Various Google searches indicate that this may be a general problem for modern laptops.

Most recently seen on -current r339250M.
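
To check whether a given suspend/resume cycle bumped the "Unsafe Shutdowns" counter mentioned above, the smartctl -a output can be compared before and after. The following is a minimal sketch, not part of the original report: the helper name unsafe_shutdowns and the sample text are made up for illustration, and it assumes the counter appears on a line beginning with "Unsafe Shutdowns:".

```python
import re


def unsafe_shutdowns(smartctl_output):
    """Return the Unsafe Shutdowns counter from smartctl -a text, or None."""
    m = re.search(r"^Unsafe Shutdowns:\s*([\d,]+)", smartctl_output, re.MULTILINE)
    # The value may carry thousands separators, so strip commas first.
    return int(m.group(1).replace(",", "")) if m else None


# Made-up sample text standing in for captures taken before and after a
# suspend/resume cycle; real output contains many more lines.
before = "Power On Hours:                     1,204\nUnsafe Shutdowns:                   41\n"
after = "Power On Hours:                     1,204\nUnsafe Shutdowns:                   42\n"

delta = unsafe_shutdowns(after) - unsafe_shutdowns(before)
print("unsafe shutdowns added by this suspend/resume cycle:", delta)
```

A nonzero delta suggests the controller lost power without being shut down first.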
Comment 1 Ben Widawsky freebsd_committer freebsd_triage 2018-10-23 18:39:38 UTC
Created attachment 198507 [details]
Test patch for freezing devq

I don't think this is the right fix (according to Warner), but it looks like it should be needed in addition to the correct fix.
Comment 2 Warner Losh freebsd_committer freebsd_triage 2018-10-23 19:11:45 UTC
The devq freezing is racy.
Comment 3 Ben Widawsky freebsd_committer freebsd_triage 2018-10-23 20:15:41 UTC
Could you explain how, for my edification? (Not arguing, I just don't see it.)
Comment 4 Scott Long freebsd_committer freebsd_triage 2018-10-23 21:14:23 UTC
Calling cam_freeze_devq() doesn't guarantee that nothing is in-flight when it returns.  A lesser problem is that it (and xpt_freeze_simq()) also doesn't prevent sending CCBs to the sim that aren't flagged with XPT_FC_QUEUED.
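
The first point can be illustrated with a toy model. This is a sketch under stated assumptions, not the CAM code: ToyDeviceQueue, its methods, and safe_to_suspend are hypothetical names, and "freeze" here only models the property described above, namely that freezing stops further dispatch but does not wait for requests already handed to the hardware.

```python
from collections import deque


class ToyDeviceQueue:
    """Toy model of a queue feeding a controller; not the CAM API."""

    def __init__(self):
        self.pending = deque()  # requests not yet handed to the hardware
        self.in_flight = []     # requests the hardware is still working on
        self.frozen = False

    def submit(self, req):
        self.pending.append(req)

    def dispatch(self):
        # Hand pending requests to the hardware unless the queue is frozen.
        while self.pending and not self.frozen:
            self.in_flight.append(self.pending.popleft())

    def freeze(self):
        # Only blocks future dispatch; returns without waiting for in_flight.
        self.frozen = True

    def drain(self):
        # Hypothetical extra step: let the hardware finish everything.
        self.in_flight.clear()


def safe_to_suspend(q):
    # Powering down is only safe once nothing is outstanding on the device.
    return not q.in_flight


q = ToyDeviceQueue()
q.submit("READ cid:32")
q.submit("WRITE cid:39")
q.dispatch()             # both requests are now in flight
q.freeze()
q.submit("WRITE cid:40")
q.dispatch()             # held back by the freeze

safe_after_freeze = safe_to_suspend(q)  # False: two requests still in flight
q.drain()
safe_after_drain = safe_to_suspend(q)   # True: nothing outstanding
print(safe_after_freeze, safe_after_drain)
```

In this model the freeze keeps the third request off the device, but the two already dispatched are still outstanding when freeze() returns, which matches the aborted READ and WRITE seen in the resume log.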
Comment 5 Ivan 2018-12-29 12:30:41 UTC
Is this (racy) fix better than no fix at all?
I get these messages on resume, but after a 10-second freeze the system resumes normal operation. That's OK for me as long as it doesn't damage ZFS.
Comment 6 Warner Losh freebsd_committer freebsd_triage 2021-07-09 18:22:05 UTC
This has been fixed in two ways. First, we now properly shut down the controller for suspend/resume, which ensures that all I/O is drained. Second, before we get to the shutdown, the system now syncs all mounted filesystems, ensuring a stable point.
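
The ordering described in this comment can be sketched as follows. This is an illustrative model only, not the nvme(4) driver or the kernel suspend path: ToyController, suspend, and the block labels are hypothetical, and the model assumes a shutdown counts as clean only when no I/O is outstanding.

```python
class ToyController:
    """Toy stand-in for an NVMe controller; not the nvme(4) driver."""

    def __init__(self):
        self.in_flight = []
        self.written = []
        self.clean_shutdown = False

    def submit(self, block):
        self.in_flight.append(block)

    def drain(self):
        # Complete every outstanding request, oldest first.
        while self.in_flight:
            self.written.append(self.in_flight.pop(0))

    def shutdown(self):
        # Modeled as clean only if nothing is still outstanding.
        self.clean_shutdown = not self.in_flight


def suspend(dirty_blocks, ctrlr):
    # Step 1: sync filesystems, turning dirty buffers into writes.
    while dirty_blocks:
        ctrlr.submit(dirty_blocks.pop(0))
    # Step 2: drain outstanding I/O, then shut the controller down.
    ctrlr.drain()
    ctrlr.shutdown()
    return ctrlr.clean_shutdown


ctrlr = ToyController()
dirty = ["metadata block", "data block"]
ok = suspend(dirty, ctrlr)
print(ok, ctrlr.written)
```

Syncing first means the filesystem is at a stable point even if a later step fails; draining before the shutdown means the controller never loses power with requests outstanding.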