Bug 232466 - nvme is not shut properly for suspend/resume

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern
Version:	CURRENT
Hardware:	amd64 Any

Importance:	--- Affects Many People
Assignee:	freebsd-scsi (Nobody)

URL:
Keywords:

Depends on:
Blocks:

Reported:	2018-10-20 08:49 UTC by Poul-Henning Kamp
Modified:	2021-07-09 18:22 UTC
CC List:	10 users

See Also:

Attachments
Test patch for freezing devq (3.49 KB, patch) 2018-10-23 18:39 UTC, Ben Widawsky

Description Poul-Henning Kamp freebsd_committer

2018-10-20 08:49:17 UTC

On a Thinkpad T480, trying to suspend/resume results in these messages on resume:

   nvme0: Resetting controller due to a timeout.
   nvme0: resetting controller
   nvme0: aborting outstanding i/o
   nvme0: READ sqid:1 cid:32 nsid:1 lba:75554599 len:8
   nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:32 cdw0:0
   nvme0: aborting outstanding i/o
   nvme0: WRITE sqid:1 cid:39 nsid:1 lba:43225663 len:64
   nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:39 cdw0:0
   nvme0: aborting outstanding i/o

Sometimes there is also filesystem corruption.

I also see increments to "Unsafe Shutdowns" in smartctl -a

Various Google searches indicate that this may be a general problem for modern laptops.

Most recently seen on -current r339250M

Comment 1 Ben Widawsky freebsd_committer

2018-10-23 18:39:38 UTC

Created attachment 198507
Test patch for freezing devq

I don't think this is the right fix (according to Warner), but it looks like it should be needed in addition to the correct fix.

Comment 2 Warner Losh freebsd_committer

2018-10-23 19:11:45 UTC

The devq freezing is racy.

Comment 3 Ben Widawsky freebsd_committer

2018-10-23 20:15:41 UTC

Could you explain how, for my edification? (Not arguing, I just don't see it.)

Comment 4 Scott Long freebsd_committer

2018-10-23 21:14:23 UTC

Calling cam_freeze_devq() doesn't guarantee that nothing is in-flight when it returns.  A lesser problem is that it (and xpt_freeze_simq()) also doesn't prevent sending CCBs to the sim that aren't flagged with XPT_FC_QUEUED.
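
The distinction between freezing and draining can be illustrated with a toy model. This is a minimal sketch, not the CAM code: `toy_queue`, `toy_submit`, `toy_freeze`, `toy_complete_one` and `toy_drain` are hypothetical names invented here. It only shows that refusing new submissions leaves already-issued requests outstanding until something explicitly waits for them.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a device queue; these are NOT the CAM data structures. */
struct toy_queue {
    bool frozen;    /* new submissions are refused while set */
    int  inflight;  /* requests already handed to the hardware */
};

/* Submit one request; fails once the queue is frozen. */
static bool toy_submit(struct toy_queue *q)
{
    if (q->frozen)
        return false;
    q->inflight++;
    return true;
}

/* Freezing only blocks future submissions; inflight is left untouched. */
static void toy_freeze(struct toy_queue *q)
{
    q->frozen = true;
}

/* Model the hardware completing one outstanding request. */
static void toy_complete_one(struct toy_queue *q)
{
    assert(q->inflight > 0);
    q->inflight--;
}

/* Draining: keep going until nothing is outstanding any more. */
static void toy_drain(struct toy_queue *q)
{
    while (q->inflight > 0)
        toy_complete_one(q);
}
```

In this model, two requests submitted before `toy_freeze()` are still counted in `inflight` after it returns; only `toy_drain()` brings the count to zero. A suspend path that powers the device down between those two points would cut off live I/O.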

Comment 5 Ivan 2018-12-29 12:30:41 UTC

Is this (racy) fix better than no fix at all?
I get these messages on resume, but after a 10-second freeze the system resumes normal operation. That's OK for me as long as it doesn't damage ZFS.

Comment 6 Warner Losh freebsd_committer

2021-07-09 18:22:05 UTC

This has been fixed in two ways. First, we now properly shut down the controller for suspend/resume, which ensures that all I/O is drained. Second, before we get to the shutdown, the system now syncs all mounted filesystems, ensuring a stable point.
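
The ordering described above (sync first, then shut the controller down) can be sketched against a simulated controller. This is an illustration only and not the nvme(4) driver code: the `sim_*` names, the `dirty_buffers` counter and the poll limit are assumptions made for the example. The register fields are the ones the NVMe base specification defines for shutdown: CC.SHN (bits 15:14, 01b for a normal shutdown) and CSTS.SHST (bits 3:2, 10b when shutdown processing is complete).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field encodings from the NVMe base specification. */
#define CC_SHN_SHIFT    14
#define CC_SHN_MASK     (0x3u << CC_SHN_SHIFT)
#define CC_SHN_NORMAL   (0x1u << CC_SHN_SHIFT)
#define CSTS_SHST_SHIFT 2
#define CSTS_SHST_MASK  (0x3u << CSTS_SHST_SHIFT)
#define CSTS_SHST_DONE  (0x2u << CSTS_SHST_SHIFT)

/* Simulated controller registers (hypothetical, for illustration). */
struct sim_ctrlr {
    uint32_t cc;
    uint32_t csts;
    int      polls_left;  /* reads needed before shutdown completes */
};

/* Simulated system: unsynced filesystem data plus one controller. */
struct sim_system {
    int              dirty_buffers;
    struct sim_ctrlr ctrlr;
};

/* Reading CSTS advances the simulated shutdown once SHN was requested. */
static uint32_t sim_read_csts(struct sim_ctrlr *c)
{
    if ((c->cc & CC_SHN_MASK) == CC_SHN_NORMAL) {
        if (c->polls_left > 0)
            c->polls_left--;
        if (c->polls_left == 0)
            c->csts = (c->csts & ~CSTS_SHST_MASK) | CSTS_SHST_DONE;
    }
    return c->csts;
}

/* Request a normal shutdown and poll until the controller reports done. */
static bool sim_shutdown(struct sim_ctrlr *c, int max_polls)
{
    assert(max_polls > 0);
    c->cc = (c->cc & ~CC_SHN_MASK) | CC_SHN_NORMAL;
    for (int i = 0; i < max_polls; i++) {
        if ((sim_read_csts(c) & CSTS_SHST_MASK) == CSTS_SHST_DONE)
            return true;
    }
    return false;  /* timed out; caller should not cut power blindly */
}

/* Suspend path: reach a stable filesystem state, then shut down. */
static bool sim_suspend(struct sim_system *s)
{
    s->dirty_buffers = 0;  /* stands in for syncing mounted filesystems */
    return sim_shutdown(&s->ctrlr, 100);
}
```

The point of the ordering is that the filesystem reaches a consistent state while the device is still fully operational, and the device is only powered down after it has acknowledged the shutdown request.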