On a ThinkPad T480, a suspend/resume cycle produces these messages on resume:

nvme0: Resetting controller due to a timeout.
nvme0: resetting controller
nvme0: aborting outstanding i/o
nvme0: READ sqid:1 cid:32 nsid:1 lba:75554599 len:8
nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:32 cdw0:0
nvme0: aborting outstanding i/o
nvme0: WRITE sqid:1 cid:39 nsid:1 lba:43225663 len:64
nvme0: ABORTED - BY REQUEST (00/07) sqid:1 cid:39 cdw0:0
nvme0: aborting outstanding i/o

Sometimes there is also filesystem corruption. I also see the "Unsafe Shutdowns" counter increment in the output of smartctl -a.

Various Google searches indicate that this may be a general problem on modern laptops.

Most recently seen on -current r339250M.
Created attachment 198507
Test patch for freezing devq

I don't think this is the right fix (according to Warner), but it looks like it should be needed in addition to the correct fix.
The devq freezing is racy.
Could you explain how, for my edification? (Not arguing, I just don't see it.)
Calling cam_freeze_devq() doesn't guarantee that nothing is in-flight when it returns. A lesser problem is that it (and xpt_freeze_simq()) also doesn't prevent sending CCBs to the sim that aren't flagged with XPT_FC_QUEUED.
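To illustrate the first point, here is a minimal user-space sketch of the difference between freezing a queue and draining it. It is a model only, not the CAM code: struct model_devq and every model_devq_* function are hypothetical names invented for this example, and the pthread mutex and condition variable stand in for whatever locking the real driver uses. The second problem (requests that bypass the queue entirely) is not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model of a device queue; NOT the CAM API. */
struct model_devq {
    pthread_mutex_t lock;
    pthread_cond_t  idle;      /* signalled when inflight drops to 0 */
    bool            frozen;    /* true: refuse new submissions */
    int             inflight;  /* requests handed to the "hardware" */
};

void model_devq_init(struct model_devq *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->idle, NULL);
    q->frozen = false;
    q->inflight = 0;
}

/* Dispatch one request. Returns false if the queue is frozen. */
bool model_devq_submit(struct model_devq *q)
{
    bool ok;

    pthread_mutex_lock(&q->lock);
    ok = !q->frozen;
    if (ok)
        q->inflight++;
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Called when the "hardware" finishes one request. */
void model_devq_complete(struct model_devq *q)
{
    pthread_mutex_lock(&q->lock);
    assert(q->inflight > 0);
    if (--q->inflight == 0)
        pthread_cond_broadcast(&q->idle);
    pthread_mutex_unlock(&q->lock);
}

/* Freeze only: blocks new submissions, but returns immediately even
 * if requests are still in flight. This is the racy variant. */
void model_devq_freeze(struct model_devq *q)
{
    pthread_mutex_lock(&q->lock);
    q->frozen = true;
    pthread_mutex_unlock(&q->lock);
}

/* Freeze and then wait until every in-flight request has completed. */
void model_devq_freeze_and_drain(struct model_devq *q)
{
    pthread_mutex_lock(&q->lock);
    q->frozen = true;
    while (q->inflight > 0)
        pthread_cond_wait(&q->idle, &q->lock);
    pthread_mutex_unlock(&q->lock);
}

/* Snapshot of the in-flight count, for inspection. */
int model_devq_inflight(struct model_devq *q)
{
    int n;

    pthread_mutex_lock(&q->lock);
    n = q->inflight;
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

In this model, a caller that uses model_devq_freeze() and then powers the device down can still have requests outstanding, which matches the aborted READ/WRITE commands in the log. Only the drain variant guarantees an idle device on return.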
Is this (racy) fix better than no fix at all? I get these messages on resume, but after about 10 seconds of freeze the system resumes normal operation. That's OK for me as long as it doesn't damage ZFS.
This has been fixed in two ways. First, we now properly shut down the controller on suspend and bring it back up on resume, which ensures that all I/O is drained. Second, before we get to the shutdown, the system now syncs all mounted filesystems, ensuring a stable point.