Bug 211852 - Unsafe shutdowns on Intel 750 SSD
Summary: Unsafe shutdowns on Intel 750 SSD
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.3-RELEASE
Hardware: amd64 Any
: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: patch
Depends on:
Blocks:
 
Reported: 2016-08-15 00:33 UTC by Rajil Saraswat
Modified: 2018-02-01 16:47 UTC (History)
5 users (show)

See Also:


Attachments
Patch to resolve issue (1.27 KB, patch)
2017-08-12 05:25 UTC, Nathan Whitehorn
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Rajil Saraswat 2016-08-15 00:33:03 UTC
Hello,

I am using a Intel 750 SSD AIC as an SLOG. Each time I poweroff the server the 'Unsafe Shutdowns' counter increments for the SSD. It seems that the Shutdown command is not being sent to the SSD.



The smart data is as follows:

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:  0x00
Temperature:  37 Celsius
Available Spare:  100%
Available Spare Threshold:  10%
Percentage Used:  0%
Data Units Read:  494 [252 MB]
Data Units Written:  468,568 [239 GB]
Host Read Commands:  5,865
Host Write Commands:  6,246,687
Controller Busy Time:  0
Power Cycles:  11
Power On Hours:  352
Unsafe Shutdowns:  8
Media and Data Integrity Errors:  0
Error Information Log Entries:  0

Error Information (NVMe Log 0x01, max 64 entries)
Comment 1 Rajil Saraswat 2016-08-15 01:34:12 UTC
The smartdata was generated using smartctl -a /dev/nvme0
Comment 2 Ravi Pokala 2017-01-21 03:15:08 UTC
Neat, I didn't know `smartctl' had been extended to understand NVMe! :-)

In any case, it the code for handling power-down looks grossly correct:

sys/dev/nvme/nvme_ctrlr.c (r308431)
1184 void
1185 nvme_ctrlr_shutdown(struct nvme_controller *ctrlr)
1186 {
1187         union cc_register       cc;
1188         union csts_register     csts;
1189         int                     ticks = 0;
1190 
1191         cc.raw = nvme_mmio_read_4(ctrlr, cc);
1192         cc.bits.shn = NVME_SHN_NORMAL;
1193         nvme_mmio_write_4(ctrlr, cc, cc.raw);
1194         csts.raw = nvme_mmio_read_4(ctrlr, csts);
1195         while ((csts.bits.shst != NVME_SHST_COMPLETE) && (ticks++ < 5*hz)) {
1196                 pause("nvme shn", 1);
1197                 csts.raw = nvme_mmio_read_4(ctrlr, csts);
1198         }
1199         if (csts.bits.shst != NVME_SHST_COMPLETE)
1200                 nvme_printf(ctrlr, "did not complete shutdown within 5 seconds "
1201                     "of notification\n");
1202 }

In English, that's roughly: notify the controller about a normal shutdown (as opposed to an "abrupt" shutdown), then wait until the controller status indicates that shutdown is complete; if the controller doesn't indicate complete shutdown within 5 seconds, print a log message and continue anyway.

It has been in that state since r254302 (2013-08-13). (That's in -HEAD, but the same code is in 10.3-RELEASE.)

Hmmm... In NVMe-1.2.1, section 7.6.2:

"It is recommended that the host wait a minimum of the RTD3 Entry Latency reported in the Identify Controller data structure for the shutdown operations to complete; if the value reported in RTD3 Entry Latency is 0h, then the host should wait for a minimum of one second."

The "RTD3 Entry Latency" is described in section 5.11, Figure 90:

"Bytes 91:88: RTD3 Entry Latency (RTD3E): This field indicates the typical latency in microseconds to enter Runtime D3 (RTD3). Refer to section 8.4.4 for test conditions. A value of 0h indicates RTD3 Entry Latency is not reported."

So, that hard-coded 5 seconds might not be correct. It looks like (struct nvme_controller_data) treats the part of the "Identify Controller" data structure which contains RTD3E as reserved. It looks like it was in fact reserved in NVMe-1.1, but was defined later.
Comment 3 Ravi Pokala 2017-01-21 03:22:33 UTC
As it happens, I also have an Intel 750; dumping it's identify info:

----------------------------------------------------------------
[threepio:dev/base/head] rpokala% sudo nvmecontrol identify -x nvme0
000: 80868086 51435643 32323435 304a3030 41303034 20204e47 45544e49 5353204c 
020: 44455044 3034574d 20344730 20202020 20202020 20202020 20202020 20202020 
040: 31564538 31373130 5cd2e400 00000500 00000000 00000000 00000000 00000000 
060: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
080: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
0a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
0c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
0e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
100: 03030006 003f0202 00000000 00000000 00000000 00000000 00000000 00000000 
120: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
140: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
160: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
180: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
1a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
1c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
1e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
200: 00004466 00000001 00000006 00000007 00000000 00000000 00000000 00000000 
220: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
240: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
260: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
280: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
2a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
----------------------------------------------------------------

RTD3E is in bytes 91:88, which are offsets 0x58-0x5B in hex, which are zeroed out. So, even if the code was updated to look at and honor RTD3E, it wouldn't help in this case. :-(

Perhaps we should make the SHN wait time configurable? Default to 5 seconds, overridden by RTD3E if present, overridden by per-device sysctl if set?
Comment 4 Nathan Whitehorn freebsd_committer freebsd_triage 2017-08-12 04:10:33 UTC
I have the identical problem on a Samsung 960 EVO SSD.
Comment 5 Nathan Whitehorn freebsd_committer freebsd_triage 2017-08-12 05:25:31 UTC
Created attachment 185303 [details]
Patch to resolve issue

This patch changes the ordering of the shutdown command to run through newbus instead of module unloading and fixes the problem on my Samsung 960. Could you please check it on the Intel 750?
Comment 6 commit-hook freebsd_committer freebsd_triage 2017-08-12 22:14:11 UTC
A commit references this bug:

Author: nwhitehorn
Date: Sat Aug 12 22:13:06 UTC 2017
New revision: 322443
URL: https://svnweb.freebsd.org/changeset/base/322443

Log:
  Move NVME controller shutdown from being called as part of module unloading
  to being called through the newbus DEVICE_SHUTDOWN() path. This ensures that
  the NVME controller gets shut down before the device and bus disappear
  and prevents data corruption on shutdown on at least Samsung EVO 960 SSDs.

  PR:		kern/211852
  Reviewed by:	imp
  MFC after:	2 weeks

Changes:
  head/sys/dev/nvme/nvme.c
Comment 7 Rajil Saraswat 2017-08-13 03:55:43 UTC
(In reply to Nathan Whitehorn from comment #5)

Can i patch this against 11.1-release?
Comment 8 Nathan Whitehorn freebsd_committer freebsd_triage 2017-08-13 05:11:54 UTC
(In reply to Rajil Saraswat from comment #7)

Assuming the patch applies, yes.
Comment 9 Ravi Pokala 2017-08-13 20:05:38 UTC
Repeating my earlier question:

> Perhaps we should make the SHN wait time configurable? Default to 5 seconds, overridden by RTD3E if present, overridden by per-device sysctl if set?
Comment 10 commit-hook freebsd_committer freebsd_triage 2018-02-01 16:45:41 UTC
A commit references this bug:

Author: mav
Date: Thu Feb  1 16:45:08 UTC 2018
New revision: 328684
URL: https://svnweb.freebsd.org/changeset/base/328684

Log:
  MFC r322443 (by nwhitehorn):
  Move NVME controller shutdown from being called as part of module unloading
  to being called through the newbus DEVICE_SHUTDOWN() path. This ensures that
  the NVME controller gets shut down before the device and bus disappear
  and prevents data corruption on shutdown on at least Samsung EVO 960 SSDs.

  PR:		kern/211852

Changes:
_U  stable/11/
  stable/11/sys/dev/nvme/nvme.c