Hello, I am using a Intel 750 SSD AIC as an SLOG. Each time I poweroff the server the 'Unsafe Shutdowns' counter increments for the SSD. It seems that the Shutdown command is not being sent to the SSD. The smart data is as follows: === START OF SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff) Critical Warning: 0x00 Temperature: 37 Celsius Available Spare: 100% Available Spare Threshold: 10% Percentage Used: 0% Data Units Read: 494 [252 MB] Data Units Written: 468,568 [239 GB] Host Read Commands: 5,865 Host Write Commands: 6,246,687 Controller Busy Time: 0 Power Cycles: 11 Power On Hours: 352 Unsafe Shutdowns: 8 Media and Data Integrity Errors: 0 Error Information Log Entries: 0 Error Information (NVMe Log 0x01, max 64 entries)
The smartdata was generated using smartctl -a /dev/nvme0
Neat, I didn't know `smartctl' had been extended to understand NVMe! :-) In any case, it the code for handling power-down looks grossly correct: sys/dev/nvme/nvme_ctrlr.c (r308431) 1184 void 1185 nvme_ctrlr_shutdown(struct nvme_controller *ctrlr) 1186 { 1187 union cc_register cc; 1188 union csts_register csts; 1189 int ticks = 0; 1190 1191 cc.raw = nvme_mmio_read_4(ctrlr, cc); 1192 cc.bits.shn = NVME_SHN_NORMAL; 1193 nvme_mmio_write_4(ctrlr, cc, cc.raw); 1194 csts.raw = nvme_mmio_read_4(ctrlr, csts); 1195 while ((csts.bits.shst != NVME_SHST_COMPLETE) && (ticks++ < 5*hz)) { 1196 pause("nvme shn", 1); 1197 csts.raw = nvme_mmio_read_4(ctrlr, csts); 1198 } 1199 if (csts.bits.shst != NVME_SHST_COMPLETE) 1200 nvme_printf(ctrlr, "did not complete shutdown within 5 seconds " 1201 "of notification\n"); 1202 } In English, that's roughly: notify the controller about a normal shutdown (as opposed to an "abrupt" shutdown), then wait until the controller status indicates that shutdown is complete; if the controller doesn't indicate complete shutdown within 5 seconds, print a log message and continue anyway. It has been in that state since r254302 (2013-08-13). (That's in -HEAD, but the same code is in 10.3-RELEASE.) Hmmm... In NVMe-1.2.1, section 7.6.2: "It is recommended that the host wait a minimum of the RTD3 Entry Latency reported in the Identify Controller data structure for the shutdown operations to complete; if the value reported in RTD3 Entry Latency is 0h, then the host should wait for a minimum of one second." The "RTD3 Entry Latency" is described in section 5.11, Figure 90: "Bytes 91:88: RTD3 Entry Latency (RTD3E): This field indicates the typical latency in microseconds to enter Runtime D3 (RTD3). Refer to section 8.4.4 for test conditions. A value of 0h indicates RTD3 Entry Latency is not reported." So, that hard-coded 5 seconds might not be correct. It looks like (struct nvme_controller_data) treats the part of the "Identify Controller" data structure which contains RTD3E as reserved. It looks like it was in fact reserved in NVMe-1.1, but was defined later.
As it happens, I also have an Intel 750; dumping it's identify info: ---------------------------------------------------------------- [threepio:dev/base/head] rpokala% sudo nvmecontrol identify -x nvme0 000: 80868086 51435643 32323435 304a3030 41303034 20204e47 45544e49 5353204c 020: 44455044 3034574d 20344730 20202020 20202020 20202020 20202020 20202020 040: 31564538 31373130 5cd2e400 00000500 00000000 00000000 00000000 00000000 060: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 080: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 100: 03030006 003f0202 00000000 00000000 00000000 00000000 00000000 00000000 120: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 140: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 160: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 180: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 200: 00004466 00000001 00000006 00000007 00000000 00000000 00000000 00000000 220: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 240: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 260: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 280: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 2a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ---------------------------------------------------------------- RTD3E is in bytes 91:88, which are offsets 0x58-0x5B in hex, which are zeroed out. So, even if the code was updated to look at and honor RTD3E, it wouldn't help in this case. :-( Perhaps we should make the SHN wait time configurable? Default to 5 seconds, overridden by RTD3E if present, overridden by per-device sysctl if set?
I have the identical problem on a Samsung 960 EVO SSD.
Created attachment 185303 [details] Patch to resolve issue This patch changes the ordering of the shutdown command to run through newbus instead of module unloading and fixes the problem on my Samsung 960. Could you please check it on the Intel 750?
A commit references this bug: Author: nwhitehorn Date: Sat Aug 12 22:13:06 UTC 2017 New revision: 322443 URL: https://svnweb.freebsd.org/changeset/base/322443 Log: Move NVME controller shutdown from being called as part of module unloading to being called through the newbus DEVICE_SHUTDOWN() path. This ensures that the NVME controller gets shut down before the device and bus disappear and prevents data corruption on shutdown on at least Samsung EVO 960 SSDs. PR: kern/211852 Reviewed by: imp MFC after: 2 weeks Changes: head/sys/dev/nvme/nvme.c
(In reply to Nathan Whitehorn from comment #5) Can i patch this against 11.1-release?
(In reply to Rajil Saraswat from comment #7) Assuming the patch applies, yes.
Repeating my earlier question: > Perhaps we should make the SHN wait time configurable? Default to 5 seconds, overridden by RTD3E if present, overridden by per-device sysctl if set?
A commit references this bug: Author: mav Date: Thu Feb 1 16:45:08 UTC 2018 New revision: 328684 URL: https://svnweb.freebsd.org/changeset/base/328684 Log: MFC r322443 (by nwhitehorn): Move NVME controller shutdown from being called as part of module unloading to being called through the newbus DEVICE_SHUTDOWN() path. This ensures that the NVME controller gets shut down before the device and bus disappear and prevents data corruption on shutdown on at least Samsung EVO 960 SSDs. PR: kern/211852 Changes: _U stable/11/ stable/11/sys/dev/nvme/nvme.c