Created attachment 188501 [details]
nvme info

On resuming from suspend, the system stalls on any disk I/O after the console shows:

freebsd nvme_ctrlr_wait_for_ready called with desired_val = 0 but cc.en = 1

- the issue occurs frequently but not 100% of the time; some resumes work fine
- easily reproducible
- X keeps running, as do any open terminals, until the disk is needed

The system is a Dell XPS 13 laptop with a TOSHIBA NVMe drive. See https://wiki.freebsd.org/Laptops/Dell_XPS13_9360 for full specs and dmesg/devinfo/diskinfo etc.

https://s3.amazonaws.com/uploads.hipchat.com/8784/2508819/N0PyBJEhSHX3jKu/IMG_2693.JPG
https://s3.amazonaws.com/uploads.hipchat.com/8784/2508819/eLtZ9caTMR7B6eb/IMG_2677.JPG

# dmesg
nvme0: <Generic NVMe Device> mem 0xdc000000-0xdc003fff at device 0.0 on pci4
nvd0: <THNSN5512GPUK NVMe TOSHIBA 512GB> NVMe namespace
nvd0: 488386MB (1000215216 512 byte sectors)

# nvmecontrol info
nvme0: THNSN5512GPUK NVMe TOSHIBA 512GB
    nvme0ns1 (488386MB)

Controller Capabilities/Features
================================
Vendor ID: 1179
Subsystem Vendor ID: 1179
Serial Number: 376B508IKSJU
Model Number: THNSN5512GPUK NVMe TOSHIBA 512GB
Firmware Version: 5KDA4103
Recommended Arb Burst: 1
IEEE OUI Identifier: 0d 08 00
Multi-Interface Cap: 00
Max Data Transfer Size: Unlimited
Controller ID: 0x00

Admin Command Set Attributes
============================
Security Send/Receive: Supported
Format NVM: Supported
Firmware Activate/Download: Supported
Namespace Managment: Not Supported
Abort Command Limit: 4
Async Event Request Limit: 4
Number of Firmware Slots: 1
Firmware Slot 1 Read-Only: No
Per-Namespace SMART Log: No
Error Log Page Entries: 128
Number of Power States: 5

NVM Command Set Attributes
==========================
Submission Queue Entry Size
  Max: 64
  Min: 64
Completion Queue Entry Size
  Max: 16
  Min: 16
Number of Namespaces: 1
Compare Command: Not Supported
Write Uncorrectable Command: Supported
Dataset Management Command: Supported
Volatile Write Cache: Present

Size (in LBAs): 1000215216 (953M)
Capacity (in LBAs): 1000215216 (953M)
Utilization (in LBAs): 1000215216 (953M)
Thin Provisioning: Not Supported
Number of LBA Formats: 2
Current LBA Format: LBA Format #00
LBA Format #00: Data Size: 512 Metadata Size: 0
LBA Format #01: Data Size: 4096 Metadata Size: 0
Created attachment 188502 [details] dmesg
# uname
FreeBSD akai.skunkwerks.at 12.0-CURRENT FreeBSD 12.0-CURRENT #0 r325987+5aee85eae833(master): Sun Nov 19 04:34:13 UTC 2017 root@wintermute:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
While most likely not a fix, it would be worth testing the change proposed in D13389 [1]. The message you are seeing will definitely go away, but it isn't clear whether the change will either (a) let the driver proceed with the reset, and life is good, or (b) fall down somewhere else.

[1] https://reviews.freebsd.org/D13389
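For context on the console message: CC.EN is the controller's enable bit and CSTS.RDY its ready bit. As I read the NVMe base specification, clearing CC.EN while CSTS.RDY is still 0 has undefined results, so one spec-conforming way for a reset path that finds EN=1 but RDY=0 (the state the message reports) is to wait for RDY=1, clear EN, then wait for RDY=0. The sketch below models that sequence in user space, using bounded polling rather than a fixed delay. The sim_ctrlr struct, the simulated latency, and all helper names are hypothetical; this is an illustration of the spec rule, not the D13389 diff.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model of an NVMe controller's CC.EN and
 * CSTS.RDY bits. Not FreeBSD driver code; names are illustrative only.
 */
struct sim_ctrlr {
	bool cc_en;          /* Controller Configuration: Enable */
	bool csts_rdy;       /* Controller Status: Ready */
	int  ticks_to_ready; /* polls remaining before RDY follows EN */
};

/* One simulated register read: RDY follows EN after a few polls. */
static bool sim_read_rdy(struct sim_ctrlr *c)
{
	if (c->csts_rdy != c->cc_en) {
		if (c->ticks_to_ready > 0)
			c->ticks_to_ready--;
		else
			c->csts_rdy = c->cc_en;
	}
	return c->csts_rdy;
}

/* Poll CSTS.RDY until it equals want; give up after max_polls reads. */
static bool wait_for_rdy(struct sim_ctrlr *c, bool want, int max_polls)
{
	assert(max_polls > 0);
	for (int i = 0; i < max_polls; i++) {
		if (sim_read_rdy(c) == want)
			return true;
	}
	return false;
}

/*
 * Disable sequence: if EN=1 but RDY is still 0, wait for RDY=1 first,
 * then clear EN and wait for RDY=0. Returns false on timeout.
 */
static bool sim_disable(struct sim_ctrlr *c, int max_polls, int latency)
{
	if (c->cc_en && !c->csts_rdy) {
		if (!wait_for_rdy(c, true, max_polls))
			return false;
	}
	c->cc_en = false;
	c->ticks_to_ready = latency;
	return wait_for_rdy(c, false, max_polls);
}
```

Polling with an upper bound keeps the common case fast while still tolerating a slow D3 to D0 transition; a fixed delay would have to be sized for the worst case every time.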
If you do, you'll have to add this device to the quirk list... I suspect some other issue is going on, since the device is likely transitioning from power state D3 to D0. I haven't checked, but sometimes that takes a while, and it's quite possible there's another spot in the driver that needs either a fixed delay (yuck) or to poll something until the device becomes active before proceeding.
Thanks Chuck & Warner for the info. I'll build with the patch later this week and report back.

Regarding adding a quirk, I'm not familiar with this. I see nothing in /sys/dev/nv* regarding quirks, so I assume they are specified in sys/cam/. I'll read up on README.quirks and my FreeBSD Design & Implementation book, but any pointers or similar commits would be a big help. I've found these to start with:
- https://reviews.freebsd.org/D13093
- https://forums.freebsd.org/threads/55210/
Actually, it looks like https://reviews.freebsd.org/D13389#inline-80238 is all I need. Sorry for the noise.
It would be interesting to see what, if any, failure occurs without adding a specific quirk for your device. So if you have the time and inclination, I'd vote for experiment #1 to be the D13389 patch without changes and experiment #2 to be the D13389 patch plus an entry in pci_ids for your device with .quirks set to QUIRK_DELAY_B4_CHK_RDY.
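Experiment #2 amounts to a lookup at attach time: match the device's PCI IDs against a table and, if the matched entry carries QUIRK_DELAY_B4_CHK_RDY, sleep before first checking CSTS.RDY (as the flag's name suggests). The following is a minimal, self-contained sketch of that idea. The struct, field names, flag value, delay parameter, and device IDs are hypothetical placeholders, not the real pci_ids layout; only the Toshiba vendor ID 0x1179 comes from the nvmecontrol output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QUIRK_DELAY_B4_CHK_RDY; value is made up. */
#define SKETCH_QUIRK_DELAY_B4_CHK_RDY 0x1u

/* Hypothetical quirk-table entry; not the driver's actual struct. */
struct sketch_pci_id {
	uint16_t    vendor;
	uint16_t    device;
	const char *desc;
	uint32_t    quirks;
};

/* Device IDs 0xAAAA/0xBBBB are placeholders, not real hardware. */
static const struct sketch_pci_id sketch_ids[] = {
	{ 0x1179, 0xAAAA, "placeholder entry with quirk", SKETCH_QUIRK_DELAY_B4_CHK_RDY },
	{ 0x1179, 0xBBBB, "placeholder entry without quirk", 0 },
	{ 0, 0, NULL, 0 } /* terminator: desc == NULL ends the table */
};

/* Return the quirk bits for a vendor/device pair, or 0 if unlisted. */
static uint32_t sketch_lookup_quirks(uint16_t vendor, uint16_t device)
{
	for (const struct sketch_pci_id *p = sketch_ids; p->desc != NULL; p++) {
		if (p->vendor == vendor && p->device == device)
			return p->quirks;
	}
	return 0;
}

/* Milliseconds to wait before the first CSTS.RDY check. */
static unsigned sketch_delay_before_rdy_ms(uint32_t quirks, unsigned delay_ms)
{
	return (quirks & SKETCH_QUIRK_DELAY_B4_CHK_RDY) ? delay_ms : 0;
}
```

Keeping the delay behind a per-device flag means only listed devices pay the fixed wait; unlisted devices go straight to polling.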
Experiment #1 underway...
Early days, but I haven't had a repeat hang with this patch (yay). I'll close this later in the week if there's no recurrence. Thanks!
LGTM: no issues at all, and no quirk hacks needed. Subjectively, resume is also much faster now.
Closed via:
https://reviews.freebsd.org/rS326937
https://svnweb.freebsd.org/base?view=revision&revision=326937