Bug 261771 - nvme(4): Reports errors every 5 minutes: PRP OFFSET INVALID (00/13) sqid:0 cid:10 cdw0:0
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.3-RELEASE
Hardware: i386 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2022-02-07 11:09 UTC by Louis
Modified: 2022-02-08 09:51 UTC (History)
CC List: 2 users

See Also:
koobs: maintainer-feedback? (imp)
koobs: maintainer-feedback? (mav)
koobs: mfc-stable13?
koobs: mfc-stable12?


Attachments
dmesg.boot as requested (63.14 KB, text/plain)
2022-02-08 09:45 UTC, Louis
Requested PCI-config file (9.22 KB, text/plain)
2022-02-08 09:46 UTC, Louis

Description Louis 2022-02-07 11:09:04 UTC
Possibly serious NVMe driver issue: "PRP OFFSET INVALID"


I am using an NVMe SSD in my TrueNAS system, both with the version based on FreeBSD 12.3 and with the version based on FreeBSD 13. Every 5 minutes the following two messages are logged:

nvme2: GET LOG PAGE (02) sqid:0 cid:10 nsid:ffffffff cdw10:00ff0001 cdw11:00000000
nvme2: PRP OFFSET INVALID (00/13) sqid:0 cid:10 cdw0:0
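
For anyone decoding the two messages above, here is a minimal sketch. It assumes the Get Log Page CDW10/CDW11 field layout from the NVMe base specification (LID in bits 7:0, RAE in bit 15, NUMDL in bits 31:16, NUMDU in the low 16 bits of CDW11), and it assumes that the "(02)" in the first line is the admin opcode rather than the log page identifier. The helper names are hypothetical and are not part of the nvme(4) driver.

```python
import re


def parse_cdw_fields(line: str) -> dict:
    """Collect the cdwNN:hexvalue pairs (e.g. cdw10:00ff0001) from one log line."""
    return {k: int(v, 16) for k, v in re.findall(r"(cdw\d+):([0-9a-fA-F]+)", line)}


def decode_get_log_page(cdw10: int, cdw11: int) -> dict:
    """Split Get Log Page command dwords into fields (layout assumed from the NVMe spec)."""
    lid = cdw10 & 0xFF                # log page identifier, bits 7:0
    rae = (cdw10 >> 15) & 0x1         # retain asynchronous event, bit 15
    numdl = (cdw10 >> 16) & 0xFFFF    # number of dwords, lower half, bits 31:16
    numdu = cdw11 & 0xFFFF            # number of dwords, upper half
    numd = (numdu << 16) | numdl      # zero-based dword count
    return {"lid": lid, "rae": rae, "bytes": (numd + 1) * 4}


line = "nvme2: GET LOG PAGE (02) sqid:0 cid:10 nsid:ffffffff cdw10:00ff0001 cdw11:00000000"
cdw = parse_cdw_fields(line)
fields = decode_get_log_page(cdw["cdw10"], cdw["cdw11"])
print(fields)
```

Under those assumptions, the values from the report decode to log page identifier 0x01 (which the NVMe specification assigns to the Error Information log) and a transfer of 1024 bytes. The "(00/13)" in the second message would then be status code type 0x00 with status code 0x13, which the specification names PRP Offset Invalid.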

The system I am using is based on an AMD x64 processor, and the SSD is a 2TB XPG Gammix S70 Blade PCIe Gen4. However, from reports on the internet I know the problem also occurs with other SSDs.

Searching the internet for this problem, I found a couple of sites describing the issue, as well as sites where Oracle and Marvell engineers were discussing patches.

I do not know whether those or equivalent patches have also been applied to the current FreeBSD NVMe driver; however, given that the messages are still there, I have my doubts.

I also do not know to what extent the problem can lead to data corruption; however, I do not feel comfortable!

Below are a couple of links related to the subject.

Please let the driver developer have a look at the problem and the links given below.

Louis


https://lore.kernel.org/all/20211223215726.71096-2-alan.adamson@oracle.com/T
https://lore.kernel.org/all/20220202005050.69289-2-alan.adamson@oracle.com/
https://patchwork.ozlabs.org/project/uboot/patch/20190823033728.24591-1-awilliams@marvell.com/
https://www.mail-archive.com/u-boot@lists.denx.de/msg339265.html
https://lists.denx.de/pipermail/u-boot/2019-August/381249.html
Comment 1 Louis 2022-02-07 11:24:48 UTC
I forgot to mention that I noticed the problem on my test system, which is not yet in operational use. So these messages appeared on a system with little or no disk load or access. I have no idea what the consequences would be if the server were under heavy load.
Comment 2 Kubilay Kocak freebsd_committer freebsd_triage 2022-02-07 22:20:05 UTC
@Reporter What symptoms or issues, if any, are observable other than the reported error? Is the system otherwise performing as expected?

Could you please include additional information including:

- uname -a output
- pciconf -lv output (as an attachment)
- /var/run/dmesg.boot output (as an attachment)
Comment 3 Warner Losh freebsd_committer freebsd_triage 2022-02-07 22:22:59 UTC
Log page 2 is SMART data.
Unsure what's going on.
Comment 4 Alexander Motin freebsd_committer freebsd_triage 2022-02-08 02:35:52 UTC
I've bought one of those SSDs just out of curiosity and was able to reproduce the errors by using smartctl on it.  So far I haven't found anything wrong with the requests it sends to the SSD.
Comment 5 Louis 2022-02-08 09:45:00 UTC
Created attachment 231634
dmesg.boot as requested

Here is the requested dmesg.boot file.
Comment 6 Louis 2022-02-08 09:46:07 UTC
Created attachment 231635
Requested PCI-config file
Comment 7 Louis 2022-02-08 09:51:48 UTC
I have attached the requested files; here is also the output of uname -a:

 uname -a
FreeBSD truenas.pc.lan 13.0-STABLE FreeBSD 13.0-STABLE #0 truenas/13-stable-c073d5cd0: Sun Feb  6 01:31:09 EST 2022     root@tnbuild02.tn.ixsystems.com:/data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/objs/data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/amd64.amd64/sys/TrueNAS.amd64  amd64

Note that I created these files on my test server, which is currently running TrueNAS CORE 13-beta. However, the same issue occurs with TrueNAS CORE 12U7, which is based on FreeBSD 12.3.

Note that I did not observe other storage-related issues, which is quite logical since the system is only running for some tests and is not doing anything.