Perhaps serious NVMe driver issue "PRP OFFSET INVALID"

I am using an NVMe SSD in my TrueNAS system, both with the version based on FreeBSD 12.3 and with the version based on FreeBSD 13. Every 5 minutes, the following two messages appear:

nvme2: GET LOG PAGE (02) sqid:0 cid:10 nsid:ffffffff cdw10:00ff0001 cdw11:00000000
nvme2: PRP OFFSET INVALID (00/13) sqid:0 cid:10 cdw0:0

The system is based on an AMD x64 processor, and the SSD is a 2TB XPG Gammix S70 Blade PCIe Gen4. However, from the internet I know the problem also occurs with other SSDs.

Searching the internet for this problem, I found a couple of sites describing the issue, as well as sites where Oracle and Marvell engineers were discussing patches. I do not know whether those (or equivalent) patches have also been applied to the current FreeBSD NVMe driver; given that the messages are still there, I have doubts. I also do not know to what extent the problem can lead to data corruption, but I do not feel comfortable!

Below are a couple of links related to the subject. Please ask the driver developer to have a look at the problem and at the links below.

Louis

https://lore.kernel.org/all/20211223215726.71096-2-alan.adamson@oracle.com/T
https://lore.kernel.org/all/20220202005050.69289-2-alan.adamson@oracle.com/
https://patchwork.ozlabs.org/project/uboot/patch/20190823033728.24591-1-awilliams@marvell.com/
https://www.mail-archive.com/u-boot@lists.denx.de/msg339265.html
https://lists.denx.de/pipermail/u-boot/2019-August/381249.html
I forgot to mention that I noticed the problem on my test system, which is not yet in operational use. So these messages appeared on a system with little or no disk load or access. I have no idea what the consequences would be when the server is under heavy load.
@Reporter What symptoms or issues, if any (other than the reported error), are observable? Is the system otherwise performing as expected?

Could you please provide additional information, including:
- uname -a output
- pciconf -lv output (as an attachment)
- /var/run/dmesg.boot output (as an attachment)
Log page 2 is SMART data. Unsure what's going on.
I bought one of those SSDs out of curiosity and was able to reproduce the errors by running smartctl on it. So far I haven't found why the requests it sends to the SSD would be wrong in any way.
Created attachment 231634
dmesg.boot as requested

Here is the requested dmesg.boot file.
Created attachment 231635
Requested PCI config file
I attached the requested files; here is also the output of uname -a:

uname -a
FreeBSD truenas.pc.lan 13.0-STABLE FreeBSD 13.0-STABLE #0 truenas/13-stable-c073d5cd0: Sun Feb 6 01:31:09 EST 2022 root@tnbuild02.tn.ixsystems.com:/data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/objs/data/workspace/TrueNAS_13.0_Nightlies/freenas/_BE/os/amd64.amd64/sys/TrueNAS.amd64 amd64

Note that I created these files on my test server, which is currently running TrueNAS Core 13-beta. However, the same issue occurs with TrueNAS Core 12U7, which is based on FreeBSD 12.3.

I did not observe any other storage-related issues, which is quite logical, since the system is only running some tests and is not doing "anything".
^Triage: clear unneeded flags. Nothing has yet been committed to be merged.