Bug 221038 - dev/amd_ecc_inject does not work with AMD Ryzen
Summary: dev/amd_ecc_inject does not work with AMD Ryzen
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL: https://bugs.freebsd.org/bugzilla/sho...
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-26 22:21 UTC by Ivan Rozhuk
Modified: 2017-07-27 07:28 UTC (History)
CC List: 4 users

See Also:


Description Ivan Rozhuk 2017-07-26 22:21:20 UTC
On kldload, the module reports: "DRAM ECC is not supported or disabled"
Comment 1 Ivan Rozhuk 2017-07-26 22:23:29 UTC
Testers are here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219399#c169
Comment 2 Don Lewis freebsd_committer freebsd_triage 2017-07-27 00:25:41 UTC
Ryzen documentation is here:
http://support.amd.com/TechDocs/54945_PPR_Family_17h_Models_00h-0Fh.pdf
Comment 3 Don Lewis freebsd_committer freebsd_triage 2017-07-27 00:53:00 UTC
I also get the "DRAM ECC is not supported or disabled" message.  I have booted Linux on this machine and it claimed that EDAC was enabled, though admittedly I haven't rechecked since my last BIOS upgrade.

My motherboard is a Gigabyte AX370-GAMING 5, which claims to support ECC.  I have this ECC RAM installed:
  http://www.crucial.com/usa/en/ct4k16g4wfd824a
though dmidecode reports a strange value for total width:

# dmidecode -t memory
# dmidecode 3.0
Scanning /dev/mem for entry point.
SMBIOS 3.0 present.

Handle 0x0027, DMI type 16, 23 bytes
Physical Memory Array
	Location: System Board Or Motherboard
	Use: System Memory
	Error Correction Type: Multi-bit ECC
	Maximum Capacity: 64 GB
	Error Information Handle: 0x0026
	Number Of Devices: 4

Handle 0x002E, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0027
	Error Information Handle: 0x002D
	Total Width: 128 bits
	Data Width: 64 bits
	Size: 16384 MB
	Form Factor: DIMM
	Set: None
	Locator: DIMM 0
	Bank Locator: CHANNEL A
	Type: DDR4
	Type Detail: Synchronous Unbuffered (Unregistered)
	Speed: 2400 MHz
	Manufacturer: Micron Technology
	Serial Number: 14C07593
	Asset Tag: Not Specified
	Part Number: 18ASF2G72AZ-2G3B1   
	Rank: 2
	Configured Clock Speed: 2400 MHz
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

Handle 0x0031, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0027
	Error Information Handle: 0x0030
	Total Width: 128 bits
	Data Width: 64 bits
	Size: 16384 MB
	Form Factor: DIMM
	Set: None
	Locator: DIMM 1
	Bank Locator: CHANNEL A
	Type: DDR4
	Type Detail: Synchronous Unbuffered (Unregistered)
	Speed: 2400 MHz
	Manufacturer: Micron Technology
	Serial Number: 14C0753E
	Asset Tag: Not Specified
	Part Number: 18ASF2G72AZ-2G3B1   
	Rank: 2
	Configured Clock Speed: 2400 MHz
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

Handle 0x0034, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0027
	Error Information Handle: 0x0033
	Total Width: 128 bits
	Data Width: 64 bits
	Size: 16384 MB
	Form Factor: DIMM
	Set: None
	Locator: DIMM 0
	Bank Locator: CHANNEL B
	Type: DDR4
	Type Detail: Synchronous Unbuffered (Unregistered)
	Speed: 2400 MHz
	Manufacturer: Micron Technology
	Serial Number: 14C07579
	Asset Tag: Not Specified
	Part Number: 18ASF2G72AZ-2G3B1   
	Rank: 2
	Configured Clock Speed: 2400 MHz
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V

Handle 0x0037, DMI type 17, 40 bytes
Memory Device
	Array Handle: 0x0027
	Error Information Handle: 0x0036
	Total Width: 128 bits
	Data Width: 64 bits
	Size: 16384 MB
	Form Factor: DIMM
	Set: None
	Locator: DIMM 1
	Bank Locator: CHANNEL B
	Type: DDR4
	Type Detail: Synchronous Unbuffered (Unregistered)
	Speed: 2400 MHz
	Manufacturer: Micron Technology
	Serial Number: 14C07472
	Asset Tag: Not Specified
	Part Number: 18ASF2G72AZ-2G3B1   
	Rank: 2
	Configured Clock Speed: 2400 MHz
	Minimum Voltage: 1.2 V
	Maximum Voltage: 1.2 V
	Configured Voltage: 1.2 V
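
For comparison, an unbuffered DDR4 ECC DIMM normally has a 72-bit total width (64 data bits plus 8 check bits), so the 128 bits reported above matches neither an ECC nor a non-ECC module. The following is a minimal Python sketch, not part of the original report, that scans `dmidecode -t memory` output for such mismatches. The function names `memory_widths` and `strange_widths` are hypothetical, and the 8-check-bit expectation is an assumption about standard 64-bit DIMMs.

```python
import re

# Assumption: a standard ECC DIMM adds 8 check bits to a 64-bit data path.
EXPECTED_CHECK_BITS = 8


def memory_widths(dmidecode_text):
    """Extract locator, total width and data width for each Memory Device."""
    devices = []
    # dmidecode separates DMI structures with a blank line.
    for block in re.split(r"\n\s*\n", dmidecode_text):
        if "Memory Device" not in block:
            continue
        total = re.search(r"Total Width:\s*(\d+) bits", block)
        data = re.search(r"Data Width:\s*(\d+) bits", block)
        if total is None or data is None:
            continue  # widths missing or non-numeric, e.g. "Unknown"
        # Anchor at line start so "Bank Locator:" is not matched.
        locator = re.search(r"^\s*Locator:\s*(.+?)\s*$", block, re.M)
        total_bits = int(total.group(1))
        data_bits = int(data.group(1))
        devices.append({
            "locator": locator.group(1) if locator else "?",
            "total": total_bits,
            "data": data_bits,
            "extra": total_bits - data_bits,
        })
    return devices


def strange_widths(devices):
    """Return devices whose extra bits are neither 0 (non-ECC) nor 8 (ECC)."""
    return [d for d in devices if d["extra"] not in (0, EXPECTED_CHECK_BITS)]
```

Feeding the output above to `strange_widths(memory_widths(text))` would flag all four DIMMs, each with 64 extra bits. Whether that reflects a BIOS reporting problem or something else cannot be determined from the SMBIOS table alone.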
Comment 4 Don Lewis freebsd_committer freebsd_triage 2017-07-27 00:54:57 UTC
When I booted Linux on my previous Gigabyte motherboard that did not support ECC, Linux did not say that EDAC was enabled.
Comment 5 Don Lewis freebsd_committer freebsd_triage 2017-07-27 01:27:03 UTC
The latest version of memtest86, V7.4, which was released today, says:
  ECC Enabled: N/A (Unknown)
That's a pretty underwhelming upgrade ...
Comment 6 Don Lewis freebsd_committer freebsd_triage 2017-07-27 01:46:32 UTC
The MCA configuration registers for Ryzen look quite a bit different from those of earlier processors.  I didn't see any obvious way of telling whether RAM ECC is enabled or not.

There are some legacy registers for reporting errors, though.
Comment 7 Don Lewis freebsd_committer freebsd_triage 2017-07-27 01:54:34 UTC
The Linux folks seem to have figured it out:
https://forum.level1techs.com/t/ryzen-linux-ecc-might-have-issues-plz-respond-if-you-are-trying-with-ecc/114654
Comment 8 Conrad Meyer freebsd_committer freebsd_triage 2017-07-27 02:58:12 UTC
(In reply to Don Lewis from comment #7)
Quite possibly https://lkml.org/lkml/2016/12/13/88
Comment 9 Andriy Gapon freebsd_committer freebsd_triage 2017-07-27 07:28:38 UTC
(In reply to Don Lewis from comment #2)
Thank you, but that document is very incomplete.