Bug 228340

Summary: I/O errors on Intel Denverton eMMC mmcsd0 (0x19db8086)
Product: Base System
Reporter: Jérôme-Charles LALLEMAND <jeromecharles.lallemand>
Component: kern
Assignee: Luiz Otavio O Souza,+55 (14) 99772-1255 <loos>
Status: In Progress ---    
Severity: Affects Only Me
CC: bsdimp, fw, jeromecharles.lallemand, kibab, marius, paul.le.gauret
Priority: ---    
Version: 11.1-STABLE   
Hardware: amd64   
OS: Any   
Attachments (Description, Flags):
pciconf -lv + dmesg -a
Netgate pfSense Factory Build dmesg -a
Netgate pfSense Factory Build sysctl kern.conftxt
FreeBSD-11.2-BETA3 Bootlog, then install until mmcsd0 error
Boot log with quirk=33554432
Boot log with quirk=268435456 (flags: none)

Description Jérôme-Charles LALLEMAND 2018-05-18 15:58:24 UTC
Created attachment 193511 [details]
pciconf -lv + dmesg -a

Hello there,

I'm trying to install FreeBSD on a new device, a Netgate XG-7100. The installer fails because of many I/O errors.

I've tried with 11.1-RELEASE and 11.2-PRERELEASE.

My dmesg is flooded with messages like these:

sdhci_pci0-slot0: mmcsd0: Got AutoCMD12 error 0x0000, but there is no active command.
sdhci_pci0-slot0: ============== REGISTER DUMP ==============
sdhci_pci0-slot0: Sys addr: 0x00000000 | Version:  0x00001002
sdhci_pci0-slot0: Blk size: 0x00000200 | Blk cnt:  0x0000003e
sdhci_pci0-slot0: Argument: 0x00010628 | Trn mode: 0x00000027
sdhci_pci0-slot0: Present:  0x1fff0000 | Host ctl: 0x00000025
sdhci_pci0-slot0: Power:    0x0000000b | Blk gap:  0x00000080
sdhci_pci0-slot0: Wake-up:  0x00000000 | Clock:    0x00000007
sdhci_pci0-slot0: Timeout:  0x0000000d | Int stat: 0x00000000
sdhci_pci0-slot0: Int enab: 0x05ff003b | Sig enab: 0x05ff003b
sdhci_pci0-slot0: AC12 err: 0x00000000 | Host ctl2:0x0000008d
sdhci_pci0-slot0: Caps:     0x546ec8b2 | Caps2:    0x80000007
sdhci_pci0-slot0: Max curr: 0x00000000 | ADMA err: 0x00000000
sdhci_pci0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000
sdhci_pci0-slot0: ===========================================
Error indicated: 2 Bad CRC
mmc0: CMD6 failed, RESULT: 1
mmc0: CMD6 failed, RESULT: 1
mmc0: CMD6 failed, RESULT: 1
mmc0: CMD6 failed, RESULT: 1
mmc0: CMD6 failed, RESULT: 1

The system runs smoothly with the custom FreeBSD-based pfSense build.

Is there any way to get the eMMC working?
Comment 1 Marius Strobl freebsd_committer 2018-05-22 19:34:12 UTC
Luiz, I tried to locate the source of pfSense 2.4.3 in order to check what it might be doing differently than upstream but failed to find the code for any recent version of pfSense. Does your Netgate platform require some quirk handling?
Comment 2 Jérôme-Charles LALLEMAND 2018-05-28 08:35:40 UTC
Any updates?

I'm still trying to make it work but without success.
Comment 3 Ilya Bakulin freebsd_committer 2018-05-28 09:19:17 UTC
Hi Jérôme-Charles,

As already requested by Marius, could you please provide more information about the "custom pfSense" that runs on this hardware? At least `uname -a`, but preferably the exact version of pfSense.

Meanwhile, please rerun the installer, escape to the loader prompt when it boots, and run the following commands:
set boot_verbose=1
set hw.mmc.debug=2
set hw.sdhci.debug=2

The installer kernel will then boot with increased verbosity and additional debug messages from the sdhci and mmc drivers. Please attach the new logs to this bug.
Comment 4 Jérôme-Charles LALLEMAND 2018-05-28 12:01:53 UTC
Created attachment 193778 [details]
Netgate pfSense Factory Build dmesg -a
Comment 5 Jérôme-Charles LALLEMAND 2018-05-28 12:02:36 UTC
Created attachment 193779 [details]
Netgate pfSense Factory Build sysctl kern.conftxt
Comment 6 Jérôme-Charles LALLEMAND 2018-05-28 12:05:11 UTC
I've added the dmesg -a and sysctl kern.conftxt for the custom pfSense Factory build.

Here is the uname -a :

FreeBSD  11.1-RELEASE-p7 FreeBSD 11.1-RELEASE-p7 #19 r313908+dd963504c4f(factory-RELENG_2_4): Wed Mar 28 16:39:50 CDT 2018     root@buildbot2.netgate.com:/xbuilder/crossbuild-243/pfSense/tmp/obj/xbuilder/crossbuild-243/pfSense/tmp/FreeBSD-src/sys/pfSense  amd64

As suggested, I'll try the new 11.2-BETA3 installer in verbose mode.
Comment 7 Jérôme-Charles LALLEMAND 2018-05-28 13:14:13 UTC
Created attachment 193782 [details]
FreeBSD-11.2-BETA3 Bootlog, then install until mmcsd0 error

Here is the full console log from boot to mmcsd0 error during install.
Comment 8 Ilya Bakulin freebsd_committer 2018-05-28 14:28:19 UTC
OK, seems that the controller processes a whole bunch of write requests for a while before it eventually gets an AutoCMD12 error interrupt that doesn't have any error information...
A proper fix might require a kernel modification; in the meantime, try the following.

At the loader prompt:
set hw.sdhci.quirk_set=268435456

and all the other variables that I mentioned before.
This will activate the quirk SDHCI_QUIRK_BROKEN_AUTO_STOP, so that the driver does not use AutoCMD12 and instead sends CMD12 itself.

If that doesn't help, set the following:

set hw.sdhci.quirk_set=33554432

This will disable HS200 mode completely. The controller will run at slower speeds; maybe that will help.

Please post dmesg from both boot sequences.
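
For reference, the two decimal values above are single-bit masks, which is easier to see in hexadecimal. A minimal sketch in Python; the helper name describe_mask is made up for illustration, and only the two numeric values come from this comment:

```python
def describe_mask(value):
    """Return the hex form of a quirk mask and the positions of its set bits."""
    bits = [i for i in range(value.bit_length()) if (value >> i) & 1]
    return hex(value), bits


if __name__ == "__main__":
    # The two values suggested for the quirk_set tunable in this comment.
    for decimal in (268435456, 33554432):
        hex_form, bits = describe_mask(decimal)
        print(decimal, hex_form, bits)
```

Seen this way, 33554432 is the same value as 0x2000000, so either spelling should work at the loader prompt.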
Comment 9 Jérôme-Charles LALLEMAND 2018-05-28 14:57:10 UTC
Created attachment 193788 [details]
Boot log with quirk=33554432
Comment 10 Jérôme-Charles LALLEMAND 2018-05-28 14:57:38 UTC
Created attachment 193789 [details]
Boot log with quirk=268435456
Comment 11 Jérôme-Charles LALLEMAND 2018-05-28 14:58:42 UTC
I've tried both quirks. Neither seems to work.

I've posted the results.
Comment 12 Marius Strobl freebsd_committer 2018-05-28 17:46:56 UTC
Ah, RELENG_2_4; given that there are pfSense branches such as RELENG_2_3_2, I'd have expected RELENG_2_4_3 for pfSense 2.4.3.

FreeBSD 11.1 and pfSense 2.4.3 support a maximum transfer mode of DDR52, while FreeBSD 11.2 has code for up to HS400ES. So problems with HS200 and HS400, which the Denverton eMMC controller in that Netgate platform also supports, might have gone unnoticed so far.

There's something fishy in the verbose FreeBSD dmesg (attachment ID 193782):
sdhci_pci0-slot0: Re-tuning count 0 secs, mode 1
A re-tuning count of 0 seconds in mode 1 is documented in the SDHCI specification as the controller indicating to the host that periodic re-tuning is to be disabled (so we simply don't do it), which isn't what I'd expect for a Denverton controller. In fact, according to that dmesg, things work for some time and then we detect a fatal error indicating that re-tuning including circuit reset is necessary (but recovery from that state doesn't succeed, which isn't too surprising). In turn, this suggests that periodic re-tuning actually would have been required right from the beginning.
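
To illustrate where that "count 0 secs, mode 1" line plausibly comes from, here is a rough Python sketch that decodes the re-tuning fields from the Caps2 value shown in the register dump in the description (0x80000007). The field offsets and the timer encoding are my reading of the SDHCI 3.0 capabilities register, and the constant and function names are made up for illustration, so treat them as assumptions and check them against the specification:

```python
# Assumed layout of the upper 32 bits of the capabilities register (Caps2).
RETUNE_CNT_MASK = 0x00000F00    # timer count for re-tuning, bits 11:8
RETUNE_CNT_SHIFT = 8
RETUNE_MODES_MASK = 0x0000C000  # re-tuning modes, bits 15:14
RETUNE_MODES_SHIFT = 14


def decode_retune(caps2):
    """Return (timer in seconds or None, re-tuning mode number)."""
    count = (caps2 & RETUNE_CNT_MASK) >> RETUNE_CNT_SHIFT
    mode = ((caps2 & RETUNE_MODES_MASK) >> RETUNE_MODES_SHIFT) + 1
    if count == 0:
        seconds = 0                   # timer disabled
    elif count <= 0xB:
        seconds = 1 << (count - 1)    # 1, 2, 4, ... 1024 seconds
    else:
        seconds = None                # reserved, or taken from another source
    return seconds, mode


if __name__ == "__main__":
    # Caps2 as printed in the register dump in the description.
    print(decode_retune(0x80000007))
```

Under these assumptions both fields are zero in 0x80000007, which matches a timer count of 0 seconds in mode 1.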

Luiz, I don't know much about the requirements of Intel for the firmware to set up the SDHCI controller. But as far as I can tell from what another downstream vendor said, the firmware may at least adapt the capabilities registers to fit the needs of the configuration of a board. Could it be that the firmware of that Netgate platform simply incorrectly omits setting the re-tuning count bits there?

Jérôme-Charles, in the meantime, setting hw.sdhci.quirk_clear=0x4000000 via the loader should get you started unless there are further problems. If that still doesn't work, _additionally_ set hw.sdhci.quirk_set=0x2000000.
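
As a rough mental model of how the two tunables combine, the Python sketch below clears the quirk_clear bits from the driver's quirk mask and then ORs in the quirk_set bits. The order (clear first, then set) and the starting mask are assumptions made for illustration, not taken from the driver source; only 0x4000000 and 0x2000000 come from this comment:

```python
def apply_tunables(driver_quirks, quirk_clear=0, quirk_set=0):
    """Model: drop the bits in quirk_clear, then add the bits in quirk_set."""
    return (driver_quirks & ~quirk_clear) | quirk_set


if __name__ == "__main__":
    # Hypothetical starting mask: bit 26 plus one unrelated bit (bit 22).
    start = 0x4000000 | 0x400000
    only_clear = apply_tunables(start, quirk_clear=0x4000000)
    both = apply_tunables(start, quirk_clear=0x4000000, quirk_set=0x2000000)
    print(hex(start), hex(only_clear), hex(both))
```

The two masks touch different bits (26 and 25), so in this model setting both tunables together does not make one undo the other.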
Comment 13 Jérôme-Charles LALLEMAND 2018-05-29 09:07:36 UTC
Marius, thanks a lot!

Setting hw.sdhci.quirk_clear=0x4000000 seems to work so far.

I'll let it run for a few hours to see if it is stable.
Comment 14 Marius Strobl freebsd_committer 2018-05-29 21:26:10 UTC
Thinking some more about this, the idea that setting hw.sdhci.quirk_clear=0x4000000 should get you working is based on some assumptions about the board and device firmware which are not necessarily true. So even if it seems to be sufficient, I'd suggest additionally setting hw.sdhci.quirk_set=0x2000000. This will limit you to DDR52, which Netgate presumably has tested extensively.

As for FreeBSD, I currently don't see it doing anything wrong. We could add quirk handling for the XG-7100 based on its SMBIOS system strings but that will be outdated and hard to get right for firmware versions that set up the SDHCI capabilities as appropriate for the particular board. Thus, I'd prefer Netgate to get the firmware fixed in the first place.
Comment 15 Jérôme-Charles LALLEMAND 2018-05-30 09:49:46 UTC
With the quirk hw.sdhci.quirk_clear=0x4000000, the system has been working fine for over 24 hours now.

I'll add hw.sdhci.quirk_set=0x2000000 as you suggest and run the system with both quirks enabled until Netgate updates their firmware.

I'm going to open a ticket with Netgate. From what I understand, it seems to be a firmware bug rather than a FreeBSD bug.

Thanks a lot for your help!
Comment 16 Luiz Otavio O Souza,+55 (14) 99772-1255 freebsd_committer 2018-05-30 13:20:45 UTC
(In reply to Marius Strobl from comment #14)

I'm sure that Netgate also wants the clean solution.

We are going to do some debugging with our internal team to identify the root cause of this issue.

Thank you for the analysis, Marius.