Created attachment 193511: pciconf -lv + dmesg -a

Hello there,

I'm trying to install FreeBSD on a new device, the Netgate XG-7100. The installer fails because of many I/O errors. I've tried both 11.1-RELEASE and 11.2-PRERELEASE.

My dmesg is flooded by messages like these:

  sdhci_pci0-slot0: mmcsd0: Got AutoCMD12 error 0x0000, but there is no active command.
  sdhci_pci0-slot0: ============== REGISTER DUMP ==============
  sdhci_pci0-slot0: Sys addr: 0x00000000 | Version: 0x00001002
  sdhci_pci0-slot0: Blk size: 0x00000200 | Blk cnt: 0x0000003e
  sdhci_pci0-slot0: Argument: 0x00010628 | Trn mode: 0x00000027
  sdhci_pci0-slot0: Present: 0x1fff0000 | Host ctl: 0x00000025
  sdhci_pci0-slot0: Power: 0x0000000b | Blk gap: 0x00000080
  sdhci_pci0-slot0: Wake-up: 0x00000000 | Clock: 0x00000007
  sdhci_pci0-slot0: Timeout: 0x0000000d | Int stat: 0x00000000
  sdhci_pci0-slot0: Int enab: 0x05ff003b | Sig enab: 0x05ff003b
  sdhci_pci0-slot0: AC12 err: 0x00000000 | Host ctl2:0x0000008d
  sdhci_pci0-slot0: Caps: 0x546ec8b2 | Caps2: 0x80000007
  sdhci_pci0-slot0: Max curr: 0x00000000 | ADMA err: 0x00000000
  sdhci_pci0-slot0: ADMA addr:0x00000000 | Slot int: 0x00000000
  sdhci_pci0-slot0: ===========================================
  Error indicated: 2 Bad CRC
  mmc0: CMD6 failed, RESULT: 1
  mmc0: CMD6 failed, RESULT: 1
  mmc0: CMD6 failed, RESULT: 1
  mmc0: CMD6 failed, RESULT: 1
  mmc0: CMD6 failed, RESULT: 1

The system runs smoothly with pfSense's custom FreeBSD. Is there any way to get the eMMC working?
Luiz, I tried to locate the source of pfSense 2.4.3 to check what it might be doing differently from upstream, but I couldn't find the code for any recent version of pfSense. Does your Netgate platform require some quirk handling?
Any updates? I'm still trying to make it work, so far without success.
Hi Jérôme-Charles,

As Marius already requested, could you please provide more information about the "custom pfSense" that runs on this hardware? At least the output of `uname -a`, but preferably the exact pfSense version.

Meanwhile, please rerun the installer, escape to the loader prompt when it boots, and run the following commands:

  set boot_verbose=1
  set hw.mmc.debug=2
  set hw.sdhci.debug=2
  boot

The installer kernel will then boot with increased verbosity and additional debug messages from the sdhci and mmc drivers. Please attach the new logs to this bug.
Created attachment 193778: Netgate pfSense Factory Build dmesg -a
Created attachment 193779: Netgate pfSense Factory Build sysctl kern.conftxt
I've added the dmesg -a and sysctl kern.conftxt output for the custom pfSense Factory build. Here is the uname -a output:

  FreeBSD 11.1-RELEASE-p7 FreeBSD 11.1-RELEASE-p7 #19 r313908+dd963504c4f(factory-RELENG_2_4): Wed Mar 28 16:39:50 CDT 2018 root@buildbot2.netgate.com:/xbuilder/crossbuild-243/pfSense/tmp/obj/xbuilder/crossbuild-243/pfSense/tmp/FreeBSD-src/sys/pfSense amd64

As suggested, I'll try the new 11.2-BETA3 installer in verbose mode.
Created attachment 193782: FreeBSD-11.2-BETA3 Bootlog, then install until mmcsd0 error

Here is the full console log, from boot up to the mmcsd0 error during installation.
OK, it seems that the controller processes a whole batch of write requests for a while before it eventually gets an AutoCMD12 error interrupt that carries no error information...

A proper fix might require kernel modifications. In the meantime, try the following at the loader prompt:

  set hw.sdhci.quirk_set=268435456

together with all the other variables I mentioned before. This activates the quirk SDHCI_QUIRK_BROKEN_AUTO_STOP, so the driver will not use AutoCMD12 and will send CMD12 itself instead.

If that doesn't help, set the following:

  set hw.sdhci.quirk_set=33554432

This disables HS200 mode completely. The controller will run at lower speeds, which may help. Please post the dmesg from both boot sequences.
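The quirk tunables take a bit mask, and this thread mixes decimal values (here) with hex values (hw.sdhci.quirk_set=0x2000000 and hw.sdhci.quirk_clear=0x4000000 later on). The small Python sketch below simply rewrites each mask in hex and as a single bit position so they can be compared; the helper name describe_mask is hypothetical, and it makes no claim about which quirk name each bit maps to beyond what the comments in this thread say.

```python
# Sketch: show each quirk mask from this thread in decimal, hex, and as a
# single bit position. Assumes every mask here sets exactly one bit.

def describe_mask(value: int) -> str:
    """Return 'decimal = hex = 1 << bit' for a single-bit mask."""
    if value <= 0 or value & (value - 1):
        raise ValueError(f"{value} is not a single-bit mask")
    bit = value.bit_length() - 1
    return f"{value} = {value:#x} = 1 << {bit}"


if __name__ == "__main__":
    for v in (268435456, 33554432, 0x4000000):
        print(describe_mask(v))
```

Run as a script, this prints 268435456 = 0x10000000 = 1 << 28, 33554432 = 0x2000000 = 1 << 25 and 67108864 = 0x4000000 = 1 << 26. In particular, the decimal 33554432 is the same mask as the hex 0x2000000 that appears later in the thread.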
Created attachment 193788: Boot log with quirk=33554432
Created attachment 193789: Boot log with quirk=268435456
I've tried both quirks. They don't seem to work. I've posted the results.
Ah, RELENG_2_4; given that there are pfSense branches such as RELENG_2_3_2, I'd have expected RELENG_2_4_3 for pfSense 2.4.3.

FreeBSD 11.1 and pfSense 2.4.3 support a maximum transfer mode of DDR52, while FreeBSD 11.2 has code for up to HS400ES. So problems with HS200 and HS400, which the Denverton eMMC controller in that Netgate platform also supports, might have gone unnoticed so far.

There's something fishy in the verbose FreeBSD dmesg (attachment ID 193782):

  sdhci_pci0-slot0: Re-tuning count 0 secs, mode 1

The SDHCI specification documents a re-tuning count of 0 seconds in mode 1 as the controller telling the host that periodic re-tuning is to be disabled (so we simply don't do it), which isn't what I'd expect for a Denverton controller. In fact, according to that dmesg, things work for some time, and then we detect a fatal error indicating that re-tuning, including a circuit reset, is necessary (recovery from that state doesn't succeed, which isn't too surprising). This in turn suggests that periodic re-tuning would actually have been required right from the beginning.

Luiz, I don't know much about Intel's requirements for how the firmware sets up the SDHCI controller. But as far as I can tell from what another downstream vendor said, the firmware may at least adapt the capabilities registers to fit the configuration of a board. Could it be that the firmware of that Netgate platform simply incorrectly omits setting the re-tuning count bits there?

Jérôme-Charles, in the meantime, setting hw.sdhci.quirk_clear=0x4000000 via the loader should get you started, unless there are further problems. If that still doesn't work, _additionally_ set hw.sdhci.quirk_set=0x2000000.
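To make the "Re-tuning count 0 secs, mode 1" argument concrete, here is a minimal Python sketch that decodes the re-tuning fields from the Caps2 value in the original register dump (0x80000007). The bit positions are taken from my reading of the SD Host Controller Simplified Specification 3.00 (timer count in Caps2 bits 11:8, re-tuning mode in bits 15:14); treat them as an assumption rather than a copy of the sdhci(4) driver code. The function and constant names are illustrative only.

```python
# Sketch: decode the re-tuning fields of the upper SDHCI capabilities
# register ("Caps2" in the FreeBSD register dump).
# ASSUMED layout (SD Host Controller Simplified Spec 3.00):
#   bits 11:8  Timer Count for Re-Tuning: 0 = timer disabled,
#              n = 1..0xB -> 2**(n-1) seconds, 0xC-0xE reserved,
#              0xF = information comes from another source
#   bits 15:14 Re-Tuning Modes: 0 = Mode 1, 1 = Mode 2, 2 = Mode 3, 3 = reserved

RETUNE_CNT_SHIFT = 8
RETUNE_CNT_MASK = 0xF
RETUNE_MODE_SHIFT = 14
RETUNE_MODE_MASK = 0x3


def decode_retuning(caps2: int) -> dict:
    """Return the raw count field, the re-tuning period and the mode number."""
    count = (caps2 >> RETUNE_CNT_SHIFT) & RETUNE_CNT_MASK
    if count == 0:
        period = 0                  # periodic re-tuning disabled
    elif count <= 0xB:
        period = 1 << (count - 1)   # 1, 2, 4, ... 1024 seconds
    else:
        period = None               # reserved or "other source"
    mode_field = (caps2 >> RETUNE_MODE_SHIFT) & RETUNE_MODE_MASK
    mode = mode_field + 1 if mode_field < 3 else None  # 0b11 is reserved
    return {"count_field": count, "period_secs": period, "mode": mode}


if __name__ == "__main__":
    # Caps2 value from the register dump in the original report.
    info = decode_retuning(0x80000007)
    print(f"Re-tuning count {info['period_secs']} secs, mode {info['mode']}")
```

Both fields are zero in 0x80000007, so under this layout the controller advertises a disabled re-tuning timer in mode 1, which lines up with the dmesg line quoted above. A firmware that filled in, say, a count field of 4 would instead request re-tuning every 8 seconds.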
Marius, thanks a lot! Setting hw.sdhci.quirk_clear=0x4000000 seems to work so far. I'll let it run for a few hours to see whether it is stable.
Thinking some more about this, the idea that setting hw.sdhci.quirk_clear=0x4000000 should get you working is based on some assumptions about the board and device firmware that are not necessarily true. So even if it seems to be sufficient, I'd suggest additionally setting hw.sdhci.quirk_set=0x2000000. This will limit you to DDR52, which Netgate has presumably tested extensively.

As for FreeBSD, I currently don't see it doing anything wrong. We could add quirk handling for the XG-7100 based on its SMBIOS system strings, but that would become outdated and be hard to get right for firmware versions that set up the SDHCI capabilities appropriately for the particular board. Thus, I'd prefer Netgate to get the firmware fixed in the first place.
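The suggestion above combines two tunables. As a minimal sketch of how a clear mask and a set mask typically combine with a driver's built-in quirk word, the Python below assumes the clear mask is applied before the set mask; that ordering is an assumption, not something verified against sdhci(4). The built-in value 0x4000001 is hypothetical, chosen only so that the cleared bit is present.

```python
# Sketch: combine hw.sdhci.quirk_clear / hw.sdhci.quirk_set style masks with
# a driver's built-in quirks. ASSUMPTION: clear is applied first, then set.
# The built-in value used below is hypothetical.

QUIRK_CLEAR = 0x4000000   # hw.sdhci.quirk_clear value from this thread
QUIRK_SET = 0x2000000     # hw.sdhci.quirk_set value from this thread


def effective_quirks(builtin: int, clear: int = 0, set_: int = 0) -> int:
    """Drop the bits in `clear`, then force on the bits in `set_`."""
    return (builtin & ~clear) | set_


if __name__ == "__main__":
    builtin = 0x4000000 | 0x1   # hypothetical built-in quirk word
    result = effective_quirks(builtin, QUIRK_CLEAR, QUIRK_SET)
    print(f"{builtin:#x} -> {result:#x}")
```

Under this model, the script prints 0x4000001 -> 0x2000001: quirk_clear only changes anything if the driver's built-in quirks for the controller already include that bit, while quirk_set forces its bit on regardless.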
With the quirk hw.sdhci.quirk_clear=0x4000000, the system has been working fine for over 24 hours now. I'll add hw.sdhci.quirk_set=0x2000000 as you suggest and run the system with both quirks enabled until Netgate updates its firmware. I'm going to open a ticket with Netgate. From what I understand, it seems to be a firmware bug rather than a FreeBSD bug. Thanks a lot for your help!
(In reply to Marius Strobl from comment #14)

I'm sure that Netgate also wants the clean solution. We are going to do some debugging with our internal team to identify the root cause of this issue. Thank you for the analysis, Marius.
^Triage: committer's bit was taken into safekeeping some time ago. To submitter/commenters: is this aging PR still relevant?