Bug 231760

Summary: FreeBSD 12.0-ALPHA7 Installation Halts at ACPI on 4 Different AMD Ryzen Laptops (HP, Dell, Huawei)
Product: Base System
Reporter: Acu Ilie Dorin <acutech07>
Component: kern
Assignee: John Baldwin <jhb>
Status: Closed FIXED
Severity: Affects Some People
CC: acpi, contact, freebsdbugs, imp, jhb, jmd, jordy, linimon, markj, pi, rajfbsd, rozhuk.im, sauraahu, sigsys, stanislas.leduc
Priority: ---
Keywords: needs-qa
Version: CURRENT
Flags: jhb: mfc-stable11+, jhb: mfc-stable12+
Hardware: amd64   
OS: Any   
URL: https://reviews.freebsd.org/D20327
Bug Depends on:    
Bug Blocks: 236899    

Description Acu Ilie Dorin 2018-09-27 13:46:36 UTC
FreeBSD 12.0 cannot be installed on four different AMD Ryzen laptops (manufactured by HP, Dell, and Huawei).
The system halts immediately after the initial Welcome to FreeBSD screen.

I will describe the HP ENVY x360 Ryzen 7 messages in detail, followed by briefer descriptions of the Dell and Huawei laptops.

HARDWARE:
HP ENVY x360 Convertible 15m-cp0xxx (15.6 inch screen) - from BestBuy
Product Number: 4AC55UA#ABA
Born on Date  08/12/2018
System Board ID 8497
BIOS Version F.19
Processor Type  AMD Ryzen 7 2700U with Radeon Vega Mobile Gfx
Processor Speed 2200 MHz
Total Memory 8 GB
Primary Battery SN 1291 07/06/2018

The installation USB stick was written with:
https://download.freebsd.org/ftp/snapshots/amd64/amd64/ISO-IMAGES/12.0/FreeBSD-12.0-ALPHA7-amd64-20180921-r338849-memstick.img

During the first boot from the USB stick, after the Welcome to FreeBSD screen, I see the following:
Warning: WITNESS option enabled, expect reduced Performance
SVM (disable in BIOS)
ACPI APIC Table: <HPQOEM 8497 >
...
Firmware Warning (ACPI): Optional FADT field Pm2ControlBlock has valid Length but zero Address: 0x00000000000000000000/0x1 (20100010/tbfadt-796)
ioapic (Version 2.1) irqs 0-23 on motherboard
......
module_register_init: MOD_LOAD (vesa, 0xffffffff810e1210, 0) error 19
...

acpi0: <HPQOEM SLIC-HPC> on motherboard
ACPI: 10 ACPI AML tables successfully acquired and loaded
PCIe: Memory Mapped configuration base @ 0xf8000000
ioapic0: routing intpin 9 (ISA IRQ 9) to lapic 0 vector 48
acpi0: PowerButton (fixed)
acpi0: wakeup code va 0xfffffe00033ff00 pa 0x9d00
^
That is the last message; the system hangs there.


I tried the same USB stick on three other AMD Ryzen laptops, as follows:


On the HP ENVY x360 13.3 inch with a Ryzen 5, the system also halted before the install:
https://www.bestbuy.com/site/hp-envy-x360-2-in-1-13-3-touch-screen-laptop-amd-ryzen-5-8gb-memory-128gb-solid-state-drive-hp-finish-in-dark-ash-silver/6237358.p?skuId=6237358
Last lines:

hpt27xx: no controller detected
battery0:

https://www.bestbuy.com/site/dell-inspiron-2-in-1-13-3-touch-screen-laptop-amd-ryzen-5-8gb-memory-256gb-solid-state-drive-era-gray/6208228.p?skuId=6208228

HUAWEI MATEBOOK D AMD RYZEN 5
https://www.walmart.com/ip/Huawei-MateBook-D-Signature-Edition-14-IPS-FHD-Touch-AMD-Ryzen-5-2500U-8GB-RAM-256GB-SSD-Radeon-Vega-8-Graphics-Dolby-ATMOS-Win-10-Home-64-bit-Mystic/507473681
BIOS Version 1.2.0
Dell Inspiron 7375
HGYQ8L2
AMD Ryzen 5 2500U

Here, too, the last message is about:
acpi_ec0: <embedded controller .... port ....>

I know more details are probably needed. Since there is no way to capture the text except by taking a picture, I could send some pictures if needed.
I am not certain I will keep any of these laptops, but I can perhaps do more testing on the HP ENVY x360 15.6 inch.
However, I wonder whether FreeBSD can be installed on any AMD Ryzen laptop.
Is this an EFI/BIOS issue, a CPU issue, or a FreeBSD issue?
Thanks.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2018-10-09 15:47:00 UTC
Is this a regression from an earlier install, or is this a new install?
Comment 2 Stanislas Leduc 2019-02-04 19:32:09 UTC
Hi,

I purchased a Huawei MateBook D and booted it with a FreeBSD 12.0-STABLE r343597 memstick.
I also get a hang at the acpi_ec0: embedded .... line.

I found this thread, http://freebsd.1045724.x6.nabble.com/FreeBSD-CURRENT-on-AMD-Ryzen5-td6279845.html, and with the hw.pci.mcfg=0 option, it boots.

Best regards,
Stan
Comment 3 Stanislas Leduc 2019-02-05 06:42:28 UTC
Hi,

Just report experience :)

I set hw.pci.mcfg=0 in /boot/loader.conf and it boots OK. I configured Wi-Fi in /etc/rc.conf and it works perfectly.

If I encounter a problem during the week, I will be sure to report it.

Best regards,
Stan
Comment 4 Warner Losh freebsd_committer 2019-02-05 17:18:36 UTC
Any notion of which of the PCI devices that claims to be multi-function but isn't (which is what mcfg=0 is designed to work around) is causing the problem?
Comment 5 Stanislas Leduc 2019-02-07 07:29:54 UTC
In fact,
I think the motherboard/BIOS implementation for PCI devices is wrong.
With hw.pci.mcfg=0, which lets FreeBSD manage them, it works perfectly.
Comment 6 Rajesh 2019-02-08 07:15:36 UTC
I face the same issue, but only when I enable an eMMC device that is enumerated by ACPI. Using the ACPI DSDT tables, I checked whether the ACPI device conflicts with the PCI MMIO space, but they do not seem to conflict.

My understanding is that with hw.pci.mcfg=0, FreeBSD uses port-mapped access to PCI config space rather than memory-mapped access. How is this related to the multi-function device theory mentioned here?
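To make the distinction in the question above concrete, here is a minimal C sketch (not FreeBSD's actual pci_cfgreg.c code) of how the two access methods address the same register. Legacy configuration mechanism #1 writes an address to I/O port 0xCF8 and moves data through ports 0xCFC-0xCFF, while PCIe ECAM (the window described by the MCFG table) gives each function a 4 KiB page of physical memory. The function names are hypothetical, and the 0xf8000000 base is borrowed from the original report purely as an example.

```c
#include <assert.h>
#include <stdint.h>

/* Mechanism #1 (port I/O): value written to CONFIG_ADDRESS at port 0xCF8.
 * Bit 31 enables the cycle; only the first 256 bytes are reachable. */
static uint32_t
cf8_address(unsigned bus, unsigned dev, unsigned func, unsigned reg)
{
	assert(bus < 256 && dev < 32 && func < 8 && reg < 256);
	return (UINT32_C(0x80000000) | (bus << 16) | (dev << 11) |
	    (func << 8) | (reg & 0xfc));
}

/* Mechanism #1 data port: 0xCFC plus the byte offset within the dword. */
static uint16_t
cfc_port(unsigned reg)
{
	return ((uint16_t)(0xcfc + (reg & 3)));
}

/* PCIe ECAM (MCFG): plain memory access at base + bus/dev/func/reg. */
static uint64_t
ecam_address(uint64_t base, unsigned bus, unsigned dev, unsigned func,
    unsigned reg)
{
	assert(bus < 256 && dev < 32 && func < 8 && reg < 4096);
	return (base + ((uint64_t)bus << 20) + ((uint64_t)dev << 15) +
	    ((uint64_t)func << 12) + reg);
}
```

For pci4.0.2 register 0x24, cf8_address(4, 0, 2, 0x24) gives 0x80040224 (data at port 0xCFC), while ecam_address(0xf8000000, 4, 0, 2, 0x24) gives 0xf8402024. The ECAM path is an ordinary memory access, so how that physical range is mapped (its cache attributes) matters in a way it does not for port I/O.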

The pciconf(8) man page says:

"If the most significant bit of the header type register is set for function 0 of a PCI device, it is a multi-function device, which contains several (similar or independent) functions on one chip."

In my pciconf output, I don't see any device with the MSB bit set. But I see multi-function devices listed. I assume the pciconf output should be read as "driver@pci<unit>.<bus>.<device>.<function>".

So, is the pciconf output not correct? How can we debug which multi-function device creates the problem here?
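The multi-function bit quoted from the man page can be checked with a few lines of C. This sketch is hypothetical, but the constants follow the PCI specification's header type register (offset 0x0E): bits 0-6 give the header layout and bit 7 marks a multi-function device. It applies the same layout sanity check that appears in John's patch in comment 7, plus an assumed rule for how many functions to probe. An absent device reads back 0xff, which fails the layout check.

```c
#include <assert.h>
#include <stdint.h>

/* Values per the PCI spec (FreeBSD uses the same names in pcireg.h). */
#define PCIR_HDRTYPE	0x0e	/* offset of the header type register */
#define PCIM_HDRTYPE	0x7f	/* layout: 0 device, 1 PCI bridge, 2 CardBus */
#define PCIM_MFDEV	0x80	/* bit 7: multi-function device */
#define PCI_MAXHDRTYPE	2

/*
 * Given the header type byte read from function 0, return the highest
 * function number worth probing (7 for multi-function, else 0), or -1 if
 * the layout field is out of range (e.g. 0xff from an empty slot).
 */
static int
pci_max_function(uint8_t hdrtype_func0)
{
	if ((hdrtype_func0 & PCIM_HDRTYPE) > PCI_MAXHDRTYPE)
		return (-1);
	return ((hdrtype_func0 & PCIM_MFDEV) != 0 ? 7 : 0);
}
```

For example, 0x80 (multi-function, normal layout) yields 7 and 0x01 (single-function bridge) yields 0.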
Comment 7 John Baldwin freebsd_committer freebsd_triage 2019-02-13 21:34:12 UTC
I tried to debug this a bit with jmd@ previously where the hang seemed to occur in the acpi_pci_link.c driver.  This is the patch I had asked jmd@ to test, but it didn't help:

Can you please try this change.  It won't fix anything but will add some
logging.  Hopefully it hangs and we can see what PCI access it did last.
Worst case is it just sits in a loop spewing crap endlessly.

Index: acpi_pci_link.c
===================================================================
--- acpi_pci_link.c	(revision 339002)
+++ acpi_pci_link.c	(working copy)
@@ -54,6 +54,16 @@ ACPI_SERIAL_DECL(pci_link, "ACPI PCI link");
 #define NUM_ISA_INTERRUPTS	16
 #define NUM_ACPI_INTERRUPTS	256
 
+#define	pci_cfgregread(bus, dev, func, reg, size)		\
+	({ uint32_t _val;					\
+	printf("%s:%d: cfgregread pci%d.%d.%d reg %#x (%d)\n",	\
+	    __func__, __LINE__, (bus), (dev), (func), (reg),	\
+	    (size));						\
+	_val = pci_cfgregread((bus), (dev), (func), (reg),	\
+	    (size));						\
+	printf("\t=> %#x\n", _val);				\
+	_val; })
+
 /*
  * An ACPI PCI link device may contain multiple links.  Each link has its
  * own ACPI resource.  _PRT entries specify which link is being used via
@@ -577,6 +587,9 @@ acpi_pci_link_search_irq(int bus, int device, int
 	uint8_t func, maxfunc;
 
 	/* See if we have a valid device at function 0. */
+	value = pci_cfgregread(bus, device, 0, PCIR_VENDOR, 2);
+	if (value == 0xffff)
+		return (PCI_INVALID_IRQ);
 	value = pci_cfgregread(bus, device, 0, PCIR_HDRTYPE, 1);
 	if ((value & PCIM_HDRTYPE) > PCI_MAXHDRTYPE)
 		return (PCI_INVALID_IRQ);
@@ -587,8 +600,8 @@ acpi_pci_link_search_irq(int bus, int device, int
 
 	/* Scan all possible functions at this device. */
 	for (func = 0; func <= maxfunc; func++) {
-		value = pci_cfgregread(bus, device, func, PCIR_DEVVENDOR, 4);
-		if (value == 0xffffffff)
+		value = pci_cfgregread(bus, device, func, PCIR_VENDOR, 2);
+		if (value == 0xffff)
 			continue;
 		value = pci_cfgregread(bus, device, func, PCIR_INTPIN, 1);
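The patch relies on a preprocessor trick: a function-like macro with the same name as the function it wraps. A macro name is not expanded again inside its own expansion, so the inner call reaches the real pci_cfgregread(). The user-space sketch below reproduces the pattern against a made-up, one-function config space; the fake contents and the stand-in accessor are hypothetical. It uses a GCC/Clang statement expression, as the original does. Because the macro only affects call sites compiled after the #define in the same file, reads issued from other files (such as ACPICA's AcpiOsReadPciConfiguration()) would not be logged, which may explain why the added logs did not appear in comment 8.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel accessor: only pci4.0.2 exists,
 * with made-up contents (vendor 0x1022, SATA/AHCI class bytes). */
static const uint8_t fake_cfg[256] = {
	[0x00] = 0x22, [0x01] = 0x10,
	[0x09] = 0x01, [0x0a] = 0x06, [0x0b] = 0x01,
};

static uint32_t
pci_cfgregread(int bus, int slot, int func, int reg, int bytes)
{
	uint32_t val = 0;

	assert(bytes == 1 || bytes == 2 || bytes == 4);
	assert(reg >= 0 && reg + bytes <= 256);
	if (bus != 4 || slot != 0 || func != 2)
		return (UINT32_MAX >> (32 - 8 * bytes)); /* absent: all ones */
	for (int i = bytes - 1; i >= 0; i--)		/* little-endian */
		val = (val << 8) | fake_cfg[reg + i];
	return (val);
}

/* Must follow the definition above; otherwise the definition itself
 * would be rewritten by the macro. */
#define	pci_cfgregread(bus, dev, func, reg, size)			\
	({ uint32_t _val;						\
	printf("%s:%d: cfgregread pci%d.%d.%d reg %#x (%d)\n",		\
	    __func__, __LINE__, (bus), (dev), (func), (reg), (size));	\
	_val = pci_cfgregread((bus), (dev), (func), (reg), (size));	\
	printf("\t=> %#x\n", _val);					\
	_val; })
```

Any call such as pci_cfgregread(4, 0, 2, 0x0b, 1) placed after the #define prints the register being read and the value returned (0x1 here) before handing the value back to the caller.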
Comment 8 Rajesh 2019-02-14 13:34:28 UTC
(In reply to John Baldwin from comment #7)

I tried your patch, John, but I am not seeing the added logs.

I added the same print to the pci_cfgregread and pciereg_cfgread routines, and I see the following logs before the system reboots. (The system doesn't hang or break into the debugger, even with the DDB and BREAK_TO_DEBUGGER options set.)

ioapic0: routing intpin 9 (ISA IRQ 9) to lapic 0 vector 48
pci_cfgregread: pci4.0.2 reg 0x24 bytes 0x1
pciereg_cfgread: pci4.0.2 reg 0x24 bytes 0x1
pci_cfgregread: pci4.0.2 reg 0x25 bytes 0x1
pciereg_cfgread: pci4.0.2 reg 0x25 bytes 0x1
pci_cfgregread: pci4.0.2 reg 0x26 bytes 0x1
pciereg_cfgread: pci4.0.2 reg 0x26 bytes 0x1
pci_cfgregread: pci4.0.2 reg 0x27 bytes 0x1
pciereg_cfgread: pci4.0.2 reg 0x27 bytes 0x1
pci_cfgregread: pci4.0.2 reg 0xa bytes 0x1
pciereg_cfgread: pci4.0.2 reg 0xa bytes 0x1
pci_cfgregread: pci4.0.2 reg 0xb bytes 0x1
pciereg_cfgread: pci4.0.2 reg 0xb bytes 0x1
cc

The device at pci4.0.2 is an AHCI controller. Is there anything else we can try here?
Comment 9 John Baldwin freebsd_committer freebsd_triage 2019-02-14 17:26:38 UTC
It is really weird to be reading 0x24-0x27 one byte at a time. 0x24 is a BAR, and nothing should be reading individual bytes of it. Can you get a stack trace for a read of register 0x24 with size 1 (kdb_backtrace() should print one)?

Given that that is right after ACPI registers its interrupt, I wonder if it's something weird in the DSDT itself doing the register accesses?  Providing the acpidump might be useful as well if the stack trace ends up in ACPI.
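One plausible reading of the log in comment 8 (an interpretation, not something confirmed in this thread) is that an ACPI field declared with byte access is being read, so a 32-bit register is fetched as four 1-byte reads. The hypothetical sketch below assembles such a little-endian register, for example BAR5 at 0x24, and also checks the class-code bytes read next: 0x0B (base class), 0x0A (subclass) and 0x09 (programming interface), which for an AHCI controller are 0x01, 0x06 and 0x01. The config-space contents are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assemble a little-endian 32-bit register from four 1-byte reads. */
static uint32_t
read32_bytewise(const uint8_t cfg[256], unsigned reg)
{
	uint32_t val = 0;

	assert(reg % 4 == 0 && reg + 4 <= 256);
	for (unsigned i = 0; i < 4; i++)
		val |= (uint32_t)cfg[reg + i] << (8 * i);
	return (val);
}

/* A memory BAR keeps type/prefetch flags in bits 0-3; the rest is the base. */
static uint32_t
bar_mem_base(uint32_t bar)
{
	assert((bar & 0x1) == 0);	/* bit 0 set would mean an I/O BAR */
	return (bar & ~UINT32_C(0xf));
}

/* Mass storage (0x01) / SATA (0x06) / AHCI programming interface (0x01). */
static bool
is_ahci(const uint8_t cfg[256])
{
	return (cfg[0x0b] == 0x01 && cfg[0x0a] == 0x06 && cfg[0x09] == 0x01);
}
```

With bytes 00 d0 5f fe at 0x24-0x27, read32_bytewise() returns 0xfe5fd000.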
Comment 10 Rajesh 2019-02-18 08:46:55 UTC
(In reply to John Baldwin from comment #9)

Hi John,

Sorry for the delayed response. Please find below the backtrace for the read of the BAR (0x24):

pci_cfgregread: amd64 pci4.0.2 reg 0x24 bytes 0x1
pciereg_cfgread: pci4.0.2 reg 0x24 bytes 0x1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff826653f0
pciereg_cfgread() at pciereg_cfgread+0x5e/frame 0xffffffff82665430
AcpiOsReadPciConfiguration() at AcpiOsReadPciConfiguration+0x49/frame 0xffffffff82665460
AcpiEvAddressSpaceDispatch() at AcpiEvAddressSpaceDispatch+0x1f5/frame 0xffffffff826654d0
AcpiExAccessRegion() at AcpiExAccessRegion+0xa3/frame 0xffffffff82665520
AcpiExFieldDatumIo() at AcpiExFieldDatumIo+0xfb/frame 0xffffffff82665560
AcpiExExtractFromField() at AcpiExExtractFromField+0xe0/frame 0xffffffff826655d0
AcpiExReadDataFromField() at AcpiExReadDataFromField+0x125/frame 0xffffffff82665610
AcpiExResolveNodeToValue() at AcpiExResolveNodeToValue+0xe7/frame 0xffffffff82665660
AcpiExResolveToValue() at AcpiExResolveToValue+0x1d1/frame 0xffffffff826656a0
AcpiDsEvaluateNamePath() at AcpiDsEvaluateNamePath+0x78/frame 0xffffffff826656e0
AcpiDsExecEndOp() at AcpiDsExecEndOp+0x99/frame 0xffffffff82665720
AcpiPsParseLoop() at AcpiPsParseLoop+0x732/frame 0xffffffff826657a0
AcpiPsParseAml() at AcpiPsParseAml+0x80/frame 0xffffffff826657e0
AcpiPsExecuteMethod() at AcpiPsExecuteMethod+0x13c/frame 0xffffffff82665820
AcpiNsEvaluate() at AcpiNsEvaluate+0x1e7/frame 0xffffffff82665860
AcpiNsInitOneDevice() at AcpiNsInitOneDevice+0xe7/frame 0xffffffff82665890
AcpiNsWalkNamespace() at AcpiNsWalkNamespace+0xc3/frame 0xffffffff826658f0
AcpiNsInitializeDevices() at AcpiNsInitializeDevices+0x63/frame 0xffffffff82665950
AcpiInitializeObjects() at AcpiInitializeObjects+0x27/frame 0xffffffff82665970
acpi_attach() at acpi_attach+0x3d3/frame 0xffffffff82665a10
device_attach() at device_attach+0x3ec/frame 0xffffffff82665a60
bus_generic_attach() at bus_generic_attach+0x5c/frame 0xffffffff82665a90
device_attach() at device_attach+0x3ec/frame 0xffffffff82665ae0
bus_generic_new_pass() at bus_generic_new_pass+0x118/frame 0xffffffff82665b10
root_bus_configure() at root_bus_configure+0x77/frame 0xffffffff82665b40
configure() at configure+0x9/frame 0xffffffff82665b50
mi_startup() at mi_startup+0x118/frame 0xffffffff82665b70
btext() at btext+0x2c


I am not able to get a proper acpidump. When I run "acpidump -t -d", it prints certain details to stdout and then hangs. Is there any known behavior like this with acpidump?
Comment 11 John Baldwin freebsd_committer freebsd_triage 2019-02-19 21:58:35 UTC
I have not seen acpidump hang before. Can you capture the output you get from acpidump when it hangs? In this case I don't need -t, just -d. It is definitely something weird in an ACPI method that your BIOS is doing. I think it's some device's _INI routine that is doing the read.

Also, do you know if register 0xb (PCIR_CLASS) is the last register read before the reboot?
Comment 12 Rajesh 2019-02-20 10:02:59 UTC
(In reply to John Baldwin from comment #11)

Yes, a read of register 0xb on that PCI device is the last pci_cfgregread call I see before the system reboots. In the working case (mcfg=0), I see the "acpi0: Power Button (fixed)" message after this particular read.

Regarding acpidump, 

1) Without "-t", I don't see anything on stdout.

2) With "-t", I can see the RSD PTR, XSDT, FACP, and FACS tables printed in that order, and then it just hangs. (I will check which details I can share and paste those tables here.) If I leave it hanging for a while, the system reboots.

One thing I notice is DSDT=0 in the FACP block. Is that valid? Also, is there anything specific I need to check regarding the BIOS?
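For context on the DSDT=0 question: since ACPI 2.0, the FADT has both a 32-bit DSDT field and a 64-bit X_DSDT field, and the specification tells the OS to ignore DSDT when X_DSDT holds a nonzero, usable address. A zero DSDT field alone is therefore not necessarily invalid. The sketch below shows that selection rule in simplified form; the struct models only those two fields, the names are hypothetical, and ACPICA's real table code performs additional consistency checks that are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the two FADT fields that locate the DSDT are modeled. */
struct fadt_dsdt_fields {
	uint32_t dsdt;		/* legacy 32-bit physical address */
	uint64_t x_dsdt;	/* 64-bit physical address (ACPI 2.0+) */
};

/* Prefer X_DSDT when nonzero; otherwise fall back to the 32-bit field.
 * A result of 0 means neither field is populated. */
static uint64_t
fadt_dsdt_address(const struct fadt_dsdt_fields *f)
{
	assert(f != NULL);
	if (f->x_dsdt != 0)
		return (f->x_dsdt);
	return (f->dsdt);
}
```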
Comment 13 Rajesh 2019-02-21 14:06:24 UTC
(In reply to Rajesh from comment #12)

John, it seems the ACPI tables (whatever gets dumped) contain some proprietary details, so I can't share them completely. But if you are looking for specific details, I can try to get them.

One more observation: I was able to dump the ACPI tables properly in Linux on the same machine with the same BIOS, and the DSDT value was not zero there. I did this test some time back (I will redo it and confirm this behavior). If that is the case, is there any reason why it behaves differently in FreeBSD?
Comment 14 John Baldwin freebsd_committer freebsd_triage 2019-02-21 23:04:01 UTC
In terms of the dump, what I want to see is whether any _INI methods try to do PCI accesses, and if so, I'd probably want to see what is in the _INI method. It might also be a _REG method rather than an _INI method.

The other thing to do is to get a backtrace for the read of register 0xb to see where that happens (probably from the same _REG/_INI method, though) and maybe add some debug tracing in ACPICA to see how much farther it gets before rebooting. There might already be ACPICA debug tracing knobs you can turn on that will show that.
Comment 15 Rajesh 2019-02-22 14:23:33 UTC
(In reply to John Baldwin from comment #14)

I redid the acpidump with Linux on the same machine. It goes smoothly, but FACP shows DSDT as 0 (contrary to what I said earlier). Other than that, I could dump all tables properly in Linux.

I had a quick look for PCIe config space accesses from _INI methods (in the dump collected with Linux). As far as I can tell, there are none, but I will double-check. The backtrace for register 0xb is the same as the one I posted earlier (for 0x24). I will try to understand that path.

Will keep you posted.
Comment 16 John Baldwin freebsd_committer freebsd_triage 2019-05-19 05:44:43 UTC
I got access to two laptops at BSDCan tonight and figured out the bug. I'll clean up the patch I came up with and post it soonish. The short version is that the _PIC method in ACPI called a function that directly accessed PCI config registers via the MCFG mapping to clear a bit when enabling APIC mode. Since the BIOS didn't go through PCI config space accesses but accessed MCFG directly via SystemMemory, AcpiOsMapMemory was used to map MCFG. This called pmap_mapbios(), which remapped the MCFG window as Write-Back (WB) instead of Uncacheable (UC), and the laptops hung right after this. The fix is to add yet another variant of pmap_mapdev() that leaves the existing PAT mode alone, and to change AcpiOsMapMemory to use it instead of pmap_mapbios().
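The failure mode described above can be modeled as two policies for a mapping helper. This toy sketch is hypothetical and is not pmap code: it tracks one memory attribute per page of a pretend direct map. The old policy forces Write-Back on every request, so a page that was deliberately uncached, like the MCFG window, silently becomes cacheable. The fixed policy, matching the semantics described in the commit log in comment 21, reuses an existing mapping's attribute and assigns WB only to new mappings.

```c
#include <assert.h>
#include <stddef.h>

/* Toy memory attributes; MA_UNMAPPED (0) means no mapping exists yet. */
enum memattr { MA_UNMAPPED, MA_UC, MA_WB };

#define	NPAGES	8
static enum memattr direct_map[NPAGES];	/* zero-initialized: all unmapped */

/* Old policy: always (re)map Write-Back, clobbering any existing UC page. */
static enum memattr
map_force_wb(size_t page)
{
	assert(page < NPAGES);
	direct_map[page] = MA_WB;
	return (direct_map[page]);
}

/* Fixed policy: keep an existing mapping's attribute; new mappings get WB. */
static enum memattr
map_keep_existing(size_t page)
{
	assert(page < NPAGES);
	if (direct_map[page] == MA_UNMAPPED)
		direct_map[page] = MA_WB;
	return (direct_map[page]);
}
```

If page 3 stands in for the MCFG window and is marked MA_UC beforehand, map_keep_existing(3) leaves it UC, whereas map_force_wb(3) turns it into WB, which is the change that hung the laptops.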
Comment 17 Mark Johnston freebsd_committer 2019-05-21 20:26:21 UTC
https://reviews.freebsd.org/D20327
Comment 18 John Baldwin freebsd_committer freebsd_triage 2019-05-25 01:24:51 UTC
*** Bug 236899 has been marked as a duplicate of this bug. ***
Comment 19 Saurabh 2019-06-29 16:38:22 UTC
I'm getting a similar error 19 on an ASUS AMD Ryzen laptop:

https://www.asus.com/in/Laptops/ASUS-VivoBook-15-X505ZA/

module_register_init: MOD_LOAD (vesa, 0xffffffff810e1210, 0) error 19

I am not able to boot from the memory stick.
Let me know if there is a change with the fix that I can test. I see the bug is still open, so I am assuming it's not fixed yet.

I have been using Linux on this laptop for a year but am not able to boot FreeBSD. Any suggestions?
Thanks
Comment 20 Evilham 2019-07-24 10:20:35 UTC
Hello, I'm running FreeBSD on a ThinkPad A485, which is very likely affected by this (also an AMD Ryzen CPU).

I am able to boot with hw.pci.mcfg=0 as mentioned on this bug (that saved me having to migrate again to another OS).

The system works quite well, except that there is occasionally a "spin lock held too long" panic. I'm documenting that in this bug, which has core dumps attached:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239351

Could that be a duplicate of this bug?


Also: I'm happy to test patches related to this bug both in CURRENT and 12-RELEASE.

Thank you!
Comment 21 commit-hook freebsd_committer 2019-08-03 01:36:24 UTC
A commit references this bug:

Author: jhb
Date: Sat Aug  3 01:36:07 UTC 2019
New revision: 350551
URL: https://svnweb.freebsd.org/changeset/base/350551

Log:
  Don't reset memory attributes when mapping physical addresses for ACPI.

  Previously, AcpiOsMapMemory was using pmap_mapbios which would always map
  the requested address Write-Back (WB).  For several AMD Ryzen laptops,
  the BIOS uses AcpiOsMapMemory to directly access the PCI MCFG region in
  order to access PCI config registers.  This has the side effect of
  remapping the MCFG region in the direct map as WB instead of UC,
  hanging the laptops during boot.

  On the one laptop I examined in detail, the _PIC global method used to
  switch from 8259A PICs to I/O APICs uses a pair of PCI config space
  registers at offset 0x84 in the device at 0:0:0 to as a pair of
  address/data registers to access an indirect register in the chipset
  and clear a single bit to switch modes.

  To fix, alter the semantics of pmap_mapbios() such that it does not
  modify the attributes of any existing mappings and instead uses the
  existing attributes.  If a new mapping is created, this new mapping
  uses WB (the default memory attribute).

  Special thanks to the gentleman whose name I don't have who brought
  two affected laptops to the hacker lounge at BSDCan.  Direct access to
  the affected systems permitted finding the root cause within an hour
  or so.

  PR:		231760, 236899
  Reviewed by:	kib, alc
  MFC after:	2 weeks
  Differential Revision:	https://reviews.freebsd.org/D20327

Changes:
  head/sys/amd64/amd64/pmap.c
  head/sys/i386/i386/pmap.c
  head/sys/i386/i386/pmap_base.c
  head/sys/i386/include/pmap_base.h
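The _PIC sequence described in the commit log uses a common chipset idiom: an index/data register pair, where one register selects an internal register and the other reads or writes its value. A minimal sketch of clearing one bit through such a pair follows. Everything chipset-specific in it is hypothetical: the log only says the pair sits at offset 0x84 in device 0:0:0 and does not name the indirect register or the bit. The sequence depends on each access reaching the chipset in program order, which an uncached (UC) mapping provides and a Write-Back mapping does not guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: 'index' holds the last value written to the index register;
 * 'regs' stands in for the chipset's internal registers behind the data
 * register.  Sizes and numbering are made up. */
struct index_data_pair {
	uint32_t index;
	uint32_t regs[16];
};

static void
write_index(struct index_data_pair *p, uint32_t idx)
{
	assert(idx < 16);
	p->index = idx;
}

static uint32_t
read_data(const struct index_data_pair *p)
{
	return (p->regs[p->index]);
}

static void
write_data(struct index_data_pair *p, uint32_t val)
{
	p->regs[p->index] = val;
}

/* Select indirect register 'idx', then read-modify-write it to clear 'bit'. */
static void
clear_indirect_bit(struct index_data_pair *p, uint32_t idx, unsigned bit)
{
	assert(bit < 32);
	write_index(p, idx);
	write_data(p, read_data(p) & ~(UINT32_C(1) << bit));
}
```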
Comment 22 Evilham 2019-08-03 18:02:21 UTC
I've been running HEAD + D20327 for a bit over a week now, just confirming that it's been working beautifully on the ThinkPad A485.

Now that this landed:

> uname -v
FreeBSD 13.0-CURRENT r350556+779520855eb1-c261499(master) GENERIC

Thank you!
Comment 23 commit-hook freebsd_committer 2019-08-24 00:36:20 UTC
A commit references this bug:

Author: jhb
Date: Sat Aug 24 00:36:01 UTC 2019
New revision: 351449
URL: https://svnweb.freebsd.org/changeset/base/351449

Log:
  MFC 350551:
  Don't reset memory attributes when mapping physical addresses for ACPI.

  Previously, AcpiOsMapMemory was using pmap_mapbios which would always map
  the requested address Write-Back (WB).  For several AMD Ryzen laptops,
  the BIOS uses AcpiOsMapMemory to directly access the PCI MCFG region in
  order to access PCI config registers.  This has the side effect of
  remapping the MCFG region in the direct map as WB instead of UC,
  hanging the laptops during boot.

  On the one laptop I examined in detail, the _PIC global method used to
  switch from 8259A PICs to I/O APICs uses a pair of PCI config space
  registers at offset 0x84 in the device at 0:0:0 to as a pair of
  address/data registers to access an indirect register in the chipset
  and clear a single bit to switch modes.

  To fix, alter the semantics of pmap_mapbios() such that it does not
  modify the attributes of any existing mappings and instead uses the
  existing attributes.  If a new mapping is created, this new mapping
  uses WB (the default memory attribute).

  Special thanks to the gentleman whose name I don't have who brought
  two affected laptops to the hacker lounge at BSDCan.  Direct access to
  the affected systems permitted finding the root cause within an hour
  or so.

  PR:		231760, 236899

Changes:
_U  stable/11/
  stable/11/sys/amd64/amd64/pmap.c
  stable/11/sys/i386/i386/pmap.c
_U  stable/12/
  stable/12/sys/amd64/amd64/pmap.c
  stable/12/sys/i386/i386/pmap.c