Bug 237055 - Ampere eMAG compatibility
Summary: Ampere eMAG compatibility
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: arm
Version: CURRENT
Hardware: arm64 Any
Importance: --- Affects Only Me
Assignee: freebsd-arm mailing list
URL:
Keywords:
Depends on: 237234
Blocks:
Reported: 2019-04-05 23:32 UTC by Greg V
Modified: 2019-08-18 09:37 UTC (History)

See Also:


Attachments
emag.multiuser.dmesg (56.50 KB, text/plain)
2019-04-05 23:32 UTC, Greg V
emag.acpi.tar.gz (37.07 KB, application/gzip)
2019-04-05 23:33 UTC, Greg V
emag.hack.dsdt.patch (51.16 KB, patch)
2019-04-05 23:34 UTC, Greg V
eMAG_dmesg_pcie_works (56.17 KB, text/plain)
2019-04-19 17:01 UTC, Tuan Phan

Description Greg V 2019-04-05 23:32:38 UTC
Created attachment 203420 [details]
emag.multiuser.dmesg

Sooo, now that Packet has Ampere eMAG instances (c2.large.arm), of course someone had to try FreeBSD and of course it's me… :D

tl;dr I managed to boot to multiuser with some hacks, but PCIe is busted, needs support for more ACPI stuff. Verbose boot log is attached, I'll attach ACPI tables and stuff too.

---

0. Installation

I used an Ubuntu 18.04 instance, rerooted to a ramdisk ( using the method I described in https://community.online.net/t/freebsd-on-arm64/6678 ), resized the Linux partition, added a new one, loop mounted a memstick image, dd'd it onto the new partition, copied loader_lua.efi to the EFI partition, added a GRUB entry to chainload that:

menuentry 'FreeBSD' {
  load_video
  insmod part_gpt
  insmod chain
  set root='hd0,gpt1'
  chainloader /EFI/BSD/loader_lua.efi
}

and used https://github.com/mkatiyar/fuse-ufs2 to modify the UFS partition from Linux. (As long as you don't copy files from the UFS partition *to itself*, it works fine lol. If you do that, it gets stuck in a 100% cpu loop)

1. Console

https://reviews.freebsd.org/D19507 is needed for any UART output now that one part from there (not using the hardcoded regshift) has landed. Now we need to hardcode it again but only for PL011.

But that's not all. For some reason, I'm not seeing userspace output (/dev/console) even though the ACPI node for the console was picked up:

uart0: <PrimeCell UART (PL011)> iomem 0x12600000-0x12600fff irq 1 on acpi0
uart0: console (115200,n,8,1)
uart0: fast interrupt
uart0: PPS capture mode: DCD

2. Weird early memory access crashes

EFI runtime support (specifically, enumerating efirtc) crashed in efi_call() at efi_get_time+0x50. I disabled `options EFIRT`.

Then ACPI crashed in AcpiExSystemMemorySpaceHandler when reading:

  exfield-0369 ExReadDataFromField   : FieldRead [TO]:   Obj 0xfffffd0010b41980, Type 11, Buf 0xfffffd0010b62b10, ByteLen 8
  exfield-0372 ExReadDataFromField   : FieldRead [FROM]: BitLen 1, BitOff 6, ByteOff 0
  exfldio-0395 ExAccessRegion        : [READ] Region [SystemMemory:0], Width 4, ByteBase 0, Offset 0 at 000000001F10C004

I patched DSDT, removing OperationRegion CLKE from Device AHBC. The only thing that used this was Method _INI for Device I2C4, so I removed the body of that method as well.
Who cares about i2c on a server :) that allowed booting to proceed.

3. PCIe is screwed up

There's this interesting message for all PCI bridges:

pcib0: bus end mismatch! expected 255 found 31.

And some more interesting messages (for the last couple pcib's also with "I/O port window" and "bar .. failed to allocate"):

pcib0: rman_reserve_resource: start=0x30000000, end=0x301fffff, count=0x200000
pcib0: pci_host_generic_core_alloc_resource FAIL: type=3, rid=32, start=0000000030000000, end=00000000301fffff, count=0000000000200000, flags=0
pcib1: failed to allocate initial memory window: 0x30000000-0x301fffff
pcib0: rman_reserve_resource: start=0x14080000000, end=0x14084ffffff, count=0x5000000

PCIe cards actually don't work when these messages are present:

mlx5_core0: <mlx5_core> mem 0x14082000000-0x14083ffffff at device 0.0 on pci1
mlx5_core0: ERR: Failed mapping initialization segment, aborting

Looking at Ampere's page https://github.com/AmpereComputing/ampere-centos-kernel/wiki/Ampere-CentOS-Kernel-wiki

it seems like Linux needed to support ACPI _DMA objects and IORT named components:

https://github.com/torvalds/linux/commit/4f0450af530e62b0217522cab4803b5a65dccc46
https://github.com/torvalds/linux/commit/c04ac679c6b86e4e36fbb675c6c061b4091f5810
https://github.com/torvalds/linux/commit/7ad4263980826e8b02e121af22f4f4c9103fe86d
https://github.com/torvalds/linux/commit/10d8ab2c15b9ef2f46c35e7c36781399d6f2cc82
Comment 1 Greg V 2019-04-05 23:33:21 UTC
Created attachment 203421 [details]
emag.acpi.tar.gz
Comment 2 Greg V 2019-04-05 23:34:48 UTC
Created attachment 203422 [details]
emag.hack.dsdt.patch
Comment 3 Ed Maste freebsd_committer 2019-04-15 13:45:15 UTC
Oops, forgot the PR reference. Serial quirk committed as r346228.
https://svnweb.freebsd.org/changeset/base/346228
Comment 4 Ed Maste freebsd_committer 2019-04-15 19:02:39 UTC
(In reply to Greg V from comment #0)
> For some reason, I'm not seeing userspace output (/dev/console) even though the ACPI node for the console was picked up

Your split-out review D19896 is for a /dev/console issue on Amazon EC2 UARTs, might we have a similar issue here?
Comment 5 Greg V 2019-04-16 20:14:01 UTC
New mail from Ampere engineers (they don't seem to want to sign up for bugzilla, sadly), new very helpful info about PCIe:

The _DMA objects are for the SMMU, they would make "virtualization work properly" (I assume that means PCI passthrough). Since bhyvearm64 is not finished / not upstreamed, no rush for that I guess.

Apparently the real problem with just using PCIe is that we're not adding the address base from the "AddressTranslation - TRA" field, so e.g.

pcib1: failed to allocate initial memory window: 0x30000000-0x301fffff

we should actually be accessing: _TRA+0x3000_0000 = 0x100_3000_0000

From a quick grep, I think acpi_pcib_producer_handler is where we handle this:

min = res->Data.Address64.Address.Minimum;
max = res->Data.Address64.Address.Maximum;

So I guess it should be something like

min = res->Data.Address64.Address.Minimum + res->Data.Address64.Address.Translation;
max = res->Data.Address64.Address.Maximum + res->Data.Address64.Address.Translation;

(for all widths)
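
A minimal sketch of the arithmetic, assuming the translation offset is simply added to both ends of the window. `struct addr_window` and the two helpers are hypothetical stand-ins for illustration, not the ACPICA resource types or the actual acpi_pcib code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64-bit address fields of an ACPI resource. */
struct addr_window {
	uint64_t minimum;      /* bus-side start of the window */
	uint64_t maximum;      /* bus-side end of the window (inclusive) */
	uint64_t translation;  /* AddressTranslation (_TRA) offset */
};

/* CPU-side start of the window: bus address plus translation offset. */
static uint64_t
window_cpu_min(const struct addr_window *w)
{
	assert(w->minimum <= w->maximum);
	return (w->minimum + w->translation);
}

/* CPU-side end of the window, translated the same way. */
static uint64_t
window_cpu_max(const struct addr_window *w)
{
	assert(w->minimum <= w->maximum);
	return (w->maximum + w->translation);
}
```

With the window from the failing message (0x30000000-0x301fffff) and a _TRA of 0x100_0000_0000, this gives 0x100_3000_0000 for the start, which is the address mentioned above.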


(In reply to Ed Maste from comment #4)
> Your split-out review D19896 is for a /dev/console issue on Amazon EC2 UARTs, might we have a similar issue here?

Nah, that one is about connecting the SPCR device with the PCI device (the Amazon UART has different memory addresses in SPCR and PCI).

The PL011 on the eMAG is not PCI, it's described in ACPI and it *is* picked up as the console, as I posted:

uart0: <PrimeCell UART (PL011)> iomem 0x12600000-0x12600fff irq 1 on acpi0
uart0: console (115200,n,8,1)
Comment 6 John O'Neill 2019-04-17 23:51:38 UTC
(In reply to Greg V from comment #5)
I work for Ampere and did create a Bugzilla account - trying to learn the ropes :-). Next time will post info here vs. email.  We are working on testing this.
Comment 7 Greg V 2019-04-18 14:39:57 UTC
Continuing the investigation:

Reading ACPI TranslationOffset was added in review D17791 by jchandra@. It is not applied in enough places, however.

The call that gets the non-translated address is pci_host_generic_core_alloc_resource(dev=pcib0, child=pcib1):

pcib0: rman_reserve_resource: start=0x30000000, end=0x301fffff, count=0x200000
rman_reserve_resource_bound: <PCIe Memory> request: [0x30000000, 0x301fffff], length 0x200000, flags 0, device pcib1
rman_reserve_resource_bound: trying 0x100efffffff <0x30000000,0x1fffff>
considering [0x10030000000, 0x100efffffff]
s->r_start (0x10030000000) + count - 1> end (0x301fffff)
no unshared regions found
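
The rejected line in that log boils down to a bounds comparison. A simplified sketch of it, assuming only what the log prints (the function name is hypothetical and this is not the actual rman code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified form of the check printed in the log: a request of
 * `count` bytes placed at the start of the managed region must not
 * run past the end address the caller asked for.
 */
static bool
fits_in_request(uint64_t region_start, uint64_t req_end, uint64_t count)
{
	assert(count > 0);
	return (region_start + count - 1 <= req_end);
}
```

The managed region starts at the translated address 0x10030000000 while the request still ends at the untranslated 0x301fffff, so the check cannot pass until both sides use the same address space.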

I'm trying to figure out where that call is, seems to be pcib_probe_windows -> pcib_probe_windows -> bus_alloc_resource.


(In reply to John O'Neill from comment #6)
Nice! Welcome.
Comment 8 Greg V 2019-04-18 16:05:06 UTC
err, pcib_probe_windows -> pcib_alloc_window -> bus_alloc_resource.

After adding a hardcoded offset: it can reserve on pcib0, but can't manage on pcib1…

pcib0: rman_reserve_resource: start=0x10030000000, end=0x100301fffff, count=0x200000
rman_reserve_resource_bound: <PCIe Memory> request: [0x10030000000, 0x100301fffff], length 0x200000, flags 0, device pcib1
rman_reserve_resource_bound: trying 0x100efffffff <0x10030000000,0x1fffff>
considering [0x10030000000, 0x100efffffff]
truncated region: [0x10030000000, 0x100301fffff]; size 0x200000 (requested 0x200000)
candidate region: [0x10030000000, 0x100301fffff], size 0x200000
allocating from the beginning
pcib0: rman_reserve_resource: 0xfffffd0010197780
rman_manage_region: <pcib1 memory window> request: start 0x10030000000, end 0x100301fffff
panic: Failed to add resource to rman
Comment 9 Tuan Phan 2019-04-19 17:00:59 UTC
Hello,
I am Tuan Phan, BIOS maintainer at Ampere. I can boot FreeBSD to a prompt with PCI-e working (I am not a PCI-e expert, I just did a quick hack in FreeBSD and am not sure it is the right way to do it). Also, I only started learning FreeBSD a few days ago, so there may well be mistakes.

1. Fix the issue with console.
  - I added these lines to /boot/loader.conf
vfs.mountroot.timeout="10"
kernels_autodetect="NO"
boot_serial="YES"
console="comconsole,efi"
boot_multicons="YES"

2. Fix the SPCR and EFI runtime crash
  - I fixed SPCR in BIOS.
  - I removed the _INI node from I2C4. It is a useless node. Not sure why FreeBSD wasn't happy with it.

3. Fix the PCI-e.
  - Here is the patch. Again, I am not a PCI-e expert, so you may want to improve it and change it properly.

diff --git a/sys/dev/pci/pci_host_generic.c b/sys/dev/pci/pci_host_generic.c
index 60f06a00909..ca814a03058 100644
--- a/sys/dev/pci/pci_host_generic.c
+++ b/sys/dev/pci/pci_host_generic.c
@@ -359,29 +359,29 @@ generic_pcie_activate_resource(device_t dev, device_t child, int type,
 
 	switch (type) {
 	case SYS_RES_IOPORT:
+	case SYS_RES_MEMORY:
 		found = 0;
 		for (i = 0; i < MAX_RANGES_TUPLES; i++) {
 			pci_base = sc->ranges[i].pci_base;
 			phys_base = sc->ranges[i].phys_base;
 			size = sc->ranges[i].size;
 
-			if ((rid > pci_base) && (rid < (pci_base + size))) {
+			if ((rman_get_start(r) >= pci_base) && (rman_get_start(r) < (pci_base + size))) {
 				found = 1;
 				break;
 			}
 		}
 		if (found) {
-			rman_set_start(r, rman_get_start(r) + phys_base);
-			rman_set_end(r, rman_get_end(r) + phys_base);
+			rman_set_start(r, rman_get_start(r) - pci_base + phys_base);
+			rman_set_end(r, rman_get_end(r) - pci_base + phys_base);
 			res = BUS_ACTIVATE_RESOURCE(device_get_parent(dev),
 			    child, type, rid, r);
 		} else {
 			device_printf(dev,
-			    "Failed to activate IOPORT resource\n");
+			    "Failed to activate %d resource\n", type);
 			res = 0;
 		}
 		break;
-	case SYS_RES_MEMORY:
 	case SYS_RES_IRQ:
 		res = BUS_ACTIVATE_RESOURCE(device_get_parent(dev), child,
 		    type, rid, r);
diff --git a/sys/dev/pci/pci_host_generic_acpi.c b/sys/dev/pci/pci_host_generic_acpi.c
index fa1bf4e6efc..dbc1b7fc746 100644
--- a/sys/dev/pci/pci_host_generic_acpi.c
+++ b/sys/dev/pci/pci_host_generic_acpi.c
@@ -297,7 +297,7 @@ pci_host_generic_acpi_attach(device_t dev)
 			continue; /* empty range element */
 		if (sc->base.ranges[tuple].flags & FLAG_MEM) {
 			error = rman_manage_region(&sc->base.mem_rman,
-			   phys_base, phys_base + size - 1);
+			   pci_base, pci_base + size - 1);
 		} else if (sc->base.ranges[tuple].flags & FLAG_IO) {
 			error = rman_manage_region(&sc->base.io_rman,
 			   pci_base + PCI_IO_WINDOW_OFFSET,
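
The core of the change above is the lookup plus the `- pci_base + phys_base` translation. A standalone sketch of just that step, assuming a reduced range tuple; `struct range` and `pci_to_phys` are hypothetical names, not the driver's own types:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced version of one host-bridge range tuple. */
struct range {
	uint64_t pci_base;   /* start of the window in PCI bus space */
	uint64_t phys_base;  /* start of the window in CPU physical space */
	uint64_t size;       /* window length in bytes */
};

/*
 * Translate a PCI bus address to a CPU physical address.
 * Returns 0 and stores the result in *out on success,
 * or -1 if no range contains the address.
 */
static int
pci_to_phys(const struct range *ranges, size_t n, uint64_t addr,
    uint64_t *out)
{
	assert(out != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct range *r = &ranges[i];

		/* Match on the bus address itself, not on the rid. */
		if (addr >= r->pci_base && addr < r->pci_base + r->size) {
			*out = addr - r->pci_base + r->phys_base;
			return (0);
		}
	}
	return (-1);
}
```

Subtracting pci_base first matters whenever the window does not start at bus address 0: adding phys_base alone would count the window's bus offset twice.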
Comment 10 Tuan Phan 2019-04-19 17:01:48 UTC
Created attachment 203803 [details]
eMAG_dmesg_pcie_works
Comment 11 Greg V 2019-04-19 22:00:07 UTC
(In reply to Tuan Phan from comment #9)

Excellent work, thanks! I actually tried doing this — same handling for SYS_RES_MEMORY as for SYS_RES_IOPORT there — but I wasn't smart enough to figure out the subtraction of pci_base.

I see there's some initial I/O port window failures still, but it's nice that you have a NIC working!

> boot_multicons="YES"

Oh. It was using only the framebuffer graphical console as the main console, I thought multicons was default on arm64 for some reason *facepalm*

> Fix the SPCR and EFI runtime crash

hmm, I see the I2C4 thing below, but looks like you didn't get a panic on efirtc initialization either… was that also fixed in firmware?

(it was crashing for me on Packet, the firmware on Packet's servers is: HVE104D-1.02 03/08/2019)

> I removed the _INI node from I2C4. It is a useless node. Not sure why FreeBSD wasn't happy with it.

FreeBSD was probing all ACPI devices, and ACPICA walked into a memory fault while trying to read from that address…
Comment 12 Tuan Phan 2019-04-19 22:18:30 UTC
(In reply to Greg V from comment #11)

> hmm, I see the I2C4 thing below, but looks like you didn't get a panic on efirtc initialization either… was that also fixed in firmware?

I only removed _INI, but not the whole I2C4 node. I didn't see efirtc issue, maybe different issue. The system installed in Packet is not the same system I am using. We are looking into it.

> FreeBSD was probing all ACPI devices, and ACPICA walked into a memory fault while trying to read from that address…

That makes sense.

One more thing: our ACPI has two XHCI nodes with _CID = PNP0D10. It looks like current FreeBSD doesn't have code to parse them. I saw that it only supports EHCI via ACPI.
Comment 13 Tuan Phan 2019-04-19 22:20:07 UTC
(In reply to Greg V from comment #11)

> I see there's some initial I/O port window failures still, but it's nice that you have a NIC working!

Correct me if I am wrong. ARM doesn't use IO ports at all.
Comment 14 Greg V 2019-04-20 10:19:08 UTC
(In reply to Tuan Phan from comment #13)
> ARM doesn't use IO ports at all.

Yeah, ARM doesn't have actual IO ports, but looks like PCIe "IO" regions should be mapped into memory:

https://community.nxp.com/thread/387557#comment-626470

and other ARM systems do not show these errors: https://dmesgd.nycbug.org/index.cgi?do=view&id=4798

> our ACPI has two XHCI nodes with _CID = PNP0D10. Looks like current FreeBSD doesn't have a code to parse it.

Nice catch. Yeah, XHCI has typically been on PCIe on big systems (both AMD/Intel and Cavium ThunderX/2) and described by FDT on embedded systems.. That looks easy enough to add though.
Comment 15 Greg V 2019-04-20 13:02:11 UTC
wooooo I have SSH on the Packet instance! :)

Patch for enabling Mellanox NIC support on aarch64: https://reviews.freebsd.org/D19983
Comment 16 Greg V 2019-04-20 13:49:08 UTC
To avoid I/O port window fails, I had to use the `rid` still for I/O port resources

			if (type == SYS_RES_IOPORT) {
				if ((rid >= pci_base) && (rid < (pci_base + size))) {
					found = 1;
					break;
				}
			} else {
				if ((rman_get_start(r) >= pci_base) && (rman_get_start(r) < (pci_base + size))) {
					found = 1;
					break;
				}
			}

The only fails I see is on pcib12:

pcib12: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=0
pcib12: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
pcib12: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
pcib12: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000000000000, end=00000000ffffffff, count=0000000000001000, flags=3000
Comment 17 Greg V 2019-04-20 14:06:06 UTC
I have a patch for ACPI XHCI: https://reviews.freebsd.org/D19986

The Packet instance has USB disabled though:

            Method (_STA, 0, NotSerialized)  // _STA: Status
            {
                Return (0x00)
            }

Patching the table to 0x0F results in

xhci0: <Generic USB 3.0 controller> iomem 0x13800000-0x138fffff irq 5 on acpi0
panic: vm_fault_hold: fault on nofault entry, addr: 0xffff0000e1785000

— most likely because disabling USB actually detaches the controller, not just makes ACPI tell the system that it's not present :D
Comment 18 Greg V 2019-04-20 14:20:37 UTC
(hmm even the bios setup says "USB Controllers: None". Does the Lenovo server ship w/o USB at all?)
Comment 19 commit-hook freebsd_committer 2019-04-20 15:57:40 UTC
A commit references this bug:

Author: emaste
Date: Sat Apr 20 15:57:06 UTC 2019
New revision: 346445
URL: https://svnweb.freebsd.org/changeset/base/346445

Log:
  Enable ioremap for aarch64 in the LinuxKPI

  Required for Mellanox drivers (e.g. on Ampere eMAG at Packet.com).

  PR:		237055
  Submitted by:	Greg V <greg@unrelenting.technology>
  Reviewed by:	hselasky
  Differential Revision:	https://reviews.freebsd.org/D19987

Changes:
  head/sys/compat/linuxkpi/common/include/linux/io.h
  head/sys/compat/linuxkpi/common/src/linux_compat.c
Comment 20 Ed Maste freebsd_committer 2019-04-21 15:58:46 UTC
CC jhb@; John can you review the PCI change in comment #9
Comment 21 John Baldwin freebsd_committer freebsd_triage 2019-04-22 16:13:49 UTC
Those aren't generic PCI changes but in the arm-specific drivers (despite the poorly chosen "generic" in the name).  They are ok for now.  The real fix is larger but requires proper implementation of bus_map_resource and using a real resource manager for the host bridges instead of passing requests through.
Comment 22 Tuan Phan 2019-04-22 16:56:08 UTC
(In reply to Greg V from comment #17)
> Patching the table to 0x0F results in

> xhci0: <Generic USB 3.0 controller> iomem 0x13800000-0x138fffff irq 5 on acpi0
> panic: vm_fault_hold: fault on nofault entry, addr: 0xffff0000e1785000

The eMAG USB controller is disabled in the UEFI BIOS, so force-enabling it in ACPI will likely cause a crash. Some USB registers such as clock, memory access, etc. are controlled by the BIOS. The USB node in ACPI is just the XHCI interface.
Comment 23 Tuan Phan 2019-04-22 16:58:23 UTC
(In reply to Greg V from comment #18)
> (hmm even the bios setup says "USB Controllers: None". Does the Lenovo server ship w/o USB at all?)

If you see _STA = 0 then it is disabled in the BIOS. You can try going to the BIOS setup tab chipset/xhci controller configuration setting and enabling it.
Comment 24 Tuan Phan 2019-04-22 18:07:22 UTC
(In reply to Greg V from comment #16)
> if ((rid >= pci_base) && (rid < (pci_base + size))

I am still not clear on why rid can be compared to pci_base. It is a resource ID, right?

In pci_host_generic_acpi.c, function pci_host_generic_acpi_attach
			error = rman_manage_region(&sc->base.io_rman,
			   pci_base + PCI_IO_WINDOW_OFFSET,
			   pci_base + PCI_IO_WINDOW_OFFSET + size - 1);

We shouldn't add PCI_IO_WINDOW_OFFSET to pci_base, should we?
Comment 25 Greg V 2019-04-23 11:09:58 UTC
(In reply to Tuan Phan from comment #23)
> You can try go to BIOS setup tab chipset/xhci controller configuration setting and enable it.

That tab wasn't giving me an option to enable it, or maybe I just couldn't figure it out…

Either way, it would be better if you or Ed tested the XHCI patch (https://reviews.freebsd.org/D19986) because I can't exactly plug anything into the USB ports of a server on the other side of the planet :D
Comment 26 commit-hook freebsd_committer 2019-04-23 15:11:20 UTC
A commit references this bug:

Author: emaste
Date: Tue Apr 23 15:11:01 UTC 2019
New revision: 346598
URL: https://svnweb.freebsd.org/changeset/base/346598

Log:
  Enable Mellanox drivers (modules) on AArch64

  Tested by Greg V with mlx5en on an Ampere eMAG instance at Packet.com on
  c2.large.arm (with some additional uncommitted PCIe WIP).

  PR:		237055
  Submitted by:	Greg V <greg@unrelenting.technology>
  Reviewed by:	hselasky
  MFC after:	1 month
  Differential Revision:	https://reviews.freebsd.org/D19983

Changes:
  head/sys/modules/Makefile
Comment 27 Tuan Phan 2019-04-23 16:57:40 UTC
(In reply to Greg V from comment #25)

> Either way, it would be better if you or Ed tested the XHCI patch (https://reviews.freebsd.org/D19986) because I can't exactly plug anything into the USB ports of a server on the other side of the planet :D

I tested the patch on my board and USB works, both keyboard and mass storage.
Thanks
Comment 28 Ed Maste freebsd_committer 2019-04-25 23:08:47 UTC
(In reply to Tuan Phan from comment #27)
Can you test the updated USB patch in https://reviews.freebsd.org/D19986? I applied it to my tree but was unsuccessful - As with GregV's report in PR237055 dsdt has for USB:
```
            Method (_STA, 0, NotSerialized)  // _STA: Status
            {
                Return (0x00)
            }
```
regardless of BIOS settings; I wasn't able to test this here.

At boot my FW reports:
SMpro FW version: 1.04
PMpro FW version: 1.04
FW date: 20190228

AMI setup utility reports Version 2.19.1268 and BIOS Version 1.02 Build Date and Time 03/08/2019 09:59:05
Comment 29 Tuan Phan 2019-04-25 23:17:53 UTC
(In reply to Ed Maste from comment #28)

> Can you test the updated USB patch in https://reviews.freebsd.org/D19986? I applied it to my tree but was unsuccessful - As with GregV's report in PR237055 dsdt has for USB:

Sure, but it may take a while. We are moving to a new office, so all the boards in the lab have been torn down.
Comment 30 Emmanuel Vadot freebsd_committer 2019-04-30 09:12:07 UTC
(In reply to Tuan Phan from comment #24)

Hi,

I also don't understand what the current code is trying to achieve by comparing rid to pci_base; it doesn't make sense to me either.
I'm working on a patch based on yours and making sure it will not break the other platforms using PCI (SoftIron Overdrive, QEMU and ThunderX are the only ones, I think). I'll put up some reviews tonight or maybe tomorrow morning.
In the meantime I've seen that the bus end number in the MCFG table is correctly set to 31 while the one in the _CRS method of each PCI device is set to 255,  Tuan could you fix that in later bios releases ?
Thanks.
Comment 31 Tuan Phan 2019-04-30 16:14:06 UTC
(In reply to Emmanuel Vadot from comment #30)

> In the meantime I've seen that the bus end number in the MCFG table is correctly set to 31 while the one in the _CRS method of each PCI device is set to 255,  Tuan could you fix that in later bios releases ?

Sure, we will fix it.
Comment 32 Ed Maste freebsd_committer 2019-04-30 16:56:31 UTC
(In reply to Tuan Phan from comment #31)
Also please let us know when the update makes it through to new Lenovo firmware.
Comment 33 Emmanuel Vadot freebsd_committer 2019-04-30 17:26:09 UTC
(In reply to Greg V from comment #16)

This just hides the problem and in fact doesn't work.
The IO mapping works with PCI0 to PCI6 (ACPI names) but the PCIR_IOBASEH in the PCI-PCI bridge under PCI7 contains 0x10000000. I'm not sure why, or how it should map to the addresses in _CRS.
Comment 34 commit-hook freebsd_committer 2019-05-01 17:13:32 UTC
A commit references this bug:

Author: andrew
Date: Wed May  1 17:12:50 UTC 2019
New revision: 346996
URL: https://svnweb.freebsd.org/changeset/base/346996

Log:
  Restore x18 in efi_arch_leave.

  Some UEFI implementations trash this register and, as we use it as a
  platform register, the kernel doesn't save it before calling into the UEFI
  runtime services. As we have a copy in tpidr_el1 restore from there when
  exiting the EFI environment.

  PR:		237234, 237055
  Reviewed by:	manu
  Tested On:	Ampere eMAG
  MFC after:	2 weeks
  Sponsored by:	DARPA, AFRL
  Sponsored by:	Ampere Computing (hardware)
  Differential Revision:	https://reviews.freebsd.org/D20127

Changes:
  head/sys/arm64/arm64/efirt_machdep.c
Comment 35 Emmanuel Vadot freebsd_committer 2019-05-02 16:30:23 UTC
Just opened https://reviews.freebsd.org/D20144
This improves the performance of ahci.
Comment 36 Emmanuel Vadot freebsd_committer 2019-05-09 10:47:48 UTC
Follow up on the ACPI bug.
As Greg noted, the problem is with the OperationRegion in the AHBC device.
When the ACPICA code tries to read from the address (in the function AcpiExSystemMemorySpaceHandler in file sys/contrib/dev/acpica/components/executer/exregion.c) we get a fault.
The ESR value for this fault is 0x96000410, which means that this is a "Synchronous External abort, not on translation table walk" according to the ARMv8 ARM. The FnV bit is set, so the FAR register is not valid, and SET is equal to 0, so it is a recoverable error.
Andrew Turner (andrew@) thinks it might be a RAS exception, which FreeBSD doesn't support for now.
For now I have a crappy patch that just returns from the AcpiExSystemMemorySpaceHandler function if the address is 0x1f10c004 or 0x1f10c000, so I can boot the system with the latest BIOS and the full ACPI table rather than a modified one.
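
For reference, the fields quoted from the ESR value can be pulled out with a few shifts and masks. A small sketch, assuming the ESR_ELx layout for data aborts (EC in bits 31:26, SET in 12:11, FnV in bit 10, DFSC in 5:0); the helper names are hypothetical and not taken from the FreeBSD sources:

```c
#include <assert.h>
#include <stdint.h>

/* Exception class, bits [31:26]. */
static unsigned
esr_ec(uint64_t esr)
{
	return ((unsigned)((esr >> 26) & 0x3f));
}

/* Synchronous error type, bits [12:11]; 0 means recoverable. */
static unsigned
esr_set(uint64_t esr)
{
	return ((unsigned)((esr >> 11) & 0x3));
}

/* FAR not valid, bit 10. */
static unsigned
esr_fnv(uint64_t esr)
{
	return ((unsigned)((esr >> 10) & 0x1));
}

/* Data fault status code, bits [5:0]. */
static unsigned
esr_dfsc(uint64_t esr)
{
	return ((unsigned)(esr & 0x3f));
}

/* True for "Synchronous External abort, not on translation table walk". */
static int
esr_is_sync_external_abort(uint64_t esr)
{
	/* Only meaningful for data aborts (EC 0x24 or 0x25). */
	assert(esr_ec(esr) == 0x24 || esr_ec(esr) == 0x25);
	return (esr_dfsc(esr) == 0x10);
}
```

For 0x96000410 this yields EC 0x25 (data abort taken at the same exception level), DFSC 0x10, FnV 1 and SET 0, matching the reading above.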
Comment 37 Tuan Phan 2019-05-09 22:01:46 UTC
(In reply to Emmanuel Vadot from comment #36)

About this issue, I am wondering why accessing 0x1f10c004 or 0x1f10c000 causes an exception. Other OSes work fine in this case.

Does the access happen before virtual addressing is enabled? Does it need a memory mapping?

This issue needs to be fixed somehow, otherwise any ACPI node that accesses memory in _INI will have the same problem.
Comment 38 Emmanuel Vadot freebsd_committer 2019-05-10 16:15:16 UTC
(In reply to Tuan Phan from comment #37)

The ACPICA code will call AcpiOsMapMemory before accessing the region, which in turn calls pmap_mapbios.
If there were something wrong with the mapping, I don't think I would get a data abort exception with a non-valid address.
Comment 39 Tuan Phan 2019-05-13 21:03:47 UTC
(In reply to Tuan Phan from comment #29)

> Can you test the updated USB patch in https://reviews.freebsd.org/D19986? I applied it to my tree but was unsuccessful - As with GregV's report in PR237055 dsdt has for USB:


Tested the patch. Can detect USB mass storage and keyboard. The patch is good.
Comment 40 Tuan Phan 2019-05-14 20:06:21 UTC
(In reply to Emmanuel Vadot from comment #38)

Did some debugging; it was a data abort exception. The address 0x1f10c004 was mapped, but with the normal cacheable memory attribute. It should be mapped with the device memory attribute.

UEFI always exports it as device memory.
Comment 41 Emmanuel Vadot freebsd_committer 2019-05-22 05:09:48 UTC
Just opened three new reviews that address the ACPI bugs :
https://reviews.freebsd.org/D20347
https://reviews.freebsd.org/D20348
https://reviews.freebsd.org/D20349
Comment 42 Michael Tuexen freebsd_committer 2019-05-24 11:42:31 UTC
(In reply to Emmanuel Vadot from comment #41)
I recently got a Lenovo HR 350A system for my lab and want to run FreeBSD on it.

Do I only need D2034[789] on top of FreeBSD head, or do I need additional patches and/or a specific version of the firmware?
Comment 43 Ed Maste freebsd_committer 2019-05-24 13:25:01 UTC
(In reply to Michael Tuexen from comment #42)
My WIP tree is functional on eMAG with those three commits included; they should be sufficient. (I have a lot of other changes but they are largely userland, and some unrelated kernel changes.)

Firmware info from early boot (the same eMAG that manu@ is using for development):

SMpro FW version: 1.04
PMpro FW version: 1.04
FW date: 20190228

    EFI version: 2.60
    EFI Firmware: American Megatrends (rev 5.13)
Comment 44 commit-hook freebsd_committer 2019-05-24 13:40:59 UTC
A commit references this bug:

Author: emaste
Date: Fri May 24 13:39:57 UTC 2019
New revision: 348237
URL: https://svnweb.freebsd.org/changeset/base/348237

Log:
  MFC r346598: Enable Mellanox drivers (modules) on AArch64

  PR:		237055
  Submitted by:	Greg V <greg@unrelenting.technology>

Changes:
_U  stable/12/
  stable/12/sys/modules/Makefile
Comment 45 Michael Tuexen freebsd_committer 2019-05-24 16:40:35 UTC
(In reply to Ed Maste from comment #43)
Thanks for the information. Will try to test this on my machine next week...
Comment 46 Michael Tuexen freebsd_committer 2019-05-27 12:44:17 UTC
(In reply to Ed Maste from comment #43)
Hi Ed,

I built a FreeBSD install image based on FreeBSD head with applying D2034[789].

I can confirm that the system boots fine with such a kernel.

When running the installer to install the OS on a new SSD, the installer
finishes the archive extraction step and writes on the screen:

Formatting /dev/ada0p1 as FAT32
Mounting ESP /dev/ada0p1
Installing loader.efi onto ESP
Creating UEFI boot entry

Then the system stalls...

Any idea what is going wrong or what am I doing wrong?
Comment 47 Emmanuel Vadot freebsd_committer 2019-05-27 13:14:40 UTC
Yes, there is a problem with the runtime efi SetVar in the firmware, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237808
I haven't tested the new firmware yet.
If you don't want to try it, you could do something like https://github.com/evadot/freebsd/commit/cbf0449d2d6193e209c611dc87eed8f2bfdedd7a
Comment 48 Michael Tuexen freebsd_committer 2019-05-27 15:11:37 UTC
(In reply to Emmanuel Vadot from comment #47)
Thanks, that helps in letting the installer finish. I used your patch, not the updated firmware.

Unfortunately, the kernel from disk panics on load. Likely a problem due to my way of building the image. Restarted from scratch to build the image. I'll report...
Comment 49 Emmanuel Vadot freebsd_committer 2019-05-27 15:28:06 UTC
(In reply to Michael Tuexen from comment #48)

You could try :
https://people.freebsd.org/~manu/FreeBSD-13.0-CURRENT-arm64-aarch64-GENERIC-NODEBUG-r347932.img.xz

It's a week old or something like that and it's using NODEBUG but ...
Otherwise building the image is just: export TARGET_ARCH=aarch64; export TARGET=arm64; make buildworld buildkernel; cd release; sudo -E make memstick

You need both target and target_arch for image building (I don't remember why right now ...)
Comment 50 Michael Tuexen freebsd_committer 2019-05-27 16:10:20 UTC
(In reply to Emmanuel Vadot from comment #49)
I gave it a try. It runs the installer without problems, the installed system boots and computes the ssh server keys and locks up...
Comment 51 Emmanuel Vadot freebsd_committer 2019-05-27 16:18:56 UTC
(In reply to Michael Tuexen from comment #50)

Where exactly ?
I have some problem with sendmail being stuck in nanoslp (same problem on Thunderx2 it seems) but I can ctrl+c (that is until I look at what is the problem exactly).
Comment 52 Michael Tuexen freebsd_committer 2019-05-27 17:20:37 UTC
(In reply to Emmanuel Vadot from comment #51)
After reporting that it generated the third key. I could not CTRL-C...
When the build with a debug kernel has finished, I'll try that. Possibly it
provides information or even a panic.
Comment 53 Michael Tuexen freebsd_committer 2019-05-28 08:25:11 UTC
(In reply to Michael Tuexen from comment #52)
OK, I did a build with FreeBSD head of yesterday, applied
* https://reviews.freebsd.org/D20347
* https://reviews.freebsd.org/D20348
* https://reviews.freebsd.org/D20349
* https://github.com/evadot/freebsd/commit/cbf0449d2d6193e209c611dc87eed8f2bfdedd7a

This resulted in a working system. I checked out the sources and rebuilt a GENERIC-NODEBUG kernel and it also runs.

However, I had one (temporary) problem during booting.
The messages on the screen were:
...
Loading configured modules...
/boot/entropy size=0x1000
No valid device tree blob found!
WARNING! Trying to fire up the kernel, but no device blob tree found!
EFI framebuffer information:
addr, size     0x430000000, 0x30000
dimensions     1024 x 768
stride         1024
masks          0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
_

Then the system was hanging. A reboot resolved the issue.
Comment 54 Michael Tuexen freebsd_committer 2019-05-28 11:33:34 UTC
(In reply to Michael Tuexen from comment #53)
Some more testing. The system is capable of doing buildworld, but it locks up a lot when booting. You can't CTRL-C it.

Is there any information I could provide which would help to nail the problem down?
Comment 55 Ed Maste freebsd_committer 2019-05-28 13:12:20 UTC
(In reply to Michael Tuexen from comment #54)
To be clear, you mean that it frequently locks up during boot, but once booted it runs correctly?
Comment 56 Michael Tuexen freebsd_committer 2019-05-28 13:52:20 UTC
(In reply to Ed Maste from comment #55)
More testing, better description:

I meant: several times it booted to the login prompt but it didn't accept input on the keyboard or over the network (ssh access)

Now I have observed that sometimes it accepts input on the console, but the network (an igb card) isn't brought up. When looking at the boot messages I do see (transcribed):
...
pci14: <PCI bus> on pcib14
pcib15: <PCI-PCI bridge> at device 0.0 on pci14
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=0
pcib15: failed to allocate initial I/O port window: 0x10000000-0x10000fff
pci15: <PCI bus> on pcib15
pcib16: <PCI-PCI bridge> at device 0.0 on pci15
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
pcib16: failed to allocate initial I/O port window: 0x10000000-0x10000fff
pci16: <PCI bus> on pcib16
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
...
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED

I have observed similar instabilities on an Overdrive 3000 system when these kinds of PCI errors occurred. On the Overdrive 3000 I'm working around this by using an ethernet card which doesn't show these PCI errors (a bge card instead of igb or ix).

The Ampere system has an igb card (in use) and a Mellanox card (not in use). Should I try to replace them?
Comment 57 Michael Tuexen freebsd_committer 2019-05-28 14:23:02 UTC
OK, I identified one problem: 
When setting the time/date via
sudo date 1432
on the command line, the system locks up after a couple of seconds.

This might be related to the lock up after booting problems I have seen, since I added
ntpdate="YES"
to my /etc/rc.conf

Without this entry, the system boots fine.

Can you reproduce this?
Comment 58 Emmanuel Vadot freebsd_committer 2019-05-28 14:28:57 UTC
(In reply to Michael Tuexen from comment #57)
I can, yes. I'll add this to my stuff-to-resolve list :)
Comment 59 Michael Tuexen freebsd_committer 2019-05-28 14:31:19 UTC
(In reply to Emmanuel Vadot from comment #58)
Great. Thanks a lot!
Comment 60 Tuan Phan 2019-05-28 17:26:09 UTC
(In reply to Michael Tuexen from comment #57)

It hangs because of the same issue with SetVariable. I think you should use the latest FW, which is mentioned in the SetVariable issue.

When you set the RTC, it also uses SetVariable to save the timezone info.
Comment 61 Tuan Phan 2019-05-28 17:27:09 UTC
(In reply to Michael Tuexen from comment #53)

I suggest you use the latest FW and try again.
Comment 62 Tuan Phan 2019-05-28 17:27:23 UTC
(In reply to Michael Tuexen from comment #53)

I suggest you use the latest FW and try again.
Comment 63 Michael Tuexen freebsd_committer 2019-05-28 17:34:26 UTC
(In reply to Tuan Phan from comment #62)
OK. Will try tomorrow and report.
Comment 64 Greg V 2019-05-28 18:53:42 UTC
(In reply to Michael Tuexen from comment #56)

The errors on pci14-16 are not from your igb card and should not affect your card, which is probably on a far lower-numbered bus/bridge/thingy.

The Mellanox CX4 cards on the Packet instances are on pci1: https://dmesgd.nycbug.org/index.cgi?do=view&id=4864 and they work perfectly fine (in a LACP aggregation, even). The same errors are showing up on pci12-14 there. (2 less buses there — HR350A vs HR330A?)
Comment 65 Greg V 2019-05-28 22:28:31 UTC
By the way, a few questions for Tuan and/or John:

- is there no hardware random number generator on eMAG? I see there was on X-Gene: https://github.com/torvalds/linux/blob/master/drivers/char/hw_random/xgene-rng.c but APMC0D18 is nowhere to be found in the DSDT I got from the Packet instance..
- does the CPU boost to the 3.3GHz speed without the OS doing anything?
- is there public documentation for the monitoring (temperature, frequency)/PMU etc. devices, other than the GPL'ed Linux driver code?
- why is the primary part number in MIDR zero?

---

also, I just realized that we're not building ipmi_acpi on aarch64, and it does build..
Comment 66 Tuan Phan 2019-05-28 22:38:29 UTC
(In reply to Greg V from comment #65)

Greg,
I can answer some questions:

1. Why is the primary part number in MIDR zero?
=> We fixed a bug where the MIDR was put in the second DWORD. Are you parsing it from SMBIOS type 4?

2. I don't think we have an RNG in eMAG. Not sure, let John confirm with the designer.
3. I believe the CPU can boost to 3.3GHz without any intervention needed from the OS. Not sure, let John confirm with the power management maintainer.
4. John can help you with documents if they are available, or provide support from the designer.
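
For readers following the MIDR question, here is a minimal sketch of how the MIDR_EL1 fields are laid out (implementer in bits 31:24, variant in 23:20, architecture in 19:16, primary part number in 15:4, revision in 3:0). The function name `decode_midr` and the sample value `0x503F0002` are illustrative assumptions, not values read from an eMAG; the sample only shows what an Applied Micro implementer code (0x50) with a zero part number would look like.

```python
def decode_midr(midr):
    """Split a 32-bit MIDR_EL1 value into its architectural fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # bits 31:24
        "variant": (midr >> 20) & 0xF,        # bits 23:20
        "architecture": (midr >> 16) & 0xF,   # bits 19:16
        "part_number": (midr >> 4) & 0xFFF,   # bits 15:4 (primary part number)
        "revision": midr & 0xF,               # bits 3:0
    }


if __name__ == "__main__":
    # Hypothetical value: implementer 0x50 with a part number of zero.
    sample = decode_midr(0x503F0002)
    for field, value in sample.items():
        print("%-13s 0x%x" % (field, value))
```

With a zero part number, only the implementer, variant and revision fields are left to tell one CPU from another.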
Comment 67 Michael Tuexen freebsd_committer 2019-05-29 10:14:33 UTC
(In reply to Michael Tuexen from comment #63)
I can confirm that updating the Firmware to the version provided in bug #237808 resolves the issue with setting the time (via /etc/rc.conf or manually).
Comment 68 Tuan Phan 2019-05-30 21:49:44 UTC
(In reply to Michael Tuexen from comment #53)

> /boot/entropy size=0x1000
> No valid device tree blob found!
> WARNING! Trying to fire up the kernel, but no device blob tree found!
> EFI framebuffer information:
> addr, size     0x430000000, 0x30000
> dimensions     1024 x 768
> stride         1024
> masks          0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
> _
> Then the system was hanging. A reboot resolved the issue.

Did you see this issue with the new test FW?
Comment 69 Michael Tuexen freebsd_committer 2019-05-30 22:30:22 UTC
(In reply to Tuan Phan from comment #68)
No, I haven't. Using the new firmware, the system runs fine (using the igb and mce interfaces).

It only reports:

pci13: <PCI bus> on pcib13
pcib14: <Generic PCI host controller> on acpi0
pci14: <PCI bus> on pcib14
pcib15: <PCI-PCI bridge> at device 0.0 on pci14
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=0
pcib15: failed to allocate initial I/O port window: 0x10000000-0x10000fff
pci15: <PCI bus> on pcib15
pcib16: <PCI-PCI bridge> at device 0.0 on pci15
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
pcib16: failed to allocate initial I/O port window: 0x10000000-0x10000fff
pci16: <PCI bus> on pcib16
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
vgapci0: <VGA-compatible display> port 0x1000-0x107f mem 0x30000000-0x30ffffff,0x31040000-0x3105ffff at device 0.0 on pci16
cpu0: <ACPI CPU> on acpi0
uart0: <PrimeCell UART (PL011)> iomem 0x12600000-0x12600fff irq 1 on acpi0
uart0: console (115200,n,8,1)
uart1: <PrimeCell UART (PL011)> iomem 0x12610000-0x12610fff irq 2 on acpi0
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED

during boot. But it doesn't seem to affect the system.
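
As a rough aid for reading the FAIL lines above, the sketch below decodes one of them. It assumes that `type` and `rid` are printed in decimal and `start`/`end`/`count` in hex, which is inferred from the output rather than checked against the driver source. Under that assumption, type 4 is SYS_RES_IOPORT and rid 28 is 0x1c, the I/O base register offset in a PCI-PCI bridge header, which is consistent with the "failed to allocate initial I/O port window" messages. The helper name `decode_alloc_failure` is made up for illustration.

```python
import re

# Resource type numbers as used by FreeBSD's <machine/resource.h>.
SYS_RES_NAMES = {1: "SYS_RES_IRQ", 3: "SYS_RES_MEMORY", 4: "SYS_RES_IOPORT"}

# Config-space offset of the I/O base register in a PCI-PCI bridge header.
PCIR_IOBASEL_1 = 0x1C

# One of the lines from the boot log above.
LINE = ("pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, "
        "start=0000000010000000, end=0000000010000fff, "
        "count=0000000000001000, flags=0")

PATTERN = re.compile(
    r"(?P<dev>\w+): pci_host_generic_core_alloc_resource FAIL: "
    r"type=(?P<type>\d+), rid=(?P<rid>\d+), "
    r"start=(?P<start>[0-9a-f]+), end=(?P<end>[0-9a-f]+), "
    r"count=(?P<count>[0-9a-f]+), flags=(?P<flags>[0-9a-f]+)"
)


def decode_alloc_failure(line):
    """Return the decoded fields of a FAIL line, or None if it does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    rtype = int(m.group("type"))          # assumed decimal
    start = int(m.group("start"), 16)
    end = int(m.group("end"), 16)
    return {
        "device": m.group("dev"),
        "type_name": SYS_RES_NAMES.get(rtype, "type %d" % rtype),
        "rid": int(m.group("rid")),       # assumed decimal
        "start": start,
        "end": end,
        "count": int(m.group("count"), 16),
        "span": end - start + 1,          # size implied by start/end
    }


if __name__ == "__main__":
    info = decode_alloc_failure(LINE)
    print(info["device"], info["type_name"], hex(info["rid"]), hex(info["span"]))
```

The requested range is a single 4 KiB I/O port window, so the failures concern I/O port space behind the bridges and not the memory BARs.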
Comment 70 Michael Tuexen freebsd_committer 2019-06-04 06:58:17 UTC
I tried to enable console access via a serial line by putting

boot_multicons="YES"
boot_serial="YES"
console="comconsole,efi"
comconsole_speed="115200"

into /boot/loader.conf.

Is this supposed to work with FreeBSD head (r348543)? It never works on my system and sometimes the system locks up during boot. Without these entries in /boot/loader.conf I have not observed such lockups anymore.

I'm running the firmware from bug #237808.

From dmesg:

...
cpu0: <ACPI CPU> on acpi0
uart0: <PrimeCell UART (PL011)> iomem 0x12600000-0x12600fff irq 1 on acpi0
uart0: console (115200,n,8,1)
uart1: <PrimeCell UART (PL011)> iomem 0x12610000-0x12610fff irq 2 on acpi0
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED
...
Comment 71 Greg V 2019-07-21 17:48:20 UTC
hm, looks like it is possible to identify the eMAG CPU: https://github.com/NetBSD/src/commit/a1feb17c3b45b52319a61e4f9c172e373b055bc2 + https://github.com/NetBSD/src/commit/74b0f2158a5c1fee10344fc3d995780a353570a2

btw, if anyone is interested in trying more stuff on eMAG (and other aarch64 HW):

- AMD Radeon GPU driver https://github.com/FreeBSDDesktop/kms-drm/pull/154
- SBSA watchdog driver https://reviews.freebsd.org/D20974
Comment 72 Greg V 2019-07-24 20:10:52 UTC
Well here's a funny story…

I've been experimenting with two aarch64 things: attaching the PMU (where P = Performance) via ACPI (to make pmcstat work) and building IPMI support.

On my Marvell MACCHIATObin, PMU attaches on boot, the interrupt fires but very rarely, so most of the time there's nothing in pmcstat, but occasionally a couple lines did appear. That platform actually has some weirdness (the PMU interrupts a custom Marvell interrupt controller, and in ACPI mode the firmware catches that and rethrows onto the GICv2, or something like that) so it might be a firmware bug.

So I've rented an Ampere eMAG instance from Packet again to try a different ACPI platform and uhh.

On boot, the PMU does not attach:

pmu0: rid 0 irq 23
pmu0: <Performance Monitoring Unit> irq 13 on acpi0
pmu0: could not allocate resources

But when I do `kldload ipmi` (!!!):

pmu0: <Performance Monitoring Unit> irq 13 on acpi0

and pmcstat does actually start working! Wait, what?! Oh. I guess it's just reprobing all the drivers on unattached devices, but it looked so bizarre at first :D

Evidently, I just put the PMU too early in the attachment order (BUS_PASS_INTERRUPT + BUS_PASS_ORDER_MIDDLE).

(and, ipmi does not attach because the i2c controller wasn't even attaching (https://reviews.freebsd.org/D21059), the i2c controller doesn't attach its children, and IPMI-over-i2c-described-by-ACPI is not supported anyway)
Comment 73 Greg V 2019-07-24 20:35:04 UTC
(In reply to Greg V from comment #72)

What is actually weird is that attaching pmu correctly in the boot process results in a ridiculous interrupt rate slowing the system down :(

# vmstat -i
interrupt                                             total       rate
gic0,p7: pmu0                                    2397246676    2342967
gic0,p11:-ic_timer0                                26390337      25793
gic0,s66: uart0                                         502          0
gic0,s79: ahci0                                        2071          2
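
To put the numbers above in perspective, here is a small sketch that parses the `vmstat -i` rows and works out what share of all interrupts comes from pmu0, plus the uptime implied by total/rate. The parsing assumes the last two whitespace-separated columns are always total and rate; the helper names are made up for illustration.

```python
SAMPLE = """\
interrupt                                             total       rate
gic0,p7: pmu0                                    2397246676    2342967
gic0,p11:-ic_timer0                                26390337      25793
gic0,s66: uart0                                         502          0
gic0,s79: ahci0                                        2071          2
"""


def parse_vmstat_i(text):
    """Return (name, total, rate) tuples, skipping the header line."""
    rows = []
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # name may contain spaces
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        rows.append((parts[0].strip(), int(parts[1]), int(parts[2])))
    return rows


def share_of_total(rows, name):
    """Fraction of all counted interrupts that belong to `name`."""
    grand_total = sum(total for _, total, _ in rows)
    wanted = sum(total for n, total, _ in rows if n == name)
    return wanted / grand_total


if __name__ == "__main__":
    rows = parse_vmstat_i(SAMPLE)
    pmu_name, pmu_total, pmu_rate = rows[0]
    print("pmu0 share of interrupts: %.1f%%" % (100 * share_of_total(rows, pmu_name)))
    print("implied uptime: %.0f s" % (pmu_total / pmu_rate))
```

On this sample the PMU accounts for roughly 99% of all interrupts over about 17 minutes of uptime, which matches the observed slowdown.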
Comment 74 Michael Tuexen freebsd_committer 2019-08-18 09:37:06 UTC
(In reply to Greg V from comment #71)
A patch is in review D21314.