237055 – Ampere eMAG compatibility

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	arm
Version:	CURRENT
Hardware:	arm64 Any

Importance:	--- Affects Only Me
Assignee:	freebsd-arm (Nobody)

URL:
Keywords:

Depends on:	237234
Blocks:

Reported:	2019-04-05 23:32 UTC by Val Packett
Modified:	2021-07-02 08:03 UTC (History)
CC List:	12 users

See Also:

Attachments
emag.multiuser.dmesg (56.50 KB, text/plain) 2019-04-05 23:32 UTC, Val Packett
emag.acpi.tar.gz (37.07 KB, application/gzip) 2019-04-05 23:33 UTC, Val Packett
emag.hack.dsdt.patch (51.16 KB, patch) 2019-04-05 23:34 UTC, Val Packett
eMAG_dmesg_pcie_works (56.17 KB, text/plain) 2019-04-19 17:01 UTC, Tuan Phan


Description Val Packett 2019-04-05 23:32:38 UTC

Created attachment 203420 [details]
emag.multiuser.dmesg

Sooo, now that Packet has Ampere eMAG instances (c2.large.arm), of course someone had to try FreeBSD and of course it's me… :D

tl;dr I managed to boot to multiuser with some hacks, but PCIe is busted, needs support for more ACPI stuff. Verbose boot log is attached, I'll attach ACPI tables and stuff too.

---

0. Installation

I used an Ubuntu 18.04 instance, rerooted to a ramdisk ( using the method I described in https://community.online.net/t/freebsd-on-arm64/6678 ), resized the Linux partition, added a new one, loop mounted a memstick image, dd'd it onto the new partition, copied loader_lua.efi to the EFI partition, added a GRUB entry to chainload that:

menuentry 'FreeBSD' {
  load_video
  insmod part_gpt
  insmod chain
  set root='hd0,gpt1'
  chainloader /EFI/BSD/loader_lua.efi
}

and used https://github.com/mkatiyar/fuse-ufs2 to modify the UFS partition from Linux. (As long as you don't copy files from the UFS partition *to itself*, it works fine lol. If you do that, it gets stuck in a 100% cpu loop)

1. Console

https://reviews.freebsd.org/D19507 is needed for any UART output now that one part from there (not using the hardcoded regshift) has landed. Now we need to hardcode it again but only for PL011.

But that's not all. For some reason, I'm not seeing userspace output (/dev/console) even though the ACPI node for the console was picked up:

uart0: <PrimeCell UART (PL011)> iomem 0x12600000-0x12600fff irq 1 on acpi0
uart0: console (115200,n,8,1)
uart0: fast interrupt
uart0: PPS capture mode: DCD

2. Weird early memory access crashes

EFI runtime support (specifically, enumerating efirtc) crashed in efi_call() at efi_get_time+0x50. I disabled `options EFIRT`.

Then ACPI crashed in AcpiExSystemMemorySpaceHandler when reading:

  exfield-0369 ExReadDataFromField   : FieldRead [TO]:   Obj 0xfffffd0010b41980, Type 11, Buf 0xfffffd0010b62b10, ByteLen 8
  exfield-0372 ExReadDataFromField   : FieldRead [FROM]: BitLen 1, BitOff 6, ByteOff 0
  exfldio-0395 ExAccessRegion        : [READ] Region [SystemMemory:0], Width 4, ByteBase 0, Offset 0 at 000000001F10C004

I patched DSDT, removing OperationRegion CLKE from Device AHBC. The only thing that used this was Method _INI for Device I2C4, so I removed the body of that method as well.
Who cares about i2c on a server :) that allowed booting to proceed.
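
The failing access in the trace above is a 4-byte read at 0x1F10C004 from which ACPICA then wants a 1-bit field at bit offset 6. As a rough illustration of that last step only, here is a minimal sketch in C; `extract_field` is a made-up helper name, not an ACPICA function, and the real field code handles many more cases (multi-word fields, update rules, locking).

```c
#include <stdint.h>

/*
 * Extract a bit field from a register value that was already read with a
 * single aligned 32-bit access. Hypothetical helper for illustration.
 * bit_off is expected to be in the range 0..31.
 */
static uint32_t
extract_field(uint32_t reg_value, unsigned bit_off, unsigned bit_len)
{
	uint32_t mask;

	/* Avoid the undefined full-width shift when the field is 32 bits. */
	mask = (bit_len >= 32) ? UINT32_MAX : ((UINT32_C(1) << bit_len) - 1);
	return ((reg_value >> bit_off) & mask);
}
```

With the values from the trace, `extract_field(value, 6, 1)` would return the single bit the field refers to; the crash happens earlier, while obtaining `value` from the region.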

3. PCIe is screwed up

There's this interesting message for all PCI bridges:

pcib0: bus end mismatch! expected 255 found 31.

And some more interesting messages (for the last couple pcib's also with "I/O port window" and "bar .. failed to allocate"):

pcib0: rman_reserve_resource: start=0x30000000, end=0x301fffff, count=0x200000
pcib0: pci_host_generic_core_alloc_resource FAIL: type=3, rid=32, start=0000000030000000, end=00000000301fffff, count=0000000000200000, flags=0
pcib1: failed to allocate initial memory window: 0x30000000-0x301fffff
pcib0: rman_reserve_resource: start=0x14080000000, end=0x14084ffffff, count=0x5000000

PCIe cards actually don't work when these messages are present:

mlx5_core0: <mlx5_core> mem 0x14082000000-0x14083ffffff at device 0.0 on pci1
mlx5_core0: ERR: Failed mapping initialization segment, aborting

Looking at Ampere's page https://github.com/AmpereComputing/ampere-centos-kernel/wiki/Ampere-CentOS-Kernel-wiki

it seems like Linux needed to support ACPI _DMA objects and IORT named components:

https://github.com/torvalds/linux/commit/4f0450af530e62b0217522cab4803b5a65dccc46
https://github.com/torvalds/linux/commit/c04ac679c6b86e4e36fbb675c6c061b4091f5810
https://github.com/torvalds/linux/commit/7ad4263980826e8b02e121af22f4f4c9103fe86d
https://github.com/torvalds/linux/commit/10d8ab2c15b9ef2f46c35e7c36781399d6f2cc82

Comment 1 Val Packett 2019-04-05 23:33:21 UTC

Created attachment 203421 [details]
emag.acpi.tar.gz

Comment 2 Val Packett 2019-04-05 23:34:48 UTC

Created attachment 203422 [details]
emag.hack.dsdt.patch

Comment 3 Ed Maste freebsd_committer

2019-04-15 13:45:15 UTC

Oops, forgot the PR reference. Serial quirk committed as r346228.
https://svnweb.freebsd.org/changeset/base/346228

Comment 4 Ed Maste freebsd_committer

2019-04-15 19:02:39 UTC

(In reply to Greg V from comment #0)
> For some reason, I'm not seeing userspace output (/dev/console) even though the ACPI node for the console was picked up

Your split-out review D19896 is for a /dev/console issue on Amazon EC2 UARTs, might we have a similar issue here?

Comment 5 Val Packett 2019-04-16 20:14:01 UTC

New mail from Ampere engineers (they don't seem to want to sign up for bugzilla, sadly), new very helpful info about PCIe:

The _DMA objects are for the SMMU, they would make "virtualization work properly" (I assume that means PCI passthrough). Since bhyvearm64 is not finished / not upstreamed, no rush for that I guess.

Apparently the real problem with just using PCIe is that we're not adding the address base from the "AddressTranslation - TRA" field, so e.g.

pcib1: failed to allocate initial memory window: 0x30000000-0x301fffff

we should actually be accessing: _TRA+0x3000_0000 = 0x100_3000_0000

From a quick grep, I think acpi_pcib_producer_handler is where we handle this:

min = res->Data.Address64.Address.Minimum;
max = res->Data.Address64.Address.Maximum;

So I guess it should be something like

min = res->Data.Address64.Address.Minimum + res->Data.Address64.Address.Translation;
max = res->Data.Address64.Address.Maximum + res->Data.Address64.Address.Translation;

(for all widths)
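
To make the arithmetic concrete, here is a minimal, self-contained sketch in C. The struct is a simplified stand-in for the ACPI address descriptor, not ACPICA's real type, and the translation value 0x10000000000 is inferred from the numbers above (0x100_3000_0000 minus 0x3000_0000), not read from the tables.

```c
#include <stdint.h>

/* Simplified stand-in for the fields of an ACPI address-space descriptor. */
struct addr_window {
	uint64_t minimum;	/* lowest bus-side address of the window */
	uint64_t maximum;	/* highest bus-side address of the window */
	uint64_t translation;	/* offset to add to get the CPU-side address */
};

struct cpu_range {
	uint64_t start;
	uint64_t end;
};

/* Apply the translation offset to both ends of the window. */
static struct cpu_range
window_to_cpu_range(const struct addr_window *w)
{
	struct cpu_range r;

	r.start = w->minimum + w->translation;
	r.end = w->maximum + w->translation;
	return (r);
}
```

For the window 0x30000000-0x301fffff with that assumed offset, this yields 0x10030000000-0x100301fffff.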


(In reply to Ed Maste from comment #4)
> Your split-out review D19896 is for a /dev/console issue on Amazon EC2 UARTs, might we have a similar issue here?

Nah, that one is about connecting the SPCR device with the PCI device (the Amazon UART has different memory addresses in SPCR and PCI).

The PL011 on the eMAG is not PCI, it's described in ACPI and it *is* picked up as the console, as I posted:

uart0: <PrimeCell UART (PL011)> iomem 0x12600000-0x12600fff irq 1 on acpi0
uart0: console (115200,n,8,1)

Comment 6 John O'Neill 2019-04-17 23:51:38 UTC

(In reply to Greg V from comment #5)
I work for Ampere and did create a Bugzilla account - trying to learn the ropes :-). Next time will post info here vs. email.  We are working on testing this.

Comment 7 Val Packett 2019-04-18 14:39:57 UTC

Continuing the investigation:

Reading ACPI TranslationOffset was added in review D17791 by jchandra@. It is not applied in enough places, however.

The call that gets the non-translated address is pci_host_generic_core_alloc_resource(dev=pcib0, child=pcib1):

pcib0: rman_reserve_resource: start=0x30000000, end=0x301fffff, count=0x200000
rman_reserve_resource_bound: <PCIe Memory> request: [0x30000000, 0x301fffff], length 0x200000, flags 0, device pcib1
rman_reserve_resource_bound: trying 0x100efffffff <0x30000000,0x1fffff>
considering [0x10030000000, 0x100efffffff]
s->r_start (0x10030000000) + count - 1> end (0x301fffff)
no unshared regions found

I'm trying to figure out where that call is, seems to be pcib_probe_windows -> pcib_probe_windows -> bus_alloc_resource.
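
The rman trace above boils down to a range-intersection test: the only managed region starts at 0x10030000000, while the request still uses untranslated bus addresses. A simplified model of that check, as a sketch only (the real rman code also deals with alignment, sharing and region splitting, and `region_fits` is a hypothetical name):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Can `count` bytes lying inside [req_start, req_end] be carved out of the
 * free region [reg_start, reg_end]? Simplified: no alignment, no sharing.
 */
static bool
region_fits(uint64_t reg_start, uint64_t reg_end,
    uint64_t req_start, uint64_t req_end, uint64_t count)
{
	uint64_t start, end;

	/* Intersect the request with the region. */
	start = (reg_start > req_start) ? reg_start : req_start;
	end = (reg_end < req_end) ? reg_end : req_end;
	if (count == 0 || start > end)
		return (false);
	/* Written this way to avoid overflow in end - start + 1. */
	return (end - start >= count - 1);
}
```

With the region from the log, the untranslated request [0x30000000, 0x301fffff] does not intersect it at all, whereas the same request shifted up by the translation offset fits exactly.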


(In reply to John O'Neill from comment #6)
Nice! Welcome.

Comment 8 Val Packett 2019-04-18 16:05:06 UTC

err, pcib_probe_windows -> pcib_alloc_window -> bus_alloc_resource.

After adding a hardcoded offset: it can reserve on pcib0, but can't manage on pcib1…

pcib0: rman_reserve_resource: start=0x10030000000, end=0x100301fffff, count=0x200000
rman_reserve_resource_bound: <PCIe Memory> request: [0x10030000000, 0x100301fffff], length 0x200000, flags 0, device pcib1
rman_reserve_resource_bound: trying 0x100efffffff <0x10030000000,0x1fffff>
considering [0x10030000000, 0x100efffffff]
truncated region: [0x10030000000, 0x100301fffff]; size 0x200000 (requested 0x200000)
candidate region: [0x10030000000, 0x100301fffff], size 0x200000
allocating from the beginning
pcib0: rman_reserve_resource: 0xfffffd0010197780
rman_manage_region: <pcib1 memory window> request: start 0x10030000000, end 0x100301fffff
panic: Failed to add resource to rman

Comment 9 Tuan Phan 2019-04-19 17:00:59 UTC

Hello,
I am Tuan Phan, BIOS maintainer at Ampere. I can boot FreeBSD to a prompt with PCIe working (I am not a PCIe expert; I just did a quick hack in FreeBSD and am not sure it is the right way to do it). Also, I only started learning FreeBSD a few days ago, so I may well have made mistakes.

1. Fix the issue with console.
  - I added these lines to /boot/loader.conf
vfs.mountroot.timeout="10"
kernels_autodetect="NO"
boot_serial="YES"
console="comconsole,efi"
boot_multicons="YES"

2. Fix the SPCR and EFI runtime crash
  - I fixed SPCR in BIOS.
  - I removed the _INI node from I2C4. It is a useless node. Not sure why FreeBSD wasn't happy with it.

3. Fix the PCI-e.
  - Here is the patch. Again, I am not a PCIe expert, so you may want to improve it and change it properly.

diff --git a/sys/dev/pci/pci_host_generic.c b/sys/dev/pci/pci_host_generic.c
index 60f06a00909..ca814a03058 100644
--- a/sys/dev/pci/pci_host_generic.c
+++ b/sys/dev/pci/pci_host_generic.c
@@ -359,29 +359,29 @@ generic_pcie_activate_resource(device_t dev, device_t child, int type,
 
 	switch (type) {
 	case SYS_RES_IOPORT:
+	case SYS_RES_MEMORY:
 		found = 0;
 		for (i = 0; i < MAX_RANGES_TUPLES; i++) {
 			pci_base = sc->ranges[i].pci_base;
 			phys_base = sc->ranges[i].phys_base;
 			size = sc->ranges[i].size;
 
-			if ((rid > pci_base) && (rid < (pci_base + size))) {
+			if ((rman_get_start(r) >= pci_base) && (rman_get_start(r) < (pci_base + size))) {
 				found = 1;
 				break;
 			}
 		}
 		if (found) {
-			rman_set_start(r, rman_get_start(r) + phys_base);
-			rman_set_end(r, rman_get_end(r) + phys_base);
+			rman_set_start(r, rman_get_start(r) - pci_base + phys_base);
+			rman_set_end(r, rman_get_end(r) - pci_base + phys_base);
 			res = BUS_ACTIVATE_RESOURCE(device_get_parent(dev),
 			    child, type, rid, r);
 		} else {
 			device_printf(dev,
-			    "Failed to activate IOPORT resource\n");
+			    "Failed to activate %d resource\n", type);
 			res = 0;
 		}
 		break;
-	case SYS_RES_MEMORY:
 	case SYS_RES_IRQ:
 		res = BUS_ACTIVATE_RESOURCE(device_get_parent(dev), child,
 		    type, rid, r);
diff --git a/sys/dev/pci/pci_host_generic_acpi.c b/sys/dev/pci/pci_host_generic_acpi.c
index fa1bf4e6efc..dbc1b7fc746 100644
--- a/sys/dev/pci/pci_host_generic_acpi.c
+++ b/sys/dev/pci/pci_host_generic_acpi.c
@@ -297,7 +297,7 @@ pci_host_generic_acpi_attach(device_t dev)
 			continue; /* empty range element */
 		if (sc->base.ranges[tuple].flags & FLAG_MEM) {
 			error = rman_manage_region(&sc->base.mem_rman,
-			   phys_base, phys_base + size - 1);
+			   pci_base, pci_base + size - 1);
 		} else if (sc->base.ranges[tuple].flags & FLAG_IO) {
 			error = rman_manage_region(&sc->base.io_rman,
 			   pci_base + PCI_IO_WINDOW_OFFSET,

Comment 10 Tuan Phan 2019-04-19 17:01:48 UTC

Created attachment 203803 [details]
eMAG_dmesg_pcie_works

Comment 11 Val Packett 2019-04-19 22:00:07 UTC

(In reply to Tuan Phan from comment #9)

Excellent work, thanks! I actually tried doing this — same handling for SYS_RES_MEMORY as for SYS_RES_IOPORT there — but I wasn't smart enough to figure out the subtraction of pci_base.
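
For reference, the translation with the pci_base subtraction can be sketched as a standalone function. This is an illustration, not the driver code: the struct and `pci_to_phys` are hypothetical names, and the example window (PCI 0x30000000 mapped to physical 0x10030000000, size 0xc0000000) is inferred from the rman logs earlier in this bug.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pci_range {
	uint64_t pci_base;	/* start of the window in PCI bus addresses */
	uint64_t phys_base;	/* start of the window in CPU physical addresses */
	uint64_t size;		/* window length in bytes; 0 marks an unused slot */
};

/* Translate a PCI bus address to a CPU physical address via the ranges. */
static bool
pci_to_phys(const struct pci_range *ranges, size_t nranges,
    uint64_t pci_addr, uint64_t *phys_addr)
{
	size_t i;

	for (i = 0; i < nranges; i++) {
		const struct pci_range *r = &ranges[i];

		if (r->size == 0)
			continue;
		/* Inclusive lower bound: the first byte of the window counts. */
		if (pci_addr >= r->pci_base &&
		    pci_addr - r->pci_base < r->size) {
			/* Offset within the window, rebased onto phys_base. */
			*phys_addr = pci_addr - r->pci_base + r->phys_base;
			return (true);
		}
	}
	return (false);
}
```

Adding phys_base without first subtracting pci_base would give 0x10060000000 for bus address 0x30000000 in this example, which lies 0x30000000 bytes past the intended start of the window.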

I see there's some initial I/O port window failures still, but it's nice that you have a NIC working!

> boot_multicons="YES"

Oh. It was using only the framebuffer graphical console as the main console, I thought multicons was default on arm64 for some reason *facepalm*

> Fix the SPCR and EFI runtime crash

hmm, I see the I2C4 thing below, but looks like you didn't get a panic on efirtc initialization either… was that also fixed in firmware?

(it was crashing for me on Packet, the firmware on Packet's servers is: HVE104D-1.02 03/08/2019)

> I removed the _INI node from I2C4. It is a useless node. Not sure why FreeBSD wasn't happy with it.

FreeBSD was probing all ACPI devices, and ACPICA walked into a memory fault while trying to read from that address…

Comment 12 Tuan Phan 2019-04-19 22:18:30 UTC

(In reply to Greg V from comment #11)

> hmm, I see the I2C4 thing below, but looks like you didn't get a panic on efirtc initialization either… was that also fixed in firmware?

I only removed _INI, but not the whole I2C4 node. I didn't see efirtc issue, maybe different issue. The system installed in Packet is not the same system I am using. We are looking into it.

> FreeBSD was probing all ACPI devices, and ACPICA walked into a memory fault while trying to read from that address…

That makes sense.

One more thing: our ACPI has two XHCI nodes with _CID = PNP0D10. It looks like current FreeBSD doesn't have code to handle them; as far as I can see it only supports EHCI over ACPI.

Comment 13 Tuan Phan 2019-04-19 22:20:07 UTC

(In reply to Greg V from comment #11)

> I see there's some initial I/O port window failures still, but it's nice that you have a NIC working!

Correct me if I am wrong. ARM doesn't use IO ports at all.

Comment 14 Val Packett 2019-04-20 10:19:08 UTC

(In reply to Tuan Phan from comment #13)
> ARM doesn't use IO ports at all.

Yeah, ARM doesn't have actual IO ports, but looks like PCIe "IO" regions should be mapped into memory:

https://community.nxp.com/thread/387557#comment-626470

and other ARM systems do not show these errors: https://dmesgd.nycbug.org/index.cgi?do=view&id=4798

> our ACPI has two XHCI nodes with _CID = PNP0D10. Looks like current FreeBSD doesn't have code to parse it.

Nice catch. Yeah, XHCI has typically been on PCIe on big systems (both AMD/Intel and Cavium ThunderX/2) and described by FDT on embedded systems.. That looks easy enough to add though.

Comment 15 Val Packett 2019-04-20 13:02:11 UTC

wooooo I have SSH on the Packet instance! :)

Patch for enabling Mellanox NIC support on aarch64: https://reviews.freebsd.org/D19983

Comment 16 Val Packett 2019-04-20 13:49:08 UTC

To avoid I/O port window fails, I had to use the `rid` still for I/O port resources

			if (type == SYS_RES_IOPORT) {
				if ((rid >= pci_base) && (rid < (pci_base + size))) {
					found = 1;
					break;
				}
			} else {
				if ((rman_get_start(r) >= pci_base) && (rman_get_start(r) < (pci_base + size))) {
					found = 1;
					break;
				}
			}

The only fails I see is on pcib12:

pcib12: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=0
pcib12: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
pcib12: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
pcib12: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000000000000, end=00000000ffffffff, count=0000000000001000, flags=3000

Comment 17 Val Packett 2019-04-20 14:06:06 UTC

I have a patch for ACPI XHCI: https://reviews.freebsd.org/D19986

The Packet instance has USB disabled though:

            Method (_STA, 0, NotSerialized)  // _STA: Status
            {
                Return (0x00)
            }

Patching the table to 0x0F results in

xhci0: <Generic USB 3.0 controller> iomem 0x13800000-0x138fffff irq 5 on acpi0
panic: vm_fault_hold: fault on nofault entry, addr: 0xffff0000e1785000

— most likely because disabling USB actually detaches the controller, not just makes ACPI tell the system that it's not present :D
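
For context on the two values: the low bits of a _STA result are defined by the ACPI specification, so 0x00 means "not present" and 0x0F sets present, enabled, shown-in-UI and functioning all at once. A small sketch of decoding them in C follows; the macro names and the `sta_should_attach` policy are made up for illustration and are not FreeBSD's actual attach logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* _STA result bits as defined by the ACPI specification. */
#define STA_PRESENT	0x01u	/* device is present */
#define STA_ENABLED	0x02u	/* device is enabled and decoding resources */
#define STA_SHOW_IN_UI	0x04u	/* device should be shown in the UI */
#define STA_FUNCTIONING	0x08u	/* device is functioning properly */

/* Hypothetical policy: only attach if present, enabled and functioning. */
static bool
sta_should_attach(uint32_t sta)
{
	const uint32_t need = STA_PRESENT | STA_ENABLED | STA_FUNCTIONING;

	return ((sta & need) == need);
}
```

Note that overriding the table only changes what the OS is told; it does not bring up hardware that the firmware left unconfigured, which is consistent with the panic above.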

Comment 18 Val Packett 2019-04-20 14:20:37 UTC

(hmm even the bios setup says "USB Controllers: None". Does the Lenovo server ship w/o USB at all?)

Comment 19 commit-hook freebsd_committer

2019-04-20 15:57:40 UTC

A commit references this bug:

Author: emaste
Date: Sat Apr 20 15:57:06 UTC 2019
New revision: 346445
URL: https://svnweb.freebsd.org/changeset/base/346445

Log:
  Enable ioremap for aarch64 in the LinuxKPI

  Required for Mellanox drivers (e.g. on Ampere eMAG at Packet.com).

  PR:		237055
  Submitted by:	Greg V <greg@unrelenting.technology>
  Reviewed by:	hselasky
  Differential Revision:	https://reviews.freebsd.org/D19987

Changes:
  head/sys/compat/linuxkpi/common/include/linux/io.h
  head/sys/compat/linuxkpi/common/src/linux_compat.c

Comment 20 Ed Maste freebsd_committer

2019-04-21 15:58:46 UTC

CC jhb@; John can you review the PCI change in comment #9

Comment 21 John Baldwin freebsd_committer

2019-04-22 16:13:49 UTC

Those aren't generic PCI changes but in the arm-specific drivers (despite the poorly chosen "generic" in the name).  They are ok for now.  The real fix is larger but requires proper implementation of bus_map_resource and using a real resource manager for the host bridges instead of passing requests through.

Comment 22 Tuan Phan 2019-04-22 16:56:08 UTC

(In reply to Greg V from comment #17)
> Patching the table to 0x0F results in

> xhci0: <Generic USB 3.0 controller> iomem 0x13800000-0x138fffff irq 5 on acpi0
> panic: vm_fault_hold: fault on nofault entry, addr: 0xffff0000e1785000

The eMAG USB controller is disabled in the UEFI BIOS, so force-enabling it in ACPI will likely cause a crash. Some USB registers (clock, memory access, etc.) are controlled by the BIOS. The USB node in ACPI only describes the XHCI interface.

Comment 23 Tuan Phan 2019-04-22 16:58:23 UTC

(In reply to Greg V from comment #18)
> (hmm even the bios setup says "USB Controllers: None". Does the Lenovo server ship w/o USB at all?)

If you see _STA = 0 then it is disabled in the BIOS. You can try going to the BIOS setup tab chipset/xhci controller configuration and enabling it there.

Comment 24 Tuan Phan 2019-04-22 18:07:22 UTC

(In reply to Greg V from comment #16)
> if ((rid >= pci_base) && (rid < (pci_base + size))

I am still not clear why rid can be compared to pci_base. It is a resource ID, right?

In pci_host_generic_acpi.c, function pci_host_generic_acpi_attach
			error = rman_manage_region(&sc->base.io_rman,
			   pci_base + PCI_IO_WINDOW_OFFSET,
			   pci_base + PCI_IO_WINDOW_OFFSET + size - 1);

We shouldn't add PCI_IO_WINDOW_OFFSET to pci_base, should we?

Comment 25 Val Packett 2019-04-23 11:09:58 UTC

(In reply to Tuan Phan from comment #23)
> You can try going to the BIOS setup tab chipset/xhci controller configuration and enabling it there.

That tab wasn't giving me an option to enable it, or maybe I just couldn't figure it out…

Either way, it would be better if you or Ed tested the XHCI patch (https://reviews.freebsd.org/D19986) because I can't exactly plug anything into the USB ports of a server on the other side of the planet :D

Comment 26 commit-hook freebsd_committer

2019-04-23 15:11:20 UTC

A commit references this bug:

Author: emaste
Date: Tue Apr 23 15:11:01 UTC 2019
New revision: 346598
URL: https://svnweb.freebsd.org/changeset/base/346598

Log:
  Enable Mellanox drivers (modules) on AArch64

  Tested by Greg V with mlx5en on an Ampere eMAG instance at Packet.com on
  c2.large.arm (with some additional uncommitted PCIe WIP).

  PR:		237055
  Submitted by:	Greg V <greg@unrelenting.technology>
  Reviewed by:	hselasky
  MFC after:	1 month
  Differential Revision:	https://reviews.freebsd.org/D19983

Changes:
  head/sys/modules/Makefile

Comment 27 Tuan Phan 2019-04-23 16:57:40 UTC

(In reply to Greg V from comment #25)

> Either way, it would be better if you or Ed tested the XHCI patch (https://reviews.freebsd.org/D19986) because I can't exactly plug anything into the USB ports of a server on the other side of the planet :D

I tested the patch on my board and USB works, with both a USB keyboard and mass storage.
Thanks

Comment 28 Ed Maste freebsd_committer

2019-04-25 23:08:47 UTC

(In reply to Tuan Phan from comment #27)
Can you test the updated USB patch in https://reviews.freebsd.org/D19986? I applied it to my tree but was unsuccessful - As with GregV's report in PR237055 dsdt has for USB:
```
            Method (_STA, 0, NotSerialized)  // _STA: Status
            {
                Return (0x00)
            }
```
regardless of BIOS settings; I wasn't able to test this here.

At boot my FW reports:
SMpro FW version: 1.04
PMpro FW version: 1.04
FW date: 20190228

AMI setup utility reports Version 2.19.1268 and BIOS Version 1.02 Build Date and Time 03/08/2019 09:59:05

Comment 29 Tuan Phan 2019-04-25 23:17:53 UTC

(In reply to Ed Maste from comment #28)

> Can you test the updated USB patch in https://reviews.freebsd.org/D19986? I applied it to my tree but was unsuccessful - As with GregV's report in PR237055 dsdt has for USB:

Sure, but it may take a while. We are moving to a new office, so all the boards in the lab have been torn down.

Comment 30 Emmanuel Vadot freebsd_committer

2019-04-30 09:12:07 UTC

(In reply to Tuan Phan from comment #24)

Hi,

I also don't understand what the current code is trying to achieve by comparing rid to pci_base; it doesn't make sense to me either.
I'm working on a patch based on yours and making sure it will not break the other platforms using PCI (SoftIron Overdrive, QEMU and ThunderX are the only ones, I think). I'll put up some reviews tonight or maybe tomorrow morning.
In the meantime I've seen that the bus end number in the MCFG table is correctly set to 31 while the one in the _CRS method of each PCI device is set to 255. Tuan, could you fix that in later BIOS releases?
Thanks.
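
To illustrate why the bus end number matters: in the standard ECAM layout each bus occupies 1 MiB of configuration space, so a window for buses 0-31 is 32 MiB while one for buses 0-255 would be 256 MiB. The sketch below shows the arithmetic in C; the function names are made up for illustration and are not FreeBSD's.

```c
#include <stdbool.h>
#include <stdint.h>

/* ECAM layout: 4 KiB per function, 8 functions per device, 32 devices per bus. */
static uint64_t
ecam_offset(unsigned bus, unsigned dev, unsigned func, unsigned reg)
{
	return (((uint64_t)bus << 20) | ((uint64_t)dev << 15) |
	    ((uint64_t)func << 12) | (reg & 0xfffu));
}

/* Size in bytes of the config window covering buses [bus_start, bus_end]. */
static uint64_t
ecam_window_size(unsigned bus_start, unsigned bus_end)
{
	return ((uint64_t)(bus_end - bus_start + 1) << 20);
}

/* True if config accesses to `bus` stay inside the described window. */
static bool
ecam_bus_in_window(unsigned bus, unsigned bus_start, unsigned bus_end)
{
	return (bus >= bus_start && bus <= bus_end);
}
```

An access to bus 32 would start at offset 0x2000000, which is exactly the first byte past a 0-31 window, so trusting the 255 from _CRS could lead to accesses outside the mapped configuration space.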

Comment 31 Tuan Phan 2019-04-30 16:14:06 UTC

(In reply to Emmanuel Vadot from comment #30)

> In the meantime I've seen that the bus end number in the MCFG table is correctly set to 31 while the one in the _CRS method of each PCI device is set to 255,  Tuan could you fix that in later bios releases ?

Sure, we will fix it.

Comment 32 Ed Maste freebsd_committer

2019-04-30 16:56:31 UTC

(In reply to Tuan Phan from comment #31)
Also please let us know when the update makes it through to new Lenovo firmware.

Comment 33 Emmanuel Vadot freebsd_committer

2019-04-30 17:26:09 UTC

(In reply to Greg V from comment #16)

This just hides the problem and in fact doesn't work.
The IO mapping works with PCI0 to PCI6 (ACPI names) but the PCIR_IOBASEH in the PCI-PCI bridge under PCI7 contains 0x10000000. I'm not sure why, or how it should map to the addresses in _CRS.

Comment 34 commit-hook freebsd_committer

2019-05-01 17:13:32 UTC

A commit references this bug:

Author: andrew
Date: Wed May  1 17:12:50 UTC 2019
New revision: 346996
URL: https://svnweb.freebsd.org/changeset/base/346996

Log:
  Restore x18 in efi_arch_leave.

  Some UEFI implementations trash this register and, as we use it as a
  platform register, the kernel doesn't save it before calling into the UEFI
  runtime services. As we have a copy in tpidr_el1 restore from there when
  exiting the EFI environment.

  PR:		237234, 237055
  Reviewed by:	manu
  Tested On:	Ampere eMAG
  MFC after:	2 weeks
  Sponsored by:	DARPA, AFRL
  Sponsored by:	Ampere Computing (hardware)
  Differential Revision:	https://reviews.freebsd.org/D20127

Changes:
  head/sys/arm64/arm64/efirt_machdep.c

Comment 35 Emmanuel Vadot freebsd_committer

2019-05-02 16:30:23 UTC

Just opened https://reviews.freebsd.org/D20144
This improves the performance of ahci.

Comment 36 Emmanuel Vadot freebsd_committer

2019-05-09 10:47:48 UTC

Follow up on the ACPI bug.
As Greg noted, the problem is in the OperationRegion in the AHBC device.
When the ACPICA code tries to read from the address (in the function AcpiExSystemMemorySpaceHandler in file sys/contrib/dev/acpica/components/executer/exregion.c) we get a fault.
The ESR value for this fault is 0x96000410, which means that this is a "Synchronous External abort, not on translation table walk" according to the ARMv8 ARM. The FnV bit is set, so the FAR register is not valid, and SET is equal to 0, so it is a recoverable error.
Andrew Turner (andrew@) thinks it might be a RAS exception, which FreeBSD doesn't support for now.
For now I have a crappy patch that just returns from the AcpiExSystemMemorySpaceHandler function if the address is 0x1f10c004 or 0x1f10c000, so I can boot the system with the latest BIOS and the full ACPI table rather than a modified one.
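
The ESR reading above can be reproduced with a few shifts and masks. A minimal sketch in C, using the data-abort field positions from the Arm architecture manual (the struct and function names are hypothetical, not kernel identifiers):

```c
#include <stdint.h>

struct esr_fields {
	unsigned ec;	/* ESR_ELx[31:26]: exception class */
	unsigned il;	/* ESR_ELx[25]: instruction length bit */
	unsigned set;	/* ISS[12:11]: synchronous error type */
	unsigned fnv;	/* ISS[10]: FAR not valid */
	unsigned dfsc;	/* ISS[5:0]: data fault status code */
};

/* Split an ESR value into the fields relevant to a data abort. */
static struct esr_fields
esr_decode_data_abort(uint64_t esr)
{
	struct esr_fields f;

	f.ec = (unsigned)((esr >> 26) & 0x3f);
	f.il = (unsigned)((esr >> 25) & 0x1);
	f.set = (unsigned)((esr >> 11) & 0x3);
	f.fnv = (unsigned)((esr >> 10) & 0x1);
	f.dfsc = (unsigned)(esr & 0x3f);
	return (f);
}
```

For 0x96000410 this gives EC 0x25 (data abort taken without a change in exception level), FnV 1, SET 0 and DFSC 0x10, matching the description above.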

Comment 37 Tuan Phan 2019-05-09 22:01:46 UTC

(In reply to Emmanuel Vadot from comment #36)

About this issue, I am wondering why accessing 0x1f10c004 or 0x1f10c000 causes an exception. Other OSes work fine in this case.

Does the access happen before virtual addressing is enabled? Does it need a memory mapping?

This issue needs to be fixed somehow, otherwise any ACPI node that accesses memory in _INI will have the same problem.

Comment 38 Emmanuel Vadot freebsd_committer

2019-05-10 16:15:16 UTC

(In reply to Tuan Phan from comment #37)

The ACPICA code calls AcpiOsMapMemory before accessing the region, which in turn calls pmap_mapbios.
If there were something wrong with the mapping, I don't think I would get a data abort exception with a non-valid address.

Comment 39 Tuan Phan 2019-05-13 21:03:47 UTC

(In reply to Tuan Phan from comment #29)

> Can you test the updated USB patch in https://reviews.freebsd.org/D19986? I applied it to my tree but was unsuccessful - As with GregV's report in PR237055 dsdt has for USB:


Tested the patch. Can detect USB mass storage and keyboard. The patch is good.

Comment 40 Tuan Phan 2019-05-14 20:06:21 UTC

(In reply to Emmanuel Vadot from comment #38)

I did some debugging: it was a data abort exception. The address 0x1f10c004 was mapped, but with the normal cacheable memory attribute. It should be mapped with the device memory attribute.

UEFI always exports it as device memory.
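
One way to picture the distinction is to choose the mapping attribute from the firmware memory map: write-back regions get a normal cacheable mapping, everything else is treated as device memory. The following C sketch is only an illustration of that idea and is not taken from any FreeBSD change; the descriptor struct is simplified, `choose_map_attr` is a hypothetical name, and the example descriptors used with it would be invented. Only the EFI_MEMORY_* bit values come from the UEFI specification.

```c
#include <stddef.h>
#include <stdint.h>

/* UEFI memory attribute bits (subset), as defined by the UEFI specification. */
#define EFI_MEMORY_UC	0x1ull	/* uncacheable */
#define EFI_MEMORY_WC	0x2ull	/* write-combining */
#define EFI_MEMORY_WT	0x4ull	/* write-through */
#define EFI_MEMORY_WB	0x8ull	/* write-back cacheable */

enum map_attr { MAP_DEVICE, MAP_NORMAL_CACHED };

/* Simplified descriptor; real UEFI descriptors count 4 KiB pages. */
struct mem_desc {
	uint64_t phys_start;
	uint64_t num_bytes;
	uint64_t attribute;
};

/* Pick a mapping attribute for physical address `pa`. */
static enum map_attr
choose_map_attr(const struct mem_desc *map, size_t n, uint64_t pa)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pa >= map[i].phys_start &&
		    pa - map[i].phys_start < map[i].num_bytes) {
			return ((map[i].attribute & EFI_MEMORY_WB) != 0 ?
			    MAP_NORMAL_CACHED : MAP_DEVICE);
		}
	}
	/* Not described at all: fall back to device memory. */
	return (MAP_DEVICE);
}
```

Under this scheme an MMIO register such as the one at 0x1f10c004 would never receive a cacheable mapping unless the firmware explicitly described it as write-back.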

Comment 41 Emmanuel Vadot freebsd_committer

2019-05-22 05:09:48 UTC

Just opened three new reviews that address the ACPI bugs :
https://reviews.freebsd.org/D20347
https://reviews.freebsd.org/D20348
https://reviews.freebsd.org/D20349

Comment 42 Michael Tuexen freebsd_committer

2019-05-24 11:42:31 UTC

(In reply to Emmanuel Vadot from comment #41)
I recently got a Lenovo HR 350A system for my lab and want to run FreeBSD on it.

Do I only need D2034[789] on top of FreeBSD head, or do I need additional patches and/or a specific version of the firmware?

Comment 43 Ed Maste freebsd_committer

2019-05-24 13:25:01 UTC

(In reply to Michael Tuexen from comment #42)
My WIP tree is functional on eMAG with those three commits included; they should be sufficient. (I have a lot of other changes but they are largely userland, and some unrelated kernel changes.)

Firmware info from early boot (the same eMAG that manu@ is using for development):

SMpro FW version: 1.04
PMpro FW version: 1.04
FW date: 20190228

    EFI version: 2.60
    EFI Firmware: American Megatrends (rev 5.13)

Comment 44 commit-hook freebsd_committer

2019-05-24 13:40:59 UTC

A commit references this bug:

Author: emaste
Date: Fri May 24 13:39:57 UTC 2019
New revision: 348237
URL: https://svnweb.freebsd.org/changeset/base/348237

Log:
  MFC r346598: Enable Mellanox drivers (modules) on AArch64

  PR:		237055
  Submitted by:	Greg V <greg@unrelenting.technology>

Changes:
_U  stable/12/
  stable/12/sys/modules/Makefile

Comment 45 Michael Tuexen freebsd_committer

2019-05-24 16:40:35 UTC

(In reply to Ed Maste from comment #43)
Thanks for the information. Will try to test this on my machine next week...

Comment 46 Michael Tuexen freebsd_committer

2019-05-27 12:44:17 UTC

(In reply to Ed Maste from comment #43)
Hi Ed,

I built a FreeBSD install image based on FreeBSD head with applying D2034[789].

I can confirm that the system boots fine with such a kernel.

When running the installer to install the OS on a new SSD, the installer
finishes the archive extraction step and writes on the screen:

Formatting /dev/ada0p1 as FAT32
Mounting ESP /dev/ada0p1
Installing loader.efi onto ESP
Creating UEFI boot entry

Then the system stalls...

Any idea what is going wrong or what am I doing wrong?

Comment 47 Emmanuel Vadot freebsd_committer

2019-05-27 13:14:40 UTC

Yes, there is a problem with the runtime efi SetVar in the firmware, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237808
I haven't tested the new firmware yet.
If you don't want to try it you could do something like https://github.com/evadot/freebsd/commit/cbf0449d2d6193e209c611dc87eed8f2bfdedd7a

Comment 48 Michael Tuexen freebsd_committer

2019-05-27 15:11:37 UTC

(In reply to Emmanuel Vadot from comment #47)
Thanks, that helps in letting the installer finish. I used your patch, not the updated firmware.

Unfortunately, the kernel from disk panics on load. Likely a problem due to my way of building the image. Restarted from scratch to build the image. I'll report...

Comment 49 Emmanuel Vadot freebsd_committer

2019-05-27 15:28:06 UTC

(In reply to Michael Tuexen from comment #48)

You could try :
https://people.freebsd.org/~manu/FreeBSD-13.0-CURRENT-arm64-aarch64-GENERIC-NODEBUG-r347932.img.xz

It's a week old or something like that and it's using NODEBUG but ...
Otherwise building the image is just: export TARGET_ARCH=aarch64; export TARGET=arm64; make buildworld buildkernel; cd release; sudo -E make memstick

You need both target and target_arch for image building (I don't remember why right now ...)

Comment 50 Michael Tuexen freebsd_committer

2019-05-27 16:10:20 UTC

(In reply to Emmanuel Vadot from comment #49)
I gave it a try. It runs the installer without problems, the installed system boots and computes the ssh server keys and locks up...

Comment 51 Emmanuel Vadot freebsd_committer

2019-05-27 16:18:56 UTC

(In reply to Michael Tuexen from comment #50)

Where exactly ?
I have some problem with sendmail being stuck in nanoslp (same problem on Thunderx2 it seems) but I can ctrl+c (that is until I look at what is the problem exactly).

Comment 52 Michael Tuexen freebsd_committer

2019-05-27 17:20:37 UTC

(In reply to Emmanuel Vadot from comment #51)
After reporting that it generated the third key. I could not CTRL-C...
When the build with a debug kernel has finished, I'll try that. Possibly it
provides information or even a panic.

Comment 53 Michael Tuexen freebsd_committer

2019-05-28 08:25:11 UTC

(In reply to Michael Tuexen from comment #52)
OK, I did a build with FreeBSD head of yesterday, applied
* https://reviews.freebsd.org/D20347
* https://reviews.freebsd.org/D20348
* https://reviews.freebsd.org/D20349
* https://github.com/evadot/freebsd/commit/cbf0449d2d6193e209c611dc87eed8f2bfdedd7a

This resulted in a working system. I checked out the sources and rebuilt a GENERIC-NODEBUG kernel and it also runs.

However, I had one (temporary) problem during booting.
The messages on the screen were:
...
Loading configured modules...
/boot/entropy size=0x1000
No valid device tree blob found!
WARNING! Trying to fire up the kernel, but no device blob tree found!
EFI framebuffer information:
addr, size     0x430000000, 0x30000
dimensions     1024 x 768
stride         1024
masks          0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
_

Then the system was hanging. A reboot resolved the issue.

Comment 54 Michael Tuexen freebsd_committer

2019-05-28 11:33:34 UTC

(In reply to Michael Tuexen from comment #53)
Some more testing. The system is capable of doing buildworld, but it locks up a lot when booting. You can't CTRL-C it.

Is there any information I could provide which would help to nail the problem down?

Comment 55 Ed Maste freebsd_committer

2019-05-28 13:12:20 UTC

(In reply to Michael Tuexen from comment #54)
To be clear, you mean that it frequently locks up during boot, but once booted it runs correctly?

Comment 56 Michael Tuexen freebsd_committer

2019-05-28 13:52:20 UTC

(In reply to Ed Maste from comment #55)
More testing, better description:

I meant: several times it booted to the login prompt but it didn't accept input on the keyboard or over the network (ssh access)

Now I have observed that sometimes it accepts input on the console, but the
network (an igb card) wasn't brought up. When looking at the boot messages I do see (transcribed):
...
pci14: <PCI bus> on pcib14
pcib15: <PCI-PCI bridge> at device 0.0 on pci14
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=0
pcib15: failed to allocate initial I/O port window: 0x10000000-0x10000fff
pci15: <PCI bus> on pcib15
pcib16: <PCI-PCI bridge> at device 0.0 on pci15
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
pcib16: failed to allocate initial I/O port window: 0x10000000-0x10000fff
pci16: <PCI bus> on pcib16
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
...
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED

I have observed similar instabilities on an Overdrive 3000 system when these kinds of PCI errors occurred. On the Overdrive 3000 I'm working around this by using an Ethernet card which doesn't show these PCI errors (a bge card instead of igb or ix).

The Ampere system has an igb card (in use) and a Mellanox card (not in use). Should I try to replace them?

Comment 57 Michael Tuexen freebsd_committer

2019-05-28 14:23:02 UTC

OK, I identified one problem: 
When setting the time/date via
sudo date 1432
on the command line, the system locks up after a couple of seconds.

This might be related to the lock up after booting problems I have seen, since I added
ntpdate="YES"
to my /etc/rc.conf

Without this entry, the system boots fine.

Can you reproduce this?

Comment 58 Emmanuel Vadot freebsd_committer

2019-05-28 14:28:57 UTC

(In reply to Michael Tuexen from comment #57)
I can, yes. I'll add this to my stuff-to-resolve list :)

Comment 59 Michael Tuexen freebsd_committer

2019-05-28 14:31:19 UTC

(In reply to Emmanuel Vadot from comment #58)
Great. Thanks a lot!

Comment 60 Tuan Phan 2019-05-28 17:26:09 UTC

(In reply to Michael Tuexen from comment #57)

It hangs because of the same issue with SetVariable. I think you should use the latest FW, which is mentioned in the SetVariable issue.

When you set the RTC, it also uses SetVariable to save the timezone info.

Comment 61 Tuan Phan 2019-05-28 17:27:09 UTC

(In reply to Michael Tuexen from comment #53)

I suggest you use the latest FW and try again.

Comment 62 Tuan Phan 2019-05-28 17:27:23 UTC

(In reply to Michael Tuexen from comment #53)

I suggest you use the latest FW and try again.

Comment 63 Michael Tuexen freebsd_committer

2019-05-28 17:34:26 UTC

(In reply to Tuan Phan from comment #62)
OK. Will try tomorrow and report.

Comment 64 Val Packett 2019-05-28 18:53:42 UTC

(In reply to Michael Tuexen from comment #56)

The errors on pci14-16 are not from your igb card and should not affect your card, which is probably on a far lower-numbered bus/bridge/thingy.

The Mellanox CX4 cards on the Packet instances are on pci1: https://dmesgd.nycbug.org/index.cgi?do=view&id=4864 and they work perfectly fine (in a LACP aggregation, even). The same errors are showing up on pci12-14 there. (2 less buses there — HR350A vs HR330A?)

Comment 65 Val Packett 2019-05-28 22:28:31 UTC

By the way, a few questions for Tuan and/or John:

- is there no hardware random number generator on eMAG? I see there was on X-Gene: https://github.com/torvalds/linux/blob/master/drivers/char/hw_random/xgene-rng.c but APMC0D18 is nowhere to be found in the DSDT I got from the Packet instance..
- does the CPU boost to the 3.3GHz speed without the OS doing anything?
- is there public documentation for the monitoring (temperature, frequency)/PMU etc. devices, other than the GPL'ed Linux driver code?
- why is the primary part number in MIDR zero?

---

also, I just realized that we're not building ipmi_acpi on aarch64, and it does build..

Comment 66 Tuan Phan 2019-05-28 22:38:29 UTC

(In reply to Greg V from comment #65)

Greg,
I can answer some questions:

1. why is the primary part number in MIDR zero?
=> We fixed a bug where the MIDR was put in the second DWORD, if you are parsing it from SMBIOS type 4?

2. I don't think we have an RNG in eMAG. Not sure; let John confirm with the designer.
3. I believe the CPU can boost to 3.3GHz without any mediation needed from the OS. Not sure; let John confirm with the power management maintainer.
4. John can help you with documents if they are available, or provide support from the designer.
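
For anyone following the MIDR question, here is a minimal sketch of how a MIDR_EL1 value splits into fields under the Arm architecture layout (implementer in bits 31:24, variant in 23:20, architecture in 19:16, primary part number in 15:4, revision in 3:0). The sample value `0x503F0002` is an assumption built by hand for illustration (implementer 0x50 for APM, variant 3, revision 2, part number zero); it was not read from hardware in this thread.

```python
def decode_midr(midr):
    """Split a 32-bit MIDR_EL1 value into its architectural fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # 0x50 is the APM implementer code
        "variant": (midr >> 20) & 0xF,        # the "rN" part of rNpM
        "architecture": (midr >> 16) & 0xF,   # 0xF: features described by ID registers
        "partnum": (midr >> 4) & 0xFFF,       # primary part number
        "revision": midr & 0xF,               # the "pM" part of rNpM
    }


# Hypothetical sample value, assembled by hand for illustration only.
SAMPLE_MIDR = 0x503F0002

if __name__ == "__main__":
    for name, value in decode_midr(SAMPLE_MIDR).items():
        print(f"{name:12s} = {value:#x}")
```

With a part number of zero, only the implementer field and the variant/revision pair are left to tell the CPU apart, so any identification has to key on those.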

Comment 67 Michael Tuexen freebsd_committer

2019-05-29 10:14:33 UTC

(In reply to Michael Tuexen from comment #63)
I can confirm that updating the Firmware to the version provided in bug #237808 resolves the issue with setting the time (via /etc/rc.conf or manually).

Comment 68 Tuan Phan 2019-05-30 21:49:44 UTC

(In reply to Michael Tuexen from comment #53)

> /boot/entropy size=0x1000
> No valid device tree blob found!
> WARNING! Trying to fire up the kernel, but no device blob tree found!
> EFI framebuffer information:
> addr, size     0x430000000, 0x30000
> dimensions     1024 x 768
> stride         1024
> masks          0x00ff0000, 0x0000ff00, 0x000000ff, 0xff000000
> _
> Then the system was hanging. A reboot resolved the issue.

Did you see this issue with the new test FW?

Comment 69 Michael Tuexen freebsd_committer

2019-05-30 22:30:22 UTC

(In reply to Tuan Phan from comment #68)
No, I haven't. Using the new Firmware, the system runs fine (using the igb and mce interfaces).

It only reports:

pci13: <PCI bus> on pcib13
pcib14: <Generic PCI host controller> on acpi0
pci14: <PCI bus> on pcib14
pcib15: <PCI-PCI bridge> at device 0.0 on pci14
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=0
pcib15: failed to allocate initial I/O port window: 0x10000000-0x10000fff
pci15: <PCI bus> on pcib15
pcib16: <PCI-PCI bridge> at device 0.0 on pci15
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
pcib16: failed to allocate initial I/O port window: 0x10000000-0x10000fff
pci16: <PCI bus> on pcib16
pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, start=0000000010000000, end=0000000010000fff, count=0000000000001000, flags=3000
vgapci0: <VGA-compatible display> port 0x1000-0x107f mem 0x30000000-0x30ffffff,0x31040000-0x3105ffff at device 0.0 on pci16
cpu0: <ACPI CPU> on acpi0
uart0: <PrimeCell UART (PL011)> iomem 0x12600000-0x12600fff irq 1 on acpi0
uart0: console (115200,n,8,1)
uart1: <PrimeCell UART (PL011)> iomem 0x12610000-0x12610fff irq 2 on acpi0
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED

during boot. But it doesn't seem to affect the system.
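
To make the FAIL lines easier to read, here is a small sketch that parses one of them and checks that the reported window is self-consistent. It assumes `type` is a SYS_RES_* resource type number (4 being the I/O port type, which matches the "I/O port window" message that follows), that `rid` is printed in decimal, and that `start`, `end` and `count` are hexadecimal. The function and variable names are made up for this example.

```python
import re

# SYS_RES_* resource type numbers as used by FreeBSD's bus code.
RES_TYPES = {1: "IRQ", 2: "DRQ", 3: "memory", 4: "I/O port"}

FAIL_RE = re.compile(
    r"(?P<dev>\w+): pci_host_generic_core_alloc_resource FAIL: "
    r"type=(?P<type>\d+), rid=(?P<rid>\d+), "
    r"start=(?P<start>[0-9a-f]+), end=(?P<end>[0-9a-f]+), "
    r"count=(?P<count>[0-9a-f]+), flags=(?P<flags>[0-9a-f]+)"
)


def parse_fail(line):
    """Return the fields of one FAIL line, or None if the line does not match."""
    m = FAIL_RE.search(line)
    if m is None:
        return None
    start = int(m.group("start"), 16)
    end = int(m.group("end"), 16)
    count = int(m.group("count"), 16)
    return {
        "dev": m.group("dev"),
        "type": RES_TYPES.get(int(m.group("type")), "unknown"),
        "rid": int(m.group("rid")),       # assumed decimal
        "start": start,
        "end": end,
        "count": count,
        # An inclusive range should span exactly `count` addresses.
        "size_matches": end - start + 1 == count,
    }


LINE = ("pcib14: pci_host_generic_core_alloc_resource FAIL: type=4, rid=28, "
        "start=0000000010000000, end=0000000010000fff, "
        "count=0000000000001000, flags=0")

if __name__ == "__main__":
    print(parse_fail(LINE))
```

Read this way, each failure is a request for a 0x1000-byte I/O port window at 0x10000000 behind pcib14, not a memory BAR of a network card.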

Comment 70 Michael Tuexen freebsd_committer

2019-06-04 06:58:17 UTC

I tried to enable console access via a serial line by putting

boot_multicons="YES"
boot_serial="YES"
console="comconsole,efi"
comconsole_speed="115200"

into /boot/loader.conf.

Is this supposed to work with FreeBSD head (r348543)? It never works on my system and sometimes the system locks up during boot. Without these entries in /boot/loader.conf I have not observed such lockups anymore.

I'm running the firmware from bug #237808.

From dmesg:

...
cpu0: <ACPI CPU> on acpi0
uart0: <PrimeCell UART (PL011)> iomem 0x12600000-0x12600fff irq 1 on acpi0
uart0: console (115200,n,8,1)
uart1: <PrimeCell UART (PL011)> iomem 0x12610000-0x12610fff irq 2 on acpi0
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED
...

Comment 71 Val Packett 2019-07-21 17:48:20 UTC

hm, looks like it is possible to identify the eMAG CPU: https://github.com/NetBSD/src/commit/a1feb17c3b45b52319a61e4f9c172e373b055bc2 + https://github.com/NetBSD/src/commit/74b0f2158a5c1fee10344fc3d995780a353570a2

btw, if anyone is interested in trying more stuff on eMAG (and other aarch64 HW):

- AMD Radeon GPU driver https://github.com/FreeBSDDesktop/kms-drm/pull/154
- SBSA watchdog driver https://reviews.freebsd.org/D20974

Comment 72 Val Packett 2019-07-24 20:10:52 UTC

Well here's a funny story…

I've been experimenting with two aarch64 things: attaching the PMU (where P = Performance) via ACPI (to make pmcstat work) and building IPMI support.

On my Marvell MACCHIATObin, PMU attaches on boot, the interrupt fires but very rarely, so most of the time there's nothing in pmcstat, but occasionally a couple lines did appear. That platform actually has some weirdness (the PMU interrupts a custom Marvell interrupt controller, and in ACPI mode the firmware catches that and rethrows onto the GICv2, or something like that) so it might be a firmware bug.

So I've rented an Ampere eMAG instance from Packet again to try a different ACPI platform and uhh.

On boot, the PMU does not attach:

pmu0: rid 0 irq 23
pmu0: <Performance Monitoring Unit> irq 13 on acpi0
pmu0: could not allocate resources

But when I do `kldload ipmi` (!!!):

pmu0: <Performance Monitoring Unit> irq 13 on acpi0

and pmcstat does actually start working! Wait, what?! Oh. I guess it's just reprobing all the drivers on unattached devices, but it looked so bizarre at first :D

Evidently, I just put the PMU too early in the attachment order (BUS_PASS_INTERRUPT + BUS_PASS_ORDER_MIDDLE).

(and, ipmi does not attach because the i2c controller wasn't even attaching (https://reviews.freebsd.org/D21059), the i2c controller doesn't attach its children, and IPMI-over-i2c-described-by-ACPI is not supported anyway)

Comment 73 Val Packett 2019-07-24 20:35:04 UTC

(In reply to Greg V from comment #72)

What is actually weird is that attaching pmu correctly in the boot process results in a ridiculous interrupt rate slowing the system down :(

# vmstat -i
interrupt                                             total       rate
gic0,p7: pmu0                                    2397246676    2342967
gic0,p11:-ic_timer0                                26390337      25793
gic0,s66: uart0                                         502          0
gic0,s79: ahci0                                        2071          2
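
As a sanity check on those numbers, here is a rough sketch that parses the `vmstat -i` rows above and divides total by rate. It assumes the rate column is the total divided by uptime in seconds, so total/rate should give roughly the same uptime for every busy source. The helper names are made up for this example.

```python
SAMPLE = """\
gic0,p7: pmu0                                    2397246676    2342967
gic0,p11:-ic_timer0                                26390337      25793
gic0,s66: uart0                                         502          0
gic0,s79: ahci0                                        2071          2
"""


def parse_vmstat_i(text):
    """Return (name, total, rate) tuples for each data row of `vmstat -i`."""
    rows = []
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # the name itself may contain spaces
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue  # header or malformed line
        rows.append((parts[0].strip(), int(parts[1]), int(parts[2])))
    return rows


def implied_uptime(total, rate):
    """Seconds of uptime implied by total/rate, or None when rate is 0."""
    return total / rate if rate else None


if __name__ == "__main__":
    for name, total, rate in parse_vmstat_i(SAMPLE):
        print(name, total, rate, implied_uptime(total, rate))
```

Both the pmu0 and timer rows imply about the same uptime, so the PMU figure is a sustained rate rather than a counter glitch, and it is roughly 90 times the timer interrupt rate.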

Comment 74 Michael Tuexen freebsd_committer

2019-08-18 09:37:06 UTC

(In reply to Greg V from comment #71)
A patch is in review D21314.

Comment 75 Michael Tuexen freebsd_committer

2019-08-26 16:14:49 UTC

(In reply to Michael Tuexen from comment #74)
Now committed in base r351511.

Comment 76 Val Packett 2019-09-03 23:08:31 UTC

OpenBSD now has the IPMI over i2c thing: https://github.com/openbsd/src/commit/19146c2bc8b614f59695c154d0d659dca1394404 (we could port that eventually)

Comment 77 richliu 2019-09-09 02:50:52 UTC

I have one eMAG server (not at packet.net).

FreeBSD 13-CURRENT can be installed on the machine.

But I have one question about shutdown: it causes a kernel crash.

The following is my shutdown command:

$ uname -a
FreeBSD fbsd 13.0-CURRENT FreeBSD 13.0-CURRENT r351591 GENERIC  arm64
[richliu@fbsd ~]$ sudo shutdown -h now

Here is a screenshot of the crash:
https://imgur.com/a/nj29u5A

Does anyone have an idea how to avoid it?

Comment 78 Michael Tuexen freebsd_committer

2019-09-09 10:19:40 UTC

(In reply to richliu from comment #77)
I also have a physical machine in my lab. I'm using FreeBSD head on it and can run shutdown -p now without any problems. I recently (two weeks ago or so) updated the Firmware. Which version are you running?

Comment 79 richliu 2019-09-09 11:19:04 UTC

(In reply to Michael Tuexen from comment #78)

My eMAG machine's model name is Raptor; the latest software version is 1.00.
All BMC/UEFI/Firmware components are updated to this version.

May I know your machine model name and version?

Comment 80 Michael Tuexen freebsd_committer

2019-09-09 12:54:57 UTC

(In reply to richliu from comment #79)
I don't know Raptor (at least in the eMAG context). My machine is a Lenovo HR250A (https://amperecomputing.com/wp-content/uploads/2019/04/Lenovo_ThinkSystem_HR350A_20190409.pdf) which runs Firmware version 1.10. You can find the dmesg at https://dmesgd.nycbug.org/index.cgi?do=view&id=5068

Comment 81 richliu 2019-09-10 04:21:27 UTC

(In reply to Michael Tuexen from comment #80)
It should be HR350A not HR250A

I think the problem was caused by my using the wrong shutdown command.
Booting the system from a USB disk, shutdown -p works on both the HR350A and the Raptor. I appreciate your help.

Comment 82 Ed Maste freebsd_committer

2019-09-15 18:48:24 UTC

For reference I completed a full Poudriere bulk build on a Lenovo HR350A.

Kernel:
FreeBSD  13.0-CURRENT FreeBSD 13.0-CURRENT 1d40d15b053-c262556(master) GENERIC-NODEBUG  arm64
(This corresponds to r352103.)

Queued 	Built 	Failed 	Skipped Ignored Remaining
32947 	29075 	131 	2513 	1228 	0

Elapsed: 62:33:45

(There was a fairly long period of < 10 jobs finishing up at the end; with some tweaks I believe it can finish in under 60 hours.)

Three packages failed after building for more than 10 hours:

131	electron4-4.2.9	devel/electron4	build/timeout	0	runaway_process	24:22:24
119	qt5-webengine-5.12.2_3	www/qt5-webengine	build/timeout	62	runaway_process	24:06:09
61	llvm-devel-10.0.d20190821	devel/llvm-devel	package	2	???	21:34:09
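
A quick arithmetic check of the summary figures above, as a sketch: the per-status counts should add up to the queued total, and the elapsed time gives a rough average build rate. The variable and function names are made up for this example, and the rate is only an average over the whole run, including the slow tail.

```python
# Figures copied from the Poudriere summary above.
queued, built, failed, skipped, ignored, remaining = 32947, 29075, 131, 2513, 1228, 0


def hms_to_hours(hms):
    """Convert an 'HH:MM:SS' string to fractional hours."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h + m / 60 + s / 3600


accounted = built + failed + skipped + ignored + remaining
elapsed_h = hms_to_hours("62:33:45")
built_per_hour = built / elapsed_h

if __name__ == "__main__":
    print("accounted for:", accounted, "of", queued)
    print("elapsed hours:", elapsed_h)
    print("packages built per hour:", built_per_hour)
```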

Comment 83 commit-hook freebsd_committer

2019-09-16 12:51:54 UTC

A commit references this bug:

Author: emaste
Date: Mon Sep 16 12:51:29 UTC 2019
New revision: 352388
URL: https://svnweb.freebsd.org/changeset/base/352388

Log:
  MFC r346445: Enable ioremap for aarch64 in the LinuxKPI

  Required for Mellanox drivers (e.g. on Ampere eMAG at Packet.com).

  PR:		237055
  Submitted by:	Greg V <greg@unrelenting.technology>

Changes:
_U  stable/12/
  stable/12/sys/compat/linuxkpi/common/include/linux/io.h
  stable/12/sys/compat/linuxkpi/common/src/linux_compat.c

Comment 84 Ed Maste freebsd_committer

2019-09-16 13:15:48 UTC

Changes to MFC to stable/12:

r346996 (andrew)
r347343 (manu)

Also, commits to MFC for ThunderX2:

r340595 (jchandra)
r343876 (andrew)

Comment 85 commit-hook freebsd_committer

2019-09-16 13:46:05 UTC

A commit references this bug:

Author: andrew
Date: Mon Sep 16 13:45:32 UTC 2019
New revision: 352395
URL: https://svnweb.freebsd.org/changeset/base/352395

Log:
  MFC r346996:

  Restore x18 in efi_arch_leave.

  Some UEFI implementations trash this register and, as we use it as a
  platform register, the kernel doesn't save it before calling into the UEFI
  runtime services. As we have a copy in tpidr_el1 restore from there when
  exiting the EFI environment.

  PR:		237234, 237055
  Reviewed by:	manu
  Tested On:	Ampere eMAG
  Sponsored by:	DARPA, AFRL
  Sponsored by:	Ampere Computing (hardware)
  Differential Revision:	https://reviews.freebsd.org/D20127

Changes:
_U  stable/12/
  stable/12/sys/arm64/arm64/efirt_machdep.c

Comment 86 Ed Maste freebsd_committer

2019-10-02 20:14:18 UTC

I believe merging the following revisions to 12.1 is necessary (but not sufficient) to boot on eMAG:

r339754 Distinguish _CID match and _HID match and make lower priority probe
r343860 pci_host_generic_acpi: use IORT data for MSI/MSI-X
r347343 Add support for USB 3.0 XHCI via ACPI
r347929 pci: ecam: Do not warn on mismatch of bus_end
r347930 pci: ecam: Correctly parse memory and IO region

For me releng/12.1 + these commits hangs after:

NFS ROOT: 10.0.0.1/tank/export-root/arm64
igb0: link state changed to UP

Comment 87 Ed Maste freebsd_committer

2019-10-02 20:30:02 UTC

(In reply to Ed Maste from comment #86)

Presumably also:

r343853 arm64 acpi: Add support for IORT table
r343860 pci_host_generic_acpi: use IORT data for MSI/MSI-X

Comment 88 Michael Tuexen freebsd_committer

2020-10-07 21:07:15 UTC

Is this PR still active or should it be closed? I'm running an Ampere eMAG system using head and it is pretty stable....

Comment 89 Ed Maste freebsd_committer

2020-10-07 21:15:07 UTC

(In reply to Michael Tuexen from comment #88)
I had hoped to MFC everything necessary for eMAG to work on 12.2, but wasn't able to get it done in time. We could keep this PR open for tracking, if we want to merge before 12.3. Otherwise IMO it can be closed.

Comment 90 Michael Tuexen freebsd_committer

2020-10-07 21:35:09 UTC

(In reply to Ed Maste from comment #89)
I guess 13.0 will be released before 12.3. I can live with Ampere systems being supported by 13.0...

Comment 91 Philip Paeps freebsd_committer

2020-11-29 07:56:49 UTC

I've just installed two of these machines in the FreeBSD cluster.

They complain about this repeatedly:

```
uhub_reattach_port: device problem (USB_ERR_TIMEOUT), disabling port 1
uhub_reattach_port: port 1 reset failed, error=USB_ERR_TIMEOUT
```

Haven't fiddled with the configuration yet.

Comment 92 Michael Tuexen freebsd_committer

2020-11-29 12:13:19 UTC

(In reply to Philip Paeps from comment #91)
I don't see this on the machine in my lab:

---<<BOOT>>---
KDB: debugger backends: ddb
KDB: current backend: ddb
Copyright (c) 1992-2020 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 13.0-CURRENT #37 r368141: Sun Nov 29 12:45:12 CET 2020
    root@bsd6.fh-muenster.de:/usr/obj/usr/home/tuexen/head/arm64.aarch64/sys/TCP-NODEBUG arm64
FreeBSD clang version 11.0.0 (git@github.com:llvm/llvm-project.git llvmorg-11.0.0-0-g176249bd673)
VT(efifb): resolution 800x600
module firmware already present!
real memory  = 137168117760 (130813 MB)
avail memory = 133693915136 (127500 MB)
Starting CPU 1 (1)
Starting CPU 2 (100)
Starting CPU 3 (101)
Starting CPU 4 (200)
Starting CPU 5 (201)
Starting CPU 6 (300)
Starting CPU 7 (301)
Starting CPU 8 (400)
Starting CPU 9 (401)
Starting CPU 10 (500)
Starting CPU 11 (501)
Starting CPU 12 (600)
Starting CPU 13 (601)
Starting CPU 14 (700)
Starting CPU 15 (701)
Starting CPU 16 (800)
Starting CPU 17 (801)
Starting CPU 18 (900)
Starting CPU 19 (901)
Starting CPU 20 (a00)
Starting CPU 21 (a01)
Starting CPU 22 (b00)
Starting CPU 23 (b01)
Starting CPU 24 (c00)
Starting CPU 25 (c01)
Starting CPU 26 (d00)
Starting CPU 27 (d01)
Starting CPU 28 (e00)
Starting CPU 29 (e01)
Starting CPU 30 (f00)
Starting CPU 31 (f01)
FreeBSD/SMP: Multiprocessor System Detected: 32 CPUs
random: unblocking device.
random: entropy device external interface
MAP 92000000 mode 2 pages 2304
MAP fffc0000 mode 2 pages 64
MAP 9ff32b0000 mode 2 pages 48
MAP 9ff80f0000 mode 2 pages 16
MAP 9ff8830000 mode 2 pages 1232
MAP 9ffa540000 mode 2 pages 16
MAP 9ffcac0000 mode 2 pages 80
MAP 9ffcb10000 mode 2 pages 128
MAP 9ffcb90000 mode 2 pages 16
MAP 9ffcba0000 mode 2 pages 32
MAP 9ffcbc0000 mode 2 pages 16
MAP 9ffcbd0000 mode 2 pages 32
MAP 9ffcbf0000 mode 2 pages 16
MAP 9ffcc00000 mode 2 pages 32
MAP 9ffcc20000 mode 2 pages 16
MAP 9ffcc30000 mode 2 pages 48
MAP 9ffcc60000 mode 2 pages 16
MAP 9ffcc70000 mode 2 pages 16
MAP 9ffcc80000 mode 2 pages 32
MAP 9ffcca0000 mode 2 pages 16
MAP 9ffccb0000 mode 2 pages 16
MAP 9ffccc0000 mode 2 pages 16
MAP 9ffccd0000 mode 2 pages 1232
MAP 9ffd1a0000 mode 2 pages 48
MAP 9ffd1d0000 mode 2 pages 4112
MAP 9fffd80000 mode 2 pages 32
MAP 9fffda0000 mode 2 pages 48
MAP 10540000 mode 0 pages 16
WARNING: Device "kbd" is Giant locked and may be deleted before FreeBSD 13.0.
kbd0 at kbdmux0
WARNING: Device "openfirm" is Giant locked and may be deleted before FreeBSD 13.0.
acpi0: <ALASKA A M I >
acpi0: Power Button (fixed)
acpi0: Sleep Button (fixed)
acpi0: Could not update all GPEs: AE_NOT_CONFIGURED
psci0: <ARM Power State Co-ordination Interface Driver> on acpi0
gic0: <ARM Generic Interrupt Controller v3.0> iomem 0x78000000-0x7801ffff,0x78400000-0x787fffff on acpi0
its0: <ARM GIC Interrupt Translation Service> on gic0
generic_timer0: <ARM Generic Timer> irq 11,12,13 on acpi0
Timecounter "ARM MPCore Timecounter" frequency 40000000 Hz quality 1000
Event timer "ARM MPCore Eventtimer" frequency 40000000 Hz quality 1000
efirtc0: <EFI Realtime Clock>
efirtc0: registered as a time-of-day clock, resolution 1.000000s
ahci0: <AHCI SATA controller> iomem 0x1c000000-0x1c000fff irq 3 on acpi0
ahci0: AHCI v1.31 with 2 6Gbps ports, Port Multiplier not supported with FBS
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahci1: <AHCI SATA controller> iomem 0x1c100000-0x1c100fff irq 4 on acpi0
ahci1: AHCI v1.31 with 2 6Gbps ports, Port Multiplier not supported with FBS
ahcich2: <AHCI channel> at channel 0 on ahci1
ahcich3: <AHCI channel> at channel 1 on ahci1
xhci0: <Generic USB 3.0 controller> iomem 0x13800000-0x138fffff irq 5 on acpi0
xhci0: 64 bytes context size, 32-bit DMA
usbus0 on xhci0
xhci1: <Generic USB 3.0 controller> iomem 0x13900000-0x139fffff irq 6 on acpi0
xhci1: 64 bytes context size, 32-bit DMA
usbus1 on xhci1
acpi_button0: <Power Button> on acpi0
apei0: <ACPI Platform Error Interface> on acpi0
pcib0: <Generic PCI host controller> on acpi0
pci0: <PCI bus> on pcib0
pcib1: <PCI-PCI bridge> at device 0.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <network, ethernet> at device 0.0 (no driver attached)
pci1: <network, ethernet> at device 0.1 (no driver attached)
pcib2: <Generic PCI host controller> on acpi0
pci2: <PCI bus> on pcib2
pcib3: <PCI-PCI bridge> at device 0.0 on pci2
pci3: <PCI bus> on pcib3
pcib4: <Generic PCI host controller> on acpi0
pci4: <PCI bus> on pcib4
pcib5: <PCI-PCI bridge> at device 0.0 on pci4
pci5: <PCI bus> on pcib5
igb0: <Intel(R) PRO/1000 PCI-Express Network Driver> mem 0x30100000-0x301fffff,0x30200000-0x30203fff at device 0.0 on pci5
igb0: Using 1024 TX descriptors and 1024 RX descriptors
igb0: Using 4 RX queues 4 TX queues
igb0: Using MSI-X interrupts with 5 vectors
igb0: Ethernet address: 68:05:ca:92:c5:41
pcib6: <Generic PCI host controller> on acpi0
pci6: <PCI bus> on pcib6
pcib7: <PCI-PCI bridge> at device 0.0 on pci6
pci7: <PCI bus> on pcib7
pcib8: <Generic PCI host controller> on acpi0
pci8: <PCI bus> on pcib8
pcib9: <PCI-PCI bridge> at device 0.0 on pci8
pci9: <PCI bus> on pcib9
pcib10: <Generic PCI host controller> on acpi0
pci10: <PCI bus> on pcib10
pcib11: <PCI-PCI bridge> at device 0.0 on pci10
pci11: <PCI bus> on pcib11
pcib12: <Generic PCI host controller> on acpi0
pci12: <PCI bus> on pcib12
pcib13: <PCI-PCI bridge> at device 0.0 on pci12
pci13: <PCI bus> on pcib13
pcib14: <Generic PCI host controller> on acpi0
pci14: <PCI bus> on pcib14
pcib15: <PCI-PCI bridge> at device 0.0 on pci14
pcib14: Failed to translate resource 10000000-10000fff type 4 for pcib15
pcib15: failed to allocate initial I/O port window: 0x10000000-0x10000fff
pci15: <PCI bus> on pcib15
pcib16: <PCI-PCI bridge> at device 0.0 on pci15
pcib14: Failed to translate resource 10000000-10000fff type 4 for pcib15
pcib16: failed to allocate initial I/O port window: 0x10000000-0x10000fff
pci16: <PCI bus> on pcib16
pcib14: Failed to translate resource 10000000-10000fff type 4 for pcib15
vgapci0: <VGA-compatible display> port 0-0x7f mem 0x30000000-0x30ffffff,0x31040000-0x3105ffff at device 0.0 on pci16
cpu0: <ACPI CPU> on acpi0
uart0: <PrimeCell UART (PL011)> iomem 0x12600000-0x12600fff irq 1 on acpi0
uart0: console (115200,n,8,1)
uart1: <PrimeCell UART (PL011)> iomem 0x12610000-0x12610fff irq 2 on acpi0
cryptosoft0: <software crypto>
Timecounters tick every 1.000 msec
Attempting to load tcp_bbr
usbus0: 5.0Gbps Super Speed USB v3.0
usbus1: 5.0Gbps Super Speed USB v3.0
tcp_bbr is now available
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, default to accept, logging disabled
TCP Hpts created 32 swi interrupt threads and bound 0 to cpus
Release APs...done
CPU  0: APM eMAG 8180 r3p2 affinity:  0  0
                   Cache Type = <64 byte D-cacheline,64 byte I-cacheline,PIPT ICache,64 byte ERG,64 byte CWG>
 Instruction Set Attributes 0 = <CRC32,SHA2,SHA1,AES+PMULL>
 Instruction Set Attributes 1 = <>
         Processor Features 0 = <GIC,AdvSIMD,FP,EL3,EL2,EL1 32,EL0 32>
         Processor Features 1 = <>
      Memory Model Features 0 = <TGran4,TGran64,TGran16,SNSMem,BigEnd,16bit ASID,4TB PA>
      Memory Model Features 1 = <8bit VMID>
      Memory Model Features 2 = <32bit CCIDX,48bit VA>
             Debug Features 0 = <2 CTX BKPTs,4 Watchpoints,6 Breakpoints,PMUv3,Debugv8>
             Debug Features 1 = <>
         Auxiliary Features 0 = <>
         Auxiliary Features 1 = <>
CPU  1: APM eMAG 8180 r3p2 affinity:  0  1
CPU  2: APM eMAG 8180 r3p2 affinity:  1  0
CPU  3: APM eMAG 8180 r3p2 affinity:  1  1
CPU  4: APM eMAG 8180 r3p2 affinity:  2  0
CPU  5: APM eMAG 8180 r3p2 affinity:  2  1
CPU  6: APM eMAG 8180 r3p2 affinity:  3  0
CPU  7: APM eMAG 8180 r3p2 affinity:  3  1
CPU  8: APM eMAG 8180 r3p2 affinity:  4  0
CPU  9: APM eMAG 8180 r3p2 affinity:  4  1
CPU 10: APM eMAG 8180 r3p2 affinity:  5  0
CPU 11: APM eMAG 8180 r3p2 affinity:  5  1
CPU 12: APM eMAG 8180 r3p2 affinity:  6  0
CPU 13: APM eMAG 8180 r3p2 affinity:  6  1
CPU 14: APM eMAG 8180 r3p2 affinity:  7  0
CPU 15: APM eMAG 8180 r3p2 affinity:  7  1
CPU 16: APM eMAG 8180 r3p2 affinity:  8  0
CPU 17: APM eMAG 8180 r3p2 affinity:  8  1
CPU 18: APM eMAG 8180 r3p2 affinity:  9  0
CPU 19: APM eMAG 8180 r3p2 affinity:  9  1
CPU 20: APM eMAG 8180 r3p2 affinity: 10  0
CPU 21: APM eMAG 8180 r3p2 affinity: 10  1
CPU 22: APM eMAG 8180 r3p2 affinity: 11  0
CPU 23: APM eMAG 8180 r3p2 affinity: 11  1
CPU 24: APM eMAG 8180 r3p2 affinity: 12  0
CPU 25: APM eMAG 8180 r3p2 affinity: 12  1
CPU 26: APM eMAG 8180 r3p2 affinity: 13  0
CPU 27: APM eMAG 8180 r3p2 affinity: 13  1
CPU 28: APM eMAG 8180 r3p2 affinity: 14  0
CPU 29: APM eMAG 8180 r3p2 affinity: 14  1
CPU 30: APM eMAG 8180 r3p2 affinity: 15  0
CPU 31: APM eMAG 8180 r3p2 affinity: 15  1
TCP_ratelimit: Is now initialized
Trying to mount root from ufs:/dev/ada0p3 [rw]...
ugen0.1: <Generic XHCI root HUB> at usbus0
ugen1.1: <Generic XHCI root HUB> at usbus1
Root mount waiting for:uhub0 CAM usbus0 usbus1 on usbus0
uhub0: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0

uhub1 on usbus1
uhub1: <Generic XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <Samsung SSD 860 EVO 500GB RVT02B6Q> ACS-4 ATA SATA 3.x device
ada0: Serial Number S3Z2NB0M352023L
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 512bytes)
ada0: Command Queueing enabled
ada0: 476940MB (976773168 512 byte sectors)
uhub0: 1 port with 1 removable, self powered
uhub1: 1 port with 1 removable, self powered
ugen1.2: <vendor 0x04b4 product 0x6560> at usbus1
uhub2 on uhub1
uhub2: <vendor 0x04b4 product 0x6560, class 9/0, rev 2.00/90.15, addr 1> on usbus1
ugen0.2: <American Megatrends Inc. Virtual Hub> at usbus0
uhub3 on uhub0
uhub3: <7-port Hub> on usbus0
Root mount waiting for: usbus0 usbus1
uhub2: 4 ports with 4 removable, self powered
uhub3: 5 ports with 5 removable, self powered
Root mount waiting for: usbus0
ugen0.3: <American Megatrends Inc. Virtual Cdrom Device> at usbus0
umass0 on uhub3
umass0: <Virtual Cdrom> on usbus0
cd0 at umass-sim0 bus 0 scbus4 target 0 lun 0
cd0: <AMI Virtual CDROM0 1.00> Removable CD-ROM SCSI device
cd0: Serial Number AAAABBBBCCCC1
cd0: 40.000MB/s transfers
cd0: Attempt to query device size failed: NOT READY, Medium not present
cd0: quirks=0x10<10_BYTE_ONLY>
Root mount waiting for: usbus0
ugen0.4: <American Megatrends Inc. Virtual HardDisk Device> at usbus0
umass1 on uhub3
umass1: <Virtual HardDisk> on usbus0
da0 at umass-sim1 bus 1 scbus5 target 0 lun 0
da0: <AMI Virtual HDisk0 1.00> Removable Direct Access SCSI device
da0: Serial Number AAAABBBBCCCC3
da0: 40.000MB/s transfers
da0: Attempt to query device size failed: NOT READY, Medium not present
da0: quirks=0x2<NO_6_BYTE>
ugen0.5: <American Megatrends Inc. Virtual Keyboard and Mouse> at usbus0
ukbd0 on uhub3
ukbd0: <Keyboard Interface> on usbus0
kbd1 at ukbd0
mountroot: waiting for device /dev/ada0p3...
Dual Console: Video Primary, Serial Secondary
lo0: link state changed to UP
ums0 on uhub3
ums0: <Mouse Interface> on usbus0
ums0: 3 buttons and [Z] coordinates ID=0
igb0: link state changed to UP
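
As an aside on reading the log above: a small sketch relating the "Starting CPU N (xxx)" lines to the later "affinity" lines. It assumes the value in parentheses is the low part of MPIDR_EL1 printed in hex, with Aff0 in bits 7:0 and Aff1 in bits 15:8; that reading is an assumption, though it does agree with the affinity pairs printed further down. The function names are made up for this example.

```python
import re

START_RE = re.compile(r"Starting CPU (\d+) \(([0-9a-f]+)\)")


def decode_affinity(mpidr_low):
    """Return (Aff1, Aff0) from the low 16 bits of an MPIDR value."""
    aff1 = (mpidr_low >> 8) & 0xFF
    aff0 = mpidr_low & 0xFF
    return aff1, aff0


def parse_start_line(line):
    """Return (cpu_number, (Aff1, Aff0)) for a 'Starting CPU' line, else None."""
    m = START_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), decode_affinity(int(m.group(2), 16))


if __name__ == "__main__":
    for line in ("Starting CPU 1 (1)", "Starting CPU 2 (100)", "Starting CPU 31 (f01)"):
        print(line, "->", parse_start_line(line))
```

Under that assumption the 32 CPUs appear as 16 Aff1 groups of two cores each.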

Comment 93 Philip Paeps freebsd_committer

2020-11-29 14:55:59 UTC

It looks like they've stopped doing this now.  Sorry for the noise.  Our CLUSTER13 configuration was lagging quite a bit behind GENERIC.  I'll keep an eye on it.  If it happens again ... I'll get some more useful debugging data out.

Comment 94 Dave Cottlehuber freebsd_committer

2021-07-02 08:03:00 UTC

13.0-RELEASE runs fine on these boxes, certainly on recent firmware.

I think there are a few remaining drivers/patches lurking out there, but we could track that on the wiki. I added a page https://wiki.freebsd.org/arm/Ampere just now.