Reproduction steps:
-------------------
1. Create an AWS EC2 instance from one of the following AMIs in us-east-1:
   1.1: ami-0b55af91f40cd29ee - FreeBSD 14.0-CURRENT-arm64-20230525 UEFI
   1.2: ami-0fdc715f878897386 - FreeBSD 13.2-STABLE-arm64-20230601 UEFI
   1.3: ami-0e1fd0c2493efe1d1 - FreeBSD 12.4-STABLE-arm64-2023-06-01
2. Run the following reset loop script:

#!/bin/sh
while true
do
	devctl disable ena0
	devctl enable ena0
done

Result:
-------
Crashes every time. 100% reproducible.
***The same test does not fail on Intel-based instances.***

Stack trace:
------------
2023-06-14T08:05:02.374Z panic: IRQ mapping table is full.
2023-06-14T08:05:02.374Z cpuid = 18
2023-06-14T08:05:02.374Z time = 1686729902
2023-06-14T08:05:02.374Z KDB: stack backtrace:
2023-06-14T08:05:02.374Z db_trace_self() at db_trace_self
2023-06-14T08:05:02.374Z db_trace_self_wrapper() at db_trace_self_wrapper+0x30
2023-06-14T08:05:02.374Z vpanic() at vpanic+0x13c
2023-06-14T08:05:02.374Z panic() at panic+0x44
2023-06-14T08:05:02.374Z intr_map_irq() at intr_map_irq+0xb0
2023-06-14T08:05:02.374Z intr_alloc_msix() at intr_alloc_msix+0x1d8
2023-06-14T08:05:02.374Z generic_pcie_acpi_alloc_msix() at generic_pcie_acpi_alloc_msix+0x78
2023-06-14T08:05:02.374Z pci_alloc_msix_method() at pci_alloc_msix_method+0x168
2023-06-14T08:05:02.374Z ena_enable_msix_and_set_admin_interrupts() at ena_enable_msix_and_set_admin_interrupts+0x10c
2023-06-14T08:05:02.374Z ena_attach() at ena_attach+0x65c
2023-06-14T08:05:02.375Z device_attach() at device_attach+0x3f8
2023-06-14T08:05:02.375Z device_probe_and_attach() at device_probe_and_attach+0x7c
2023-06-14T08:05:02.375Z devctl2_ioctl() at devctl2_ioctl+0x44c
2023-06-14T08:05:02.375Z devfs_ioctl() at devfs_ioctl+0xd4
2023-06-14T08:05:02.375Z vn_ioctl() at vn_ioctl+0xc0
2023-06-14T08:05:02.375Z devfs_ioctl_f() at devfs_ioctl_f+0x20
2023-06-14T08:05:02.375Z kern_ioctl() at kern_ioctl+0x2dc
2023-06-14T08:05:02.375Z sys_ioctl() at sys_ioctl+0x118
2023-06-14T08:05:02.375Z do_el0_sync() at do_el0_sync+0x520
2023-06-14T08:05:02.375Z handle_el0_sync() at handle_el0_sync+0x44
2023-06-14T08:05:02.375Z --- exception, esr 0x56000000
2023-06-14T08:05:02.375Z Uptime: 4m1s
2023-06-14T08:05:02.375Z Dumping 2053 out of 64453 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
2023-06-14T08:10:36.676Z Dump complete
2023-06-14T08:10:37.976Z UEFI firmware (version built at 09:00:00 on Nov 1 2018)
2023-06-14T08:10:38.076Z Consoles: EFI console
2023-06-14T08:10:38.076Z Reading loader env vars from /efi/freebsd/loader.env
2023-06-14T08:10:38.076Z Setting currdev to disk0p1:
2023-06-14T08:10:38.076Z FreeBSD/arm64 EFI loader, Revision 1.1
2023-06-14T08:10:38.076Z (Thu May 25 06:36:21 UTC 2023 root@releng1.nyi.freebsd.org)
2023-06-14T08:10:38.076Z
2023-06-14T08:10:38.076Z Command line arguments: loader.efi
2023-06-14T08:10:38.176Z Image base: 0x7856f000
2023-06-14T08:10:38.176Z EFI version: 2.70
2023-06-14T08:10:38.176Z EFI Firmware: EDK II (rev 1.00)
2023-06-14T08:10:38.176Z Console: efi (0x1000)
2023-06-14T08:10:38.176Z Load Path: \EFI\BOOT\BOOTAA64.EFI
2023-06-14T08:10:38.176Z Load Device: PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(1,GPT,B61C1E65-FAFA-11ED-84CB-002590EC5BF2,0x3,0x10418)
2023-06-14T08:10:38.176Z BootCurrent: 0001

Initial investigation results:
------------------------------
Tried to reproduce the issue on Intel-based instances; no reproduction even after 50k up/down iterations.

Looked into the FreeBSD ena driver [1] up/down flows and saw that the driver performs the PCI MSI-X allocate/release and bus resource allocate/release calls in the correct order.

[1] https://github.com/amzn/amzn-drivers/tree/master/kernel/fbsd/ena

Since the pci/bus APIs should be platform agnostic (?), I assume this is an issue on the ARM side of the kernel.
This looks pretty much like a resource leak. Did you check in the driver source code whether all IRQ/MSI-X resources are freed by the driver itself? That should be the first thing to check!

--HPS
(In reply to Hans Petter Selasky from comment #1)

Thanks for your response. I also suspected a leak, but then how do we explain the lack of reproduction on Intel-based instances?

> Did you check the driver source code, if all IRQ/MSIX resources are freed up by the driver itself?

Yes, I've checked the driver code from [1]. In particular, in the detach flow [2] the driver frees the resources in this order: first the I/O IRQs as part of ena_down (ena_detach -> ena_down -> ena_free_io_irq -> bus_teardown_intr/bus_release_resource), then the management IRQ, also as part of ena_detach (ena_detach -> ena_destroy_device -> ena_free_mgmnt_irq -> bus_teardown_intr/bus_release_resource), and finally it releases the PCI MSI-X resources right after freeing the management IRQ (ena_detach -> ena_destroy_device -> ena_disable_msix -> pci_release_msi).

[1] https://github.com/amzn/amzn-drivers/blob/master/kernel/fbsd/ena/ena.c
[2] https://github.com/amzn/amzn-drivers/blob/master/kernel/fbsd/ena/ena.c#L3777
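For readers following the thread, here is a condensed, purely illustrative sketch of that release order using the standard newbus/PCI calls. The struct and function names below are made up for the example; the real code lives in ena.c as linked above.

/*
 * Illustrative sketch only -- not the ena driver's code. It shows the
 * teardown order described in the comment above: per-queue I/O IRQs first,
 * then the admin/management IRQ, and the MSI-X vectors last.
 */
#include <sys/param.h>
#include <sys/bus.h>
#include <sys/rman.h>
#include <machine/bus.h>
#include <machine/resource.h>
#include <dev/pci/pcivar.h>

struct irq_sketch {			/* hypothetical type, not the driver's */
	struct resource	*res;
	void		*cookie;
	int		 rid;
};

/* Free one interrupt the way the driver does for both I/O and admin IRQs. */
static void
sketch_free_irq(device_t dev, struct irq_sketch *irq)
{
	bus_teardown_intr(dev, irq->res, irq->cookie);
	bus_release_resource(dev, SYS_RES_IRQ, irq->rid, irq->res);
}

static void
sketch_detach_irqs(device_t dev, struct irq_sketch *io, int nio,
    struct irq_sketch *admin)
{
	for (int i = 0; i < nio; i++)
		sketch_free_irq(dev, &io[i]);	/* ena_down() path */
	sketch_free_irq(dev, admin);		/* ena_free_mgmnt_irq() path */
	pci_release_msi(dev);			/* ena_disable_msix() path */
}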
OK,

#include <sys/kdb.h>
kdb_backtrace();

Find that panic print in the kernel. Add a printf() followed by a call to kdb_backtrace() for all cases where this resource is allocated and freed. Then add if (!cold).

Then attach/detach once. Do you see anything obvious in dmesg? Then do it multiple times.

Also, sysctl kern.consmute=1 may speed up prints.

--HPS
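A minimal sketch of that instrumentation, assuming it is dropped into sys/kern/subr_intr.c and called next to the map updates in intr_map_irq()/intr_unmap_irq(). The helper name and its call sites are illustrative, not part of the tree:

#include <sys/param.h>
#include <sys/systm.h>	/* printf(), cold */
#include <sys/kdb.h>	/* kdb_backtrace() */

/*
 * Hypothetical debug helper: print which map entry is being touched and by
 * whom, but only after boot (!cold) so early-boot allocations stay quiet.
 */
static void
irq_map_debug(const char *what, u_int idx)
{
	if (cold)
		return;
	printf("%s entry: %u\n", what, idx);
	kdb_backtrace();
}

/*
 * Example call sites (variable names depend on the kernel version):
 * irq_map_debug("intr_map_irq is allocating", <map index>) right where
 * intr_map_irq() claims a slot, and
 * irq_map_debug("intr_unmap_irq is freeing", <map index>) in intr_unmap_irq().
 */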
(In reply to Hans Petter Selasky from comment #3)

Thank you for the advice. Here are the investigation results:

I've looked into the kernel to see where that panic is printed; it comes from intr_map_irq() in src/sys/kern/subr_intr.c. When an IRQ is allocated on an ARM instance it goes through this function. I've added a print for each IRQ we allocate within this function and for each IRQ we free within intr_unmap_irq(), and got something like this:

intr_map_irq is allocating entry: 32754
intr_map_irq is allocating entry: 32755
intr_map_irq is allocating entry: 32755
intr_map_irq is allocating entry: 32756
intr_map_irq is allocating entry: 32757
intr_map_irq is allocating entry: 32758
intr_map_irq is allocating entry: 32759
intr_map_irq is allocating entry: 32760
intr_map_irq is allocating entry: 32761
intr_map_irq is allocating entry: 32762
ena0: Link is down
ena0: Link is down
ena0: Link is down
intr_unmap_irq is freeing entry: 32754
intr_unmap_irq is freeing entry: 32755
intr_unmap_irq is freeing entry: 32756
intr_unmap_irq is freeing entry: 32757
intr_unmap_irq is freeing entry: 32758
intr_unmap_irq is freeing entry: 32759
intr_unmap_irq is freeing entry: 32760
intr_unmap_irq is freeing entry: 32761
intr_unmap_irq is freeing entry: 32762
ena0: detached
intr_map_irq is allocating entry: 32762
intr_map_irq is allocating entry: 32763
intr_map_irq is allocating entry: 32764
intr_map_irq is allocating entry: 32765
intr_map_irq is allocating entry: 32766
intr_map_irq is allocating entry: 32767

Interestingly, the last allocated IRQ number is 32767. The size of the IRQ map on arm64 instances is decided in intr_map_init() and is 2 * intr_nirq, where intr_nirq is defined as 16384 (16384 * 2 = 32768). The same define is much smaller on other archs: MIPS 128, RISC-V 1024.

Now, the global parameter irq_map_first_free_idx doesn't seem to be maintained as it should be. It always holds the last freed entry, but it leaves behind free entries that will not be accessible in the future. This is visible in the prints as well as in the code: every time we enable the driver, all 9 IRQs are allocated in sequence, so let's assume they get map indexes 10-18. Once we disable the driver, intr_unmap_irq() is invoked and all 9 IRQs are freed, but at the end irq_map_first_free_idx holds the value of the last freed IRQ, which is 18, and that's where the next allocation begins. This means indexes 10-17 are free but will never actually be used again, since irq_map_first_free_idx is only changed in intr_map_irq() and intr_unmap_irq() and is not maintained in a cyclic manner.

Also, the subr_intr.c file indicates that the map should be dynamic, but I did not see any code that expands it when it is full, at least within subr_intr.c (/* XXX Convert irq_map[] to dynamicaly expandable one. */). Keep in mind that even if the map were dynamic, we would still use up all of the system resources, since the algorithm itself is not correct.

I've tried to add the exact same kernel prints on an Intel-based instance but none were printed to dmesg, indicating that Intel-based instances go through a different flow when allocating the IRQs.
I've tried to look into which flow IRQ allocation goes through on such instances by calling kdb_backtrace() upon every allocation/free of an IRQ in the driver, but I always got the same prints, which were not very helpful:

calling bus_alloc_resource_any(SYS_RES_IRQ) from ena_request_io_irq()
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01ebe879f0
ena_up() at ena_up+0x24c/frame 0xfffffe01ebe87a90
ena_ioctl() at ena_ioctl+0x246/frame 0xfffffe01ebe87ad0
ifhwioctl() at ifhwioctl+0xd32/frame 0xfffffe01ebe87bd0
ifioctl() at ifioctl+0x7b0/frame 0xfffffe01ebe87cc0
kern_ioctl() at kern_ioctl+0x202/frame 0xfffffe01ebe87d30
sys_ioctl() at sys_ioctl+0x12a/frame 0xfffffe01ebe87e00
amd64_syscall() at amd64_syscall+0x12b/frame 0xfffffe01ebe87f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01ebe87f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x344c989b11fa, rsp = 0x344c94531c58, rbp = 0x344c94531cb0 ---

Thanks,
Osama
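To make the allocation behaviour described above concrete, here is a small stand-alone userland simulation. It is an editorial sketch, not the subr_intr.c code, and the constants are shrunk so it finishes quickly: the first-free hint only moves forward, so entries freed behind it are stranded and the map eventually reports full even though most of it is empty.

/*
 * Simulation of the pre-fix search: the hint never wraps around, and freeing
 * sets it to the last freed index only.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAP_SIZE 32			/* the real arm64 map is 2 * intr_nirq = 32768 */

static bool map_used[MAP_SIZE];
static int first_free_idx;

static int
alloc_entry(void)
{
	for (int i = first_free_idx; i < MAP_SIZE; i++) {
		if (!map_used[i]) {
			map_used[i] = true;
			first_free_idx = i + 1;
			return (i);
		}
	}
	return (-1);			/* "IRQ mapping table is full" */
}

static void
free_entry(int i)
{
	map_used[i] = false;
	first_free_idx = i;		/* hint ends up at the last freed entry */
}

int
main(void)
{
	int idx[9];

	/* Each devctl disable/enable cycle frees and re-allocates 9 IRQs. */
	for (int cycle = 0;; cycle++) {
		for (int i = 0; i < 9; i++) {
			if ((idx[i] = alloc_entry()) == -1) {
				printf("map full after %d complete cycles\n", cycle);
				return (0);
			}
		}
		for (int i = 0; i < 9; i++)
			free_entry(idx[i]);
	}
}

In this model each cycle reuses only the single entry sitting at the hint and permanently strands the other eight, which is why the 32768-entry arm64 map runs out after roughly four thousand iterations of the reset loop.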
I think https://reviews.freebsd.org/D40768 might fix the issue.
(In reply to Andrew Turner from comment #5)

Yup, it does solve the issue.

Thanks,
Osama
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9beb195fd9fdc4beb7d5bca865cf5b19ccd7449b

commit 9beb195fd9fdc4beb7d5bca865cf5b19ccd7449b
Author:     Andrew Turner <andrew@FreeBSD.org>
AuthorDate: 2023-06-27 08:32:12 +0000
Commit:     Andrew Turner <andrew@FreeBSD.org>
CommitDate: 2023-06-28 17:03:08 +0000

    Continue searching for an irq map from the start

    When searching for a free irq map location continue the search from the
    beginning of the list. There may be holes in the map before
    irq_map_first_free_idx, e.g. when removing entries in order will
    increase the index past the current free entry.

    PR:             271990
    Reviewed by:    mhorne
    Sponsored by:   Arm Ltd
    Differential Revision:  https://reviews.freebsd.org/D40768

 sys/kern/subr_intr.c | 8 ++++++++
 1 file changed, 8 insertions(+)
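Continuing the stand-alone simulation from the earlier comment, here is a sketch of the idea the commit message describes. This is editorial, not the committed 8-line diff: when the forward search from the hint finds nothing, wrap around and retry from the start of the map before declaring it full.

/* Fixed search for the simulation above; sketch only, not the real diff. */
static int
alloc_entry_fixed(void)
{
	int i;

	/* First pass: from the hint to the end of the map. */
	for (i = first_free_idx; i < MAP_SIZE; i++)
		if (!map_used[i])
			goto found;
	/* Second pass: wrap around and retry from the start. */
	for (i = 0; i < first_free_idx; i++)
		if (!map_used[i])
			goto found;
	return (-1);			/* genuinely full */
found:
	map_used[i] = true;
	first_free_idx = i + 1;
	return (i);
}

With this change the reset-loop simulation keeps reusing the entries freed on the previous cycle and never reports the map as full.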
Given that this bug is in 13.x and 12.x RELEASE, are there plans to MFC it?
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ffe9a1987bb7d57b21bb19e20fd9f66dd98e7a78

commit ffe9a1987bb7d57b21bb19e20fd9f66dd98e7a78
Author:     Andrew Turner <andrew@FreeBSD.org>
AuthorDate: 2023-06-27 08:32:12 +0000
Commit:     Andrew Turner <andrew@FreeBSD.org>
CommitDate: 2023-09-25 08:41:32 +0000

    Continue searching for an irq map from the start

    When searching for a free irq map location continue the search from the
    beginning of the list. There may be holes in the map before
    irq_map_first_free_idx, e.g. when removing entries in order will
    increase the index past the current free entry.

    PR:             271990
    Reviewed by:    mhorne
    Sponsored by:   Arm Ltd
    Differential Revision:  https://reviews.freebsd.org/D40768

    (cherry picked from commit 9beb195fd9fdc4beb7d5bca865cf5b19ccd7449b)

 sys/kern/subr_intr.c | 8 ++++++++
 1 file changed, 8 insertions(+)
^Triage: the 12 branch is now out of support. Assigning to the committer that resolved this.