Bug 271990 - Panic: IRQ mapping table is full after stress devctl disable/enable
Summary: Panic: IRQ mapping table is full after stress devctl disable/enable
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: arm (show other bugs)
Version: CURRENT
Hardware: arm64 Any
: --- Affects Many People
Assignee: Andrew Turner
URL: https://reviews.freebsd.org/D40768
Keywords: crash
Depends on:
Blocks:
 
Reported: 2023-06-14 11:08 UTC by Osama Abboud
Modified: 2024-01-02 03:27 UTC (History)
7 users (show)

See Also:
freebsd: mfc-stable13?
freebsd: mfc-stable12?


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Osama Abboud 2023-06-14 11:08:54 UTC
Reproduction steps:
-------------------
1. Create an AWS EC2 instance from one of the following AMIs in us-east-1
1.1: ami-0b55af91f40cd29ee - FreeBSD 14.0-CURRENT-arm64-20230525 UEFI
1.2: ami-0fdc715f878897386 - FreeBSD 13.2-STABLE-arm64-20230601 UEFI
1.3: ami-0e1fd0c2493efe1d1 - FreeBSD 12.4-STABLE-arm64-2023-06-01

2. run the following reset loop script:
#!/bin/sh
while true
do
devctl disable ena0
devctl enable ena0
done

Result:
-------
Crashes every time. 100% reproducible.

***The same test does not fail on intel based instances.***

Stack trace:
------------
2023-06-14T08:05:02.374Z	panic: IRQ mapping table is full.
	2023-06-14T08:05:02.374Z	cpuid = 18
	2023-06-14T08:05:02.374Z	time = 1686729902
	2023-06-14T08:05:02.374Z	KDB: stack backtrace:
	2023-06-14T08:05:02.374Z	db_trace_self() at db_trace_self
	2023-06-14T08:05:02.374Z	db_trace_self_wrapper() at db_trace_self_wrapper+0x30
	2023-06-14T08:05:02.374Z	vpanic() at vpanic+0x13c
	2023-06-14T08:05:02.374Z	panic() at panic+0x44
	2023-06-14T08:05:02.374Z	intr_map_irq() at intr_map_irq+0xb0
	2023-06-14T08:05:02.374Z	intr_alloc_msix() at intr_alloc_msix+0x1d8
	2023-06-14T08:05:02.374Z	generic_pcie_acpi_alloc_msix() at generic_pcie_acpi_alloc_msix+0x78
	2023-06-14T08:05:02.374Z	pci_alloc_msix_method() at pci_alloc_msix_method+0x168
	2023-06-14T08:05:02.374Z	ena_enable_msix_and_set_admin_interrupts() at ena_enable_msix_and_set_admin_interrupts+0x10c
	2023-06-14T08:05:02.374Z	ena_attach() at ena_attach+0x65c
	2023-06-14T08:05:02.375Z	device_attach() at device_attach+0x3f8
	2023-06-14T08:05:02.375Z	device_probe_and_attach() at device_probe_and_attach+0x7c
	2023-06-14T08:05:02.375Z	devctl2_ioctl() at devctl2_ioctl+0x44c
	2023-06-14T08:05:02.375Z	devfs_ioctl() at devfs_ioctl+0xd4
	2023-06-14T08:05:02.375Z	vn_ioctl() at vn_ioctl+0xc0
	2023-06-14T08:05:02.375Z	devfs_ioctl_f() at devfs_ioctl_f+0x20
	2023-06-14T08:05:02.375Z	kern_ioctl() at kern_ioctl+0x2dc
	2023-06-14T08:05:02.375Z	sys_ioctl() at sys_ioctl+0x118
	2023-06-14T08:05:02.375Z	do_el0_sync() at do_el0_sync+0x520
	2023-06-14T08:05:02.375Z	handle_el0_sync() at handle_el0_sync+0x44
	2023-06-14T08:05:02.375Z	--- exception, esr 0x56000000
	2023-06-14T08:05:02.375Z	Uptime: 4m1s
	2023-06-14T08:05:02.375Z	Dumping 2053 out of 64453 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
	2023-06-14T08:10:36.676Z	Dump complete
	2023-06-14T08:10:37.976Z	UEFI firmware (version built at 09:00:00 on Nov 1 2018)
	2023-06-14T08:10:38.076Z	[2J[01;01H[=3h[2J[01;01H[2J[01;01H[=3h[2J[01;01H[2J[01;01H[=3h[2J[01;01H[0m[35m[40m[2J[01;01H[2J[01;01H[0m[37m[40m[01;01HConsoles: EFI console
	2023-06-14T08:10:38.076Z	Reading loader env vars from /efi/freebsd/loader.env
	2023-06-14T08:10:38.076Z	Setting currdev to disk0p1:
	2023-06-14T08:10:38.076Z	FreeBSD/arm64 EFI loader, Revision 1.1
	2023-06-14T08:10:38.076Z	(Thu May 25 06:36:21 UTC 2023 root@releng1.nyi.freebsd.org)
	2023-06-14T08:10:38.076Z	
	2023-06-14T08:10:38.076Z	Command line arguments: loader.efi
	2023-06-14T08:10:38.176Z	Image base: 0x7856f000
	2023-06-14T08:10:38.176Z	EFI version: 2.70
	2023-06-14T08:10:38.176Z	EFI Firmware: EDK II (rev 1.00)
	2023-06-14T08:10:38.176Z	Console: efi (0x1000)
	2023-06-14T08:10:38.176Z	Load Path: \EFI\BOOT\BOOTAA64.EFI
	2023-06-14T08:10:38.176Z	Load Device: PciRoot(0x0)/Pci(0x4,0x0)/NVMe(0x1,00-00-00-00-00-00-00-00)/HD(1,GPT,B61C1E65-FAFA-11ED-84CB-002590EC5BF2,0x3,0x10418)
	2023-06-14T08:10:38.176Z	BootCurrent: 0001 


Initial investigation results:
------------------------------
Tried to reproduce the issue on Intel based instances, no reproduction even after 50k up/down iteration.
Looked into the fbsd ena driver [1] up/down flows, saw that the driver does the pci_msix_allocate/release and bus_allocation/release in the correct order.

[1] https://github.com/amzn/amzn-drivers/tree/master/kernel/fbsd/ena

Since the pci/bus APIs should be platform agnostic (?) I assume it to be an issue with ARM side of the kernel
Comment 1 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-14 12:04:29 UTC
This looks pretty much like a resource leakage.

Did you check the driver source code, if all IRQ/MSIX resources are freed up by the driver itself?

Should be the first thing to check!

--HPS
Comment 2 Osama Abboud 2023-06-14 12:23:21 UTC
(In reply to Hans Petter Selasky from comment #1)
Thanks for your response,
I also suspected it to be a leakage issue however how can we explain the lack of reproduction on Intel based instances?

> Did you check the driver source code, if all IRQ/MSIX resources are freed up by the driver itself?

Yes I've checked the driver code from [1], in particular in the detach flow [2] the driver frees the resources, first the IO irqs as a part of ena_down (ena_detach->ena_down->ena_free_io_irq->bus_teardown_intr/bus_release_resource)
and then the management irq as part of ena_detach as well (ena_detach->ena_destroy_device->ena_free_mgmnt_irq->bus_teardown_intr/bus_release_resource) and finally release the pci resource right after freeing the management irq (ena_detach->ena_destroy_device->ena_disable_msix->pci_release_msi)


[1] https://github.com/amzn/amzn-drivers/blob/master/kernel/fbsd/ena/ena.c
[2] https://github.com/amzn/amzn-drivers/blob/master/kernel/fbsd/ena/ena.c#L3777
Comment 3 Hans Petter Selasky freebsd_committer freebsd_triage 2023-06-14 13:08:09 UTC
OK,

#include <sys/kdb.h>

kdb_backtrace();

Find that panic print in the kernel. Add a printf() followed by a call to kdb_backtrace() for all cases where this resource is allocated and freed.

Then add if (!cold) .

Then attach/detach once. Do you see anything obvious in dmesg?

Then do it multiple times.

Also, sysctl kern.consmute=1, may speed up prints.

--HPS
Comment 4 Osama Abboud 2023-06-18 07:29:09 UTC
(In reply to Hans Petter Selasky from comment #3)
Thank you for the advice, here's the investigation results:
I've look into the kernel to see where we add the panic print, it's printed in src/sys/kern/subr_intr.c file in intr_map_irq() function. when an irq is allocated in ARM instance it leads to this function, I've added a print for each irq we allocate within this function and for each irq we free within intr_unmap_irq() and got something like this:

intr_map_irq is allocating entry: 32754
intr_map_irq is allocating entry: 32755
intr_map_irq is allocating entry: 32755
intr_map_irq is allocating entry: 32756
intr_map_irq is allocating entry: 32757
intr_map_irq is allocating entry: 32758
intr_map_irq is allocating entry: 32759
intr_map_irq is allocating entry: 32760
intr_map_irq is allocating entry: 32761
intr_map_irq is allocating entry: 32762
ena0: Link is down
ena0: Link is down
ena0: Link is down
intr_unmap_irq is freeing entry: 32754
intr_unmap_irq is freeing entry: 32755
intr_unmap_irq is freeing entry: 32756
intr_unmap_irq is freeing entry: 32757
intr_unmap_irq is freeing entry: 32758
intr_unmap_irq is freeing entry: 32759
intr_unmap_irq is freeing entry: 32760
intr_unmap_irq is freeing entry: 32761
intr_unmap_irq is freeing entry: 32762
ena0: detached
intr_map_irq is allocating entry: 32762
intr_map_irq is allocating entry: 32763
intr_map_irq is allocating entry: 32764
intr_map_irq is allocating entry: 32765
intr_map_irq is allocating entry: 32766
intr_map_irq is allocating entry: 32767 

Interestingly, the last allocated irq number is 32767, the size of the irq map on ARM64 instances is decided in intr_map_init() and is 2 * intr_nirq where intr_nirq is defined as 16384 (16384*2=32768). 
This same define is much smaller on other archs - MIPS 128, RISCV 1024

Now, the global parameter irq_map_first_free_idx doesn't seem to be maintained as it should be, it always holds the last free entry but it leaves behind free entries that will not accessible in the future, it's visible within the prints as well as the code, every time we enable the driver all 9 irq are allocated in sequence so let's assume they get map_idx 10-18, Now once we disable the driver, intr_unmap_irq is invoked and all 9 irqs are freed but in the end irq_map_first_free_idx holds the value of the last freed irq which is 18 and that's where we begin the next allocation, this means that indexes 10-17 are free but they will not be actually used again since irq_map_first_free_idx is only changed upon intr_map_irq  and intr_unmap_irq and is not maintained in a cyclic manner.
Also, subr_intr.c file indicates that the map should be a dynamic one, which I also did not see a code that expands it in case it's full, at least within subr_intr.c (/* XXX Convert irq_map[] to dynamicaly expandable one. */).
Be mindful to the fact even if the map was a dynamic one we'd still use up all of the system resources since the algorithm itself is not correct

I've tried to add the exact same kernel prints on an intel based instance but none were printed into the dmesg, indicating that intel based instances go through a different flow when allocating the irqs. I've tried to look into what flow does irq allocation on such instances goes through by calling kdb_backtrack() upon every allocation/free of an irq in the driver but I always got the same prints which were not very helpful:

calling bus_alloc_resource_any(SYS_RES_IRQ) from ena_request_io_irq() 
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01ebe879f0
ena_up() at ena_up+0x24c/frame 0xfffffe01ebe87a90
ena_ioctl() at ena_ioctl+0x246/frame 0xfffffe01ebe87ad0
ifhwioctl() at ifhwioctl+0xd32/frame 0xfffffe01ebe87bd0
ifioctl() at ifioctl+0x7b0/frame 0xfffffe01ebe87cc0
kern_ioctl() at kern_ioctl+0x202/frame 0xfffffe01ebe87d30
sys_ioctl() at sys_ioctl+0x12a/frame 0xfffffe01ebe87e00
amd64_syscall() at amd64_syscall+0x12b/frame 0xfffffe01ebe87f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01ebe87f30
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x344c989b11fa, rsp = 0x344c94531c58, rbp = 0x344c94531cb0 ---


Thanks,
Osama
Comment 5 Andrew Turner freebsd_committer freebsd_triage 2023-06-27 08:38:23 UTC
I think https://reviews.freebsd.org/D40768 might fix the issue.
Comment 6 Osama Abboud 2023-06-28 13:41:47 UTC
(In reply to Andrew Turner from comment #5)
Yup, it does solve the issue.
Thanks,
Osama
Comment 7 commit-hook freebsd_committer freebsd_triage 2023-06-28 17:03:43 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9beb195fd9fdc4beb7d5bca865cf5b19ccd7449b

commit 9beb195fd9fdc4beb7d5bca865cf5b19ccd7449b
Author:     Andrew Turner <andrew@FreeBSD.org>
AuthorDate: 2023-06-27 08:32:12 +0000
Commit:     Andrew Turner <andrew@FreeBSD.org>
CommitDate: 2023-06-28 17:03:08 +0000

    Continue searching for an irq map from the start

    When searching for a free irq map location continue the search from the
    beginning of the list. There may be holes in the map before
    irq_map_first_free_idx, e.g. when removing an entries in order will
    increase the index past the current free entry.

    PR:             271990
    Reviewed by:    mhorne
    Sponsored by:   Arm Ltd

    Differential Revision:  https://reviews.freebsd.org/D40768

 sys/kern/subr_intr.c | 8 ++++++++
 1 file changed, 8 insertions(+)
Comment 8 Mina Galić freebsd_triage 2023-06-28 19:53:47 UTC
given that this is a bug is 13.x and 12.x RELEASE, are there plans to MFC it?
Comment 9 commit-hook freebsd_committer freebsd_triage 2023-09-25 11:02:15 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ffe9a1987bb7d57b21bb19e20fd9f66dd98e7a78

commit ffe9a1987bb7d57b21bb19e20fd9f66dd98e7a78
Author:     Andrew Turner <andrew@FreeBSD.org>
AuthorDate: 2023-06-27 08:32:12 +0000
Commit:     Andrew Turner <andrew@FreeBSD.org>
CommitDate: 2023-09-25 08:41:32 +0000

    Continue searching for an irq map from the start

    When searching for a free irq map location continue the search from the
    beginning of the list. There may be holes in the map before
    irq_map_first_free_idx, e.g. when removing an entries in order will
    increase the index past the current free entry.

    PR:             271990
    Reviewed by:    mhorne
    Sponsored by:   Arm Ltd

    Differential Revision:  https://reviews.freebsd.org/D40768

    (cherry picked from commit 9beb195fd9fdc4beb7d5bca865cf5b19ccd7449b)

 sys/kern/subr_intr.c | 8 ++++++++
 1 file changed, 8 insertions(+)
Comment 10 Mark Linimon freebsd_committer freebsd_triage 2024-01-02 03:27:50 UTC
^Triage: the 12 branch is now out of support.

Assign to committer that resolved.