Summary: | Panic: IRQ mapping table is full after stress devctl disable/enable | ||
---|---|---|---|
Product: | Base System | Reporter: | Osama Abboud <osamaabb> |
Component: | arm | Assignee: | Andrew Turner <Andrew> |
Status: | Closed FIXED | ||
Severity: | Affects Many People | CC: | Andrew, akiyano, darinzon, emaste, freebsd, grahamperrin, hselasky |
Priority: | --- | Keywords: | crash |
Version: | CURRENT | Flags: | freebsd:
mfc-stable13?
freebsd: mfc-stable12? |
Hardware: | arm64 | ||
OS: | Any | ||
URL: | https://reviews.freebsd.org/D40768 |
Description
Osama Abboud
2023-06-14 11:08:54 UTC
This looks pretty much like a resource leakage. Did you check the driver source code, if all IRQ/MSIX resources are freed up by the driver itself? Should be the first thing to check! --HPS (In reply to Hans Petter Selasky from comment #1) Thanks for your response, I also suspected it to be a leakage issue however how can we explain the lack of reproduction on Intel based instances? > Did you check the driver source code, if all IRQ/MSIX resources are freed up by the driver itself? Yes I've checked the driver code from [1], in particular in the detach flow [2] the driver frees the resources, first the IO irqs as a part of ena_down (ena_detach->ena_down->ena_free_io_irq->bus_teardown_intr/bus_release_resource) and then the management irq as part of ena_detach as well (ena_detach->ena_destroy_device->ena_free_mgmnt_irq->bus_teardown_intr/bus_release_resource) and finally release the pci resource right after freeing the management irq (ena_detach->ena_destroy_device->ena_disable_msix->pci_release_msi) [1] https://github.com/amzn/amzn-drivers/blob/master/kernel/fbsd/ena/ena.c [2] https://github.com/amzn/amzn-drivers/blob/master/kernel/fbsd/ena/ena.c#L3777 OK, #include <sys/kdb.h> kdb_backtrace(); Find that panic print in the kernel. Add a printf() followed by a call to kdb_backtrace() for all cases where this resource is allocated and freed. Then add if (!cold) . Then attach/detach once. Do you see anything obvious in dmesg? Then do it multiple times. Also, sysctl kern.consmute=1, may speed up prints. --HPS (In reply to Hans Petter Selasky from comment #3) Thank you for the advice, here's the investigation results: I've look into the kernel to see where we add the panic print, it's printed in src/sys/kern/subr_intr.c file in intr_map_irq() function. when an irq is allocated in ARM instance it leads to this function, I've added a print for each irq we allocate within this function and for each irq we free within intr_unmap_irq() and got something like this: intr_map_irq is allocating entry: 32754 intr_map_irq is allocating entry: 32755 intr_map_irq is allocating entry: 32755 intr_map_irq is allocating entry: 32756 intr_map_irq is allocating entry: 32757 intr_map_irq is allocating entry: 32758 intr_map_irq is allocating entry: 32759 intr_map_irq is allocating entry: 32760 intr_map_irq is allocating entry: 32761 intr_map_irq is allocating entry: 32762 ena0: Link is down ena0: Link is down ena0: Link is down intr_unmap_irq is freeing entry: 32754 intr_unmap_irq is freeing entry: 32755 intr_unmap_irq is freeing entry: 32756 intr_unmap_irq is freeing entry: 32757 intr_unmap_irq is freeing entry: 32758 intr_unmap_irq is freeing entry: 32759 intr_unmap_irq is freeing entry: 32760 intr_unmap_irq is freeing entry: 32761 intr_unmap_irq is freeing entry: 32762 ena0: detached intr_map_irq is allocating entry: 32762 intr_map_irq is allocating entry: 32763 intr_map_irq is allocating entry: 32764 intr_map_irq is allocating entry: 32765 intr_map_irq is allocating entry: 32766 intr_map_irq is allocating entry: 32767 Interestingly, the last allocated irq number is 32767, the size of the irq map on ARM64 instances is decided in intr_map_init() and is 2 * intr_nirq where intr_nirq is defined as 16384 (16384*2=32768). This same define is much smaller on other archs - MIPS 128, RISCV 1024 Now, the global parameter irq_map_first_free_idx doesn't seem to be maintained as it should be, it always holds the last free entry but it leaves behind free entries that will not accessible in the future, it's visible within the prints as well as the code, every time we enable the driver all 9 irq are allocated in sequence so let's assume they get map_idx 10-18, Now once we disable the driver, intr_unmap_irq is invoked and all 9 irqs are freed but in the end irq_map_first_free_idx holds the value of the last freed irq which is 18 and that's where we begin the next allocation, this means that indexes 10-17 are free but they will not be actually used again since irq_map_first_free_idx is only changed upon intr_map_irq and intr_unmap_irq and is not maintained in a cyclic manner. Also, subr_intr.c file indicates that the map should be a dynamic one, which I also did not see a code that expands it in case it's full, at least within subr_intr.c (/* XXX Convert irq_map[] to dynamicaly expandable one. */). Be mindful to the fact even if the map was a dynamic one we'd still use up all of the system resources since the algorithm itself is not correct I've tried to add the exact same kernel prints on an intel based instance but none were printed into the dmesg, indicating that intel based instances go through a different flow when allocating the irqs. I've tried to look into what flow does irq allocation on such instances goes through by calling kdb_backtrack() upon every allocation/free of an irq in the driver but I always got the same prints which were not very helpful: calling bus_alloc_resource_any(SYS_RES_IRQ) from ena_request_io_irq() KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01ebe879f0 ena_up() at ena_up+0x24c/frame 0xfffffe01ebe87a90 ena_ioctl() at ena_ioctl+0x246/frame 0xfffffe01ebe87ad0 ifhwioctl() at ifhwioctl+0xd32/frame 0xfffffe01ebe87bd0 ifioctl() at ifioctl+0x7b0/frame 0xfffffe01ebe87cc0 kern_ioctl() at kern_ioctl+0x202/frame 0xfffffe01ebe87d30 sys_ioctl() at sys_ioctl+0x12a/frame 0xfffffe01ebe87e00 amd64_syscall() at amd64_syscall+0x12b/frame 0xfffffe01ebe87f30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe01ebe87f30 --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x344c989b11fa, rsp = 0x344c94531c58, rbp = 0x344c94531cb0 --- Thanks, Osama I think https://reviews.freebsd.org/D40768 might fix the issue. (In reply to Andrew Turner from comment #5) Yup, it does solve the issue. Thanks, Osama A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=9beb195fd9fdc4beb7d5bca865cf5b19ccd7449b commit 9beb195fd9fdc4beb7d5bca865cf5b19ccd7449b Author: Andrew Turner <andrew@FreeBSD.org> AuthorDate: 2023-06-27 08:32:12 +0000 Commit: Andrew Turner <andrew@FreeBSD.org> CommitDate: 2023-06-28 17:03:08 +0000 Continue searching for an irq map from the start When searching for a free irq map location continue the search from the beginning of the list. There may be holes in the map before irq_map_first_free_idx, e.g. when removing an entries in order will increase the index past the current free entry. PR: 271990 Reviewed by: mhorne Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D40768 sys/kern/subr_intr.c | 8 ++++++++ 1 file changed, 8 insertions(+) given that this is a bug is 13.x and 12.x RELEASE, are there plans to MFC it? A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=ffe9a1987bb7d57b21bb19e20fd9f66dd98e7a78 commit ffe9a1987bb7d57b21bb19e20fd9f66dd98e7a78 Author: Andrew Turner <andrew@FreeBSD.org> AuthorDate: 2023-06-27 08:32:12 +0000 Commit: Andrew Turner <andrew@FreeBSD.org> CommitDate: 2023-09-25 08:41:32 +0000 Continue searching for an irq map from the start When searching for a free irq map location continue the search from the beginning of the list. There may be holes in the map before irq_map_first_free_idx, e.g. when removing an entries in order will increase the index past the current free entry. PR: 271990 Reviewed by: mhorne Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D40768 (cherry picked from commit 9beb195fd9fdc4beb7d5bca865cf5b19ccd7449b) sys/kern/subr_intr.c | 8 ++++++++ 1 file changed, 8 insertions(+) ^Triage: the 12 branch is now out of support. Assign to committer that resolved. |