Bug 272469 - Broadcom mpi3mr driver: MSIX allocation fail on DELL PowerEdge R7625 system
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
Reported: 2023-07-12 11:48 UTC by Chandrakanth Patil
Modified: 2023-07-12 11:48 UTC


Attachments
msix_table_dump (19.00 KB, text/plain)
2023-07-12 11:48 UTC, Chandrakanth Patil

Description Chandrakanth Patil 2023-07-12 11:48:01 UTC
Created attachment 243352
msix_table_dump

mpi3mr avenger driver:

system details: 
1. Dell PowerEdge R7625 with 196 physical cores and 256 logical cores

During the initial load phase, the mpi3mr driver allocates a single MSI-X vector with the pci_alloc_msix() API for the initial handshake with the firmware. After allocating the vector, the driver sends the IOC_FACTS command to the firmware to fetch all the controller properties. The problem is that the driver never receives the interrupt for the IOC_FACTS completion, so the command times out and the driver load fails. However, if the driver polls the reply queue, it can see that the firmware has completed the command.
After the driver creates the single MSI-X vector, vmstat -i should show the interrupt, but it does not, which suggests that the interrupt binding is failing. Ideally, pci_alloc_msix() should return an error during allocation in this case, but it does not report any error.

Note:
     1. This issue occurs only on this specific server, where the number of
        CPUs is greater than 128 (256 CPUs in total).
     2. When we reduce the number of cores to 24 in the BIOS, the driver
        works without any issues.

We dumped the MSI-X table before the allocation of the single vector, after the allocation, and after the command timed out. The dumps are in the attachment.
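
For readers comparing the dumps, the following is a minimal sketch of how one MSI-X table entry can be decoded. It is not taken from the mpi3mr driver. The 16-byte entry layout (message address, upper address, message data, vector control with the mask in bit 0) follows the PCI specification. The address and data field positions assume the x86 compatibility format, that is, no interrupt remapping; with remapping enabled the fields have a different meaning. The struct and helper names are hypothetical, and the attachment's actual dump format is not assumed here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* One 16-byte MSI-X table entry (layout per the PCI specification). */
struct msix_entry {
	uint32_t addr_lo;	/* Message Address */
	uint32_t addr_hi;	/* Message Upper Address */
	uint32_t data;		/* Message Data */
	uint32_t vec_ctrl;	/* Vector Control; bit 0 masks the vector */
};

/* Nonzero if the per-vector mask bit is set (no interrupt is delivered). */
static int
msix_masked(const struct msix_entry *e)
{
	return ((e->vec_ctrl & 0x1u) != 0);
}

/*
 * Nonzero if the address looks like an x86 MSI target (0xFEExxxxx).
 * Assumption: compatibility format, no interrupt remapping.
 */
static int
msix_programmed(const struct msix_entry *e)
{
	return (((e->addr_lo >> 20) & 0xfffu) == 0xfeeu);
}

/* 8-bit destination APIC ID, address bits 19:12 (same assumption). */
static unsigned
msix_dest_id(const struct msix_entry *e)
{
	return ((e->addr_lo >> 12) & 0xffu);
}

/* Interrupt vector, data bits 7:0 (same assumption). */
static unsigned
msix_vector(const struct msix_entry *e)
{
	return (e->data & 0xffu);
}

/* Print a one-line summary of table entry number idx. */
static void
msix_describe(unsigned idx, const struct msix_entry *e)
{
	assert(e != NULL);
	printf("entry %u: programmed=%d masked=%d dest_id=%u vector=0x%02x\n",
	    idx, msix_programmed(e), msix_masked(e),
	    msix_dest_id(e), msix_vector(e));
}
```

Calling msix_describe() on the first entry of each dump makes it easy to see whether the address and data were ever written and whether the vector was left masked. An entry that is still all zeros or still masked after the allocation would point at the vector setup rather than at the firmware.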

I would like to understand whether there is any OS limitation on MSI-X allocation on systems with a large number of cores.

Please find the driver logs and the MSI-X table dump attached.