| Summary: | oce(4) driver causes fatal trap 12 on boot with emulex 10gbe nic | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | Sajeev Ramasamy <thorion3006> | ||||||||
| Component: | kern | Assignee: | freebsd-net (Nobody) <net> | ||||||||
| Status: | Closed FIXED | ||||||||||
| Severity: | Affects Many People | CC: | freqlabs, hselasky, jpaetzel, ryan, venkatduvvuru.ml | ||||||||
| Priority: | --- | Keywords: | crash, needs-qa, patch, regression | ||||||||
| Version: | 11.2-STABLE | Flags: | freqlabs: mfc-stable12+, freqlabs: mfc-stable11+ | ||||||||
| Hardware: | amd64 | ||||||||||
| OS: | Any | ||||||||||
| Attachments: |
|
||||||||||
Created attachment 202047 [details]
Fix panic in OCE
I also experienced this issue, and came up with a patch that solved the panic. That machine is currently in pieces waiting for a replacement motherboard, but I think this was the change I applied.
(In reply to Ryan Moeller from comment #1) Hey, sorry for the late reply. Thanks for your patch, but that line is no longer present in the new file.
Created attachment 202296 [details]
Bounds check array accesses in oce driver
My machine is operational again. Here is the actual patch that applies to HEAD. The same idea applies to 12 and 11 as well, though maybe on different line numbers.
(In reply to Ryan Moeller from comment #3) Hey, sorry to bother you again, but I'm not able to find those lines in the 11.2-RELEASE version. https://github.com/freebsd/freebsd/blob/release/11.2.0/sys/dev/oce/oce_if.c
(In reply to Sajeev Ramasamy from comment #4) Ah, I see the confusion. Those lines are in stable/11, but not in 11.2-RELEASE. https://github.com/freebsd/freebsd/blob/stable/11/sys/dev/oce/oce_if.c#L2392-L2395
Fixed in main, stable/13:
commit 3582828053556ca0e05ed9aab3e78008a0595e09
Author: Alexander Motin <mav@FreeBSD.org>
Date: Tue May 28 18:32:04 2019 +0000
Fix array out of bound panic introduced in r306219.
As I see, different NICs in different configurations may have different numbers of TX and RX queues. The code was assuming 1:1 mapping between event queues (interrupts) and TX/RX queues. Since number of interrupts is set to maximum of TX and RX queues, when those two are different, the system is doomed.
I have no documentation or deep knowledge about this hardware, so this change is based on general observations and code reading. If some of my guesses are wrong, please do better. I just confirmed HP NC550SFP NICs are working now.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
Notes: svn path=/head/; revision=348332
Fixed in stable/12: 24a556b1dd7481cfac036d5138bbbfa1bde832b4 (r348888)
Fixed in stable/11: a42a0b77f0de636a91f79fa2fde8a507d88b79b7
I'm calling this done. It's too late for any 11.x releases and it's already been shipped in 12.1 and 12.2 release and will be in 13.0 release. |
Created attachment 201962 [details]
Fatal 12 error log
The new oce(4) driver included in 11.2 and later causes a boot panic with an Emulex 10GbE NIC. The driver included in the 11.1-STABLE release works without any issues, so please roll back the driver to the version included in 11.1-STABLE.
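For readers following the r348332 commit message, here is a minimal sketch of the failure mode it describes and of the kind of bounds check the patch title refers to. This is not the driver's code: the struct, the `sketch_` function names, and `SKETCH_MAX_QUEUES` are all hypothetical and exist only for illustration. The only parts taken from the thread are that the interrupt count is the maximum of the TX and RX queue counts, and that the two counts can differ.

```c
#include <assert.h>

/* Hypothetical upper bound on queues; not taken from the oce(4) source. */
#define SKETCH_MAX_QUEUES 8

/* Hypothetical stand-in for a per-device softc. */
struct sketch_softc {
	int nwqs;                        /* number of TX (work) queues */
	int nrqs;                        /* number of RX (receive) queues */
	int tx_work[SKETCH_MAX_QUEUES];  /* per-TX-queue service counter */
	int rx_work[SKETCH_MAX_QUEUES];  /* per-RX-queue service counter */
};

/* Interrupt vectors are sized to the larger of the two queue counts. */
static int
sketch_nintr(const struct sketch_softc *sc)
{
	return (sc->nwqs > sc->nrqs ? sc->nwqs : sc->nrqs);
}

/*
 * Service one interrupt vector. A 1:1 assumption would touch both
 * tx_work[vector] and rx_work[vector] unconditionally, which indexes a
 * queue that does not exist whenever nwqs != nrqs. Here each access is
 * guarded by its own queue count. Returns how many queues were serviced.
 */
static int
sketch_service_vector(struct sketch_softc *sc, int vector)
{
	int serviced = 0;

	assert(vector >= 0 && vector < sketch_nintr(sc));
	assert(vector < SKETCH_MAX_QUEUES);

	if (vector < sc->nwqs) {
		sc->tx_work[vector]++;
		serviced++;
	}
	if (vector < sc->nrqs) {
		sc->rx_work[vector]++;
		serviced++;
	}
	return (serviced);
}
```

With 4 TX queues and 2 RX queues there are 4 vectors. Vectors 0 and 1 service one queue of each kind, while vectors 2 and 3 service only a TX queue, because no RX queue exists at those indices.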