Bug 235700

Summary: oce(4) driver causes fatal trap 12 on boot with emulex 10gbe nic
Product: Base System
Reporter: Sajeev Ramasamy <thorion3006>
Component: kern
Assignee: freebsd-net (Nobody) <net>
Status: Closed FIXED
Severity: Affects Many People
CC: freqlabs, hselasky, jpaetzel, ryan, venkatduvvuru.ml
Priority: ---
Keywords: crash, needs-qa, patch, regression
Version: 11.2-STABLE
Flags: freqlabs: mfc-stable12+, freqlabs: mfc-stable11+
Hardware: amd64
OS: Any
Attachments (description, flags):
Fatal 12 error log (flags: none)
Fix panic in OCE (flags: none)
Bounds check array accesses in oce driver (flags: none)

Description Sajeev Ramasamy 2019-02-12 16:19:38 UTC
Created attachment 201962 [details]
Fatal 12 error log

The new oce(4) driver included in version 11.2 and above causes a boot panic with an Emulex 10GbE NIC. The driver included in 11.1-STABLE works without any issues, so please roll the driver back to the version included in 11.1-STABLE.
Comment 1 Ryan Moeller 2019-02-15 18:43:29 UTC
Created attachment 202047 [details]
Fix panic in OCE

I also experienced this issue, and came up with a patch that solved the panic. That machine is currently in pieces waiting for a replacement motherboard, but I think this was the change I applied.
Comment 2 Sajeev Ramasamy 2019-02-23 11:58:20 UTC
(In reply to Ryan Moeller from comment #1)
Hey, sorry for the late reply. Thanks for your patch, but that line is no longer present in the new file.
Comment 3 Ryan Moeller 2019-02-23 19:45:04 UTC
Created attachment 202296 [details]
Bounds check array accesses in oce driver

My machine is operational again. Here is the actual patch that applies to HEAD. The same idea applies to 12 and 11 as well, though maybe on different line numbers.
Comment 4 Sajeev Ramasamy 2019-02-24 15:58:11 UTC
(In reply to Ryan Moeller from comment #3)
Hey, sorry to bother you again, but I'm not able to find those lines in the 11.2-RELEASE version. https://github.com/freebsd/freebsd/blob/release/11.2.0/sys/dev/oce/oce_if.c
Comment 5 Ryan Moeller 2019-02-24 16:54:48 UTC
(In reply to Sajeev Ramasamy from comment #4)
Ah I see the confusion. Those lines are in stable/11, but not in 11.2-release.
https://github.com/freebsd/freebsd/blob/stable/11/sys/dev/oce/oce_if.c#L2392-L2395
Comment 6 Ryan Moeller freebsd_committer freebsd_triage 2021-03-30 15:06:55 UTC
Fixed in main, stable/13:

commit 3582828053556ca0e05ed9aab3e78008a0595e09
Author: Alexander Motin <mav@FreeBSD.org>
Date:   Tue May 28 18:32:04 2019 +0000

    Fix array out of bound panic introduced in r306219.
    
    As I see, different NICs in different configurations may have different
    numbers of TX and RX queues.  The code was assuming 1:1 mapping between
    event queues (interrupts) and TX/RX queues.  Since number of interrupts
    is set to maximum of TX and RX queues, when those two are different, the
    system is doomed.
    
    I have no documentation or deep knowledge about this hardware, so this
    change is based on general observations and code reading.  If some of my
    guesses are wrong, please do better.  I just confirmed HP NC550SFP NICs
    are working now.
    
    MFC after:      2 weeks
    Sponsored by:   iXsystems, Inc.

Notes:
    svn path=/head/; revision=348332

Fixed in stable/12: 24a556b1dd7481cfac036d5138bbbfa1bde832b4 (r348888)
Fixed in stable/11: a42a0b77f0de636a91f79fa2fde8a507d88b79b7
Comment 7 Ryan Moeller freebsd_committer freebsd_triage 2021-03-30 15:14:37 UTC
I'm calling this done. It's too late for any 11.x releases, and the fix has already shipped in the 12.1 and 12.2 releases and will be in the 13.0 release.
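
For illustration, the failure mode described in the commit message in comment 6 can be sketched in a few lines of C. This is a simplified model and not the oce(4) driver's code: the struct, the function names, and MAX_QUEUES are all hypothetical. It assumes only what the commit message states: the number of event queues is the maximum of the TX and RX queue counts, so an event queue index is not guaranteed to be a valid index into both queue arrays.

```c
#include <assert.h>
#include <string.h>

#define MAX_QUEUES 8	/* hypothetical array capacity, not a driver constant */

/* Hypothetical, simplified stand-in for per-device state. */
struct nic_softc {
	int nrqs;			/* RX queues in use */
	int nwqs;			/* TX queues in use */
	int neqs;			/* event queues (interrupts) */
	int rq_polled[MAX_QUEUES];	/* per-RX-queue service counter */
	int wq_polled[MAX_QUEUES];	/* per-TX-queue service counter */
};

/* The event queue count is the larger of the RX and TX queue counts. */
static void
nic_setup(struct nic_softc *sc, int nrqs, int nwqs)
{
	assert(nrqs >= 0 && nrqs <= MAX_QUEUES);
	assert(nwqs >= 0 && nwqs <= MAX_QUEUES);
	memset(sc, 0, sizeof(*sc));
	sc->nrqs = nrqs;
	sc->nwqs = nwqs;
	sc->neqs = nrqs > nwqs ? nrqs : nwqs;
}

/*
 * Service one event queue.  Assuming a 1:1 mapping would touch both
 * rq[eq] and wq[eq] unconditionally; here each array is only touched
 * when eq is a valid index for it.  Returns how many queues were serviced.
 */
static int
nic_service_eq(struct nic_softc *sc, int eq)
{
	int serviced = 0;

	if (eq < 0 || eq >= sc->neqs)
		return (0);
	if (eq < sc->nrqs) {
		sc->rq_polled[eq]++;
		serviced++;
	}
	if (eq < sc->nwqs) {
		sc->wq_polled[eq]++;
		serviced++;
	}
	return (serviced);
}
```

With 4 RX queues and 1 TX queue, nic_setup() yields 4 event queues. Event queue 0 then services both an RX and a TX queue, while event queues 1 through 3 service only an RX queue, because no TX queue exists at those indices.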