Bug 162110 - [igb] [panic] RELENG_9 panics on boot in IGB driver - [regression] from 8.2
Summary: [igb] [panic] RELENG_9 panics on boot in IGB driver - [regression] from 8.2
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: freebsd-net mailing list
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2011-10-28 20:50 UTC by Frank Terhaar-Yonkers
Modified: 2018-05-28 20:01 UTC

See Also:


Attachments
if_igb.c.diff (2.59 KB, patch)
2011-10-31 19:37 UTC, Gleb Smirnoff

Description Frank Terhaar-Yonkers 2011-10-28 20:50:08 UTC
if_igb driver panics during bootup.

The IGB driver probes the device at line 591 of if_igb.c and punts:
                if (e1000_validate_nvm_checksum(&adapter->hw) < 0) {
                        device_printf(dev,
                            "The EEPROM Checksum Is Not Valid\n");
                        error = EIO;
                        goto err_late;
                }

The kernel immediately panics with a page fault.  The traceback shows the fault is in the if_igb driver, as the console messages suggest.

RELENG_8 did not panic, so this is a regression.  The igb NIC itself most likely has a genuine problem, which the driver diagnoses correctly.

Email me if you want the screenshot of the panic, or if you have a fix to try out.

Fix: 

Workaround only: with the if_igb.c driver excluded from the kernel build, the system boots fine.
How-To-Repeat: Crashes every time on boot.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2011-10-29 13:04:50 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

reclassify.
Comment 2 Gleb Smirnoff freebsd_committer 2011-10-31 19:37:28 UTC
On Fri, Oct 28, 2011 at 07:43:28PM +0000, Frank Terhaar-Yonkers wrote:
F> 
F> >Number:         162110
F> >Category:       kern
F> >Synopsis:       Releng_9 panics on boot in IGB driver - regression from 8.2
F> >Confidential:   no
F> >Severity:       critical
F> >Priority:       high
F> >Responsible:    freebsd-bugs
F> >State:          open
F> >Quarter:        
F> >Keywords:       
F> >Date-Required:
F> >Class:          sw-bug
F> >Submitter-Id:   current-users
F> >Arrival-Date:   Fri Oct 28 19:50:08 UTC 2011
F> >Closed-Date:
F> >Last-Modified:
F> >Originator:     Frank Terhaar-Yonkers
F> >Release:        Releng_9 CVSUP 2011-October-28
F> >Organization:
F> Cisco
F> >Environment:
F> FreeBSD fty-zfs-01 9.0-RC1 FreeBSD 9.0-RC1 #1: Fri Oct 28 06:50:23 EDT 2011     toot@fty-zfs-01:/usr/obj/usr/src/sys/GENERIC  amd64
F> >Description:
F> if_igb driver panics during bootup.
F> 
F> The IGB driver probes the device at line 591 of if_igb.c and punts:
F>                 if (e1000_validate_nvm_checksum(&adapter->hw) < 0) {
F>                         device_printf(dev,
F>                             "The EEPROM Checksum Is Not Valid\n");
F>                         error = EIO;
F>                         goto err_late;
F>                 }
F> 
F> The kernel immediately panics with a page fault.  The traceback shows the fault is in the if_igb driver, as the console messages suggest.
F> 
F> RELENG_8 did not panic, so this is a regression.  The igb NIC itself most likely has a genuine problem, which the driver diagnoses correctly.
F> 
F> Email me if you want the screenshot of the panic, or if you have a fix to try out.

To reproduce your problem, I added '|| 1' to the condition in the code
quoted above, so that the checksum failure path is always taken. It turned
out that calling igb_detach() after an igb_attach() failure is full of
landmines. The attached patch fixes a lot of them; at least the kernel no
longer panics when e1000_validate_nvm_checksum() fails. I am not sure about
other failure cases.
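
The reproduction trick can be sketched outside the kernel. This is a minimal
userland model, not driver code: fake_validate_nvm_checksum(),
fake_checksum_step() and the force_failure parameter are hypothetical names
made up for illustration. The only parts taken from the report are the
'< 0' test, the '|| 1' style override and the EIO error value.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for e1000_validate_nvm_checksum():
 * returns 0 when the checksum is good, a negative value otherwise.
 */
static int
fake_validate_nvm_checksum(int nvm_is_valid)
{
	return (nvm_is_valid ? 0 : -1);
}

/*
 * Model of the checksum step in attach. Passing force_failure = 1 has
 * the same effect as appending "|| 1" to the condition: the error
 * branch is taken even when the hardware is healthy.
 */
static int
fake_checksum_step(int nvm_is_valid, int force_failure)
{
	assert(force_failure == 0 || force_failure == 1);

	if (fake_validate_nvm_checksum(nvm_is_valid) < 0 || force_failure)
		return (EIO);	/* the real driver does "goto err_late" here */
	return (0);
}
```

With a healthy NIC, fake_checksum_step(1, 0) returns 0 and
fake_checksum_step(1, 1) returns EIO, which is how the error exit can be
exercised on hardware that does not have the reporter's fault.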

Unfortunately, the patch will not fix your NIC; it only cures the panic.

I've Cc'd Jack Vogel, who maintains the Intel NIC drivers in FreeBSD.
Maybe he can help you.

Jack, please consider including my patch in the next version of the driver.
The issues fixed:

- igb_detach() may be called with an uninitialized ifp
- igb_stop() may be called with an uninitialized ifp
- igb_detach() already frees the transmit/receive structures
- igb_detach() already frees adapter->mta
- igb_detach() already destroys the core lock
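
The general pattern behind these items can be sketched in plain C. This is
not the attached if_igb.c.diff and not the real driver: struct fake_adapter,
fake_attach(), fake_detach(), fake_stop() and the fail_at_step parameter are
all hypothetical, and malloc()/free() stand in for the kernel allocators.
The sketch only assumes what the list states: teardown must tolerate an
interface that was never created, and each resource must be released in
exactly one place.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified softc; not the real struct adapter. */
struct fake_adapter {
	void	*ifp;		/* interface handle, NULL until created */
	void	*rings;		/* transmit/receive structures */
	void	*mta;		/* multicast table array */
	int	 lock_ready;	/* nonzero once the core lock exists */
	int	 stop_calls;	/* how many times the hardware was stopped */
};

/* Stop the interface only if attach got far enough to create it. */
static void
fake_stop(struct fake_adapter *sc)
{
	if (sc->ifp == NULL)
		return;
	sc->stop_calls++;
}

/*
 * Single owner of all cleanup. Safe on a partially attached device and
 * safe to call twice: free(NULL) is a no-op and pointers are cleared.
 */
static void
fake_detach(struct fake_adapter *sc)
{
	assert(sc != NULL);

	fake_stop(sc);
	free(sc->ifp);
	sc->ifp = NULL;
	free(sc->rings);
	sc->rings = NULL;
	free(sc->mta);
	sc->mta = NULL;
	sc->lock_ready = 0;	/* models destroying the core lock once */
}

/*
 * Attach that can be made to fail at step 1..4 (0 means no forced
 * failure). The error exit frees nothing itself; it defers to
 * fake_detach(), so nothing is released twice.
 */
static int
fake_attach(struct fake_adapter *sc, int fail_at_step)
{
	sc->lock_ready = 1;
	if (fail_at_step == 1)
		goto fail;
	sc->mta = malloc(16);
	if (sc->mta == NULL || fail_at_step == 2)
		goto fail;
	sc->rings = malloc(64);
	if (sc->rings == NULL || fail_at_step == 3)
		goto fail;
	sc->ifp = malloc(8);
	if (sc->ifp == NULL || fail_at_step == 4)
		goto fail;
	return (0);
fail:
	fake_detach(sc);
	return (-1);
}
```

Calling fake_attach() on a zeroed struct fake_adapter with fail_at_step set
to 2 leaves every pointer NULL and never calls into the stop path, and a
second fake_detach() afterwards is harmless.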

There are probably other edge cases where the kernel panics after some
failure in igb_attach(); not all possible error exits were tested.

-- 
Totus tuus, Glebius.
Comment 3 Sean Bruno freebsd_committer 2015-08-04 15:47:14 UTC
Gleb made this patch quite a long time ago.  The error/shutdown code is still broken.
Comment 4 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:49:55 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass, but if you are getting this email it may be worthwhile to double-check whether this bug ought to be closed.