Bug 162110

Summary: [igb] [panic] RELENG_9 panics on boot in IGB driver - [regression] from 8.2
Product: Base System
Reporter: Frank Terhaar-Yonkers <fty>
Component: kern
Assignee: freebsd-net (Nobody) <net>
Status: Closed FIXED
Severity: Affects Only Me
CC: sbruno, shurd
Priority: Normal
Keywords: IntelNetworking
Version: Unspecified
Hardware: Any
OS: Any
Attachments:
if_igb.c.diff (flags: none)

Description Frank Terhaar-Yonkers 2011-10-28 20:50:08 UTC
if_igb driver panics during bootup.

The IGB driver probes the device at line 591 of if_igb.c and punts:
                if (e1000_validate_nvm_checksum(&adapter->hw) < 0) {
                        device_printf(dev,
                            "The EEPROM Checksum Is Not Valid\n");
                        error = EIO;
                        goto err_late;
                }

The kernel immediately panics with a page fault.  The traceback shows that the fault is in the if_igb driver, as the console messages suggest.

Releng_8 did not panic, so this is a regression.  The IGB NIC most likely has a genuine problem, which the checksum test diagnoses correctly.

Email me if you want the screenshot of the panic, or if you have a fix to try out.

Fix: 

With the if_igb.c driver compiled out of the kernel, the system boots fine.
How-To-Repeat: Crashes every time on boot.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2011-10-29 13:04:50 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

reclassify.
Comment 2 Gleb Smirnoff freebsd_committer freebsd_triage 2011-10-31 19:37:28 UTC
On Fri, Oct 28, 2011 at 07:43:28PM +0000, Frank Terhaar-Yonkers wrote:
F> 
F> >Number:         162110
F> >Category:       kern
F> >Synopsis:       Releng_9 panics on boot in IGB driver - regression from 8.2
F> >Confidential:   no
F> >Severity:       critical
F> >Priority:       high
F> >Responsible:    freebsd-bugs
F> >State:          open
F> >Quarter:        
F> >Keywords:       
F> >Date-Required:
F> >Class:          sw-bug
F> >Submitter-Id:   current-users
F> >Arrival-Date:   Fri Oct 28 19:50:08 UTC 2011
F> >Closed-Date:
F> >Last-Modified:
F> >Originator:     Frank Terhaar-Yonkers
F> >Release:        Releng_9 CVSUP 2011-October-28
F> >Organization:
F> Cisco
F> >Environment:
F> FreeBSD fty-zfs-01 9.0-RC1 FreeBSD 9.0-RC1 #1: Fri Oct 28 06:50:23 EDT 2011     toot@fty-zfs-01:/usr/obj/usr/src/sys/GENERIC  amd64
F> >Description:
F> if_igb driver panics during bootup.
F> 
F> The IGB driver probes the device at line 591 of if_igb.c and punts:
F>                 if (e1000_validate_nvm_checksum(&adapter->hw) < 0) {
F>                         device_printf(dev,
F>                             "The EEPROM Checksum Is Not Valid\n");
F>                         error = EIO;
F>                         goto err_late;
F>                 }
F> 
F> The kernel immediately panics with a page fault.  The trace-back show it's in the if_igb driver as the console messages suggest.
F> 
F> Releng_8 did not panic, so this is a regression.  The IGB NIC most likely has some sort of problem which is properly diagnosed.
F> 
F> Email me if you want the screen shot of the panic, or have a fix to try out.

To reproduce your problem, I put a '|| 1)' into the conditional quoted
above, which forces the failure branch. It turned out that calling
igb_detach() after an igb_attach() failure is full of landmines. The attached
patch fixes a lot of them: at least the kernel no longer panics when
e1000_validate_nvm_checksum() fails. I am not sure about other cases.

Unfortunately, the patch will not fix your NIC; it only cures the panic.

I've Cc'd Jack Vogel, who maintains the Intel NIC drivers
in FreeBSD. Maybe he can help you.

Jack, please consider including my patch in the next version of the driver.
The issues fixed:

- igb_detach() may be called with an uninitialized ifp
- igb_stop() may be called with an uninitialized ifp
- igb_detach() already frees the transmit/receive structures
- igb_detach() already frees adapter->mta
- igb_detach() already destroys the core lock

There are probably other edge cases where the kernel panics after some
failure in igb_attach(); not all possible error exits were tested.
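
The failure mode in the list above can be illustrated with a small user-space
sketch. It is not the attached patch and not driver code: struct fake_adapter,
fake_attach(), fake_detach() and release() are hypothetical names, malloc()
stands in for the kernel allocators, and a boolean stands in for the core
lock. The sketch assumes one way to read the list: the error exit hands all
cleanup to a single teardown routine, which releases each resource only if it
was set up and at most once.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for the driver's softc. */
struct fake_adapter {
	void *ifp;         /* interface, allocated late in attach */
	void *mta;         /* multicast table array */
	void *rings;       /* transmit/receive structures */
	bool  lock_inited; /* stands in for the core lock */
	int   frees;       /* number of releases, to expose double frees */
};

/* Release one resource if it is set, and clear the pointer afterwards. */
static void
release(struct fake_adapter *sc, void **p)
{
	if (*p == NULL)
		return;
	free(*p);
	*p = NULL;
	sc->frees++;
}

/* Teardown that is safe on a partially attached adapter. */
static void
fake_detach(struct fake_adapter *sc)
{
	/* A real driver would stop the interface here only if ifp is set. */
	release(sc, &sc->ifp);
	release(sc, &sc->rings);
	release(sc, &sc->mta);
	if (sc->lock_inited)
		sc->lock_inited = false; /* "destroy" the lock exactly once */
}

/* Attach with a switch that forces the checksum failure branch. */
static int
fake_attach(struct fake_adapter *sc, bool force_checksum_failure)
{
	int error;

	sc->lock_inited = true;
	sc->mta = malloc(16);
	sc->rings = malloc(16);
	if (sc->mta == NULL || sc->rings == NULL) {
		error = ENOMEM;
		goto err_late;
	}
	if (force_checksum_failure) { /* same idea as the '|| 1' trick */
		error = EIO;
		goto err_late;
	}
	sc->ifp = malloc(16);
	if (sc->ifp == NULL) {
		error = ENOMEM;
		goto err_late;
	}
	return (0);

err_late:
	/* Do not free anything here; the teardown routine owns cleanup. */
	fake_detach(sc);
	return (error);
}
```

With a zero-initialized struct fake_adapter, fake_attach(&sc, true) returns
EIO with every pointer cleared, and a second fake_detach(&sc) is a no-op
instead of a double free.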

-- 
Totus tuus, Glebius.
Comment 3 Sean Bruno freebsd_committer freebsd_triage 2015-08-04 15:47:14 UTC
Gleb made this patch quite a long time ago.  The error/shutdown code is still broken.
Comment 4 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:49:55 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass, but if you are getting this email it might be worthwhile to double-check whether this bug ought to be closed.