Bug 236724

Summary: igb(4): Interfaces fail to switch active to inactive state
Product: Base System
Reporter: ncrogers
Component: kern
Assignee: Marius Strobl <marius>
Status: Closed FIXED
Severity: Affects Some People
CC: aleksandr.fedorov, bugzilla.freebsd, egypcio, emaste, erj, freebsd, jboman, julien, karl, krzysztof.galazka, ltning-freebsd, marius.halden, ncrogers, net, shurd, sigsys, smh, snow, webmaster
Priority: ---
Keywords: IntelNetworking, regression
Version: 12.1-RELEASE
Flags: koobs: mfc-stable12+
Hardware: amd64
OS: Any
URL: https://reviews.freebsd.org/D21769
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=228556
          https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239240
Bug Depends on:
Bug Blocks: 240700

Description ncrogers 2019-03-22 21:50:11 UTC
Since upgrading to 12.0-RELEASE / iflib I've noticed that my igbX interfaces
no longer go from status "active" to "inactive" (as reported by ifconfig)
when the Ethernet port is unplugged and loses link. The status lights on the
physical interface go out but ifconfig still reports active +
autoselect/1000baseT. An "ifconfig down" followed by "up", while it's
unplugged, forces the link to recognize the "inactive" state. After the
down/up dance plugging the cable in again returns it to "active".
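
The down/up dance described above can be scripted. The following is only a minimal sketch: the helper names bounce_commands and bounce are hypothetical, and it assumes ifconfig(8) is on the PATH and the caller has root privileges.

```python
import subprocess


def bounce_commands(iface):
    """Return the two ifconfig invocations that make up the down/up workaround."""
    return [["ifconfig", iface, "down"], ["ifconfig", iface, "up"]]


def bounce(iface, run=subprocess.run):
    """Run the down/up sequence in order, stopping if either command fails.

    The `run` parameter exists only so the sequence can be exercised
    without touching a real interface.
    """
    for cmd in bounce_commands(iface):
        run(cmd, check=True)
```

On an affected system this would be called as bounce("igb2") while the cable is unplugged.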

I have replicated this problem with a clean install of the latest 12-STABLE
snapshot as of today. The only configuration I've made was the usual
timezone, root password, etc. during install. The interfaces in this test
case have no IP address configurations.

This is possibly a duplicate of bug #228556, however I am unsure because the
description is confusing and I am not using a virtualization layer.

I've noticed this on multiple systems with the following Intel devices (as reported by pciconf "device"):

I210 Gigabit Network Connection
I211 Gigabit Network Connection
Ethernet Connection I354

uname -a:
FreeBSD test.local 12.0-STABLE FreeBSD 12.0-STABLE r345358 GENERIC  amd64

ifconfig when igb2 is actually connected and active:
igb2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:90:0b:78:13:5a
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>

ifconfig when igb2 is disconnected (identical to above):
igb2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 00:90:0b:78:13:5a
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
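
For anyone comparing captures like the two above automatically, here is a small sketch that pulls the "status:" value out of one interface's ifconfig output. The function name link_status is hypothetical, and the parsing assumes the output layout shown in this report.

```python
import re

# Matches an indented "status: <value>" line, as printed by ifconfig above.
_STATUS_RE = re.compile(r"^[ \t]*status:[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def link_status(ifconfig_output):
    """Return the text after 'status:' for one interface, or None if absent."""
    match = _STATUS_RE.search(ifconfig_output)
    return match.group(1) if match else None
```

With the bug present, both captures above yield the same value even though the cable was pulled between them.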

Relevant pciconf -lcv:
igb2@pci0:0:20:0:       class=0x020000 card=0x1f418086 chip=0x1f418086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection I354'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x2000]
    cap 10[a0] = PCI-Express 2 root endpoint max data 512(512) FLR NS
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 00900bffff78135a
    ecap 0017[1a0] = TPH Requester 1
    ecap 000d[1d0] = ACS 1
igb3@pci0:0:20:1:       class=0x020000 card=0x1f418086 chip=0x1f418086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection I354'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x2000]
    cap 10[a0] = PCI-Express 2 root endpoint max data 512(512) FLR NS
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 00900bffff78135a
    ecap 0017[1a0] = TPH Requester 1
    ecap 000d[1d0] = ACS 1
igb4@pci0:0:20:2:       class=0x020000 card=0x1f418086 chip=0x1f418086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection I354'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x2000]
    cap 10[a0] = PCI-Express 2 root endpoint max data 512(512) FLR NS
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 00900bffff78135a
    ecap 0017[1a0] = TPH Requester 1
    ecap 000d[1d0] = ACS 1
igb5@pci0:0:20:3:       class=0x020000 card=0x1f418086 chip=0x1f418086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Connection I354'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x2000]
    cap 10[a0] = PCI-Express 2 root endpoint max data 512(512) FLR NS
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 00900bffff78135a
    ecap 0017[1a0] = TPH Requester 1
    ecap 000d[1d0] = ACS 1
ehci0@pci0:0:22:0:      class=0x0c0320 card=0x72708086 chip=0x1f2c8086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Atom processor C2000 USB Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
    cap 01[50] = powerspec 3  supports D0 D3  current D0
    cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
    cap 13[98] = PCI Advanced Features: FLR TP
ahci0@pci0:0:24:0:      class=0x010601 card=0x72708086 chip=0x1f328086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Atom processor C2000 AHCI SATA3 Controller'
    class      = mass storage
    subclass   = SATA
    cap 05[80] = MSI supports 1 message enabled with 1 message
    cap 01[70] = powerspec 3  supports D0 D3  current D0
    cap 12[a8] = SATA Index-Data Pair
isab0@pci0:0:31:0:      class=0x060100 card=0x72708086 chip=0x1f388086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Atom processor C2000 PCU'
    class      = bridge
    subclass   = PCI-ISA
    cap 09[e0] = vendor (length 12) Intel cap 1 version 0
                 features: 4 PCI-e x1 slots
none3@pci0:0:31:3:      class=0x0c0500 card=0x72708086 chip=0x1f3c8086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Atom processor C2000 PCU SMBus'
    class      = serial bus
    subclass   = SMBus
igb0@pci0:1:0:0:        class=0x020000 card=0x0000ffff chip=0x15338086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I210 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 00900bffff78135e
    ecap 0017[1a0] = TPH Requester 1
igb1@pci0:2:0:0:        class=0x020000 card=0x0000ffff chip=0x15338086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I210 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 00900bffff78135f
    ecap 0017[1a0] = TPH Requester 1

-------------------------------------------------------------------------------

It may be worth noting that I have observed this problem on two other systems
with igbX interfaces. pciconf for these is below.


I210 Gigabit devices
-------------------------------------------------------------------------------

igb0@pci0:7:0:0:	class=0x020000 card=0x153315d9 chip=0x15338086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I210 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 2 corrected
    ecap 0003[140] = Serial 1 ac1f6bffff64a676
    ecap 0017[1a0] = TPH Requester 1
igb1@pci0:8:0:0:	class=0x020000 card=0x153315d9 chip=0x15338086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I210 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 ac1f6bffff64a677
    ecap 0017[1a0] = TPH Requester 1
pcib11@pci0:9:0:0:	class=0x060400 card=0x092115d9 chip=0x11501a03 rev=0x03 hdr=0x01
    vendor     = 'ASPEED Technology, Inc.'
    device     = 'AST1150 PCI-to-PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
    cap 05[50] = MSI supports 1 message, 64 bit 
    cap 01[78] = powerspec 3  supports D0 D1 D2 D3  current D0
    cap 10[80] = PCI-Express 1 PCI bridge max data 128(128) NS
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 0d[a4] = PCI Bridge card=0x092115d9
    ecap 0002[100] = VC 1 max VC0
vgapci0@pci0:10:0:0:	class=0x030000 card=0x092115d9 chip=0x20001a03 rev=0x30 hdr=0x00
    vendor     = 'ASPEED Technology, Inc.'
    device     = 'ASPEED Graphics Family'
    class      = display
    subclass   = VGA
    cap 01[40] = powerspec 3  supports D0 D1 D2 D3  current D0
    cap 05[50] = MSI supports 4 messages, 64 bit 
igb2@pci0:11:0:0:	class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 ac1f6bffff64a678
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 0018[1c0] = LTR 1
    ecap 000d[1d0] = ACS 1
igb3@pci0:11:0:1:	class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 ac1f6bffff64a678
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 000d[1d0] = ACS 1
igb4@pci0:11:0:2:	class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 ac1f6bffff64a678
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 000d[1d0] = ACS 1
igb5@pci0:11:0:3:	class=0x020000 card=0x152115d9 chip=0x15218086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I350 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 10 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x4(x4) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 ac1f6bffff64a678
    ecap 000e[150] = ARI 1
    ecap 0010[160] = SR-IOV 1 IOV disabled, Memory Space disabled, ARI disabled
                     0 VFs configured out of 8 supported
                     First VF RID Offset 0x0180, VF RID Stride 0x0004
                     VF Device ID 0x1520
                     Page Sizes: 4096 (enabled), 8192, 65536, 262144, 1048576, 4194304
    ecap 0017[1a0] = TPH Requester 1
    ecap 000d[1d0] = ACS 1


I211 Gigabit devices
-------------------------------------------------------------------------------

igb0@pci0:1:0:0:	class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 406231ffff06a5c2
    ecap 0017[1a0] = TPH Requester 1
igb1@pci0:2:0:0:	class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 2 corrected
    ecap 0003[140] = Serial 1 406231ffff06a5c3
    ecap 0017[1a0] = TPH Requester 1
igb2@pci0:3:0:0:	class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 2 corrected
    ecap 0003[140] = Serial 1 406231ffff06a5c4
    ecap 0017[1a0] = TPH Requester 1
igb3@pci0:4:0:0:	class=0x020000 card=0x00008086 chip=0x15398086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'I211 Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks 
    cap 11[70] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 128(512) FLR NS
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 0 fatal 0 non-fatal 2 corrected
    ecap 0003[140] = Serial 1 406231ffff06a5c5
    ecap 0017[1a0] = TPH Requester 1
Comment 1 ncrogers 2019-03-25 19:04:37 UTC
Is there anything else I can do to make squashing this bug easier? I seem to recall a way to enable debug mode on an interface but I can't seem to figure this out post-iflib.
Comment 2 ncrogers 2019-03-26 16:52:43 UTC
FWIW this is also broken in the latest 13-CURRENT snapshot.

FreeBSD 13.0-CURRENT r345355 GENERIC
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2019-03-30 03:21:45 UTC
Set version to earliest version issue was observed in.
Comment 4 ncrogers 2019-06-20 20:45:24 UTC
Curious if anyone else on the CC list is having the same problem or not?
Comment 5 karl 2019-06-20 21:02:56 UTC
(In reply to ncrogers from comment #4)

Maybe.

I had a very odd thing happen the other day; my PCEngines gateway/firewall machine, which has two of these interfaces in it, "disappeared" off the net.

I wasn't where the box was, so I couldn't physically check it from the console.  I was forced to have an untrained person reset it, and it came right back up.

BUT -- if the interface flapped and the upper levels got it wrong, well..... guess what -- no packets for you, which is exactly what it looked like.
Comment 6 Jeff 2019-06-20 21:03:23 UTC
(In reply to ncrogers from comment #4)
Yes, seeing the problem in FreeBSD 12.0-RELEASE-p5 (amd64) w/ I210 interface
Comment 7 karl 2019-06-20 21:04:36 UTC
BTW I've just updated the box (source build) to a pretty-current rev so we'll see if the problem is gone.....
Comment 8 James Snow 2019-08-08 22:09:19 UTC
Stumbled my way here because I've also just encountered this in 12.0-Rp7, also on PCEngines hardware. Happy to test fixes or provide additional information.
Comment 9 John Delano 2019-08-27 19:26:47 UTC
This looks similar to what I reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239240. 

Unplugging the LAN cable does not produce any notification in ifconfig that the link has dropped. Furthermore, leaving the cable unplugged for any extended time makes the interface unusable, throwing "igb0: TX(0) desc avail = 1024, pidx = 0" or "ix0: TX(0) desc avail = 1024, pidx = 0". 

The interface will not resume working until a reboot.
Comment 10 Krzysztof Galazka 2019-09-23 18:12:24 UTC
Should be fixed with this patch: https://reviews.freebsd.org/D21769
Comment 11 James Snow 2019-09-25 09:28:30 UTC
D21769 appears to fix this for me. Thanks!
Comment 12 ncrogers 2019-09-25 18:51:48 UTC
(In reply to Krzysztof Galazka from comment #10)

D21769 fixes this for me as well. Thank you!
Comment 13 Kubilay Kocak freebsd_committer freebsd_triage 2019-09-26 09:15:36 UTC
@Krzysztof Could you please add re@f.o to the review as a blocking reviewer to get approval to merge this to releng/12.1 after stable/12 merge, so that this makes it to 12.1-RELEASE?

Thanks
Comment 14 Harald Schmalzbauer 2019-09-26 17:39:07 UTC
*** Bug 240658 has been marked as a duplicate of this bug. ***
Comment 15 Eric Joyner freebsd_committer 2019-10-02 17:39:45 UTC
(In reply to Kubilay Kocak from comment #13)

Do you mean "releng"? I don't see an re@f.o option in Phabricator when I look at editing the reviewers list.
Comment 16 Kubilay Kocak freebsd_committer freebsd_triage 2019-10-03 12:05:38 UTC
(In reply to Eric Joyner from comment #15)

Apologies Eric, ignore the request to add re@f.o to reviews, as they shouldn't be involved until a change at least makes it to head. The re@ approval process is currently entirely *after* commit/merge.

The reason I had suggested that originally, was there was no attachment (patch) here to add re@f.o to for approval (we can do that with attachment flags).

Given this issue blocks bug 240700, it is now on re's radar, so I'm less worried about an important issue missing 12.1-R, so..

Once this issue is committed (to head) and merged (to stable/*), you can ask re@f.o for explicit merge approval to releng/12.1 for 12.1-R inclusion.
Comment 17 Harald Schmalzbauer 2019-10-08 18:24:14 UTC
I'd like to add that marius@'s approach in https://reviews.freebsd.org/D21924 has the same effect – from the operator's view – as the originally tested D21769.
Once the interface was "up", link state change is correctly detected (again tested with 82574L (em) and igb(4)s 82576, i210, i350).

If the interface wasn't configured/brought up, link state changes to "active" but never back, which seems to be by design, according to that report:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240818


Thanks,

-harry
Comment 18 commit-hook freebsd_committer 2019-10-20 17:41:29 UTC
A commit references this bug:

Author: marius
Date: Sun Oct 20 17:40:50 UTC 2019
New revision: 353778
URL: https://svnweb.freebsd.org/changeset/base/353778

Log:
  - In em_intr(), just call em_handle_link() instead of duplicating it.
  - In em_msix_link(), properly handle IGB-class devices after the iflib(4)
    conversion again by only setting EM_MSIX_LINK for the EM-class 82574
    and by re-arming link interrupts unconditionally, i. e. not only in
    case of spurious interrupts. This fixes the interface link state change
    detection for the IGB-class. [1]
  - In em_if_update_admin_status(), only re-arm the link state change
    interrupt for 82574 and also only if such a device uses MSI-X, i. e.
    takes advantage of autoclearing. In case of INTx and MSI as well as
    for LEM- and IGB-class devices, re-arming isn't appropriate here and
    setting EM_MSIX_LINK isn't either.
    While at it, consistently take advantage of the hw variable.

  PR:	236724 [1]
  Differential Revision:	https://reviews.freebsd.org/D21924

Changes:
  head/sys/dev/e1000/if_em.c
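
To illustrate why the unconditional re-arming in the second log item matters, here is a toy model in Python. It is not driver code: the class ToyNic, its fields, and the assumption that the link interrupt stays masked after firing until the handler re-arms it are all simplifications made up for this sketch.

```python
class ToyNic:
    """Toy model of a link interrupt that stays masked after firing until re-armed."""

    def __init__(self, rearm_always):
        self.rearm_always = rearm_always  # True models the fixed handler
        self.armed = True                 # link interrupt currently enabled
        self.cable = False                # physical link state
        self.reported = "no carrier"      # what the OS believes

    def set_cable(self, plugged):
        """Change the physical link; the interrupt fires only if it is armed."""
        self.cable = plugged
        if self.armed:
            self.armed = False            # assumed: masked once it has fired
            self.handle_link_interrupt(spurious=False)

    def handle_link_interrupt(self, spurious):
        """Update the reported state, then re-arm according to the chosen policy."""
        if not spurious:
            self.reported = "active" if self.cable else "no carrier"
        if self.rearm_always or spurious:
            self.armed = True


def run(rearm_always):
    """Plug, unplug, plug again; return the reported state after each step."""
    nic = ToyNic(rearm_always)
    seen = []
    for plugged in (True, False, True):
        nic.set_cable(plugged)
        seen.append(nic.reported)
    return seen
```

In this model, run(False) reports "active" after every step, which resembles the symptom in this PR, while run(True) follows the cable.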
Comment 19 commit-hook freebsd_committer 2019-10-24 14:19:02 UTC
A commit references this bug:

Author: marius
Date: Thu Oct 24 14:18:06 UTC 2019
New revision: 354021
URL: https://svnweb.freebsd.org/changeset/base/354021

Log:
  MFC: r353778

  - In em_intr(), just call em_handle_link() instead of duplicating it.
  - In em_msix_link(), properly handle IGB-class devices after the iflib(4)
    conversion again by only setting EM_MSIX_LINK for the EM-class 82574
    and by re-arming link interrupts unconditionally, i. e. not only in
    case of spurious interrupts. This fixes the interface link state change
    detection for the IGB-class. [1]
  - In em_if_update_admin_status(), only re-arm the link state change
    interrupt for 82574 and also only if such a device uses MSI-X, i. e.
    takes advantage of autoclearing. In case of INTx and MSI as well as
    for LEM- and IGB-class devices, re-arming isn't appropriate here and
    setting EM_MSIX_LINK isn't either.
    While at it, consistently take advantage of the hw variable.

  PR:	236724 [1]
  Differential Revision:	https://reviews.freebsd.org/D21924

Changes:
_U  stable/12/
  stable/12/sys/dev/e1000/if_em.c
Comment 20 Vinícius Zavam freebsd_committer 2019-12-11 13:40:29 UTC
I did see it happening in HEAD (r355121/amd64) during a silly test I was conducting trying to reproduce bug #239240. should we reopen this issue?

used components to reproduce it;

 - physical hardware (https://www.dell.com/en-us/work/shop/povw/poweredge-r440);
 - intel i350 gigabit network card [igb] (class=0x020000 rev=0x01 hdr=0x00 vendor=0x8086 device=0x1521 subvendor=0x8086 subdevice=0x0002);
 - live image (https://download.freebsd.org/ftp/snapshots/amd64/amd64/ISO-IMAGES/13.0/FreeBSD-13.0-CURRENT-amd64-20191127-r355121-mini-memstick.img).

how to reproduce;

 - leave NO ethernet cables plugged to the net. card's ports;
 - boot the live image, and choose 'live cd' option (log in as root, of course);
 - perform an `ifconfig` to get actual status/options of all igb interfaces;
 - plug a cable to the igb interface of your choice (the other end must be connected to a switch or anything that can trigger layer1 activity);
 - perform an `ifconfig` to get actual status/options of all igb interfaces;
 - unplug the cable;
 - perform an `ifconfig` to get actual status/options of all igb interfaces.

NOTE: after performing an `ifconfig igb0 up` and testing it all again, the reported physical status is correct at every step.

* igb0 is the only one showing 'WOL_MAGIC' but that's maybe some setting from the BIOS I should check in a few.

=====
igb0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
        ether b4:96:91:62:ef:92
        media: Ethernet autoselect
        status: no carrier
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
igb1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4e507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
        ether b4:96:91:62:ef:93
        media: Ethernet autoselect
        status: no carrier
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
=====



MORE ABOUT IT: the odd situation is that we can see "status:" on all intel interfaces even before performing the very first `ifconfig igb0 up` --- for the onboard broadcom interfaces, there's no such thing (all broadcom interfaces stay DOWN and display no "status:" line before performing `ifconfig bge0 up`)

=====
bge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=c019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
        ether 4c:d9:8f:8f:11:9a
        media: Ethernet autoselect
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
bge1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=c019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
        ether 4c:d9:8f:8f:11:9b
        media: Ethernet autoselect
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
=====
Comment 21 Vinícius Zavam freebsd_committer 2019-12-11 13:47:49 UTC
btw, I also tested stable/12 and releng/12.1 live images! same behavior
Comment 22 Vinícius Zavam freebsd_committer 2019-12-11 14:08:37 UTC
(In reply to Vinícius Zavam from comment #21)

should this be tagged to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240700 as well?

PS: my bad on the 'BIOS settings' comment - the Intel card is not an onboard one, so there are no options in the BIOS to play with the wake-on-lan setting.
Comment 23 Vinícius Zavam freebsd_committer 2019-12-11 14:21:47 UTC
I am reopening this one and adding it to the META PR, please check the "See Also" reports as well.
Comment 24 Vinícius Zavam freebsd_committer 2019-12-16 18:18:22 UTC
(In reply to Vinícius Zavam from comment #21)

same thing also when using stable/11 or releng/11.3. was this thing *ALWAYS* behaving like this?

as mentioned before: when setting up the interface with a regular 'ifconfig_igb0="up"' in rc.conf, its state changes behave just fine. still looks odd.
Comment 25 ncrogers 2019-12-16 21:27:27 UTC
(In reply to Vinícius Zavam from comment #24)
FWIW, my systems were on 11.1 for a while, where it did not happen, and then I noticed it when switching many systems over to RELEASE-12.0. I don't think it always happened, but perhaps it started somewhere in 11/stable.

I am still running 12.0 with the D21769 patch, which fixed the problem for me, but it looks like some different fixes went into 12.1?
Comment 26 Marius Halden 2019-12-17 09:22:19 UTC
We are seeing what seems to be the same problem as Vinícius described on FreeBSD 12.1-RELEASE-p1.
Comment 27 Vinícius Zavam freebsd_committer 2019-12-17 11:06:04 UTC
truly hope I am not testing it wrong, because after trying the same steps with a 10.4-RELEASE image I got the same results. used the very same hardware as described in 'Comment 20'

image that I used? http://ftp-archive.freebsd.org/pub/FreeBSD-Archive/old-releases/amd64/amd64/ISO-IMAGES/10.4/FreeBSD-10.4-RELEASE-amd64-uefi-mini-memstick.img.xz [decompressed and 'dd' to a USB stick, of course]
Comment 28 Marius Halden 2019-12-17 16:05:18 UTC
From what I can see the patch from marius@ was never merged into 12.1, is that correct?
Comment 29 Marius Halden 2019-12-17 17:23:56 UTC
(In reply to Marius Halden from comment #28)

I tried rebuilding the releng/12.1 kernel with the patch from D21924 applied. With the patch applied I've so far been unable to reproduce the issues we've been having.

Maybe there should be an errata for this?
Comment 30 Eirik Oeverby 2019-12-17 17:42:32 UTC
Strongly support getting this fix out there as quickly as possible. There is very significant fallout from this, with interface failover and all sorts of other things depending on link detection being rendered useless.
Comment 31 Marius Halden 2019-12-17 17:43:38 UTC
(In reply to Marius Halden from comment #29)

It actually looks like disabling MSI-X for the interface with a loader tunable mitigates the (most obvious) issues we have been having without patching the 12.1 kernel.

dev.igb.0.iflib.disable_msix=1
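
For machines with several igb units, the tunable above could be generated per unit. This is only a sketch: the helper name disable_msix_lines is hypothetical, and it assumes the lines go into /boot/loader.conf using the usual name="value" syntax.

```python
def disable_msix_lines(units, driver="igb"):
    """Return one loader.conf line per unit number, disabling MSI-X via iflib."""
    return [f'dev.{driver}.{unit}.iflib.disable_msix="1"' for unit in units]


if __name__ == "__main__":
    # Print lines for igb0 and igb1; review them before appending to loader.conf.
    for line in disable_msix_lines([0, 1]):
        print(line)
```

The setting is read at boot, so a reboot is needed before it takes effect.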
Comment 32 Vinícius Zavam freebsd_committer 2020-01-06 09:34:09 UTC
(In reply to Marius Halden from comment #31)

it did not work for me, following the steps I reported on Comment 20. D21712 also did not fix it (for me, again, following same testing steps).

used HEAD@r356310 and created a bootable USB image with the following release.conf:

#!/bin/sh
######################################################################
# WITH_DVD=
CHROOTDIR="/builder2/freebsd/scratch/head"
DOC_UPDATE_SKIP=1
KERNEL="GENERIC"
MAKE_FLAGS="-s -j4"
NODOC=
NOPORTS=
PORTS_UPDATE_SKIP=1
SRCBRANCH="base/head@rHEAD"
SVNROOT="https://svn.freebsd.org"
TARGET="amd64"
TARGET_ARCH="amd64"
Comment 33 Marius Halden 2020-01-06 13:53:58 UTC
(In reply to Vinícius Zavam from comment #32)

The comment from me earlier is only a reference to 12.1-RELEASE. I haven't tested on anything else. From what I can see the patch I referenced is in HEAD and 12-STABLE, but has not yet been merged to releng/12.1.
Comment 34 Marius Strobl freebsd_committer 2020-01-08 00:10:26 UTC
(In reply to Vinícius Zavam from comment #32)

The fix for this PR, i. e. link state change detection for interfaces in the up state, didn't make it into 12.1 as RC3 was cancelled, unfortunately. Disabling the use of MSI-X as described in comment 32 is a viable workaround, though.

Comment 20 describes an orthogonal bug consisting in link status being reported for interfaces in the down state, while the expected behavior for an interface in this state is that no link status is reported and that - unless WOL is enabled - its PHY(s) is/are shut down.

I'm closing this PR again as the regression it's about has been fixed and I won't file an EN request for the fix.
Comment 35 Marius Strobl freebsd_committer 2020-01-08 00:13:21 UTC
Sorry, typo; the workaround actually has been described in comment 31.
Comment 36 Steven Hartland freebsd_committer 2020-01-08 08:46:43 UTC
Won't disabling MSI-X destroy performance? If so, it feels like this might be worth an EN.
Comment 37 Eirik Oeverby 2020-01-08 08:51:56 UTC
(In reply to Steven Hartland from comment #36)

Strongly agree.
Comment 38 Julien Cigar 2020-04-07 14:23:14 UTC
+1 for an EN, as lagg is unable to detect when the interface is down (and thus unusable..)