Bug 231166 - em(4) network link state flapping after 11.2 upgrade
Status: Closed DUPLICATE of bug 229432
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Eugene Grosbein
URL:
Keywords: IntelNetworking, regression
Depends on:
Blocks:
 
Reported: 2018-09-05 01:52 UTC by steven_nikkel
Modified: 2018-10-20 07:18 UTC

Description steven_nikkel 2018-09-05 01:52:12 UTC
Immediately after upgrading from 11.1-RELEASE to 11.2-RELEASE, the network connection on em0 constantly goes up and down. The interface is an Intel 82574 card on PCI-E. I've tried rebooting, power cycling, and using different cables, switch ports, and a different switch, with no change. Before the upgrade, everything was stable for many moons in the same configuration. I switched to the onboard Realtek in the system, and that is stable with the same cable and switch.

Sep  4 20:37:20 itsirk kernel: em0: link state changed to UP
Sep  4 20:37:20 itsirk kernel: em0: link state changed to DOWN
Sep  4 20:37:27 itsirk kernel: em0: link state changed to UP
Sep  4 20:37:27 itsirk kernel: em0: link state changed to DOWN
Sep  4 20:37:31 itsirk kernel: em0: link state changed to UP
Sep  4 20:37:31 itsirk kernel: em0: link state changed to DOWN
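
To put a number on the flapping, the log excerpt above can be tallied with a short script. This is only a minimal sketch, assuming the syslog line layout shown in this report; the names `LINK_RE` and `count_link_changes` are hypothetical helpers made up for illustration and are not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches kernel link-state lines in the layout quoted in this report, e.g.
# "Sep  4 20:37:20 itsirk kernel: em0: link state changed to UP".
# The pattern is an assumption based on those sample lines only.
LINK_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+kernel:\s+"
    r"(?P<ifname>\w+): link state changed to (?P<state>UP|DOWN)"
)


def count_link_changes(lines):
    """Count link state transitions per (interface, state) pair."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.match(line)
        if match:
            counts[(match.group("ifname"), match.group("state"))] += 1
    return counts


# The six lines from the report, used here as sample input.
sample = [
    "Sep  4 20:37:20 itsirk kernel: em0: link state changed to UP",
    "Sep  4 20:37:20 itsirk kernel: em0: link state changed to DOWN",
    "Sep  4 20:37:27 itsirk kernel: em0: link state changed to UP",
    "Sep  4 20:37:27 itsirk kernel: em0: link state changed to DOWN",
    "Sep  4 20:37:31 itsirk kernel: em0: link state changed to UP",
    "Sep  4 20:37:31 itsirk kernel: em0: link state changed to DOWN",
]

counts = count_link_changes(sample)
print(counts[("em0", "UP")], counts[("em0", "DOWN")])  # prints "3 3"
```

The same function could be fed the lines of a system log file instead of the inline sample; which file holds these messages depends on the local syslog configuration.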

em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0xdf00-0xdf1f mem 0xfdcc0000-0xfdcdffff,0xfdc00000-0xfdc7ffff,0xfdcfc000-0xfdcfffff irq 19 at device 0.0 on pci2
em0: Using MSIX interrupts with 3 vectors
em0: Ethernet address: 00:1b:21:39:dc:5f
em0: netmap queues/slots: TX 1/1024, RX 1/1024

em0@pci0:3:0:0: class=0x020000 card=0xa01f8086 chip=0x10d38086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82574L Gigabit Network Connection'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit 
    cap 10[e0] = PCI-Express 1 endpoint max data 128(256) RO NS
                 link x1(x1) speed 2.5(2.5) ASPM disabled(L0s/L1)
    cap 11[a0] = MSI-X supports 5 messages, enabled
                 Table in map 0x1c[0x0], PBA in map 0x1c[0x2000]
    ecap 0001[100] = AER 1 0 fatal 0 non-fatal 0 corrected
    ecap 0003[140] = Serial 1 001b21ffff39dc5f
Comment 1 Compri 2018-09-05 07:56:05 UTC
Hi, this looks similar to the problem in my case: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=231169
Comment 2 Compri 2018-09-05 12:14:13 UTC
From the Intel manual, try this:

  Important system configuration changes:
  ---------------------------------------

  When there is a choice run on a 64bit OS rather than 32, it makes a 
  significant difference in improvement.
  
  The interface can generate high number of interrupts. To avoid running 
  into the limit set by the kernel, adjust hw.intr_storm_threshold 
  setting using sysctl:
 
       sysctl hw.intr_storm_threshold=9000 (the default is 1000)

  For this change to take effect on boot, edit /etc/sysctl.conf and add the 
  line:  
       hw.intr_storm_threshold=9000

  If you still see Interrupt Storm detected messages, increase the limit to a
  higher number.

  Best throughput results are seen with a large MTU; use 9000 if possible.
Comment 3 steven_nikkel 2018-09-05 18:17:42 UTC
Weird behaviour today. I left the cable unplugged overnight. I plugged it in this morning and the interface worked properly. It worked fine for an hour while I watched and poked at it: it passed IPv6 packets and picked up an IPv4 address via DHCP, though I forgot to test anything else. Then I downed the Realtek interface and the Intel interface seemed to stop working as well. I downed and upped the Intel interface and it came back to life, but the link state flapping started again. None of the usual things make it go away this time either.

Per Compri: I am using a 64-bit OS and am not seeing Interrupt Storm messages. I don't think it's related to interrupt rate, as there is hardly any traffic flowing when this starts, but I did try raising the setting, which had no effect.
Comment 4 steven_nikkel 2018-10-19 16:38:11 UTC
Tried a different 82574 card I had and the behaviour was the same.
Comment 5 Eugene Grosbein freebsd_committer freebsd_triage 2018-10-19 17:53:15 UTC
As you use DHCP, this problem seems to be a duplicate of the already resolved problem https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=229432. Take a look there for a workaround and solution.
Comment 6 Kubilay Kocak freebsd_committer freebsd_triage 2018-10-20 07:18:21 UTC
Closing per comment 5.

@Steven Please re-open this issue with additional information if the workaround described in bug 229432 does not resolve the problem.

*** This bug has been marked as a duplicate of bug 229432 ***