Bug 193601 - em0: link state changed to DOWN / UP
Summary: em0: link state changed to DOWN / UP
Status: Closed Overcome By Events
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 10.0-STABLE
Hardware: amd64 Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-13 11:57 UTC by freebsd
Modified: 2014-09-25 23:01 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description freebsd 2014-09-13 11:57:15 UTC
I have FreeBSD 10.0-STABLE running on an Intel NUC and the network connection is being dropped every minute or so.

I have seen similar issues related to earlier versions of FreeBSD, but none for 10.0 yet.
See also: 165509(fixed), 123347(fixed), 118927(in discussion)

Here are some (hopefully relevant) extracts from /var/log/messages:

Sep 13 04:11:02 domus kernel: FreeBSD 10.0-STABLE #2 a2e15fc(master)-dirty: Mon Mar  3 17:26:56 CET 2014
Sep 13 04:11:02 domus kernel: FreeBSD clang version 3.3 (tags/RELEASE_33/final 183502) 20130610
Sep 13 04:11:02 domus kernel: CPU: Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz (1895.65-MHz K8-class CPU)
Sep 13 04:11:02 domus kernel: Origin = "GenuineIntel"  Id = 0x40651  Family = 0x6  Model = 0x45  Stepping = 1
Sep 13 04:11:02 domus kernel: Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Sep 13 04:11:02 domus kernel: Features2=0x7ffafbbf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,<b11>,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
Sep 13 04:11:02 domus kernel: AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
Sep 13 04:11:02 domus kernel: AMD Features2=0x21<LAHF,ABM>
Sep 13 04:11:02 domus kernel: Standard Extended Features=0x27ab<GSFSBASE,TSCADJ,BMI1,AVX2,SMEP,BMI2,ENHMOVSB,INVPCID>
Sep 13 04:11:02 domus kernel: TSC: P-state invariant, performance statistics
Sep 13 04:11:02 domus kernel: real memory  = 17179869184 (16384 MB)
Sep 13 04:11:02 domus kernel: avail memory = 16523751424 (15758 MB)
Sep 13 04:11:02 domus kernel: Event timer "LAPIC" quality 600
Sep 13 04:11:02 domus kernel: ACPI APIC Table: <INTEL  D54250WY>
Sep 13 04:11:02 domus kernel: FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
Sep 13 04:11:02 domus kernel: FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 SMT threads
Sep 13 04:11:02 domus kernel: cpu0 (BSP): APIC ID:  0
Sep 13 04:11:02 domus kernel: cpu1 (AP): APIC ID:  1
Sep 13 04:11:02 domus kernel: cpu2 (AP): APIC ID:  2
Sep 13 04:11:02 domus kernel: cpu3 (AP): APIC ID:  3

Sep 13 04:11:02 domus kernel: ACPI Error: Gpe0Block - 32-bit FADT register is too long (32 bytes, 256 bits) to convert to GAS struct - 255 bits max, truncating (20130823/tbfadt-223)

Sep 13 04:11:02 domus kernel: atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
Sep 13 04:11:02 domus kernel: atrtc0: Warning: Couldn't map I/O.

Sep 13 04:11:02 domus kernel: pci0: <simple comms> at device 22.0 (no driver attached)
Sep 13 04:11:02 domus kernel: em0: <Intel(R) PRO/1000 Network Connection 7.3.8> port 0xf080-0xf09f mem 0xf7c00000-0xf7c1ffff,0xf7c34000-0xf7c34fff irq 20 at device 25.0 on pci0
Sep 13 04:11:02 domus kernel: em0: No MSI/MSIX using a Legacy IRQ

Sep 13 04:11:02 domus kernel: Setting hostname: domus.
Sep 13 04:11:02 domus kernel: Starting Network: lo0 em0.
Sep 13 04:11:02 domus kernel: lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
Sep 13 04:11:02 domus kernel: options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
Sep 13 04:11:02 domus kernel: inet6 ::1 prefixlen 128
Sep 13 04:11:02 domus kernel: inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
Sep 13 04:11:02 domus kernel: inet 127.0.0.1 netmask 0xff000000
Sep 13 04:11:02 domus kernel: nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Sep 13 04:11:02 domus kernel: em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
Sep 13 04:11:02 domus kernel: options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
Sep 13 04:11:02 domus kernel: ether xx:xx:xx:xx:xx:xx
Sep 13 04:11:02 domus kernel: inet 192.168.0.5 netmask 0xffffff00 broadcast 192.168.0.255
Sep 13 04:11:02 domus kernel: inet6 fe80::xxxx:xxxx:xxxx:xxxx%em0 prefixlen 64 scopeid 0x1
Sep 13 04:11:02 domus kernel: inet6 fd00::5 prefixlen 64
Sep 13 04:11:02 domus kernel: nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
Sep 13 04:11:02 domus kernel: media: Ethernet autoselect
Sep 13 04:11:02 domus kernel: status: no carrier
Sep 13 04:11:02 domus kernel: Starting devd.
Sep 13 04:11:02 domus kernel: add net default: gateway 192.168.0.1
Sep 13 04:11:02 domus kernel: add net fe80::: gateway ::1
Sep 13 04:11:02 domus kernel: add net ff02::: gateway ::1
Sep 13 04:11:02 domus kernel: add net ::ffff:0.0.0.0: gateway ::1
Sep 13 04:11:02 domus kernel: add net ::0.0.0.0: gateway ::1
Sep 13 04:11:02 domus kernel: add net default: gateway fe80::xxxx:xxxx:xxxx:xxxx%em0
Sep 13 04:11:02 domus kernel: Additional inet6 routing options: gateway=YES.
... time goes by ...
Sep 13 04:26:47 domus kernel: em0: link state changed to DOWN
Sep 13 04:26:51 domus kernel: em0: link state changed to UP
Sep 13 04:26:52 domus kernel: em0: link state changed to DOWN
Sep 13 04:26:56 domus kernel: em0: link state changed to UP
Sep 13 04:27:08 domus kernel: em0: link state changed to DOWN
Sep 13 04:27:11 domus kernel: em0: link state changed to UP
Sep 13 04:28:07 domus kernel: em0: link state changed to DOWN
Sep 13 04:28:10 domus kernel: em0: link state changed to UP
Sep 13 04:29:13 domus kernel: em0: link state changed to DOWN
Sep 13 04:29:17 domus kernel: em0: link state changed to UP
Sep 13 04:30:24 domus kernel: em0: link state changed to DOWN
Sep 13 04:30:27 domus kernel: em0: link state changed to UP

% pciconf -lcv
em0@pci0:0:25:0:        class=0x020000 card=0x20548086 chip=0x15598086 rev=0x04 hdr=0x00
    vendor     = 'Intel Corporation'
    class      = network
    subclass   = ethernet
    cap 01[c8] = powerspec 2  supports D0 D3  current D0
    cap 05[d0] = MSI supports 1 message, 64 bit 
    cap 13[e0] = PCI Advanced Features: FLR TP

% devinfo -rv | grep em0
        em0 pnpinfo vendor=0x8086 device=0x1559 subvendor=0x8086 subdevice=0x2054 class=0x020000 at slot=25 function=0 handle=\_SB_.PCI0.GLAN

% ifconfig em0
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
        ether xx:xx:xx:xx:xx:xx
        inet 192.168.0.5 netmask 0xffffff00 broadcast 192.168.0.255 
        inet6 fe80::eea8:6bff:feff:db77%em0 prefixlen 64 scopeid 0x1 
        inet6 fd00::5 prefixlen 64 
        inet6 fd00::xxxx:xxxx:xxxx:xxxx prefixlen 64 autoconf 
        inet6 2002:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx prefixlen 64 autoconf 
        inet6 2002:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx prefixlen 64 autoconf temporary 
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active

The NUC is about 6 months old. It has shown wonderful performance but it has always had
networking issues, so I am not sure if something has been poorly configured or if the
hardware is going bad on me. How could I trace the cause of the DOWN/UP state changes?
Or, is there any way to entice causal messages with link state change messages?
e.g.: "link state changed to DOWN because ..."

Thanks in advance,

Steve
Comment 1 Eric Joyner 2014-09-17 18:55:20 UTC
Hi,

Do the link UP/DOWN messages generally occur after ~15 minutes, or does the time between initializing the adapter and those messages vary?

Do you send IPv4 or IPv6 traffic, specific UDP/TCP traffic like NFS, or does the adapter idle until the link starts flapping?

By default, the adapter should use MSI instead of legacy interrupts. Did you change that setting?

Jack (the maintainer) is on a vacation, so I get to look at these bugs while he's gone. We tried passing TCP/IPv4 traffic using a NUC we have in-house, and we couldn't get the link up/down messages after ~30 minutes, so I'm wondering if we need to do something specific to reproduce this issue.

- Eric
Comment 2 freebsd 2014-09-18 11:02:12 UTC
(In reply to Eric Joyner from comment #1)
> Hi,

Thanks for taking the time!

> Do the link UP/DOWN messages generally occur after ~15 minutes, or does the
> time between initializing the adapter and those messages vary?

weird! Now they have stopped just as quick as they started!
Grepping through my logs I can see that I had this problem between Sept. 8 - 13
(my oldest logfile goes back to the 6th).

Here are the first few entries from /var/log/messages:
(IP's & hostnames redacted)

Sep  8 04:04:49 domus afpd[93184]: afp_alarm: reconnect timer expired, goodbye
Sep  8 04:04:49 domus afpd[93184]: Disconnected session terminating
Sep  8 08:18:06 domus kernel: em0: link state changed to DOWN
Sep  8 08:18:09 domus kernel: em0: link state changed to UP
Sep  8 08:18:09 domus devd: Executing '/etc/rc.d/dhclient quietstart em0'
Sep  8 14:07:24 domus ntpd[886]: bind() fd 31, family AF_INET6, port 123, scope 0, addr xxxx::xxxx:xxxx:xxxx:xxxx, mc
Sep  8 14:07:24 domus ntpd[886]: unable to create socket on em0 (176) for xxxx::xxxx:xxxx:xxxx:xxxx#123
Sep  8 15:03:15 domus kernel: em0: link state changed to DOWN
Sep  8 15:03:19 domus kernel: em0: link state changed to UP
Sep  8 15:03:19 domus devd: Executing '/etc/rc.d/dhclient quietstart em0'
Sep  8 18:56:33 domus kernel: em0: link state changed to DOWN
Sep  8 18:56:37 domus kernel: em0: link state changed to UP
Sep  8 18:56:37 domus devd: Executing '/etc/rc.d/dhclient quietstart em0'
Sep  8 21:10:26 domus ntpd[886]: bind() fd 32, family AF_INET6, port 123, scope 0, addr xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx
Sep  8 21:10:26 domus ntpd[886]: unable to create socket on em0 (182) for xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx#123
Sep  8 21:10:26 domus ntpd[886]: bind() fd 32, family AF_INET6, port 123, scope 0, addr xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx
Sep  8 21:10:26 domus ntpd[886]: unable to create socket on em0 (183) for xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx#123
Sep  8 21:11:54 domus kernel: em0: link state changed to DOWN
Sep  8 21:11:57 domus kernel: em0: link state changed to UP
Sep  8 21:11:57 domus devd: Executing '/etc/rc.d/dhclient quietstart em0'

and here the last few entries:

Sep 13 19:32:52 domus kernel: em0: link state changed to DOWN
Sep 13 19:32:56 domus kernel: em0: link state changed to UP
Sep 13 19:32:56 domus devd: Executing '/etc/rc.d/dhclient quietstart em0'
Sep 13 19:34:10 domus kernel: em0: link state changed to DOWN
Sep 13 19:34:14 domus kernel: em0: link state changed to UP
Sep 13 19:34:14 domus devd: Executing '/etc/rc.d/dhclient quietstart em0'
Sep 13 19:37:54 domus kernel: em0: link state changed to DOWN
Sep 13 19:37:57 domus kernel: em0: link state changed to UP
Sep 13 19:37:57 domus devd: Executing '/etc/rc.d/dhclient quietstart em0'
Sep 13 19:39:39 domus kernel: em0: link state changed to DOWN
Sep 13 19:39:43 domus kernel: em0: link state changed to UP
Sep 13 19:39:43 domus devd: Executing '/etc/rc.d/dhclient quietstart em0'
Sep 13 19:40:04 domus rpc.statd: Failed to contact host backups.home: RPC: Port mapper failure - RPC: Timed out
Sep 13 19:40:18 domus kernel: em0: link state changed to DOWN
Sep 13 19:40:49 domus kernel: em0: link state changed to UP
Sep 13 19:40:49 domus devd: Executing '/etc/rc.d/dhclient quietstart em0'
Sep 13 19:42:04 domus rpc.statd: Failed to contact host laptop.home: RPC: Port mapper failure - RPC: Timed out
Sep 13 20:44:04 domus rpc.statd: Failed to contact host backups.home: RPC: Port mapper failure - RPC: Timed out
Sep 13 20:46:05 domus rpc.statd: Failed to contact host laptop.home: RPC: Port mapper failure - RPC: Timed out
Sep 13 21:48:05 domus rpc.statd: Failed to contact host backups.home: RPC: Port mapper failure - RPC: Timed out
Sep 13 21:50:05 domus rpc.statd: Failed to contact host laptop.home: RPC: Port mapper failure - RPC: Timed out
Sep 13 22:52:05 domus rpc.statd: Failed to contact host backups.home: RPC: Port mapper failure - RPC: Timed out
Sep 13 22:54:05 domus rpc.statd: Failed to contact host laptop.home: RPC: Port mapper failure - RPC: Timed out
Sep 13 23:56:05 domus rpc.statd: Failed to contact host backups.home: RPC: Port mapper failure - RPC: Timed out
Sep 13 23:58:05 domus rpc.statd: Failed to contact host laptop.home: RPC: Port mapper failure - RPC: Timed out
Sep 14 00:07:24 domus afpd[80087]: Login by steve (AFP3.3)

Here's a quick summary of how often it occurred:
        drops   max
        / day   / hour
Sep  8:   5      1
Sep  9:  26      6
Sep 10:  58      7
Sep 11: 273     18
Sep 12: 620     38
Sep 13: 453     33

No configuration changes took place at either end - although I did reboot a
few times in the hopes of shaking out some cobwebs. And as the log shows, there
wasn't even a reboot around the time it stopped.

> Do you send IPv4 or IPv6 traffic, specific UDP/TCP traffic like NFS, or does
> the adapter idle until the link starts flapping?

Yes, I use both. I actually tried to set up an IPv6-only network earlier
this year, but failed - too many utils still fall back to IPv4.
I've got NFS and AFP deamons running but their log messages don't seem to
coincide with the link drops. (at least not obviously)

> By default, the adapter should use MSI instead of legacy interrupts. Did you
> change that setting?

Not knowingly. Where / how would I go about changing it? How can I find out
what setting(s) I currently have?
As a stab in the dark, is this what you're talking about?

# sysctl -a | grep -i msi
hw.aac.enable_msi: 1
hw.bce.msi_enable: 1
hw.em.enable_msix: 1
hw.igb.enable_msix: 1
hw.ix.enable_msix: 1
hw.malo.pci.msi_disable: 0
hw.mfi.msi: 1
hw.pci.enable_msi: 1
hw.pci.enable_msix: 1
hw.pci.honor_msi_blacklist: 1

This actually more my problem: I see that something is wrong, but have no idea
how to get more information from the system about the cause.
I've got 20 years of unix experience but only a few months with FreeBSD,
it's all so familiar and yet still seems like black magic at times :-(

Again, thanks for your time!

Steve
Comment 3 freebsd 2014-09-22 07:03:48 UTC
...and their back again, this time with up to 80 drops per hour,
every hour, regardless of traffic, load or memory use.
(The machine is mostly idle)

I'd really love to be able to increase the logging level for the
link controller / driver. Is there a way to do this?

Any tips would be greatly appreciated!

Thanks in advance,

Steve
Comment 4 Eric Joyner 2014-09-22 16:25:55 UTC
Unfortunately, I don't think there's a way to increase the debug logging level of the driver. You can try setting dev.em.0.debug to 1 and reading the output in /var/log/messages, but that information probably isn't very useful in this case. Have you tried connecting something else to the NUC's link partner instead of the NUC, or connecting something else to the NUC, and seeing if the link flaps occur?
Comment 5 freebsd 2014-09-25 23:01:10 UTC
update:

although I was unable to determine any conclusive faults, I replaced the network switch
today and it appears that this solved the problem.

Thanks for helping.

Steve