Bug 168217 - [bce] Watchdog timeouts with bce(4) on BCM5716
Status: Closed Feedback Timeout
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 9.0-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Pyun YongHyeon
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-22 06:30 UTC by Xin LI
Modified: 2018-05-30 00:18 UTC (History)

See Also:


Attachments
screenshot 2012-05-23 11.03.48.png (155.18 KB, image/png)
2012-05-23 04:11 UTC, Nemo Liu
Description Xin LI freebsd_committer freebsd_triage 2012-05-22 06:30:10 UTC
	The system sometimes stops responding to network traffic, with the
following in the system log:


May 22 07:40:01 sanji kernel: Limiting closed port RST response from 235 to 200 packets/sec
May 22 09:41:41 sanji kernel: bce1: /usr/src/sys/dev/bce/if_bce.c(7628): Watchdog timeout occurred, resetting!
May 22 09:41:41 sanji kernel: bce1: link state changed to DOWN
May 22 09:41:43 sanji kernel: bce1: discard frame w/o leading ethernet header (len 0 pkt len 0)

	This also sometimes leads to a panic:

May 22 12:42:21 sanji kernel: bce0: discard frame w/o leading ethernet header (len 0 pkt len 0)
May 22 12:42:21 sanji kernel: bce0: discard frame w/o leading ethernet header (len 0 pkt len 0)
May 22 12:42:21 sanji kernel:
May 22 12:42:21 sanji kernel:
May 22 12:42:21 sanji kernel: Fatal trap 12: page fault while in kernel mode
May 22 12:42:21 sanji kernel: cpuid = 13; apic id = 13
May 22 12:42:21 sanji kernel: fault virtual address     = 0x18
May 22 12:42:21 sanji kernel: fault code                = supervisor read data, page not present
May 22 12:42:21 sanji kernel: instruction pointer       = 0x20:0xffffffff80403a46
			(this is RELENG_9_0 sys/dev/bce/bce.c:6449)
May 22 12:42:21 sanji kernel: stack pointer             = 0x28:0xffffff84601e5a80
May 22 12:42:21 sanji kernel: frame pointer             = 0x28:0xffffff84601e5b40
May 22 12:42:21 sanji kernel: code segment              = base rx0, limit 0xfffff, type 0x1b
May 22 12:42:21 sanji kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
May 22 12:42:21 sanji kernel: processor eflags  = interrupt enabled, resume, IOPL = 0
May 22 12:42:21 sanji kernel: current process           = 12 (irq256: bce0)

	Note that the system does not have jumbo frames enabled; however, split_hdr is enabled.
Looking at the code, it is a hardwired value.

	The system used to run stock FreeBSD 8.2-RELEASE and now runs a patched 9.0-RELEASE.

	The host system is a Dell PowerEdge R410.


bce0: <Broadcom NetXtreme II BCM5716 1000Base-T (C0)> mem 0xd6000000-0xd7ffffff irq 36 at device 0.0 on pci1
miibus0: <MII bus> on bce0
brgphy0: <BCM5709 10/100/1000baseT PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce0: Ethernet address: 78:2b:cb:74:82:a6
bce0: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3); Bufs (RX:2;TX:2;PG:0); Flags (MSI|MFW); MFW (NCSI 2.0.11)
bce1: <Broadcom NetXtreme II BCM5716 1000Base-T (C0)> mem 0xd8000000-0xd9ffffff irq 48 at device 0.1 on pci1
miibus1: <MII bus> on bce1
brgphy1: <BCM5709 10/100/1000baseT PHY> PHY 1 on miibus1
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce1: Ethernet address: 78:2b:cb:74:82:a7
bce1: ASIC (0x57092008); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3); Bufs (RX:2;TX:2;PG:0); Flags (MSI|MFW); MFW (NCSI 2.0.11)
bce2: <Broadcom NetXtreme II BCM5709 1000Base-T (C0)> mem 0xda000000-0xdbffffff irq 38 at device 0.0 on pci3
miibus2: <MII bus> on bce2
brgphy2: <BCM5709 10/100/1000baseT PHY> PHY 1 on miibus2
brgphy2:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce2: Ethernet address: 00:10:18:bc:f6:30
bce2: ASIC (0x57092003); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3); Bufs (RX:2;TX:2;PG:0); Flags (MSI)
bce3: <Broadcom NetXtreme II BCM5709 1000Base-T (C0)> mem 0xdc000000-0xddffffff irq 45 at device 0.1 on pci3
miibus3: <MII bus> on bce3
brgphy3: <BCM5709 10/100/1000baseT PHY> PHY 1 on miibus3
brgphy3:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bce3: Ethernet address: 00:10:18:bc:f6:32
bce3: ASIC (0x57092003); Rev (C0); Bus (PCIe x4, 2.5Gbps); B/C (5.2.3); Bufs (RX:2;TX:2;PG:0); Flags (MSI)
bce1: Gigabit link up!
bce1: Gigabit link up!
bce0: Gigabit link up!
bce0: Gigabit link up!

bce0@pci0:1:0:0:        class=0x020000 card=0x028c1028 chip=0x163b14e4 rev=0x20 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme II BCM5716 Gigabit Ethernet'
    class      = network
    subclass   = ethernet
bce1@pci0:1:0:1:        class=0x020000 card=0x028c1028 chip=0x163b14e4 rev=0x20 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme II BCM5716 Gigabit Ethernet'
    class      = network
    subclass   = ethernet
mpt0@pci0:2:0:0:        class=0x010000 card=0x1f0f1028 chip=0x00581000 rev=0x08 hdr=0x00
    vendor     = 'LSI Logic / Symbios Logic'
    device     = 'SAS1068E PCI-Express Fusion-MPT SAS'
    class      = mass storage
    subclass   = SCSI
bce2@pci0:3:0:0:        class=0x020000 card=0x090714e4 chip=0x163914e4 rev=0x20 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme II BCM5709 Gigabit Ethernet'
    class      = network
    subclass   = ethernet
bce3@pci0:3:0:1:        class=0x020000 card=0x090714e4 chip=0x163914e4 rev=0x20 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'NetXtreme II BCM5709 Gigabit Ethernet'
    class      = network
    subclass   = ethernet

	More information will be available upon request.

Fix: 

Not known at this time.  We are testing with split_hdr disabled.
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2012-05-22 07:44:13 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 2 Pyun YongHyeon freebsd_committer freebsd_triage 2012-05-22 11:36:45 UTC
State Changed
From-To: open->feedback

Your release information (i386) and environment (amd64) do not match.
Are you using i386 or PAE?
Having a backtrace would be nice.

It seems you have two different controllers (5716 and 5709).
Does the issue happen only on the 5716?

Comment 3 Pyun YongHyeon freebsd_committer freebsd_triage 2012-05-22 11:36:45 UTC
Responsible Changed
From-To: freebsd-net->yongari

Grab. 

http://www.freebsd.org/cgi/query-pr.cgi?pr=168217 

Date: Tue, 22 May 2012 21:34:27 +0000
Comment 4 Nemo Liu 2012-06-01 08:30:51 UTC
I have tested on the BCM5709 and BCM5709C (my other machine has four 5709C);
after about one week, the same problem occurred again.

By accident, I also found that the network goes down when I do the steps below:
1. bce0 is plugged in but has no IP assigned; bce1 and bce3 have no carrier;
bce2 is plugged in and has an IP assigned.
2. Edit /etc/rc.conf and assign an IP to bce3:

ifconfig_bce3="inet 111.1.46.6 netmask 255.255.255.224"

then run /etc/netstart restart

3. Unplug bce0; everything is fine.
4. Plug in bce3; both bce2 and bce3 are pingable, but after 3 seconds
both IPs are unreachable.

The log from /var/log/messages:

Jun  1 14:59:52 sanji kernel: bce3: link state changed to UP
Jun  1 14:59:52 sanji kernel: bce3: Gigabit link up!
Jun  1 15:00:45 sanji kernel: bce2: /usr/src/sys/dev/bce/if_bce.c(7907): Watchdog timeout occurred, resetting!
Jun  1 15:00:45 sanji kernel: bce2: link state changed to DOWN
Jun  1 15:00:48 sanji kernel: bce2: link state changed to UP
Jun  1 15:00:48 sanji kernel: bce2: Gigabit link up!
Jun  1 15:00:48 sanji kernel: bce2: discard frame w/o leading ethernet header (len 0 pkt len 0)
Jun  1 15:00:48 sanji last message repeated 9 times
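
To help answer whether the timeouts are confined to one controller, the watchdog messages quoted in this report could be tallied per interface. The following is only a rough sketch: the function name count_watchdog_timeouts and the SAMPLE_LOG list are made up for illustration, the pattern is derived solely from the log lines shown in this report, and it assumes each syslog entry sits on a single unwrapped line.

```python
import re
from collections import Counter

# Pattern derived from the lines quoted in this report, e.g.
#   "... kernel: bce1: /usr/src/sys/dev/bce/if_bce.c(7628): Watchdog timeout occurred, resetting!"
# Assumption: one syslog entry per line (no mail-style line wrapping).
WATCHDOG_RE = re.compile(r"kernel: (bce\d+): .*Watchdog timeout occurred")


def count_watchdog_timeouts(lines):
    """Return a Counter mapping interface name to its watchdog timeout count."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input copied from the logs in this report (illustrative only).
SAMPLE_LOG = [
    "May 22 09:41:41 sanji kernel: bce1: /usr/src/sys/dev/bce/if_bce.c(7628): "
    "Watchdog timeout occurred, resetting!",
    "May 22 09:41:41 sanji kernel: bce1: link state changed to DOWN",
    "Jun  1 15:00:45 sanji kernel: bce2: /usr/src/sys/dev/bce/if_bce.c(7907): "
    "Watchdog timeout occurred, resetting!",
    "Jun  1 15:00:45 sanji kernel: bce2: link state changed to DOWN",
]

if __name__ == "__main__":
    for ifname, n in sorted(count_watchdog_timeouts(SAMPLE_LOG).items()):
        print(f"{ifname}: {n} watchdog timeout(s)")
```

On a real system the same function could be fed the lines of /var/log/messages instead of the sample list; a count on bce2 or bce3 as well as bce0 or bce1 would suggest the problem is not specific to the BCM5716.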
Comment 5 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:43:35 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.