Bug 213751 - bfe(4) link state goes down and up when big transfers start over socket to geom volume or ZFS pool
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 11.0-STABLE
Hardware: i386 Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
Reported: 2016-10-24 19:53 UTC by Michael Osipov
Modified: 2023-06-20 20:45 UTC

Description Michael Osipov 2016-10-24 19:53:17 UTC
My system:
===================================
Copyright (c) 1992-2016 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.0-STABLE #1 r307823: Mon Oct 24 09:09:51 CEST 2016
    mosipov@bsd1home:/usr/obj/usr/src/sys/BSD1HOME i386
FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM 3.8.0)
VT(vga): resolution 640x480
CPU: Intel(R) Pentium(R) 4 CPU 2.40GHz (2405.51-MHz 686-class CPU)
  Origin="GenuineIntel"  Id=0xf27  Family=0xf  Model=0x2  Stepping=7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4400<CNXT-ID,xTPR>
real memory  = 2147483648 (2048 MB)
avail memory = 2076766208 (1980 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <ASUS   P4PE    >
random: unblocking device.
ioapic0 <Version 2.0> irqs 0-23 on motherboard
random: entropy device external interface
kbd1 at kbdmux0
module_register_init: MOD_LOAD (vesa, 0x8129c9c0, 0) error 19
nexus0
vtvga0: <VT VGA driver> on motherboard
cryptosoft0: <software crypto> on motherboard
acpi0: <ASUS P4PE> on motherboard
acpi0: Overriding SCI from IRQ 9 to IRQ 22
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x73 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <Intel 82845G host to AGP bridge> on hostb0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> mem 0xde000000-0xdeffffff,0xe0000000-0xefffffff,0xdd000000-0xddffffff irq 16 at device 0.0 on pci1
vgapci0: Boot video device
uhci0: <Intel 82801DB (ICH4) USB controller USB-A> port 0xd800-0xd81f irq 16 at device 29.0 on pci0
uhci0: LegSup = 0x2f00
usbus0 on uhci0
uhci1: <Intel 82801DB (ICH4) USB controller USB-B> port 0xd400-0xd41f irq 19 at device 29.1 on pci0
uhci1: LegSup = 0x2f00
usbus1 on uhci1
uhci2: <Intel 82801DB (ICH4) USB controller USB-C> port 0xd000-0xd01f irq 18 at device 29.2 on pci0
uhci2: LegSup = 0x2f00
usbus2 on uhci2
ehci0: <Intel 82801DB/L/M (ICH4) USB 2.0 controller> mem 0xdc800000-0xdc8003ff at device 29.7 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci0
pcib2: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci2: <ACPI PCI bus> on pcib2
bfe0: <Broadcom BCM4401 Fast Ethernet> mem 0xdc000000-0xdc001fff at device 5.0 on pci2
miibus0: <MII bus> on bfe0
bmtphy0: <BCM4401 10/100 media interface> PHY 1 on miibus0
bmtphy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
bfe0: Ethernet address: 00:0c:6e:17:d3:5c
rl0: <RealTek 8139 10/100BaseTX> port 0xb800-0xb8ff mem 0xdb800000-0xdb8000ff at device 11.0 on pci2
miibus1: <MII bus> on rl0
rlphy0: <RealTek internal media interface> PHY 0 on miibus1
rlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
rl0: Ethernet address: 00:1e:2a:c1:b7:90
atapci0: <HighPoint HPT370 UDMA100 controller> port 0xb400-0xb407,0xb000-0xb003,0xa800-0xa807,0xa400-0xa403,0xa000-0xa0ff irq 21 at device 13.0 on pci2
ata2: <ATA channel> at channel 0 on atapci0
ata3: <ATA channel> at channel 1 on atapci0
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <Intel ICH4 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f irq 18 at device 31.1 on pci0
ata0: <ATA channel> at channel 0 on atapci1
ata1: <ATA channel> at channel 1 on atapci1
pmtimer0 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
fdc0: No FDOUT register!
ppc0: parallel port not found.
fuse-freebsd: version 0.4.4, FUSE ABI 7.8
Timecounters tick every 1.000 msec
nvme cam probe device init
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 12Mbps Full Speed USB v1.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus1
ugen2.1: <Intel> at usbus2
uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <Intel> at usbus3
uhub3: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
cd0 at ata1 bus 0 scbus3 target 0 lun 0
cd0: <_NEC DVD_RW ND-3500AG 2.1B> Removable CD-ROM SCSI device
cd0: 33.300MB/s transfers (UDMA2, ATAPI 12bytes, PIO 65534bytes)
cd0: Attempt to query device size failed: NOT READY, Medium not present
ada0 at ata2 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD2000JB-00GVA0 08.02D08> ATA-6 device
ada0: Serial Number WD-WCAL82220375
ada0: 100.000MB/s transfers (UDMA5, PIO 8192bytes)
ada0: 190782MB (390721968 512 byte sectors)
ada1 at ata2 bus 0 scbus0 target 1 lun 0
ada1: <WDC WD2000JB-00FUA0 15.05R15> ATA-6 device
ada1: Serial Number WD-WMAEP3043891
ada1: 100.000MB/s transfers (UDMA5, PIO 8192bytes)
ada1: 190782MB (390721968 512 byte sectors)
ada2 at ata3 bus 0 scbus1 target 0 lun 0
ada2: <WDC WD2000JB-00EVA0 15.05R15> ATA-6 device
ada2: Serial Number WD-WMAEH2583910
ada2: 100.000MB/s transfers (UDMA5, PIO 8192bytes)
ada2: 190782MB (390721968 512 byte sectors)
ada3 at ata0 bus 0 scbus2 target 0 lun 0
ada3: <ST380020A 3.39> ATA-6 device
ada3: Serial Number 5GCMFD8T
ada3: 100.000MB/s transfers (UDMA5, PIO 8192bytes)
ada3: 76319MB (156301488 512 byte sectors)
taskqgroup_adjust failed cnt: 1 stride: 1 mp_ncpus: 1 smp_started: 0
Timecounter "TSC-low" frequency 1202752882 Hz quality 800
taskqgroup_adjust failed cnt: 1 stride: 1 mp_ncpus: 1 smp_started: 0
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
Trying to mount root from ufs:/dev/gpt/system [rw,acls]...
uhub3: 6 ports with 6 removable, self powered
==============================================

ada0, ada1, and ada2 are combined into a graid3 volume:
# graid3 status
      Name    Status  Components
raid3/data  COMPLETE  diskid/DISK-WD-WMAEH2583910 (ACTIVE)
                      diskid/DISK-WD-WMAEP3043891 (ACTIVE)
                      diskid/DISK-WD-WCAL82220375 (ACTIVE)

which is mounted on /mnt:
# mount
/dev/gpt/system on / (ufs, local, journaled soft-updates, acls)
devfs on /dev (devfs, local, multilabel)
fdescfs on /dev/fd (fdescfs)
procfs on /proc (procfs, local)
/dev/raid3/data on /mnt (ufs, local)


ifconfig:
bfe0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80008<VLAN_MTU,LINKSTATE>
        ether 00:0c:6e:17:d3:5c
        inet 192.168.1.7 netmask 0xffffff00 broadcast 192.168.1.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
rl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=2008<VLAN_MTU,WOL_MAGIC>
        ether 00:1e:2a:c1:b7:90
        inet 192.168.1.2 netmask 0xffffff00 broadcast 192.168.1.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (100baseTX <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo

Writing large files over a socket (with SSH, SMB, or nc) to that mount point makes the transfer speed collapse to kilobytes per second, and SSH becomes unresponsive. dmesg reports over and over again:
Oct 24 21:24:37 bsd1home kernel: bfe0: link state changed to UP
Oct 24 21:24:38 bsd1home kernel: bfe0: link state changed to DOWN
Oct 24 21:24:40 bsd1home kernel: bfe0: link state changed to UP
Oct 24 21:24:41 bsd1home kernel: bfe0: link state changed to DOWN
Oct 24 21:24:43 bsd1home kernel: bfe0: link state changed to UP
Oct 24 21:24:44 bsd1home kernel: bfe0: link state changed to DOWN
Oct 24 21:24:46 bsd1home kernel: bfe0: link state changed to UP
Oct 24 21:24:48 bsd1home kernel: bfe0: link state changed to DOWN
Oct 24 21:24:50 bsd1home kernel: bfe0: link state changed to UP
Oct 24 21:24:50 bsd1home kernel: bfe0: link state changed to DOWN
Oct 24 21:24:52 bsd1home kernel: bfe0: link state changed to UP
Oct 24 21:24:52 bsd1home kernel: bfe0: link state changed to DOWN
Oct 24 21:24:54 bsd1home kernel: bfe0: link state changed to UP
Oct 24 21:24:54 bsd1home kernel: bfe0: link state changed to DOWN
Oct 24 21:24:56 bsd1home kernel: bfe0: link state changed to UP
Oct 24 21:24:56 bsd1home kernel: bfe0: link state changed to DOWN
Oct 24 21:24:58 bsd1home kernel: bfe0: link state changed to UP
Oct 24 21:24:58 bsd1home kernel: bfe0: link state changed to DOWN
Oct 24 21:25:00 bsd1home kernel: bfe0: link state changed to UP
Oct 24 21:25:02 bsd1home dhclient: New IP Address (bfe0): 192.168.1.7
Oct 24 21:25:02 bsd1home dhclient: New Subnet Mask (bfe0): 255.255.255.0
Oct 24 21:25:02 bsd1home dhclient: New Broadcast Address (bfe0): 192.168.1.255
Oct 24 21:25:02 bsd1home dhclient: New Routers (bfe0): 192.168.1.
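
To quantify the flapping, the link-state messages can be counted per interface. The following is only a minimal sketch, assuming the syslog line format shown in the excerpt above; the function name count_link_changes is hypothetical and not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches syslog lines such as:
#   Oct 24 21:24:38 bsd1home kernel: bfe0: link state changed to DOWN
# The pattern is an assumption based on the excerpt in this report.
LINK_RE = re.compile(
    r"kernel: (?P<ifname>\w+): link state changed to (?P<state>UP|DOWN)\s*$"
)


def count_link_changes(lines, ifname="bfe0"):
    """Return a Counter of UP/DOWN transitions logged for one interface."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        # Ignore unrelated lines (dhclient, other interfaces, ...).
        if match and match.group("ifname") == ifname:
            counts[match.group("state")] += 1
    return counts
```

Fed with the log excerpt above, this counts 10 UP and 9 DOWN transitions for bfe0 between 21:24:37 and 21:25:00. It could also be pointed at a saved copy of the system log, for example count_link_changes(open("/var/log/messages")).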

The same test was performed with the other NIC in that system, rl0 (Realtek 8139). No performance degradation was found.

Additionally, 10baseT and half-duplex were tested with somewhat better results: the dropouts started a bit later, but the link was still unusable.
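
For anyone who wants to repeat the socket-to-disk test without nc, here is a rough sketch of an equivalent transfer. It is not the procedure used for this report: receive_to_file and send_bulk are hypothetical helper names, and the payload is just zero bytes. The receiver would run on the affected machine and write to a file on the volume under test; the sender would run on a client.

```python
import socket
import time


def receive_to_file(listener, path, bufsize=65536):
    """Accept one connection on listener and write all received data to path.

    Returns the number of bytes received.
    """
    conn, _ = listener.accept()
    received = 0
    with conn, open(path, "wb") as out:
        while True:
            data = conn.recv(bufsize)
            if not data:  # peer closed the connection
                break
            out.write(data)
            received += len(data)
    return received


def send_bulk(host, port, total_bytes, chunk_size=65536):
    """Send total_bytes of zeros to host:port; return the rate in bytes/s."""
    chunk = bytes(chunk_size)
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(chunk_size, total_bytes - sent)
            sock.sendall(chunk[:n])
            sent += n
    elapsed = time.monotonic() - start
    return sent / elapsed if elapsed > 0 else float("inf")
```

The listener passed to receive_to_file is an ordinary bound and listening TCP socket. Running the same transfer once against the bfe0 address and once against the rl0 address would give comparable numbers for the two NICs.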

Similar issues were already reported some years ago:
http://www.mail-archive.com/freebsd-net@freebsd.org/msg11038.html
http://forums.nas4free.org/viewtopic.php?t=3606

It seems to be driver-related.
Comment 1 Michael Osipov 2016-10-30 09:19:49 UTC
I have created a RAIDZ pool from these three drives. The result with the driver is the same: the link state goes down and up.
Comment 2 Graham Perrin freebsd_committer freebsd_triage 2023-04-01 17:16:12 UTC
Please, is this still an issue? 

It was mentioned in bug 166724 comment 110.
Comment 3 Michael Osipov 2023-04-01 19:07:15 UTC
(In reply to Graham Perrin from comment #2)

I no longer have the NIC (or the machine), but if the driver hasn't changed, the problem is presumably still present.
Comment 4 Ed Maste freebsd_committer freebsd_triage 2023-06-20 12:53:31 UTC
I've suggested adding a deprecation notice to bfe.4 in https://reviews.freebsd.org/D40625, as I suspect these issues will not be fixed and this NIC is essentially obsolete.
Comment 5 commit-hook freebsd_committer freebsd_triage 2023-06-20 20:45:05 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4bc148c30effe0fc1c21b6bbaee366f239353ac1

commit 4bc148c30effe0fc1c21b6bbaee366f239353ac1
Author:     Ed Maste <emaste@FreeBSD.org>
AuthorDate: 2023-06-20 12:44:22 +0000
Commit:     Ed Maste <emaste@FreeBSD.org>
CommitDate: 2023-06-20 20:42:34 +0000

    bfe: add unmaintained / deprecation notice

    The bfe (Broadcom BCM4401 10/100 Ethernet) driver has known bugs and no
    active maintenance.  There have been no changes other than sweeping tree
    changes, typo corrections etc. since 2008 a far as I can tell.  Add a
    note in the man page so that users expectations are correctly set, and
    indicate that it may be removed in the future.

    I did not add a gone_in() call in the driver itself as there is no
    specific target version for removal, and this driver has evidence of
    recent use (dmesg, PRs).

    PR:             201947, 213751
    Reviewed by:    brooks
    Sponsored by:   The FreeBSD Foundation
    Differential Revision: https://reviews.freebsd.org/D40625

 share/man/man4/bfe.4 | 7 +++++++
 1 file changed, 7 insertions(+)