Bug 26613

Summary: ethernet vr0 hangs
Product: Base System Reporter: dirk.meyer <dirk.meyer>
Component: kern Assignee: silby
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description dirk.meyer 2001-04-16 12:30:00 UTC
	Network seems unstable, NFS hangs, interactive login lags sometimes.

Fix: 

Downgrade to FreeBSD 4.2-STABLE #4: Sun Feb 18 10:48:45 CET 2001
How-To-Repeat: 
	NFS exported volume (server on this machine)
	Client hangs when copying too much data.
	A ping from the server to the client shows stray packets!

[...]
64 bytes from 217.6.200.196: icmp_seq=17604 ttl=255 time=4.292 ms
64 bytes from 217.6.200.196: icmp_seq=17605 ttl=255 time=30.590 ms
64 bytes from 217.6.200.196: icmp_seq=17607 ttl=255 time=0.523 ms
64 bytes from 217.6.200.196: icmp_seq=17608 ttl=255 time=0.513 ms
64 bytes from 217.6.200.196: icmp_seq=17609 ttl=255 time=18.057 ms
64 bytes from 217.6.200.196: icmp_seq=17610 ttl=255 time=0.538 ms
64 bytes from 217.6.200.196: icmp_seq=17606 ttl=255 time=4518.681 ms
64 bytes from 217.6.200.196: icmp_seq=17611 ttl=255 time=0.523 ms
64 bytes from 217.6.200.196: icmp_seq=17611 ttl=255 time=0.523 ms
64 bytes from 217.6.200.196: icmp_seq=17612 ttl=255 time=22.136 ms
64 bytes from 217.6.200.196: icmp_seq=17613 ttl=255 time=27.047 ms
64 bytes from 217.6.200.196: icmp_seq=17614 ttl=255 time=0.507 ms
64 bytes from 217.6.200.196: icmp_seq=17615 ttl=255 time=0.528 ms
64 bytes from 217.6.200.196: icmp_seq=17616 ttl=255 time=11.841 ms
64 bytes from 217.6.200.196: icmp_seq=17617 ttl=255 time=17.809 ms
64 bytes from 217.6.200.196: icmp_seq=17618 ttl=255 time=7.396 ms
64 bytes from 217.6.200.196: icmp_seq=17619 ttl=255 time=23.490 ms
64 bytes from 217.6.200.196: icmp_seq=17620 ttl=255 time=0.724 ms
64 bytes from 217.6.200.196: icmp_seq=17621 ttl=255 time=14.023 ms
error: No Buffer Space

	To reactivate the line:
	$ ifconfig vr0 down
	$ ifconfig vr0 up

	After the copy is done, the ping keeps "dancing"

[...]
64 bytes from 217.6.200.196: icmp_seq=100 ttl=255 time=0.524 ms
64 bytes from 217.6.200.196: icmp_seq=62 ttl=255 time=38901.325 ms
64 bytes from 217.6.200.196: icmp_seq=101 ttl=255 time=0.494 ms
[...]
64 bytes from 217.6.200.196: icmp_seq=200 ttl=255 time=0.504 ms
64 bytes from 217.6.200.196: icmp_seq=163 ttl=255 time=38231.139 ms
64 bytes from 217.6.200.196: icmp_seq=201 ttl=255 time=0.445 ms
[...]
64 bytes from 217.6.200.196: icmp_seq=536 ttl=255 time=0.570 ms
64 bytes from 217.6.200.196: icmp_seq=423 ttl=255 time=114132.380 ms
64 bytes from 217.6.200.196: icmp_seq=538 ttl=255 time=0.536 ms
[...]
64 bytes from 217.6.200.196: icmp_seq=631 ttl=255 time=0.510 ms
64 bytes from 217.6.200.196: icmp_seq=537 ttl=255 time=95722.164 ms
64 bytes from 217.6.200.196: icmp_seq=632 ttl=255 time=0.518 ms
[...]
64 bytes from 217.6.200.196: icmp_seq=940 ttl=255 time=0.543 ms
64 bytes from 217.6.200.196: icmp_seq=813 ttl=255 time=128272.599 ms
64 bytes from 217.6.200.196: icmp_seq=942 ttl=255 time=0.531 ms
[...]
64 bytes from 217.6.200.196: icmp_seq=1068 ttl=255 time=0.538 ms
64 bytes from 217.6.200.196: icmp_seq=941 ttl=255 time=128272.534 ms
64 bytes from 217.6.200.196: icmp_seq=1070 ttl=255 time=0.516 ms
[...]
64 bytes from 217.6.200.196: icmp_seq=1176 ttl=255 time=0.475 ms
64 bytes from 217.6.200.196: icmp_seq=1069 ttl=255 time=108969.408 ms
64 bytes from 217.6.200.196: icmp_seq=1177 ttl=255 time=0.529 ms
[...]
64 bytes from 217.6.200.196: icmp_seq=1228 ttl=255 time=0.489 ms
64 bytes from 217.6.200.196: icmp_seq=1199 ttl=255 time=29461.110 ms
64 bytes from 217.6.200.196: icmp_seq=1229 ttl=255 time=0.480 ms
[...]
64 bytes from 217.6.200.196: icmp_seq=1275 ttl=255 time=0.556 ms
64 bytes from 217.6.200.196: icmp_seq=1257 ttl=255 time=18230.945 ms
64 bytes from 217.6.200.196: icmp_seq=1276 ttl=255 time=0.529 ms

	$ netstat -m
124/320/4096 mbufs in use (current/peak/max):
        123 mbufs allocated to data
        1 mbufs allocated to packet headers
121/164/1024 mbuf clusters in use (current/peak/max)
408 Kbytes allocated to network (13% of mb_map in use)
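
Note: the "dancing" replies in the ping output above are echo replies that arrive long after replies with higher sequence numbers. A minimal sketch for picking them out of a saved ping log, assuming the output format shown above; find_late_replies is a hypothetical helper name, not an existing tool:

```sh
# Read ping output on stdin and report every reply whose icmp_seq is
# lower than the highest icmp_seq already seen (i.e. a late reply).
find_late_replies() {
    awk '
        /icmp_seq=/ {
            seq = ""; rtt = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^icmp_seq=/) { seq = substr($i, 10) + 0 }
                if ($i ~ /^time=/)     { rtt = substr($i, 6) }
            }
            if (seen && seq < max) {
                printf "late reply: icmp_seq=%d time=%s ms (highest seen so far: %d)\n", seq, rtt, max
            }
            if (!seen || seq > max) { max = seq; seen = 1 }
        }
    '
}
```

Usage would be something like "find_late_replies < ping.log", where ping.log is a saved copy of the ping output.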
Comment 1 dirk.meyer 2001-04-22 18:35:41 UTC
- The collision counter has gone up
- NFS packets are fragmented
	Is this a problem here?

kind regards Dirk

- Dirk Meyer, Im Grund 4, 34317 Habichtswald, Germany

cvsup and make world:
FreeBSD 4.3-STABLE, Sat Apr 21 16:07:45 CEST 2001

$ df
net3:/data                         3969982  3245507   406877    89%    /net3
$ mount
net3:/data on /net3 (nfs)
$ du /src/distfiles-local
315798  /src/distfiles-local

$ cp -pR /src/distfiles-local /net3/distfiles/ &
$ ping net3
PING net3 (XXX.XXX.XXX.XXX): 56 data bytes
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=0 ttl=255 time=10.349 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=1 ttl=255 time=21.457 ms
[...]
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=16 ttl=255 time=0.640 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=17 ttl=255 time=4.393 ms
ping: sendto: Host is down
ping: sendto: Host is down
[...]


Copy has transferred:
net3$ du
36769	/data/distfiles/

net3$ netstat -m
389/400/4096 mbufs in use (current/peak/max):
        389 mbufs allocated to data
142/162/1024 mbuf clusters in use (current/peak/max)
424 Kbytes allocated to network (13% of mb_map in use)

net3$ netstat -i
Name  Mtu   Network       Address            Ipkts Ierrs    Opkts Oerrs  Coll
vr0   1500  <Link#1>    00:50:ba:65:c2:13    22263     0     6998     4  2572
vr0   1500  n-telekom3    net3               22226     -     7213     -     -

net3$ netstat -s
tcp:
        3805 packets sent
                2475 data packets (1036486 bytes)
                1327 ack-only packets (118 delayed)
                3 control packets
        5372 packets received
                1964 acks (for 1036492 bytes)
                3 duplicate acks
                3891 packets (79580 bytes) received in-sequence
                1 out-of-order packet (0 bytes)
                1 window update packet
        3 connection accepts
        3 connections established (including accepts)
        9 connections closed (including 0 drops)
                3 connections updated cached RTT on close
                3 connections updated cached RTT variance on close
        1842 segments updated rtt (of 1842 attempts)
        930 correct ACK header predictions
        3091 correct data packet header predictions
udp:
        3339 datagrams received
        3339 delivered
        3331 datagrams output
ip:
        22252 total packets received
        16179 fragments received
        2713 packets reassembled ok
        8786 packets for this host
        7239 packets sent from this host
icmp:
        Output histogram:
                echo reply: 75
        Input histogram:
                echo: 75
        75 message responses generated
        ICMP address mask responses are disabled


ping: sendto: Host is down
ping: sendto: Host is down
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=249 ttl=255 time=0.945 ms
64 bytes from XXX.XXX.XXX.XXX: icmp_seq=250 ttl=255 time=0.654 ms
[...]
64 bytes from 217.6.200.194: icmp_seq=282 ttl=255 time=32.686 ms
64 bytes from 217.6.200.194: icmp_seq=283 ttl=255 time=37.923 ms
ping: sendto: Host is down
ping: sendto: Host is down
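
Note: to put the collision counter from "netstat -i" in proportion, it can be compared with the output packet count on the link-level line. A minimal sketch, assuming the nine-column layout shown above (Name, Mtu, Network, Address, Ipkts, Ierrs, Opkts, Oerrs, Coll); collision_ratio is a hypothetical helper name, and the ratio by itself does not identify the cause:

```sh
# Read "netstat -i" output on stdin and print collisions relative to
# output packets for the <Link#n> line of the given interface.
collision_ratio() {
    awk -v ifc="${1:-vr0}" '
        $1 == ifc && $3 ~ /^<Link/ && $7 > 0 {
            printf "%s: %d collisions / %d output packets = %.1f%%\n", $1, $9, $7, 100 * $9 / $7
        }
    '
}
```

Usage would be something like "netstat -i | collision_ratio vr0".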
Comment 2 dirk.meyer 2001-05-14 09:22:15 UTC
I ran several checks with:
FreeBSD 4.3-STABLE #0: Sun May  6 17:10:21 CEST 2001

options         NMBCLUSTERS=8192
- only 1 ethernet card in the system.

This still does not fix the problem.

Name  Mtu   Network       Address            Ipkts Ierrs    Opkts Oerrs  Coll
vr0   1500  <Link#1>    00:50:ba:65:c2:13    94983     0    17340     8 14090

It could be related to the fragmented packets that NFS uses,
so I tweaked the block size:

Name  Mtu   Network       Address            Ipkts Ierrs    Opkts Oerrs  Coll
vr0   1500  <Link#1>    00:50:ba:65:c2:13   152849     0    75162     8 47268
vr0   1500  <Link#1>    00:50:ba:65:c2:13   351460     0   273611     8 160714
vr0   1500  <Link#1>    00:50:ba:65:c2:13   507605     0   429471     8 249338

=========== FIX ===============

If the client mounts with the option "-w=1024", it works!
I can copy 400 Mbytes without a problem on an NFS mount.

================================

Is this an NFS or IP-Stack problem?

kind regards Dirk

- Dirk Meyer, Im Grund 4, 34317 Habichtswald, Germany
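
Note on the "-w=1024" workaround: a smaller write size means each NFS write over UDP fits into a single Ethernet frame instead of being split into IP fragments. A rough sketch of the arithmetic, assuming NFS over UDP on IPv4 without IP options, an MTU of 1500 as shown by "netstat -i", and a guessed 200 bytes of RPC and NFS headers per write (not measured here); fragments_for is a hypothetical helper name:

```sh
# fragments_for WSIZE [MTU] [OVERHEAD]
# Estimate how many IP fragments one NFS write of WSIZE bytes needs.
fragments_for() {
    wsize=$1
    mtu=${2:-1500}
    overhead=${3:-200}                      # assumed RPC + NFS header bytes
    ip_payload=$((wsize + overhead + 8))    # 8 = UDP header
    per_frag=$(( (mtu - 20) / 8 * 8 ))      # 20 = IPv4 header; fragment data comes in 8-byte units
    echo $(( (ip_payload + per_frag - 1) / per_frag ))   # round up
}

fragments_for 8192    # write size assumed for the failing case (not stated in this PR)
fragments_for 1024    # write size used in the workaround
```

Under these assumptions an 8192-byte write needs 6 fragments and a 1024-byte write needs 1. The "netstat -s" figures in Comment 1 (16179 fragments received, 2713 packets reassembled) work out to roughly six fragments per packet, which would be consistent with an 8192-byte write size, but that is an inference and not something the report confirms.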
Comment 3 Crist J. Clark freebsd_committer freebsd_triage 2002-03-27 09:04:01 UTC
State Changed
From-To: open->feedback

Is this still an issue with later releases of FreeBSD?
Comment 4 dirk.meyer 2002-03-27 17:07:05 UTC
> State-Changed-From-To: open->feedback
> Is this still an issue with later releases of FreeBSD?

Yes, the problem still hits me under 4.5-RELEASE.
Unfortunately the system is down at the moment,
but I have another machine with the problem
that I just can't reboot/test with so often.

As a workaround I run a script from cron:

        echo "fail:"
        netstat -m
        ifconfig vr0 down
        netstat -m
        ifconfig vr0 up
        netstat -m

to keep the system responsive.

kind regards Dirk

- Dirk Meyer, Im Grund 4, 34317 Habichtswald, Germany
- [dirk.meyer@dinoex.sub.org],[dirk.meyer@guug.de],[dinoex@FreeBSD.org]
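
Note: the cron script in Comment 4 is quoted only in part, and the condition that leads to "fail:" is not shown. A minimal sketch of one way such a check could look, assuming a single ping to a known peer is an acceptable health test; the peer name, the variable names and the function names are illustrative and not taken from the submitter's script:

```sh
#!/bin/sh
# Reset the interface only when the peer does not answer a ping.
IFACE=${IFACE:-vr0}
PEER=${PEER:-net3}      # assumed peer; any host that should always answer

# Succeeds if one echo request to the peer gets a reply.
peer_reachable() {
    ping -c 1 "$PEER" >/dev/null 2>&1
}

# Same commands as the workaround above, with mbuf statistics for the log.
reset_iface() {
    echo "fail:"
    netstat -m
    ifconfig "$IFACE" down
    netstat -m
    ifconfig "$IFACE" up
    netstat -m
}

check_link() {
    if peer_reachable; then
        return 0
    fi
    reset_iface
}
```

A cron job run as root would call check_link at the end of the script; cron then mails the output only when a reset actually happened.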
Comment 5 Crist J. Clark freebsd_committer freebsd_triage 2002-03-31 17:11:40 UTC
State Changed
From-To: feedback->open

Submitter reports this is still a problem.
Comment 6 silby freebsd_committer freebsd_triage 2002-05-17 18:44:59 UTC
Responsible Changed
From-To: freebsd-bugs->silby

I'm taking the vr-related PRs for now.
Comment 7 silby freebsd_committer freebsd_triage 2002-05-20 02:22:18 UTC
State Changed
From-To: open->feedback

A workaround for this problem so that the driver will automatically 
reset the network card has been committed to 5.x and 4.x as of 
this week.  Please update to the latest version of if_vr.c and 
see if this effectively solves the problem for you.
Comment 8 dirk.meyer 2002-05-21 19:00:34 UTC
> A workaround for this problem so that the driver will automatically
> reset the network card has been committed to 5.x and 4.x as of

I rebuilt with the module:
 * $FreeBSD: src/sys/pci/if_vr.c,v 1.26.2.9 2002/05/20 01:18:06 silby Exp $

Testing shows no problem; I will run it for 2 weeks to be sure.

kind regards Dirk

- Dirk Meyer, Im Grund 4, 34317 Habichtswald, Germany
- [dirk.meyer@dinoex.sub.org],[dirk.meyer@guug.de],[dinoex@FreeBSD.org]
Comment 9 dirk.meyer 2002-06-07 05:45:54 UTC
> Have you been able to determine if the patch solved the problem you were
> experiencing yet?

Thanks, the patches do help ...
I can confirm that I see these syslog lines:

May 22 22:38:12 ceres /kernel: vr0: watchdog timeout
Jun  6 06:06:24 ceres /kernel: vr0: watchdog timeout

kind regards Dirk

- Dirk Meyer, Im Grund 4, 34317 Habichtswald, Germany
- [dirk.meyer@dinoex.sub.org],[dirk.meyer@guug.de],[dinoex@FreeBSD.org]
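
Note: each "watchdog timeout" line in Comment 9 marks one automatic reset by the driver, so counting them shows how often the hang still occurs. A minimal sketch, assuming the message format shown in Comment 9; count_watchdog is a hypothetical helper name:

```sh
# Read syslog text on stdin and print the number of watchdog timeouts
# logged for the given interface.
count_watchdog() {
    grep -c "${1:-vr0}: watchdog timeout"
}
```

Usage would be something like "count_watchdog vr0 < /var/log/messages", assuming kernel messages are logged to that file on the machine in question.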
Comment 10 silby freebsd_committer freebsd_triage 2002-06-07 06:41:41 UTC
State Changed
From-To: feedback->closed

The recent changes to the vr driver effectively fix 
the problem of the submitter of this PR.