Bug 24811

Summary: Networking in FreeBSD 4.2-RELEASE doesn't allow full-duplex<->half-duplex communication
Product: Base System
Reporter: klui <klui>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description klui 2001-02-02 23:50:01 UTC
I have a FreeBSD 4.2 installation at work over 10baseT at full duplex
communicating with a FreeBSD 4.2 installation at home over 10baseT at
half duplex. The office box is connected to a switch. The remote is
connected to a hub via SDSL. If I copy files from the 4.2 box at work
to the 4.2 box at home, I will inevitably get network stalls. Searching
dejanews reveals that this may be caused by the interaction between a
machine that is connected at full duplex versus one that is connected at
half duplex. The work machine has used the following ethernet cards with
the same results: PCNet-FAST/NCR 53c875 combination card (pcn driver)
and 3Com 3c905B XL (xl driver).

Netmask for the machine at work is 255.255.248.0, netmask for the box
at home is 255.255.252.0.

When the work box is running FreeBSD 3.2-RELEASE, using the lnc driver,
things work fine. If I use my HP 715 workstation running HPUX 10.20
within the same subnet as my work FreeBSD box, I can also transfer files
to my 4.2 box at home without any stalls. 

The box at home is using an Intel PRO/100B Management ethernet card via
the fxp driver.

How-To-Repeat: Use FreeBSD 4.2 and transfer a series of files (around 20, each over
200K in size) between two boxes where one box is connected to a switch
while the other is behind a hub. Refer to full description for more
information.
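
A minimal sketch of the reproduction steps above, assuming a POSIX shell with dd and scp available. The function name make_test_files, the directory /tmp/pr24811, and the host name homebox are hypothetical placeholders, not taken from the report.

```shell
#!/bin/sh
# Hypothetical helper: create COUNT files of SIZE_KB kilobytes each in DIR.
# Defaults follow the report: about 20 files, each over 200K.
make_test_files() {
    dir=$1
    count=${2:-20}
    size_kb=${3:-256}
    mkdir -p "$dir" || return 1
    i=1
    while [ "$i" -le "$count" ]; do
        dd if=/dev/urandom of="$dir/file$i.bin" bs=1024 count="$size_kb" 2>/dev/null || return 1
        i=$((i + 1))
    done
}

# Example use (host name is a placeholder; run from the box behind the switch):
#   make_test_files /tmp/pr24811
#   scp /tmp/pr24811/file*.bin homebox:/tmp/
```

Watching scp's progress output for a "stalled" status while the batch copies should show whether the problem reproduces.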
Comment 1 billf 2001-02-03 00:12:27 UTC
On Fri, Feb 02, 2001 at 03:44:51PM -0800, klui@cup.hp.com wrote:

> >Description:
> I have a FreeBSD 4.2 installation at work over 10baseT at full duplex
> communicating with a FreeBSD 4.2 installation at home over 10baseT at
> half duplex. The office box is connected to a switch. The remote is
> connected to a hub via SDSL. If I copy files from the 4.2 box at work
> to the 4.2 box at home, I will inevitably get network stalls. Searching
> dejanews reveals that this may be caused by the interaction between a
> machine that is connected at full duplex versus one that is connected at
> half duplex. The work machine has used the following ethernet cards with
> the same results: PCNet-FAST/NCR 53c875 combination card (pcn driver)
> and 3Com 3c905B XL (xl driver).

This is the fault of your hub. It is the responsibility of the hub to
translate those; it's possible that your hub simply has two buses
(one half-duplex, one full-duplex) or something like that.

-- 
Bill Fumerola - security yahoo         / Yahoo! inc.
              - fumerola@yahoo-inc.com / billf@FreeBSD.org
Comment 2 klui 2001-02-03 00:24:27 UTC
On Fri, 2 Feb 2001, Bill Fumerola wrote:
> > >Description:
> > I have a FreeBSD 4.2 installation at work over 10baseT at full duplex
> > communicating with a FreeBSD 4.2 installation at home over 10baseT at
> > half duplex. The office box is connected to a switch. The remote is
> > connected to a hub via SDSL. If I copy files from the 4.2 box at work
> > to the 4.2 box at home, I will inevitably get network stalls. Searching
> > dejanews reveals that this may be caused by the interaction between a
> > machine that is connected at full duplex versus one that is connected at
> > half duplex. The work machine has used the following ethernet cards with
> > the same results: PCNet-FAST/NCR 53c875 combination card (pcn driver)
> > and 3Com 3c905B XL (xl driver).
> 
> This is the fault of your hub. It is the responsibility of the hub to
> translate those; it's possible that your hub simply has two buses
> (one half-duplex, one full-duplex) or something like that.

Since it works with another operating system and a prior version of
FreeBSD, I find it difficult to believe my hub is the problem. Could
you tell me how I can determine my hub is at fault? My hub is old
and has a 10base2 connector, which I'm using for other boxes. But my
FreeBSD 4.2 box is using 10baseT/UTP. I have never experienced these
types of stalls until I began using FreeBSD 4.2.


Ken
Comment 3 klui 2001-02-03 00:30:35 UTC
Another thing I want to add: if I push files from my machines at work
to my 4.2 box at home, I get only around half the bandwidth I get when
I push files from my 4.2 box at home to my machines at work. Is this
more indicative of a network/hardware problem outside the scope of
my machines? My SDSL connection's bandwidth should be symmetric.


Ken
Comment 4 Jason.Young 2001-02-15 18:05:22 UTC
If there's a duplex mismatch, your symptoms would seem to indicate it would have
to be on your work machine, between it and your switch. You have proven (with
the transfer from the HPUX machine to your home machine) that your home machine
has reasonable connectivity.

This could definitely be caused by the OS upgrade, since it seems that the
driver plays a very large role in speed and duplex negotiations and you've
changed drivers.

You haven't mentioned how you know that link is 10BaseT full-duplex. Is this an
assumption, or have you set it on the switch personally or otherwise know it's
supposed to be full-duplex? You could quickly check if this is the problem by
doing this:

  ifconfig pcn0 media 10BaseT mediaopt half-duplex

Then test your connectivity again. If that fixes things, the switch wasn't
really giving you full duplex connectivity. If the switch is set to
autonegotiate duplex settings, be aware this tends to fail a lot and you'll
probably need to force the setting to be whichever way you want it or stay half
duplex.
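
One way to confirm what the interface actually reports, before and after forcing the setting, is to read the "media:" line that ifconfig prints. The sketch below assumes that line names full-duplex or half-duplex when the mode is known; the function name duplex_of is hypothetical.

```shell
# Hypothetical helper: read ifconfig output on stdin and print the duplex
# mode named on the first "media:" line: "full", "half", or "unknown".
duplex_of() {
    awk '
        /media:/ {
            if ($0 ~ /full-duplex/)      mode = "full"
            else if ($0 ~ /half-duplex/) mode = "half"
            else                          mode = "unknown"
            print mode
            found = 1
            exit
        }
        END { if (!found) print "unknown" }
    '
}

# Example use on the work machine:
#   ifconfig pcn0 | duplex_of
```

If the reported mode disagrees with what the switch port is set to, that points to a duplex mismatch on that link.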

If autonegotiation worked before in 3.2-RELEASE with the same hardware with the
lnc driver, you may want to send a (very, very, very, very) detailed bug report
to Bill Paul (wpaul@freebsd.org), who maintains that driver.

Jason Young
CNS - Network Design, Anheuser-Busch
(314)577-4597
Comment 5 klui 2001-02-16 21:09:32 UTC
On Thu, 15 Feb 2001, Young, Jason wrote:
> If there's a duplex mismatch, your symptoms would seem to indicate it would
> have to be on your work machine, between it and your switch. You have proven
> (with the transfer from the HPUX machine to your home machine) that your home
> machine has reasonable connectivity.

Hi Jason,

Thanks for your reply.

> This could definitely be caused by the OS upgrade, since it seems that the
> driver plays a very large role in speed and duplex negotiations and you've
> changed drivers.
> 
> You haven't mentioned how you know that link is 10BaseT full-duplex. Is this
> an assumption, or have you set it on the switch personally or otherwise know
> it's supposed to be full-duplex? You could quickly check if this is the
> problem by doing this:
> 
>   ifconfig pcn0 media 10BaseT mediaopt half-duplex

I initially used the pcn driver as is, without the media or mediaopt
switches, and connectivity under 4.2-R was very slow: ping times were
1 second to my mail server outside my subnet, and I couldn't ping my
local LAN HPUX box. The lnc driver works without these options because
it doesn't understand media and mediaopt anyway. Once I used media and
mediaopt full-duplex with pcn, everything worked a lot better than
before, without the bad network delays. I cannot recall what happened
when I used half-duplex, but I either got the bad network delays and
unpingable local LAN boxes or an error message from ifconfig.

My current ping times to my mail box average 5ms under FreeBSD 3.2-R
and lnc.

> Then test your connectivity again. If that fixes things, the switch wasn't
> really giving you full duplex connectivity. If the switch is set to
> autonegotiate duplex settings, be aware this tends to fail a lot and you'll
> probably need to force the setting to be whichever way you want it or stay
> half duplex.
> 
> If autonegotiation worked before in 3.2-RELEASE with the same hardware with
> the lnc driver, you may want to send a (very, very, very, very) detailed bug
> report to Bill Paul (wpaul@freebsd.org), who maintains that driver.
> 
> Jason Young
> CNS - Network Design, Anheuser-Busch
> (314)577-4597

I would assume that autonegotiation worked under 3.2-R because I never
had to do anything and it is currently working. I don't have network
stalls nor does my connection wedge for no reason at all after an
indeterminate amount of time. For my work machine anyway, I have
remained at 3.2-R and will probably wait for 5.0.

I personally feel something in the networking code is broken under
4.2 or is incompatible with certain switch configurations somehow.
The pcn, lnc, and ep drivers don't work correctly with my fxp at
home--they all drop the network connection after a certain amount of
data has been sent through. The baffling part is that they work when
transferring data between my Kayak (pcn, lnc, ep) and two HPUX boxes I
have access to. And those HPUX boxes transfer fine to my fxp 4.2-R
box at home via the WAN.

Bill, is there anything I can provide that can help you?


Ken
-- 
Ken Lui                          3000 Hanover Street
klui@cup.hp.com                  Palo Alto, CA  94304          USA
Hewlett-Packard Company  invent  1.650.236.5364  FAX 1.650.857.2085
   Views within may not be those of the Hewlett-Packard Company
Comment 6 Jason.Young 2001-02-16 23:05:01 UTC
> I would assume that autonegotiation worked under 3.2-R because I never
> had to do anything and it is currently working. I don't have network
> stalls nor does my connection wedge for no reason at all after an
> indeterminate amount of time. For my work machine anyway, I have
> remained at 3.2-R and will probably wait for 5.0.

One thing you can try on your work 4.2-RELEASE installation is running with the
lnc driver instead of pcn. It's still present, but the enhanced pcn driver will
claim your card before lnc if both are present. You can accomplish this by
simply not building the pcn driver into your kernel, or not loading the module.
If you're running GENERIC, you'll need to build a kernel without pcn (but make
sure to keep lnc in). This may help isolate the problem to being the new OS, or
the new driver.

> I personally feel something in the networking code is broken under
> 4.2 or is incompatible with certain switch configurations somehow.
> The pcn, lnc, and ep drivers don't work correctly with my fxp at
> home--they all drop the network connection after a certain amount of
> data has been sent through. The baffling part is that they work when
> transferring data between my Kayak (pcn, lnc, ep) and two HPUX boxes I
> have access to. And those HPUX boxes transfer fine to my fxp 4.2-R
> box at home via the WAN.
> 
> Bill, is there anything I can provide that can help you?

I'm having a lot of trouble sifting info out of these mails you're sending. I'm
not sure if you're still having trouble after the full-duplex setting was forced
or not. It sounds like you may have had two problems, a negotiation failure or
setting problem that was hosing all traffic from your work box, and then
something else after you fixed that that causes communications to break down
between your home and work machines only. Is this the case?

Bill Paul may be able to help with autonegotiation troubles, but autonegotiation
isn't the most reliable thing in the world to begin with. He will need at least
the following before he's able to make any attempt to help you:

1) The brand and model of switch you are connected to at work. If available, its
software revision would be nice.

2) Its settings for your port. Speed and duplex settings. Autonegotiate settings
for speed and duplex, hardcoded or negotiation turned on. Don't guess, don't try
to remember. Find out from the switch itself what it's set for right now.

3) Whatever else you can think of that's relevant.

Bill is a busy guy and really hates problem reports without adequate detail
(just search the mailing list archive). Please make sure to have the above info
ready before talking to him.

Jason Young
CNS - Network Design, Anheuser-Busch
(314)577-4597
Comment 7 klui 2001-02-16 23:41:40 UTC
On Fri, 16 Feb 2001, Young, Jason wrote:
> One thing you can try on your work 4.2-RELEASE installation is running with the
> lnc driver instead of pcn. It's still present, but the enhanced pcn driver will
> claim your card before lnc if both are present. You can accomplish this by
> simply not building the pcn driver into your kernel, or not loading the module.
> If you're running GENERIC, you'll need to build a kernel without pcn (but make
> sure to keep lnc in). This may help isolate the problem to being the new OS, or
> the new driver.

Tried it. Same "results." I will be more clear below.

> I'm having a lot of trouble sifting info out of these mails you're sending. I'm
> not sure if you're still having trouble after the full-duplex setting was forced
> or not. It sounds like you may have had two problems, a negotiation failure or
> setting problem that was hosing all traffic from your work box, and then
> something else after you fixed that that causes communications to break down
> between your home and work machines only. Is this the case?
> 
> Bill Paul may be able to help with autonegotiation troubles, but autonegotiation
> isn't the most reliable thing in the world to begin with. He will need at least
> the following before he's able to make any attempt to help you:

Sorry for the confusion. Here is the sequence of events from my
attempt to install FreeBSD 4.2 on my Kayak XU with PCnet ethernet
card at work:

1. installed 4.2-release
2. pcn driver recognized my ethernet card
3. tried to ping my HPUX machine on my desk and got "no route to host"
4. tried to ping my mail HPUX server and got response times of 1sec.
   tried to ping my 4.2-release w/ fxp driver at home via WAN and got
   high ping times of over 1sec.
5. tried to ssh into my mail server but network delay made it unusable.
6. removed pcn driver and used lnc in the kernel.
7. ping times back to normal and I can ping my HPUX box on my desk.
8. problems occurred when I tried to transfer a batch of files via scp.
   I would get a "stalled" status on files at seemingly random times.
   Each time it stalled, the file had already transferred at least
   100K. I have around 45 files, each over 100K. These network stalls
   would also occur via ftp.
   Also at times, when I'm at home, I would ssh into my 4.2-release
   with fxp box and then ssh into my work box, with the lnc driver. If
   I leave my terminal open and come back to it after around 1/2-1 hour,
   my terminal would be unresponsive. I can open up another ssh session
   and kill the old session, but I would have to manually close my
   initial session's window.
   I've also applied the patch discussed in kern/13062 but it didn't
   fix these problems.
   When I transfer from my work box to my HPUX mail server/workstation,
   things work without any problems.
9. I then tried to use the pcn driver again, but with the media and
   mediaopt switches set to 10baseT and full-duplex.
10. the ping times returned to normal and I was once again able to
    ping my HPUX box, but the network stalls and network freezes remained.
11. installing a 3Com 509B ethernet card produced the same network stalls
    and freezes.

NOTE: the network stalls and freezes only occurred when I was linked
between my work 4.2-release box and my home 4.2-release box.

> 1) The brand and model of switch you are connected to at work. If available, its
> software revision would be nice.
> 
> 2) Its settings for your port. Speed and duplex settings. Autonegotiate settings
> for speed and duplex, hardcoded or negotiation turned on. Don't guess, don't try
> to remember. Find out from the switch itself what it's set for right now.
> 
> 3) Whatever else you can think of that's relevant.
> 
> Bill is a busy guy and really hates problem reports without adequate detail
> (just search the mailing list archive). Please make sure to have the above info
> ready before talking to him.

I will try to get these pieces of information and forward them here.

> Jason Young
> CNS - Network Design, Anheuser-Busch
> (314)577-4597


Ken
-- 
Ken Lui                          3000 Hanover Street
klui@cup.hp.com                  Palo Alto, CA  94304          USA
Hewlett-Packard Company  invent  1.650.236.5364  FAX 1.650.857.2085
   Views within may not be those of the Hewlett-Packard Company
Comment 8 Jeroen Ruigrok van der Werven freebsd_committer freebsd_triage 2001-11-15 19:21:19 UTC
State Changed
From-To: open->analyzed

Move to analyzed state since it was analyzed a bit already. 

Any improvements in 4.4 for this problem by the way?
Comment 9 Doug Barton freebsd_committer freebsd_triage 2002-04-21 23:47:43 UTC
State Changed
From-To: analyzed->closed


Feedback timeout. If this problem still occurs in 4.5-Release or later, 
feel free to file a new PR, and refer to this one when doing so.