Bug 108861

Summary: [nve] nve(4) driver on FreeBSD 6.2 AMD64 does not work at 1Gbps with nForce4 NIC
Product: Base System Reporter: Lawrence Stewart <lstewart>
Component: amd64 Assignee: freebsd-amd64 (Nobody) <amd64>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Lawrence Stewart 2007-02-07 03:20:18 UTC
The research centre I work at recently purchased a new file server, built from commodity hardware that FreeBSD claimed to have support for.

Motherboard is an Asus A8N-E Rev 2.00 (listed as supported here: http://www.freebsd.org/platforms/amd64/motherboards.html).

I clean installed FreeBSD 6.2 from ISO 6.2-RELEASE-amd64-disc1.iso.

The onboard nVidia nForce4 nve(4) based NIC was plugged into the university's Cisco based gigabit ethernet LAN via CAT5e straight-through patch lead.

An Intel 82540EM em(4) based PCI NIC was also installed in the server, but was not plugged into a network point (we previously tried installing FreeBSD 6.1 which had no support for the onboard NIC, so we put the Intel card in to use. We left it in for the new test with FreeBSD 6.2).

The install completed successfully. We didn't configure either NIC during install at all (well... that's not entirely true the first time through... but I'll save the details of that crazy story for later as it will confuse things for the moment).

After reboot, we brought the NIC up using "ifconfig nve0 up" and it synced at 1000baseTX full-duplex. Tcpdumping on the interface showed no activity at all (even though the uni network is VERY chatty, so multiple packets per second should have been showing). Running "dhclient nve0" showed outbound DHCP requests, but no return packets.

Forcibly syncing the interface to 1000baseTX using "ifconfig nve0 media 1000baseTX mediaopt full-duplex" resulted in the same behaviour, i.e. it doesn't "see" inbound packets. Forcibly syncing the interface to 100baseTX suddenly shows packets being received in tcpdump, and we can use the interface, though it only achieves about 80kb/sec on transfers, i.e. it is severely restricted for some reason.

Intrigued, I installed a small 5 port 100baseTX switch between the server and network port and set the interface back to autoselect using "ifconfig nve0 media autoselect". The nve(4) interface synced at 100baseTX, and could then transfer at full 100Mbps speed (scp @ 11MB/sec from another local computer).
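
The checks described above can be sketched as a small script. This is only a sketch under stated assumptions: the run helper, the check_link function and the IFACE/DRY_RUN variables are names made up for illustration. The ifconfig and tcpdump invocations are the ones quoted in this report, with tcpdump's standard -n, -c and -i options added. By default it only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the link checks from this report.
# IFACE and DRY_RUN are illustrative names, not part of the original report.
IFACE="${IFACE:-nve0}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = execute them

run() {
    # Print the command; execute it only when DRY_RUN=0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

check_link() {
    run ifconfig "$IFACE" up
    # Inspect the "media:" and "status:" lines of the output.
    run ifconfig "$IFACE"
    # Force 100baseTX, as tried in the report.
    run ifconfig "$IFACE" media 100baseTX mediaopt full-duplex
    # Return to autonegotiation.
    run ifconfig "$IFACE" media autoselect
    # Blocks until 10 packets arrive; interrupt it if none do.
    run tcpdump -n -c 10 -i "$IFACE"
}
```

Calling check_link with the defaults lists the five commands prefixed with "+ ". Running it as root with DRY_RUN=0 executes them against the interface named in IFACE.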

I tried resolving the issue by setting "hw.pci.enable_msi=0" and "hw.pci.enable_msix=0" in /boot/loader.conf as suggested for fixing em(4) related problems. It had no effect from what I could see. I also tried rebuilding and installing a GENERIC kernel, GENERIC kernel without nve driver and kldloading it after startup, GENERIC+SMP kernel... none of the different kernel configs seemed to change anything.

I also might add that I tried getting the em(4) Intel NIC working when I realised the nve(4) NIC was going to be a pain, and I had no luck, even with the disabling of MSI in /boot/loader.conf. I didn't try any harder than that though because my main focus was seeing if the nve driver was fixed.

Fix: 

I could not find a fix for the problem.

I also could not figure out why explicitly setting the NIC to sync at 100baseTX full-duplex made the NIC work, but only at a fraction of 100Mbps capacity.

A workaround is to cause the switch port the NIC is plugged into to only allow 100baseTX, so that the NIC will autoselect at 100baseTX and work at reasonable speed. You could also do as I did and stick a small 100baseTX switch in between the machine and gigabit switch.
How-To-Repeat: Patch the onboard ethernet NIC on an Asus A8N-E Rev 2.00 motherboard into a 1Gbps switch port.

Install FreeBSD 6.2 from ISO 6.2-RELEASE-amd64-disc1.iso.

Configure the NIC to autoselect its sync speed using "ifconfig nve0 media autoselect".

Bring the NIC up using "ifconfig nve0 up"

It should sync at 1000baseTX full-duplex, but not be able to communicate at all via the interface.
Comment 1 aturetta+bsd 2007-02-07 14:43:52 UTC
Could you please test the alternative driver mentioned at

http://www.se.hiroshima-u.ac.jp/~shigeaki/software/freebsd-nfe.html

It's already been committed to -current, and I'd certainly hope it will be
MFCed to 6-STABLE soon.

Angelo Turetta
Comment 3 Scot Hetzel 2007-02-07 22:58:36 UTC
On 2/6/07, Lawrence Stewart <lstewart@room52.net> wrote:
> I could not find a fix for the problem.
>
> I also could not figure out why explicitly setting the NIC to sync at 100baseTX full-duplex made the NIC work, but only at a fraction of 100Mbps capacity.
>
> A workaround is to cause the switch port the NIC is plugged into to only allow 100baseTX, so that the NIC will autoselect at 100baseTX and work at reasonable speed. You could also do as I did and stick a small 100baseTX switch in between the machine and gigabit switch.
>
Have you tried forcing the switch and the NIC to 1000baseT Full Duplex?

Sometimes auto-negotiation fails, so forcing the speed should fix it.

Scot
-- 
DISCLAIMER:
No electrons were maimed while sending this message. Only slightly bruised.
Comment 4 Lawrence Stewart 2007-02-08 02:56:21 UTC
Hi Angelo,

Thanks very much for the reply.

Angelo Turetta wrote:
> Could you please test the alternative driver mentioned at
>
> http://www.se.hiroshima-u.ac.jp/~shigeaki/software/freebsd-nfe.html
>
> It's already been committed to -current, and I'd certainly hope it will be
> MFCed to 6-STABLE soon.
>
> Angelo Turetta
>   


The new driver appears to be working perfectly. I'm running it through 
some basic stress tests now, but everything so far is all good. It can 
sync at both 100baseTX and 1000baseTX. When in 100baseTX mode, I can get 
11.1 MB/sec over scp. When in 1000baseTX I can get 24 MB/sec over scp. 
This was only transferring from a low spec desktop machine, so I think 
the machine's disk drive was the bottleneck, not the NIC.
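
As a rough check on the bottleneck reasoning above, the scp figures can be compared with the raw line rate. This is only a sketch: link_utilisation is a made-up helper name, both scp figures are treated as megabytes per second (an assumption), and protocol and ssh overhead are ignored, so the real ceiling is somewhat lower than the raw rate.

```shell
# link_utilisation MBYTES_PER_SEC LINK_MBIT_PER_SEC
# Prints the observed rate as a percentage of the raw line rate.
# Ignores Ethernet/IP/TCP/ssh overhead, so real ceilings are lower.
link_utilisation() {
    awk -v mb="$1" -v link="$2" \
        'BEGIN { printf "%.1f\n", (mb * 8) / link * 100 }'
}

link_utilisation 11.1 100     # 100baseTX figure quoted above
link_utilisation 24 1000      # 1000baseTX figure quoted above
```

This prints 88.8 and 19.2. The 100 Mbps link is close to saturated, while the gigabit link is mostly idle, which is consistent with something other than the NIC limiting the faster transfer.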

For the benefit of anyone else that comes across this thread and wants 
to know how to get the new driver working, here are the steps I followed 
(lines beginning with a hyphen "-" are comments, not actual shell commands):



cd /root

fetch http://www.se.hiroshima-u.ac.jp/~shigeaki/software/nfe-20070106.tar.gz


- You may need a different patch to the one below depending on the 
motherboard you have... this patch is the right one for my Asus A8N-E, 
but all the details are on the patch website 
http://www.se.hiroshima-u.ac.jp/~shigeaki/software/freebsd-nfe.html

fetch http://www.se.hiroshima-u.ac.jp/~shigeaki/software/e1000phy.20061219.fbsd62.patch

tar -xzvf nfe-20070106.tar.gz

cp -r nfe-20070106 /usr/src/sys/dev/nfe

cp e1000phy.20061219.fbsd62.patch /usr/src/sys/dev/mii/

cd /usr/src/sys/dev/mii/

patch < e1000phy.20061219.fbsd62.patch

cd /usr/src/sys/amd64/conf/

- Not sure that you need to do the next step, but I thought it would be 
safer to remove the nve(4) driver from the kernel so it didn't get 
confused... you can still kldload it later if needed

edit GENERIC

- Comment out the line "device          nve             # nVidia nForce 
MCP on-board Ethernet Networking"



rm -rf ../compile/GENERIC

config GENERIC

cd ../compile/GENERIC

make cleandepend && make depend && make && make install

cd /usr/src/sys/dev/nfe

make

make install

shutdown -r now


- You should now have if_nfe.ko in /boot/kernel/ and you can kldload it 
using "kldload if_nfe" or load it at startup by sticking 
if_nfe_load="YES" in /boot/loader.conf.

- Good times from here on in!
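
For the last step, a minimal sketch of adding the loader.conf line without duplicating it on repeated runs. ensure_loader_line is a made-up helper name for illustration; the file path and the if_nfe_load="YES" line are the ones given above.

```shell
# ensure_loader_line FILE LINE
# Append LINE to FILE unless an identical line is already present.
ensure_loader_line() {
    file="$1"
    line="$2"
    if [ -f "$file" ] && grep -qxF -e "$line" "$file"; then
        return 0    # already configured, nothing to do
    fi
    printf '%s\n' "$line" >> "$file"
}

# Example (as root, on the machine being configured):
#   ensure_loader_line /boot/loader.conf 'if_nfe_load="YES"'
#   kldstat | grep if_nfe    # after reboot, confirm the module is loaded
```

The example calls are left as comments so that nothing is written to /boot/loader.conf until you run them deliberately.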




Thanks again for the help.


Regards,
Lawrence Stewart
Comment 5 Lawrence Stewart 2007-02-08 05:05:14 UTC
Hi Scot,

Thanks for the reply.

Scot Hetzel wrote:
> On 2/6/07, Lawrence Stewart <lstewart@room52.net> wrote:
>> I could not find a fix for the problem.
>>
>> I also could not figure out why explicitly setting the NIC to sync at 
>> 100baseTX full-duplex made the NIC work, but only at a fraction of 
>> 100Mbps capacity.
>>
>> A workaround is to cause the switch port the NIC is plugged into to 
>> only allow 100baseTX, so that the NIC will autoselect at 100baseTX 
>> and work at reasonable speed. You could also do as I did and stick 
>> a small 100baseTX switch in between the machine and gigabit switch.
>>
> Have you tried forcing the switch and the NIC to 1000baseT Full 
> Duplex?
>
> Sometimes auto-negotiation fails, so forcing the speed should fix it.
>
I don't have access to the actual Cisco switches, as they are maintained 
by our IT services department. Forcibly setting the switch to 1000 Mbps 
is kind of beside the point anyway. The problem is that when the NIC is 
autosensing (default state) and plugged into an autonegotiating 1 Gbps 
switch port (default for all switches I've ever worked with), the driver 
appears to be functional for all intents and purposes, except for the fact 
that no packets can be received. That is confusing for the user and is 
the major problem with the nve(4) driver in its current state. Having to 
put a call into the IT dept and trying to explain that you need them to 
manually set the sync speed on a port is unlikely to be well received... 
it's also simply another workaround, rather than a fix.

Regards,
Lawrence Stewart
Comment 6 David E. O'Brien freebsd_committer freebsd_triage 2008-02-05 18:00:11 UTC
State Changed
From-To: open->suspended

It is hard to do much with nve due to it being mostly a binary blob. 
In FreeBSD 7.0, one should use nfe(4) instead of nve(4).
Comment 7 Andriy Gapon freebsd_committer freebsd_triage 2010-12-05 10:01:48 UTC
Lawrence,

how do you feel about this PR?
Did David's suggestion work for you?

-- 
Andriy Gapon
Comment 8 Lawrence Stewart 2010-12-13 04:31:09 UTC
On 12/05/10 21:01, Andriy Gapon wrote:
> 
> Lawrence,
> 
> how do you feel about this PR?
> Did David's suggestion work for you?
> 

We've been running with nfe on those machines since shortly after I
logged that PR and the machines have been running (and continue to run)
stably since. The PR should be closed.

Cheers,
Lawrence
Comment 9 Andriy Gapon freebsd_committer freebsd_triage 2010-12-13 10:34:03 UTC
State Changed
From-To: suspended->closed

Closing per originator's request: nfe is a better replacement 
for proprietary nve.