Bug 193152

Summary: mlxen driver problem with MT26448 interface
Product: Base System
Reporter: gnoma <gnoma_86>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Open ---    
Severity: Affects Some People
CC: emaste, trasz
Priority: ---    
Version: 10.0-RELEASE   
Hardware: amd64   
OS: Any   

Description gnoma 2014-08-30 16:29:23 UTC
Hello,

I had a problem using the mlxen driver from the ports tree. I built it into the kernel and also tried loading it dynamically; the issue is the same either way.
When I start an iSCSI daemon (I tried ctld and istgt, same issue), I cannot connect to the iSCSI target from the initiator, and I get the following messages in syslog:

Aug 29 11:15:19 sentinel kernel: WARNING: 10.0.80.2 (iqn.1991-05.com.unixhomenet:hola-pc): no ping reply (NOP-Out) after 5 seconds; dropping connection
Aug 29 11:15:51 sentinel kernel: cfiscsi_ioctl_handoff: new connection from iqn.1991-05.com.unixhomenet:hola-pc (10.0.80.2) to iqn.sentinel.deltanews.lan:test123
Aug 29 11:15:58 sentinel kernel: WARNING: 10.0.80.2 (iqn.1991-05.com.unixhomenet:hola-pc): no ping reply (NOP-Out) after 5 seconds; dropping connection
Aug 29 11:16:01 sentinel kernel: cfiscsi_ioctl_handoff: new connection from iqn.1991-05.com.unixhomenet:hola-pc (10.0.80.2) to iqn.sentinel.deltanews.lan:test123
Aug 29 11:16:21 sentinel kernel: WARNING: 10.0.80.2 (iqn.1991-05.com.unixhomenet:hola-pc): no ping reply (NOP-Out) after 5 seconds; dropping connection
Aug 29 11:17:15 sentinel kernel: cfiscsi_ioctl_handoff: new connection from iqn.1991-05.com.unixhomenet:hola-pc (10.0.80.2) to iqn.sentinel.deltanews.lan:test123
Aug 29 11:17:20 sentinel kernel: WARNING: 10.0.80.2 (iqn.1991-05.com.unixhomenet:hola-pc): no ping reply (NOP-Out) after 5 seconds; dropping connection


I also noticed some packet loss:

root@sentinel:/var/log # ping -f 10.0.80.2
PING 10.0.80.2 (10.0.80.2): 56 data bytes
.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................^C
--- 10.0.80.2 ping statistics ---
168461 packets transmitted, 167952 packets received, 0.3% packet loss
round-trip min/avg/max/stddev = 0.020/0.031/0.360/0.011 ms
root@sentinel:/var/log #


Switching the iSCSI traffic to a standard Intel 1 Gb interface, or testing that connection with a flood ping, causes no trouble:

root@sentinel:/var/log # ping -f 192.168.2.101
PING 192.168.2.101 (192.168.2.101): 56 data bytes
.^C
--- 192.168.2.101 ping statistics ---
44140 packets transmitted, 44139 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.105/0.303/0.466/0.062 ms
root@sentinel:/var/log #

Building the latest version of the driver from the Mellanox website (http://www.mellanox.com/downloads/Drivers/MLNX_EN_FreeBSD_v2.1.tgz) fixed the iSCSI issue, but the flood ping still reports some packet loss.
Also, loading the Mellanox driver dynamically during boot via loader.conf seems to fail, so I had to put the kldload command in rc.local or in /etc/rc.d/ctld to make sure the driver is loaded at boot and the interface is configured before the iSCSI target daemon starts.
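For readers following along, the two loading approaches described above might look roughly like the sketch below. The module name `mlxen` is an assumption taken from the driver name in this report; the module built from the vendor tarball may be named differently, so check the actual .ko file name first.

```shell
# --- /boot/loader.conf (the approach reported as failing) ---
# Standard loader.conf convention: <module>_load="YES".
# "mlxen" is an assumed module name.
mlxen_load="YES"

# --- /etc/rc.local (the workaround described above) ---
# Load the driver late in boot instead of from the loader.
kldload mlxen

# Check that a Mellanox module is actually loaded.
kldstat | grep mlx
```

With the rc.local approach, the interface still has to be configured before the iSCSI target daemon starts, as noted above.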

I can also provide tcpdumps if needed.
Comment 1 Edward Tomasz Napierala freebsd_committer freebsd_triage 2014-09-09 14:46:33 UTC
Looks like a generic ethernet driver problem causing packet loss and thus iSCSI session drops.
Comment 2 gnoma 2014-09-10 07:01:57 UTC
Hello,

I found that even when I ping my own IP, I still get packet loss:

root@sentinel:~ # ifconfig | grep 10.0. |grep inet
	inet 10.0.80.1 netmask 0xffffff00 broadcast 10.0.80.255 
root@sentinel:~ # 
root@sentinel:~ # ping -f 10.0.80.1
PING 10.0.80.1 (10.0.80.1): 56 data bytes
.................................................................................................................................................................................................................................................^C
--- 10.0.80.1 ping statistics ---
838 packets transmitted, 597 packets received, 28.8% packet loss
round-trip min/avg/max/stddev = 0.021/0.028/0.058/0.006 ms
root@sentinel:~ # 

This is really strange.

I read that with the PF firewall it is recommended to use at most a 1 Gb interface. However:

root@sentinel:~ # /etc/rc.d/pf stop
Disabling pf.
root@sentinel:~ # ping -f 10.0.80.1
PING 10.0.80.1 (10.0.80.1): 56 data bytes
...........................................................................................................................................................................................................................................^C
--- 10.0.80.1 ping statistics ---
832 packets transmitted, 597 packets received, 28.2% packet loss
round-trip min/avg/max/stddev = 0.028/0.041/0.109/0.009 ms
root@sentinel:~ #

Even without PF I am still losing packets to my own interface.
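As a cross-check on the figures above, the loss percentage that ping prints can be recomputed from the transmitted and received counts. This is only a sketch; `loss_pct` is a hypothetical helper name, not part of ping or FreeBSD.

```shell
# loss_pct TRANSMITTED RECEIVED
# Prints packet loss as a percentage with one decimal place.
loss_pct() {
    awk -v tx="$1" -v rx="$2" 'BEGIN {
        if (tx == 0) { print "0.0"; exit }   # avoid dividing by zero
        printf "%.1f\n", (tx - rx) * 100 / tx
    }'
}

loss_pct 838 597    # run with pf enabled
loss_pct 832 597    # run with pf disabled
```

Both runs received exactly 597 replies out of a little over 830 requests, which may point to a cap on replies more than to random loss on the wire.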
Comment 3 gnoma 2014-09-28 14:42:44 UTC
Update: 
The packet loss seems to be caused by this kernel parameter:

net.inet.icmp.icmplim: 200


This value appears to be quite normal. Setting icmplim to a few thousand makes the flood ping packet loss go away. However, the issue with iSCSI running on ctld remains.
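A minimal sketch of the change described above, run as root. The value 2000 is only an illustration of "a few thousand"; the exact value used in this report was not stated.

```shell
# Show the current limit on ICMP responses per second (200 in this report).
sysctl net.inet.icmp.icmplim

# Raise the limit for the duration of a flood ping test (illustrative value).
sysctl net.inet.icmp.icmplim=2000

# Put the original value back afterwards.
sysctl net.inet.icmp.icmplim=200
```

A change made this way lasts until reboot; adding the same setting to /etc/sysctl.conf would make it persistent.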
Comment 4 Edward Tomasz Napierala freebsd_committer freebsd_triage 2014-09-28 18:29:23 UTC
You mean, you no longer suffer packet loss, but you still have iSCSI sessions disconnected due to timeouts?  Could you try disabling TSO on that interface?
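For reference, disabling TSO as suggested above could be done along these lines. The interface name `mlxen0` is an assumption; it does not appear in this report, so substitute whatever `ifconfig -l` shows for the Mellanox port.

```shell
# Assumption: the Mellanox port is mlxen0; check with `ifconfig -l`.
IFACE=mlxen0

# Turn off TCP segmentation offload on that interface.
ifconfig "$IFACE" -tso

# Confirm that TSO4/TSO6 no longer appear in the options line.
ifconfig "$IFACE" | grep options
```

To keep the setting across reboots, `-tso` can be appended to the interface's `ifconfig_<name>` line in /etc/rc.conf.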
Comment 5 gnoma 2014-10-09 11:20:50 UTC
Hello,

Disabling TSO did the trick; iSCSI is now working normally.

No issues. However, with the driver provided by Mellanox I didn't have this issue.

Without TSO, will I only get a little more CPU load, or will it affect other things?

Thank you.
Comment 6 Edward Tomasz Napierala freebsd_committer freebsd_triage 2014-10-09 18:34:09 UTC
Just CPU load.  Could you check whether the problem persists in 10.1-RC1?  There were some significant TSO fixes.
Comment 7 gnoma 2014-10-10 06:20:31 UTC
I am sorry, this is a production system and I can't upgrade or reinstall the OS. I have a spare system, but I can't remove the 10 Gb LAN card from the production one :(
There's no way I can test it until the next maintenance window, sometime in the next 3 months.

Sorry :(
Comment 8 Eitan Adler freebsd_committer freebsd_triage 2018-05-20 23:53:38 UTC
For bugs matching the following conditions:
- Status == In Progress
- Assignee == "bugs@FreeBSD.org"
- Last Modified Year <= 2017

Do
- Set Status to "Open"