Bug 248324 - qlxge: stability issues (unusable) with QLogic QLE8142 NIC
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
Reported: 2020-07-28 13:45 UTC by Kira
Modified: 2023-01-01 07:08 UTC
Description Kira 2020-07-28 13:45:18 UTC
Short: QLogic QLE8142 converged network adapters do not work properly on FreeBSD (10.2 and 12.1 tested).
I see low speeds and connection dropouts.
The same cards work perfectly on Linux (qlge driver).
It was suggested to me that this may be caused by a mismatched MTU size somewhere.
I also suspect it may be related to flow control and media detection.


Long: I have two 10G QLogic converged network cards:
QLogic QLE8142-IBMX (42C1802)
QLogic QLE8142-SR-IBM (46K8088)
They are basically the same (they look and work the same). Both are flashed with the latest firmware from QLogic support; other firmware versions have no effect on the unstable behavior.

They are transceiver-specific, and I use them with QLogic FLTX8571D3BCL-QL optical transceivers (it seems they support only fiber).

I am testing them with iperf3.

I set up a direct fiber link between two NICs in different PCs running different OSes,
like this: (ifconfig ql0 inet 192.168.10.1/24) on the first PC and (192.168.10.2/24) on the second.


On FreeBSD the card is identified as:
ql0: <Qlogic ISP 8000 PCI CNA Adapter-Ethernet Function v2.0.0> port 0xe300-0xe3ff mem 0xfea0c000-0xfea0ffff,0xfe800000-0xfe8fffff at device 0.0 on pci1
ql0: 
qls_pci_attach: ha 0xfffffe0000696000 pci_func 0x0  msix_count 0x1 pci_reg 0xfffff80003720700 pci_reg1 0xfffff80003720600
ql0: Ethernet address: 00:c0:dd:26:25:80


On Linux:
qlge 0000:01:00.0: QLogic 10 Gigabit PCI-E Ethernet Driver 
qlge 0000:01:00.0: Driver name: qlge, Version: 1.00.00.35.
qlge 0000:01:00.0 eth1: Link is down.
qlge 0000:01:00.0 eth1: Clearing MAC address
qlge 0000:01:00.0 eth1: Function #0, Port 0, NIC Roll 0, NIC Rev = 1, XG Roll = 0, XG R>
qlge 0000:01:00.0 eth1: MAC address 00:c0:dd:26:25:80


What works:
Linux (Arch and Debian tested) with the qlge kernel module (https://cateee.net/lkddb/web-lkddb/QLGE.html).
QLE8142 -> fiber -> any transceiver (including the QLogic one) + any 10G card on any OS (except a QLE8142 card on FreeBSD).
Both mtu 9000 and mtu 1500.
Everything works well: I see the link saturated and no errors. Different offload settings have little effect.

What does not work:
A QLE8142 card on FreeBSD with any card on the other end (including a QLE8142 on Linux).
FreeBSD with the qlxge kernel module (https://www.freebsd.org/cgi/man.cgi?query=qlxge); FreeBSD 10.2 and FreeBSD 12.1 tested.

I get unstable behavior:
low speed, or the connection drops out when a test starts.

Because it is unstable, it is hard to describe exactly what is going on.

One example:
iperf3 connecting from the Linux box to FreeBSD 12.1 with the QLE8142: all seems fine.
iperf3 connecting from FreeBSD 12.1 with the QLE8142 to the Linux box: low, unstable speed (2Gb/s - 5Gb/s), then it drops to zero. The link seems to stay active, but no new connections get through at all (for some time).

The -tso setting affects this somehow, but the unstable behavior only changes form.
On FreeBSD 10.2 with -tso and mtu 1500 the card can work stably for some time, but not always.
With mtu 9000 it is unstable and connections drop.


Another example:
Testing two directly connected QLE8142 cards, one on Linux and the other on FreeBSD 12.1.
All defaults, mtu 1500.
I get 8.2Gb/s from Linux to BSD and 1Gb/s from BSD to Linux.

When I disable TSO on FreeBSD, I get 5Gb/s from BSD to Linux,
and 4Gb/s in both directions simultaneously.

Then I set mtu 9000 on both ends:
From Linux to BSD - the connection drops.
From BSD to Linux - 2Gb/s.

With mtu 9000 and TSO disabled on the FreeBSD end:
From Linux to BSD - the connection drops.
From BSD to Linux - 9.5Gb/s.
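
The combinations above (mtu 1500 and 9000, TSO on and off, both directions) could be walked through with a small script on the FreeBSD end. This is only a sketch: the interface name and peer address are taken from the report, while the 10-second test duration, the DRY_RUN switch and the helper names run and test_matrix are assumptions made for illustration. It expects "iperf3 -s" to be running on the Linux box, and the MTU on the Linux end has to be changed by hand to match.

```shell
#!/bin/sh
# Sketch of the test matrix, run on the FreeBSD side.
# IFACE and PEER match the report; the duration and DRY_RUN are assumptions.
IFACE="${IFACE:-ql0}"
PEER="${PEER:-192.168.10.2}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

test_matrix() {
    for mtu in 1500 9000; do
        for tso in tso -tso; do
            run ifconfig "$IFACE" mtu "$mtu"
            run ifconfig "$IFACE" "$tso"       # "tso" enables, "-tso" disables
            run iperf3 -c "$PEER" -t 10        # FreeBSD -> Linux
            run iperf3 -c "$PEER" -t 10 -R     # Linux -> FreeBSD (reverse mode)
        done
    done
}

test_matrix
```

With the default DRY_RUN=1 the script only lists the 16 commands it would run; setting DRY_RUN=0 (as root) executes them.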


FreeBSD logs messages such as:
ql0: qls_mbx_get_link_status 0x00004000 0x10000051 0x00000000 0x00000037 0x000000f9 0x05050504
ql0: qls_hw_send: tx_free[0] = 2
ql0: qls_mbx_get_link_status 0x00004000 0x10000051 0x00000000 0x00000037 0x000000fd 0x05050504
ql0: qls_mbx_isr: AEN [0x00008012 0x00000060 0x00000037 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000 0x00000000]
ql0: qls_rx_comp: DS bit not set 
ql0: qls_rx_comp: (rxb->paddr != cq_e->b_paddr)[0x55270000, 0x5527c800] 
ql0: qls_rx_comp: (rxb->paddr != cq_e->b_paddr)[0x55270000, 0x35b3cc000] 
ql0: qls_rx_comp: (rxb->paddr != cq_e->b_paddr)[0x55270000, 0x53fe4000]
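
To see which of these messages dominate during a test run, the log can be tallied per driver function. A minimal sketch, assuming the messages end up in a plain-text log such as /var/log/messages; the function name summarize_ql_log and the IFNAME variable are made up for illustration.

```shell
# Count ql0 kernel messages per driver function (qls_*), most frequent first.
# $1: log file to scan. IFNAME (assumed variable) selects the interface.
summarize_ql_log() {
    awk -v ifn="${IFNAME:-ql0}:" '
        {
            # Find the "ql0:" token; the next word is the function name,
            # with or without a trailing colon.
            for (i = 1; i < NF; i++) {
                if ($i == ifn && $(i + 1) ~ /^qls_[a-z_]+:?$/) {
                    name = $(i + 1)
                    sub(/:$/, "", name)
                    count[name]++
                    break
                }
            }
        }
        END { for (f in count) print count[f], f }
    ' "$1" | sort -k1,1nr -k2
}
```

Usage: summarize_ql_log /var/log/messages. It also handles lines that carry a syslog prefix (date, host, "kernel:") in front of "ql0:".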


FreeBSD ifconfig -m ql0 output:
ql0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=c013b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4,VLAN_HWTSO,LINKSTATE>
	capabilities=c013b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4,VLAN_HWTSO,LINKSTATE>
	ether 00:c0:dd:26:25:80
	inet 192.168.10.1 netmask 0xffffff00 broadcast 192.168.10.255
	media: Ethernet autoselect (10Gbase-SR <full-duplex>)
	status: active
	supported media:
		media autoselect
		media autoselect mediaopt full-duplex
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
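
Since the behavior changes with TSO, it helps to confirm which offloads are actually enabled before each run. A small sketch that pulls the enabled capability names out of ifconfig output; the helper name list_caps is an assumption, and it relies only on the "options=...<...>" line format shown above.

```shell
# Print the enabled interface capabilities, one per line.
# Reads `ifconfig <interface>` output on stdin; the "nd6 options" line is skipped
# because only a line that starts with "options=" is matched.
list_caps() {
    sed -n 's/^[[:space:]]*options=[0-9a-f]*<\(.*\)>$/\1/p' | tr ',' '\n'
}
```

Usage: ifconfig ql0 | list_caps | grep -x TSO4 prints TSO4 while TSO is enabled and nothing after "ifconfig ql0 -tso".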



From Linux:
sudo ethtool enp1s0f0

netlink error: No such file or directory
Settings for enp1s0f0:
	Supported ports: [ FIBRE ]
	Supported link modes:   10000baseT/Full 
	Supported pause frame use: No
	Supports auto-negotiation: No
	Supported FEC modes: Not reported
	Advertised link modes:  10000baseT/Full 
	Advertised pause frame use: No
	Advertised auto-negotiation: No
	Advertised FEC modes: Not reported
	Speed: 10000Mb/s
	Duplex: Full
	Port: FIBRE
	PHYAD: 0
	Transceiver: internal
	Auto-negotiation: off
	Supports Wake-on: d
	Wake-on: d
	Current message level: 0x000060f7 (24823)
			       drv probe link ifdown ifup rx_err tx_err hw wol
	

I can provide SSH access to the test environment (email me).
Comment 1 adamstouffer13 2023-01-01 07:08:45 UTC
I'm seeing a similar issue with a QLE8140 card on 13.1-RELEASE amd64. The difference is that the card is stable but fills the logs with error messages. iperf shows good throughput, as much as my switch allows. The card also works in Linux and Windows.


ql0: <Qlogic ISP 8000 PCI CNA Adapter-Ethernet Function v2.0.0> port 0xe100-0xe1ff mem 0xfb384000-0xfb387fff,0xfb200000-0xfb2fffff irq 32 at device 0.0 on pci3


ql0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=c013b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4,VLAN_HWTSO,LINKSTATE>
        capabilities=c013b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,TSO4,VLAN_HWTSO,LINKSTATE>
        ether 00:c0:dd:12:ae:90
        inet 192.168.1.32 netmask 0xffffff00 broadcast 192.168.1.255
        media: Ethernet autoselect (10Gbase-SR <full-duplex>)
        status: active
        supported media:
                media autoselect
                media autoselect mediaopt full-duplex
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>




Dec 31 20:55:43 nas kernel: ql0: qls_mbx_isr: AEN [0x00008110 0x08000704 0x00100041 0x00100041 0x00000003 0x00000000 0x00000000 0x00000000 0x00000000]
Dec 31 20:56:13 nas kernel: ql0: qls_mbx_isr: AEN [0x00008110 0x08000704 0x00100042 0x00100042 0x00000003 0x00000000 0x00000000 0x00000000 0x00000000]
Dec 31 20:56:43 nas kernel: ql0: qls_mbx_isr: AEN [0x00008110 0x08000704 0x00100043 0x00100043 0x00000003 0x00000000 0x00000000 0x00000000 0x00000000]
Dec 31 20:57:13 nas kernel: ql0: qls_mbx_isr: AEN [0x00008110 0x08000704 0x00100044 0x00100044 0x00000003 0x00000000 0x00000000 0x00000000 0x00000000]
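
In the four lines quoted above the AEN messages arrive 30 seconds apart, and the third and fourth words go up by one each time (0x00100041 to 0x00100044); what those words mean is not stated here. To check whether the spacing holds over a longer log, the gaps can be computed with a short awk sketch. The function name aen_intervals is made up, and it assumes the standard syslog timestamp in the third field.

```shell
# Print the gap in seconds between consecutive "qls_mbx_isr: AEN" log lines.
# $1: log file in syslog format ("Dec 31 20:55:43 host kernel: ql0: ...").
aen_intervals() {
    awk '
        / qls_mbx_isr: AEN / {
            split($3, t, ":")                       # HH:MM:SS
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen) {
                d = now - prev
                if (d < 0) d += 86400               # message after midnight
                print d
            }
            prev = now
            seen = 1
        }
    ' "$1"
}
```

Usage: aen_intervals /var/log/messages | sort -n | uniq -c shows how often each interval occurs.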