Bug 264059 - mlx4en(4) Mellanox ConnectX-3 10g eth not working in 13.1: mlx4_core0: Unable to determine PCI device chain minimum BW
Summary: mlx4en(4) Mellanox ConnectX-3 10g eth not working in 13.1: mlx4_core0: Unable to determine PCI device chain minimum BW
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Hans Petter Selasky
URL:
Keywords: regression
Duplicates: 264058
Depends on:
Blocks: 264030
Reported: 2022-05-18 03:34 UTC by crb
Modified: 2022-09-27 08:32 UTC
CC: 3 users

See Also:
koobs: mfc-stable13?
koobs: mfc-stable12-


Description crb 2022-05-18 03:34:01 UTC
I had a 13.0 system fully working.  It has a Mellanox 10G Ethernet card (a ConnectX-3, I think) that worked without issue.  I replaced the SSD and installed 13.1, and now the card doesn't work.  It shows up in dmesg:

mlx4_core0: mlx4_shutdown was called
mlx4_core0: <mlx4_core> mem 0xfb700000-0xfb7fffff,0x7fff800000-0x7fffffffff irq 30 at device 0.0 on pci4
mlx4_core: Mellanox ConnectX core driver v3.7.1 (November 2021)
mlx4_core: Initializing mlx4_core
mlx4_core0: Unable to determine PCI device chain minimum BW
mlx4_core0: mlx4_shutdown was called
mlx4_core0: <mlx4_core> mem 0xfb700000-0xfb7fffff,0x7fff800000-0x7fffffffff irq 30 at device 0.0 on pci4
mlx4_core: Mellanox ConnectX core driver v3.7.1 (November 2021)
mlx4_core: Initializing mlx4_core
mlx4_core0: Unable to determine PCI device chain minimum BW
mlx4_core0: mlx4_shutdown was called
mlx4_core0: <mlx4_core> mem 0xfb700000-0xfb7fffff,0x7fff800000-0x7fffffffff irq 30 at device 0.0 on pci4
mlx4_core: Mellanox ConnectX core driver v3.7.1 (November 2021)
mlx4_core: Initializing mlx4_core
mlx4_core0: Unable to determine PCI device chain minimum BW

But it doesn't show up in ifconfig:

crb@eclipse:294> ifconfig -a
em0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=481049b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LRO,VLAN_HWFILTER,NOMAP>
	ether 00:1b:21:1d:12:ad
	inet6 fe80::21b:21ff:fe1d:12ad%em0 prefixlen 64 scopeid 0x1
	inet6 2600:1700:5430:10b1:21b:21ff:fe1d:12ad prefixlen 64 autoconf
	inet 192.168.1.190 netmask 0xffffff00 broadcast 192.168.1.255
	media: Ethernet autoselect (100baseTX <full-duplex>)
	status: active
	nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
igc0: flags=8822<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=4e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6,NOMAP>
	ether 3c:7c:3f:4e:0b:c6
	media: Ethernet autoselect
	status: no carrier
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet 127.0.0.1 netmask 0xff000000
	groups: lo
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

I'm happy to build kernels and debug whatever is necessary to help fix the problem.
Comment 1 crb 2022-05-18 03:55:26 UTC
ASUS TUF GAMING X570-PRO motherboard
AMD 5950 processor
Comment 2 Mark Linimon freebsd_committer freebsd_triage 2022-05-18 04:29:12 UTC
*** Bug 264058 has been marked as a duplicate of this bug. ***
Comment 3 Hans Petter Selasky freebsd_committer freebsd_triage 2022-05-18 06:59:57 UTC
Hi,

Did you try setting the port type to Ethernet ("eth")?

There is a sysctl somewhere; try grepping for it:

sysctl -a | grep mlx4_port
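
If it reports something other than "eth", the port type can usually be switched at runtime. As a sketch, assuming the same OID name that the grep turns up on your system:

sysctl sys.device.mlx4_core0.mlx4_port1=eth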

--HPS
Comment 4 crb 2022-05-18 08:04:44 UTC
It seems to be set already:

crb@eclipse:88> sysctl -a | grep mlx4_port
sys.device.mlx4_core0.mlx4_port1_mtu: 4096
sys.device.mlx4_core0.mlx4_port1: eth
Comment 5 Hans Petter Selasky freebsd_committer freebsd_triage 2022-05-18 08:42:26 UTC
Is mlx4en loaded?

--HPS
Comment 6 crb 2022-05-18 08:44:29 UTC
trimmed:

crb@eclipse:89> kldstat 
Id Refs Address                Size Name
 1   92 0xffffffff80200000  1f30590 kernel
14    1 0xffffffff82bbe000    3ef50 mlx4.ko
Comment 7 Hans Petter Selasky freebsd_committer freebsd_triage 2022-05-18 08:45:34 UTC
I think you need to load mlx4en as well; try:

kldload mlx4en
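
To have the module loaded on every boot, the standard FreeBSD mechanism (not specific to this driver) is a loader tunable in /boot/loader.conf:

mlx4en_load="YES"

or, alternatively, kld_list="mlx4en" in /etc/rc.conf.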

--HPS
Comment 8 Hans Petter Selasky freebsd_committer freebsd_triage 2022-05-19 08:00:38 UTC
This error message is expected and not harmful.

mlx4_core0: Unable to determine PCI device chain minimum BW

--HPS
Comment 9 Hans Petter Selasky freebsd_committer freebsd_triage 2022-05-23 08:02:38 UTC
Ping

--HPS
Comment 10 crb 2022-05-23 08:04:52 UTC
Sorry, I thought I had left a message and closed this.  I was, indeed, missing the module you suggested.  When I loaded it, the machine started working.  Thank you!
Comment 11 Hans Petter Selasky freebsd_committer freebsd_triage 2022-05-23 08:12:34 UTC
Thank you!

Then I'll close this issue.

--HPS
Comment 12 Kubilay Kocak freebsd_committer freebsd_triage 2022-06-02 02:21:54 UTC
^Triage: Assign to the committer that resolved the issue, and set the correct resolution
Comment 13 Michael Meiszl 2022-09-27 08:17:03 UTC
I would not say this issue is "fixed and closed".
Today I installed a new FreeBSD box from a 13.1 memstick, and the installation was not able to configure and use the network because the Mellanox module was missing.

It does not make sense to build in a kernel driver that can detect the card but then forget to load the dependent module too.

So I would like to reopen this bug report and have somebody really think about how to prevent this in the future.
Comment 14 Hans Petter Selasky freebsd_committer freebsd_triage 2022-09-27 08:20:02 UTC
Maybe something can be done by devd and the device driver loading framework.

Currently the mlx4en driver is old and not actively maintained.
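
As an untested sketch of what such a hook could look like, a devd(8) rule roughly along these lines might work (the device-name pattern is an assumption and would need checking against real attach events):

# /etc/devd.conf fragment: load mlx4en whenever an mlx4 device attaches
attach 100 {
	device-name "mlx4_core[0-9]+";
	action "kldload -n mlx4en";
};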
Comment 15 Michael Meiszl 2022-09-27 08:32:24 UTC
(In reply to Hans Petter Selasky from comment #14)
Being old and unmaintained does not really matter as long as it works. I doubt it would do any harm for mlx4en to be loaded automatically, the way the main mlx4 driver is now.
(I don't know why they are split.)

The cards are good and can be bought cheaply second-hand. That allows a low-budget entry into 10GbE territory, so I bet they will stay around for another 10 years or so.