Bug 239216

Summary: Problem with ix(4) driver on systems with two Intel X520 cards
Product: Base System Reporter: Sascha <sk>
Component: kern Assignee: Intel FreeBSD <freebsd>
Status: Closed FIXED    
Severity: Affects Some People CC: pkubaj
Priority: --- Keywords: IntelNetworking
Version: CURRENT   
Hardware: amd64   
OS: Any   

Description Sascha 2019-07-15 09:41:30 UTC
May relate to: bug #221317

---------------------------
Symptoms of the error
---------------------------

Network cards become unusable
---------------------------

An ifconfig up/down cycle does not bring the card back online. The card stays in a "lights on - protocol down" state.

The switch shows the following:

Dell Switch S4048-ON # show interfaces Te1/4 link-status
Interface Name                 : TenGigabitEthernet 1/4
PORT Link State                : DOWN
    PMA/PMD RX Status          : Up
    PCS RX Status              : Down
    PHY XS TX Status           : N/A
Dell Switch S4048-ON #

SFP+ plug/unplug does not have any effect.

A soft power cycle may cure the problem, but mostly does not.

Only chassis poweroff helps.

Errors logged to console
ix2: Setup failure - unsupported SFP+ module type.
ix1: Setup failure - unsupported SFP+ module type.
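
To see which ports have hit this state, the console message can be matched in the kernel log. The following is only a sketch: the function name failed_ix_ports is made up for illustration, and it assumes the message text appears exactly as quoted above.

```shell
#!/bin/sh
# Print the names of ix(4) interfaces that logged the SFP+ setup failure.
# Reads log text on stdin, e.g.:  dmesg | failed_ix_ports
# (failed_ix_ports is a hypothetical helper, not part of FreeBSD.)
failed_ix_ports()
{
	sed -n 's/^\(ix[0-9][0-9]*\): Setup failure - unsupported SFP+ module type\.*$/\1/p' | sort -u
}

# Example using the two console lines from this report.
printf '%s\n' \
	'ix2: Setup failure - unsupported SFP+ module type.' \
	'ix1: Setup failure - unsupported SFP+ module type.' | failed_ix_ports
```

On an affected host the input would come from dmesg or /var/log/messages instead of the printf example.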


When does the error occur?
---------------------------

We have a setting with two lagg interfaces, each using two ports from different cards

ifconfig_lagg0="up laggproto lacp laggport ix0 laggport ix2"

ifconfig_lagg1="up laggproto lacp laggport ix1 laggport ix3"

The error occurs because the lagg(4) code shuts down the attached laggports and brings them back up afterwards; this may happen concurrently.

The error does not occur when neither ifconfig_ix* nor ifconfig_lagg* variables are set in /etc/rc.conf. If, in this condition, the lagg interfaces are set up sequentially with several seconds in between, and the laggports are added sequentially with several seconds in between, the error does NOT occur.
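
The sequential setup described above could look roughly like the sketch below. The exact command order, the 5-second DELAY, the RUN_PREFIX dry-run switch and the setup_lagg helper are assumptions made for illustration; the report only states that the steps were done one after another with several seconds in between. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: create each lagg and add its ports one at a time with a pause.
# Assumptions: DELAY value, command order, and the dry-run default.
# Run with RUN_PREFIX= (empty) as root to actually execute the commands.
RUN_PREFIX=${RUN_PREFIX-echo}
DELAY=${DELAY-5}

# setup_lagg LAGG PORT... (hypothetical helper)
setup_lagg()
{
	lagg=$1 ; shift
	${RUN_PREFIX} ifconfig "${lagg}" create
	${RUN_PREFIX} ifconfig "${lagg}" laggproto lacp up
	for port in "$@" ; do
		${RUN_PREFIX} ifconfig "${port}" up
		${RUN_PREFIX} sleep "${DELAY}"
		${RUN_PREFIX} ifconfig "${lagg}" laggport "${port}"
		${RUN_PREFIX} sleep "${DELAY}"
	done
}

# Same port pairing as the rc.conf lines in this report.
setup_lagg lagg0 ix0 ix2
setup_lagg lagg1 ix1 ix3
```

Because every port is handled strictly in turn, no two ix(4) interfaces change state at the same moment, which is the condition under which the error was not observed.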

Furthermore, the error does not occur when all switch ports are shut down and then un-shut sequentially, with several seconds in between, after the server has booted.


Hypothesis:
---------------------------

Concurrent interface up/down events lead to errors on systems with two dual-port Intel [Fillme X540] cards.


Provocation / reproduction of the error:
---------------------------

Concurrent ifconfig up/down on all four interfaces reproducibly provokes the error:

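#!/bin/sh
# Toggle all four ix(4) ports down/up concurrently to provoke the error.
set -x

# Kill the background loops when the script is interrupted.
cleanup()
{
  echo "Caught Signal ... cleaning up."
  kill $( jobs -p )
  echo "Done cleanup ... quitting."
  exit 1
}

trap cleanup 1 2 3 6

RUN=2000

for ifn in ix0 ix1 ix2 ix3 ; do
	# One background loop per interface, so that all ports flap at the same time.
	# sh -c "for I in $(seq ${RUN}) ; do ifconfig ${ifn} down ; ifconfig ${ifn} up ; done" &
	bash -c "$(printf 'for I in $(seq %s) ; do ifconfig %s down ; ifconfig %s up ; done\n' ${RUN} ${ifn} ${ifn})" &
done

# Stay in the foreground until the loops finish, so that the trap can fire.
wait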
Comment 1 Piotr Kubaj freebsd_committer freebsd_triage 2023-02-03 17:38:31 UTC
The error is clear: "unsupported SFP+ module type". Can you tell us which SFP+ modules you use?
Comment 2 Piotr Kubaj freebsd_committer freebsd_triage 2023-05-05 15:33:23 UTC
Closing - no response for 3 months.