Bug 239216 - Problem with ix(4) driver on systems with two Intel X520 cards
Status: New
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-net mailing list
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2019-07-15 09:41 UTC by Sascha
Modified: 2019-07-15 13:33 UTC
Description Sascha 2019-07-15 09:41:30 UTC
May relate to: bug #221317

---------------------------
Symptoms of the error
---------------------------

Network cards become unusable
---------------------------

An ifconfig up/down cycle does not bring the card back online. The card stays in a "lights on - protocol down" state.

The switch shows the following:

Dell Switch S4048-ON  # show interfaces Te1/4 link-status 

Interface Name                 : TenGigabitEthernet 1/4

PORT Link State                : DOWN

    PMA/PMD RX Status          : Up

    PCS RX Status              : Down

    PHY XS TX Status           : N/A

Dell Switch S4048-ON # 

Unplugging and re-plugging the SFP+ module has no effect.

A soft power cycle occasionally cures the problem, but usually does not.

Only a chassis power-off helps.

Errors logged to console:
ix2: Setup failure - unsupported SFP+ module type.
ix1: Setup failure - unsupported SFP+ module type.
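To see which ports are affected after a boot or a test run, the kernel messages can be filtered for the error text quoted above. This is only a sketch: failed_ix_ports is a hypothetical helper name, and it assumes the message appears exactly as in the two lines above.

```shell
#!/bin/sh
# Sketch: list the ix(4) interfaces that logged the SFP+ setup failure.
# failed_ix_ports is a hypothetical helper; it reads kernel messages on
# stdin and prints each affected interface name once.
failed_ix_ports() {
    sed -n 's/^\(ix[0-9][0-9]*\): Setup failure - unsupported SFP+ module type\.*$/\1/p' |
        sort -u
}
```

Typical use would be `dmesg | failed_ix_ports`; an empty result means the message was not found in the kernel message buffer.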


When does the error occur?
---------------------------

We have a setup with two lagg interfaces, each using two ports from different cards:

ifconfig_lagg0="up laggproto lacp laggport ix0 laggport ix2"

ifconfig_lagg1="up laggproto lacp laggport ix1 laggport ix3"

The error occurs because the lagg(4) code shuts down the attached laggports and brings them back up afterwards; this may happen concurrently on several ports.

The error does not occur when no ifconfig_ix* and no ifconfig_lagg* variables are set in /etc/rc.conf. If, in this condition, the lagg interfaces are set up sequentially, several seconds apart, and the laggports are added sequentially, several seconds apart, the error does NOT occur.

Furthermore, the error does not occur when all switch ports are shut down and are re-enabled sequentially, several seconds apart, after the server has booted.
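The sequential setup described above could be scripted roughly as follows. This is a sketch, not the exact commands that were used: DRY_RUN, DELAY, run and setup_lagg are names made up for this example, the 5 second delay stands in for "several seconds", and by default the script only prints the ifconfig commands instead of running them.

```shell
#!/bin/sh
# Sketch: build each lagg(4) interface one port at a time, pausing between steps.
# DRY_RUN and DELAY are knobs of this sketch only, not rc.conf settings.
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands, anything else = execute them
DELAY=${DELAY:-5}       # assumed pause in seconds between steps

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_lagg NAME PORT...
setup_lagg() {
    lagg=$1
    shift
    run ifconfig "$lagg" create
    run ifconfig "$lagg" up laggproto lacp
    for port in "$@"; do
        run ifconfig "$port" up
        [ "$DRY_RUN" = 1 ] || sleep "$DELAY"
        run ifconfig "$lagg" laggport "$port"
        [ "$DRY_RUN" = 1 ] || sleep "$DELAY"
    done
}

setup_lagg lagg0 ix0 ix2
setup_lagg lagg1 ix1 ix3
```

Running it with DRY_RUN=0 as root would execute the commands for real; the port pairing matches the ifconfig_lagg0 and ifconfig_lagg1 lines quoted earlier.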


Hypothesis:
---------------------------

Concurrent interface up/down events lead to errors on systems with two dual-port Intel [Fillme X540] cards.


Provocation / reproduction of the error:
---------------------------

Concurrent ifconfig up/down on all four interfaces reproducibly provokes the error:

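#!/bin/sh
# Toggle all four ix(4) ports down/up concurrently to provoke the error.
set -x

# Kill the background loops when the script is interrupted.
cleanup()
{
  echo "Caught Signal ... cleaning up."
  kill $( jobs -p )
  echo "Done cleanup ... quitting."
  exit 1
}

# Signals 1 2 3 6 are HUP INT QUIT ABRT.
trap cleanup 1 2 3 6

RUN=2000

for ifn in ix0 ix1 ix2 ix3 ; do
	# sh -c "for I in $(seq ${RUN}) ; do ifconfig ${ifn} down ; ifconfig ${ifn} up ; done" &
	bash -c "$(printf 'for I in $(seq %s) ; do ifconfig %s down ; ifconfig %s up ; done\n' ${RUN} ${ifn} ${ifn})" &
done

# Wait for the background loops so that the trap can still clean them up.
wait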
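
After such a stress run, the link state of each port can be checked from the "status:" line that ifconfig prints for an Ethernet interface. The following is a sketch only; link_status and report_links are hypothetical helper names, and which status value a port shows in the failed state is not stated in this report.

```shell
#!/bin/sh
# Sketch: extract the value of the "status:" line from ifconfig output on stdin.
link_status() {
    sed -n 's/^[[:space:]]*status: //p'
}

# Print "<interface>: <status>" for each interface named on the command line.
report_links() {
    for ifn in "$@"; do
        echo "$ifn: $(ifconfig "$ifn" | link_status)"
    done
}
```

Calling `report_links ix0 ix1 ix2 ix3` once the loops have finished gives a quick overview of which ports came back up.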