Bug 233114 - [ix] X520 NIC drops link ("no carrier") after several runs of "pktgen -f rx -i ix0"
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-net mailing list
URL:
Keywords: IntelNetworking
Depends on:
Blocks:
 
Reported: 2018-11-10 19:08 UTC by Lev A. Serebryakov
Modified: 2018-12-07 13:10 UTC
CC List: 5 users

See Also:


Description Lev A. Serebryakov freebsd_committer 2018-11-10 19:08:49 UTC
An '82599ES 10-Gigabit SFI/SFP+ Network Connection' NIC with an optical SFP+ module, connected to another such card by a point-to-point link (no 10G switch in between), drops link ("status: no carrier") after several runs of netmap's pkt-gen in receive mode ("pkt-gen -f rx -i ix0 -N").

"ifconfig ix0 down up" doesn't help; only a full system reboot does.

This is 13-CURRENT r34031.
Comment 1 Stephen Hurd freebsd_committer 2018-11-12 16:32:52 UTC
Does forcing the media type at one or both ends re-establish link when in the failed state?
Comment 2 Lev A. Serebryakov freebsd_committer 2018-11-12 16:37:01 UTC
(In reply to Stephen Hurd from comment #1)
I'll try next time it happens.

It is not 100% reproducible, though.

After a power cycle, it worked for the rest of the day.
Comment 3 Charles Goncalves 2018-11-13 08:30:34 UTC
(In reply to Lev A. Serebryakov from comment #2)

Can you check whether this is related to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317 ?

Try running ifconfig down/up several times, as in the script posted there.
Comment 4 Lev A. Serebryakov freebsd_committer 2018-11-18 21:22:26 UTC
New data: 

(1) I cannot reproduce this with pkt-gen anymore.

(2) I can reproduce the problem with a shell loop of ifconfig down / ifconfig up (as in PR 221317).

When it fails, the driver cannot initialize the NIC and reports "unsupported SFP+ type".

After that, setting the media with "ifconfig ix0 media" doesn't work: ifconfig complains about a failed ioctl().

A system reboot helps.
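A minimal sketch of such a down/up loop as a POSIX sh script. It is an illustration, not the script posted in PR 221317: the function names (link_status, flap_once, flap_loop) are hypothetical, the interface, round count and settle delay are caller-supplied, and the check assumes the "status:" line that FreeBSD's ifconfig prints (for example "status: active" or "status: no carrier").

```shell
#!/bin/sh
# Sketch of an ifconfig down/up flapping loop (hypothetical helper, not the
# script from PR 221317). Arguments: IFACE ROUNDS SETTLE_SECONDS.

# link_status: print the value of the first "status:" line of ifconfig
# output read from stdin, e.g. "active" or "no carrier".
link_status() {
	sed -n 's/^[[:space:]]*status: //p' | head -n 1
}

# flap_once IFACE SETTLE: take the interface down and up, wait SETTLE
# seconds for link negotiation, then print its current link status.
flap_once() {
	ifconfig "$1" down
	ifconfig "$1" up
	sleep "$2"
	ifconfig "$1" | link_status
}

# flap_loop IFACE ROUNDS SETTLE: repeat flap_once up to ROUNDS times and
# return 1 at the first round whose status is not "active".
flap_loop() {
	i=1
	while [ "$i" -le "$2" ]; do
		st=$(flap_once "$1" "$3")
		echo "round $i: status: $st"
		if [ "$st" != "active" ]; then
			echo "link lost after $i rounds" >&2
			return 1
		fi
		i=$((i + 1))
	done
	return 0
}

# Only run the loop when invoked with all three arguments.
if [ "$#" -ge 3 ]; then
	flap_loop "$1" "$2" "$3"
fi
```

For example, "sh flap.sh ix0 50 5" (file name hypothetical) would flap ix0 up to 50 times with a 5-second settle delay and stop at the first round whose status is not "active".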
Comment 5 Lev A. Serebryakov freebsd_committer 2018-11-28 14:20:58 UTC
I've reproduced it again with pkt-gen, and ifconfig could not set the media; the error is "Device not configured".
Comment 6 Jeff Pieper 2018-11-28 15:35:49 UTC
X540 is 10GBASE-T, so if your device really is an 82599ES SFP+, then it is X520 :)
Comment 7 Lev A. Serebryakov freebsd_committer 2018-11-28 15:37:16 UTC
(In reply to Jeff Pieper from comment #6)
Yep, my NICs are X520-DA1 and X520-DA2
Comment 8 Lev A. Serebryakov freebsd_committer 2018-11-28 15:38:37 UTC
(In reply to Jeff Pieper from comment #6)
It looks the same as https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=221317, but on 12-STABLE and 13-CURRENT rather than 11-STABLE, and the drivers have changed A LOT since then. Still, the symptoms are exactly the same.
Comment 9 Lev A. Serebryakov freebsd_committer 2018-11-28 16:02:33 UTC
If I build the driver as a module, kldunload + kldload helps.
Comment 10 Lev A. Serebryakov freebsd_committer 2018-11-29 09:04:39 UTC
More data: this is r340913 (the version number got truncated in the first comment).

An "ifconfig down && ifconfig up" loop no longer reproduces this bug.

"pkt-gen -f rx -i ix0" works.

But "pkt-gen -f tx -i ix0" kills the NIC after 5 or 6 runs, and this is 100% reproducible.
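Because the tx case fails after a predictable number of runs, the reproduction can be scripted. The following POSIX sh sketch is a hypothetical harness (run_rounds and link_status are illustrative names, not part of netmap or FreeBSD): it reruns a caller-supplied command and checks the "status:" line of ifconfig after each run. The command is passed through unchanged, so it must terminate on its own; the SETTLE delay (default 5 seconds) is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch of a harness that reruns a bounded command (e.g. a pkt-gen tx run)
# and checks the link after each run (hypothetical helper).
# Arguments: IFACE ROUNDS COMMAND [ARGS...]; SETTLE (seconds) may be set in
# the environment and defaults to 5, an arbitrary choice.

# link_status: print the value of the first "status:" line of ifconfig
# output read from stdin, e.g. "active" or "no carrier".
link_status() {
	sed -n 's/^[[:space:]]*status: //p' | head -n 1
}

# run_rounds IFACE ROUNDS COMMAND...: run COMMAND up to ROUNDS times and
# return 1 as soon as IFACE no longer reports "status: active".
run_rounds() {
	iface=$1
	rounds=$2
	shift 2
	n=1
	while [ "$n" -le "$rounds" ]; do
		"$@" || echo "run $n: command exited with status $?" >&2
		# Wait a little before checking, in case the link needs time to
		# come back after the run.
		sleep "${SETTLE:-5}"
		st=$(ifconfig "$iface" | link_status)
		echo "run $n: status: $st"
		if [ "$st" != "active" ]; then
			echo "link lost after $n runs" >&2
			return 1
		fi
		n=$((n + 1))
	done
	return 0
}

# Only run when invoked with an interface, a round count and a command.
if [ "$#" -ge 3 ]; then
	run_rounds "$@"
fi
```

For example, "sh txloop.sh ix0 10 pkt-gen -f tx -i ix0 -n 100000000" (file name hypothetical) would repeat the transmit run ten times, assuming pkt-gen's -n option limits the packet count so that each run exits by itself.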
Comment 11 Lev A. Serebryakov freebsd_committer 2018-11-29 13:11:29 UTC
kldunload can crash the system, so it is not a very viable workaround for automated benchmarking :-(
Comment 12 Eric Joyner freebsd_committer 2018-12-04 21:26:36 UTC
Was https://reviews.freebsd.org/rS341156 intended to fix this problem? It would affect the TX side of things.
Comment 13 Lev A. Serebryakov freebsd_committer 2018-12-05 00:04:07 UTC
(In reply to Eric Joyner from comment #12)
Unfortunately, no. That fix makes it possible to run automated benchmarks repeatedly (pkt-gen with a fixed number of packets can now finish successfully), but ix0 still drops link sometimes.
Comment 14 Piotr Pietruszewski 2018-12-07 13:10:32 UTC
(In reply to Lev A. Serebryakov from comment #13)

The bug seems to be fixed by patch D18468, which is currently under review (https://reviews.freebsd.org/D18468). Please let me know whether the patch solves your problem.