Summary: | iflib: if_igb(4) and some if_em(4) devices don't recognize/report carrier loss. | ||
---|---|---|---|
Product: | Base System | Reporter: | Harald Schmalzbauer <bugzilla.freebsd> |
Component: | kern | Assignee: | freebsd-net (Nobody) <net> |
Status: | Closed DUPLICATE | ||
Severity: | Affects Many People | CC: | chris, emaste, krzysztof.galazka, net, webmaster |
Priority: | --- | Keywords: | needs-qa |
Version: | 12.0-STABLE | ||
Hardware: | Any | ||
OS: | Any | ||
URL: | https://lists.freebsd.org/pipermail/freebsd-net/2018-May/050497.html | ||
See Also: |
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239240 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236724 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240818 |
Description
Harald Schmalzbauer
2019-09-18 06:20:09 UTC
Sorry, need to correct myself, much more if_em(4) _and_ if_igb(4) NICs are affected! Took all Intel NICs I have available to verify (in an S1200v3RP server): 82574L (Hartwell) - if_em(4) attaches And the followig if_igb(4) NICs: 82576 (Kawela) i210 (Springville) i350 (StonyLake) According to ifconfig(8), all NICs above report active link state, with the initial media speed, no matter if a cable is attached at all, or the link speed changed (100->1000 but ifconfig still shows 100Base-TX). So currently, with 12.1-PRERELEASE r352354, I can see the issue with all Intel NICs I have available for testing, with one exception: On a different machine with on-board i217-V (Clarkville, the one I reported in my initial post), the link state swiftly and correctly changes. Unfortunately I won't find time to look into the svn hisroty to see if somebody fixed i217 back then or inspect anything in more detail – for the next few weeks at least :-( Hope this can be fixed before 12.1 ships. Thanks, -harry This issue was also reported in this PR:https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239240 We were able to reproduce it and are working on a fix. Closing per comment 2 *** This bug has been marked as a duplicate of bug 239240 *** Initially, https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239240 describes a different issue and I'd like to keep that separate, although John Delano added comment #3, where he mentions the same link state issue. But the proposed fix indicates this is an independent issue! The proposed phabriactor patch in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239240#c13 (https://reviews.freebsd.org/D21769) doens't fix the problem. There's progress though: After once marking the interface down and up again, the link state correctly changes from there on. But right after loading the kernel module, link state still doesn't change when physical connection was lost (repeatedly won't change, unless you manually down/up). Tested with 12.1-beta1, D21769+D21712 and i210, i350, 82576 (all hw.mac.type igb ) and 82574L (em) - consistently the same behaviour. Thanks, -Harry (In reply to John Delano from comment #16) in Bug 239240 Thanks for your details. Besides the link state issue and the initial issue reported in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240658 we are referring a third issue/fix with your latest comment – the first-packet completion issue. Since you mention that you didn't apply the proposed fix for the real 240658 issue, I use this bug report for my comment, which describes one of the issues you addressed and to which your feedback corresponds better. Regarding the 1st-packet-fix: Thanks to the very detailed commit log in r343369 from erj@, he explains why r343369 replaces r341156, r340310 and r341156. This new fix (r343369) was MFCd in 344097 by marius, back in february. How/why do you apply https://reviews.freebsd.org/D18545? Regarding the topic (and link-state-fix https://reviews.freebsd.org/D21769): I cannot reproduce this odd 1st-linkdetection-failure in my ESXi passthrough environment! But I intentionally took the plain iron and packed all intel NICs I have into it, to do the test without any hypervisor/passthrough in between. I did the test twice, since it seemed very odd to me too: After cold-start booting from ISO with latest 12.1-BETA1 with https://reviews.freebsd.org/D21769 applied, the link state still reports the last state/speed before unplugging, and does not catch the line drop, which changes when I once bring the interface down/up. In this scenario, rc(8) hasn't configured the NICs! Mabye that's the difference to your setup. Will see to find time in the evening to further isolate the special condition. Thanks, -harry Reset assignee after re-open (In reply to Harald Schmalzbauer from comment #5) https://reviews.freebsd.org/D18545 is listed in comment #6 in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=239240. Since it was supposed to fix the problem, I applied the patch. (In reply to John Delano from comment #7) To what FreeBSD version did you apply D18545 from comment #6 of the queue-detection-issue bug report? As expected, it's not applicable to 12.1 for me and should also fail on all other versions newer than stable/12 from february: patch -C -p1 -i D18545.diff Hmm... Looks like a unified diff to me... The text leading up to this was: -------------------------- |Index: head/sys/dev/e1000/em_txrx.c |=================================================================== |--- head/sys/dev/e1000/em_txrx.c |+++ head/sys/dev/e1000/em_txrx.c -------------------------- Patching file sys/dev/e1000/em_txrx.c using Plan A... Reversed (or previously applied) patch detected! Assume -R? [y] n Apply anyway? [n] y Hunk #1 failed at 457. 1 out of 1 hunks failed while patching sys/dev/e1000/em_txrx.c : : : -------------------------- |Index: head/sys/dev/ixl/ixl_txrx.c |=================================================================== |--- head/sys/dev/ixl/ixl_txrx.c |+++ head/sys/dev/ixl/ixl_txrx.c -------------------------- Patching file sys/dev/ixl/ixl_txrx.c using Plan A... Reversed (or previously applied) patch detected! Assume -R? [y] n Apply anyway? [n] y Hunk #1 failed at 515. Hunk #2 failed at 788. Hunk #3 failed at 805. 3 out of 3 hunks failed while patching sys/dev/ixl/ixl_txrx.c done Update: The link-state-change issue doesn't affect _configured_ interfaces, so the fix in https://reviews.freebsd.org/D21769 is most likely perfectly valid. I guess the remaining issue is by design. As long as the interface was never configured, state change only works once in one direction -> recorded as active. If I have the interface configured before or afterwards, connection change/loss will be recognized. But not for unconfigured interfaces. So the issue has no production influencing effects. It's might just caus confusion on systems with (cold) standby interfaces, since the admin might assume there's a link active and searches all switches unsuccessfully for that link… During this test I saw that assigning an IP address to an already link-established em(4) or igb(4) interface, causes loss/interruption of the establisehd ethernet link. Don't know if this is desired… Guess these facts are related and as long as the internal steps configuring a em/igb(4) (or probably iflib generic?) NIC stay as they are, it won't be possible to catch the link state loss with unconfigured/cold-stanby interfaces, right? Thanks, -Harry(In reply to John Delano from comment #7) (In reply to Harald Schmalzbauer from comment #8) Installed on Release 12.0 (In reply to Harald Schmalzbauer from comment #9) https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240818 confirms that the remaining issue is by design (iflib igb). Since the fix for my issue is also reported in Bug 236724, I'll close this as duplicate. *** This bug has been marked as a duplicate of bug 236724 *** |