Summary: | [ixgbe] Media type detection broken | ||
---|---|---|---|
Product: | Base System | Reporter: | Andrew Boyer <aboyer> |
Component: | kern | Assignee: | Eric Joyner <erj> |
Status: | Closed FIXED | ||
Severity: | Affects Only Me | CC: | borjam, cramerj, sbruno |
Priority: | Normal | Keywords: | IntelNetworking |
Version: | Unspecified | Flags: | borjam: maintainer-feedback? (sbruno), borjam: mfc-stable10? |
Hardware: | Any | ||
OS: | Any | ||
Attachments: |
Description
Andrew Boyer
2010-09-03 16:20:03 UTC
Responsible Changed From-To: freebsd-bugs->freebsd-net
Over to maintainer(s).

A casual look over ixgbe/if_ix.c seems to show that a lot of changes have been made to the optics detection. I suspect that this ticket is mostly deprecated, but I'm unclear whether it's been fully resolved.

I don't think the bug is obsolete. I am seeing similar behavior on 10.3-RELEASE. Using SFP cables, the interface doesn't detect the media type properly.

ix2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:bd:70:26
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (Unknown <rxpause,txpause>)
        status: active

pciconf -lv pci0:130:0:0
ix2@pci0:130:0:0: class=0x020000 card=0x061115d9 chip=0x10fb8086 rev=0x01 hdr=0x00
        vendor = 'Intel Corporation'
        device = '82599ES 10-Gigabit SFI/SFP+ Network Connection'
        class = network
        subclass = ethernet

The interface works, but lagg refuses to activate it, as it's not marked as full-duplex.

FreeBSD nvme1 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r297264: Fri Mar 25 02:10:02 UTC 2016 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64

I want to blame this on r293334, then. That made a significant media type detection change.

(In reply to Eric Joyner from comment #4)
I'm going to assume that the call to get_media_type(hw) isn't returning one of the three types checked for in the switch/case statement. This effectively sets adapter->optics = 0.

https://svnweb.freebsd.org/base/head/sys/dev/ixgbe/if_ix.c?r1=293334&r2=293333&pathrev=293334

(In reply to borjam from comment #3)
Do you have the ability to insert a printf in the driver code to print out the value of hw->mac.ops.get_media_type(hw) in the default case? e.g.
stable10 % svn diff
Index: sys/dev/ixgbe/if_ix.c
===================================================================
--- sys/dev/ixgbe/if_ix.c (revision 301691)
+++ sys/dev/ixgbe/if_ix.c (working copy)
@@ -3851,6 +3851,9 @@
                 adapter->optics = IFM_10G_CX4;
                 break;
         default:
+                device_printf(dev,
+                    "Unknown media type for optics, %d\n",
+                    hw->mac.ops.get_media_type(hw));
                 adapter->optics = 0;
                 break;
         }

I tried the latest driver version from the Intel website (3.1.14) and the behavior is the same.

Thanks, Sean, I was checking the code wondering where to put a pair of printfs.

Alright, I am sorry, I wasn't aware of the -v option to ifconfig! Interestingly, the driver is detecting the SFP+, but it doesn't know what to do with it. Now I'm checking whether I can find where it's failing to configure the proper media and mediaopts.

ix2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:bd:70:26
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (Unknown <rxpause,txpause>)
        status: active
        plugged: SFP/SFP+/SFP28 1X Copper Passive (Copper pigtail)
        vendor: Intel Corp PN: XDACBL3M-C SN: M7B08970 DATE: 2016-05-26
ix3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:bd:70:27
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: no carrier
        plugged: SFP/SFP+/SFP28 1X Copper Passive (Copper pigtail)
        vendor: Intel Corp PN: XDACBL3M-C SN: M7B08964 DATE: 2016-05-26

(In reply to Sean Bruno from comment #6)
I tried that, and the printf is not reached. Looking at ifconfig -v, the SFP is detected (and it's made by Intel).
Inserting device_printf(adapter->dev, "adapter->phy_layer=%u\n", layer); in the function ixgbe_media_status, I see that it reports 0, i.e. it is unable to determine the media type.

ifconfig -vvvvvv for the interfaces shows this. Interestingly, the speed is detected properly by ifconfig (which reads the i2c interface by itself) but not by the media identification routines in the driver.

borjam@nvme1:/usr/src/sys/dev/ixgbe % ifconfig -vvvvvv ix2
ix2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:bd:70:26
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (Unknown <rxpause,txpause>)
        status: active
        plugged: SFP/SFP+/SFP28 1X Copper Passive (Copper pigtail)
        vendor: Intel Corp PN: XDACBL3M-C SN: M7B08970 DATE: 2016-05-26
        Class: 1X Copper Passive
        Length: short distance
        Tech: Passive Cable
        Media: Twin Axial Pair
        Speed: 100 MBytes/sec

        SFF8472 DUMP (0xA0 0..127 range):
        03 04 21 01 00 00 04 41 84 80 D5 06 64 00 00 00
        00 00 03 00 49 6E 74 65 6C 20 43 6F 72 70 20 20
        20 20 20 20 00 00 1B 21 58 44 41 43 42 4C 33 4D
        2D 43 20 20 20 20 20 20 43 20 20 20 00 00 00 61
        00 00 00 00 4D 37 42 30 38 39 37 30 20 20 20 20
        20 20 20 20 31 36 30 35 32 36 20 20 00 00 00 42
        20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
        20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

borjam@nvme1:/usr/src/sys/dev/ixgbe % ifconfig -vvvvvv ix3
ix3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:bd:70:27
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (Unknown <rxpause,txpause>)
        status: active
        plugged: SFP/SFP+/SFP28 1X Copper Active (Copper pigtail)
        vendor: BROCADE PN: 58-1000027-01 SN: CBX109500001232 DATE: 2009-12-11
        Class: 1X Copper Active
        Length: short distance
        Tech: Active Cable
        Media: Twin Axial Pair
        Speed: 100 MBytes/sec

        SFF8472 DUMP (0xA0 0..127 range):
        03 04 21 02 00 00 04 41 08 80 55 00 67 00 00 00
        00 00 03 00 42 52 4F 43 41 44 45 20 20 20 20 20
        20 20 20 20 00 00 05 1E 35 38 2D 31 30 30 30 30
        32 37 2D 30 31 20 20 20 41 20 20 20 04 00 00 70
        00 12 00 00 43 42 58 31 30 39 35 30 30 30 30 31
        32 33 32 20 30 39 31 32 31 31 20 20 00 00 00 D4
        32 47 53 50 57 57 42 2D 42 46 42 2D 45 4E 20 20
        20 20 20 20 20 20 20 20 20 20 20 20 31 20 20 20
borjam@clientes-nvme1:/usr/src/sys/dev/ixgbe %

Created attachment 171652 [details]
just a bunch of printf code to see what path the driver is taking.
I'm interested to know what code path the detection of '0' is going down in your use case.
Either ixgbe_82599.c::ixgbe_get_supported_physical_layer_82599() is missing a case for your SFP or ixgbe_phy.c::ixgbe_get_supported_phy_sfp_layer_generic() is missing the case for your SFP.
In order to debug this further, can you insert the following code into your driver and see what happens?
Now this is interesting. I inserted all those printfs and nothing came out, despite booting in verbose mode. I'll double check the PCI ID numbers just in case it's calling the wrong routine.

However, after booting in verbose mode and doing "ifconfig ix2 up" and "ifconfig ix3 up" I see this on the console:

ix2: Link is up 10 Gbps Full Duplex
ix2: link state changed to UP
ix3: Link is up 10 Gbps Full Duplex
ix3: link state changed to UP

Curiously, ifconfig still shows the unknown media error.

# ifconfig -v ix2
ix2: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:bd:70:26
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (Unknown <rxpause,txpause>)
        status: active
        plugged: SFP/SFP+/SFP28 1X Copper Passive (Copper pigtail)
        vendor: Intel Corp PN: XDACBL3M-C SN: M7B08964 DATE: 2016-05-26
# ifconfig -v ix3
ix3: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:bd:70:27
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (Unknown <rxpause,txpause>)
        status: active
        plugged: SFP/SFP+/SFP28 1X Copper Passive (Copper pigtail)
        vendor: Intel Corp PN: XDACBL3M-C SN: M7B08970 DATE: 2016-05-26

Found it! After checking all over the driver, I found the problem.

In ixgbe_phy.c there's a function called ixgbe_get_supported_phy_sfp_layer_generic which determines the capabilities of the SFP module. If the SFP is identified as a passive module, it's assumed to be IXGBE_PHYSICAL_LAYER_SFP_PLUS_CU. If it's identified as Avago, Intel (my Prolabs are identified as Intel) or unknown, it reads the 10 GbE and 1 GbE compliance codes via i2c, comparing them against bit masks.
For 10 GbE, the important byte is byte 3, which according to SFF-8472 has the following compliance codes:

10GbE
7 - 10G Base-ER
6 - 10G Base-LRM
5 - 10G Base-LR
4 - 10G Base-SR
Infiniband
3 - 1X SX
2 - 1X LX
1 - 1X Copper Active
0 - 1X Copper Passive

SFF-8472 obtained from: ftp://ftp.seagate.com/sff/SFF-8472.PDF

My modules identify themselves with bit 0, which would mean Infiniband 1X Copper Passive. And as far as I know, 10GbE over twinax uses Infiniband cables.

It seems that the driver author didn't consider the copper active/passive options as 10 GbE compatible, as they are labelled "Infiniband", not 10 GbE.

So, adding a case for bit 0 being set solved it.

# ifconfig ix0
ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:bb:0d:40
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
        status: active

And now LACP works as well, as it relies on the full-duplex status of the interface.

# ifconfig lagg0
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:bb:0d:40
        inet 10.0.5.61 netmask 0xffffff00 broadcast 10.0.5.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto lacp lagghash l2,l3,l4
        laggport: ix0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: ix1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

Diffs follow in the next update.

Created attachment 171751 [details]
Patch to add 10 GbE twinax media (Passive infiniband)
Created attachment 171752 [details]
Patch to add 10 GbE twinax media (Passive infiniband)
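The byte 3 decoding described in the analysis above can be sketched in a few lines of C. This is an illustrative sketch only, not code from the ixgbe driver: the macro and function names (SFF_10GBE_MASK, sff_has_10gbe_code and so on) are hypothetical, and the bit assignments are simply the ones listed in the SFF-8472 table quoted in the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the 10GbE/Infiniband compliance byte in the 0xA0 page. */
#define SFF_BYTE_10G_COMPLIANCE 3

/* Hypothetical mask names; bit layout follows the table quoted above. */
#define SFF_10GBE_MASK    0xF0u /* bits 7..4: ER, LRM, LR, SR */
#define SFF_IB_CU_ACTIVE  0x02u /* bit 1: 1X Copper Active */
#define SFF_IB_CU_PASSIVE 0x01u /* bit 0: 1X Copper Passive */

/* Human-readable names, indexed by bit number. */
static const char *const sff_byte3_names[8] = {
	"1X Copper Passive", "1X Copper Active", "1X LX", "1X SX",
	"10G Base-SR", "10G Base-LR", "10G Base-LRM", "10G Base-ER",
};

/* Return the compliance byte from a raw EEPROM dump. */
static uint8_t
sff_compliance_byte(const uint8_t *dump, size_t len)
{
	assert(len > SFF_BYTE_10G_COMPLIANCE);
	return (dump[SFF_BYTE_10G_COMPLIANCE]);
}

/* Nonzero if any of the four 10GbE compliance bits is set. */
static int
sff_has_10gbe_code(uint8_t b)
{
	return ((b & SFF_10GBE_MASK) != 0);
}

/* Name of the lowest set bit, or NULL if no bit is set. */
static const char *
sff_lowest_code_name(uint8_t b)
{
	for (int bit = 0; bit < 8; bit++)
		if (b & (1u << bit))
			return (sff_byte3_names[bit]);
	return (NULL);
}
```

For the Intel-labelled cable in the dump above, byte 3 is 0x01, so sff_has_10gbe_code() returns 0 and sff_lowest_code_name() returns "1X Copper Passive". That matches the observation that a check against the 10 GbE bits alone finds nothing for this cable.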
I've tried the same fix against CURRENT-r300097 and it works as well.

Unfortunately, this is one of those "void your warranty"-type scenarios. With the hardware development and testing we've done, it was deemed that we are unable to support cables with the Infiniband Compliance Codes (as in the document you provided). And while your patch may work for your setup, it's still an unsupported configuration. If you were to purchase cables with 10G Ethernet Compliance Codes, we would of course provide support for that.

(In reply to Jeb Cramer from comment #17)
I understand your point, although there is no such capability for DA cables. DA cables advertise an Infiniband compliance property, but not a 10 GbE compliance, because the flag doesn't exist in the standard. Moreover, Infiniband compatibility doesn't preclude 10 GbE compatibility. These cables also advertise Fibre Channel "intra enclosure" capability in byte 8, which, again, doesn't preclude 10 GbE compatibility.

Intel itself recommends and endorses passive DA cables here:
http://itpeernetwork.intel.com/10-gigabit-ethernet-alphabet-soup-never-tasted-so-good/

Anyway, my patch was semantically confusing and was only masking the real problem. I think I've found the origin.

There is a visible error in ixgbe_identify_sfp_module_generic() (file ixgbe_phy.c). The function uses a variable with a dual purpose: hw->phy.sfp_type can mean the actual type of SFP, which of course makes sense. However, after determining the type of SFP, the driver reads the vendor-specific information from the SFP and, upon detecting certain vendors (Tyco, FTL, Avago, Intel or "unknown"), it sets the dual-personality variable to a vendor-specific type (tsk tsk! ;) ).

After doing this, there's a small section of code with this comment: "Allow any DA cable vendor". It just checks whether cable_tech (byte 8 of the SFP description) advertises either an active or passive DA, and returns SUCCESS, but it fails to set the hw->phy.sfp_type to the proper value. It remains identified as "Intel" and that is what causes the code to reach the point I patched before. Actually, it shouldn't have arrived there.

Frankly, the whole function should be rewritten. Intel manufactures and sells SFP+ DA cables, and, curiously, *this function specifically precludes using Intel or Avago SFP+ DA cables*. I did a quick check commenting out the Intel case and it works perfectly.

I imagine that there's an omission in the mentioned "Allow any DA cable vendor" section, so I've fixed it by setting hw->phy.type to ixgbe_phy_sfp_passive_unknown or ixgbe_phy_sfp_active_unknown, depending on the cable_tech description. It works now.

I understand that my previous patch was rejected because, despite making it work, it wasn't a very good solution. This solution is more general: "Ignore the vendor when the SFP+ module is a DA", and indeed I think that was the intent of the author of the driver.

Created attachment 171974 [details]
Fix for ixgbe_phy.c DA SFP+ detection
This patch fixes the case for "accept DA SFP+ cables despite the vendor".
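The behaviour described above, where a recognized vendor designation survives the "Allow any DA cable vendor" check, can be modelled with a small sketch. This is a toy model under stated assumptions, not the driver's code: the enum, the function resolve_phy_type() and its apply_fix switch are hypothetical, the enum names merely echo the identifiers mentioned in this thread, and the bit 3 value for an active cable is taken from SFF-8472 rather than from the driver source.

```c
#include <assert.h>
#include <stdint.h>

/* Byte 8 cable technology bits: bit 2 passive, bit 3 active (SFF-8472). */
#define DA_PASSIVE_CABLE 0x04u
#define DA_ACTIVE_CABLE  0x08u

/* Hypothetical, reduced set of PHY types for illustration only. */
enum phy_type {
	PHY_UNKNOWN,             /* vendor bytes not recognized */
	PHY_SFP_INTEL,           /* vendor-specific designation */
	PHY_SFP_PASSIVE_UNKNOWN, /* passive DA, any vendor */
	PHY_SFP_ACTIVE_UNKNOWN,  /* active DA, any vendor */
};

/*
 * Toy model of the "Allow any DA cable vendor" step.
 * vendor_type is what the vendor lookup left behind; cable_tech is
 * byte 8 of the module EEPROM. With apply_fix == 0 a recognized
 * vendor designation is left in place (the reported behaviour);
 * with apply_fix != 0 it is replaced by the DA type.
 */
static enum phy_type
resolve_phy_type(enum phy_type vendor_type, uint8_t cable_tech, int apply_fix)
{
	const uint8_t both = DA_PASSIVE_CABLE | DA_ACTIVE_CABLE;

	/* Sketch assumption: a cable is never both passive and active. */
	assert((cable_tech & both) != both);

	if (vendor_type != PHY_UNKNOWN && !apply_fix)
		return (vendor_type);
	if (cable_tech & DA_PASSIVE_CABLE)
		return (PHY_SFP_PASSIVE_UNKNOWN);
	if (cable_tech & DA_ACTIVE_CABLE)
		return (PHY_SFP_ACTIVE_UNKNOWN);
	return (vendor_type);
}
```

In this model, an Intel-labelled cable with byte 8 equal to 0x84 stays PHY_SFP_INTEL without the fix and becomes PHY_SFP_PASSIVE_UNKNOWN with it, while a cable from an unrecognized vendor gets its DA type either way.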
(In reply to Borja Marcos from comment #18)
Have you looked at byte 8 in the compliance codes section? All of the SFP+ DA cables we use here specify that they're "Passive Cable" in the SFP+ Cable Technology section. I think your issue is that you're using some sort of Infiniband DA cables instead of regular DA cables for Ethernet. It sounds like they may be physically the same, just with different bits set in the SFP EEPROM.

It uses two different variables for type. If the driver was following the expected path for your cables, it would assign hw->phy.sfp_type to ixgbe_sfp_type_da_cu_core0/ixgbe_sfp_type_da_cu_core1, then assign hw->phy.type to ixgbe_phy_sfp_intel, then check the if condition under "Allow any DA cable vendor" and return IXGBE_SUCCESS.

It still sounds like the cable you're using doesn't set byte 8 properly.

And after looking over this more, it looks like you're right, Borja. The code doesn't properly handle the case where the vendor bytes match something the driver checks for and the cable is direct attach. It makes ixgbe_get_supported_phy_sfp_layer_generic() do the wrong thing.

Jeb, if you're still on this, can you run this by the shared code team again? They may end up telling us to just not use ixgbe_get_supported_phy_sfp_layer_generic() or something, I'm sure.

Maybe the switch statement should use hw->phy.sfp_type instead of hw->phy.type?

(In reply to Eric Joyner from comment #21)
Sorry, when writing the explanation I got a bit distracted.

hw->phy.type is used for dual purposes. *If the vendor bytes are not recognized*, it is set properly to DA in case byte 8 contains the right bits. But if the vendor type is recognized, hw->phy.type gets a new value (a vendor identification) and it is not rewritten with the DA type. That's why the function ixgbe_get_supported_phy_sfp_layer_generic() is reached.

The cable sets the bits in byte 8 properly. But the function ixgbe_get_supported_phy_sfp_layer_generic() doesn't check them. It only matches 1 GbE and 10 GbE compliance bits. So, when I saw that it was the place where "unknown media" was detected, I checked and I saw that "Infiniband" was set, assuming that it was compatible with 10 GbE (which it is).

But now, forget about the Infiniband thing, please. Check the patch I've attached to this bug; it fixes the issue completely.

And sorry about the confusion with hw->phy.sfp_type and hw->phy.type. I don't know what hw->phy.sfp_type is used for, but the variable in play is hw->phy.type.

I am sure that my patch reflects the intended code path because the default: case (unknown vendor) does exactly that.

The media detection code would really need some polishing, as it's quite clear that it's been patched and re-patched multiple times (example: the case for FTL to make it work specifically for FTL active DA cables).

Please review the patch I posted here yesterday.

(In reply to Eric Joyner from comment #21)
It *DOES* set byte 8 properly. I wasn't looking at it because, when diagnosing the bug, I focused on ixgbe_get_supported_phy_sfp_layer_generic() and on byte 3, which was checked for 10 GbE compliance.
Sorry for the confusion, but these days I've inflicted a crash course on SFP+ identification on myself ;)

ix0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9000
        options=e407bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:bb:0d:40
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (10Gbase-Twinax <full-duplex,rxpause,txpause>)
        status: active
        plugged: SFP/SFP+/SFP28 1X Copper Passive (Copper pigtail)
        vendor: Intel Corp PN: XDACBL3M-C SN: M7B08963 DATE: 2016-05-26

        SFF8472 DUMP (0xA0 0..127 range):
        03 04 21 01 00 00 04 41 84 80 D5 06 64 00 00 00
        00 00 03 00 49 6E 74 65 6C 20 43 6F 72 70 20 20
        20 20 20 20 00 00 1B 21 58 44 41 43 42 4C 33 4D
        2D 43 20 20 20 20 20 20 43 20 20 20 00 00 00 61
        00 00 00 00 4D 37 42 30 38 39 36 33 20 20 20 20
        20 20 20 20 31 36 30 35 32 36 20 20 00 00 00 44
        20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
        20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20

Byte 8 is 84h, which means bits 2 (IXGBE_SFF_DA_PASSIVE_CABLE) and 7 (Fibre Channel in-enclosure) are set, which is correct. But the code doesn't detect this for a known vendor. After the fix it's checked correctly.

Looking again at the usage of phy.sfp_type and phy.type, I see that phy.sfp_type seems to be set correctly, but the variable that actually matters is phy.type, which is overwritten by a vendor designation and, unless my fix is applied, remains a vendor designation.

Anyway, forget Infiniband now, it's become a MacGuffin :)

This will be committed today, pending build tests. MFC to stable/11 will happen after RE approval.

A commit references this bug:

Author: sbruno
Date: Tue Jul 19 17:31:48 UTC 2016
New revision: 303032
URL: https://svnweb.freebsd.org/changeset/base/303032

Log:
  Fixup DA cable detection routines to not set the cable type to unknown if they do not match one of two cable types.

  PR: 150249
  Submitted by: borjam@sarenet.es
  Reviewed by: erj
  MFC after: 3 days

Changes:
  head/sys/dev/ixgbe/ixgbe_phy.c

I've tried the fix on 11/STABLE and it works as intended.

(In reply to Borja Marcos from comment #27)
I will chase release engineering to get this MFC'd to stable/11 tomorrow.

A commit references this bug:

Author: sbruno
Date: Sun Jul 24 16:32:34 UTC 2016
New revision: 303268
URL: https://svnweb.freebsd.org/changeset/base/303268

Log:
  MFC r303032
  Fixup DA cable detection routines to not set the cable type to unknown if they do not match one of two cable types.

  PR: 150249
  Approved by: re (gjb)

Changes:
  _U stable/11/
  stable/11/sys/dev/ixgbe/ixgbe_phy.c

A commit references this bug:

Author: sbruno
Date: Sun Jul 24 16:33:48 UTC 2016
New revision: 303269
URL: https://svnweb.freebsd.org/changeset/base/303269

Log:
  MFC r303032
  Fixup DA cable detection routines to not set the cable type to unknown if they do not match one of two cable types.

  PR: 150249

Changes:
  stable/10/sys/dev/ixgbe/ixgbe_phy.c |