Bug 265857 - qlnxe: no IPV6 pings between nodes on the same switch until an IPv4 address is set
Summary: qlnxe: no IPV6 pings between nodes on the same switch until an IPv4 address is set
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.1-RELEASE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-net (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-08-15 12:55 UTC by benoitc
Modified: 2022-11-17 08:51 UTC
CC List: 3 users

See Also:


Attachments
timeout after setting mtu (69.15 KB, image/png)
2022-11-15 10:19 UTC, benoitc

Description benoitc 2022-08-15 12:55:29 UTC
I have set up 3 nodes on a fresh FreeBSD 13.1-RELEASE-p1. They have the same gateway and their IPs are in the same /64. All 3 nodes are on the same switch (MikroTik) and the same VLAN, untagged.

I can ping them from an external machine through the router/gateway, but the nodes can't ping each other using their public IPv6 /64 addresses. Pings using link-local addresses work. `ndp -a` only returns the link-local entries, the gateway address and the address of the current node.

Pings start to work when the interface is put in promiscuous mode or an IPV4 address is set.

The card is an HPE Eth 10/25Gb 2p 621SFP28 Adptr: https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=a00035643en_us

See this thread on the mailing list for the details: https://lists.freebsd.org/archives/freebsd-net/2022-August/002294.html


Initial configuration was pretty straightforward:

```
hostname="node1.domain.tld"
keymap="fr.macbook.kbd"
ifconfig_ql0=""
ifconfig_ql0_ipv6="inet6 <PREFIX>::11/64"
ipv6_defaultrouter="<PREFIX>::1"
sshd_enable="YES"
ntpd_enable="YES"
# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"
```
The other machines are `<PREFIX>::12` and `<PREFIX>::13`; the prefix is the same for all of them. To be safe, I replaced the real prefix with <PREFIX> using sed:

node 1:

```
 $ ifconfig ql0
ql0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
ether b4:7a:f1:7a:9c:10
inet6 <PREFIX>::11 prefixlen 64
inet6 fe80::b67a:f1ff:fe7a:9c10%ql0 prefixlen 64 scopeid 0x1
media: Ethernet autoselect (25GBase-SR <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
```

node 2:

```
 $ ifconfig ql0
ql0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
ether b4:7a:f1:7a:99:52
inet6 <PREFIX>::12 prefixlen 64
inet6 fe80::b67a:f1ff:fe7a:9952%ql0 prefixlen 64 scopeid 0x1
media: Ethernet autoselect (25GBase-SR <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
```

node 3:

```
 $ ifconfig ql0
ql0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=507bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWFILTER,VLAN_HWTSO>
ether b4:7a:f1:18:ff:d8
inet6 <PREFIX>::13 prefixlen 64
inet6 fe80::b67a:f1ff:fe18:ffd8%ql0 prefixlen 64 scopeid 0x1
media: Ethernet autoselect (25GBase-SR <full-duplex>)
status: active
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
```
Comment 1 benoitc 2022-08-15 22:06:53 UTC
Looking at the source code:

https://cgit.freebsd.org/src/tree/sys/dev/qlnx/qlnxe/qlnx_os.c#n2675
it seems that this snippet initialising the `ha` record:

```
ifp->if_flags |= IFF_UP;
if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
	QLNX_LOCK(ha);
	qlnx_init_locked(ha);
	QLNX_UNLOCK(ha);
}
```

should also be added to
https://cgit.freebsd.org/src/tree/sys/dev/qlnx/qlnxe/qlnx_os.c#n2687

otherwise I am not sure how the interface is correctly initialised when only an IPv6 address is configured. Thoughts?
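
Roughly, the change I have in mind would look like this (an untested sketch; the surrounding `default:` context is paraphrased from my reading of the ioctl handler, not the verbatim source):

```
default:
	/* Hypothetical: mirror the AF_INET path so that configuring only
	 * an IPv6 address also brings the interface up and initialised. */
	ifp->if_flags |= IFF_UP;
	if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
		QLNX_LOCK(ha);
		qlnx_init_locked(ha);
		QLNX_UNLOCK(ha);
	}
	ret = ether_ioctl(ifp, cmd, data);
	break;
```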
Comment 3 benoitc 2022-10-23 20:55:05 UTC
Any news?
Comment 4 Zhenlei Huang freebsd_committer freebsd_triage 2022-10-24 01:32:56 UTC
(In reply to benoitc from comment #0)
> I have set up 3 nodes on a fresh FreeBSD 13.1-RELEASE-p1. They have the same gateway
> and their IPs are in the same /64. All 3 nodes are on the same switch (MikroTik) and
> the same VLAN, untagged.

Is this a layer 2 switch or a layer 3 one?

> I can ping them from an external machine through the router/gateway, but the nodes
> can't ping each other using their public IPv6 /64 addresses. Pings using link-local
> addresses work. `ndp -a` only returns the link-local entries, the gateway address and
> the address of the current node.

Do you mean an external machine can ping and get responses from the gateway, but no responses from the nodes? And also that the nodes can't ping each other?

Is the router connected to the switch (MikroTik), or is the switch itself the gateway?

And can you try pinging the IPv6 link-local addresses between the nodes (although it should not matter)?

Can you share some tcpdump captures while pinging?


Meanwhile, I'd suggest connecting two of the three nodes directly with a cable; that is a minimal setup to narrow down the problem.
Comment 5 Zhenlei Huang freebsd_committer freebsd_triage 2022-10-24 01:48:59 UTC
> Pings start to work when the interface is put in promiscuous mode or an IPV4 address is set.

> The card is an HPE Eth 10/25Gb 2p 621SFP28 Adptr: https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=a00035643en_us

> See this thread on the mailing list for the details:
> https://lists.freebsd.org/archives/freebsd-net/2022-August/002294.html

Sorry for the noise; I did not look through the mailing list before replying.
Comment 6 Zhenlei Huang freebsd_committer freebsd_triage 2022-10-24 01:56:58 UTC
(In reply to benoitc from comment #1)

I'm not sure whether the interface is properly initialized or not, but you can try changing the MTU (the maximum is 9000, as defined in the driver's source code) and see if it works.

On node1 and node2, run `ifconfig ql0 mtu 4000`, then ping the link-local address from node1 to node2.


A clue in the driver's source code:

```
        case SIOCSIFMTU:
                QL_DPRINT4(ha, "SIOCSIFMTU (0x%lx)\n", cmd);

                if (ifr->ifr_mtu > QLNX_MAX_MTU) {
                        ret = EINVAL;
                } else {
                        QLNX_LOCK(ha);
                        ifp->if_mtu = ifr->ifr_mtu;
                        ha->max_frame_size =
                                ifp->if_mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;
                        if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
                                qlnx_init_locked(ha);
                        }

                        QLNX_UNLOCK(ha);
                }

                break;
```
Comment 7 Zhenlei Huang freebsd_committer freebsd_triage 2022-10-31 13:25:02 UTC
(In reply to Zhenlei Huang from comment #6)

Hi benoitc,

Any luck?
Comment 8 benoitc 2022-11-11 10:49:57 UTC
I will try. So, just changing the MTU using ifconfig?
Comment 9 benoitc 2022-11-14 22:36:53 UTC
Changing the MTU didn't fix the issue. However, I can consistently reproduce the following: once the cards are put in promiscuous mode, they are able to reach each other. Disabling promiscuous mode afterwards doesn't change this state, and the cards are still able to ping each other. I don't see any useful log that could explain it though.
Comment 10 Zhenlei Huang freebsd_committer freebsd_triage 2022-11-15 01:08:53 UTC
(In reply to benoitc from comment #8)

Sorry for the late response. I'm lost in lots of mail.

> I will try. So, just changing the MTU using ifconfig?

Yes and no.

Steps on node1:
 1. ifconfig ql0 inet6 -ifdisabled  # Turns on the IPv6 link-local address
 2. ifconfig ql0 mtu 4000
 3. ifconfig ql0 mtu 1500 # Optional (mandatory if the switch does not support, or is not configured for, jumbo frames)

Repeat the above steps on node2, then ping6 from node1 to node2 using the IPv6 link-local addresses.
Comment 11 benoitc 2022-11-15 10:19:29 UTC
Created attachment 238090 [details]
timeout after setting mtu
Comment 12 benoitc 2022-11-15 10:21:27 UTC
(In reply to Zhenlei Huang from comment #10)
I've tried it but got no ping: https://bugs.freebsd.org/bugzilla/attachment.cgi?id=238090

I'm not sure why I can reach them from outside while they can't ping each other (all using the same card). On the same switch they can ping another FreeBSD machine, but that one has a Mellanox card.
Comment 13 benoitc 2022-11-15 10:26:56 UTC
Comment on attachment 238090 [details]
timeout after setting mtu

Wrong IPv6 address in the screenshot. But when I tried without the interface scope:

```
ping6 fe80::b67a:f1ff:fe7a:9c10
ping6: UDP connect: Network is unreachable
```
Comment 14 benoitc 2022-11-15 12:05:52 UTC
Forget about the last comment. So the weird stuff is the following:

I'm able to ping ::10 from ::11, ::12 and ::13 using the public address on the same switch. This machine has a Mellanox card (mce).
::11, ::12 and ::13, which use the qlnxe driver, are able to ping ::10 or external machines, but they are not able to ping each other using their public (global) IP.
::11, ::12 and ::13 are able to ping each other using their link-local addresses.

Under Linux I have no such issue :/
Comment 15 Zhenlei Huang freebsd_committer freebsd_triage 2022-11-15 14:01:18 UTC
(In reply to benoitc from comment #14)
> I'm able to ping ::10 from ::11, ::12 and ::13 using the public address on the same
> switch. This machine has a Mellanox card (mce).

Do you mean the machine has the public address <PREFIX>::10 on a Mellanox interface?

> ::11, ::12 and ::13, which use the qlnxe driver, are able to ping ::10 or external
> machines, but they are not able to ping each other using their public (global) IP.
> ::11, ::12 and ::13 are able to ping each other using their link-local addresses.

I'm confused by that statement. What do you mean by `::11, ::12 and ::13 are able to ping each other using their link-local addresses`?
Comment 16 Zhenlei Huang freebsd_committer freebsd_triage 2022-11-15 14:14:03 UTC
(In reply to benoitc from comment #13)

> Wrong IPv6 address in the screenshot. But when I tried without the interface scope:

> ```
> ping6 fe80::b67a:f1ff:fe7a:9c10
> ping6: UDP connect: Network is unreachable
> ```

When pinging link-local addresses, you need the scope (they're link-local, not globally unique).

```
ping6 -c1 fe80::b67a:f1ff:fe7a:9c10%ql0
```
Comment 17 benoitc 2022-11-15 14:19:17 UTC
(In reply to Zhenlei Huang from comment #15)

So, to give a better idea: I have 4 machines plugged into this switch, 3 with a qlnxe card (<PREFIX>::11, <PREFIX>::12, <PREFIX>::13) and one with a Mellanox card (<PREFIX>::10):

         |- ::11 machine A
         |- ::12 machine B
SWITCH --|- ::13 machine C
         |
         |- ::10 machine D


A, B, C can ping D on <PREFIX>::10, and D can ping A, B, C on their <PREFIX> addresses.

A, B, C can't ping each other using the global <PREFIX>.

A, B, C, D can ping each other using the link-local IP `fe80::...`

A, B, C, D can be pinged from an external machine (outside of the switch) and can ping other machines on the internet using their <PREFIX> IP.


I don't understand why A, B, C can't ping each other on <PREFIX>::{11,12,13} while they can using the IPv6 link-local `fe80::` addresses and can be reached from outside. Any idea what the issue could be? Maybe an option to set? Using Linux this works perfectly and the issue is not reproduced.
Comment 18 benoitc 2022-11-15 14:20:26 UTC
(In reply to Zhenlei Huang from comment #16)


Yes, I did that. Each machine can ping the others using its link-local address, AFAIK, even without having to set the MTU.
Comment 20 Zhenlei Huang freebsd_committer freebsd_triage 2022-11-15 15:30:11 UTC
(In reply to benoitc from comment #17)

> A, B, C, D can ping each other using the link-local IP `fe80::...`
The reason I got confused is that the attachment shows a ping from node2 to node1 with a link-local address failing.

> A, B, C can ping D on <PREFIX>::10, and D can ping A, B, C on their <PREFIX> addresses.
This looks like port isolation is turned on, and A, B, C are in a private VLAN.

Do you have physical access to the environment? Can you please swap the switch port of A with that of D? Then the setup looks like this:

         |- ::10 machine D
         |- ::12 machine B
SWITCH --|- ::13 machine C
         |
         |- ::11 machine A

Then ping the global IPv6 address from B to D, and from B to A.


> Using Linux this works perfectly and the issue is not reproduced.
Do you mean with the same switch and machines, but A, B and C booted into Linux?
Comment 21 benoitc 2022-11-15 20:05:32 UTC
(In reply to Zhenlei Huang from comment #20)

> The reason I got confused is that the attachment shows a ping from node2 to node1 with a link-local address failing.
I tested after a reboot without the MTU set and it worked, which is weird indeed.

> This looks like port isolation is turned on, and A, B, C are in a private VLAN.
> 
> Do you have physical access to the environment? Can you please swap the switch
> port of A with that of D? Then the setup looks like this:
> 
>          |- ::10 machine D
>          |- ::12 machine B
> SWITCH --|- ::13 machine C
>          |
>          |- ::11 machine A
> 
> Then ping the global IPv6 address from B to D, and from B to A.

I did it, just to be sure, but it didn't change anything. I also checked on the switch, and no port isolation is set.


> > Using Linux this works perfectly and the issue is not reproduced.
>
> Do you mean with the same switch and machines, but A, B and C booted into Linux?

Yes, exactly. Under Linux it works like a charm...
Comment 22 Zhenlei Huang freebsd_committer freebsd_triage 2022-11-16 15:42:10 UTC
(In reply to benoitc from comment #1)
> Looking at the source code:

> https://cgit.freebsd.org/src/tree/sys/dev/qlnx/qlnxe/qlnx_os.c#n2675
> it seems that this snippet initialising the `ha` record:

> ```
> ifp->if_flags |= IFF_UP;
> if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
> 	QLNX_LOCK(ha);
> 	qlnx_init_locked(ha);
> 	QLNX_UNLOCK(ha);
> }
> ```

> should also be added to
> https://cgit.freebsd.org/src/tree/sys/dev/qlnx/qlnxe/qlnx_os.c#n2687

> otherwise I am not sure how the interface is correctly initialised when only an IPv6
> address is configured. Thoughts?

Line https://cgit.freebsd.org/src/tree/sys/dev/qlnx/qlnxe/qlnx_os.c#n2687 calls ether_ioctl(), which in turn calls `ifp->if_init(ifp->if_softc)`.

For the qlnx driver, `ifp->if_init` is `qlnx_init()`, which is a wrapper around `qlnx_init_locked()`.

So when adding an IPv6 address to the interface, it will be initialised (the same as with IPv4).
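
For reference, the SIOCSIFADDR path in ether_ioctl() looks roughly like this (an abridged sketch of sys/net/if_ethersubr.c, paraphrased rather than verbatim):

```
case SIOCSIFADDR:
	ifp->if_flags |= IFF_UP;

	switch (ifa->ifa_addr->sa_family) {
	case AF_INET:
		ifp->if_init(ifp->if_softc);	/* before arpwhohas */
		arp_ifinit(ifp, ifa);
		break;
	default:
		/* AF_INET6 and every other family land here,
		 * so if_init() runs for IPv6 too. */
		ifp->if_init(ifp->if_softc);
		break;
	}
	break;
```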
Comment 23 Zhenlei Huang freebsd_committer freebsd_triage 2022-11-16 15:43:06 UTC
(In reply to Zhenlei Huang from comment #22)
> So when adding an IPv6 address to the interface, it will be initialised (the same as with IPv4).

That's why changing the MTU does not help.
Comment 24 Zhenlei Huang freebsd_committer freebsd_triage 2022-11-16 16:57:35 UTC
I suspect some multicast packets are being discarded.

Can you do a tcpdump capture on D while running ping6 A -> B, and ping6 A -> D?
Make sure to clear the neighbor cache on both sides before every ping.

tcpdump listening on D:
```
# tcpdump -nvei mce0
```


For A -> B:
```
# ndp -c # on B
```

```
# ndp -c # on A
# ping6 -c1 <PREFIX>::12
# ndp -na
```

For A -> D:
```
# ndp -c # on D
```

```
# ndp -c # on A
# ping6 -c1 <PREFIX>::10
# ndp -na
```

You can omit the <PREFIX>, or send the pcap privately to my FreeBSD email, zlei@FreeBSD.org.
Comment 25 Zhenlei Huang freebsd_committer freebsd_triage 2022-11-16 17:06:46 UTC
And also try this:

Do a tcpdump capture on D while running ping6 D -> A.
Make sure to clear the neighbor cache on both sides before every ping.

For D -> A:
```
# ndp -c # on A
```

```
# ndp -c # on D
# ping6 -c1 <PREFIX>::11 # address of A
# ndp -na
```
Comment 26 benoitc 2022-11-16 23:54:15 UTC
I sent you the pcap.

I also did a test by enabling/disabling promisc.

Before enabling promisc, running `ndp -a` on A doesn't show B's IPv6 address. Once I enable promisc (and disable it right after), the IPv6 address of B is shown on A.
Comment 28 benoitc 2022-11-16 23:56:22 UTC
(In reply to benoitc from comment #26)
I forgot to mention that I was pinging B from A to get it into the list of IPs returned by `ndp -a`.
Comment 29 Zhenlei Huang freebsd_committer freebsd_triage 2022-11-17 03:08:37 UTC
(In reply to benoitc from comment #26)

> I suspect some multicast packets are being discarded.
From the pcap you provided, my previous guess is right.

While pinging A (<PREFIX>::11) -> B (<PREFIX>::12), the initial neighbor solicitation is sent to the multicast address ff02::1:ff00:12; the corresponding layer 2 address is 33:33:ff:00:00:12.
For some reason (the qlnx interface might not be configured correctly) B drops the NS packet, and so A continuously sends NS packets.

For the destination link-local address fe80::b67a:f1ff:fe7a:9952, the corresponding layer 2 address is 33:33:ff:7a:99:52, and B is willing to accept the NS packet (with multicast dest 33:33:ff:7a:99:52).

When the interface is put into promiscuous mode, all multicast addresses are accepted.
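
To make the address derivation concrete, here is a small standalone C program (my own illustration; it uses the documentation prefix 2001:db8:: in place of <PREFIX>) that computes the solicited-node multicast group (RFC 4291) and its Ethernet mapping (RFC 2464) for B's address:

```
#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>

int
main(void)
{
	struct in6_addr ip, sol;
	unsigned char mac[6];
	char buf[INET6_ADDRSTRLEN];

	/* B's global address; 2001:db8:: stands in for <PREFIX>. */
	inet_pton(AF_INET6, "2001:db8::12", &ip);

	/* Solicited-node group: ff02::1:ff00:0/104 plus the low 24 bits
	 * of the unicast address (RFC 4291). */
	inet_pton(AF_INET6, "ff02::1:ff00:0", &sol);
	memcpy(&sol.s6_addr[13], &ip.s6_addr[13], 3);

	/* Ethernet mapping: 33:33 plus the low 32 bits of the group
	 * address (RFC 2464). */
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &sol.s6_addr[12], 4);

	/* Prints ff02::1:ff00:12 and 33:33:ff:00:00:12, matching the capture. */
	printf("NS target group: %s\n",
	    inet_ntop(AF_INET6, &sol, buf, sizeof(buf)));
	printf("Ethernet dest:   %02x:%02x:%02x:%02x:%02x:%02x\n",
	    mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
	return (0);
}
```

Unless ql0 programs 33:33:ff:00:00:12 into its hardware multicast filter (or is promiscuous), B never sees A's neighbor solicitation, which matches the observed behaviour.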


To work around it, you can:

1) Put the interface ql0 into promiscuous mode with `ifconfig ql0 promisc`

2) Configure the public IPv6 addresses to include the last three octets of the ether address: <PREFIX>::7a:9c10 for A, <PREFIX>::7a:9952 for B, etc.

3) Manually register NDP entries for all nodes, e.g. `ndp -s <PREFIX>::12 b4:7a:f1:7a:99:52` on A.

For 1), although putting the interface into promiscuous mode looks like overhead, switches are smart and will filter out unwanted packets. And you can still employ `ipv6_privacy` simultaneously.

For 2), you can give it a try if you do not need `ipv6_privacy` and such public IPv6 addresses are acceptable.

For 3), if 2) does not suffice, you may want some automation tool (Puppet, for example) to free up your hands.
Comment 30 benoitc 2022-11-17 07:23:18 UTC
Thanks for the debugging :) I will check what I can do.

When you say "the qlnx interface might not be configured correctly", is this something internal to the driver (I bet it is, since it's working on Linux), or can I do something about it?
Comment 31 Zhenlei Huang freebsd_committer freebsd_triage 2022-11-17 08:51:31 UTC
(In reply to benoitc from comment #30)

> When you say "the qlnx interface might not be configured correctly", is this something
> internal to the driver (I bet it is, since it's working on Linux),

Yes.

> or can I do something about it?

No, I'm afraid.