Bug 233535 - Machines lost ping6 after adding same IPv6 address
Summary: Machines lost ping6 after adding same IPv6 address
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-net mailing list
Reported: 2018-11-26 16:12 UTC by Slava Shwartsman
Modified: 2018-12-17 16:29 UTC


Attachments
Fix missing decrement of refcount in IPv6 code. (813 bytes, patch)
2018-12-17 14:43 UTC, Hans Petter Selasky

Description Slava Shwartsman freebsd_committer 2018-11-26 16:12:28 UTC
Setup:
Two hosts connected back-to-back.
# uname -rv
13.0-CURRENT FreeBSD 13.0-CURRENT r340922 GENERIC-NODEBUG

Steps to reproduce:
1. Configure IPv6 address on both hosts:
HOST A: ifconfig igb0 inet6 2002::1
HOST B: ifconfig igb0 inet6 2002::2

2. Ping to make sure all works:
# ping6 2002::2
PING6(56=40+8+8 bytes) 2002::1 --> 2002::2
16 bytes from 2002::2, icmp_seq=0 hlim=64 time=0.101 ms
16 bytes from 2002::2, icmp_seq=1 hlim=64 time=0.085 ms
^C
--- 2002::2 ping6 statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.085/0.093/0.101/0.008 ms

3. On both hosts, configure the same IPv6 address again:
HOST A: ifconfig igb0 inet6 2002::1
HOST B: ifconfig igb0 inet6 2002::2

4. Check ping again
# ping6 2002::2
PING6(56=40+8+8 bytes) 2002::1 --> 2002::2
^C
--- 2002::2 ping6 statistics ---
7 packets transmitted, 0 packets received, 100.0% packet loss

A few notes:
==================
1. Step 3 sometimes has to be repeated several times before the issue appears.
2. Pinging from the other side may resolve the issue.
3. The issue also reproduces with NICs from other vendors.
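
The steps above can be sketched as a small script for one host. This is only an illustration: the function names, the DRY_RUN switch, and the default retry count are assumptions, not part of the report. Only the ifconfig and ping6 invocations come from the steps themselves.

```shell
#!/bin/sh
# Sketch of steps 3 and 4 for one host. Helper names are hypothetical.

# Run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# readd_and_ping IFACE LOCAL_ADDR PEER_ADDR [TRIES]
# Re-add LOCAL_ADDR up to TRIES times (one re-add is not always enough)
# and stop at the first ping6 failure.
readd_and_ping() {
    iface=$1 addr=$2 peer=$3 tries=${4:-5}
    i=1
    while [ "$i" -le "$tries" ]; do
        run ifconfig "$iface" inet6 "$addr"      # step 3
        if ! run ping6 -c 2 "$peer"; then        # step 4
            echo "ping6 to $peer failed after $i re-add(s)"
            return 1
        fi
        i=$((i + 1))
    done
    echo "ping6 still works after $tries re-add(s)"
}
```

On HOST A this would be called as `readd_and_ping igb0 2002::1 2002::2`, and on HOST B with the two addresses swapped. Setting DRY_RUN=1 prints the commands without touching the interface.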


tcpdump shows that the other side receives the NS (neighbor solicitation) messages but never replies:
# tcpdump -nei igb0 host 2002::1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on igb0, link-type EN10MB (Ethernet), capture size 262144 bytes
18:10:10.176317 0c:c4:7a:a8:b7:f6 > 33:33:ff:00:00:02, ethertype IPv6 (0x86dd), length 86: 2002::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2002::2, length 32
18:10:11.196325 0c:c4:7a:a8:b7:f6 > 33:33:ff:00:00:02, ethertype IPv6 (0x86dd), length 86: 2002::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2002::2, length 32
18:10:12.343345 0c:c4:7a:a8:b7:f6 > 33:33:ff:00:00:02, ethertype IPv6 (0x86dd), length 86: 2002::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2002::2, length 32
18:10:13.377329 0c:c4:7a:a8:b7:f6 > 33:33:ff:00:00:02, ethertype IPv6 (0x86dd), length 86: 2002::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2002::2, length 32
18:10:14.382334 0c:c4:7a:a8:b7:f6 > 33:33:ff:00:00:02, ethertype IPv6 (0x86dd), length 86: 2002::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2002::2, length 32
18:10:15.533880 0c:c4:7a:a8:b7:f6 > 33:33:ff:00:00:02, ethertype IPv6 (0x86dd), length 86: 2002::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2002::2, length 32
18:10:16.583342 0c:c4:7a:a8:b7:f6 > 33:33:ff:00:00:02, ethertype IPv6 (0x86dd), length 86: 2002::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2002::2, length 32
18:10:17.603347 0c:c4:7a:a8:b7:f6 > 33:33:ff:00:00:02, ethertype IPv6 (0x86dd), length 86: 2002::1 > ff02::1:ff00:2: ICMP6, neighbor solicitation, who has 2002::2, length 32
^C
8 packets captured
41 packets received by filter
0 packets dropped by kernel
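
For reference, the destination in the capture (ff02::1:ff00:2, MAC 33:33:ff:00:00:02) is the solicited-node multicast group of 2002::2. It is formed from ff02::1:ff00:0/104 plus the low 24 bits of the unicast address, and the Ethernet address is 33:33 followed by the low 32 bits of the group. A minimal sketch of that mapping follows; the function name is hypothetical and the address expansion is simplified (no validation of the input).

```shell
# solicited_node ADDR: print "<solicited-node group> <Ethernet MAC>".
# Hypothetical helper; expects a syntactically valid IPv6 address.
solicited_node() {
    echo "$1" | awk '
    function pad4(h) { return substr("0000" h, length(h) + 1) }
    {
        addr = tolower($1)
        sub(/%.*$/, "", addr)          # drop a scope id such as %igb0
        nl = 0; nr = 0
        i = index(addr, "::")
        if (i > 0) {                   # expand the "::" shorthand
            nl = split(substr(addr, 1, i - 1), l, ":")
            nr = split(substr(addr, i + 2), r, ":")
        } else {
            nl = split(addr, l, ":")
        }
        for (k = 1; k <= 8; k++) g[k] = "0000"
        for (k = 1; k <= nl; k++) g[k] = pad4(l[k])
        for (k = 1; k <= nr; k++) g[8 - nr + k] = pad4(r[k])
        hi = substr(g[7], 3, 2)        # bits 23..16 of the address
        lo = g[8]                      # bits 15..0 of the address
        short = lo; sub(/^0+/, "", short); if (short == "") short = "0"
        printf "ff02::1:ff%s:%s 33:33:ff:%s:%s:%s\n", hi, short, hi, substr(lo, 1, 2), substr(lo, 3, 2)
    }'
}
```

Running `solicited_node 2002::2` prints `ff02::1:ff00:2 33:33:ff:00:00:02`, matching the capture. A host that is not a member of that group drops the NS before neighbor discovery sees it.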
Comment 1 Conrad Meyer freebsd_committer 2018-11-26 16:44:57 UTC
After step 3, what does ifconfig think the configured prefixes are?  Please include netstat -rn (inet6 portion) as well.
Comment 2 Conrad Meyer freebsd_committer 2018-11-26 17:43:13 UTC
(Maybe related to bug 233283.)
Comment 3 Andrey V. Elsukov freebsd_committer 2018-11-26 22:15:36 UTC
I think it is related to DAD (duplicate address detection). But what do you expect to see after these steps?
Comment 4 Slava Shwartsman freebsd_committer 2018-11-27 08:56:33 UTC
(In reply to Conrad Meyer from comment #1)
The same issue appears when the prefix length is set explicitly:
HOST A: ifconfig igb0 inet6 2002::1/64
HOST B: ifconfig igb0 inet6 2002::2/64

# ping6 2002::2
PING6(56=40+8+8 bytes) 2002::1 --> 2002::2
16 bytes from 2002::2, icmp_seq=0 hlim=64 time=0.266 ms
16 bytes from 2002::2, icmp_seq=1 hlim=64 time=0.087 ms
^C
--- 2002::2 ping6 statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.087/0.176/0.266/0.090 ms
# ifconfig igb0 inet6 2002::1/64
# ping6 2002::2
PING6(56=40+8+8 bytes) fe80::ec4:7aff:fea8:b7f6%igb0 --> 2002::2
^C
--- 2002::2 ping6 statistics ---
54 packets transmitted, 0 packets received, 100.0% packet loss

# netstat -rn
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            10.209.52.1        UGS        igb0
10.209.52.0/22     link#1             U          igb0
10.209.52.157      link#1             UHS         lo0
127.0.0.1          link#3             UH          lo0

Internet6:
Destination                       Gateway                       Flags     Netif Expire
::/96                             ::1                           UGRS        lo0
::1                               link#3                        UH          lo0
::ffff:0.0.0.0/96                 ::1                           UGRS        lo0
2002::/64                         link#1                        U          igb0
2002::1                           link#1                        UHS         lo0
fe80::/10                         ::1                           UGRS        lo0
fe80::%igb0/64                    link#1                        U          igb0
fe80::ec4:7aff:fea8:b7f6%igb0     link#1                        UHS         lo0
fe80::%lo0/64                     link#3                        U           lo0
fe80::1%lo0                       link#3                        UHS         lo0
ff02::/16                         ::1                           UGRS        lo0


(In reply to Andrey V. Elsukov from comment #3)
I would expect that ping will continue to work.
Comment 5 Andrey V. Elsukov freebsd_committer 2018-11-27 09:39:30 UTC
(In reply to Slava Shwartsman from comment #4)
> (In reply to Andrey V. Elsukov from comment #3)
> I would expect that ping will continue to work.

I'll try to reproduce this. Can you also show the output of the `ifconfig igb0` command? Do both addresses have the "duplicated" flag?
Comment 6 Slava Shwartsman freebsd_committer 2018-11-27 11:25:47 UTC
(In reply to Andrey V. Elsukov from comment #5)
This is the output from both hosts after the issue reproduced:

# ifconfig igb0
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:a8:b7:f6
        inet6 fe80::ec4:7aff:fea8:b7f6%igb0 prefixlen 64 scopeid 0x1
        inet6 2002::1 prefixlen 64
        inet 10.209.52.157 netmask 0xfffffc00 broadcast 10.209.55.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>


# ifconfig igb0
igb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=e527bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,WOL_MAGIC,VLAN_HWFILTER,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether 0c:c4:7a:a8:b7:76
        inet6 fe80::ec4:7aff:fea8:b776%igb0 prefixlen 64 scopeid 0x1
        inet6 2002::2 prefixlen 64
        inet 10.209.52.158 netmask 0xfffffc00 broadcast 10.209.55.255
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
Comment 7 Slava Shwartsman freebsd_committer 2018-12-06 13:06:03 UTC
Any updates?
Comment 8 Andrey V. Elsukov freebsd_committer 2018-12-07 13:09:36 UTC
(In reply to Slava Shwartsman from comment #7)
> Any updates?

Sorry for the long delay. I just tried your test scenario and I'm able to reproduce the problem. At first glance, there seems to be a race in the multicast/MLD code. The host that doesn't respond to ND6 NS increments the `ip6s_notmember` counter of the ip6 statistics in icmp6_input() for each NS packet, so the ND6 code never gets a chance to send a reply.
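
One way to watch that counter from userland is to sample `netstat -s -p ip6` on the non-responding host before and after the peer sends a few NS packets. The sketch below is hedged: the helper name is hypothetical, and it assumes the counter is printed on a line worded "multicast packets which we don't join", which may differ between releases.

```shell
# notmember_count: read "netstat -s -p ip6" output on stdin and print the
# number of multicast packets dropped for lack of group membership.
# ASSUMPTION: the line is worded "multicast packets which we don't join".
notmember_count() {
    awk '/multicast packets? which we don.t join/ { print $1; found = 1 }
         END { if (!found) exit 1 }'
}
```

Usage: `netstat -s -p ip6 | notmember_count`, run once before and once after pinging from the other host. If the value grows by roughly one per NS seen in tcpdump, the packets are being dropped for lack of group membership.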
Comment 9 Andrey V. Elsukov freebsd_committer 2018-12-08 11:16:41 UTC
It seems the problem is even worse. After some time, the system leaves multicast groups without any reconfiguration and stops replying to ND6 NS.
1. ifmcstat before test:
em0:
	inet 10.9.8.6
	igmpv2
		group 224.0.0.1 mode exclude
			mcast-macaddr 01:00:5e:00:00:01
	inet6 fe80::222:4dff:fe6a:5eb9%em0 scopeid 0x1
	mldv1 flags=2<USEALLOW>
		group ff01::1%em0 scopeid 0x1 mode exclude
			mcast-macaddr 33:33:00:00:00:01
		group ff02::1%em0 scopeid 0x1 mode exclude
			mcast-macaddr 33:33:00:00:00:01
2. ifconfig em0 inet6 fc00::2
3. ifmcstat
em0:
	inet6 fe80::222:4dff:fe6a:5eb9%em0 scopeid 0x1
	mldv1 flags=2<USEALLOW>
		group ff02::2:d4f1:c447%em0 scopeid 0x1 mode exclude
			mcast-macaddr 33:33:d4:f1:c4:47
		group ff02::2:ffd4:f1c4%em0 scopeid 0x1 mode exclude
			mcast-macaddr 33:33:ff:d4:f1:c4
		group ff02::1:ff00:2%em0 scopeid 0x1 mode exclude
			mcast-macaddr 33:33:ff:00:00:02
	inet 10.9.8.6
	igmpv2
		group 224.0.0.1 mode exclude
			mcast-macaddr 01:00:5e:00:00:01
	inet6 fe80::222:4dff:fe6a:5eb9%em0 scopeid 0x1
	mldv1 flags=2<USEALLOW>
		group ff01::1%em0 scopeid 0x1 mode exclude
			mcast-macaddr 33:33:00:00:00:01
		group ff02::1%em0 scopeid 0x1 mode exclude
			mcast-macaddr 33:33:00:00:00:01
4. Wait about 1 minute
5. ifmcstat 
em0:
	inet 10.9.8.6
	igmpv2
		group 224.0.0.1 mode exclude
			mcast-macaddr 01:00:5e:00:00:01
	inet6 fe80::222:4dff:fe6a:5eb9%em0 scopeid 0x1
	mldv1 flags=2<USEALLOW>
		group ff01::1%em0 scopeid 0x1 mode exclude
			mcast-macaddr 33:33:00:00:00:01
		group ff02::1%em0 scopeid 0x1 mode exclude
			mcast-macaddr 33:33:00:00:00:01
6. On the second host: ndp -c && ping6 fc00::2 => no reply
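
To confirm the same behaviour on another machine, the check in steps 3 to 5 can be automated by looking for the solicited-node group in the ifmcstat output. This is a minimal sketch: the helper name is hypothetical, and it assumes the "group <address>%<ifname>" line format shown in the listings above.

```shell
# has_group GROUP: read ifmcstat output on stdin and succeed only if the
# interface is still a member of GROUP (matched as "group GROUP%ifname").
has_group() {
    grep -qF "group $1%"
}
```

Usage: `ifmcstat -i em0 -f inet6 | has_group ff02::1:ff00:2 || echo "solicited-node group missing"`. Running it in a loop with a short sleep after step 2 shows roughly when the membership disappears.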
Comment 10 Andrey V. Elsukov freebsd_committer 2018-12-08 12:02:09 UTC
This looks like a really bad problem for the 12.0 release. Can somebody check and confirm that the system leaves multicast groups? If it is not only my machine, this can break IPv6 connectivity after an upgrade.
Comment 11 Andrey V. Elsukov freebsd_committer 2018-12-13 13:17:36 UTC
There is also a memory leak for the "in6_multi" type. It can be observed with this script:

# Re-add the same address to the interface every 2 seconds.
while true; do
	ifconfig mce3 inet6 fe80::15
	sleep 2
done
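
While the loop runs, the leak can be followed from a second terminal by sampling the malloc statistics. The helper below is a sketch with a hypothetical name; it assumes `vmstat -m` prints the type name in the first column and the InUse count in the second.

```shell
# in6_multi_inuse: read "vmstat -m" output on stdin and print the InUse
# count for the in6_multi malloc type.
# ASSUMPTION: type name is column 1 and InUse is column 2.
in6_multi_inuse() {
    awk '$1 == "in6_multi" { print $2; found = 1 }
         END { if (!found) exit 1 }'
}
```

Usage: `vmstat -m | in6_multi_inuse`, repeated every few seconds. A value that keeps growing while the address is re-added suggests allocations that are never freed.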
Comment 12 Hans Petter Selasky freebsd_committer 2018-12-17 14:43:21 UTC
Created attachment 200199
Fix missing decrement of refcount in IPv6 code.

Hi,

Please find attached a patch to try to fix this issue.

--HPS
Comment 13 Andrey V. Elsukov freebsd_committer 2018-12-17 15:24:22 UTC
(In reply to Hans Petter Selasky from comment #12)
> Created attachment 200199
> Fix missing decrement of refcount in IPv6 code.
> 
> Hi,
> 
> Please find attached a patch to try to fix this issue.

The assertion in in6_mcast.c fires just after boot.
Comment 14 Hans Petter Selasky freebsd_committer 2018-12-17 16:29:09 UTC
Can you show the backtrace?
Comment 15 Hans Petter Selasky freebsd_committer 2018-12-17 16:29:53 UTC
Can you try the patch with MPASS() removed?