Bug 193280 - CARP IPv6 NDP issue
Summary: CARP IPv6 NDP issue
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 10.0-RELEASE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Gleb Smirnoff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-09-03 12:29 UTC by thomas.wilhelm
Modified: 2023-09-21 03:11 UTC
CC List: 8 users

See Also:


Attachments
Wireshark (39.18 KB, image/png)
2014-09-03 12:29 UTC, thomas.wilhelm
kyua testcase to demonstrate the problem (2.29 KB, patch)
2022-01-11 20:14 UTC, Thomas Steen Rasmussen / Tykling
ndp neighbor solicitation carp source mac fix (679 bytes, patch)
2022-01-11 20:15 UTC, Thomas Steen Rasmussen / Tykling

Description thomas.wilhelm 2014-09-03 12:29:00 UTC
Created attachment 146727 [details]
Wireshark

The system doesn't send correct NDP neighbor solicitation messages for IPv6. For these messages, carp uses the hardware MAC address of the interface and not the virtual MAC of the carp IP. This results in packet loss for the virtual IPv6 address.

On ipv4, the arp cache shows the virtual mac entry. (correct)
On ipv6, the ndp cache shows the hardware mac entry. <--problem

The interface is configured with two ipv4 virtual addresses and one ipv6 virtual address. 

The ipv4 addresses work like a charm.


The Wireshark attachment shows that the server sends the hardware MAC address instead of the virtual MAC address (00:00:5e:00:01:01).
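
For reference, the virtual MAC quoted above follows the CARP convention of the fixed prefix 00:00:5e:00:01 followed by the VHID as the last octet, so 00:00:5e:00:01:01 corresponds to VHID 1. A minimal sketch of that mapping (the helper name carp_virtual_mac is made up for illustration and is not a FreeBSD API):

```python
def carp_virtual_mac(vhid: int) -> str:
    """Return the CARP virtual MAC for a given VHID (1..255)."""
    if not 1 <= vhid <= 255:
        raise ValueError("vhid must be between 1 and 255")
    # Fixed prefix 00:00:5e:00:01, last octet is the VHID.
    return "00:00:5e:00:01:%02x" % vhid


if __name__ == "__main__":
    print(carp_virtual_mac(1))
```

Any other source MAC on traffic for the shared address, such as the NIC's own hardware address, is what this report is about.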

system-info:
FreeBSD XXHOSTNAMEXX 10.0-RELEASE-p7 FreeBSD 10.0-RELEASE-p7 #0: Tue Jul  8 06:37:44 UTC 2014     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
Comment 1 Hiroki Sato freebsd_committer freebsd_triage 2014-09-04 22:33:17 UTC
I will take a look into this issue.
Comment 2 thomas.wilhelm 2014-10-07 07:28:15 UTC
Any news here?!
Comment 3 thomas.wilhelm 2015-04-30 15:19:06 UTC
We need a solution as soon as possible. Please investigate...

Do you need any further information?
Comment 4 reinier 2015-08-12 21:16:28 UTC
Is there any progress on this subject, or better yet: has it been fixed in the upcoming 10.2-RELEASE?

I'm having the same issues as the OP. How can I help fix this issue?
Comment 5 Eitan Adler freebsd_committer freebsd_triage 2018-05-28 19:45:52 UTC
batch change:

For bugs that match the following
-  Status Is In progress 
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation

DO:

Reset to open status.


Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.
Comment 6 linus.sundqvist 2019-06-18 07:12:07 UTC
This bug is still present in FreeBSD 12.0 (-p5).
Comment 7 Jorge Schrauwen 2021-12-19 11:00:38 UTC
Still seems to be present on 13.0-RELEASE-p4
Comment 8 Thomas Steen Rasmussen / Tykling 2022-01-08 09:19:49 UTC
I also hit this issue today, very frustrating problem! This was on a fresh 13-STABLE from a few days ago. This issue basically makes carp for ipv6 useless :(

An observation is that it seems to go away with preempt disabled, meaning net.inet.carp.preempt=0. In my current setup I am able to run without preempt because I have full routing between the nodes. But other setups (including others I manage) are not so lucky.

I wonder what triggers it, because I've used carp and ipv6 together in many setups over the years (and still do), and I've never come across this issue before.

Maybe we can share some overall information about our setups and see if we can find some common thing which triggers it.

I hit this issue on a pair of BGP (bird2) routers when I tried to make the bgp source IP be a CARP IP. Worked fine for ipv4, but not for ipv6.

Things which might be a factor?:
- NIC driver is igb

Things which are probably not a factor:
- VLAN tagging
- lagg(4) lacp

I see the same issue on a customer facing interface which is VLAN tagged on top of a lagg(4) interface. So VLAN and lagg(4) do not appear to matter.

I will update this issue if I think of more information which could be relevant.
Comment 9 Thomas Steen Rasmussen / Tykling 2022-01-11 20:12:42 UTC
OK so with a lot of help from my friends at semaphor.dk I managed to get quite a bit further with this issue today.

The issue is, as the original bug report correctly identified, that some ndp neighbor solicitation messages are sent out with the wrong source mac when the source IP is a carp IP. This is a big deal because it breaks carp completely for v6.

Triggering it requires some software on the BACKUP node, say ping(1), to initiate traffic to something, say the default gateway, with the source IP set to the shared CARP IP. This will break ndp for the shared IP. If you just need a workaround then stop doing that (maybe use devd to start whatever makes outbound connections when the node becomes MASTER).

Because the ndp cache is empty the ping packet will trigger a neighbor solicitation packet, which will have the shared CARP ip as source IP (as per RFC), but the packet incorrectly has the NIC real mac address as source mac rather than the shared virtual mac.

It gets a bit long to try to describe everything, so I have created a kyua(1) testcase to illustrate the problem.

The testcase creates three jails, two of them with a shared IP, and then runs ping -6 towards the third in both jails, checks the exit codes, and finally checks to make sure the ndp table on the third jail contains the virtual mac for the shared IP.
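
The final check of the testcase can be illustrated roughly as below. This is only a sketch in Python, not the attached kyua test: the function names, the sample text and the addresses are made up, and the column layout of `ndp -an` is assumed to be address first, link-layer address second.

```python
def normalize_mac(mac: str) -> str:
    """Zero-pad each octet so 0:0:5e:0:1:1 and 00:00:5e:00:01:01 compare equal."""
    try:
        return ":".join("%02x" % int(part, 16) for part in mac.split(":"))
    except ValueError:
        # Not a MAC at all, e.g. "(incomplete)"; return it unchanged.
        return mac


def ndp_lladdr(ndp_output: str, address: str):
    """Return the link-layer address listed for `address`, or None."""
    for line in ndp_output.splitlines():
        fields = line.split()
        # Skip the header row and anything too short to be an entry.
        if len(fields) < 2 or fields[0] == "Neighbor":
            continue
        if fields[0] == address:
            return normalize_mac(fields[1])
    return None


# Made-up sample text, laid out like the assumed `ndp -an` columns.
SAMPLE = """\
Neighbor                Linklayer Address  Netif   Expire    S Flags
2001:db8::1             00:00:5e:00:01:01  epair0b 23h59m58s S
2001:db8::3             02:ff:60:00:00:0b  epair0b permanent R
"""

if __name__ == "__main__":
    # The shared IP should resolve to the virtual mac, not a hardware mac.
    print(ndp_lladdr(SAMPLE, "2001:db8::1"))
```

With the bug present, the entry for the shared IP on the third jail would instead carry the hardware mac of whichever node sent the solicitation.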

The testcase is attached, along with a patch which appears to fix the issue on my end.

The issue and the patch:

The codepath for sending neighbor *advertisements*, nd6_na_output_fib(), checks to see if the IP it is advertising is a carp ip and sets the source mac accordingly via carp_macmatch6_p() - this works.

This check is missing in the codepath for sending neighbor *solicitations*, nd6_ns_output_fib(). This means the mbuf tag PACKET_TAG_CARP is not set and carp_output never changes the source mac.

The attached patch attempts to fix this by calling carp_macmatch6_p() from nd6_ns_output_fib() if it is a carp IP.
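
The decision the patch adds can be modelled in user space as follows. This is a rough sketch in Python and not the kernel code: the Interface class and ns_source_lladdr() are hypothetical names, carp_macmatch6() only imitates the role described above for carp_macmatch6_p(), and the boolean stands in for the PACKET_TAG_CARP mbuf tag.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Interface:
    """Hypothetical stand-in for a NIC that may carry carp addresses."""
    hw_mac: str
    # Maps a carp IPv6 address on this interface to its VHID.
    carp_vhids: Dict[str, int] = field(default_factory=dict)


def carp_macmatch6(ifp: Interface, addr: str) -> Optional[str]:
    """Return the virtual mac if `addr` is a carp address on `ifp`."""
    vhid = ifp.carp_vhids.get(addr)
    if vhid is None:
        return None
    return "00:00:5e:00:01:%02x" % vhid


def ns_source_lladdr(ifp: Interface, src: str) -> Tuple[str, bool]:
    """Pick the source mac for a neighbor solicitation sent from `src`.

    Returns (mac, tagged); `tagged` models setting PACKET_TAG_CARP so
    that the output path also uses the virtual mac on the wire.
    """
    mac = carp_macmatch6(ifp, src)
    if mac is not None:
        return mac, True
    # Not a carp address: fall back to the hardware mac, untagged.
    return ifp.hw_mac, False
```

Before the patch, the solicitation path behaved like the fallback branch for every source address, carp or not.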

The patch works and appears to be stable but it comes with a big warning, this needs someone who knows the code better than we do to make sure it doesn't break everything (or is just plain wrong).

Thanks!

ps. this lovely dtrace snippet helped to understand how carp_output is called from nd6_ns_output which was very useful. dtrace is fantastic

dtrace -n '
fbt:kernel:nd6_ns_output:entry { this->ja=1 }
fbt:carp:carp_output:entry /this->ja/ {
    this->ifp=args[0]; this->m=args[1]; this->sa=args[2];
    this->loc=1; stack();
}
fbt:carp:carp_output:return /this->ja/ {
    this->loc=0;
    this->eh=(struct ether_header*)this->m->m_data;
    this->s=(u_char*)this->eh->ether_shost;
    printf("%u, %x  %s %02x%02x%02x%02x%02x%02x sa_fam:%u ifp->t %u master=%u",
        args[1], args[0], this->ifp->if_xname,
        this->s[0], this->s[1], this->s[2], this->s[3], this->s[4], this->s[5],
        this->sa->sa_family, this->ifp->if_type,
        this->ifp->if_carp->cif_vrs.tqh_first->sc_state);
    tracemem(this->m->m_data,86);
    printf(" tags_head:%p", this->m->m_pkthdr.tags.slh_first);
}
fbt:kernel:m_tag_locate:return /this->loc/ {
    printf("PACKET_TAG_CARP %p,%u", args[1], args[1]->m_tag_len)
}
fbt:kernel:nd6_ns_output:return { this->ja=0 }'
Comment 10 Thomas Steen Rasmussen / Tykling 2022-01-11 20:14:40 UTC
Created attachment 230926 [details]
kyua testcase to demonstrate the problem
Comment 11 Thomas Steen Rasmussen / Tykling 2022-01-11 20:15:24 UTC
Created attachment 230927 [details]
ndp neighbor solicitation carp source mac fix
Comment 12 Thomas Steen Rasmussen / Tykling 2022-01-11 21:25:08 UTC
A couple of things I forgot to mention:

- net.inet.carp.preempt=0 or 1 has no effect on this.
- This patch is against stable/13-n248794-802ff7fcee2, but I suspect the bug is also present in -current, since it has been with us at least since the carp rewrite in 10.
- I failed to mention that the wrong mac is also in the source link-layer address option inside the icmp6 ndp NS packet, which I guess is the mac that really matters.
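
To make the last point concrete: in RFC 4861 the source link-layer address option has type 1 and a length counted in units of 8 octets, so for Ethernet it is 8 bytes in total (type, length, 6-octet mac). A small sketch in Python of building and reading that option (the function names are made up for illustration):

```python
ND_OPT_SOURCE_LINKADDR = 1  # option type, per RFC 4861


def build_sllao(mac: str) -> bytes:
    """Encode an Ethernet source link-layer address option."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError("expected a 6-octet Ethernet address")
    # Length field is 1, meaning one unit of 8 octets.
    return bytes([ND_OPT_SOURCE_LINKADDR, 1]) + raw


def sllao_mac(option: bytes) -> str:
    """Decode the mac carried in an Ethernet source link-layer address option."""
    if len(option) != 8 or option[0] != ND_OPT_SOURCE_LINKADDR or option[1] != 1:
        raise ValueError("not an Ethernet source link-layer address option")
    return ":".join("%02x" % b for b in option[2:])


if __name__ == "__main__":
    print(build_sllao("00:00:5e:00:01:01").hex())
```

A receiver fills its neighbor cache from this option, which is why a hardware mac here ends up cached for the shared IP.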

I am happy to answer questions, I am available on IRC as Tykling
Comment 13 Gleb Smirnoff freebsd_committer freebsd_triage 2022-01-12 03:40:16 UTC
Thomas Steen Rasmussen,

thanks a lot for your thorough analysis! Very much appreciated. I don't consider myself the ultimate IPv6 expert, but to me your patch looks correct. I have put it on the reviews board and I will work on getting the attention of IPv6 experts. Either way, I'm going to commit it in a week, just trusting your judgement and mine.

Feel free to register at the reviews board and improve/edit/comment your changes before they hit git:

https://reviews.freebsd.org/D33858
https://reviews.freebsd.org/D33859
Comment 14 commit-hook freebsd_committer freebsd_triage 2022-01-25 05:04:15 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=4a178afb4aa9876094c19faf6d3bf065a5ebe163

commit 4a178afb4aa9876094c19faf6d3bf065a5ebe163
Author:     Thomas Steen Rasmussen <thomas@gibfest.dk>
AuthorDate: 2022-01-25 05:02:47 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2022-01-25 05:02:47 +0000

    tests/netinet: add test for IPv6 NS and CARP

    PR:                     193280
    Reviewed by:            melifaro
    Differential revision:  https://reviews.freebsd.org/D33859

 tests/sys/netinet/carp.sh | 64 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)
Comment 15 commit-hook freebsd_committer freebsd_triage 2022-01-25 05:04:17 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=bc6abdd97e951b54294d331698317a607246255d

commit bc6abdd97e951b54294d331698317a607246255d
Author:     Thomas Steen Rasmussen <thomas@gibfest.dk>
AuthorDate: 2022-01-25 05:02:47 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2022-01-25 05:02:47 +0000

    nd6: use CARP link level address in SLLAO for NS sent out

    When sending an NS, check if we are using a IPv6 CARP address
    and if we do, then put proper CARP link level address into
    ND_OPT_SOURCE_LINKADDR option and also put PACKET_TAG_CARP tag
    on the packet.  The latter will enforce CARP link level address
    at the data link layer too, which might be necessary for broken
    implementations.
    The code really follows what NA sending code has been doing since
    introduction of carp(4).  While here, bring to style(9) the whole
    block of code.

    PR:                     193280
    Differential revision:  https://reviews.freebsd.org/D33858

 sys/netinet6/nd6_nbr.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)
Comment 16 Thomas Steen Rasmussen / Tykling 2022-01-27 14:06:28 UTC
Thank you Gleb. Any chance of an MFC to 13? Otherwise I would say this can be closed.
Comment 17 Gleb Smirnoff freebsd_committer freebsd_triage 2022-01-27 14:49:06 UTC
Thank you, Thomas! Let's wait two weeks since commit date for MFC.
Comment 18 commit-hook freebsd_committer freebsd_triage 2022-02-07 18:57:08 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=d2e24c54ef8311a053eddb05a0ce336daf890abb

commit d2e24c54ef8311a053eddb05a0ce336daf890abb
Author:     Thomas Steen Rasmussen <thomas@gibfest.dk>
AuthorDate: 2022-01-25 05:02:47 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2022-02-07 18:55:54 +0000

    nd6: use CARP link level address in SLLAO for NS sent out

    When sending an NS, check if we are using a IPv6 CARP address
    and if we do, then put proper CARP link level address into
    ND_OPT_SOURCE_LINKADDR option and also put PACKET_TAG_CARP tag
    on the packet.  The latter will enforce CARP link level address
    at the data link layer too, which might be necessary for broken
    implementations.
    The code really follows what NA sending code has been doing since
    introduction of carp(4).  While here, bring to style(9) the whole
    block of code.

    PR:                     193280
    Differential revision:  https://reviews.freebsd.org/D33858

    (cherry picked from commit bc6abdd97e951b54294d331698317a607246255d)

 sys/netinet6/nd6_nbr.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)