SCTP connections using IPv6 hang on the third leg of the connection setup when
trying to connect to the system itself over lo0.
Netstat shows these lines about the pending/hanging connect...
sctp46 1to1 fe80::225:90ff:f.6042 ::1.41368 ESTABLISHED
sctp46 1to1 ::1.41368 ::1.6042 COOKIE_ECHOED
A tcpdump trace (tcpdump -i lo0) collected during the SCTP/IPv6 connection
setup shows this...
09:35:21.999609 IP6 ::1.41140 > ::1.6042: sctp (1) [INIT] [init tag: 1501306277] [rwnd: 1864135] [OS: 10] [MIS: 2048] [init TSN: 2226957883]
09:35:22.000383 IP6 ::1.6042 > ::1.41140: sctp (1) [INIT ACK] [init tag: 3322565489] [rwnd: 1864135] [OS: 10] [MIS: 2048] [init TSN: 1843245874]
09:35:22.000870 IP6 ::1.41140 > ::1.6042: sctp (1) [COOKIE ECHO]
09:35:23.000476 IP6 ::1.41140 > ::1.6042: sctp (1) [COOKIE ECHO]
09:35:25.000820 IP6 ::1.41140 > ::1.6042: sctp (1) [COOKIE ECHO]
09:35:29.000743 IP6 ::1.41140 > ::1.6042: sctp (1) [COOKIE ECHO]
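The spacing of the retransmitted COOKIE ECHO chunks in the trace can be checked with a short script. This is only a sketch: the timestamps are copied from the tcpdump output above, and the helper name `gaps` is made up for illustration. The intervals come out at roughly 1, 2 and 4 seconds, which looks like the usual doubling of the retransmission timeout while no COOKIE ACK arrives.

```python
from datetime import datetime

# Timestamps of the four COOKIE ECHO packets, copied from the trace.
COOKIE_ECHO_TIMES = [
    "09:35:22.000870",
    "09:35:23.000476",
    "09:35:25.000820",
    "09:35:29.000743",
]


def gaps(stamps):
    """Return the seconds elapsed between consecutive timestamps."""
    parsed = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]
    return [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]


if __name__ == "__main__":
    for g in gaps(COOKIE_ECHO_TIMES):
        print(f"{g:.6f}")
```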
Running the exact same connection test with IPv4 addresses
shows no problem. Everything works just fine.
With IPv6 the connect() call eventually ends with a timeout.
So, it seems something in the SCTP implementation is not exactly the same
when the IP layer is IPv6 compared to the case when the IP layer is IPv4.
The really odd thing about this is that this exact result only happens
when the server binds to multiple addresses and the client relies on
an implicit wildcard bind, that is, it calls connect() without an explicit bind.
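For reference, the client-side pattern described here (connect() with no prior bind) looks roughly like the following. This is a hedged sketch in Python, not the original test program, which is not shown in this report. The function name `try_connect` and the timeout are illustrative, port 6042 is taken from the netstat output, and the sketch assumes the platform's Python can open one-to-one SCTP sockets through the plain `socket` module.

```python
import socket

# The IANA protocol number for SCTP is 132; some Python builds lack the constant.
IPPROTO_SCTP = getattr(socket, "IPPROTO_SCTP", 132)


def try_connect(host="::1", port=6042, timeout=10.0):
    """Attempt a one-to-one SCTP connect over IPv6 without an explicit bind.

    Returns "connected" on success, otherwise "failed: <reason>".
    """
    try:
        # SOCK_STREAM together with IPPROTO_SCTP asks for a one-to-one socket.
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM, IPPROTO_SCTP)
    except OSError as exc:
        return f"failed: {exc}"
    try:
        sock.settimeout(timeout)
        # No bind() here: the kernel chooses the local address(es) itself.
        sock.connect((host, port))
        return "connected"
    except OSError as exc:  # covers timeouts and refused connections
        return f"failed: {exc}"
    finally:
        sock.close()


if __name__ == "__main__":
    print(try_connect())
```

On a system showing the behaviour reported here, the call would be expected to return a timeout failure rather than "connected", but the exact error text depends on the platform.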
If I change the test client to use explicit sctp_bindx() calls for all
suitable local addresses the connection still hangs but netstat shows me
sctp46 1to1 fe80::1.44586 ::1.6042 COOKIE_WAIT
sctp46 1to1 fe80::1.6042 LISTEN
At the same time tcpdump shows me no SCTP packets via lo0 at all.
Needless to say maybe, but with explicit bind on the client side
and IPv4 addresses everything works just fine.
So, there is some sort of disparity between how SCTP connection setup
works on top of IPv6 and IPv4.
Not the foggiest idea yet.
How-To-Repeat: I am not quite sure of the conditions that trigger this odd behavior.
Over to maintainer(s).
Which addresses are you binding? Are the programs you use
I'm pretty sure the problem is related to the address scopes
in IPv6. I haven't done testing link local addresses at all,
The problem seems to be SCTP specific.
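To make the scope question concrete, the addresses that appear in the netstat output earlier in this report can be classified with Python's `ipaddress` module. This is only an illustrative sketch, and the helper name `scope_of` is hypothetical. It shows that the hanging association pairs a link-local local address (`fe80::1`) with the loopback address (`::1`) as the peer.

```python
import ipaddress


def scope_of(text):
    """Give a coarse label for an IPv6 address: loopback, link-local or other."""
    addr = ipaddress.IPv6Address(text)
    if addr.is_loopback:
        return "loopback"
    if addr.is_link_local:
        return "link-local"
    return "other"


if __name__ == "__main__":
    for a in ("fe80::1", "::1"):
        print(a, scope_of(a))
```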
For bugs matching the following criteria:
Status: In Progress Changed: (is less than) 2014-06-01
Reset to default assignee and clear in-progress tags.
Mail being skipped
Is it known whether this is still a problem? Is there a sample reproducer that could be used to verify?
The reason is known. When one binds the local addresses explicitly
one by one before calling connect(), the system activates all of the
local addresses right after the INIT and INIT-ACK, before COOKIE-ECHO
and COOKIE-ACK. Because at this phase the system has not seen any
traffic between the endpoints other than the couple of packets needed
for a successful INIT and INIT-ACK, it only knows about one single pair
of operational addresses. Activating the other potential local addresses
at this phase causes COOKIE processing to be attempted using a
different local address, which may not be routable at all to the one
known peer address.
The local addresses can be activated only when they have been used
for a successful INIT + INIT-ACK or they have been tested and proven
to be routable to at least one of the reported peer addresses. This
testing of functional address pairs works only through successful
pairs of HEARTBEAT + HEARTBEAT-ACK.
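The rule described above can be sketched as a small model. This is not the kernel code. The class and method names (`LocalAddressSet`, `confirm`, `usable_sources`) are invented for illustration, and the model only tracks which local addresses have been proven against which peer addresses.

```python
class LocalAddressSet:
    """Toy model: a local address is usable only after it has been confirmed."""

    def __init__(self, bound):
        self.bound = list(bound)   # addresses the application bound
        self.confirmed = set()     # (local, peer) pairs proven to work

    def confirm(self, local, peer):
        """Record a pair proven by INIT/INIT-ACK or HEARTBEAT/HEARTBEAT-ACK."""
        if local not in self.bound:
            raise ValueError(f"{local} is not a bound local address")
        self.confirmed.add((local, peer))

    def usable_sources(self, peer):
        """Local addresses that may be used as a source towards this peer."""
        return [a for a in self.bound if (a, peer) in self.confirmed]


if __name__ == "__main__":
    addrs = LocalAddressSet(["::1", "fe80::1"])
    addrs.confirm("::1", "::1")          # the INIT + INIT-ACK went over ::1
    print(addrs.usable_sources("::1"))   # fe80::1 stays inactive until proven
```

In this model fe80::1 would only become a candidate source after a HEARTBEAT sent from it has been acknowledged by the peer.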
The current logic is opportunistic and wrong.
(In reply to jau from comment #6)
A way to fix it is to rewrite the handling of the local addresses. At least this was the outcome of a discussion with rrs. It just requires a fair number of changes and fixes one specific issue. The current address handling was optimised for end-points binding against the wildcard address.