Bug 278666 - [sctp] Heartbeat jittering is not +- anymore
Summary: [sctp] Heartbeat jittering is not +- anymore
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: Unspecified
Hardware: Any Any
: --- Affects Some People
Assignee: Michael Tuexen
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-04-30 15:52 UTC by Albin Hellqvist
Modified: 2024-05-10 20:21 UTC (History)
1 user (show)

See Also:
tuexen: mfc-stable14?
tuexen: mfc-stable13?


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Albin Hellqvist 2024-04-30 15:52:26 UTC
Hi,

When reading the SCTP RFC regarding the SCTP heartbeat timing, it says:
"On an idle destination address that is allowed to heartbeat, it is RECOMMENDED that a HEARTBEAT chunk is sent once per RTO of that destination address plus the protocol parameter 'HB.interval', with jittering of +/- 50% of the RTO value and exponential backoff of the RTO if the previous HEARTBEAT chunk is unanswered."

It was implemented like this previously, but it was later changed here:
https://cgit.freebsd.org/src/commit/?id=70e95f0b6917a8b8cd4a2a5f883f3e9753fc86d8

I am wondering if this can be considered a bug? Because previously, the jitter could be like [-RTO/2, RTO/2], but now it is [0, RTO/2]. This results in a longer path failover on average for idle paths that are not sending DATA chunks and receiving SACK:s.

Thank you in advance!
Comment 1 Michael Tuexen freebsd_committer freebsd_triage 2024-05-01 06:28:15 UTC
It is intended to be [-RTO/2, RTO/2]. Let me look into this.
Comment 2 Michael Tuexen freebsd_committer freebsd_triage 2024-05-02 14:37:56 UTC
I was wrong. The text says:
```
once per RTO of that destination address plus the protocol parameter 'HB.interval', with jittering of +/- 50% of the RTO value
```
This means you use RTO + HB.interval +/- RTO/2. This should result in [HB.interval + RTO - RTO/2, Hb.interval + RTO + RTO/2].
The code we use when removing the integer overflow protection:
```
jitter = rndval % to_ticks;
if (to_ticks > 1) {
	to_ticks >>= 1;
}
to_ticks += jitter;
```
Isn't `jitter` in [0, RTO] and we are then adding that to RTO/2. Doesn't this result in a value in [RTO/2, 3*RTO/2]. We finally add HB.interval.

So I think the code is almost correct. The jitter stuff should only be done on idle confirmed paths, which are not potentially failed. Once we agree on the timer value, I'll fix when the jitter will be used.
Comment 3 Albin Hellqvist 2024-05-03 13:17:15 UTC
(In reply to Michael Tuexen from comment #2)
Hmm you are correct, I was mistaken. [RTO/2, 3*RTO/2] seems good!

Is it this part that you think could be incorrect?
https://cgit.freebsd.org/src/commit/?id=9312ba239e05b035c965e5790f70fe71a4ed2e8c
Comment 4 Michael Tuexen freebsd_committer freebsd_triage 2024-05-06 20:51:21 UTC
Please have a look at review D45107. If you create an account there, you can comment or approve the patch.
Comment 5 commit-hook freebsd_committer freebsd_triage 2024-05-10 20:20:27 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5120ea0d8871fd5d56a8fb70117b727b9d8a4e62

commit 5120ea0d8871fd5d56a8fb70117b727b9d8a4e62
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2024-05-10 18:58:01 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2024-05-10 19:02:56 +0000

    sctp: improve heartbeat timer computation
    PR:             278666
    Reviewed by:    Albin Hellqvist
    MFC after:      3 days
    Pull Request:   https://reviews.freebsd.org/D45107

 sys/netinet/sctputil.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)