Bug 278666

Summary: [sctp] Heartbeat jittering is not +- anymore
Product: Base System Reporter: Albin Hellqvist <albin.hellqvist>
Component: kernAssignee: Michael Tuexen <tuexen>
Status: In Progress ---    
Severity: Affects Some People CC: tuexen
Priority: --- Flags: tuexen: mfc-stable14?
tuexen: mfc-stable13?
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Albin Hellqvist 2024-04-30 15:52:26 UTC
Hi,

When reading the SCTP RFC regarding the SCTP heartbeat timing, it says:
"On an idle destination address that is allowed to heartbeat, it is RECOMMENDED that a HEARTBEAT chunk is sent once per RTO of that destination address plus the protocol parameter 'HB.interval', with jittering of +/- 50% of the RTO value and exponential backoff of the RTO if the previous HEARTBEAT chunk is unanswered."

It was implemented like this previously, but it was later changed here:
https://cgit.freebsd.org/src/commit/?id=70e95f0b6917a8b8cd4a2a5f883f3e9753fc86d8

I am wondering if this can be considered a bug? Because previously, the jitter could be like [-RTO/2, RTO/2], but now it is [0, RTO/2]. This results in a longer path failover on average for idle paths that are not sending DATA chunks and receiving SACK:s.

Thank you in advance!
Comment 1 Michael Tuexen freebsd_committer freebsd_triage 2024-05-01 06:28:15 UTC
It is intended to be [-RTO/2, RTO/2]. Let me look into this.
Comment 2 Michael Tuexen freebsd_committer freebsd_triage 2024-05-02 14:37:56 UTC
I was wrong. The text says:
```
once per RTO of that destination address plus the protocol parameter 'HB.interval', with jittering of +/- 50% of the RTO value
```
This means you use RTO + HB.interval +/- RTO/2. This should result in [HB.interval + RTO - RTO/2, Hb.interval + RTO + RTO/2].
The code we use when removing the integer overflow protection:
```
jitter = rndval % to_ticks;
if (to_ticks > 1) {
	to_ticks >>= 1;
}
to_ticks += jitter;
```
Isn't `jitter` in [0, RTO] and we are then adding that to RTO/2. Doesn't this result in a value in [RTO/2, 3*RTO/2]. We finally add HB.interval.

So I think the code is almost correct. The jitter stuff should only be done on idle confirmed paths, which are not potentially failed. Once we agree on the timer value, I'll fix when the jitter will be used.
Comment 3 Albin Hellqvist 2024-05-03 13:17:15 UTC
(In reply to Michael Tuexen from comment #2)
Hmm you are correct, I was mistaken. [RTO/2, 3*RTO/2] seems good!

Is it this part that you think could be incorrect?
https://cgit.freebsd.org/src/commit/?id=9312ba239e05b035c965e5790f70fe71a4ed2e8c
Comment 4 Michael Tuexen freebsd_committer freebsd_triage 2024-05-06 20:51:21 UTC
Please have a look at review D45107. If you create an account there, you can comment or approve the patch.
Comment 5 commit-hook freebsd_committer freebsd_triage 2024-05-10 20:20:27 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5120ea0d8871fd5d56a8fb70117b727b9d8a4e62

commit 5120ea0d8871fd5d56a8fb70117b727b9d8a4e62
Author:     Michael Tuexen <tuexen@FreeBSD.org>
AuthorDate: 2024-05-10 18:58:01 +0000
Commit:     Michael Tuexen <tuexen@FreeBSD.org>
CommitDate: 2024-05-10 19:02:56 +0000

    sctp: improve heartbeat timer computation
    PR:             278666
    Reviewed by:    Albin Hellqvist
    MFC after:      3 days
    Pull Request:   https://reviews.freebsd.org/D45107

 sys/netinet/sctputil.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)