Bug 251052

Summary: [sctp] Throughput becomes extremely low under load
Product: Base System Reporter: Paul Reynolds <paul.reynolds>
Component: kern Assignee: freebsd-net (Nobody) <net>
Status: In Progress ---    
Severity: Affects Many People CC: bc979, bdrewery, markj, tuexen
Priority: ---    
Version: 11.4-STABLE   
Hardware: amd64   
OS: Any   
Attachments:
Description Flags
Test program to simulate high SCTP load none

Description Paul Reynolds 2020-11-11 16:26:37 UTC
Given a service using SCTP, when it receives a spike in traffic, it becomes extremely slow to respond. A given block of traffic that takes seconds if spaced out in time, instead takes upwards of 15-20 minutes if received in a burst. I suspect some sub-optimal behavior when receive buffers are full, but that is just a wild guess. This behavior is reproducible on both 11.4-RELEASE and 12.2-RELEASE. I will attach a test program.
Comment 1 Paul Reynolds 2020-11-11 16:47:08 UTC
Created attachment 219565 [details]
Test program to simulate high SCTP load

The test program forks a few processes to generate traffic directed at a single process to simulate a high load. It should run to completion in a short time, but on both 11.4-RELEASE and 12.2-RELEASE it runs for a bit, then slows way down. It does eventually complete, but only after a significant amount of time has passed.
Comment 2 Michael Tuexen freebsd_committer freebsd_triage 2020-11-14 08:49:06 UTC
I can reproduce the issue also in head on a VM. The receiver announces a window of 1, which slows down the connection. I haven't seen this on 1-to-1 style sockets, so it might be related to using 1-to-many style sockets. Will do further testing.
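For readers unfamiliar with how a receiver can end up announcing a window of 1, here is a minimal sketch in C. It is a simplified, hypothetical model and not the FreeBSD kernel code: the function name `model_advertised_rwnd` and both parameters are assumptions made for illustration. The idea is that the receiver charges some per-message bookkeeping overhead against the free receive buffer space, and once the remainder is too small to be useful it advertises the minimal value 1 instead. This shows one plausible mechanism only and is not a diagnosis of this bug.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified model of how a receiver might derive the
 * window it advertises. This is NOT the FreeBSD kernel implementation.
 *
 *   buf_space:     free bytes left in the socket receive buffer
 *   ctrl_overhead: bytes charged for per-message bookkeeping (assumed)
 */
static uint32_t
model_advertised_rwnd(uint32_t buf_space, uint32_t ctrl_overhead)
{
    uint32_t rwnd;

    if (buf_space == 0)
        return 0;   /* buffer is full: advertise a closed window */

    /* Charge the bookkeeping overhead against the free space, clamping at 0. */
    rwnd = (buf_space > ctrl_overhead) ? buf_space - ctrl_overhead : 0;

    /* What remains is too small to be useful: advertise the minimal value 1. */
    if (rwnd < ctrl_overhead)
        rwnd = 1;

    return rwnd;
}
```

In this model, many small queued messages drive up the overhead term, so the advertised window can collapse to 1 even though the buffer still has free bytes. For example, `model_advertised_rwnd(150, 100)` yields 1, while `model_advertised_rwnd(1000, 100)` yields 900.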
Comment 3 Bryan Drewery freebsd_committer freebsd_triage 2022-04-21 22:38:13 UTC
Any updates on this from either Paul or Michael?
Comment 4 Bryan Drewery freebsd_committer freebsd_triage 2022-04-22 21:23:05 UTC
I see something that I think is similar with iperf3 and netperf. Using 1 stream is fine but using >1 stream completely kills bandwidth.

`iperf3 --sctp -c 127.0.0.1` is OK. `iperf3 --sctp --nstreams 1 -c 127.0.0.1` is not. (Yes with 1 stream)


`netperf -t SCTP_STREAM -H 127.0.0.1` is OK. `netperf -t SCTP_STREAM_MANY -H 127.0.0.1 -- -T 2` is not.
Comment 5 Michael Tuexen freebsd_committer freebsd_triage 2022-04-22 21:38:55 UTC
(In reply to Bryan Drewery from comment #4)
How are iperf and netperf using multiple streams? Just selecting the stream when performing a sendmsg() call or are they using multiple threads/processes, each sending on one particular stream?
Comment 6 bc979 2022-04-22 22:10:47 UTC
I believe the iperf3 test case is incorrect.  I don't find --nstreams documented.  However, it does do something and it appears that it sends very little data the first second and then none after that.  The output indicates that it is only using 1 stream.  The proper argument for multiple streams is -P.  Here are the results using -P 1 when sending on a 100MB LAN between 2 13.1-RC3 systems:

test# iperf3 -c master --sctp -P 1
Connecting to host master, port 5201
[  5] local 10.0.1.235 port 52689 connected to 10.0.1.250 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  13.0 MBytes   109 Mbits/sec                  
[  5]   1.00-2.00   sec  11.2 MBytes  94.4 Mbits/sec                  
[  5]   2.00-3.00   sec  11.2 MBytes  93.9 Mbits/sec                  
[  5]   3.00-4.00   sec  11.2 MBytes  94.4 Mbits/sec                  
[  5]   4.00-5.00   sec  11.2 MBytes  94.4 Mbits/sec                  
[  5]   5.00-6.00   sec  11.2 MBytes  94.4 Mbits/sec                  
[  5]   6.00-7.00   sec  11.2 MBytes  94.4 Mbits/sec                  
[  5]   7.00-8.00   sec  11.2 MBytes  93.8 Mbits/sec                  
[  5]   8.00-9.00   sec  11.2 MBytes  94.4 Mbits/sec                  
[  5]   9.00-10.00  sec  11.2 MBytes  93.8 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec   114 MBytes  95.7 Mbits/sec                  sender
[  5]   0.00-10.02  sec   112 MBytes  94.2 Mbits/sec                  receiver

The results are as expected (Well a bit better than I expected actually).  Now changing to -P 2 I get:

test# iperf3 -c master --sctp -P 2
Connecting to host master, port 5201
[  5] local 10.0.1.235 port 60452 connected to 10.0.1.250 port 5201
[  7] local 10.0.1.235 port 40730 connected to 10.0.1.250 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  9.12 MBytes  76.5 Mbits/sec                  
[  7]   0.00-1.00   sec  5.56 MBytes  46.7 Mbits/sec                  
[SUM]   0.00-1.00   sec  14.7 MBytes   123 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   1.00-2.00   sec  7.50 MBytes  62.9 Mbits/sec                  
[  7]   1.00-2.00   sec  3.75 MBytes  31.5 Mbits/sec                  
[SUM]   1.00-2.00   sec  11.2 MBytes  94.4 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   2.00-3.00   sec  5.69 MBytes  47.7 Mbits/sec                  
[  7]   2.00-3.00   sec  5.56 MBytes  46.7 Mbits/sec                  
[SUM]   2.00-3.00   sec  11.2 MBytes  94.4 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   3.00-4.00   sec  3.81 MBytes  32.0 Mbits/sec                  
[  7]   3.00-4.00   sec  7.44 MBytes  62.4 Mbits/sec                  
[SUM]   3.00-4.00   sec  11.2 MBytes  94.4 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   4.00-5.00   sec  7.62 MBytes  64.0 Mbits/sec                  
[  7]   4.00-5.00   sec  3.62 MBytes  30.4 Mbits/sec                  
[SUM]   4.00-5.00   sec  11.2 MBytes  94.4 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   5.00-6.00   sec  4.12 MBytes  34.6 Mbits/sec                  
[  7]   5.00-6.00   sec  7.12 MBytes  59.8 Mbits/sec                  
[SUM]   5.00-6.00   sec  11.2 MBytes  94.4 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   6.00-7.00   sec  3.69 MBytes  30.9 Mbits/sec                  
[  7]   6.00-7.00   sec  7.56 MBytes  63.4 Mbits/sec                  
[SUM]   6.00-7.00   sec  11.2 MBytes  94.4 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   7.00-8.00   sec  2.69 MBytes  22.5 Mbits/sec                  
[  7]   7.00-8.00   sec  8.50 MBytes  71.3 Mbits/sec                  
[SUM]   7.00-8.00   sec  11.2 MBytes  93.8 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   8.00-9.00   sec   192 KBytes  1.57 Mbits/sec                  
[  7]   8.00-9.00   sec  11.1 MBytes  92.8 Mbits/sec                  
[SUM]   8.00-9.00   sec  11.2 MBytes  94.4 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec                  
[  7]   9.00-10.00  sec  11.2 MBytes  94.4 Mbits/sec                  
[SUM]   9.00-10.00  sec  11.2 MBytes  94.4 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  44.4 MBytes  37.3 Mbits/sec                  sender
[  5]   0.00-10.06  sec  42.7 MBytes  35.6 Mbits/sec                  receiver
[  7]   0.00-10.00  sec  71.4 MBytes  59.9 Mbits/sec                  sender
[  7]   0.00-10.06  sec  70.3 MBytes  58.7 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec   116 MBytes  97.2 Mbits/sec                  sender
[SUM]   0.00-10.06  sec   113 MBytes  94.3 Mbits/sec                  receiver


It specifically indicates that there are two streams in use.  The sums seem to be the same as for 1 stream.  However, the data transferred each second is quite uneven.  I don't know if that is a feature or bug.  I suspect it is just the way the buffering worked.
Comment 7 Michael Tuexen freebsd_committer freebsd_triage 2022-04-22 22:15:19 UTC
(In reply to bc979 from comment #6)
But how is iperf distributing the load on the two streams?
Comment 8 Bryan Drewery freebsd_committer freebsd_triage 2022-04-22 22:23:57 UTC
(In reply to bc979 from comment #6)

-P is for multiple streams. --nstreams is for SCTP multiple subflows.

  --sctp                    use SCTP rather than TCP
  --nstreams      #         number of SCTP streams
  -P, --parallel  #         number of parallel client streams to run
Comment 9 Bryan Drewery freebsd_committer freebsd_triage 2022-04-22 22:25:28 UTC
(In reply to Michael Tuexen from comment #7)

iperf3 only does this. num_ostreams is only used here so I don't think it does anything else beyond this.

if (test->settings->num_ostreams > 0) {
    struct sctp_initmsg initmsg;

    memset(&initmsg, 0, sizeof(struct sctp_initmsg));
    initmsg.sinit_num_ostreams = test->settings->num_ostreams;

    if (setsockopt(s, IPPROTO_SCTP, SCTP_INITMSG, &initmsg, sizeof(struct sctp_initmsg)) < 0) {
            saved_errno = errno;
            close(s);
            freeaddrinfo(server_res);
            errno = saved_errno;
            i_errno = IESETSCTPNSTREAM;
            return -1;
    }
}
Comment 10 Bryan Drewery freebsd_committer freebsd_triage 2022-04-22 22:40:12 UTC
(In reply to Michael Tuexen from comment #7)

As for netperf it is doing something very different. -T is parsed into num_associations in src/nettest_sctp.c and then only used for test length and for how many *sockets* to create. It creates N sockets and doesn't seem to do anything special beyond that.
Comment 11 bc979 2022-04-23 04:11:27 UTC
(In reply to Bryan Drewery from comment #9)

It appears that iperf3 does not use multiple streams regardless of the settings for the arguments.  With nstreams set to > 1, tcpdump shows that the SCTP field SID is always 0.  Looking through the code, the actual write call used does not support passing a stream identifier.  I suspect that was overlooked in the last update.  The code does call an sctp_write-like function, but it is just mapped to their standard write routine.  Iperf3 will not test this problem, although it does sort of indicate a problem since the throughput basically goes to zero.  This just might be a problem in iperf3 rather than in SCTP.
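To illustrate what actually using multiple streams would involve, here is a minimal sketch in C, under stated assumptions. Requesting streams with SCTP_INITMSG only negotiates how many are available: the sender may use at most the smaller of the outbound streams it requested and the inbound streams the peer accepts. The application then has to choose a stream id for each message, for example via the `snd_sid` field of `struct sctp_sndinfo` from the SCTP sockets API (RFC 6458). A plain write() carries no stream id, which would be consistent with the SID staying at 0. The helper names `usable_ostreams` and `pick_stream` are hypothetical and are not part of iperf3 or of any SCTP library.

```c
#include <stdint.h>

/*
 * Number of outbound streams the sender may actually use: the smaller
 * of what it requested and what the peer accepts inbound.
 */
static uint16_t
usable_ostreams(uint16_t requested_ostreams, uint16_t peer_max_instreams)
{
    return (requested_ostreams < peer_max_instreams)
        ? requested_ostreams
        : peer_max_instreams;
}

/*
 * Round-robin choice of the stream id for the msg_index-th message.
 * The result would be supplied per message, e.g. as snd_sid.
 * Falls back to stream 0 if no stream count is known.
 */
static uint16_t
pick_stream(uint32_t msg_index, uint16_t num_streams)
{
    if (num_streams == 0)
        return 0;
    return (uint16_t)(msg_index % num_streams);
}
```

With two usable streams, successive messages would alternate between stream ids 0 and 1, and a capture should then show both SID values instead of only 0. Round-robin is just one possible policy; an application could equally dedicate one stream per thread or per logical flow.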