Bug 260116 - [sctp] POLLOUT/EVFILT_WRITE is always true for poll/kqueue
Summary: [sctp] POLLOUT/EVFILT_WRITE is always true for poll/kqueue
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 13.0-RELEASE
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-11-29 13:35 UTC by Albin
Modified: 2022-01-25 12:22 UTC (History)
3 users (show)

See Also:


Attachments
Test files for reproducing the bug (4.31 KB, application/x-zip-compressed)
2021-11-29 13:37 UTC, Albin
no flags Details
Proposed patch (6.67 KB, patch)
2022-01-25 11:09 UTC, Björn Svensson
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Albin 2021-11-29 13:35:11 UTC
It seems like POLLOUT/EVFILT_WRITE always is true for SCTP sockets even if the socket send buffer is full.

Besides this, it should also account for SO_SNDLOWAT but it is possible that we will get this for "free" since it looks like the functionality already is there.

Based on some experiments (which might be faulty), SCTP sockets doesn't use struct sockbuf in the same way as TCP uses it. A lot of the fields in that struct are 0 for SCTP sockets since it also has its own structs.

This seems to affect filt_sowrite() in (https://github.com/freebsd/freebsd-src/blob/main/sys/kern/uipc_socket.c). More specifically, this line:
kn->kn_data = sbspace(&so->so_snd);

I have tested to replace that line with:
kn->kn_data = (so->so_snd.sb_hiwat - so->so_snd.sb_acc);

Which seems to do the trick, but I expect this to have side effects.
Comment 1 Albin 2021-11-29 13:37:32 UTC
Created attachment 229791 [details]
Test files for reproducing the bug

I added some tests if it helps. They can be compiled using:
g++ -o client client.cpp utils.cpp
g++ -o server server.cpp utils.cpp

I have tested it in VMs with the SCTP module enabled. I have also added alias addresses to the loopback interface so that I could run both the server and client over the loopback interface.

These alias addresess were added:
127.0.0.2
127.0.0.3
127.0.0.4
127.0.0.5
127.0.0.6
127.0.0.7
::2
::3
::4
::5
::6
::7

The test can be started using:
./server --poll --ipv4 --sctp --test-pollout
./client --poll --ipv4 --sctp --test-pollout

Or with kqueue:
./client --kqueue --ipv4 --sctp --test-pollout

IPv6 as well as TCP can be tested as well to compare the behavior.
Comment 2 Björn Svensson 2022-01-24 11:31:14 UTC
After some investigations we have seen that the introduction of
"sendfile(2) system call' required some changes in the struct sockbuf.
The field sb_cc was split into sb_ccc and sb_acc in commit: 0f9d0a73a495

For SCTP a macro was introduced to handle this change via commit: 4e88d37a2a73

The change of using sb_ccc instead of sb_cc unfortunately seemed to trigger problems with SCTP sockets and select() so it was changed to use sb_acc instead via commit: 975c975bf0f1


The problem we now see is that when POLLOUT/EVFILT_WRITE are handled in
sopoll_generic() and filt_sowrite() they call sbspace():

https://github.com/freebsd/freebsd-src/blob/main/sys/kern/uipc_socket.c#L3952
https://github.com/freebsd/freebsd-src/blob/main/sys/kern/uipc_socket.c#L3630
https://github.com/freebsd/freebsd-src/blob/main/sys/sys/socketvar.h#L325

but sbspace() still uses sb->sb_ccc when calculating the space.

https://github.com/freebsd/freebsd-src/blob/main/sys/sys/sockbuf.h#L239

This results in that the socket always seems writeable even when the sendbuffer is full.

Some ideas to solve this would be to:

- Change sbspace() to use sb_acc when sb_ccc is zero.
  * This might be a bit hacky
  * This might affect other users of the function.

- Update both sb_acc and sb_ccc in SCTP
  Since SCTP already updates sb_acc to fit into the framework we can 
  double book the value to sb_ccc as well. The sb_acc field should be the
  owner of the information, but when changed it's copied to sb_ccc.
  * This only affects the SCTP code

Any thoughts or other ideas?

I will create a patchset for updating both sb_ccc and sb_acc and test this idea.
Comment 3 Björn Svensson 2022-01-25 11:09:13 UTC
Created attachment 231307 [details]
Proposed patch

This proposed patch adds a macro for unifying the sb_acc and sb_ccc counters after the value has been updated.
The macro can be changed for upstream users (like a userspace sctp stack) in a similar way as for sb_cc, i.e. it can be disabled when there is no usage of sb_ccc and sb_acc.
See patch for more details.
Comment 4 Michael Tuexen freebsd_committer 2022-01-25 11:37:34 UTC
(In reply to Björn Svensson from comment #3)
Can you state what you are trying to fix? Are you trying to fix sendfile() for SCTP or the kqueue write stuff or something else?
Comment 5 Michael Tuexen freebsd_committer 2022-01-25 11:37:43 UTC
(In reply to Björn Svensson from comment #3)
Can you state what you are trying to fix? Are you trying to fix sendfile() for SCTP or the kqueue write stuff or something else?
Comment 6 Björn Svensson 2022-01-25 12:22:47 UTC
(In reply to Michael Tuexen from comment #5)
I'm trying to fix the problem that a sctp socket is not indicating that the sendbuffer is full when polling the socket, or via kernel event notifications.

The finding is that both counters sb_ccc and sb_acc is needed to be updated in a socketbuffer since they have different users in the framework/system.

sb_ccc is checked when polling the socket, but only sb_acc is updated.
There is a comment at the sb_cc define that it needs to represent sb_acc since otherwise select() is not working.

So the idea was to update both counters when the buffer is changed.