It seems like POLLOUT/EVFILT_WRITE always is true for SCTP sockets even if the socket send buffer is full. Besides this, it should also account for SO_SNDLOWAT but it is possible that we will get this for "free" since it looks like the functionality already is there. Based on some experiments (which might be faulty), SCTP sockets doesn't use struct sockbuf in the same way as TCP uses it. A lot of the fields in that struct are 0 for SCTP sockets since it also has its own structs. This seems to affect filt_sowrite() in (https://github.com/freebsd/freebsd-src/blob/main/sys/kern/uipc_socket.c). More specifically, this line: kn->kn_data = sbspace(&so->so_snd); I have tested to replace that line with: kn->kn_data = (so->so_snd.sb_hiwat - so->so_snd.sb_acc); Which seems to do the trick, but I expect this to have side effects.
Created attachment 229791 [details] Test files for reproducing the bug I added some tests if it helps. They can be compiled using: g++ -o client client.cpp utils.cpp g++ -o server server.cpp utils.cpp I have tested it in VMs with the SCTP module enabled. I have also added alias addresses to the loopback interface so that I could run both the server and client over the loopback interface. These alias addresess were added: 127.0.0.2 127.0.0.3 127.0.0.4 127.0.0.5 127.0.0.6 127.0.0.7 ::2 ::3 ::4 ::5 ::6 ::7 The test can be started using: ./server --poll --ipv4 --sctp --test-pollout ./client --poll --ipv4 --sctp --test-pollout Or with kqueue: ./client --kqueue --ipv4 --sctp --test-pollout IPv6 as well as TCP can be tested as well to compare the behavior.
After some investigations we have seen that the introduction of "sendfile(2) system call' required some changes in the struct sockbuf. The field sb_cc was split into sb_ccc and sb_acc in commit: 0f9d0a73a495 For SCTP a macro was introduced to handle this change via commit: 4e88d37a2a73 The change of using sb_ccc instead of sb_cc unfortunately seemed to trigger problems with SCTP sockets and select() so it was changed to use sb_acc instead via commit: 975c975bf0f1 The problem we now see is that when POLLOUT/EVFILT_WRITE are handled in sopoll_generic() and filt_sowrite() they call sbspace(): https://github.com/freebsd/freebsd-src/blob/main/sys/kern/uipc_socket.c#L3952 https://github.com/freebsd/freebsd-src/blob/main/sys/kern/uipc_socket.c#L3630 https://github.com/freebsd/freebsd-src/blob/main/sys/sys/socketvar.h#L325 but sbspace() still uses sb->sb_ccc when calculating the space. https://github.com/freebsd/freebsd-src/blob/main/sys/sys/sockbuf.h#L239 This results in that the socket always seems writeable even when the sendbuffer is full. Some ideas to solve this would be to: - Change sbspace() to use sb_acc when sb_ccc is zero. * This might be a bit hacky * This might affect other users of the function. - Update both sb_acc and sb_ccc in SCTP Since SCTP already updates sb_acc to fit into the framework we can double book the value to sb_ccc as well. The sb_acc field should be the owner of the information, but when changed it's copied to sb_ccc. * This only affects the SCTP code Any thoughts or other ideas? I will create a patchset for updating both sb_ccc and sb_acc and test this idea.
Created attachment 231307 [details] Proposed patch This proposed patch adds a macro for unifying the sb_acc and sb_ccc counters after the value has been updated. The macro can be changed for upstream users (like a userspace sctp stack) in a similar way as for sb_cc, i.e. it can be disabled when there is no usage of sb_ccc and sb_acc. See patch for more details.
(In reply to Björn Svensson from comment #3) Can you state what you are trying to fix? Are you trying to fix sendfile() for SCTP or the kqueue write stuff or something else?
(In reply to Michael Tuexen from comment #5) I'm trying to fix the problem that a sctp socket is not indicating that the sendbuffer is full when polling the socket, or via kernel event notifications. The finding is that both counters sb_ccc and sb_acc is needed to be updated in a socketbuffer since they have different users in the framework/system. sb_ccc is checked when polling the socket, but only sb_acc is updated. There is a comment at the sb_cc define that it needs to represent sb_acc since otherwise select() is not working. So the idea was to update both counters when the buffer is changed.
Created attachment 231650 [details] Proposed patch v2 Updated the proposed patch; removed changes that was found to not be required. The patch (still) adds a new macro which makes sure that a sockets sendbuffer counters sb_ccc and sb_acc are both updated for 1-to-1 style sockets. Both counters are needed when the socket API is used with poll()/kevent() since the sb_ccc counter is used to determine if the sendbuffer is full, and the sb_acc counter in other cases. This will correct the problem that an 1-to-1 style SCTP socket always is indicated to be writeable.
I have noted that the non-blocking section (https://datatracker.ietf.org/doc/html/rfc6458#section-3.2) mention that an application should use caution when writing to a 1-to-many style socket and using select()/poll(). The handling of the sendbuffer is implementation specific. For 1-to-1 style sockets I can't see such warning in https://datatracker.ietf.org/doc/html/rfc6458#section-4, but maybe I have missed something. Looking at the git history also indicates that this issue should have existed for many years without any mention of it by users on mailinglists, which makes me wonder. Is the assumption that we should be able to use poll() on a 1-to-1 style socket to get sendbuffer full indications maybe wrong?
(In reply to Björn Svensson from comment #8) I think for 1-to-1 style sockets it should work and it worked (I think) until the socket layer in FreeBSD was changed. I'll look into the patch soon.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=edc5b6ea881d7e196fee8df7ebcd372f8f5b4469 commit edc5b6ea881d7e196fee8df7ebcd372f8f5b4469 Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2022-05-14 10:38:43 +0000 Commit: Michael Tuexen <tuexen@FreeBSD.org> CommitDate: 2022-05-14 10:38:43 +0000 sctp: use sb_avail() when accessing sb_acc for reading This is a cleanup to simplify a patch for PR 260116. PR: 260116 MFC after: 3 days sys/netinet/sctp_indata.c | 2 +- sys/netinet/sctp_os_bsd.h | 2 ++ sys/netinet/sctp_output.c | 2 +- sys/netinet/sctp_pcb.c | 2 +- sys/netinet/sctp_usrreq.c | 7 +++---- sys/netinet/sctputil.c | 30 +++++++++++++++--------------- 6 files changed, 23 insertions(+), 22 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=a1742f61f4b79c84eb36ab12d83e577f714551d8 commit a1742f61f4b79c84eb36ab12d83e577f714551d8 Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2022-05-14 10:38:43 +0000 Commit: Michael Tuexen <tuexen@FreeBSD.org> CommitDate: 2023-02-01 22:56:33 +0000 sctp: use sb_avail() when accessing sb_acc for reading This is a cleanup to simplify a patch for PR 260116. PR: 260116 (cherry picked from commit edc5b6ea881d7e196fee8df7ebcd372f8f5b4469) sys/netinet/sctp_indata.c | 2 +- sys/netinet/sctp_os_bsd.h | 2 ++ sys/netinet/sctp_output.c | 2 +- sys/netinet/sctp_pcb.c | 2 +- sys/netinet/sctp_usrreq.c | 7 +++---- sys/netinet/sctputil.c | 30 +++++++++++++++--------------- 6 files changed, 23 insertions(+), 22 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=b279e84a47ddb59e55b5a3cec31c51cd41bf0dc3 commit b279e84a47ddb59e55b5a3cec31c51cd41bf0dc3 Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2023-07-28 12:36:11 +0000 Commit: Michael Tuexen <tuexen@FreeBSD.org> CommitDate: 2023-07-28 12:36:11 +0000 sctp: improve consistency This is simplifying a patch to address PR 260116. PR: 260116 MFC after: 1 week sys/netinet/sctputil.c | 8 ++------ sys/netinet/sctputil.h | 6 +----- 2 files changed, 3 insertions(+), 11 deletions(-)
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=c620788150d274c09a070ab486602c98407d73b0 commit c620788150d274c09a070ab486602c98407d73b0 Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2023-07-28 13:16:23 +0000 Commit: Michael Tuexen <tuexen@FreeBSD.org> CommitDate: 2023-07-28 13:16:23 +0000 sctp: keep sb_acc and sb_ccc in sync PR: 260116 MFC after: 1 week sys/netinet/sctp_os_bsd.h | 18 ++++++++++++------ sys/netinet/sctp_output.c | 2 +- sys/netinet/sctp_pcb.c | 2 +- sys/netinet/sctp_var.h | 4 ++-- sys/netinet/sctputil.c | 4 ++-- sys/netinet/sctputil.h | 4 ++-- 6 files changed, 20 insertions(+), 14 deletions(-)
I have tested your last changes by cherry picking them to stable/13 and it works great! Thanks for your fix!
It looks like these changes should be cherry-picked into stable/13 still (and this PR ought to remain open until then).
(In reply to Ed Maste from comment #15) That was the plan...
(In reply to Michael Tuexen from comment #16) Oh, sorry, my bad!
(In reply to Albin from comment #17) No problem. You actually reminded me to MFC the change.
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=6c9b92e741f0f878deff8ad3553239155c517a7a commit 6c9b92e741f0f878deff8ad3553239155c517a7a Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2023-07-28 12:36:11 +0000 Commit: Michael Tuexen <tuexen@FreeBSD.org> CommitDate: 2024-01-11 12:18:37 +0000 sctp: improve consistency This is simplifying a patch to address PR 260116. PR: 260116 (cherry picked from commit b279e84a47ddb59e55b5a3cec31c51cd41bf0dc3) sys/netinet/sctputil.c | 8 ++------ sys/netinet/sctputil.h | 6 +----- 2 files changed, 3 insertions(+), 11 deletions(-)
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=f141ff9935fde0ab9790a7687605bf3e455d1300 commit f141ff9935fde0ab9790a7687605bf3e455d1300 Author: Michael Tuexen <tuexen@FreeBSD.org> AuthorDate: 2023-07-28 13:16:23 +0000 Commit: Michael Tuexen <tuexen@FreeBSD.org> CommitDate: 2024-01-11 12:20:06 +0000 sctp: keep sb_acc and sb_ccc in sync PR: 260116 (cherry picked from commit c620788150d274c09a070ab486602c98407d73b0) sys/netinet/sctp_os_bsd.h | 18 ++++++++++++------ sys/netinet/sctp_output.c | 2 +- sys/netinet/sctp_pcb.c | 2 +- sys/netinet/sctp_var.h | 4 ++-- sys/netinet/sctputil.c | 4 ++-- sys/netinet/sctputil.h | 4 ++-- 6 files changed, 20 insertions(+), 14 deletions(-)