Bug 279354 - New test kern/unix_seqpacket_test:random_eor_and_waitall reliably fails
Summary: New test kern/unix_seqpacket_test:random_eor_and_waitall reliably fails
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: tests
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Gleb Smirnoff
URL:
Keywords:
Duplicates: 279994
Depends on:
Blocks:
 
Reported: 2024-05-27 18:36 UTC by Ryan Libby
Modified: 2025-05-30 10:26 UTC
CC List: 4 users

See Also:
Flags:
linimon: mfc-stable14?
linimon: mfc-stable13?


Description Ryan Libby freebsd_committer freebsd_triage 2024-05-27 18:36:03 UTC
The new test kern/unix_seqpacket_test:random_eor_and_waitall reliably
fails both in CI and in my manual testing on an amd64 GENERIC VM.

The test was added here:
https://cgit.freebsd.org/src/commit/?id=eb338e2370b4644382e6404d7402bc05eef13e54
eb338e2370b4 tests/unix_seqpacket: provide random data pumping test with MSG_EOR

Here is a failure in CI from April 9, the first run I could find after
the test was committed:
https://ci.freebsd.org/job/FreeBSD-main-amd64-test/25066/testReport/sys.kern/unix_seqpacket_test/random_eor_and_waitall/
and is still failing as of the latest run on May 19:
https://ci.freebsd.org/job/FreeBSD-main-amd64-test/25240/testReport/sys.kern/unix_seqpacket_test/random_eor_and_waitall/

It fails for me every time when run with this command on an amd64 GENERIC VM:
kyua debug -k /usr/tests/sys/Kyuafile kern/unix_seqpacket_test:random_eor_and_waitall

I've seen it fail in a few different ways:

> % for i in {1..10}; do kyua debug -k /usr/tests/sys/Kyuafile kern/unix_seqpacket_test:random_eor_and_waitall; done
> Using seed: 0x41fd, 0xd11e, 0x7725, 0xadf8, 0xe04f, 0x1d61,
> *** Check failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1255: len != iov.iov_len: recvmsg(MSG_WAITALL): 1132, expected 3141
> kern/unix_seqpacket_test:random_eor_and_waitall  ->  failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1182: send(params->sock, &params->sendbuf[off], len, flags) == len not met
> Using seed: 0x8fa8, 0xdbe5, 0x1403, 0xb14d, 0x84f8, 0xfbd0,
> kern/unix_seqpacket_test:random_eor_and_waitall  ->  failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1182: send(params->sock, &params->sendbuf[off], len, flags) == len not met
> Using seed: 0x5b73, 0xc363, 0x39d7, 0xc52a, 0xfa9d, 0x15ab,
> *** Check failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1255: len != iov.iov_len: recvmsg(MSG_WAITALL): 4484, expected 27917
> kern/unix_seqpacket_test:random_eor_and_waitall  ->  failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1269: data corruption past 4923
> Using seed: 0xaa47, 0x3831, 0xb603, 0x97df, 0xb839, 0x0109,
> *** Check failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1255: len != iov.iov_len: recvmsg(MSG_WAITALL): 4525, expected 9299
> kern/unix_seqpacket_test:random_eor_and_waitall  ->  failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1182: send(params->sock, &params->sendbuf[off], len, flags) == len not met
> Using seed: 0x679a, 0xc263, 0xa25f, 0x348c, 0x2d3a, 0x0cd2,
> kern/unix_seqpacket_test:random_eor_and_waitall  ->  failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1182: send(params->sock, &params->sendbuf[off], len, flags) == len not met
> Using seed: 0xaa1e, 0x7317, 0x2dde, 0xe299, 0x1139, 0xf8d8,
> kern/unix_seqpacket_test:random_eor_and_waitall  ->  failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1182: send(params->sock, &params->sendbuf[off], len, flags) == len not met
> Using seed: 0x6b58, 0x2247, 0x5d93, 0x9c57, 0x326d, 0x1614,
> *** Check failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1255: len != iov.iov_len: recvmsg(MSG_WAITALL): 682, expected 24997
> kern/unix_seqpacket_test:random_eor_and_waitall  ->  failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1182: send(params->sock, &params->sendbuf[off], len, flags) == len not met
> Using seed: 0x8eca, 0xc5bd, 0xc09d, 0xe15e, 0xe7c3, 0xfdad,
> kern/unix_seqpacket_test:random_eor_and_waitall  ->  failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1182: send(params->sock, &params->sendbuf[off], len, flags) == len not met
> Using seed: 0x2859, 0x311b, 0x69d4, 0xd44c, 0xce3d, 0xe01b,
> kern/unix_seqpacket_test:random_eor_and_waitall  ->  failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1182: send(params->sock, &params->sendbuf[off], len, flags) == len not met
> Using seed: 0x0329, 0x992f, 0x6937, 0x766c, 0x47e5, 0x5270,
> kern/unix_seqpacket_test:random_eor_and_waitall  ->  failed: /usr/src/freebsd/tests/sys/kern/unix_seqpacket_test.c:1182: send(params->sock, &params->sendbuf[off], len, flags) == len not met

I traced the errno for the send() failure: it is EMSGSIZE.

The test should be adjusted so that it no longer fails.
Comment 1 Gleb Smirnoff freebsd_committer freebsd_triage 2024-05-28 03:12:29 UTC
That's because SOCK_SEQPACKET is indeed buggy. The test was committed
together with the new implementation in d80a97def9a1db6f07f5d2e68f7ad62b27918947.
With that revision, the test reliably passes. However, the new implementation
had three issues: aio(9) incompatibility, lack of sendfile(2) support and,
finally, krpc(9) incompatibility. In my private branch I have already
addressed all of them except krpc. That one is really tough. Anyway, the plan is
that the new implementation will finally get back into the main branch and
won't be reverted.

You can assign this bug to me or just close it.
Comment 2 Warner Losh freebsd_committer freebsd_triage 2024-06-26 15:17:20 UTC
*** Bug 279994 has been marked as a duplicate of this bug. ***
Comment 3 commit-hook freebsd_committer freebsd_triage 2024-06-26 15:19:34 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=0c00c3d75b27de4a367cfbf7299c000ed0e62486

commit 0c00c3d75b27de4a367cfbf7299c000ed0e62486
Author:     Warner Losh <imp@FreeBSD.org>
AuthorDate: 2024-06-26 15:18:03 +0000
Commit:     Warner Losh <imp@FreeBSD.org>
CommitDate: 2024-06-26 15:18:50 +0000

    test: Change bug number

    There was already a bug on this, so change to old bug

    PR: 279354
    Sponsored by:           Netflix

 tests/sys/kern/unix_seqpacket_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Comment 4 Warner Losh freebsd_committer freebsd_triage 2024-06-26 15:20:30 UTC
Assigned to Gleb; disabled the test in CI, since it sounds like it won't be fixed "shortly".
Comment 5 commit-hook freebsd_committer freebsd_triage 2024-09-25 07:14:38 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=c9c2452a25355db7d89bfc93fd9d50f46690949c

commit c9c2452a25355db7d89bfc93fd9d50f46690949c
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-09-21 12:41:06 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-09-25 11:44:14 +0000

    unix tests: Skip random_eor_and_waitall unconditionally

    This test always fails, I don't see any reason to make it conditional on
    the "CI" test parameter.

    There is at least one test bug here, we're using the wrong sysctl to
    obtain the receive buffer size, but fixing that is not sufficient.

    PR:             279354
    Reviewed by:    glebius
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D46726

 tests/sys/kern/unix_seqpacket_test.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
Comment 6 Christine Caulfield 2025-05-28 12:54:59 UTC
I have a similar issue with unix SEQPACKETs in recent freebsd-devel. If I send a packet longer than the receive buffer then it gets fragmented in recv (so I get it in two parts) rather than MSG_TRUNC being set as used to be the case on earlier kernels - and as seems to happen on other OSs.

I can post a reproducer here if that's helpful, or raise another BZ if you'd prefer.
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2025-05-28 12:56:55 UTC
Gleb, can you take a look at this, and at comment 6 in particular?
Comment 8 Gleb Smirnoff freebsd_committer freebsd_triage 2025-05-28 15:52:24 UTC
On Wed May 28 12:54:59  2025 UTC, ccaulfie@redhat.com wrote:
> I have a similar issue with unix SEQPACKETs in recent freebsd-devel. If I send a
> packet longer than the receive buffer then it gets fragmented in recv (so I get
> it in two parts) rather than MSG_TRUNC being set as used to be the case on
> earlier kernels - and as seems to happen on other OSs.
> 
> I can post a reproducer here if that's helpful, or raise another BZ if you'd
> prefer.

Well, the specification contradicts itself :( I implemented SOCK_SEQPACKET
following this guidance:

"The SOCK_SEQPACKET socket type is similar to the SOCK_STREAM type, and is also
connection-oriented. The only difference between these types is that record
boundaries are maintained using the SOCK_SEQPACKET type. A record can be sent
using one or more output operations and received using one or more input
operations, but a single operation never transfers parts of more than one
record. Record boundaries are visible to the receiver via the MSG_EOR flag in
the received message flags returned by the recvmsg() function. It is protocol-
specific whether a maximum record size is imposed."

https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html

However, after your report, I looked for more info and found this in the
recvmsg() specification:

"For message-based sockets, such as SOCK_DGRAM and SOCK_SEQPACKET, the entire
message shall be read in a single operation. If a message is too long to fit in
the supplied buffers, and MSG_PEEK is not set in the flags argument, the excess
bytes shall be discarded, and MSG_TRUNC shall be set in the msg_flags member of
the msghdr structure. "

https://pubs.opengroup.org/onlinepubs/9699919799/functions/recvmsg.html

This really sucks, to be fair :( IMHO, the specification from V2_chap02
makes for a useful kind of socket, while following what recvmsg() says would
basically turn SOCK_SEQPACKET into an alias for SOCK_DGRAM. That's what
SOCK_SEQPACKET was before my changes, and hence it wasn't passing the
random_eor_and_waitall test.

Anyway, let's not spam bug 279354; please open a new one for this discussion.
You can paste this message as a starter for the bug report. Or maybe take it
to a mailing list?
Comment 9 Christine Caulfield 2025-05-29 07:34:07 UTC
Thanks. New BZ created as requested, with an attached reproducer: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=287135