Bug 282495 - Linuxulator sendfile returns EAGAIN repeatedly, actually repeatingly sending (the same) data
Summary: Linuxulator sendfile returns EAGAIN repeatedly, actually repeatingly sending ...
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 14.1-RELEASE
Hardware: amd64 Any
: --- Affects Many People
Assignee: Mark Johnston
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-11-02 16:27 UTC by pieter
Modified: 2024-11-20 21:53 UTC (History)
1 user (show)

See Also:


Attachments
Simple Go executable that shows an adapted internal Go sendfile wrapper exercising the apparent bug (2.54 KB, text/plain)
2024-11-02 16:27 UTC, pieter
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description pieter 2024-11-02 16:27:07 UTC
Created attachment 254879 [details]
Simple Go executable that shows an adapted internal Go sendfile wrapper exercising the apparent bug

I'm not currently able to dive deeper, but running a very basic Go HTTP webserver in the Linuxulator on 14.1-RELEASE-p5 gave me corrupted downloads.  I dove into the standard Go libraries and found that ultimately, sendfile is used for such data transfers.

My test code is a simplified version of the Go code that shows the main issue: the linux_sendfile syscall returns EAGAIN as errno and -1 as result, BUT there's actually data being sent. This is only the case when the TCP socket is marked as non-blocking. At EAGAIN the Go library code retries until success, which in practice means that much more data is sent than expected and pieces of data are repeated. Every time I execute the test, the output is different.

My attached test code exercises this:
% go build main.go
% ./main & 
% nc -v localhost 4040 > testfile

I'm stopping the loop at EAGAIN directly, but normally it would retry a number of times, every time a bit of data actually being sent, but the sender not knowing about it. 

The Linux sendfile man page indicates that sendfile can return early without having sent all data, but this isn't an error.  So either linux_sendfile should return the amount of bytes sent and no error, OR not actually send any data and return EAGAIN and -1.
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2024-11-04 19:11:36 UTC
On main I compiled your program with:

$ env GOOS=linux go build main.go
$ brandelf -t Linux main

then ran it with some dummy data:

$ date > file2
$ ./main &
$ nc -v localhost 4040 > testfile
Entering loop, remain = 29
syscall returned n=29, err=%!s(<nil>)
Sendfile syscall success, written: 29, remain: 0
Wrote 29 bytes, handled: %!s(bool=true)Connection to localhost 4040 port [tcp/*] succeeded!
$ cat testfile
Mon Nov  4 13:54:57 EST 2024
$

So something is different.  Maybe something to do with the size of "file2"?  Or the problem is in 14.1 but not FreeBSD-main.  Or I don't quite understand the expected behaviour.
Comment 2 pieter 2024-11-04 20:00:38 UTC
I used a test file of 1MB size; every time the results were different depending on the available buffers, e.g. +/- 140KByte was sent in the first sendfile call that returned EAGAIN. The file should be big enough to be able to fill buffers I think.
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2024-11-05 02:00:08 UTC
(In reply to pieter from comment #2)
I see now, thanks.

> So either linux_sendfile should return the amount of bytes sent and no error, OR not actually send any data and return EAGAIN and -1.

Got it.  I believe this patch will fix it, though I haven't tested it yet: https://reviews.freebsd.org/D47447
Comment 4 pieter 2024-11-05 20:29:17 UTC
I tested the patch, but it's not 100% working yet.

My test program first sends 147456 bytes (of 1MB), so it calls sendfile again with the remainder of 901120 bytes. Then sendfile returns with 0 bytes sent and errno not set:

syscall returned n=147456, err=%!s(<nil>)
Sendfile syscall success, written: 147456, remain: 901120
Entering loop, remain = 901120
syscall returned n=0, err=%!s(<nil>)

I see the same kind of behavior in the standard Go net/http library, where a HTTP file download is cut off and the client retries to get the rest.
Comment 5 Mark Johnston freebsd_committer freebsd_triage 2024-11-10 15:19:00 UTC
(In reply to pieter from comment #4)
I updated and tested the patch and fixed this, thanks.

Your test program still exits sometimes since it doesn't handle EAGAIN by retrying or blocking, but I believe the kernel's behaviour is now correct.
Comment 6 pieter 2024-11-11 22:00:59 UTC
I had removed the exit call to further test indeed and also tested the initial patch with the Go http server library.

Patched and built a new kernel with the updated patch, but haven't had time yet to test. Will do so asap and comment here. 

Thanks for the effort so far, it's highly appreciated!
Comment 7 pieter 2024-11-13 14:00:48 UTC
(In reply to Mark Johnston from comment #5)
I have now tested the current improved patch and can confirm that it resolved the issue for me. Thanks a lot!

I'm not able to perform regression tests or test other aspects of the Linux sendfile implementation, but at least the test case and Go HTTP library now function as expected.
Comment 8 commit-hook freebsd_committer freebsd_triage 2024-11-13 14:16:58 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=a43b745aaf4f5bbc96875d2ab3ec9bea8024eda4

commit a43b745aaf4f5bbc96875d2ab3ec9bea8024eda4
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-11-13 14:15:47 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-11-13 14:15:47 +0000

    linux sendfile: Fix handling of non-blocking sockets

    FreeBSD sendfile() may perform a partial transfer and return EAGAIN if
    the socket is non-blocking.  Linux sendfile() expects no error in this
    case, so squash EAGAIN.

    PR:             282495
    Tested by:      pieter@krikkit.xyz
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D47447

 sys/compat/linux/linux_socket.c | 7 +++++++
 1 file changed, 7 insertions(+)
Comment 9 Mark Johnston freebsd_committer freebsd_triage 2024-11-13 14:20:19 UTC
Thank you for testing.
Comment 10 commit-hook freebsd_committer freebsd_triage 2024-11-20 21:41:31 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=91a80544737486dbe5bbae28c05c2a7a654cb61f

commit 91a80544737486dbe5bbae28c05c2a7a654cb61f
Author:     Mark Johnston <markj@FreeBSD.org>
AuthorDate: 2024-11-13 14:15:47 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2024-11-20 21:41:08 +0000

    linux sendfile: Fix handling of non-blocking sockets

    FreeBSD sendfile() may perform a partial transfer and return EAGAIN if
    the socket is non-blocking.  Linux sendfile() expects no error in this
    case, so squash EAGAIN.

    PR:             282495
    Tested by:      pieter@krikkit.xyz
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D47447

    (cherry picked from commit a43b745aaf4f5bbc96875d2ab3ec9bea8024eda4)

 sys/compat/linux/linux_socket.c | 7 +++++++
 1 file changed, 7 insertions(+)
Comment 11 Mark Johnston freebsd_committer freebsd_triage 2024-11-20 21:53:07 UTC
Thanks for the report and for testing.