Bug 259380 - linux(4): linux_recvfrom(2) fails: linux_recvfrom -1 errno -22 Invalid argument
Summary: linux(4): linux_recvfrom(2) fails: linux_recvfrom -1 errno -22 Invalid argument
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords: needs-qa
Depends on:
Blocks:
 
Reported: 2021-10-23 14:11 UTC by Jason Mader
Modified: 2021-11-16 17:30 UTC
CC List: 3 users

See Also:
koobs: maintainer-feedback? (trasz)
koobs: mfc-stable13?
koobs: mfc-stable12?
koobs: mfc-stable11-


Attachments
revert linux_recvfrom() in linux_socket.c (2.44 KB, patch)
2021-11-09 10:55 UTC, Jason Mader
no flags

Description Jason Mader 2021-10-23 14:11:57 UTC
In a FreeBSD 13.0 jail with Linux compatibility, one pair of Linux programs (a daemon and its utility) does not communicate properly. They used to work in FreeBSD 11.2. Here is a ktrace of the daemon receiving part of a message; every 6-byte read looks like this. I am guessing the problem is "linux_recvfrom -1 errno -22 Invalid argument".


 92539 rlm      CALL  linux_select(0x4000,0x85e6d8,0,0,0x7fffffffddd0)
 92539 rlm      RET   linux_select 1
 92539 rlm      CALL  linux_recvfrom(0x4,0x85d914,0x6,0x4000,0x861c7c,0x7fffffffde00)
 92539 rlm      GIO   fd 4 read 6 bytes
       ",2,5,0"
 92539 rlm      RET   linux_recvfrom -1 errno -22 Invalid argument
 92539 rlm      CALL  linux_time(0x7fffffffdf38)
Comment 1 Ed Maste freebsd_committer 2021-10-25 14:53:28 UTC
CC trasz@, but I expect we'll need more detail to have a chance of making progress here.
Comment 2 Jason Mader 2021-10-26 06:38:16 UTC
(In reply to Ed Maste from comment #1)
Of course. Let me know what I can do to provide more detail. ktrace was the only thing I could think of so far to see why the binaries weren't working.
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2021-10-26 23:43:17 UTC
@Jason Can you detail the relevant daemon and utility programs, along with steps to reproduce and their upstream source repository links (if available)?
Comment 4 Jason Mader 2021-10-27 16:00:07 UTC
(In reply to Kubilay Kocak from comment #3)
These are the Reprise software license manager and its utility program for Linux x86_64, so the sources aren't available, and they only work with a license file for a specific system. I first noticed the problem when trying to shut down the license server with `rlmutil rlmdown RLM -q`:

Read error from network (-105)
Timeout on read() (comm: -13)Operation now in progress (errno: 115)

This software does work on FreeBSD 11.2 with a very similar setup, the license manager process running in a jail. One difference is that on FreeBSD 11.2 I am using an IP alias in the jail, but now I am using epair and bridge (because of Bug 258949: a /32 netmask doesn't work with an alias in FreeBSD 13.0). Of note, four other Linux x86_64 license managers are working properly in the same jail.
Comment 5 Jason Mader 2021-11-03 20:33:04 UTC
This is how it used to behave in 11.2:

 59822 rlm      CALL  linux_select(0x4000,0x7323e8,0,0,0x7fffffffcdd0)
 59822 rlm      RET   linux_select 1
 59822 rlm      CALL  linux_recvfrom(0x4,0x731704,0x6,0x4000,0x72c94c,0x7fffffffcdcc)
 59822 rlm      GIO   fd 4 read 6 bytes
       0x0000 0100 8e00 008f                                                                                       |......|

 59822 rlm      RET   linux_recvfrom 6
 59822 rlm      CALL  linux_select(0x4000,0x72c148,0,0,0x7fffffffcdd0)
 59822 rlm      RET   linux_select 1
 59822 rlm      CALL  linux_recvfrom(0x4,0x73170a,0x8e,0x4000,0x72c94c,0x7fffffffcdcc)
 59822 rlm      GIO   fd 4 read 142 bytes
 59822 rlm      RET   linux_recvfrom 142/0x8e

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags, struct sockaddr *src_addr, socklen_t *addrlen);
sockfd = 0x4, buf = 0x731704, len = 0x6, flags = 0x4000, src_addr = 0x72c94c, addrlen = 0x7fffffffcdcc

This is the first recvfrom error on 13.0. It matches the 6-byte read above, which looks like it just fetches the size of the next message:

 35514 rlm      CALL  linux_select(0x4000,0x85e6d8,0,0,0x7fffffffddc0)
 35514 rlm      RET   linux_select 1
 35514 rlm      CALL  linux_recvfrom(0x4,0x85d914,0x6,0x4000,0x861c7c,0x7fffffffddf0)
 35514 rlm      GIO   fd 4 read 6 bytes
       0x0000 0100 b900 fdb7                                                                                       |......|

 35514 rlm      RET   linux_recvfrom -1 errno -22 Invalid argument

If there is any way to find out more detail on why this linux_recvfrom() fails, please let me know and I'll provide that information.
Comment 6 Jason Mader 2021-11-07 16:49:55 UTC
I’ve been testing releases and have found that these Linux binaries worked as expected in FreeBSD 11.4 and 12.0, but not in FreeBSD 12.1 and later.
Comment 7 Jason Mader 2021-11-07 17:19:20 UTC
(In reply to Jason Mader from comment #6)
Sorry, I made a mistake: this works in FreeBSD 12.1 as well. My problem begins in FreeBSD 12.2.

linux_socket.c also changed significantly between 12.1 and 12.2.
Comment 8 Jason Mader 2021-11-08 19:44:14 UTC
I reverted linux_recvfrom() in FreeBSD 12.3-BETA3 to its FreeBSD 12.1 version (adding the dependent function linux_sa_put() and the prior version of bsd_to_linux_sockaddr()), and the Linux binaries work in FreeBSD 12.3-BETA3. I am not yet sure exactly where the problem is, though.
Comment 9 Jason Mader 2021-11-09 10:55:40 UTC
Created attachment 229379 [details]
revert linux_recvfrom() in linux_socket.c

After adding some debugging statements to linux_recvfrom(), I found that the error happens here:

	error = kern_recvit(td, args->s, &msg, UIO_SYSSPACE, NULL);
	if (error != 0)
		goto out;

The returned error value is 54 (ECONNRESET on FreeBSD).

I'm attaching a diff that reverts linux_socket.c in FreeBSD 12.3-BETA3 to its 12.1 version. With it the Linux binaries work, though I don't yet understand what the critical difference in linux_recvfrom() is.
Comment 10 Jason Mader 2021-11-11 16:14:26 UTC
(In reply to Jason Mader from comment #9)
In the FreeBSD 12.2+ linux_recvfrom(), the problem seems to be at:

	error = bsd_to_linux_sockaddr(sa, &lsa, msg.msg_namelen);

The old bsd_to_linux_sockaddr((struct sockaddr *)PTRIN(args->from)) returns 0, but the new bsd_to_linux_sockaddr(sa, &lsa, msg.msg_namelen) returns 22 (EINVAL).
Comment 11 Jason Mader 2021-11-12 00:46:33 UTC
(In reply to Jason Mader from comment #10)
When linux_recvfrom() calls kern_recvit() the value of msg.msg_namelen is 28, and after the call it is 0.

The kern_recvit() source didn't change, but bsd_to_linux_sockaddr() did. Before FreeBSD 12.2, bsd_to_linux_sockaddr() didn't check the value of msg.msg_namelen (passed as len). Now it does:

	if (len < 2 || len > UCHAR_MAX)
		return (EINVAL);

I am currently working around this with:

--- linux_socket.c
+++ linux_socket.c
@@ -926,10 +926,10 @@
 		goto out;

 	if (PTRIN(args->from) != NULL) {
-		error = bsd_to_linux_sockaddr(sa, &lsa, msg.msg_namelen);
+		error = bsd_to_linux_sockaddr(sa, &lsa, fromlen);
 		if (error == 0)
 			error = copyout(lsa, PTRIN(args->from),
-			    msg.msg_namelen);
+			    fromlen);
 		free(lsa, M_SONAME);
 	}
Comment 12 Edward Tomasz Napierala freebsd_committer 2021-11-16 13:07:40 UTC
Thanks for investigating this!  Would it be possible for you to print out the value for both 'msg.msg_namelen' and 'fromlen' when this happens?
Comment 13 Jason Mader 2021-11-16 17:30:15 UTC
(In reply to Edward Tomasz Napierala from comment #12)
I changed linux_recvfrom() in linux_socket.c from:

        if (PTRIN(args->from) != NULL) {
                error = linux_copyout_sockaddr(sa, PTRIN(args->from), msg.msg_namelen);

to:

        if (PTRIN(args->from) != NULL) {
                printf("msg_namelen: %u, fromlen: %d\n", msg.msg_namelen, fromlen);
                error = linux_copyout_sockaddr(sa, PTRIN(args->from), fromlen);
        }

And got:

linux: jid 1 pid 77110 (rlmutil): unsupported socket(AF_NETLINK, 3, NETLINK_ROUTE)
msg_namelen: 0, fromlen: 28
msg_namelen: 0, fromlen: 28
msg_namelen: 0, fromlen: 28
msg_namelen: 0, fromlen: 28
msg_namelen: 0, fromlen: 28
msg_namelen: 0, fromlen: 28
msg_namelen: 0, fromlen: 28
msg_namelen: 0, fromlen: 28

None of the other clients connecting to their servers pass a non-NULL from address, so the "(PTRIN(args->from) != NULL)" branch is never taken for them. That is why there is no output from them, and why they all work without the workaround.