Summary: | recvmsg / IP_RECVDSTADDR issue | | |
---|---|---|---|
Product: | Base System | Reporter: | Eugene Grosbein <eugen> |
Component: | kern | Assignee: | Eugene Grosbein <eugen> |
Status: | Closed Works As Intended | | |
Severity: | Affects Only Me | CC: | kib, mav |
Priority: | --- | | |
Version: | 12.2-STABLE | | |
Hardware: | Any | | |
OS: | Any | | |
Description
Eugene Grosbein
2021-11-26 06:02:08 UTC
The backtrace you reported is suspicious: pthread_suspend_all_np() cannot appear in the stack between _recvmsg() and recvmsg(). Most likely it is __thr_recvmsg(), mis-identified because debug symbols are missing in libthr. Compile both libc and libthr with debug info. After that, first catch the kernel-side backtrace for the thread hung in recvmsg(). Next, show userspace backtraces from _all_ threads in the process.
That said, GetSockDstAddress() is strange. From its name, it seems that the purpose of the function is to obtain the destination address, as the control message. But it also tries to read some data from the socket, and the data is discarded. If the socket is blocking, and there is no data, then it is expected for recvmsg(2) to block.
Do you know what protocol type this socket is? [But this should be visible from the kernel stack]. Do you know if this socket is blocking? Also, why is it safe to discard the data?
(In reply to Konstantin Belousov from comment #1)
mpd5 uses additional threads only to talk with the RADIUS server, and these additional threads are short-lived. The output of "thread apply all bt" in this PR was not redacted, so there was only a single thread (thread 1) at that moment.
The socket is IPv4 UDP, created by an incoming L2TP over UDP request. It is in blocking read mode.
> That said, GetSockDstAddress() is strange. From its name, it seems that the purpose of the function is to obtain the destination address, as the control message. But it also tries to read some data from the socket, and the data is discarded.
I coded the function GetSockDstAddress(). It is called just once after socket creation, in case the L2TP server "self" address is not specified in mpd.conf. You are right, the purpose is to receive the control message with the destination address from the kernel, and it is my first attempt to program this. I believed this was the right way to do that without reading payload data from the socket.
Maybe there is some application logic error if some payload data is discarded, but recvmsg() still has a control message to return because of the IP_RECVDSTADDR socket option, hasn't it? So it should not block just after the first UDP datagram arrived, I believe.
I am interested in fixing the problem and will add the information you requested. However, I also need to minimize the impact on this production server, so please show the exact command I should use to catch the kernel-side backtrace. This machine of mine still runs FreeBSD 11.4-STABLE/i386 r365547 (September 2020).
> Maybe there is some application logic error if some payload data is discarded,
> but recvmsg() still has control message to return because of IP_RECVDSTADDR
> socket option, hasn't it? So it should not block just after first UDP datagram
> arrived, I believe.
No, this is completely off from how control messages work. A control
message does not 'sit' in the socket buffer. A CMSG is generated _on the packet
insertion into the sockbuf queue_. On receive, the associated cmsg is either
externalized and copied out, if a cmsg buffer is provided by userspace, or
simply dropped.
So the behavior you see is probably right (I cannot assert it fully because
I do not know the app logic and protocol): you have a blocking socket, which sometimes
happens to have an empty receive queue, and you call recvmsg(2) on it. Until
something is received, the syscall is blocked.
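To illustrate the point about control messages travelling with the datagram, here is a minimal sketch of reading the payload and the destination address in a single recvmsg(2) call. It is not the mpd5 code: the helper names enable_dstaddr() and recv_with_dstaddr() are hypothetical, and the IP_PKTINFO fallback is an assumption added only so that the sketch also builds on systems without IP_RECVDSTADDR.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc: exposes struct in_pktinfo */
#endif
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>

/* FreeBSD has IP_RECVDSTADDR; the IP_PKTINFO branch is a portability
 * assumption of this sketch and is not part of the original discussion. */
#if defined(IP_RECVDSTADDR)
#define DSTADDR_OPT   IP_RECVDSTADDR
#define DSTADDR_SPACE CMSG_SPACE(sizeof(struct in_addr))
#elif defined(IP_PKTINFO)
#define DSTADDR_OPT   IP_PKTINFO
#define DSTADDR_SPACE CMSG_SPACE(sizeof(struct in_pktinfo))
#else
#error "no destination-address socket option available"
#endif

/* Ask the kernel to attach the destination address to each datagram
 * queued on this socket from now on. */
int enable_dstaddr(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, DSTADDR_OPT, &on, sizeof(on));
}

/* Hypothetical helper: one recvmsg(2) returns payload and cmsg together.
 * Returns the payload length or -1 on error. *dst stays INADDR_ANY if
 * the datagram carried no destination-address cmsg. */
ssize_t recv_with_dstaddr(int fd, void *buf, size_t len, struct in_addr *dst)
{
    union {                     /* union keeps the buffer cmsghdr-aligned */
        struct cmsghdr hdr;
        unsigned char space[DSTADDR_SPACE];
    } ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *cm;
    ssize_t n;

    assert(buf != NULL && dst != NULL);
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = &ctl;
    msg.msg_controllen = sizeof(ctl);
    dst->s_addr = htonl(INADDR_ANY);

    n = recvmsg(fd, &msg, 0);   /* blocks while the receive queue is empty */
    if (n < 0)
        return -1;

    for (cm = CMSG_FIRSTHDR(&msg); cm != NULL; cm = CMSG_NXTHDR(&msg, cm)) {
        if (cm->cmsg_level != IPPROTO_IP || cm->cmsg_type != DSTADDR_OPT)
            continue;
#if defined(IP_RECVDSTADDR)
        memcpy(dst, CMSG_DATA(cm), sizeof(*dst));
#else
        {
            struct in_pktinfo pi;
            memcpy(&pi, CMSG_DATA(cm), sizeof(pi));
            *dst = pi.ipi_addr; /* destination address from the IP header */
        }
#endif
    }
    return n;
}
```

With this shape the caller keeps the payload in buf instead of discarding it, so no second read is needed just to learn the destination address.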
(In reply to Konstantin Belousov from comment #3)
Thank you very much for your comments. I think I have now found a bug in my change: it attempts to perform a *second* read from the socket without answering the first request, so it could indeed block forever, or until some long timeout. I'll test a better version locally for several days and report back.
The problem is not in FreeBSD, but in the application.
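For reference, one common way to keep a read on a blocking socket from hanging when the receive queue is empty is to wait with poll(2) and a timeout first. The sketch below is only an illustration of that guard, not the fix that went into mpd5; wait_readable() and recv_if_ready() are hypothetical names, and reporting a timeout as ETIMEDOUT is a choice made for this example.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <assert.h>
#include <errno.h>
#include <poll.h>

/* Returns 1 if fd has data queued, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;

    assert(fd >= 0);
    do {
        /* Note: a retry after EINTR restarts the full timeout. */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        return rc;              /* 0: queue still empty, -1: poll failed */
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Reads one datagram only if it arrives within timeout_ms.
 * Returns the payload length, or -1 with errno set
 * (ETIMEDOUT when nothing arrived in time). */
ssize_t recv_if_ready(int fd, void *buf, size_t len, int timeout_ms)
{
    int rc = wait_readable(fd, timeout_ms);

    if (rc == 0) {
        errno = ETIMEDOUT;
        return -1;
    }
    if (rc < 0)
        return -1;
    return recv(fd, buf, len, 0);
}
```

A caller can then treat ETIMEDOUT as "no further datagram yet" and go back to its event loop instead of sleeping inside recvmsg(2).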