Bug 260055

Summary: recvmsg / IP_RECVDSTADDR issue
Product: Base System Reporter: Eugene Grosbein <eugen>
Component: kernAssignee: Eugene Grosbein <eugen>
Status: Closed Works As Intended    
Severity: Affects Only Me CC: kib, mav
Priority: ---    
Version: 12.2-STABLE   
Hardware: Any   
OS: Any   

Description Eugene Grosbein freebsd_committer freebsd_triage 2021-11-26 06:02:08 UTC
Multi-protocol daemon net/mpd5 that includes L2TP server implementation recently got ability to use IP_RECVDSTADDR IPv4 UDP socket option to support multi-homed configurations without multiple single-IP sub-configurations.

Since then, users observe strange "hangs" of L2TP server when it stops processing new incoming connections for hours (if not killed). It may "unhang" after several hours if left in this state and continue to work normally until next hang.

This problem observed under FreeBSD versions 11, 12 and 13.

I have it, too. I've rebult binary with debugging symbols and got coredump with SIGQUIT when problem occured. Here is a backtrace that shows it blocks on recvmsg() trying to obtain source address of UDP socket:

(gdb) thread apply all bt

Thread 1 (process 100333):
#0  0x2853a6e3 in _recvmsg () from /lib/libc.so.7
#1  0x2844f0c1 in pthread_suspend_all_np () from /lib/libthr.so.3
#2  0x284d118f in recvmsg () from /lib/libc.so.7
#3  0x080dea71 in GetSockDstAddress (sock=24, addr=0x28c484e8) at util.c:1581
#4  0x0806c48a in L2tpServerEvent (type=1, arg=0x28c47048) at l2tp.c:1419
#5  0x080a1e4f in EventHandler (arg=0x28c47068) at event.c:146
#6  0x080ea7de in pevent_ctx_execute (arg=0x28c139b4) at contrib/libpdel/util/pevent.c:884
#7  0x080ea354 in pevent_ctx_service (ev=0x28c139b4) at contrib/libpdel/util/pevent.c:773
#8  0x080e9e33 in pevent_ctx_main (arg=0x28a41004) at contrib/libpdel/util/pevent.c:719
#9  0x2844c2c2 in pthread_create () from /lib/libthr.so.3
#10 0x00000000 in ?? ()
(gdb) frame 3
#3  0x080dea71 in GetSockDstAddress (sock=24, addr=0x28c484e8) at util.c:1581
1581            if ((size = recvmsg(sock, &b.msg, 0)) < 0) {
(gdb) p b
$1 = {msg = {msg_name = 0x0, msg_namelen = 0, msg_iov = 0xbbbfd334, msg_iovlen = 1,
    msg_control = 0xbbbfd340, msg_controllen = 16, msg_flags = 0}, iov = {iov_base = 0xbbbfd354,
    iov_len = 1024}}

There is corresponding source code, function GetSockDstAddress()
https://sourceforge.net/p/mpd/svn/HEAD/tree/trunk/src/util.c#l1553

FreeBSD base system has not many consumers of IP_RECVDSTADDR socket options:

src/contrib/pf/tftp-proxy
src/contrib/unbound
contrib/bsnmp/snmpd

None of them is threaded application like net/mpd5. I suspect an issue in threaded iplementation of this socket option that leads to long time block of recvmsg().
Comment 1 Konstantin Belousov freebsd_committer freebsd_triage 2021-11-26 07:58:52 UTC
The backtrace you reported is suspicious, pthread_suspend_all_np()
cannot appear in the stack between _recvmg() and recvmsg(), most
likely it is __thr_recvmsg() which was mis-identified due to missed
debug symbols in libthr.  Compile both libc and libthr with debug
info.

After that, first catch the kernel-side backtrace for the thread
hung in recvmsg().  Next, show userspace backtraces from _all_
threads in the process.

That said, GetSockDstAddress() is strange.  From its name, it seems
that the purpose of the function is to obtain the destination address,
as the control message. But it also tries to read some data from the
socket, and the data is discarded.

If the socket is blocking, and there is no data, then it is expected
for recvmsg(2) to block.  Do you know what protocol type this socket
is? [But this should be visible from the kernel stack].  Do you know
if this socket is blocking? Also, why is it safe to discard the data?
Comment 2 Eugene Grosbein freebsd_committer freebsd_triage 2021-11-26 13:25:30 UTC
(In reply to Konstantin Belousov from comment #1)

mpd5 uses additional threads to talk with RADIUS server only and these additional threads are short-lived. The output of "thread apply all bt" in this PR was not redacted, so there was only single thread 1 at the moment.

The socket is IPv4 UDP created by incoming L2TP over UDP request. It is in blocking read mode.

> That said, GetSockDstAddress() is strange.  From its name, it seems
that the purpose of the function is to obtain the destination address,
as the control message. But it also tries to read some data from the
socket, and the data is discarded.

I coded the function GetSockDstAddress(). It is called just once after socket creation in case L2TP server "self" address not specified in the mpd.conf

You are right, the purpose is to receive control message with destination address from the kernel and it is my first attempt to programm this. I believed this is right way to do that without reading payload data from the socket. 

Maybe there is some application logic error if some payload data is discarded, but recvmsg() still has control message to return because of IP_RECVDSTADDR socket option, hasn't it? So it should not block just after first UDP datagramm arrived, I believe.

I am interested in fixing the problem and will add more information you requested. However, I need also to minimize impact on this production server, so please show exact command I should use to catch the kernel-side backtrace. This my machine still uses FreeBSD 11.4-STABLE/i386 r365547 (September 2020).
Comment 3 Konstantin Belousov freebsd_committer freebsd_triage 2021-11-26 21:51:52 UTC
> Maybe there is some application logic error if some payload data is discarded,
> but recvmsg() still has control message to return because of IP_RECVDSTADDR
> socket option, hasn't it? So it should not block just after first UDP datagramm
> arrived, I believe.

No, this is completely off how the control messages work.  You do not have
control message 'sit' in the socket buffer.  CMSG is generated _on the packet
insertion into the sockbuf queue_.  On receive, associated cmsg is either
externalized and copied out, if cmsg buffer is provided by userspace, or
simply dropped.

So the behavior you see is probably right (I cannot assert if fully because
I do not know app logic and protocol): you have blocked socket, which sometimes
happens to have empty receive queue, and you call recvmsg(2) on it.  Until
something is received, the syscall is blocked.
Comment 4 Eugene Grosbein freebsd_committer freebsd_triage 2021-11-27 11:29:46 UTC
(In reply to Konstantin Belousov from comment #3)

Thank you very much for your commentaries. Now I think I've found a bug in my change, so it makes an attempt to perform the *second* read from the socket without answering anything to first request so it could block forever indeed, or until some long timeout. I'll test better version locally for several days and report back.
Comment 5 Eugene Grosbein freebsd_committer freebsd_triage 2021-12-04 12:35:45 UTC
The problem's not in FreeBSD, but in the application.