When an INET socket is being closed from two ends almost simultaneously, with both sides using a series of SO_LINGER, shutdown and close, and one side being slightly behind the other, the side closing last gets -1 from close() on such a socket, and errno 54, Connection reset by peer, even though there was no data in the socket (so it more likely seems that the FIN that the late shutdown() was assumed to send was counted towards the actual data, but it should not have). No other UNIX flavor (and we have many here at NCBI) exhibits this kind of problem. Also, the problem is not 100% reproducible on FreeBSD (but is pretty frequent, nonetheless). We have seen this very same problem on both version 6 and 8 of the OS (reporting here for 8). Fix: There are workarounds (as to not to use either setsockopt(SO_LINGER) or shutdown(SHUT_WR), or both) but they do not fix the problem in general. Patch attached with submission follows: How-To-Repeat: In the attached, there is a server-client C program that reproduces the problem (although, the program itself has to be started a few times before the actual error occurs). At the bottom of the attached C program, a ktrace of the communicating processes, and errno 54 coming out of close(). select(0,...) is used to slowdown one process to imitate some of its midwork before it is ready to proceed with actual socket disposal. If either shutdown() or setsockopt(SO_LINGER) are excluded from the code, the problem in close() seems to be gone. while ktrace -i ./a.out; do date; done kdump
Responsible Changed From-To: freebsd-bugs->freebsd-net Pass this over to maintainers. Thanks for the good PR and test case!
Hi, We observed the same issue on our FreeBSD6 and 7 servers. I tried to reproduce the problem writing a simple test case but failed -- I didn't come to the idea of shutdown()/close() sequence (as Anton did). Although looking now at the code we had the issue with I see that shutdown()/close() sequence was used there too. It looks like SO_LINGER is not important to reproduce ECONNRESET. shutdown()/close() on one end and close() on the other is enough. Also, slowdown of one the processes (done by Anton using select()) is not important too. Taking this into consideration I have wrote a simplified version of a test to reproduce the bug (may be it worth of including to tools/regression/sockets?). I can easily reproduce the error with this test on FreeBSD7.1 and 8-STABLE. Adding some prints to the kernel code I localized the place where the error appears and added panic() to get a backtrace. So, the backtrace: (kgdb) bt #0 doadump () at pcpu.h:246 #1 0xc04ec829 in db_fncall (dummy1=-1064461270, dummy2=0, dummy3=-1, dummy4=0xe85e58b0 "ÄX^è") at /usr/src/sys/ddb/db_command.c:548 #2 0xc04ecc5f in db_command (last_cmdp=0xc0e0af9c, cmd_table=0x0, dopager=0) at /usr/src/sys/ddb/db_command.c:445 #3 0xc04ecd14 in db_command_script (command=0xc0e0bec4 "call doadump") at /usr/src/sys/ddb/db_command.c:516 #4 0xc04f0e50 in db_script_exec (scriptname=0xe85e59bc "kdb.enter.panic", warnifnotfound=Variable "warnifnotfound" is not available. ) at /usr/src/sys/ddb/db_script.c:302 #5 0xc04f0f37 in db_script_kdbenter (eventname=0xc0cc78ea "panic") at /usr/src/sys/ddb/db_script.c:324 #6 0xc04eec18 in db_trap (type=3, code=0) at /usr/src/sys/ddb/db_main.c:228 #7 0xc08d9aa6 in kdb_trap (type=3, code=0, tf=0xe85e5af8) at /usr/src/sys/kern/subr_kdb.c:535 #8 0xc0befecb in trap (frame=0xe85e5af8) at /usr/src/sys/i386/i386/trap.c:690 #9 0xc0bd15eb in calltrap () at /usr/src/sys/i386/i386/exception.s:165 #10 0xc08d9c2a in kdb_enter (why=0xc0cc78ea "panic", msg=0xc0cc78ea "panic") at cpufunc.h:71 #11 0xc08a95b6 in panic (fmt=0xc0ce6585 "ECONNRESET") at /usr/src/sys/kern/kern_shutdown.c:562 #12 0xc0a3d805 in tcp_usr_disconnect (so=0xc715c670) at /usr/src/sys/netinet/tcp_usrreq.c:552 #13 0xc09111bd in sodisconnect (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:810 #14 0xc0914144 in soclose (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:658 #15 0xc08f6459 in soo_close (fp=0xc743e230, td=0xc7023000) at /usr/src/sys/kern/sys_socket.c:291 #16 0xc086efc3 in _fdrop (fp=0xc743e230, td=0xc7023000) at file.h:293 #17 0xc0870cf0 in closef (fp=0xc743e230, td=0xc7023000) at /usr/src/sys/kern/kern_descrip.c:2117 #18 0xc0871097 in kern_close (td=0xc7023000, fd=4) at /usr/src/sys/kern/kern_descrip.c:1162 #19 0xc087123a in close (td=0xc7023000, uap=0xe85e5cf8) at /usr/src/sys/kern/kern_descrip.c:1114 #20 0xc0bef600 in syscall (frame=0xe85e5d38) at /usr/src/sys/i386/i386/trap.c:1111 #21 0xc0bd1680 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:261 #22 0x00000033 in ?? () Previous frame inner to this frame (corrupt stack?) (kgdb) fr 12 #12 0xc0a3d805 in tcp_usr_disconnect (so=0xc715c670) at /usr/src/sys/netinet/tcp_usrreq.c:552 552 panic("ECONNRESET"); (kgdb) list 547 inp = sotoinpcb(so); 548 KASSERT(inp != NULL, ("tcp_usr_disconnect: inp == NULL")); 549 INP_WLOCK(inp); 550 if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) { 551 error = ECONNRESET; 552 panic("ECONNRESET"); 553 /* log(LOG_INFO, "ECONNRESET 3: file %s; line %d\n", __FILE__, __LINE__); */ 554 goto out; 555 } 556 tp = intotcpcb(inp); (kgdb) p/x inp->inp_flags $1 = 0x4800000 #define INP_DROPPED 0x04000000 /* protocol drop flag */ (kgdb) fr 14 #14 0xc0914144 in soclose (so=0xc715c670) at /usr/src/sys/kern/uipc_socket.c:658 658 error = sodisconnect(so); (kgdb) list 653 654 CURVNET_SET(so->so_vnet); 655 funsetown(&so->so_sigio); 656 if (so->so_state & SS_ISCONNECTED) { 657 if ((so->so_state & SS_ISDISCONNECTING) == 0) { 658 error = sodisconnect(so); 659 if (error) { 660 if (error == ENOTCONN) 661 error = 0; 662 goto drop; (kgdb) p/x so->so_state $2 = 0x2000 #define SS_ISDISCONNECTED 0x2000 /* socket disconnected from peer */ (the code differs a bit here from original 8-STABLE version as it includes patch from kern/144061 to deal with ENOTCONN errors but this does not influence). Actually, I think that the issue here is similar to kern/144061. In the code above so_state is checked for SS_ISCONNECTED and SS_ISDISCONNECTING without locking and then sodisconnect() is called. It looks like the following may happen: 1) after shutdown() our output is closed; 2) then we call close(), soclose() checks that we are still in SS_ISCONNECTED and calls sodisconnect(); 3) at this time FIN arrives from the other end, which has called close() too, and the kernel disconnects the socket (INP_DROPPED is set); 4) sodisconnect()/tcp_usr_disconnect() checks for INP_DROPPED and returns ECONNRESET. I am attaching the patch, which may not be a solution but rather for illustration to described above. Running the test with this patch I am observing the following messages in error logs May 27 23:55:41 zhuzha kernel: ECONNRESET: so->state: 0x2000; file /usr/src/sys/kern/uipc_socket.c; line 664 and test does not fail. -- Mikolaj Golub
On 27 May 2010, at 22:25, Mikolaj Golub wrote: > We observed the same issue on our FreeBSD6 and 7 servers. I tried to = reproduce > the problem writing a simple test case but failed -- I didn't come to = the idea > of shutdown()/close() sequence (as Anton did). Although looking now at = the > code we had the issue with I see that shutdown()/close() sequence was = used > there too. Hi Mikolaj-- Indeed, this looks very like the ENOTCONN issue in kern/144061 -- = another indication that a bit more refinement is required in the = synchronization for our socket-layer state machine. Unfortunately, not = an issue we'll likely address in a general sense until 9.0. However, a = similar workaround to kenr/144061 does sound like the solution -- I'll = get this new regression test in the tree, as well as the workaround, = tomorrow, with the hopes of making 8.1. Thanks all! Robert >=20 > It looks like SO_LINGER is not important to reproduce ECONNRESET. > shutdown()/close() on one end and close() on the other is enough. = Also, > slowdown of one the processes (done by Anton using select()) is not = important > too. Taking this into consideration I have wrote a simplified version = of a test > to reproduce the bug (may be it worth of including to = tools/regression/sockets?). >=20 > I can easily reproduce the error with this test on FreeBSD7.1 and > 8-STABLE. Adding some prints to the kernel code I localized the place = where > the error appears and added panic() to get a backtrace. >=20 > So, the backtrace: >=20 > (kgdb) bt > #0 doadump () at pcpu.h:246 > #1 0xc04ec829 in db_fncall (dummy1=3D-1064461270, dummy2=3D0, = dummy3=3D-1, dummy4=3D0xe85e58b0 "=C4X^=E8") > at /usr/src/sys/ddb/db_command.c:548 > #2 0xc04ecc5f in db_command (last_cmdp=3D0xc0e0af9c, cmd_table=3D0x0, = dopager=3D0) > at /usr/src/sys/ddb/db_command.c:445 > #3 0xc04ecd14 in db_command_script (command=3D0xc0e0bec4 "call = doadump") > at /usr/src/sys/ddb/db_command.c:516 > #4 0xc04f0e50 in db_script_exec (scriptname=3D0xe85e59bc = "kdb.enter.panic", warnifnotfound=3DVariable "warnifnotfound" is not = available. > ) > at /usr/src/sys/ddb/db_script.c:302 > #5 0xc04f0f37 in db_script_kdbenter (eventname=3D0xc0cc78ea "panic") > at /usr/src/sys/ddb/db_script.c:324 > #6 0xc04eec18 in db_trap (type=3D3, code=3D0) at = /usr/src/sys/ddb/db_main.c:228 > #7 0xc08d9aa6 in kdb_trap (type=3D3, code=3D0, tf=3D0xe85e5af8) at = /usr/src/sys/kern/subr_kdb.c:535 > #8 0xc0befecb in trap (frame=3D0xe85e5af8) at = /usr/src/sys/i386/i386/trap.c:690 > #9 0xc0bd15eb in calltrap () at = /usr/src/sys/i386/i386/exception.s:165 > #10 0xc08d9c2a in kdb_enter (why=3D0xc0cc78ea "panic", msg=3D0xc0cc78ea = "panic") at cpufunc.h:71 > #11 0xc08a95b6 in panic (fmt=3D0xc0ce6585 "ECONNRESET") at = /usr/src/sys/kern/kern_shutdown.c:562 > #12 0xc0a3d805 in tcp_usr_disconnect (so=3D0xc715c670) at = /usr/src/sys/netinet/tcp_usrreq.c:552 > #13 0xc09111bd in sodisconnect (so=3D0xc715c670) at = /usr/src/sys/kern/uipc_socket.c:810 > #14 0xc0914144 in soclose (so=3D0xc715c670) at = /usr/src/sys/kern/uipc_socket.c:658 > #15 0xc08f6459 in soo_close (fp=3D0xc743e230, td=3D0xc7023000) > at /usr/src/sys/kern/sys_socket.c:291 > #16 0xc086efc3 in _fdrop (fp=3D0xc743e230, td=3D0xc7023000) at = file.h:293 > #17 0xc0870cf0 in closef (fp=3D0xc743e230, td=3D0xc7023000) > at /usr/src/sys/kern/kern_descrip.c:2117 > #18 0xc0871097 in kern_close (td=3D0xc7023000, fd=3D4) at = /usr/src/sys/kern/kern_descrip.c:1162 > #19 0xc087123a in close (td=3D0xc7023000, uap=3D0xe85e5cf8) > at /usr/src/sys/kern/kern_descrip.c:1114 > #20 0xc0bef600 in syscall (frame=3D0xe85e5d38) at = /usr/src/sys/i386/i386/trap.c:1111 > #21 0xc0bd1680 in Xint0x80_syscall () at = /usr/src/sys/i386/i386/exception.s:261 > #22 0x00000033 in ?? () > Previous frame inner to this frame (corrupt stack?) > (kgdb) fr 12 > #12 0xc0a3d805 in tcp_usr_disconnect (so=3D0xc715c670) at = /usr/src/sys/netinet/tcp_usrreq.c:552 > 552 panic("ECONNRESET"); > (kgdb) list > 547 inp =3D sotoinpcb(so); > 548 KASSERT(inp !=3D NULL, ("tcp_usr_disconnect: inp =3D=3D = NULL")); > 549 INP_WLOCK(inp); > 550 if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) { > 551 error =3D ECONNRESET; > 552 panic("ECONNRESET"); > 553 /* log(LOG_INFO, "ECONNRESET 3: file %s; line = %d\n", __FILE__, __LINE__); */ > 554 goto out; > 555 } > 556 tp =3D intotcpcb(inp); > (kgdb) p/x inp->inp_flags > $1 =3D 0x4800000 >=20 > #define INP_DROPPED 0x04000000 /* protocol drop flag */ >=20 > (kgdb) fr 14 > #14 0xc0914144 in soclose (so=3D0xc715c670) at = /usr/src/sys/kern/uipc_socket.c:658 > 658 error =3D sodisconnect(so); > (kgdb) list > 653 > 654 CURVNET_SET(so->so_vnet); > 655 funsetown(&so->so_sigio); > 656 if (so->so_state & SS_ISCONNECTED) { > 657 if ((so->so_state & SS_ISDISCONNECTING) =3D=3D = 0) { > 658 error =3D sodisconnect(so); > 659 if (error) { > 660 if (error =3D=3D ENOTCONN) > 661 error =3D 0; > 662 goto drop; > (kgdb) p/x so->so_state =20 > $2 =3D 0x2000 >=20 > #define SS_ISDISCONNECTED 0x2000 /* socket disconnected from = peer */ >=20 > (the code differs a bit here from original 8-STABLE version as it = includes > patch from kern/144061 to deal with ENOTCONN errors but this does not > influence). >=20 > Actually, I think that the issue here is similar to kern/144061. In = the code > above so_state is checked for SS_ISCONNECTED and SS_ISDISCONNECTING = without > locking and then sodisconnect() is called. It looks like the following = may > happen: >=20 > 1) after shutdown() our output is closed; >=20 > 2) then we call close(), soclose() checks that we are still in = SS_ISCONNECTED > and calls sodisconnect(); >=20 > 3) at this time FIN arrives from the other end, which has called = close() too, > and the kernel disconnects the socket (INP_DROPPED is set); >=20 > 4) sodisconnect()/tcp_usr_disconnect() checks for INP_DROPPED and = returns > ECONNRESET. >=20 > I am attaching the patch, which may not be a solution but rather for > illustration to described above. Running the test with this patch I am > observing the following messages in error logs >=20 > May 27 23:55:41 zhuzha kernel: ECONNRESET: so->state: 0x2000; file = /usr/src/sys/kern/uipc_socket.c; line 664 >=20 > and test does not fail. >=20 > --=20 > Mikolaj Golub >=20 > <tcp_close.c><uipc_socket.c.econnreset.patch>
Hi Mikolaj and Robert, Thanks for following up on this issue! > It looks like SO_LINGER is not important to reproduce ECONNRESET. Indeed, it is not. > I am attaching the patch, which may not be a solution IMHO, it is not, unfortunately, a solution: it seems to clear ECONNRESET blindly and w/o distinguishing the situation when the remote end closes the connection prematurely (i.e. before acknowledging all data written from the local end) -- and that qualifies for the true "connection reset by peer" from close()... Anton Lavrentiev Contractor NIH/NLM/NCBI
On Fri, 28 May 2010 04:40:03 GMT Lavrentiev, Anton (NIH/NLM/NCBI) [C] wrote: LA> IMHO, it is not, unfortunately, a solution: it seems to clear ECONNRESET LA> blindly and w/o distinguishing the situation when the remote end closes the LA> connection prematurely (i.e. before acknowledging all data written from the LA> local end) -- and that qualifies for the true "connection reset by peer" LA> from close()... I am not very familiar with the socket/tcp code but it looks for me that it might not make any difference. I can be wrong here but the situation you have described as true "connection reset by peer" seems to have the following path in the code: soclose() -> sodisconnect() -> tcp_usr_disconnect() -> tcp_disconnect() But tcp_disconnect() does not return error, so we will not have ECONNRESET error in any case. May be you have a good test suite to reproduce this situation? :-) -- Mikolaj Golub
> IMHO, it is not, unfortunately, a solution: it seems to clear ECONNRESET > blindly and w/o distinguishing the situation when the remote end closes the > connection prematurely (i.e. before acknowledging all data written from the > local end) -- and that qualifies for the true "connection reset by peer" > from close()... I did some experiments the results I would like to share here. The idea is following: the client sends data in one write() more then a win, while the server closes the connection without reading (sending RST on close). I also played with LINGER option. I have managed to get ECONNRESET only on write(), if the server sends RST before the client calls write(). In all other cases write()/close() returned without error. See the attachment for details. So I think that with the workaround (ignore ECONNRESET returned by sodisconnect() in soclose()) we would not make the situation worse (while it fixed the issue with applications getting unexpectedly ECONNRESET after shutdown()/close() sequence). -- Mikolaj Golub
For bugs matching the following criteria: Status: In Progress Changed: (is less than) 2014-06-01 Reset to default assignee and clear in-progress tags. Mail being skipped
Created attachment 252292 [details] A reproducer The reproducer was provided by Sad Clouds on the freebsd-net mailing list on April 3rd, 2024.
See https://reviews.freebsd.org/D48148
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=053a988497342a6fd0a717cc097d09c23f83e103 commit 053a988497342a6fd0a717cc097d09c23f83e103 Author: Gleb Smirnoff <glebius@FreeBSD.org> AuthorDate: 2024-12-23 18:35:49 +0000 Commit: Gleb Smirnoff <glebius@FreeBSD.org> CommitDate: 2024-12-23 18:35:49 +0000 tcp: don't ever return ECONNRESET on close(2) The SUS doesn't mention this error code as a possible one [1]. The FreeBSD manual page specifies a possible ECONNRESET for close(2): [ECONNRESET] The underlying object was a stream socket that was shut down by the peer before all pending data was delivered. In the past it had been EINVAL (see 21367f630d72), and this EINVAL was added as a safety measure in 623dce13c64ef. After conversion to ECONNRESET it had been documented in the manual page in 78e3a7fdd51e6, but I bet wasn't ever tested to actually be ever returned, cause the tcp-testsuite[2] didn't exist back then. So documentation is incorrect since 2006, if my bet wins. Anyway, in the modern FreeBSD the condition described above doesn't end up with ECONNRESET error code from close(2). The error condition is reported via SO_ERROR socket option, though. This can be checked using the tcp-testsuite, temporarily disabling the getsockopt(SO_ERROR) lines using sed command [3]. Most of these getsockopt(2)s are followed by '+0.00 close(3) = 0', which will confirm that close(2) doesn't return ECONNRESET even on a socket that has the error stored, neither it is returned in the case described in the manual page. The latter case is covered by multiple tests residing in tcp- testsuite/state-event-engine/rcv-rst-*. However, the deleted block of code could be entered in a race condition between close(2) and processing of incoming packet, when connection had already been half-closed with shutdown(SHUT_WR) and sits in TCPS_LAST_ACK. This was reported in the bug 146845. With the block deleted, we will continue into tcp_disconnect() which has proper handling of INP_DROPPED. The race explanation follows. The connection is in TCPS_LAST_ACK. The network input thread acquires the tcpcb lock first, sets INP_DROPPED, acquires the socket lock in soisdisconnected() and clears SS_ISCONNECTED. Meanwhile, the syscall thread goes through sodisconnect() which checks for SS_ISCONNECTED locklessly(!). The check passes and the thread blocks on the tcpcb lock in tcp_usr_disconnect(). Once input thread releases the lock, the syscall thread observes INP_DROPPED and returns ECONNRESET. - Thread 1: tcp_do_segment()->tcp_close()->in_pcbdrop(),soisdisconnected() - Thread 2: sys_close()...->soclose()->sodisconnect()->tcp_usr_disconnect() Note that the lockless operation in sodisconnect() isn't correct, but enforcing the socket lock there will not fix the problem. [1] https://pubs.opengroup.org/onlinepubs/9799919799/ [2] https://github.com/freebsd-net/tcp-testsuite [3] sed -i "" -Ee '/\+0\.00 getsockopt\(3, SOL_SOCKET, SO_ERROR, \[ECONNRESET\]/d' $(grep -lr ECONNRESET tcp-testsuite) PR: 146845 Reviewed by: tuexen, rrs, imp Differential Revision: https://reviews.freebsd.org/D48148 lib/libsys/close.2 | 5 +---- sys/netinet/tcp_usrreq.c | 5 ----- 2 files changed, 1 insertion(+), 9 deletions(-)