Bug 262172

Summary: UNIX Domain bind() - distinguish between random garbage and alive listening socket
Product: Base System Reporter: firk
Component: kernAssignee: Gleb Smirnoff <glebius>
Status: Open ---    
Severity: Affects Some People CC: arrowd, emaste, eugen, glebius, lwhsu
Priority: ---    
Version: Unspecified   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
patch for 12.3
none
patch for 10.4
none
patch for 11.4
none
patch for 13-stable
none
new patch none

Description firk 2022-02-24 16:46:05 UTC
Created attachment 232076 [details]
patch for 12.3

The bind()/bindat() syscall for AF_UNIX returns EADDRINUSE when some file already exist at the specified path. EADDRINUSE should mean "this address already bound to someone", and this is not always true here. There may be just some file (not a socket) or a dead socket (pending non-unlinked vnode from already closed, previously bound socket). While the first case usually considered as an application error, the second usually means "we need to remove this stale entry and try again", but to do this we should first check if the socket is really dead.

There is a hacky way to do this: try to connect to it, and see what happens, but: 1) it is an extra unneeded system call,
2) is the socket is not dead, the listening application will receive spurious connection request, which is not always desirable,
3) there is a problem to distinguish between ECONNREFUSED from a dead socket and ECONNREFUSED from full-backlog; we can try to connect with a wrong protocol (SOCK_STREAM vs SOCK_DGRAM etc) to get another errno, but anyway it is a hack

So I made a patch to return EEXIST instead or EADDRINUSE when there is not really bound socket but just a random file or a dead socket. As this may cause some incompatibilities, I added sysctl to optionally enable this feature.
Comment 1 firk 2022-02-24 16:47:08 UTC
Created attachment 232077 [details]
patch for 10.4
Comment 2 firk 2022-02-24 16:47:37 UTC
Created attachment 232078 [details]
patch for 11.4
Comment 3 firk 2022-02-24 16:48:07 UTC
Created attachment 232079 [details]
patch for 13-stable
Comment 4 Mike Karels freebsd_committer freebsd_triage 2022-03-02 21:28:08 UTC
In concept I like the idea of distinct error values; I would even go one farther, and use EFTYPE for non-socket files.  However, I wonder whether it is too late for this change, as applications can't count on it, especially if it is optional (and off by default).

On the implementation, I wouldn't use VOP_UNP_CONNECT().  It doesn't have side effects now, but that could change.  Just testing v_unpcb would be OK.
Comment 5 firk 2022-03-02 21:53:55 UTC
Even disabled by default, it may be used by aware applications in the following way:

EADDRINUSE -> assume really busy address, print an error and fail,
EEXIST -> remove stale socket and try again,
EFTYPE -> print an error about wrong path and fail

When the feature is disabled, application will just fail on all three cases (a lot of application behaves in this way already), but when the feature is enabled, it will handle stale sockets properly.

As for unaware applications:
1) most of them just fails on any bind error,
2) some of them tries to cleanup stale socket before bind, or after any bind error,
3) some of them tries to cleanup stale socket on EADDRINUSE and displays a error for all other codes.

Third type of applications may be broken due to this change, and they are the only reason for disabling it by default.
Comment 6 firk 2022-03-03 00:04:06 UTC
Created attachment 232216 [details]
new patch
Comment 7 Gleb Smirnoff freebsd_committer freebsd_triage 2022-09-08 04:20:04 UTC
I like your patch. Note that EFTYPE and EEXIST are not listed in specification for bind(2), thus this allows us consider suggested functionality to be an extension over specification.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/bind.html

May I ask you to do two things:

1) Check how these edge cases work on other popular POSIX OSes, at least Linux and Mac OS X. If they also do something smart here, we probably should stay compatible with them.

2) If step 1) doesn't uncover need for any additions, please document in bind(2) cross referencing to unix(4).
Comment 8 firk 2022-09-21 22:59:11 UTC
As for Linux, i looked here: https://github.com/torvalds/linux/blob/master/net/unix/af_unix.c

unix_bind() calls unix_bind_bsd() which has "return err == -EEXIST ? -EADDRINUSE : err;" at the fail-end, looks like always unconditionally returning EADDRINUSE for already existant path. Man page says the same.

As for MacOS, I don't have it and don't know where to see or check. Seen man pages here https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/bind.2.html and here https://www.unix.com/man-page/osx/2/bind/ - both looks like plain old manpage from BSD, with the same behaviour.

So added patches for manpages, differential revision here: https://reviews.freebsd.org/D34557