Created attachment 232076 [details] patch for 12.3 The bind()/bindat() syscall for AF_UNIX returns EADDRINUSE when some file already exist at the specified path. EADDRINUSE should mean "this address already bound to someone", and this is not always true here. There may be just some file (not a socket) or a dead socket (pending non-unlinked vnode from already closed, previously bound socket). While the first case usually considered as an application error, the second usually means "we need to remove this stale entry and try again", but to do this we should first check if the socket is really dead. There is a hacky way to do this: try to connect to it, and see what happens, but: 1) it is an extra unneeded system call, 2) is the socket is not dead, the listening application will receive spurious connection request, which is not always desirable, 3) there is a problem to distinguish between ECONNREFUSED from a dead socket and ECONNREFUSED from full-backlog; we can try to connect with a wrong protocol (SOCK_STREAM vs SOCK_DGRAM etc) to get another errno, but anyway it is a hack So I made a patch to return EEXIST instead or EADDRINUSE when there is not really bound socket but just a random file or a dead socket. As this may cause some incompatibilities, I added sysctl to optionally enable this feature.
Created attachment 232077 [details] patch for 10.4
Created attachment 232078 [details] patch for 11.4
Created attachment 232079 [details] patch for 13-stable
In concept I like the idea of distinct error values; I would even go one farther, and use EFTYPE for non-socket files. However, I wonder whether it is too late for this change, as applications can't count on it, especially if it is optional (and off by default). On the implementation, I wouldn't use VOP_UNP_CONNECT(). It doesn't have side effects now, but that could change. Just testing v_unpcb would be OK.
Even disabled by default, it may be used by aware applications in the following way: EADDRINUSE -> assume really busy address, print an error and fail, EEXIST -> remove stale socket and try again, EFTYPE -> print an error about wrong path and fail When the feature is disabled, application will just fail on all three cases (a lot of application behaves in this way already), but when the feature is enabled, it will handle stale sockets properly. As for unaware applications: 1) most of them just fails on any bind error, 2) some of them tries to cleanup stale socket before bind, or after any bind error, 3) some of them tries to cleanup stale socket on EADDRINUSE and displays a error for all other codes. Third type of applications may be broken due to this change, and they are the only reason for disabling it by default.
Created attachment 232216 [details] new patch
I like your patch. Note that EFTYPE and EEXIST are not listed in specification for bind(2), thus this allows us consider suggested functionality to be an extension over specification. https://pubs.opengroup.org/onlinepubs/9699919799/functions/bind.html May I ask you to do two things: 1) Check how these edge cases work on other popular POSIX OSes, at least Linux and Mac OS X. If they also do something smart here, we probably should stay compatible with them. 2) If step 1) doesn't uncover need for any additions, please document in bind(2) cross referencing to unix(4).
As for Linux, i looked here: https://github.com/torvalds/linux/blob/master/net/unix/af_unix.c unix_bind() calls unix_bind_bsd() which has "return err == -EEXIST ? -EADDRINUSE : err;" at the fail-end, looks like always unconditionally returning EADDRINUSE for already existant path. Man page says the same. As for MacOS, I don't have it and don't know where to see or check. Seen man pages here https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/bind.2.html and here https://www.unix.com/man-page/osx/2/bind/ - both looks like plain old manpage from BSD, with the same behaviour. So added patches for manpages, differential revision here: https://reviews.freebsd.org/D34557