Created attachment 222189 [details] Patch against the ports tree Patch that removes the `unlink` call before binding some UNIX socket. The problem with the existing approach is that, when Xorg is launched as `root`, it can always unlink existing sockets, which it does when no display is explicitly specified and `-displayfd` is used. Moreover, the X server removes the UNIX socket when exiting (even in most abnormal cases). Typically, sddm exactly triggers the problem. So when you try to open another session through it, the first session's sockets are crushed and it becomes stale. I'm proposing the patch here since it is not practically useful on platforms that launch Xorg with an unprivileged user and because upstream seems inactive.
I am quite skeptical of pulling this in without it first being submitted and accepted upstream. Can you explain exactly what problem you are trying to solve?
Hi, As for the high-level problem I'm fixing, I wrote this description: > Typically, sddm exactly triggers the problem. So when you try to open > another session through it, the first session's sockets are crushed > and it becomes stale. What do you find unclear in it exactly? As for the low-level description, let me recapitulate with more context. As you most probably know, there are two ways that Xorg selects its display at startup: - Either you supply an explicit display on the command-line (e.g., ":1"). - Or you do not, and either "-displayfd" has been specified, in which case Xorg will try to assign some automatically, or it has not, in which case :0 is used by default. ("-displayfd" has other effects, but this is a separate discussion, relevant to the other bug #253278.) The problem is with the automatic assignment of display. It works by trying to grab the first display available (from :0 when using UNIX sockets), which it does by simply unlinking an existing socket file and then binding a new socket to the same file. When the server is run as root, this operation always succeeds, and any existing socket for :0 ("/tmp/.X11-UNIX/X0") that would exist is removed. This leaves any preexisting X session on :0 stale (e.g., launching new apps make them go to the new session). And SDDM triggers that: It passes "-displayfd", and no display directive, relying on the server to do display assignmnent, which it doesn't do correctly. Now, I agree that this problem is probably not FreeBSD specific (it affects platforms/distributions running Xorg as root). If you prefer, I'll report it upstream and we'll see how it goes.
Amusingly, I found this code at start of 'startx': # Automatically determine an unused $DISPLAY d=0 while true ; do [ -e "/tmp/.X$d-lock" -o -S "/tmp/.X11-unix/X$d" ] || break d=$(($d + 1)) done defaultdisplay=":$d" unset d This is exactly what the proposed change makes X (launched with '-displayfd') do.
Reported upstream here: https://gitlab.freedesktop.org/xorg/lib/libxtrans/-/issues/4
+1. Prevents Plasma/Wayland (debug session) destroying X11 socket used by Sway (dogfood session) at least until https://invent.kde.org/plasma/kwin/-/commit/9f0f4527029a. Both run Xwayland with different flags e.g., [kernel] - /sbin/init |-- login [pam] (login) | `-- sway |-- Xwayland :0 -rootless -terminate -core -listenfd 35 -listenfd 37 -wm 40 ... |-- login [pam] (login) | `-- ck-launch-session dbus-run-session startplasma-wayland | `-- dbus-run-session startplasma-wayland | |-- dbus-daemon --nofork --print-address 4 --session | `-- startplasma-wayland | `-- kwin_wayland_wrapper --xwayland ../libexec/startplasma-waylandsession (kwin_wayland_wrappe) | `-- kwin_wayland --wayland_fd 4 --xwayland ../libexec/startplasma-waylandsession | `-- Xwayland -displayfd 49 -rootless -wm 52 -auth /var/run/user/1234/xauth_XOBvXm Notice, Sway doesn't use -displayfd unlike Plasma but simply checks locks. https://github.com/swaywm/wlroots/blob/0.13.0/xwayland/sockets.c#L130-L186
I would like upstream to comment on the MR that is in the patch in Phab that Jan linked here before we commit this.
Upstream suggested https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/844 which helps Xwayland at least after https://gitlab.freedesktop.org/wlroots/wlroots/-/commit/4741e9d8410f Note, Linux is likely unaffected due to using abstract unix(4) sockets i.e., no paths to unlink(2) and leading NUL byte in sun_path.
Hi Jan, I provided a lot of background and explanations (and asked questions) in this mail almost 2 years ago: https://lists.x.org/archives/xorg-devel/2021-July/058735.html Every line of it is still valid. Unfortunately, I never received any response, and after some months I kinda lost hope in upstream doing anything on this matter. Thanks for pointing me to Jon Turnery's MR, which I was not aware of. Upstream doesn't seem to have made any progress since then (almost 1 year and a half ago). Glancing at the MR, I have doubts about their approach, in particular the fact that this should be fixed in the server. I think it is likely it should fixed in xtrans instead. After all, this is where sockets are created or unlinked, which should not happen without a locking mechanism. Moreover, as pointed out in the mail, the current lock file mechanism in the X server is logically incorrect and could be defeated in arguably rare or provoked situations. I may try to give it a new shot soon (next week perhaps), but not making hard promises. Thanks and regards.
May I push this in? This change seems safe for us as `/etc/rc.d/cleartmp` removes stale X sockets on system boot.
(In reply to Gleb Popov from comment #9) I concur. The patch is still up-to-date. zeising@: I don't think it's reasonable to hold this as there is no sign of upstream really pondering the topic and given that: - This problem has been preventing the launch of multiple concurrent sessions, an important use case, for years already. - I've been using it without the slightest side effect for years. Thanks.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/ports/commit/?id=3312d027696ff994d07c434a2baaba1f5b18d998 commit 3312d027696ff994d07c434a2baaba1f5b18d998 Author: Gleb Popov <arrowd@FreeBSD.org> AuthorDate: 2023-09-01 11:43:07 +0000 Commit: Gleb Popov <arrowd@FreeBSD.org> CommitDate: 2023-09-01 11:47:43 +0000 x11/xtrans: Don't unlink X11 Unix sockets. Without this change calling `Xorg -displayfd ..` would always make X to use :0 as the DISPLAY. This was a showstopper for running parallel desktop sessions. The leftover UDS' are cleaned up during system's boot by the /etc/rc.d/cleartmp service. Based on work of jbeich and Olivier Certner <olivier.freebsd@free.fr>. PR: 253277 Approved by: manu Sponsored by: Serenity Cybersecurity, LLC Differential Revision: https://reviews.freebsd.org/D30557 net/tigervnc-server/Makefile | 2 +- x11-fonts/libFS/Makefile | 1 + x11-fonts/libXfont/Makefile | 2 +- x11-fonts/libXfont2/Makefile | 1 + x11-fonts/xfs/Makefile | 1 + x11-servers/xarcan/Makefile | 2 +- x11-servers/xorg-server/Makefile | 2 +- x11-servers/xwayland-devel/Makefile | 1 + x11/cinnamon-session/Makefile | 2 +- x11/gnome-session/Makefile | 2 +- x11/libICE/Makefile | 2 +- x11/libSM/Makefile | 1 + x11/libX11/Makefile | 1 + x11/mate-session-manager/Makefile | 2 +- x11/xtrans/Makefile | 4 ++++ x11/xtrans/distinfo | 2 ++ 16 files changed, 20 insertions(+), 8 deletions(-)
This is finally landed. Thanks Olivier for the fix!
Great! Thanks Gleb!
For the record, on "stale" socket files: - Xorg has a signal handler to clean up sockets on abnormal termination. So, unless the process is killed with SIGKILL or something else is very wrong, abnormal termination doesn't leave stale sockets. - The automatic display selection ('-displayfd' and no explicit display ID passed) will automatically skip the stale socket and associated display ID and assign another ID. - Reboot leads to socket cleanup, as Gleb pointed out. That's why I've never had the slightest problem with this patch and why you most probably won't either (unless you love to kill your Xorg and insist on explicitly reusing such a killed display without an intervening reboot).