Bug 286692 - unix(4): regressed after base d15792780760
Summary: unix(4): regressed after base d15792780760
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 15.0-CURRENT
Hardware: Any Any
: --- Affects Some People
Assignee: Gleb Smirnoff
URL:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2025-05-09 11:36 UTC by Evgenii Khramtsov
Modified: 2025-05-25 22:41 UTC (History)
3 users (show)

See Also:


Attachments
kdump.stdout.txt (23.85 KB, text/plain)
2025-05-12 15:46 UTC, Evgenii Khramtsov
no flags Details
kdump2.stdout.txt.zst (10.94 KB, application/octet-stream)
2025-05-12 16:25 UTC, Evgenii Khramtsov
no flags Details
kdump3.stdout.txt (294.66 KB, text/plain)
2025-05-12 18:21 UTC, Evgenii Khramtsov
no flags Details
kdump4.stdout.txt.zst (4.71 KB, application/octet-stream)
2025-05-24 03:59 UTC, Evgenii Khramtsov
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Evgenii Khramtsov 2025-05-09 11:36:57 UTC
x11-wm/sway starts normally, though executing a command through it, e.g. dmenu-wl

(I have this in ~/.config/sway/config):
set $menu dmenu-wl_path | dmenu-wl -h 30 -nb '#FFFFFF' -sb '#BFBFBF' -nf '#000000' -sf '#000000' | xargs swaymsg exec --

$ dbus-run-session seatd-launch sway
($mod+d pressed, dmenu-wl appears)
(sway starts consuming 100% WCPU in top(1) output)

Another config hotkey exec that causes this:
$mod+End exec 'swaymsg "output * power off"'

Using a hotkey that spawns x11/foot terminal through exec does not cause this issue,
so maybe swaymsg reproduces this?

git bisect told me this is base d15792780760

I reverted the commits fixing d15792780760 (to avoid revert d15792780760 conflicts),
then reverted d15792780760, rebuilt kernel, then the issue is gone.

Sorry, I can't provide a more minimal reproducer because ENOTIME.
Comment 1 Maxim Konovalov freebsd_committer freebsd_triage 2025-05-09 21:07:16 UTC
Gleb probably wants to take a look.
Comment 2 Gleb Smirnoff freebsd_committer freebsd_triage 2025-05-12 14:18:50 UTC
Eugene, what was the most recent revision you have tried? There were
a few fixups after d15792780760.  Have you tried 1df3097d5525 or something
newer?
Comment 3 Evgenii Khramtsov 2025-05-12 14:47:01 UTC
(In reply to Gleb Smirnoff from comment #2)

I am on base b0dd1a604810, and I reboot every half-a-week or so. I fired another base+ports build now to confirm and will report back the state of unix(4) once I reboot this machine.
Comment 4 Gleb Smirnoff freebsd_committer freebsd_triage 2025-05-12 14:56:15 UTC
b0dd1a604810 is newer than 1df3097d5525.  If the bug is present on
b0dd1a604810, then no need to recheck.
Comment 5 Gleb Smirnoff freebsd_committer freebsd_triage 2025-05-12 15:22:01 UTC
Can you please capture ktrace from swaymsg? Let's use second case as it looks
simplier:

$mod+End exec 'ktrace -f /tmp/ktrace.out swaymsg "output * power off"'

Then, cd /tmp && kdump.
Comment 6 Evgenii Khramtsov 2025-05-12 15:44:38 UTC
base eed3be47967f ports 4e97ab08bdfa still affected, will post kdump shortly
Comment 7 Evgenii Khramtsov 2025-05-12 15:46:04 UTC
Created attachment 260351 [details]
kdump.stdout.txt

started sway, did $mod+End, got 100% WCPU, killed sway
Comment 8 Gleb Smirnoff freebsd_committer freebsd_triage 2025-05-12 16:02:52 UTC
Pardon my misunderstanding, this was sway eating CPU, not swaymsg.  Can you
please do this instead:

ktrace -f /tmp/ktrace.out $sway_pid

, then reproduce the problem, kill sway, look at kdump in /tmp.

Anyway, your captured ktrace from swaymsg can also be useful, too, as we
learned what exactly the other side sends to the socket to get sway into
bad state.
Comment 9 Evgenii Khramtsov 2025-05-12 16:25:10 UTC
Created attachment 260352 [details]
kdump2.stdout.txt.zst

$ du -A /tmp/ktrace.out
4174032	/tmp/ktrace.out

Oh no-no-no :D

That was swaymsg, sleep 10, then killall sway

Then I did

$ kdump | head -c 16384000 | zstd -9 -o kdump2.stdout.txt.zst

which should give more looped entries than enough I guess.
Comment 10 Gleb Smirnoff freebsd_committer freebsd_triage 2025-05-12 17:37:43 UTC
Can you please also record ktrace of sway on kernel with reverted d15792780760?
That will help a lot to compare the trace with the old kernel versus the
regressed one.
Comment 11 Evgenii Khramtsov 2025-05-12 18:21:47 UTC
Created attachment 260356 [details]
kdump3.stdout.txt

> Can you please also record ktrace of sway on kernel with reverted d15792780760?

Yessir!
Comment 12 commit-hook freebsd_committer freebsd_triage 2025-05-21 02:55:47 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=eafe5967ac558de142d91660e18e9238289890e3

commit eafe5967ac558de142d91660e18e9238289890e3
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2025-05-21 00:30:57 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2025-05-21 02:54:51 +0000

    unix: fix EVFILT_WRITE when peer close(2)s and shutdown(2)s

    For the close(2) case restore reporting the event with EV_EOF set.  This
    fixes bug 286692.

    For the shutdown(2) case restore original behavior, but leave comment that
    we may want to change that.  The d15792780760 was not intended to bring in
    functional API changes.

    Provide tests for both cases.

    PR:                     286692
    Fixes:                  d15792780760ef94647af9b377b5f0a80e1826bc

 sys/kern/uipc_usrreq.c       | 16 ++++++++++++---
 tests/sys/kern/unix_stream.c | 48 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 61 insertions(+), 3 deletions(-)
Comment 13 Evgenii Khramtsov 2025-05-21 16:09:14 UTC
I'll be able to test and report in a few days, hopefully this week. Please don't perceive lack of fast feedback (considering fast responses in the past) as lack of feedback.
Comment 14 Evgenii Khramtsov 2025-05-23 06:57:11 UTC
Fixed as of base 4814de3243e0. I didn't test which commit fixed this. Thank you.
Comment 15 Evgenii Khramtsov 2025-05-24 03:10:04 UTC
Hmm, I still see sway consuming 100% WCPU but only after abnormal application shutdown (SIGKILL Wayland app that couldn't start but continued to exist as a process due to background CPU stress from poudriere). I'll reopen if/once I find a reproducer.
Comment 16 Evgenii Khramtsov 2025-05-24 03:59:57 UTC
Created attachment 260672 [details]
kdump4.stdout.txt.zst

Attached kdump output.

I actively opened/closed various docs in libreoffice with poudriere building firefox in background, listening to audacious, and using dolphin as well. I couldn't find a specific reproducer but I captured ktrace of sway when it gets to the "100%" state again.
Comment 17 Evgenii Khramtsov 2025-05-24 04:00:18 UTC
>  You have to specify a comment when changing the Status of a bug from Closed to Open. 

.
Comment 18 Gleb Smirnoff freebsd_committer freebsd_triage 2025-05-24 15:23:36 UTC
On Sat May 24 03:59:57  2025 UTC, 2khramtsov@gmail.com wrote:
> Attached kdump output.
> 
> I actively opened/closed various docs in libreoffice with poudriere building
> firefox in background, listening to audacious, and using dolphin as well. I
> couldn't find a specific reproducer but I captured ktrace of sway when it gets
> to the "100%" state again.

Unfortunately this trace is not very helpful.  We need to catch what happened
before the busy loop was entered.  Can you please run sway continuously in the
tracing mode?  To prevent trace file growing too big, you can script to disable
and then immediately enable back tracing every 5 minutes or so.  We need to see
the story of the descriptor before busy loop.  If file gets too big, can you
please share it somewhere on Internet instead of putting it into bugzilla.  No
need to kdump the file, you can share the binary, it should be more compact.
If the file contains potentially private information you can encrypt it with
my PGP key ed25519/FEFF58C05D05CC22, available at
https://docs.freebsd.org/en/articles/pgpkeys/#_gleb_smirnoff_glebiusfreebsd_org
Comment 19 Evgenii Khramtsov 2025-05-25 22:41:51 UTC
(In reply to Gleb Smirnoff from comment #18)

I couldn't reproduce it, seems hard to reproduce, so, feel free to close as I have no idea if this PR should be open. I'll reopen if/once I get to reproduce it and capture it as you asked.