Bug 279203 - killpg(): Forking fast leads to livelock
Summary: killpg(): Forking fast leads to livelock
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.3-RELEASE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-05-21 19:35 UTC by Michael Gmelin
Modified: 2024-05-24 09:33 UTC
CC List: 4 users

See Also:
Flags: grembo: mfc-stable13?
       grembo: needs_errata? (secteam)


Description Michael Gmelin freebsd_committer freebsd_triage 2024-05-21 19:35:46 UTC
Forking logger many times like this:

    #!/bin/sh
    for id in $(jot 100); do
      logger -p local2.info -t pot "wledkjweldjwldjkwedj" &
    done

drives the machine into what looks like a race condition, with load averages of 300-500. I can reproduce it on multicore machines (including inside bhyve), but not on a single core. Most of the load is spent in system calls. If the PIDs are known, it is sometimes possible to recover the host by killing each logger process individually; killall does not work, because the machine is too loaded for it to complete.

I could not reproduce this on 13.2 (at least not as easily). When logger is built without Capsicum, this does not happen, but that could be a red herring.

It happens on 13.3 as well as on 13.3-p2.

This is causing quite a headache. We put logger under a lock to reduce concurrency, which helped, but we still see the problem, either because other things also call logger or, more likely, because this is a symptom of a bigger underlying issue.
Comment 1 Michael Gmelin freebsd_committer freebsd_triage 2024-05-22 10:17:26 UTC
This only seems to affect 13.3 and 13-STABLE after https://cgit.freebsd.org/src/commit/?h=stable/13&id=2b0cd3b552942c642a84f8e224b989c02d97125d ("killpg(2): close a race with fork(2), part1").
Comment 2 Michael Gmelin freebsd_committer freebsd_triage 2024-05-22 10:29:31 UTC
Cherry-picking this commit fixes the issue for me:

commit 7a70f17ac4bd64dc1a5020f963ba4380cf37b7e5
Author: Konstantin Belousov <kib@FreeBSD.org>
Date:   Fri Jul 7 20:19:33 2023 +0300

    killpg(): more carefully avoid LoR
    
    otherwise we could end up with the livelock.  When pg_killsx trylock
    failed, ensure that we do wait for lock availability before retry.
    
    Reported and tested by: pho
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week