Bug 260499 - [carp] Carp unable recover after demotion by send error
Summary: [carp] Carp unable recover after demotion by send error
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: Gleb Smirnoff
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-12-17 18:47 UTC by Marius Halden
Modified: 2022-01-21 18:44 UTC (History)
1 user (show)

See Also:


Attachments
testcase.sh (1.31 KB, text/plain)
2021-12-17 18:47 UTC, Marius Halden
no flags Details
alternative patch (1.93 KB, text/plain)
2021-12-17 19:40 UTC, Gleb Smirnoff
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marius Halden 2021-12-17 18:47:31 UTC
Created attachment 230203 [details]
testcase.sh

When carp is demoted due to send errors it will never recover as the code path which will decrement the demotion counter appears to be dead.

Expected output from test case when working:
> # sh testcase.sh
> Staring jail test1.
> ...
> carp: 1@epair1b: BACKUP -> MASTER (master timed out)
> net.inet.carp.demotion: 0
> 
> Enabling pf
> pf enabled
> ...
> carp: demoted by 240 to 240 (send error 13 on epair1b)
> net.inet.carp.demotion: 240
> 
> Disabling pf
> pf disabled
> .......
> carp: demoted by -240 to 0 (send ok on epair1b)
> net.inet.carp.demotion: 0
> Cleaning up.

Actual output from test case:
> # sh testcase.sh
> Staring jail test1.
> ...
> carp: 1@epair0b: BACKUP -> MASTER (master timed out)
> net.inet.carp.demotion: 0
> 
> Enabling pf
> pf enabled
> ...
> carp: demoted by 240 to 240 (send error 13 on epair0b)
> net.inet.carp.demotion: 240
> 
> Disabling pf
> pf disabled
> .......
> carp: demoted by 240 to 240 (send error 13 on epair0b)
> net.inet.carp.demotion: 240
> Cleaning up.

Note, the test case uses PF to force ip_output to fail.
Comment 1 Marius Halden 2021-12-17 18:56:13 UTC
A patch is available as review D33536
Comment 2 Gleb Smirnoff freebsd_committer freebsd_triage 2021-12-17 19:40:07 UTC
Created attachment 230207 [details]
alternative patch

Marius, thanks a lot for the test case! I came with alternative patch, that IMHO makes it easier to understand what's going on here.
Comment 3 Marius Halden 2021-12-18 10:29:17 UTC
I tested the patch, and it looks good to me.
Comment 4 commit-hook freebsd_committer freebsd_triage 2021-12-19 01:20:32 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9a8cf950b259f6833c7562ce941b0cfeae6687e5

commit 9a8cf950b259f6833c7562ce941b0cfeae6687e5
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2021-12-19 01:19:26 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2021-12-19 01:19:26 +0000

    carp: fix send error demotion recovery

    The problem is that carp(4) would clear the error counter on first
    successful send, and stop counting successes after that.  Fix this
    logic and document it in human language.

    PR:                     260499
    Differential revision:  https://reviews.freebsd.org/D33536

 sys/netinet/ip_carp.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)
Comment 5 Marius Halden 2021-12-19 12:12:44 UTC
Thanks for committing this.

Are you planting to MFC this to 13-Stable? I couldn’t see an MFC line in the commit.
Comment 6 Gleb Smirnoff freebsd_committer freebsd_triage 2021-12-19 16:03:47 UTC
I will do that after the holidays.
Comment 7 Marius Halden 2022-01-21 11:09:03 UTC
Hi,

How are you coming along with mfc-ing this? From what I can see it hasn't happened yet.
Comment 8 commit-hook freebsd_committer freebsd_triage 2022-01-21 18:43:10 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f7735c3952d9d54f745f1682ba1f13a70897a8f7

commit f7735c3952d9d54f745f1682ba1f13a70897a8f7
Author:     Gleb Smirnoff <glebius@FreeBSD.org>
AuthorDate: 2021-12-19 01:19:26 +0000
Commit:     Gleb Smirnoff <glebius@FreeBSD.org>
CommitDate: 2022-01-21 18:41:46 +0000

    carp: fix send error demotion recovery

    The problem is that carp(4) would clear the error counter on first
    successful send, and stop counting successes after that.  Fix this
    logic and document it in human language.

    PR:                     260499
    Differential revision:  https://reviews.freebsd.org/D33536

    (cherry picked from commit 9a8cf950b259f6833c7562ce941b0cfeae6687e5)

 sys/netinet/ip_carp.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)