Created attachment 230203 [details] testcase.sh When carp is demoted due to send errors it will never recover as the code path which will decrement the demotion counter appears to be dead. Expected output from test case when working: > # sh testcase.sh > Staring jail test1. > ... > carp: 1@epair1b: BACKUP -> MASTER (master timed out) > net.inet.carp.demotion: 0 > > Enabling pf > pf enabled > ... > carp: demoted by 240 to 240 (send error 13 on epair1b) > net.inet.carp.demotion: 240 > > Disabling pf > pf disabled > ....... > carp: demoted by -240 to 0 (send ok on epair1b) > net.inet.carp.demotion: 0 > Cleaning up. Actual output from test case: > # sh testcase.sh > Staring jail test1. > ... > carp: 1@epair0b: BACKUP -> MASTER (master timed out) > net.inet.carp.demotion: 0 > > Enabling pf > pf enabled > ... > carp: demoted by 240 to 240 (send error 13 on epair0b) > net.inet.carp.demotion: 240 > > Disabling pf > pf disabled > ....... > carp: demoted by 240 to 240 (send error 13 on epair0b) > net.inet.carp.demotion: 240 > Cleaning up. Note, the test case uses PF to force ip_output to fail.
A patch is available as review D33536
Created attachment 230207 [details] alternative patch Marius, thanks a lot for the test case! I came with alternative patch, that IMHO makes it easier to understand what's going on here.
I tested the patch, and it looks good to me.
A commit in branch main references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=9a8cf950b259f6833c7562ce941b0cfeae6687e5 commit 9a8cf950b259f6833c7562ce941b0cfeae6687e5 Author: Gleb Smirnoff <glebius@FreeBSD.org> AuthorDate: 2021-12-19 01:19:26 +0000 Commit: Gleb Smirnoff <glebius@FreeBSD.org> CommitDate: 2021-12-19 01:19:26 +0000 carp: fix send error demotion recovery The problem is that carp(4) would clear the error counter on first successful send, and stop counting successes after that. Fix this logic and document it in human language. PR: 260499 Differential revision: https://reviews.freebsd.org/D33536 sys/netinet/ip_carp.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-)
Thanks for committing this. Are you planting to MFC this to 13-Stable? I couldn’t see an MFC line in the commit.
I will do that after the holidays.
Hi, How are you coming along with mfc-ing this? From what I can see it hasn't happened yet.
A commit in branch stable/13 references this bug: URL: https://cgit.FreeBSD.org/src/commit/?id=f7735c3952d9d54f745f1682ba1f13a70897a8f7 commit f7735c3952d9d54f745f1682ba1f13a70897a8f7 Author: Gleb Smirnoff <glebius@FreeBSD.org> AuthorDate: 2021-12-19 01:19:26 +0000 Commit: Gleb Smirnoff <glebius@FreeBSD.org> CommitDate: 2022-01-21 18:41:46 +0000 carp: fix send error demotion recovery The problem is that carp(4) would clear the error counter on first successful send, and stop counting successes after that. Fix this logic and document it in human language. PR: 260499 Differential revision: https://reviews.freebsd.org/D33536 (cherry picked from commit 9a8cf950b259f6833c7562ce941b0cfeae6687e5) sys/netinet/ip_carp.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-)