Bug 275225 - On ARM64 carp preempt not working as expected
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.0-RELEASE
Hardware: arm64 Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
 
Reported: 2023-11-21 06:18 UTC by ekoort
Modified: 2024-03-28 08:22 UTC

Description ekoort 2023-11-21 06:18:12 UTC
Hello,

It seems that carp ignores the net.inet.carp.preempt value completely and always promotes the host with the higher advskew to "master".
The setup is two hosts: an RPI4 and a Pine ROCKPro64. The issue is the same on FreeBSD 13.1 and 14.0.

Config:
/boot/loader.conf: carp_load="YES"
/etc/sysctl.conf: net.inet.carp.preempt=0

RPI4 /etc/rc.conf:
# primary
ifconfig_genet0="inet 192.168.1.19 netmask 255.255.255.0"
defaultrouter="192.168.1.1"
ifconfig_genet0_alias0="inet vhid 1 pass xyz alias 192.168.1.20/24"
#ifconfig_genet0_alias0="inet vhid 1 advskew 100 pass xyz alias 192.168.1.20/24"

ROCKPRO64 /etc/rc.conf:
ifconfig_dwc0="inet 192.168.1.21 netmask 255.255.255.0"
defaultrouter="192.168.1.1"
# primary
#ifconfig_dwc0_alias0="inet vhid 1 pass xyz alias 192.168.1.20/24"
# backup
ifconfig_dwc0_alias0="inet vhid 1 advskew 100 pass xyz alias 192.168.1.20/24"

As described above, when the host with the higher advskew comes online, it is always promoted to primary, regardless of whether net.inet.carp.preempt is set to 0 or 1.

I tested the same configuration in VirtualBox with two AMD64 FreeBSD 14 VMs. It worked as expected: primary A goes down, secondary B takes over, and when primary A comes back up it stays secondary while former secondary B stays primary.
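
To compare the two hosts during a failover test, it can help to record the CARP state each one reports alongside the running sysctl value. The following is a minimal sketch, not part of the original report: `carp_summary` is a hypothetical helper, and it assumes `ifconfig` prints a line of the form `carp: MASTER vhid 1 advbase 1 advskew 0`.

```sh
# Hypothetical helper: reduce `ifconfig <if>` output (read from stdin)
# to "STATE vhid advskew", one line per CARP vhid.
# Assumes the line format "carp: MASTER vhid 1 advbase 1 advskew 0".
carp_summary() {
    awk '$1 == "carp:" {
        for (i = 2; i < NF; i++) {
            if ($i == "vhid") vhid = $(i + 1)
            if ($i == "advskew") skew = $(i + 1)
        }
        print $2, vhid, skew
    }'
}

# Usage on a live host (interface name taken from the report):
#   ifconfig genet0 | carp_summary
#   sysctl -n net.inet.carp.preempt
```

Running both commands on each host before and after the primary returns shows whether the running preempt value matches /etc/sysctl.conf and which host claims MASTER at each step.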
Comment 1 Reid Linnemann 2023-11-21 23:14:34 UTC
I think you've misunderstood net.inet.carp.preempt:

   net.inet.carp.preempt                 Allow virtual hosts to preempt each
                                           other.  When enabled, a vhid in a
                                           backup state would preempt a master
                                           that is announcing itself with a
                                           lower advskew.  Disabled by
                                           default.

Here, 'virtual host' refers to the shared address associated with a VHID. This setting allows a host with multiple VHIDs to assume the master role on all VHIDs if just one of them fails over. See in carp(4):

     Assume that host A is the preferred master and we are running the
     192.168.1.0/24 prefix on em0 and 192.168.2.0/24 on em1.  This is the
     setup for host A (advskew is above 0 so it could be overwritten in the
     emergency situation from the other host):

           ifconfig em0 vhid 1 advskew 100 pass mekmitasdigoat 192.168.1.1/24
           ifconfig em1 vhid 2 advskew 100 pass mekmitasdigoat 192.168.2.1/24

     The setup for host B is identical, but it has a higher advskew:

           ifconfig em0 vhid 1 advskew 200 pass mekmitasdigoat 192.168.1.1/24
           ifconfig em1 vhid 2 advskew 200 pass mekmitasdigoat 192.168.2.1/24

     When one of the physical interfaces of host A fails, advskew is demoted
     to a configured value on all its carp vhids.  Due to the preempt option,
     host B would start announcing itself, and thus preempt host A on both
     interfaces instead of just the failed one.
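
As a rough illustration of why a higher advskew makes a host the less preferred one: advskew is specified in 1/256 of a second and is added to the base advertisement interval, so a host with a higher advskew advertises more slowly. The sketch below assumes the default advbase of 1 second, which the man page example does not state; `carp_adv_interval_ms` is a hypothetical helper, not a system command.

```sh
# Hypothetical helper: advertisement interval in milliseconds,
# assuming interval = advbase seconds + advskew/256 seconds.
# Integer arithmetic, so the result is rounded down to whole milliseconds.
carp_adv_interval_ms() {
    advbase=$1
    advskew=$2
    echo $(( advbase * 1000 + advskew * 1000 / 256 ))
}

carp_adv_interval_ms 1 100   # host A in the example: prints 1390
carp_adv_interval_ms 1 200   # host B in the example: prints 1781
```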

It seems likely to me that in your RPI/ROCKPRO setup the two hosts are not actually exchanging CARP messages correctly and both believe that they have the lowest advbase for the VHID.
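
One way to check whether advertisements are actually being exchanged is to capture CARP traffic (IP protocol 112) on each host and list who is sending. This is a sketch under assumptions: `carp_senders` is a hypothetical helper, and it relies on `tcpdump -t -n` printing lines of the form `IP src > dst: ...`.

```sh
# Hypothetical helper: list the distinct source addresses seen in
# `tcpdump -t -n` output read from stdin (lines like "IP src > dst: ...").
carp_senders() {
    awk '$1 == "IP" && $3 == ">" { print $2 }' | sort -u
}

# Usage on a live host (needs root; interface name taken from the report):
#   tcpdump -t -n -l -c 20 -i genet0 'ip proto 112' | carp_senders
```

If a host's capture only ever shows its own address while the other host is also up and advertising, the two are probably not seeing each other's advertisements.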
Comment 2 Reid Linnemann 2023-11-21 23:17:16 UTC
Correction to my previous comment: both believe that they have the lowest advskew, not advbase.
Comment 3 ekoort 2024-02-24 06:39:43 UTC
As of today it works as expected: the IP stays where it is when 'main' comes back up. I don't know and cannot explain why.
Comment 4 ekoort 2024-03-28 08:22:42 UTC
So today it did not work as expected.
Main went down, secondary took over, then main came up and instantly took over again, while it (cluster services) should have stayed on secondary.
So the results are mixed, for an unknown reason.