Bug 213282

Summary: FreeBSD 10.2 / Carp / PfSync
Product: Base System Reporter: JeanAumont
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed Overcome By Events    
Severity: Affects Many People CC: kp, patfbsd
Priority: ---    
Version: 10.2-STABLE   
Hardware: Any   
OS: Any   

Description JeanAumont 2016-10-07 17:48:11 UTC
Hi,

I have two FreeBSD 10.2 firewalls in a MASTER / BACKUP configuration with around 20 interfaces.

All the interfaces on the MASTER have an ADVBASE of 2 and an ADVSKEW of 90.
All the interfaces on the BACKUP have an ADVBASE of 2 and an ADVSKEW of 100.

Carp preempt is enabled on both firewalls.

There is a lot of traffic passing through those firewalls, around 80000 connections.

Our MASTER firewall crashed (due to bad disk controller firmware).
The BACKUP firewall became the MASTER, and no traffic was lost.

But when the firewall that had crashed rebooted, it became the MASTER again, and this is when we lost some connections.

Is there a SYNC of the PF state table between the firewalls before a firewall becomes MASTER again?

From a quick look at the carp code, I did not see anything regarding this situation.

Thanks,

Jean Aumont
Comment 1 JeanAumont 2016-10-07 19:07:46 UTC
After searching a little more, I see that there is a
sysctl variable "net.pfsync.carp_demotion_factor" that seems to
be used to control the interaction between carp and pfsync.

root:~ # sysctl -a | grep carp
device	carp
net.inet.carp.ifdown_demotion_factor: 240
net.inet.carp.senderr_demotion_factor: 240
net.inet.carp.demotion: 0
net.inet.carp.log: 1
net.inet.carp.preempt: 1
net.inet.carp.allow: 1
net.pfsync.carp_demotion_factor: 240

Both firewalls are now in normal operation, and both firewalls have
net.pfsync.carp_demotion_factor: 240

Is this normal?

Thanks,

Jean Aumont
Comment 2 patfbsd 2016-12-01 19:37:57 UTC
Hello,

I think your problem is that your advskew values are very close together (90 and 100).

Use something like 0 (preferred master) and 200 (preferred backup)
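
To give a rough idea of why the spacing matters, here is a minimal sketch. It assumes the usual CARP rule that a node advertises every advbase + advskew/256 seconds; carp_interval_ms is a made-up helper name, not a real tool, and the advbase of 2 is taken from the report.

```shell
#!/bin/sh
# Sketch only: assumes the CARP advertisement interval is
# advbase + advskew/256 seconds. carp_interval_ms is a hypothetical helper.

# Print the advertisement interval in whole milliseconds (truncated).
# $1 = advbase (seconds), $2 = advskew (0..254)
carp_interval_ms() {
    echo $(( $1 * 1000 + $2 * 1000 / 256 ))
}

# Values given in the report (advbase 2, advskew 90 and 100)
echo "advskew 90:  $(carp_interval_ms 2 90) ms"
echo "advskew 100: $(carp_interval_ms 2 100) ms"

# Suggested values (advskew 0 and 200)
echo "advskew 0:   $(carp_interval_ms 2 0) ms"
echo "advskew 200: $(carp_interval_ms 2 200) ms"
```

Under that assumption, 90 and 100 give intervals only about 39 ms apart, while 0 and 200 are about 781 ms apart, so the preferred master is much easier to tell from the preferred backup.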

When enabled, pfsync increments carp.demotion when it starts the bulk synchronization of the states, but pfsync is started after carp and pf. So there is a small window of time between the two, and your firewall can become master during that window.

Anyway, that window should be small, and the backup should become master again until the bulk synchronization is finished.
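
As a sketch of the effect of the demotion, the following assumes that a node advertises its configured advskew plus net.inet.carp.demotion, capped at 240, and that pfsync raises the demotion by net.pfsync.carp_demotion_factor (240 in the sysctl output above) during the bulk synchronization. Both points are my understanding and not checked against the 10.2 source; effective_advskew is a made-up helper name.

```shell
#!/bin/sh
# Sketch only: assumes advertised skew = advskew + demotion, capped at 240.
# effective_advskew is a hypothetical helper, not part of FreeBSD.

# $1 = configured advskew, $2 = current value of net.inet.carp.demotion
effective_advskew() {
    sum=$(( $1 + $2 ))
    if [ "$sum" -gt 240 ]; then
        sum=240
    fi
    echo "$sum"
}

# Rebooted node (advskew 90) while the demotion is assumed to be 240
echo "rebooted node, during bulk sync: $(effective_advskew 90 240)"

# Same node once the demotion is back to 0
echo "rebooted node, after bulk sync:  $(effective_advskew 90 0)"

# Peer with advskew 100 and no demotion
echo "peer:                            $(effective_advskew 100 0)"
```

If these assumptions hold, the rebooted node advertises a worse skew (240) than its peer (100) only while the demotion is applied. Before pfsync starts, it advertises 90 and can preempt the peer.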

Check the carp logs in /var/log/messages.

It is hard to say whether there is a bug or not; you can ask on the FreeBSD mailing lists for help.

Regards.
Comment 3 JeanAumont 2016-12-02 21:24:24 UTC
Hi,

In my opinion, the code should never let a firewall become MASTER if the state table (pfsync) has not finished being populated with all the states.

During boot, the firewall should stay in INIT mode and only become MASTER once the replication of the states has completed.

It is clearly a bug, and having two MASTERs at the same time can only cause problems for a lot of TCP sessions.

Currently the advskew values of my firewalls are 90 and 100.
Will changing them to 0 and 200 make a difference?
Does the firewall read the advskew in the CARP packets it receives, or does it use a timer to determine which one advertises more often?

 
This bug affects network traffic and should be looked at.

Thanks,

Jean Aumont
Comment 4 Kristof Provost freebsd_committer freebsd_triage 2019-02-01 13:32:25 UTC
FreeBSD 10.2 is no longer supported.
If you can reproduce this problem on 12.0 or 11.2 please reopen this bug, ideally with a reproduction script attached.