Created attachment 240523 [details] snapshot of wireshart screen

When a CARP interface fails over from MASTER to BACKUP, it responds to the GARP announcement of the incoming master with a "Duplicate use of <virtual ip> detected" GARP packet. In the attached snapshot, the machine with physical MAC VMware_a7:e3:41 is the incoming master and the machine with physical MAC VMware_a7:0f:7f is the node that is about to become backup. When the node that is about to become backup responds, the Sender MAC address is its physical MAC address and the Sender IP address is the virtual IP address.
This issue affects the learning of Cisco ACI devices.
It is not easy to analyze this from the screen snapshot alone. Can you please provide tcpdump captures?

> This issue affects the learning of Cisco ACI devices.

I don't quite understand how it affects Cisco ACI. What do you expect?
Details:

Machine 1:
    Physical MAC: 00:50:56:a7:0f:7f
    IP Address: 10.10.4.17
Machine 2:
    Physical MAC: 00:50:56:a7:e3:41
    IP Address: 10.10.4.18
CARP:
    Virtual MAC: 00:00:5e:00:01:01
    Virtual IP: 10.10.4.19

Steps followed:
1. Configure CARP on Machine 1.
    ifconfig nic0 vhid 1 pass testing alias 10.10.4.19/28 advskew 10
   This box becomes the MASTER
2. Configure CARP on Machine 2.
    ifconfig nic0 vhid 1 pass testing alias 10.10.4.19/28 advskew 20
   This box becomes the BACKUP
3. Re-configure CARP on Machine 1, to trigger a failover.
    ifconfig nic0 vhid 1 pass testing alias 10.10.4.19/28 advskew 30
   Since now the advskew value of Machine 1 is higher than the Machine 2's value,
   Machine 1 will become the BACKUP and Machine 2 will become the MASTER.

Observation / Failure:
At step 3, the moment Machine 2 becomes the MASTER, it makes the ARP announcement. Machine 1, which is now in the BACKUP state and is supposed to stay quiet, responds to this announcement with a "Duplicate use of <ip> detected" GARP message. Interestingly, in this message the Source MAC address is the physical MAC address and the Source IP address is the virtual IP address.

Please find attached the tcpdump files captured from both machines.

Due to this error, the CISCO ACI endpoint table messed up and is routing traffic to the wrong device.
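To make the observation above easier to check against a capture, here is a minimal sketch that derives the expected CARP virtual MAC for a VHID and classifies an ARP sender MAC against it. The helper names `carp_vmac` and `classify_sender` are hypothetical, and the `00:00:5e:00:01:<vhid>` pattern is an assumption taken from the VHID 1 / 00:00:5e:00:01:01 pair listed in the details.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any FreeBSD tool.

# Print the CARP virtual MAC expected for a VHID, assuming the
# 00:00:5e:00:01:<vhid in hex> pattern seen in this report.
carp_vmac() {
    printf '00:00:5e:00:01:%02x\n' "$1"
}

# Compare an ARP sender MAC (as printed by tcpdump -e) with the
# virtual MAC of the given VHID. Comparison is case-insensitive.
classify_sender() {
    vhid=$1
    sender=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    if [ "$sender" = "$(carp_vmac "$vhid")" ]; then
        echo "virtual"
    else
        echo "physical-or-other"
    fi
}

classify_sender 1 00:00:5e:00:01:01   # prints "virtual"
classify_sender 1 00:50:56:a7:0f:7f   # prints "physical-or-other"
```

The second call corresponds to the packet described in the observation: the sender MAC of Machine 1 is not the virtual MAC, even though the sender IP is the virtual IP.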
Created attachment 240549 [details] tcpdump files from Machine1
Created attachment 240550 [details] tcpdump files from Machine2
There appears to be another race condition here, and it could be similar to one of the issues fixed earlier: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191832
https://lists.freebsd.org/pipermail/freebsd-net/2006-November/012476.html This mailing list thread seems to describe the same problem.
(In reply to franklin.suvi@gmail.com from comment #3)
> Steps followed:
> 1. Configure CARP on Machine 1.
> ifconfig nic0 vhid 1 pass testing alias 10.10.4.19/28 advskew 10
> This box becomes the MASTER
> 2. Configure CARP on Machine 2.
> ifconfig nic0 vhid 1 pass testing alias 10.10.4.19/28 advskew 20
> This box becomes the BACKUP
> 3. Re-configure CARP on Machine 1, to trigger a failover.
> ifconfig nic0 vhid 1 pass testing alias 10.10.4.19/28 advskew 30
> Since now the advskew value of Machine 1 is higher than the Machine 2's value,
> Machine 1 will become the BACKUP and Machine 2 will become the MASTER.

I'm able to repeat this on 13.1-RELEASE.
Any updates on the fix?
(In reply to franklin.suvi@gmail.com from comment #9)
Sorry, I'm busy working on some bugs related to VLAN PCP. I'll re-check this PR this weekend.
While testing CARP, I see multiple issues. The fix will not come immediately, so I'd like to propose that you try the following to see if it helps.

1. Although the example in the man page sets different advskew values on hosts A and B, I recommend against that. Set advskew to the same value on both hosts, so that you can change the vhid state on either host.
2. The preferred way to make a host master or backup is `ifconfig nic0 vhid 1 state master` or `ifconfig nic0 vhid 1 state backup`.
3. For aliases, the recommended prefixlen / netmask is 32 / 255.255.255.255.
4. If `CISCO ACI endpoint table messed up`, can you try setting only the virtual IP 10.10.4.19 on both hosts and see whether it helps? That is, only `ifconfig nic0 vhid 1 advskew 20 10.10.4.19/28`.

Note that the fourth suggestion has a drawback: you lose the ability to reach a specific host via its host IP (not the virtual one).
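When switching roles with the `state master` / `state backup` commands from suggestion 2, it may help to confirm the resulting state on each host before looking at the ARP traffic. Below is a small sketch that extracts the state of a given vhid from `ifconfig` output. The helper name `carp_state` is hypothetical, and the script assumes that ifconfig prints a line of the form `carp: BACKUP vhid 1 advbase 1 advskew 30`; adjust the parsing if your release prints it differently.

```shell
#!/bin/sh
# Hypothetical helper: read ifconfig output on stdin and print the
# CARP state (e.g. MASTER or BACKUP) of the vhid given as $1.
# Assumes a line like: "carp: BACKUP vhid 1 advbase 1 advskew 30".
carp_state() {
    awk -v vhid="$1" '
        $1 == "carp:" {
            # Find the "vhid N" pair on this line and match N.
            for (i = 2; i < NF; i++)
                if ($i == "vhid" && $(i + 1) == vhid)
                    print $2
        }'
}

# Intended usage on each host (interface name taken from this PR):
#   ifconfig nic0 vhid 1 state backup
#   ifconfig nic0 | carp_state 1
```

Running the check on both machines right after the switch shows whether both ends agree on who is MASTER at the time the unexpected GARP appears.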