Bug 191832

Summary: [carp] carp breaks the network
Product: Base System Reporter: Steven Hartland <smh>
Component: kern Assignee: Steven Hartland <smh>
Status: Closed FIXED    
Severity: Affects Many People CC: glebius
Priority: ---    
Version: 10.0-RELEASE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Shows CARP MAC address conflict in progress none

Description Steven Hartland freebsd_committer freebsd_triage 2014-07-12 02:43:32 UTC
When a CARP address initialises we're seeing it break the existing MASTER.

The reason is that it adds the IP to the physical interface before it configures the custom MAC. This results in network devices, e.g. Cisco switches, learning the wrong MAC for the address.

Once this happens the BACKUP sees traffic instead of the MASTER.

Having a look through the code seems to confirm that the ordering of IP -> MAC assignment could well be incorrect.

This is a major issue when using CARP as it means it is highly unreliable :(
Comment 1 Steven Hartland freebsd_committer freebsd_triage 2014-07-12 12:32:13 UTC
The problem occurs when we reboot one of the machines which has jails
backed by CARP IPs.

An example jail.conf entry:-
== machine01 ==
test01 {
    host.hostname = "test01a";
    ip4.addr = "10.10.10.5";
    ip4.addr += "10.10.10.11";
    ip4.addr += "10.10.10.12";
    exec.prestart += "/sbin/ifconfig igb0 vhid 1 pass testpass alias 10.10.10.11/32";
    exec.prestart += "/sbin/ifconfig igb0 vhid 2 pass testpass alias 10.10.10.12/32";
}

== machine02 ==
test01 {
    host.hostname = "test01b";
    ip4.addr = "10.10.10.6";
    ip4.addr += "10.10.10.11";
    ip4.addr += "10.10.10.12";
    exec.prestart += "/sbin/ifconfig igb0 vhid 1 pass testpass advskew 100 alias 10.10.10.11/32";
    exec.prestart += "/sbin/ifconfig igb0 vhid 2 pass testpass advskew 100 alias 10.10.10.12/32";
}

On reboot of machine02 the machines will complain about their IPs being in use, e.g.
Jul 12 01:12:50 machine01 kernel: Trying to mount root from zfs:tank/root []...
Jul 12 01:12:51 machine01 ntpd[1136]: ntpd 4.2.4p5-a (1)
Jul 12 01:12:51 machine01 kernel: .
Jul 12 01:12:53 machine01 kernel: 
Jul 12 01:12:53 machine01 kernel: arp: 00:00:5e:00:01:02 is using my IP address 10.10.10.12 on igb0!
Jul 12 01:12:53 machine01 kernel: igb0: promiscuous mode enabled
Jul 12 01:12:53 machine01 kernel: carp: VHID 1@igb0: INIT -> BACKUP
Jul 12 01:12:54 machine01 kernel: arp: 00:00:5e:00:01:01 is using my IP address 10.10.10.11 on igb0!
-----------
Jul 12 01:12:53 machine02 kernel: arp: 10.10.10.10 moved from 00:00:5e:00:01:01 to 00:25:90:79:67:9a on igb0

In our particular case we have 6 carp interfaces on each machine, but I don't
believe that's a factor.

The machines are both connected to Cisco 6509 routers and when this happens
the Ciscos end up with an ARP entry for the CARP IPs pointing to the physical
NIC MAC instead of the CARP MAC, e.g.
> sh ip arp 10.10.10.11
> Protocol  Address          Age (min)  Hardware Addr   Type   Interface
> Internet  10.10.10.11           78   0025.9079.679a   ARPA   Vlan10
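For anyone checking switch or router ARP tables for this symptom: the CARP virtual MAC for an IPv4 vhid has the form 00:00:5e:00:01:<vhid>, which matches the 00:00:5e:00:01:01 and 00:00:5e:00:01:02 addresses in the kernel log above. The following is only a minimal sketch of that check. The function names are hypothetical and nothing here is taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill mac[6] with the CARP virtual MAC for an IPv4 vhid: 00:00:5e:00:01:<vhid>. */
static void carp_mac_for_vhid(uint8_t vhid, uint8_t mac[6])
{
    static const uint8_t prefix[5] = { 0x00, 0x00, 0x5e, 0x00, 0x01 };

    assert(mac != NULL);
    memcpy(mac, prefix, sizeof(prefix));
    mac[5] = vhid;
}

/* Return 1 if mac is the CARP virtual MAC for the given vhid, 0 otherwise. */
static int is_carp_mac(const uint8_t mac[6], uint8_t vhid)
{
    uint8_t want[6];

    assert(mac != NULL);
    carp_mac_for_vhid(vhid, want);
    return memcmp(mac, want, sizeof(want)) == 0;
}
```

With this, the 0025.9079.679a entry the Cisco holds for 10.10.10.11 fails the check for vhid 1, which is the symptom described above.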

We also have the following settings in sysctl.conf:
net.inet.carp.preempt=1
net.inet.carp.senderr_demotion_factor=0

The first setting is there because we want the main master to stay master if it's running.

The second setting is for when we've used CARP on top of LAGG to prevent CARP
breaking while LAGG negotiates, after which it will never recover. This however
is not the case here as these machines aren't using LAGG.
Comment 2 Steven Hartland freebsd_committer freebsd_triage 2014-07-13 14:31:21 UTC
I'm not really familiar with the network code flow but tracing through from the arp "is using my IP address" warnings I'm wondering if the issue is a race condition in sys/netinet/in.c:in_ifinit where it adds the address to ia->ia_addr.sin_addr.s_addr before it calls carp attach.

Does this mean it's possible for the address to respond before it knows it's a CARP address, and hence the problem?

Also, the machines on which we're seeing the issue are hosting very busy sites on the CARP addresses, so this could be a requirement for reproduction.
Comment 3 Steven Hartland freebsd_committer freebsd_triage 2014-07-31 10:09:55 UTC
Created attachment 145182 [details]
Shows CARP MAC address conflict in progress

After adding a DELAY between adding the address to ia_hash and calling carp_attach_p in in_ifinit I've confirmed that we do indeed have a race condition between the address being available and it being attached to carp.

This means that ARP requests for the IP can result in a response using the interface MAC instead of the CARP MAC.

When this happens communication to the machines participating in the CARP is disrupted, with packets destined for the already running MASTER being sent to the initialising BACKUP.

This can be clearly seen in the attached trace.

It can also be seen via ifconfig

After add to ia_hash but before CARP attach:
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        ether f0:4d:a2:75:41:5a
        inet 10.10.1.240 netmask 0xffffff00 broadcast 10.10.1.255 
        inet6 fe80::f24d:a2ff:fe75:415a%igb0 prefixlen 64 scopeid 0x1 
        inet 10.10.1.241 netmask 0xffffffff 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        carp: INIT vhid 1 advbase 1 advskew 0
----
After CARP attach completes
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        ether f0:4d:a2:75:41:5a
        inet 10.10.1.240 netmask 0xffffff00 broadcast 10.10.1.255 
        inet6 fe80::f24d:a2ff:fe75:415a%igb0 prefixlen 64 scopeid 0x1 
        inet 10.10.1.241 netmask 0xffffffff broadcast 10.10.1.241 vhid 1 
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        carp: MASTER vhid 1 advbase 1 advskew 0
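The window between the two ifconfig snapshots above can be illustrated with a toy model in C. This is a simplified sketch and not the code in sys/netinet: struct toy_addr, its fields and toy_arp_reply_mac() are hypothetical names, and the model assumes that a CARP-attached address only answers ARP while MASTER. The interface MAC is the one shown in the ifconfig output.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for an interface address; not the kernel's struct in_ifaddr. */
struct toy_addr {
    int in_hash;        /* address is visible to lookups, so ARP can match it */
    int carp_attached;  /* address has been attached to a CARP vhid */
    int is_master;      /* CARP state is MASTER (only meaningful once attached) */
    uint8_t vhid;
};

/* Physical MAC of igb0 from the ifconfig output above. */
static const uint8_t toy_if_mac[6] = { 0xf0, 0x4d, 0xa2, 0x75, 0x41, 0x5a };

/*
 * Decide which MAC an ARP reply for this address would carry.
 * Returns 0 if no reply would be sent, otherwise fills out[6] and returns 1.
 */
static int toy_arp_reply_mac(const struct toy_addr *a, uint8_t out[6])
{
    assert(a != NULL && out != NULL);

    if (!a->in_hash)
        return 0;                       /* address not configured yet */
    if (!a->carp_attached) {
        memcpy(out, toy_if_mac, 6);     /* the race: plain address, interface MAC */
        return 1;
    }
    if (!a->is_master)
        return 0;                       /* BACKUP stays silent */

    const uint8_t carp_mac[6] = { 0x00, 0x00, 0x5e, 0x00, 0x01, a->vhid };
    memcpy(out, carp_mac, 6);
    return 1;
}
```

In this model the only state that leaks the interface MAC is in_hash set with carp_attached clear, which corresponds to the first snapshot, where 10.10.1.241 is listed without a vhid.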
Comment 4 Steven Hartland freebsd_committer freebsd_triage 2014-08-01 11:57:47 UTC
Fix for this has been committed to head as r269340
http://svnweb.freebsd.org/changeset/base/269340

This currently requires Gleb's in_control rewrite, as it panics without it, so an MFC of this would require all of those dependencies as well.

Given how this essentially breaks CARP, this should seriously be considered.
Comment 5 Steven Hartland freebsd_committer freebsd_triage 2014-08-01 17:16:37 UTC
In addition to the race condition between IP allocation and CARP attachment, it turns out that while jail exec.prestart runs before the prison is created, it runs after any IP alias creation.

This explains why we're seeing gratuitous ARP happening for the IP from the interface MAC instead of the CARP MAC.

Changes to jail are hence required to allow CARP backed IP's to be used in jails.
Comment 6 Steven Hartland freebsd_committer freebsd_triage 2014-08-04 10:45:53 UTC
Changes required for jails to properly support CARP are being reviewed here:
https://phabric.freebsd.org/D528
Comment 7 commit-hook freebsd_committer freebsd_triage 2014-08-04 16:32:44 UTC
A commit references this bug:

Author: smh
Date: Mon Aug  4 16:32:09 UTC 2014
New revision: 269522
URL: http://svnweb.freebsd.org/changeset/base/269522

Log:
  Added support for extra ifconfig args to jail ip4.addr & ip6.addr params

  This allows for CARP interfaces to be  used in jails e.g.
  ip4.addr = "em0|10.10.1.20/32 vhid 1 pass MyPass advskew 100"

  Before this change using exec.prestart to configure a CARP address
  would result in the wrong MAC being broadcast on startup as jail creates
  IP aliases to support ip[4|6].addr before exec.prestart is executed.

  PR:		191832
  Reviewed by:	jamie
  MFC after:	1 week
  X-MFC-With:	r269340
  Phabric:	D528
  Sponsored by:	Multiplay

Changes:
  head/usr.sbin/jail/command.c
  head/usr.sbin/jail/config.c
  head/usr.sbin/jail/jail.8
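
The extended ip4.addr value in the commit log has the shape "ifname|address/prefix extra ifconfig args". As an illustration only, here is a minimal sketch of how such a string could be split into its three parts. It is not the code committed to usr.sbin/jail/command.c; struct toy_jail_addr, toy_split_jail_addr() and the buffer sizes are hypothetical.

```c
#include <assert.h>
#include <string.h>

struct toy_jail_addr {
    char ifname[32];   /* empty if there is no "ifname|" prefix */
    char addr[64];     /* address, with optional /prefix */
    char extra[128];   /* remaining ifconfig arguments, may be empty */
};

/* Returns 0 on success, -1 if the address is empty or a part does not fit. */
static int toy_split_jail_addr(const char *val, struct toy_jail_addr *out)
{
    const char *bar, *sp;
    size_t n;

    assert(val != NULL && out != NULL);
    memset(out, 0, sizeof(*out));      /* keeps every field NUL-terminated */

    /* Optional interface name before '|'. */
    bar = strchr(val, '|');
    if (bar != NULL) {
        n = (size_t)(bar - val);
        if (n >= sizeof(out->ifname))
            return -1;
        memcpy(out->ifname, val, n);
        val = bar + 1;
    }

    /* The address runs up to the first space, or to the end of the string. */
    sp = strchr(val, ' ');
    n = (sp != NULL) ? (size_t)(sp - val) : strlen(val);
    if (n == 0 || n >= sizeof(out->addr))
        return -1;
    memcpy(out->addr, val, n);

    /* Anything after the address is kept as extra ifconfig arguments. */
    if (sp != NULL) {
        while (*sp == ' ')
            sp++;
        if (strlen(sp) >= sizeof(out->extra))
            return -1;
        strcpy(out->extra, sp);
    }
    return 0;
}
```

Applied to the example from the log, this yields the interface "em0", the address "10.10.1.20/32" and the extra arguments "vhid 1 pass MyPass advskew 100", so the CARP parameters can be passed along when the alias is created rather than afterwards in exec.prestart.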