Bug 229384 - Can't remove address from carp
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 11.2-RELEASE
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Luiz Otavio O Souza
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-28 10:07 UTC by Kajetan Staszkiewicz
Modified: 2018-10-17 10:02 UTC

See Also:
koobs: mfc-stable11+


Description Kajetan Staszkiewicz 2018-06-28 10:07:41 UTC
I have systems running FreeBSD 11.1 where I noticed that carp goes MASTER on both routers. Those systems get their carp addresses changed very often; they are load balancers used in a testing environment where multiple changes happen every minute or so.

A few times I've seen both routers in MASTER state with the same IP addresses configured, or at least shown in ifconfig. Today I've seen ifconfig clearly showing different IP addresses. Router 2 shows more of them than Router 1. When trying to add the missing IP addresses to Router 1 I got this error:

# sudo ifconfig internal4021 inet6 2a00:X::1407/128 alias vhid 211
ifconfig: ioctl (SIOCAIFADDR): File exists

Let's try to remove it then:
# sudo ifconfig internal4021 inet6 2a00:X::1407/128 -alias vhid 211
ifconfig: ioctl (SIOCDIFADDR): Can't assign requested address

I'm also wondering if we really need carp to compare the configured addresses. In scenarios where routers are reconfigured often by some external tool, there is a chance that the config differs between the master and slave routers for a short period, which at this moment might cause a MASTER/MASTER conflict. It would be really useful to have this behaviour tunable.

I am unsure whether the bug also happens on 11.2.
Comment 1 Kajetan Staszkiewicz 2018-07-31 13:43:51 UTC
The issue of conflicting carps happens on 11.2 too. At this very moment ifconfig shows the correct addresses, though. Can I somehow see which addresses are used by carp to calculate the checksum? I can dump the memory of both routers.
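
One way to at least compare what ifconfig reports on the two routers is sketched below. It is a minimal sketch, not a view into the kernel: it is only an assumption that the address set printed by ifconfig is the same set that feeds the CARP checksum, and the helper name `carp_addrs` and the host name `router2` are hypothetical.

```shell
#!/bin/sh
# Sketch: list the addresses that ifconfig reports as attached to a CARP vhid,
# sorted, so that the lists from two routers can be diffed.
# Assumption: address lines look like "inet6 <addr> prefixlen 128 vhid <n>".

# carp_addrs: read `ifconfig <iface>` output on stdin and print
# "family address vhid", one entry per line.
carp_addrs() {
    awk '($1 == "inet" || $1 == "inet6") {
        for (i = 3; i < NF; i++)
            if ($i == "vhid") { print $1, $2, $(i + 1); break }
    }' | sort
}

# Example use (router2 is a hypothetical host name):
#   ifconfig internal4021 | carp_addrs > /tmp/r1.txt
#   ssh router2 ifconfig internal4021 | carp_addrs > /tmp/r2.txt
#   diff /tmp/r1.txt /tmp/r2.txt
```

An empty diff would only show that both routers display the same addresses; as described in this report, the kernel state can still disagree with what ifconfig prints.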
Comment 2 Vinícius Zavam freebsd_committer freebsd_triage 2018-07-31 19:20:30 UTC
don't you need to set 'pass' together with the 'vhid' in order to properly set up carp? I am not even mentioning the 'advskew' stuff, but the password thing might do the trick for you.

I'm also considering that you enabled ipv6 properly, and it's working.
Comment 3 Kajetan Staszkiewicz 2018-07-31 22:17:36 UTC
Maybe I did not make myself clear. Carp is set up correctly and operates flawlessly until multiple addresses are added and removed in a short period of time during some tests which are run on those routers. Rebooting the routers and having them configure the addresses once, by just adding them after reboot, makes things work fine again. The master/master behaviour is consistent with checksum issues, that is, with different carp addresses being configured on each router even when ifconfig shows identical addresses. A few times I was able to observe different addresses shown by ifconfig, but I was unable to delete the extra addresses due to the errors shown here before.

Long story short: only stress-testing by adding and removing carp addresses breaks carp.

And no, it is not necessary to provide any other parameters when carp vhid is already configured. Each extra IP address can be added to existing vhid by just specifying vhid.
Comment 4 Vinícius Zavam freebsd_committer freebsd_triage 2018-08-01 06:45:40 UTC
(In reply to Kajetan Staszkiewicz from comment #3)

hey, thanks for the information and more details on this one. appreciated!

would you mind sharing a little more about your setup? like, rc.conf? you are running it on amd64/11.2-RELEASE, right? I would like to try reproducing this issue here. I see you are also using VLAN in your scenario, correct?

btw, what kind of switch are you using in between the boxes and what NIC are you using over there?
Comment 5 Vinícius Zavam freebsd_committer freebsd_triage 2018-08-01 14:10:12 UTC
FreeBSD 11.2-RELEASE #0 r335510: Fri Jun 22 04:32:14 UTC 2018 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64 1102000 1102000

root@freebsd-11-2-a:~ # ifconfig | grep carp
        carp: MASTER vhid 123 advbase 1 advskew 100
        carp: MASTER vhid 234 advbase 1 advskew 100

root@freebsd-11-2-b:~ # ifconfig | grep carp
        carp: BACKUP vhid 123 advbase 1 advskew 200
        carp: BACKUP vhid 234 advbase 1 advskew 200

# cat /boot/loader.conf 
carp_load="YES"

# cat /etc/rc.conf
create_args_vlan999="vlan 999"
dumpdev="NO"
hostname="freebsd-11-2-a"
ifconfig_vlan999_ipv6="inet6 2001:db8:fb5d:999::1/64"
ifconfig_vlan999_alias0="inet6 2001:db8:fb5d:999::999/128 vhid 234 pass BSDfree advskew 100"
ifconfig_vtnet0="DHCP"
ifconfig_vtnet0_ipv6="inet6 2001:db8:fb5d:86::1/64"
ifconfig_vtnet0_alias0="inet6 2001:db8:fb5d:86::86/128 vhid 123 pass freeBSD advskew 100"
ipv6_activate_all_interfaces="YES"
sshd_enable="YES"
vlans_vtnet0="vlan999"

# cat /etc/rc.conf
create_args_vlan999="vlan 999"
dumpdev="NO"
hostname="freebsd-11-2-b"
ifconfig_vlan999_ipv6="inet6 2001:db8:fb5d:999::2/64"
ifconfig_vlan999_alias0="inet6 2001:db8:fb5d:999::999/128 vhid 234 pass BSDfree advskew 200"
ifconfig_vtnet0="DHCP"
ifconfig_vtnet0_ipv6="inet6 2001:db8:fb5d:86::2/64"
ifconfig_vtnet0_alias0="inet6 2001:db8:fb5d:86::86/128 vhid 123 pass freeBSD advskew 200"
ipv6_activate_all_interfaces="YES"
sshd_enable="YES"
vlans_vtnet0="vlan999"

# ifconfig vtnet0 inet6 2001:db8:fb5d:86::8686 prefixlen 128 vhid 123
# echo $?
0

# ifconfig vlan999 inet6 2001:db8:fb5d:999:0:9:9:9 prefixlen 128 vhid 234 alias
# echo $?
0

# ifconfig vtnet0 inet6 2001:db8:fb5d:86::8686 prefixlen 128 vhid 123 delete
# echo $?
0

# ifconfig vlan999 inet6 2001:db8:fb5d:999:0:9:9:9 prefixlen 128 vhid 234 -alias
# echo $?
0
Comment 6 Kajetan Staszkiewicz 2018-08-03 15:36:02 UTC
I don't use the rc system to configure carps at all. They are configured dynamically whenever required by new configuration from some external system. I'm not running a clean 11.2-RELEASE. There are some custom patches for pf and backports from HEAD, please check here: https://github.com/innogames/freebsd/commits/iglb/11.2/SomethingCompletelyDifferent

I wrote this script in an attempt to provoke carp misconfiguration:
---------- 8< ----------
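#!/bin/sh
# Stress test: add and remove IPv6 CARP addresses from many parallel children.

IFACE=internal4027
VHID=27
ADDR_BASE='2a00:f00:42:0000'


if [ -z "$1" ]; then
        # Parent: spawn two children per subnet, so the same addresses
        # are added and removed concurrently.
        for SUBNET in $(seq 10); do
                SUBNET=$(printf '%04x' "$SUBNET")
                sh "$(dirname "$0")/break_carps.sh" "$SUBNET" &
                sh "$(dirname "$0")/break_carps.sh" "$SUBNET" &
        done
else
        # Child: cycle through ten addresses in the given subnet.
        echo "Spawned child for subnet $1"
        for i in $(seq 10); do
                ADDR_LAST=$(printf '%04x' "$i")
                ADDR="$ADDR_BASE:$1::$ADDR_LAST"
                ifconfig "$IFACE" inet6 "$ADDR/128" vhid "$VHID" alias
                sleep 0.5
                ifconfig "$IFACE" inet6 "$ADDR/128" vhid "$VHID" -alias
        done
fi

wait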

---------- >8 ----------
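
To check afterwards whether any of the script's addresses were left behind, it can help to print the full set it touches without calling ifconfig. The following is a minimal sketch under that assumption; `list_addrs` is a hypothetical helper, and it reuses the `ADDR_BASE` value and the two `seq 10` loops from the script above.

```shell
#!/bin/sh
# Sketch: print every address the stress script adds and removes,
# without touching any interface. Values mirror the script's variables.
ADDR_BASE='2a00:f00:42:0000'

# list_addrs: print the 10 x 10 addresses, one per line, with /128 suffix.
list_addrs() {
    for subnet in $(seq 10); do
        subnet_hex=$(printf '%04x' "$subnet")
        for i in $(seq 10); do
            printf '%s:%s::%04x/128\n' "$ADDR_BASE" "$subnet_hex" "$i"
        done
    done
}
```

Note that these strings keep the zero-padded groups the script builds, while ifconfig typically prints addresses in compressed form, so a direct text comparison with ifconfig output would need the addresses normalized first.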

Even better than misconfiguration, I got this nice panic:

Fatal trap 12: page fault while in kernel mode

#13 0xffffffff8159a8d7 in carp_hmac_prepare (sc=0xfffff80008599000)
    at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/netinet/ip_carp.c:384
#14 0xffffffff8159a531 in carp_ioctl (ifr=<value optimized out>,
    cmd=<value optimized out>, td=<value optimized out>)
    at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/netinet/ip_carp.c:1698
#15 0xffffffff8072732f in ifioctl (so=0xfffff80011f61000, cmd=3223349749,
    data=0xfffffe0666e5e9d0 "internal4027", td=0xfffff80102a8b620)
    at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/net/if.c:3041
#16 0xffffffff8069736d in kern_ioctl (td=0xfffff80102a8b620, fd=3,
    com=3223349749, data=<value optimized out>) at file.h:323

Then I compiled the kernel with INVARIANTS and INVARIANT_SUPPORT. I tried something even less ugly than the script above, that is, I just added the same IPv6 carp address twice (one which is *not* in any network available on the iface). This resulted in:

panic: carp_attach: ifa 0xfffff80095390000 attached

#11 0xffffffff815b3675 in carp_attach (ifa=0xfffff80095390000, vhid=27)
    at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/netinet/ip_carp.c:1809
#12 0xffffffff807e8ff2 in in6_control (so=<value optimized out>,
    cmd=<value optimized out>, data=<value optimized out>,
    ifp=<value optimized out>, td=<value optimized out>)
    at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/netinet6/in6.c:572
#13 0xffffffff80713b19 in ifioctl (so=0xfffff80011eeb360, cmd=2156423451,
    data=<value optimized out>, td=0xfffff80095861620)
    at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/net/if.c:3071
#14 0xffffffff80688729 in kern_ioctl (td=0xfffff80095861620,
    fd=<value optimized out>, com=<value optimized out>,
    data=<value optimized out>) at file.h:323


While browsing kernel source on GitHub I've come across 5bd3158a3b7184c65d3e1b6d96faf0dd720eb6ac (which is in the master branch but not in 11.2-RELEASE). This indeed solves the issue with adding the same CARP address twice, but the script above still kills the system:

panic: Bad link elm 0xfffff800a44c3c00 prev->next != elm

#11 0xffffffff807eb33d in in6_unlink_ifa (ia=<value optimized out>,
    ifp=<value optimized out>) at fnv_hash.h:29
#12 0xffffffff807e8b2d in in6_control (so=<value optimized out>,
    cmd=2166384921, data=<value optimized out>, ifp=<value optimized out>,
    td=<value optimized out>)
    at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/netinet6/in6.c:701
#13 0xffffffff80713b19 in ifioctl (so=0xfffff80011ee6000, cmd=2166384921,
    data=<value optimized out>, td=0xfffff8002f61b620)
    at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/net/if.c:3071
#14 0xffffffff80688729 in kern_ioctl (td=0xfffff8002f61b620,
    fd=<value optimized out>, com=<value optimized out>,
    data=<value optimized out>) at file.h:323


I'll put a kernel with 5bd3158a3b7184c65d3e1b6d96faf0dd720eb6ac into production and see if that solves the issue reported in this ticket. But obviously there are more issues deeper down.
Comment 7 Vinícius Zavam freebsd_committer freebsd_triage 2018-08-06 09:38:27 UTC
(In reply to Kajetan Staszkiewicz from comment #6)

avoiding the use of rc to set up carp, MFC/MFH direct to 'releng', running your custom base system and kernel (SomethingCompletelyDifferent)? I would be a bit skeptical, and say that this issue can be closed. right? it's related to neither 11.1- nor 11.2-RELEASE.

did you try with a *clean* source from HEAD? did you try with any other STABLE branch (w/o modifying it)? does it happen when you run RELENG/11.2 with no patched or modified code?

btw, you do need to setup the 'pass' manually before trying to test anything related to carp as we can see on its manual page and ifconfig(8); no matter if you are using VLAN or not, the steps should be the same.
Comment 8 Kajetan Staszkiewicz 2018-08-07 10:41:52 UTC
I can test it again on clean kernel if you wish, but I will compile it with INVARIANTS nevertheless, as they make the issue pop up immediately.

When it comes to patches, the ones for the carp system are from FreeBSD's HEAD. They are not my own. They *improve* things for FreeBSD 11.2 but make it crash elsewhere afterwards.

And sorry, but I can't agree on RC system - issuing *standard* ifconfig commands, just a bit more often than usual must not crash the kernel.

And of course I have "pass" set.
Comment 9 Kajetan Staszkiewicz 2018-08-07 22:22:44 UTC
A non-patched 11.2-RELEASE-p1 kernel, with the only change being the lines:
options         INVARIANTS
options         INVARIANT_SUPPORT
in the kernel configuration file, produces the same result. All that has to be done to cause it is to add the same carp address twice:

[22:14:37] kajetan-test-aw-3 ~/ # sudo kldload carp
[22:14:47] kajetan-test-aw-3 ~/ # sudo ifconfig vtnet0 vhid 254 pass randompass 
[22:14:57] kajetan-test-aw-3 ~/ # sudo ifconfig vtnet0 inet6 2a00:X:0001::1/128 vhid 254 alias
[22:15:22] kajetan-test-aw-3 ~/ # sudo ifconfig vtnet0 inet6 2a00:X:0001::1/128 vhid 254 alias


panic: carp_attach: ifa 0xfffff80003dd5a00 attached

#0  doadump (textdump=<value optimized out>) at pcpu.h:229
#1  0xffffffff80ac4b0c in kern_reboot (howto=260) at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/kern/kern_shutdown.c:383
#2  0xffffffff80ac5021 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/kern/kern_shutdown.c:776
#3  0xffffffff80ac4e00 in kassert_panic (fmt=<value optimized out>) at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/kern/kern_shutdown.c:666
#4  0xffffffff8223c655 in carp_attach (ifa=0xfffff80003dd5a00, vhid=127) at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/netinet/ip_carp.c:1806
#5  0xffffffff80cc0a72 in in6_control (so=<value optimized out>, cmd=<value optimized out>, data=<value optimized out>, ifp=<value optimized out>, td=<value optimized out>)
    at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/netinet6/in6.c:572
#6  0xffffffff80baea29 in ifioctl (so=0xfffff800058d1000, cmd=2156423451, data=<value optimized out>, td=0xfffff80005d17620) at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/net/if.c:3071
#7  0xffffffff80b261a9 in kern_ioctl (td=0xfffff80005d17620, fd=<value optimized out>, com=<value optimized out>, data=<value optimized out>) at file.h:323
#8  0xffffffff80b25e7c in sys_ioctl (td=0xfffff80005d17620, uap=0xfffff80005d17b58) at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/kern/sys_generic.c:745
#9  0xffffffff80f38509 in amd64_syscall (td=0xfffff80005d17620, traced=0) at subr_syscall.c:132
#10 0xffffffff80f1447d in fast_syscall_common () at /usr/home/kajetan.staszkiewicz/freebsd.git/sys/amd64/amd64/exception.S:479
#11 0x0000000800fddf7a in ?? ()

Now you have a nice issue for 11.2. I really don't feel like repeating the rest of the information I already gave.
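
For anyone trying to reproduce this, a kernel configuration like the one described above could be generated roughly as follows. This is a minimal sketch: the helper name, the ident `GENERIC-INVARIANTS` and the output path are hypothetical, and it assumes a source tree under /usr/src.

```shell
#!/bin/sh
# Sketch: write a kernel configuration that is GENERIC plus the two
# debugging options. The ident and the output path are hypothetical.
write_invariants_conf() {
    cat > "$1" <<'EOF'
include GENERIC
ident   GENERIC-INVARIANTS
options INVARIANTS
options INVARIANT_SUPPORT
EOF
}

# Example use on a system with /usr/src populated (not run here):
#   write_invariants_conf /usr/src/sys/amd64/conf/GENERIC-INVARIANTS
#   make -C /usr/src buildkernel installkernel KERNCONF=GENERIC-INVARIANTS
```

Including GENERIC rather than copying it keeps the difference from the release kernel limited to the two options.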
Comment 10 Vinícius Zavam freebsd_committer freebsd_triage 2018-08-08 13:29:27 UTC
(In reply to Kajetan Staszkiewicz from comment #9)

good that you really took it personal (although that was not the intention). the idea was/is to have the closest scenario to reproduce any panic regarding the branch you reported as problematic, but not 'Something Completely Different'.

now that you have reported this on an 11.2-RELEASE base with its GENERIC kernel and only these two extra options, it might help others a little more to reproduce it and try a fix (if you do not already have one to suggest).

ty for your time and concern :)
Comment 11 Kajetan Staszkiewicz 2018-08-08 18:45:22 UTC
The fix is mentioned in my message dated 2018-08-03 15:36:02 UTC. Please stop fixating on the fact that I have patched kernel and start reading my messages *thoroughly* instead.

The issue seems not carp-specific anymore with some patches from HEAD applied. I'll search if there is a better issue already opened and if not, open another one.
Comment 12 Vinícius Zavam freebsd_committer freebsd_triage 2018-08-09 08:40:36 UTC
(In reply to Kajetan Staszkiewicz from comment #11)

as I said, the main idea was to get as close as possible to what we get from 11.2-RELEASE in order to help you (and others). with all respect, there's no such thing as 'fixating on the fact that I have patched kernel' - you are really taking things too personally. and, again, as I said: that is not the/my intention. we should be on the same page here, please.

regarding the INVARIANT options, they sure might give people hints and help when it comes to debugging, but -AFAIK- they are not meant to run in a production env like releng/11.2. their descriptions might confuse some.

when it comes to the recommended merge that you wrote about, in the very same text you mentioned that the system can still be "killed" (it really does not sound like a fix to me). sorry, but I did read that. BR,
Comment 13 Michael 2018-08-09 10:11:08 UTC
Hi,

I didn't follow the complete thread, but when I read "vtnet" devices, are you using some cloud vendor? There are vendors out there pointing out that HA failover protocols are not supported via their X-Stack or whatever.

I'm only an OPNsense/pfSense user and never had these issues on physical hardware, so I can't imagine there is something wrong with vanilla 11.1.

Michael
Comment 14 Kajetan Staszkiewicz 2018-08-12 22:24:38 UTC
The issues described here happen both on VM and hardware. I switched to testing things on VMs because they reboot way faster. They do happen on hardware too.
Comment 15 Kajetan Staszkiewicz 2018-08-12 23:11:52 UTC
No patch I fetched from HEAD made 11.2 fully stable, so yes, none "sounds like fixing". But I think we are dealing here with two separate bugs.

One is really easy to trigger by just adding the same IPv6 carp address twice. It is already fixed in https://github.com/freebsd/freebsd/commit/5bd3158a3b7184c65d3e1b6d96faf0dd720eb6ac and all I ask is that it gets included in the 11 branch.

The other thing happens when IP addresses are added and removed very quickly; they don't even have to be CARP addresses. You know what, let's not get further into discussions here. I will investigate more and open another ticket for it.
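
Until a kernel with the fix is running, an external configuration tool could at least avoid issuing the duplicate add. The following is a minimal sketch of such a guard and not a fix: the helper names are hypothetical, it assumes the address is passed exactly as ifconfig prints it (compressed form, no prefix length), and the check-then-add sequence is not atomic, so two concurrent callers can still race.

```shell
#!/bin/sh
# Sketch: skip the add when ifconfig already lists the address.
# Assumption: ADDR is written exactly as ifconfig prints it; no
# address normalization is attempted.

# addr_listed ADDR: read `ifconfig <iface>` output on stdin and
# succeed only if ADDR appears on an inet6 line.
addr_listed() {
    awk -v a="$1" '$1 == "inet6" && $2 == a { found = 1 } END { exit !found }'
}

# add_carp_addr_once IFACE ADDR VHID (hypothetical helper name)
add_carp_addr_once() {
    if ifconfig "$1" | addr_listed "$2"; then
        echo "$2 already on $1, skipping" >&2
    else
        ifconfig "$1" inet6 "$2/128" vhid "$3" alias
    fi
}
```

This only guards against the simple double add; it does nothing for the second problem, where rapid add and remove cycles corrupt the address list.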
Comment 16 commit-hook freebsd_committer freebsd_triage 2018-08-20 01:01:38 UTC
A commit references this bug:

Author: loos
Date: Mon Aug 20 01:01:34 UTC 2018
New revision: 338081
URL: https://svnweb.freebsd.org/changeset/base/338081

Log:
  MFC r312770 and r337854:

  After the in_control() changes in r257692, an existing address is
  (intentionally) deleted first and then completely added again (so all the
  events, announces and hooks are given a chance to run).

  This cause an issue with CARP where the existing CARP data structure is
  removed together with the last address for a given VHID, which will cause
  a subsequent fail when the address is later re-added.

  This change fixes this issue by adding a new flag to keep the CARP data
  structure when an address is not being removed.

  There was an additional issue with IPv6 CARP addresses, where the CARP data
  structure would never be removed after a change and lead to VHIDs which
  cannot be destroyed.

  PR:		229384
  Sponsored by:	Rubicon Communications, LLC (Netgate)

Changes:
_U  stable/11/
  stable/11/sys/net/if.c
  stable/11/sys/netinet/in.c
  stable/11/sys/netinet/ip_carp.c
  stable/11/sys/netinet/ip_carp.h
  stable/11/sys/netinet6/in6.c
Comment 17 Andres Montalban 2018-10-16 20:05:57 UTC
I just hit this in 11.2-RELEASE, but it works in 10.4-RELEASE. Do you think this fix will reach 11.3-RELEASE, and if so, what's the timeline? This is blocking our upgrade to 11-RELEASE.

Thanks!
Comment 18 Steven Hartland freebsd_committer freebsd_triage 2018-10-16 22:06:31 UTC
As the fixes have already been merged to stable/11 they will be in 11.3-RELEASE.
Comment 19 Kubilay Kocak freebsd_committer freebsd_triage 2018-10-17 10:02:50 UTC
Assign to committer that resolved