Created attachment 175654 [details]
dmesg

# steps

FreeBSD 11.0Rp1 amd64

- dmesg attached
- ifconfig (IPs masked)

igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 78:45:c4:fa:d2:12
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
igb1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 78:45:c4:fa:d2:12
	nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
	media: Ethernet autoselect (1000baseT <full-duplex>)
	status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
	options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
	inet6 ::1 prefixlen 128
	inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
	inet 127.0.0.1 netmask 0xff000000
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	groups: lo
lagg0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
	options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
	ether 78:45:c4:fa:d2:12
	inet 10.0.9.83 netmask 0xfffffff0 broadcast 10.0.9.95
	inet 10.0.9.84 netmask 0xffffffff broadcast 10.0.9.84 vhid 1
	inet 10.0.9.85 netmask 0xffffffff broadcast 10.0.9.85 vhid 3
	inet6 fe80::7a45:c4ff:fefa:d212%lagg0 prefixlen 64 scopeid 0x4
	inet6 3000:3050:3000:4::83 prefixlen 64
	inet6 3000:3050:3000:4::84 prefixlen 64 vhid 2
	inet6 3000:3050:3000:4::85 prefixlen 64 vhid 4
	nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
	media: Ethernet autoselect
	status: active
	carp: BACKUP vhid 1 advbase 1 advskew 100
	carp: BACKUP vhid 3 advbase 1 advskew 0
	carp: BACKUP vhid 2 advbase 1 advskew 100
	carp: BACKUP vhid 4 advbase 1 advskew 0
	groups: lagg
	laggproto lacp lagghash l2,l3,l4
	laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
	laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>

issue `service netif restart`

This was initially done via net/mosh connection and tmux inside that, but repeated again with direct console access (KVM remote mgmt tool).

## actual results

the system hangs, 100% reproducible.

- no keyboard entry
- no ability to Alt-F3 to switch tabs
- no ping over network
- a hard reboot is required to regain control
- final message in log appears to be Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN

### console

Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48

### /var/log/messages

Oct 12 08:00:00 bridget newsyslog[1525]: logfile turned over due to size>100K
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_stop
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lo0: 48
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 2@lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget kernel: carp: 4@lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 3 times
Oct 12 08:01:21 bridget kernel: carp: 1@lagg0: BACKUP -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget last message repeated 2 times
Oct 12 08:01:21 bridget kernel: carp: 3@lagg0: MASTER -> INIT (hardware interface up)
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode disabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode disabled
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: The following interfaces were not configured:
Oct 12 08:01:21 bridget kernel: ifa_maintain_loopback_route: deletion failed for interface lagg0: 3
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: cloned_interfaces_sticky is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Destroyed clones: lagg0
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: Cloned: lagg0
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: start_precmd: checkauto
Oct 12 08:01:21 bridget root: /etc/pccard_ether: DEBUG: run_rc_command: doit: pccard_ether_start
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: netif_enable is set to YES.
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: run_rc_command: doit: netif_start lagg0
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Created wlan(4)s:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: Cloned:
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:21 bridget kernel: lagg0: link state changed to UP
Oct 12 08:01:21 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_gateway_enable is set to NO.
Oct 12 08:01:21 bridget kernel: igb0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb1: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: lagg0: promiscuous mode enabled
Oct 12 08:01:21 bridget kernel: igb0: link state changed to DOWN
Oct 12 08:01:21 bridget kernel: carp: 1@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 3@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 2@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget kernel: carp: 4@lagg0: INIT -> BACKUP (initialization complete)
Oct 12 08:01:21 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget dch: /etc/rc.d/netif: DEBUG: checkyesno: ipv6_activate_all_interfaces is set to NO.
Oct 12 08:01:22 bridget kernel: igb1: link state changed to DOWN
Oct 12 08:01:22 bridget kernel: carp: 1@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 240 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 3@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 480 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 2@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 720 (interface down)
Oct 12 08:01:22 bridget kernel: carp: 4@lagg0: BACKUP -> INIT (hardware interface down)
Oct 12 08:01:22 bridget kernel: carp: demoted by 240 to 960 (interface down)
Oct 12 08:01:22 bridget kernel: lagg0: link state changed to DOWN
Oct 12 08:01:24 bridget root: /etc/rc.d/netif: DEBUG: checkyesno: rc_startmsgs is set to YES.

# expected results

after a short period of downtime, the network is re-established.

# notes

if carp config is disabled, and system is rebooted, this functions as expected.
# config

```
# /etc/rc.conf on 1st node
hostname="one.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.82 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::82/64"
# ifconfig_lo1="inet 10.0.0.254 netmask 255.255.255.0"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"

# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"

# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
	inet vhid 1 advskew 0 pass pwd1 10.0.9.84/32 \
	inet6 vhid 2 advskew 0 pass pwd2 3000:3050:3000:4::84/64 \
	inet vhid 3 advskew 100 pass pwd3 10.0.9.85/32 \
	inet6 vhid 4 advskew 100 pass pwd4 3000:3050:3000:4::85/64"

# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```

```
# /etc/rc.conf on 2nd node
hostname="two.my.domain"
ifconfig_igb0="up"
ifconfig_igb1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="inet 10.0.9.83 netmask 255.255.255.240 laggproto lacp laggport igb0 laggport igb1"
ifconfig_lagg0_ipv6="inet6 3000:3050:3000:4::83/64"
defaultrouter="10.0.9.81"
ipv6_defaultrouter="3000:3050:3000:4::1"

# Set dumpdev to "AUTO" to enable crash dumps, "NO" to disable
dumpdev="AUTO"
zfs_enable="YES"

# carp on
kld_list="carp"
ifconfig_lagg0_aliases="\
	inet vhid 1 advskew 100 pass pwd1 10.0.9.84/32 \
	inet6 vhid 2 advskew 100 pass pwd2 3000:3050:3000:4::84/64 \
	inet vhid 3 advskew 0 pass pwd3 10.0.9.85/32 \
	inet6 vhid 4 advskew 0 pass pwd4 3000:3050:3000:4::85/64"

# debugging rc.d scripts
rc_debug="YES"
rc_startmsgs="YES"
```

```
# /boot/loader.conf /boot/loader.conf

# storage
# zfs won't start mounting volumes without this
zfs_load="YES"
kern.geom.label.gptid.enable="0"

# hardware
coretemp_load="YES"

# console
# ensure console in IPMI mode remains accessible instead of going all white
hw.vga.textmode=1

# bhyve and jails
vmm_load="YES"
nmdm_load="YES"
if_bridge_load="YES"
if_tap_load="YES"
kern.racct.enable=1

# debug super powers
dtraceall_load="YES"

# runtime
# maxfiles
kern.maxfiles="25000"

# network
# fibs
# https://blog.feld.me/posts/2015/06/routing-a-freebsd-jail-through-openvpn/
# https://www.freebsd.org/cgi/man.cgi?query=setfib
net.fibs=2

# from https://calomel.org/freebsd_network_tuning.html
accf_data_load="YES"
accf_dns_load="YES"
autoboot_delay="3"
ahci_load="YES"
aio_load="YES"
cc_htcp_load="YES"
net.tcp.hostcache.cachelimit="0"
```

```
# /etc/sysctl.conf
# carp tweaks
net.inet.carp.preempt=1
```
I’ve had a very quick look, and at first glance it seems like an overly strict KASSERT() more than anything else.

Basically, during service netif restart the scripts try to set up carp on an address that’s already got it configured. That runs into the assert and panics the box (or actually panics later on if INVARIANTS is not set).

Simply replacing the KASSERT with a check (and returning errors) prevents the panic. I don’t have a carp test setup, but this should make things a lot better already. Can you check if this works for you?

diff --git a/sys/netinet/ip_carp.c b/sys/netinet/ip_carp.c
index 7855af2..ea27f0a 100644
--- a/sys/netinet/ip_carp.c
+++ b/sys/netinet/ip_carp.c
@@ -1804,7 +1804,8 @@ carp_attach(struct ifaddr *ifa, int vhid)
 	struct carp_softc *sc;
 	int index, error;
 
-	KASSERT(ifa->ifa_carp == NULL, ("%s: ifa %p attached", __func__, ifa));
+	if (ifa->ifa_carp != NULL)
+		return (EBUSY);
 
 	switch (ifa->ifa_addr->sa_family) {
 #ifdef INET
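To illustrate the difference between the two behaviours outside the kernel, here is a minimal user-space sketch. It is only a model under stated assumptions: the `model_*` names and structures are made up for this example and are not the real definitions from ip_carp.c, and plain `assert()` stands in for KASSERT().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (illustration only). */
struct model_carp {
	int vhid;
};

struct model_ifaddr {
	struct model_carp *carp;	/* plays the role of ifa->ifa_carp */
};

/*
 * Strict variant: models the original KASSERT. Attaching to an address
 * that already has carp configured aborts the program, which is the
 * user-space analogue of panicking the box.
 */
int
model_attach_strict(struct model_ifaddr *ifa, struct model_carp *sc)
{
	assert(ifa->carp == NULL);
	ifa->carp = sc;
	return (0);
}

/*
 * Checked variant: models the patch. A second attach is refused with
 * EBUSY and the existing association is left untouched.
 */
int
model_attach_checked(struct model_ifaddr *ifa, struct model_carp *sc)
{
	if (ifa->carp != NULL)
		return (EBUSY);
	ifa->carp = sc;
	return (0);
}
```

With the checked variant, a caller that attaches twice to the same address (as the netif scripts appear to do during a restart) gets an error back on the second call instead of tripping the assertion.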
@Kristov Could you include the patch in comment 1 as an attachment, please?
Created attachment 177414 [details] Change assert into check
^Triage: clear the now obsolete 'patch' keyword. To submitter: is this aging PR still relevant?