Bug 246629 - Multicast stack problem - MRT_ADD_VIF Address already in use
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Bjoern A. Zeeb
URL:
Keywords: regression
Duplicates: 248512
Depends on:
Blocks:
 
Reported: 2020-05-21 11:35 UTC by Ozkan KIRIK
Modified: 2020-09-07 09:59 UTC (History)

See Also:
louis.freebsd: maintainer-feedback+
bz: mfc-stable12?
bz: mfc-stable11-
bz: mfc-stable10-


Description Ozkan KIRIK 2020-05-21 11:35:20 UTC
Hello, 


I'm using FreeBSD 12.1-STABLE. 

# uname -a
FreeBSD test.test.com 12.1-STABLE FreeBSD 12.1-STABLE #8 1f999e39f46(v2)-dirty: Wed Apr 22 08:40:36 +03 2020    test@test.test.com:/usr/obj/usr/src/amd64.amd64/sys/amd64

After restarting the multicast routing daemon, the daemon fails to start with this error: "Failed adding VIF 1 (MRT_ADD_VIF) for iface em1: Address already in use". pimd tries to disable all vifs in the kernel (shown in the output below) but still hits the same error.

I tried with both mrouted and pimd. Both of them show the same behavior.
My guess is that the kernel does not actually disable the VIFs.

I also opened a bug report for netstat -g ( bug #246626 ). I think the
multicast state in the kernel is being corrupted.

There is no problem on FreeBSD 11.2.

To reproduce the error:
# pimd
# killall pimd
# pimd -d -s debug
debug level 0xffffffff (dvmrp_detail,dvmrp_prunes,dvmrp_routes,dvmrp_neighbors,dvmrp_timers,igmp_proto,igmp_timers,igmp_members,trace,timeout,packets,interfaces,kernel,cache,rsrr,pim_detail,pim_hello,pim_register,pim_join_prune,pim_bootstrap,pim_asserts,pim_cand_rp,pim_routes,pim_timers,pim_rpf)
11:52:23.035 pimd version 2.3.2 starting ...
11:52:23.035 Got 262144 byte send buffer size in 0 iterations
11:52:23.035 Got 262144 byte recv buffer size in 0 iterations
11:52:23.035 Got 262144 byte send buffer size in 0 iterations
11:52:23.035 Got 262144 byte recv buffer size in 0 iterations
11:52:23.035 Getting vifs from kernel
11:52:23.035 Installing em0 (10.2.4.20 on subnet 10.2.4/24) as vif #0 - rate 0
11:52:23.035 Installing em1 (192.168.58.1 on subnet 192.168.58) as vif #1 - rate 0
11:52:23.035 Installing em2 (192.168.59.1 on subnet 192.168.59) as vif #2 - rate 0
11:52:23.035 Installing em1.1600 (192.168.16.1 on subnet 192.168.16) as vif #3 - rate 0
11:52:23.035 Installing em1.1700 (192.168.17.1 on subnet 192.168.17) as vif #4 - rate 0
11:52:23.035 Disabling all vifs from kernel
11:52:23.035 Getting vifs from /usr/local/etc//pimd.conf
11:52:23.035 Local Cand-BSR address 192.168.59.1, priority 5
11:52:23.035 Local Cand-RP address 192.168.59.1, priority 20, interval 30 sec
11:52:23.035 spt-threshold packets 0 interval 100
11:52:23.035 Local static RP: 169.254.0.1, group 232.0.0.0/8
11:52:23.035 IGMP query interval  : 12 sec
11:52:23.035 IGMP querier timeout : 41 sec
11:52:23.035 **Failed adding VIF 1 (MRT_ADD_VIF) for iface em1: Address already in use**
Comment 1 Ozkan KIRIK 2020-05-23 06:43:23 UTC
There is no problem on FreeBSD 12.1-p5.

A kernel built from the latest SVN base/stable/12 produces this problem.
Comment 2 Ozkan KIRIK 2020-05-23 11:58:10 UTC
The problem started after this commit:
https://svnweb.freebsd.org/base?view=revision&revision=356621
Comment 3 Mark Linimon freebsd_committer freebsd_triage 2020-05-24 00:04:46 UTC
Notify committer of r356621.
Comment 4 Louis 2020-06-14 17:07:53 UTC
Hello,

Many people are using pfSense, and some of them try to get multicast working using IGMP-proxy or PIMD. That simply does not work.

I put a lot of effort into getting multicast to work and it is simply not possible. Reading this bug report, I am 99% sure that this is one of the underlying issues. That is also the verdict of jimp (pfSense Lead Designer).

So please fix it with high priority!

During PIMD startup I notice:
1) after processing the first few (vlan)interfaces something goes wrong.   
Jun 10 11:53:50 pfSense kernel: vlan4: changing name to 'em0.4'
Jun 10 11:53:49 pfSense kernel: vlan3: changing name to 'em0.6'
Jun 10 11:53:47 pfSense sshd[11688]: Server listening on 0.0.0.0 port 22.
Jun 10 11:53:47 pfSense sshd[11688]: Server listening on :: port 22.
>>> After starting PIMD, the three vlans below provide a VIF and the ones above DO NOT, so something is wrong here! <<<
Jun 10 11:53:47 pfSense kernel: vlan2: changing name to 'lagg0.13'
Jun 10 11:53:47 pfSense kernel: vlan1: changing name to 'lagg0.26'
Jun 10 11:53:47 pfSense kernel: vlan0: changing name to 'lagg0.10'
 
2) Related things are going wrong with PIMD
Jun 10 11:54:11 pfSense pimd[52647]: Getting vifs from /var/etc/pimd/pimd.conf
Jun 10 11:54:11 pfSense pimd[52647]: Disabling all vifs from kernel
Jun 10 11:54:11 pfSense pimd[52368]: Disabling all vifs from kernel
>> Note that only three vlans have vifs: exactly the ones from the beginning of the kernel startup, not the ones starting a bit later <<
Jun 10 11:54:11 pfSense pimd[52647]: Installing lagg0.13 (192.168.13.1 on subnet 192.168.13) as vif #2 - rate 0
Jun 10 11:54:11 pfSense pimd[52368]: Installing lagg0.13 (192.168.13.1 on subnet 192.168.13) as vif #2 - rate 0
Jun 10 11:54:11 pfSense pimd[52368]: Installing lagg0.26 (192.168.2.1 on subnet 192.168.2) as vif #1 - rate 0
Jun 10 11:54:11 pfSense pimd[52647]: Installing lagg0.26 (192.168.2.1 on subnet 192.168.2) as vif #1 - rate 0
Jun 10 11:54:11 pfSense pimd[52368]: Installing lagg0.10 (192.168.10.1 on subnet 192.168.10) as vif #0 - rate 0
Jun 10 11:54:11 pfSense pimd[52647]: Installing lagg0.10 (192.168.10.1 on subnet 192.168.10) as vif #0 - rate 0

3) then there is a problem recognising interfaces
Jun 10 11:54:11 pfSense pimd[52647]: /var/etc/pimd/pimd.conf:12 - Invalid phyint address 'ix1.116'
Jun 10 11:54:11 pfSense pimd[52647]: /var/etc/pimd/pimd.conf:11 - Invalid phyint address 'lagg0.16'
Jun 10 11:54:11 pfSense pimd[52647]: /var/etc/pimd/pimd.conf:10 - Invalid phyint address 'ix0.14'

4) and the result is
Jun 10 11:54:11 pfSense pimd[52368]: Cannot forward: no enabled vifs


As already indicated, I expect that this is all related to this bug.

Hope to see it fixed soon. The Importance setting is definitely not correct!!

Sincerely


Louis
Comment 5 Joel S 2020-06-15 13:31:51 UTC
Created this account simply to mention that this is causing me issues as well. I have been trying to get my multicast traffic to work correctly on a pfSense 2.5.0 snapshot for the last month without luck.

Their related issue can be found here: https://redmine.pfsense.org/issues/7727

Any ideas would be appreciated. I also have the mentioned errors found in these bug reports. Just wanted to advise and be able to keep an eye on any possible patches.
Comment 6 Bjoern A. Zeeb freebsd_committer 2020-06-16 17:13:08 UTC
Sorry, I didn't notice the CC: on the bug.  Let me take it (for now) and have a look.
Comment 7 Joel S 2020-06-16 17:15:46 UTC
(In reply to Bjoern A. Zeeb from comment #6)

Understood, it's no problem at all :)
We appreciate you taking the time to look into it for us!
Comment 8 Bjoern A. Zeeb freebsd_committer 2020-06-16 17:28:17 UTC
A quick initial guess (if anyone can test this before me) is this one-line change (should also apply to HEAD):

Index: sys/netinet/ip_mroute.c
===================================================================
--- sys/netinet/ip_mroute.c     (revision 362232)
+++ sys/netinet/ip_mroute.c     (working copy)
@@ -739,7 +739,7 @@ X_ip_mrouter_done(void)
            if_allmulti(ifp, 0);
        }
     }
-    bzero((caddr_t)V_viftable, sizeof(V_viftable));
+    bzero((caddr_t)V_viftable, sizeof(V_viftable) * MAXVIFS);
     V_numvifs = 0;
     V_pim_assert_enabled = 0;
Comment 9 Bjoern A. Zeeb freebsd_committer 2020-06-17 17:00:02 UTC
(In reply to Bjoern A. Zeeb from comment #8)

The patch works; also hit an epoch panic while testing on HEAD.  I'll commit after review and merge to stable/12 a few days after.

Sorry for the breakage.

/bz
Comment 10 Joel S 2020-06-17 17:44:19 UTC
(In reply to Bjoern A. Zeeb from comment #9)

You're the man. Thanks a million for fixing this for us.
We understand that everyone makes mistakes; we are all only human. We appreciate you taking the time to review and fix this, it makes a world of difference for us.
Comment 11 commit-hook freebsd_committer 2020-06-17 21:05:24 UTC
A commit references this bug:

Author: bz
Date: Wed Jun 17 21:04:39 UTC 2020
New revision: 362289
URL: https://svnweb.freebsd.org/changeset/base/362289

Log:
  When converting the static arrays to mallocarray() in r356621 I missed
  one place where we now need to multiply the size of the struct with the
  number of entries.  This lead to problems when restarting user space
  daemons, as the cleanup was never properly done, resulting in MRT_ADD_VIF
  EADDRINUSE.
  Properly zero all array elements to avoid this problem.

  PR:		246629, 206583
  Reported by:	(many)
  MFC after:	4 days
  Sponsored by:	Rubicon Communications, LLC (d/b/a "Netgate")

Changes:
  head/sys/netinet/ip_mroute.c
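
The commit log above attributes the EADDRINUSE to a cleanup that no longer clears the whole VIF table. The following is a minimal user-space sketch of that failure mode, not the kernel code: `struct model_vif`, `MODEL_MAXVIFS` and the `model_*` functions are hypothetical names invented for illustration, and the rule that a slot with a non-zero local address is reported as in use is an assumption about how the kernel decides that a VIF is already taken.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_MAXVIFS 32	/* illustrative table size, not copied from kernel headers */

/* Hypothetical, cut-down stand-in for the kernel's per-VIF record. */
struct model_vif {
	unsigned char	flags;
	unsigned char	threshold;
	uint32_t	lcl_addr;	/* 0 means "slot is free" in this model */
	uint32_t	rmt_addr;
	unsigned long	pkt_in;
	unsigned long	pkt_out;
};

/* After the mallocarray() conversion the table is a pointer, not an array. */
static struct model_vif *viftable;

/* Allocate a zeroed table, standing in for loading the module. */
static void
model_init(void)
{
	free(viftable);
	viftable = calloc(MODEL_MAXVIFS, sizeof(*viftable));
	assert(viftable != NULL);
}

/* Assumed rule: refuse a slot that still carries a local address. */
static int
model_add_vif(unsigned int vifi, uint32_t lcl_addr)
{
	if (vifi >= MODEL_MAXVIFS)
		return (EINVAL);
	if (viftable[vifi].lcl_addr != 0)
		return (EADDRINUSE);
	viftable[vifi].lcl_addr = lcl_addr;
	return (0);
}

/* Cleanup as described for the regression: only pointer-size bytes cleared. */
static void
model_done_buggy(void)
{
	const size_t len = sizeof(viftable);	/* size of the pointer itself */

	memset(viftable, 0, len);
}

/* Cleanup that clears every element of the table. */
static void
model_done_fixed(void)
{
	memset(viftable, 0, sizeof(*viftable) * MODEL_MAXVIFS);
}
```

In this model, calling model_init(), adding VIFs 0 and 1, and then calling model_done_buggy() leaves slot 1 populated, so a second model_add_vif(1, ...) returns EADDRINUSE. After model_done_fixed() both slots can be added again. On a 64-bit build the few bytes that the buggy cleanup does clear happen to cover slot 0's address in this model, which would be consistent with the original report failing at VIF 1 rather than VIF 0, but the real struct layout is not shown in this thread.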
Comment 12 Bjoern A. Zeeb freebsd_committer 2020-06-17 21:11:19 UTC
Sorry for the breakage guys.


In case I need to reproduce this again:

(rc.conf)
vlans_igb0="vlan100 vlan101"
create_args_vlan100="vlan 100"
create_args_vlan101="vlan 101"
ifconfig_vlan100="192.0.2.1/24"
ifconfig_vlan101="203.0.113.1/24"

igmpproxy (install pkg) config (x.conf):
quickleave
phyint vlan100 upstream ratelimit 0 threshold 1
altnet 192.0.2.0/24

phyint vlan101 downstream ratelimit 0 threshold 1
altnet 203.0.113.0/24

phyint igb0 disabled
phyint igb1 disabled


(commands)
kldload ip_mroute
igmpproxy -dvvvvvvvv x.conf
wait briefly, ^c and restart again:
igmpproxy -dvvvvvvvv x.conf
Comment 13 commit-hook freebsd_committer 2020-06-21 11:49:45 UTC
A commit references this bug:

Author: bz
Date: Sun Jun 21 11:48:55 UTC 2020
New revision: 362465
URL: https://svnweb.freebsd.org/changeset/base/362465

Log:
  MFC r362289:

    When converting the static arrays to mallocarray() in r356621 I missed
    one place where we now need to multiply the size of the struct with the
    number of entries.  This lead to problems when restarting user space
    daemons, as the cleanup was never properly done, resulting in MRT_ADD_VIF
    EADDRINUSE.
    Properly zero all array elements to avoid this problem.

  PR:		246629, 206583

Changes:
_U  stable/12/
  stable/12/sys/netinet/ip_mroute.c
Comment 14 Bjoern A. Zeeb freebsd_committer 2020-06-21 11:55:09 UTC
Should all be fine again; the next snapshots and release, or a rebuild of stable/12 after this commit, should work as expected again.  Sorry one more time for the breakage and for not noticing this PR immediately, and thanks to the pfSense people for pointing me at it.
Comment 15 Louis 2020-06-21 13:56:11 UTC
Hello,

Thanks again! I also noticed that there was still a problem. Hopefully it is solved now.

However, one question. As you can see in the boot log I added on 14/6, there are also messages like:
Jun 10 11:54:11 pfSense pimd[52647]: /var/etc/pimd/pimd.conf:12 - Invalid phyint 

I wonder whether these errors are related to this issue and are also solved now, or whether they belong to another issue/bug?

Sincerely,


Louis
Comment 16 Bjoern A. Zeeb freebsd_committer 2020-06-21 16:28:27 UTC
(In reply to Louis from comment #15)

Sorry, Louis.  I cannot say.  Do you have the pimd.conf and the related rc.conf snippets, or at least an ifconfig -a or ifconfig -l from a started system?  The names pimd complains about, the vlans created on the lagg interfaces, and the physical interfaces you pasted in do not seem to relate to each other, and sadly your comments don't make full sense to me out of context.

It might be wise to take this offline with me if you want and we can open a different bug report if we think this is a different FreeBSD issue.
Comment 17 Louis 2020-06-21 18:44:21 UTC
Hello, 

I agree with your remark below, however I do not know how to do that :)

"It might be wise to take this offline with me if you want and we can open a different bug report if we think this is a different FreeBSD issue."

Furthermore:
- My pfSense system has 9 vlans; only the first 3 seem to be recognised correctly.

- I did collect the info you were requesting and put it in a text file, however I do not know how to attach a file to this bug report (please let me know).

- I do hope that your latest patch will arrive in the pfSense snapshots, so that I can test it. I hope and assume soon (days).

- My feeling is that there is perhaps more than one bug. We have a small time slot and momentum now in which things can be fixed. If we do not take it, I am afraid that pfSense will not support multicast for at least many more months. So if there is more than one bug, we should identify that as quickly as possible!

The related bug report for pfSense is known as
https://redmine.pfsense.org/issues/10558#change-46850

Louis
Comment 18 commit-hook freebsd_committer 2020-06-21 22:09:51 UTC
A commit references this bug:

Author: bz
Date: Sun Jun 21 22:09:30 UTC 2020
New revision: 362472
URL: https://svnweb.freebsd.org/changeset/base/362472

Log:
  Rather than zeroing MAXVIFS times size of pointer [r362289] (still better than
  sizeof pointer before [r354857]), we need to zero MAXVIFS times the size of
  the struct.  All good things come in threes; I hope this is it on this one.

  PR:		246629, 206583
  Reported by:	kib
  MFC after:	ASAP

Changes:
  head/sys/netinet/ip_mroute.c
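
The log above distinguishes three byte counts for the zeroing call. A small sketch that computes them side by side, assuming a stand-in struct (`struct model_vif`) and an illustrative table size (`MODEL_MAXVIFS`); neither is copied from the kernel headers, and the function names are invented here purely to label the three revisions mentioned in the log:

```c
#include <assert.h>	/* static_assert */
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAXVIFS 32	/* illustrative value, assumed for this sketch */

/* Hypothetical stand-in for the per-VIF record; only its size matters here. */
struct model_vif {
	unsigned char	flags;
	unsigned char	threshold;
	uint32_t	lcl_addr;
	uint32_t	rmt_addr;
	unsigned long	pkt_in;
	unsigned long	pkt_out;
};

/* The whole point: one element is larger than one pointer. */
static_assert(sizeof(struct model_vif) > sizeof(void *),
    "model struct must be larger than a pointer");

/* Attributed to r354857 in the log: sizeof() of the pointer variable. */
static size_t
bytes_zeroed_r354857(const struct model_vif *table)
{
	return (sizeof(table));
}

/* r362289: number of entries times the size of a pointer. */
static size_t
bytes_zeroed_r362289(const struct model_vif *table)
{
	return (sizeof(table) * MODEL_MAXVIFS);
}

/* r362472: number of entries times the size of one element. */
static size_t
bytes_zeroed_r362472(const struct model_vif *table)
{
	return (sizeof(*table) * MODEL_MAXVIFS);
}
```

Only the last form scales with the element type. Writing the length as sizeof(*table) * count, rather than naming the struct type, keeps the expression correct if the table's element type is changed later.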
Comment 19 Bjoern A. Zeeb freebsd_committer 2020-06-21 22:12:29 UTC
More eyes more fixes..
Comment 20 commit-hook freebsd_committer 2020-06-22 10:53:16 UTC
A commit references this bug:

Author: bz
Date: Mon Jun 22 10:52:31 UTC 2020
New revision: 362494
URL: https://svnweb.freebsd.org/changeset/base/362494

Log:
  MFC r362472:

    Rather than zeroing MAXVIFS times size of pointer [r362289] (still better than
    sizeof pointer before [r354857]), we need to zero MAXVIFS times the size of
    the struct.  All good things come in threes; I hope this is it on this one.

  PR:		246629, 206583
  Reported by:	kib

Changes:
  stable/12/sys/netinet/ip_mroute.c
Comment 21 Bjoern A. Zeeb freebsd_committer 2020-08-07 13:05:59 UTC
*** Bug 248512 has been marked as a duplicate of this bug. ***