Bug 233995 - Inconsistent arp handling in multiple fibs
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
Depends on:
Reported: 2018-12-13 21:46 UTC by Patrik Hildingsson
Modified: 2019-01-10 19:15 UTC

Description Patrik Hildingsson 2018-12-13 21:46:36 UTC
uname -a; FreeBSD lrrr.guld.sen 12.0-RELEASE FreeBSD 12.0-RELEASE r341666 GENERIC  amd64
uname -U; 1200086
uname -K; 1200086

        laggproto lacp lagghash l2,l3,l4
        laggport: em0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
        groups: lagg

fib0 contains
Destination        Gateway            Flags     Netif Expire
default                               UGS       lagg0
                   link#4             U         lagg0
                   link#4             UHS       lo0

fib1 contains
Destination        Gateway            Flags     Netif Expire
default                               UGS       lagg0.200
                   link#17            U         lagg0.200
                   tt:uu:vv:xx:yy:zz  UHS       lagg0.200

When attempting to communicate in fib 1 with remote hosts within the same routing domain, the attempt fails and the following log entry is written:
Dec 13 21:00:03 lrrr kernel: arpresolve: can't allocate llinfo for on lagg0.200

When adding the following entry in fib0 the communication works.
setfib 0 route add -net -iface lagg0.200

In addition, when adding a static ARP entry the communication works.
setfib 1 arp -s ss:uu:vv:xx:yy:zz

It seems to me that ARP should work in either fib, regardless of whether fib0 has a route to the network in question. This issue seems to have been reported before, according to https://lists.freebsd.org/pipermail/freebsd-net/2012-May/032340.html
Comment 1 Patrik Hildingsson 2019-01-10 06:05:36 UTC
I have, since I filed the PR, abandoned the idea of having certain jails use an alternate routing table (FIB) and instead moved to using VIMAGE/VNET. The PR should, however, be reproducible on any system running the filed setup.
Comment 2 Marek Zarychta 2019-01-10 07:42:00 UTC
(In reply to Patrik Hildingsson from comment #1)
Could you please also post the value of the "sysctl net.add_addr_allfibs" setting here?
Comment 3 Patrik Hildingsson 2019-01-10 15:53:04 UTC
(In reply to Marek Zarychta from comment #2)
/boot/loader.conf is set to net.add_addr_allfibs=0
Comment 4 Alan Somers freebsd_committer 2019-01-10 16:05:16 UTC
What command did you use when you attempted to communicate from fib 1?  If you didn't somehow change the process's fib to 1 (like with setfib(1)), then the command would use fib 0.
Comment 5 Patrik Hildingsson 2019-01-10 16:11:51 UTC
(In reply to Alan Somers from comment #4)
I associated a jail to fib 1, running a java app inside of the jail. Several other programs, such as telnet, were executed inside of the jail.

It is my understanding that once you are inside the jail, i.e. running a shell inside the jail, fib 1 is used without having to prefix every command with setfib and the fib number. Executing traceroute inside the jail showed that traceroute used the correct default route in fib 1.
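
One common way to associate a jail with a FIB is the exec.fib jail parameter (see jail(8)). The fragment below is only a sketch under assumptions: the report does not say how the jail was actually configured, and the jail name "appjail" and its path are hypothetical.

```
# /etc/jail.conf (fragment); jail name and path are hypothetical
appjail {
    path = "/usr/jails/appjail";
    exec.fib = 1;                        # run the jail's commands in FIB 1
    exec.start = "/bin/sh /etc/rc";
    exec.stop = "/bin/sh /etc/rc.shutdown";
}
```

With a setting like this, commands started for the jail run in the given FIB, so they do not need an explicit setfib prefix.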
Comment 6 Alan Somers freebsd_committer 2019-01-10 16:42:51 UTC
The surest way to tell if you set the fib correctly is to do "ps -ax -O fib -O jid".
Comment 7 Marek Zarychta 2019-01-10 18:22:54 UTC
(In reply to Patrik Hildingsson from comment #3)
If net.add_addr_allfibs is set to 0, then IMHO everything works as intended. You only have to imprison the lagg0.200 interface with the command "ifconfig lagg0.200 fib 1" and set appropriate routes for fib 1.
You can, of course, add some routes with the -iface option to reduce the impact of the net.add_addr_allfibs=0 setting.
I can't confirm any issues with IPv4/6 routing and arp/ndp resolution running fib based jails on 12.0-STABLE with net.add_addr_allfibs=0 set.
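
For readers following along, the suggestion above might be applied roughly as in the sketch below. It is not taken from the report: the gateway 192.0.2.1 is a placeholder from the documentation address range (the real addresses are not shown in this PR), and the run and bind_iface_to_fib helpers are hypothetical names. By default the script only prints the commands; setting DRY_RUN=0 executes them, which requires root on FreeBSD.

```shell
#!/bin/sh
# Sketch only: associate an interface with a FIB and add a default route there.
# 192.0.2.1 is a placeholder gateway, not an address from the report.

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

bind_iface_to_fib() {
    iface=$1
    fib=$2
    gateway=$3
    # Associate the interface with the FIB (the command from comment 7).
    run ifconfig "$iface" fib "$fib"
    # Add a default route inside that FIB.
    run setfib "$fib" route add default "$gateway"
    # Display the routing table of that FIB for review.
    run netstat -rn -F "$fib"
}

bind_iface_to_fib lagg0.200 1 192.0.2.1
```

Run as-is, it prints the three commands (ifconfig, setfib route add, netstat) without touching the system, so they can be reviewed first.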
Comment 8 Patrik Hildingsson 2019-01-10 19:15:02 UTC
(In reply to Marek Zarychta from comment #7)
Imprisoning the interface to a certain fib is probably what I was lacking in the first place. I will test it on a test system later this month. Please go ahead and close this PR meanwhile.

Thank you all for engaging yourselves in the matter.