A user reported that the FIB on their router is out of sync with the RIB. The router runs FreeBSD head (7b8696bf128), and its routing table is populated by net/frr7 (BGP).

# sysctl net.route.algo
net.route.algo.debug_level: 5
net.route.algo.inet.algo: dpdk_lpm4
net.route.algo.inet.algo_list: dpdk_lpm4, bsearch4, radix4_lockless, radix4
net.route.algo.inet6.algo: dpdk_lpm6
net.route.algo.inet6.algo_list: dpdk_lpm6, radix6_lockless, radix6
net.route.algo.fib_max_sync_delay_ms: 1000
net.route.algo.bucket_change_threshold_rate: 500
net.route.algo.bucket_time_ms: 50

# netstat -4rnW | grep 51.15.0.0
51.15.0.0/17       149.6.174.241      UG1         8   1500       cxl0
51.15.0.0/16       149.6.174.241      UG1         8   1500       cxl0

# netstat -4onW
Nexthop data

Internet:
Idx  Type        IFA              Gateway          Flags  Use       Mtu    Netif    Addrif   Refcnt  Prepend
1    v4/resolve  127.0.0.1        lo0/resolve      H      76508     16384  lo0               2
2    v4/resolve  193.239.188.197  lo1/resolve      H      0         16384  lo1               2
3    v4/resolve  149.6.174.242    cxl0/resolve            192653    1500   cxl0              3
4    v4/resolve  127.0.0.1        lo0/resolve      HS     0         16384  lo0      cxl0     2
5    v4/gw       127.0.0.1        127.0.0.1        G1B    3546      16384  lo0               7
6    v4/resolve  193.239.188.38   igb0/resolve            2893089   1500   igb0              5
7    v4/resolve  127.0.0.1        lo0/resolve      HS     0         16384  lo0      igb0     2
8    v4/gw       149.6.174.242    149.6.174.241    G1     176047    1500   cxl0              598127
9    v4/resolve  193.239.188.217  igb2/resolve            709843    9216   igb2              3
10   v4/resolve  127.0.0.1        lo0/resolve      HS     0         16384  lo0      igb2     2
11   v4/resolve  193.239.188.215  igb3/resolve            708796    9216   igb3              3
12   v4/resolve  127.0.0.1        lo0/resolve      HS     0         16384  lo0      igb3     2
13   v4/resolve  95.129.200.114   vlan960/resolve         2450      1500   vlan960           2
14   v4/resolve  127.0.0.1        lo0/resolve      HS     0         16384  lo0      vlan960  2
15   v4/gw       95.129.200.114   95.129.200.123   G1     0         1500   vlan960           4
16   v4/gw       95.129.200.114   95.129.200.118   G1     0         1500   vlan960           5
17   v4/gw       95.129.200.114   95.129.200.124   G1     32        1500   vlan960           3
18   v4/gw       193.239.188.215  193.239.188.214  G1     54272925  9216   igb3              7
19   v4/gw       95.129.200.114   95.129.200.113   GH1    3         1500   vlan960           2
20   v4/gw       95.129.200.114   95.129.200.117   G1     95310     1500   vlan960           14
21   v4/gw       95.129.200.114   95.129.200.120   G1     0         1500   vlan960           8
22   v4/gw       95.129.200.114   95.129.200.117   GH1    0         1500   vlan960           7
23   v4/gw       95.129.200.114   95.129.200.125   G1     3613      1500   vlan960           35
24   v4/gw       95.129.200.114   95.129.200.119   G1     0         1500   vlan960           2
25   v4/gw       193.239.188.215  193.239.188.214  GH1    2662      9216   igb3              4
26   v4/gw       193.239.188.217  193.239.188.216  G1     98212     9216   igb2              4
27   v4/gw       193.239.188.217  193.239.188.216  GH1    1895      9216   igb2              5
28   v4/gw       193.239.188.38   193.239.188.34   G1     31251     1500   igb0              154517
29   v4/gw       193.239.188.38   193.239.188.36   G1     16214     1500   igb0              79933
30   v4/gw       149.6.174.242    149.6.174.241    G1     177       1500   cxl0              2

# traceroute 51.15.183.144
traceroute to 51.15.183.144 (51.15.183.144), 64 hops max, 40 byte packets
 1  em2-910.panem.atnoc.net (193.239.188.36)  0.526 ms  0.255 ms  0.193 ms

=> The next hop used here (193.239.188.36) is reachable only via igb0, but the routing entry says the traffic should leave through the cxl0 interface.

Switching the route lookup algorithm to radix4 fixed the problem.
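For readers following along, the expected lookup result can be checked by hand. The sketch below is only an illustration, not the kernel code: it takes the two routes shown by the grep above (the full table may well contain other prefixes) and applies longest-prefix matching to the traceroute destination. The helper name `longest_match` is made up for this example.

```python
import ipaddress

# The two routes from the "netstat -4rnW | grep 51.15.0.0" output:
# prefix -> (gateway, outgoing interface). Other routes are not considered.
routes = {
    ipaddress.ip_network("51.15.0.0/17"): ("149.6.174.241", "cxl0"),
    ipaddress.ip_network("51.15.0.0/16"): ("149.6.174.241", "cxl0"),
}

def longest_match(table, addr):
    """Return the most specific prefix in table that contains addr, or None."""
    ip = ipaddress.ip_address(addr)
    covering = [net for net in table if ip in net]
    return max(covering, key=lambda net: net.prefixlen, default=None)

best = longest_match(routes, "51.15.183.144")
print(best, routes[best])
```

Of these two routes, only 51.15.0.0/16 covers 51.15.183.144 (the /17 ends at 51.15.127.255), so the RIB points the destination at 149.6.174.241 via cxl0, while the traceroute shows the packet leaving towards 193.239.188.36 on igb0.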
It looks like a duplicate of the already assigned bug 256834.
*** Bug 256834 has been marked as a duplicate of this bug. ***
(In reply to Marek Zarychta from comment #1) Indeed, it was a double post (no idea how this could happen). So I've set the duplicate status on 256834 (the newest).
It's a bit tough to root-cause what has happened here. Is there any chance you can load the test_lookup kernel module and run `sysctl net.route.test.run_inet_scan=1` periodically? The module will scan the entire table and print all of the inconsistencies to dmesg.
I've reproduced the problem. The proposed fix: https://reviews.freebsd.org/D31546
Hopefully I will land it in a couple of days.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=36e15b717eec80047fe7442898b5752101f2fbca

commit 36e15b717eec80047fe7442898b5752101f2fbca
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-08-15 22:25:21 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-08-17 20:46:22 +0000

    routing: Fix crashes with dpdk_lpm[46] algo.

    When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know
    the nexthop of the "parent" prefix to update its internal state.
    The glue code, which utilises RIB as a backing route store, uses
    fib[46]_lookup_rt() for the prefix destination after its deletion
    to fetch the desired nexthop.
    This approach does not work when deleting less-specific prefixes
    with most-specific ones are still present. For example, if
    10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting
    10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search
    result instead of 10.0.0.0/22. This, in turn, results in the failed
    datastructure update: part of the deleted /23 prefix will still
    contain the reference to an old nexthop. This leads to the
    use-after-free behaviour, ending with the eventual crashes.

    Fix the logic flaw by properly fetching the prefix "parent" via
    newly-created rt_get_inet[6]_parent() helpers.

    Differential Revision: https://reviews.freebsd.org/D31546
    PR:             256882,256833
    MFC after:      1 week

 sys/contrib/dpdk_rte_lpm/dpdk_lpm.c  |  32 ++++----
 sys/contrib/dpdk_rte_lpm/dpdk_lpm6.c |  42 +++++-----
 sys/net/radix.c                      |  14 ++++
 sys/net/radix.h                      |   1 +
 sys/net/route/route_ctl.h            |   3 +
 sys/net/route/route_helpers.c        | 150 +++++++++++++++++++++++++++++++++++
 6 files changed, 208 insertions(+), 34 deletions(-)
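The logic flaw described in the commit message can be illustrated with a small user-space sketch. This is not the kernel implementation: `lpm_lookup` and `find_parent` are hypothetical stand-ins for a lookup on the deleted prefix's destination and for a "parent" search in the spirit of rt_get_inet[6]_parent(), applied to the 10.0.0.0/22, /23 and /24 example from the commit.

```python
import ipaddress

def lpm_lookup(prefixes, addr):
    """Longest-prefix match: the most specific prefix containing addr, or None."""
    ip = ipaddress.ip_address(addr)
    covering = [p for p in prefixes if ip in p]
    return max(covering, key=lambda p: p.prefixlen, default=None)

def find_parent(prefixes, deleted):
    """Longest remaining prefix that is strictly less specific than deleted
    and covers all of it, or None if there is no such prefix."""
    parents = [
        p for p in prefixes
        if p.prefixlen < deleted.prefixlen and deleted.subnet_of(p)
    ]
    return max(parents, key=lambda p: p.prefixlen, default=None)

# The example from the commit message.
rib = {ipaddress.ip_network(p)
       for p in ("10.0.0.0/24", "10.0.0.0/23", "10.0.0.0/22")}
deleted = ipaddress.ip_network("10.0.0.0/23")
rib.remove(deleted)

# Old approach: look up the deleted prefix's destination address.
by_lookup = lpm_lookup(rib, str(deleted.network_address))
# Fixed approach: search for the covering, less specific prefix.
by_parent = find_parent(rib, deleted)

print(by_lookup)  # 10.0.0.0/24
print(by_parent)  # 10.0.0.0/22
```

The plain lookup returns the more specific /24, which says nothing about 10.0.1.0/24, the other half of the deleted /23. The parent search returns the /22, whose nexthop is what that half should inherit after the deletion.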