Bug 256833 - dpdk_lpm4 seems to create unsynced RIB/FIB
Summary: dpdk_lpm4 seems to create unsynced RIB/FIB
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Some People
Assignee: Alexander V. Chernikov
URL:
Keywords:
Duplicates: 256834
Depends on:
Blocks:
 
Reported: 2021-06-25 16:19 UTC by Olivier Cochard
Modified: 2021-08-17 21:20 UTC
CC List: 2 users

See Also:


Description Olivier Cochard freebsd_committer 2021-06-25 16:19:49 UTC
A user reported that the FIB on their router is out of sync with the RIB contents.
The router runs FreeBSD head (7b8696bf128), and its routing table is populated by net/frr7 (BGP).


# sysctl net.route.algo
net.route.algo.debug_level: 5
net.route.algo.inet.algo: dpdk_lpm4
net.route.algo.inet.algo_list: dpdk_lpm4, bsearch4, radix4_lockless, radix4
net.route.algo.inet6.algo: dpdk_lpm6
net.route.algo.inet6.algo_list: dpdk_lpm6, radix6_lockless, radix6
net.route.algo.fib_max_sync_delay_ms: 1000
net.route.algo.bucket_change_threshold_rate: 500
net.route.algo.bucket_time_ms: 50

# netstat -4rnW | grep 51.15.0.0
51.15.0.0/17       149.6.174.241      UG1         8   1500       cxl0
51.15.0.0/16       149.6.174.241      UG1         8   1500       cxl0

# netstat -4onW
Nexthop data

Internet:
Idx   Type         IFA                Gateway             Flags      Use Mtu         Netif     Addrif Refcnt Prepend
1       v4/resolve 127.0.0.1          lo0/resolve        H         76508  16384        lo0               2
2       v4/resolve 193.239.188.197    lo1/resolve        H             0  16384        lo1               2
3       v4/resolve 149.6.174.242      cxl0/resolve                192653   1500       cxl0               3
4       v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0      cxl0     2
5            v4/gw 127.0.0.1          127.0.0.1          G1B        3546  16384        lo0               7
6       v4/resolve 193.239.188.38     igb0/resolve               2893089   1500       igb0               5
7       v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0      igb0     2
8            v4/gw 149.6.174.242      149.6.174.241      G1       176047   1500       cxl0          598127
9       v4/resolve 193.239.188.217    igb2/resolve                709843   9216       igb2               3
10      v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0      igb2     2
11      v4/resolve 193.239.188.215    igb3/resolve                708796   9216       igb3               3
12      v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0      igb3     2
13      v4/resolve 95.129.200.114     vlan960/resolve               2450   1500    vlan960               2
14      v4/resolve 127.0.0.1          lo0/resolve        HS            0  16384        lo0   vlan960     2
15           v4/gw 95.129.200.114     95.129.200.123     G1            0   1500    vlan960               4
16           v4/gw 95.129.200.114     95.129.200.118     G1            0   1500    vlan960               5
17           v4/gw 95.129.200.114     95.129.200.124     G1           32   1500    vlan960               3
18           v4/gw 193.239.188.215    193.239.188.214    G1     54272925   9216       igb3               7
19           v4/gw 95.129.200.114     95.129.200.113     GH1           3   1500    vlan960               2
20           v4/gw 95.129.200.114     95.129.200.117     G1        95310   1500    vlan960              14
21           v4/gw 95.129.200.114     95.129.200.120     G1            0   1500    vlan960               8
22           v4/gw 95.129.200.114     95.129.200.117     GH1           0   1500    vlan960               7
23           v4/gw 95.129.200.114     95.129.200.125     G1         3613   1500    vlan960              35
24           v4/gw 95.129.200.114     95.129.200.119     G1            0   1500    vlan960               2
25           v4/gw 193.239.188.215    193.239.188.214    GH1        2662   9216       igb3               4
26           v4/gw 193.239.188.217    193.239.188.216    G1        98212   9216       igb2               4
27           v4/gw 193.239.188.217    193.239.188.216    GH1        1895   9216       igb2               5
28           v4/gw 193.239.188.38     193.239.188.34     G1        31251   1500       igb0          154517
29           v4/gw 193.239.188.38     193.239.188.36     G1        16214   1500       igb0           79933
30           v4/gw 149.6.174.242      149.6.174.241      G1          177   1500       cxl0               2

# traceroute 51.15.183.144
traceroute to 51.15.183.144 (51.15.183.144), 64 hops max, 40 byte packets
 1  em2-910.panem.atnoc.net (193.239.188.36)  0.526 ms  0.255 ms  0.193 ms

=> The next hop used here (193.239.188.36) is reachable only via igb0, but the RIB entry covering this destination (51.15.0.0/16 via 149.6.174.241) says traffic should leave through the cxl0 interface.

Switching the route lookup algo to radix4 fixed the problem.
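
To make the mismatch concrete, here is a minimal sketch of the longest-prefix match that the RIB is expected to perform for the traced destination. It uses only the two routes quoted above; the `RIB` dictionary and the `longest_prefix_match` helper are illustrative names, not kernel code.

```python
import ipaddress

# The two RIB entries quoted in the report: prefix -> (gateway, interface).
RIB = {
    ipaddress.ip_network("51.15.0.0/17"): ("149.6.174.241", "cxl0"),
    ipaddress.ip_network("51.15.0.0/16"): ("149.6.174.241", "cxl0"),
}


def longest_prefix_match(rib, dst):
    """Return (prefix, route) for the most specific prefix covering dst, or None."""
    addr = ipaddress.ip_address(dst)
    candidates = [net for net in rib if addr in net]
    if not candidates:
        return None
    best = max(candidates, key=lambda net: net.prefixlen)
    return best, rib[best]


# 51.15.183.144 lies outside 51.15.0.0/17 (third octet >= 128), so the /16 wins.
prefix, (gateway, ifname) = longest_prefix_match(RIB, "51.15.183.144")
print(prefix, gateway, ifname)
```

Under these assumptions the expected answer is 149.6.174.241 via cxl0, whereas the traceroute shows the packet going to 193.239.188.36 on igb0.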
Comment 1 Marek Zarychta 2021-06-25 16:58:32 UTC
It looks like a duplicate of the already assigned bug 256834.
Comment 2 Olivier Cochard freebsd_committer 2021-06-25 17:00:41 UTC
*** Bug 256834 has been marked as a duplicate of this bug. ***
Comment 3 Olivier Cochard freebsd_committer 2021-06-25 17:01:48 UTC
(In reply to Marek Zarychta from comment #1)
Indeed, it was a double post (no idea how this could happen).
So I've set the duplicate status on 256834 (the newest).
Comment 4 Alexander V. Chernikov freebsd_committer 2021-08-01 11:21:30 UTC
It's a bit tough to root cause what has happened here.

Is there any chance you can load the test_lookup kernel module and run `sysctl net.route.test.run_inet_scan=1` periodically? The module will scan the entire table and print all of the inconsistencies in dmesg.
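
The idea behind such a scan can be sketched as follows. This is only an illustration of comparing RIB answers with FIB answers, not the test_lookup module's implementation; `rib_lookup`, `scan`, and `stale_fib` are hypothetical names, and the stale FIB answer is simulated from the traceroute in the description.

```python
import ipaddress


def rib_lookup(rib, dst):
    """Reference answer: nexthop of the longest RIB prefix covering dst."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in rib if addr in net]
    if not matches:
        return None
    return rib[max(matches, key=lambda net: net.prefixlen)]


def scan(rib, fib_lookup, addresses):
    """Return (address, rib_nexthop, fib_nexthop) for every mismatch found."""
    mismatches = []
    for dst in addresses:
        expected = rib_lookup(rib, dst)
        actual = fib_lookup(dst)
        if expected != actual:
            mismatches.append((dst, expected, actual))
    return mismatches


rib = {ipaddress.ip_network("51.15.0.0/16"): "149.6.174.241"}


def stale_fib(dst):
    # Simulated out-of-sync FIB: always answers with the wrong gateway.
    return "193.239.188.36"


for dst, expected, actual in scan(rib, stale_fib, ["51.15.183.144"]):
    print(f"{dst}: RIB says {expected}, FIB says {actual}")
```

A FIB that agrees with the RIB would produce an empty list, so any output from a scan of this kind points at a specific address to investigate.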
Comment 5 Alexander V. Chernikov freebsd_committer 2021-08-15 22:45:33 UTC
I've reproduced the problem.

The proposed fix: https://reviews.freebsd.org/D31546
Hopefully it will land in a couple of days.
Comment 6 Alexander V. Chernikov freebsd_committer 2021-08-17 21:20:28 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=36e15b717eec80047fe7442898b5752101f2fbca

commit 36e15b717eec80047fe7442898b5752101f2fbca
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-08-15 22:25:21 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-08-17 20:46:22 +0000

    routing: Fix crashes with dpdk_lpm[46] algo.

    When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know
     the nexthop of the "parent" prefix to update its internal state.
    The glue code, which utilises RIB as a backing route store, uses
     fib[46]_lookup_rt() for the prefix destination after its deletion
     to fetch the desired nexthop.
    This approach does not work when deleting less-specific prefixes
     while more-specific ones are still present. For example, if
     10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting
     10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search
     result instead of 10.0.0.0/22. This, in turn, results in the failed
     datastructure update: part of the deleted /23 prefix will still
     contain the reference to an old nexthop. This leads to the
     use-after-free behaviour, ending with the eventual crashes.

    Fix the logic flaw by properly fetching the prefix "parent" via
     newly-created rt_get_inet[6]_parent() helpers.

    Differential Revision: https://reviews.freebsd.org/D31546
    PR:     256882,256833
    MFC after:      1 week

 sys/contrib/dpdk_rte_lpm/dpdk_lpm.c  |  32 ++++----
 sys/contrib/dpdk_rte_lpm/dpdk_lpm6.c |  42 +++++-----
 sys/net/radix.c                      |  14 ++++
 sys/net/radix.h                      |   1 +
 sys/net/route/route_ctl.h            |   3 +
 sys/net/route/route_helpers.c        | 150 +++++++++++++++++++++++++++++++++++
 6 files changed, 208 insertions(+), 34 deletions(-)
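
The logic flaw described in the commit message can be reproduced with a small sketch using the same three example prefixes. The `lookup` and `parent_prefix` helpers below are hypothetical stand-ins for the address lookup and the parent search; they are not the kernel's fib[46]_lookup_rt() or rt_get_inet[6]_parent() code.

```python
import ipaddress


def lookup(rib, dst):
    """Longest-prefix match of a single address against the remaining RIB."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in rib if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


def parent_prefix(rib, deleted):
    """Longest remaining prefix that is strictly less specific than `deleted`
    and fully covers it."""
    covering = [net for net in rib
                if net.prefixlen < deleted.prefixlen and deleted.subnet_of(net)]
    return max(covering, key=lambda net: net.prefixlen, default=None)


rib = {ipaddress.ip_network(p)
       for p in ("10.0.0.0/24", "10.0.0.0/23", "10.0.0.0/22")}
deleted = ipaddress.ip_network("10.0.0.0/23")
rib.remove(deleted)

# Looking up the deleted prefix's address finds the more specific /24 ...
by_lookup = lookup(rib, str(deleted.network_address))
# ... while the prefix that should take over the /23 range is the /22.
by_parent = parent_prefix(rib, deleted)
print(by_lookup, by_parent)
```

In this sketch the address lookup returns 10.0.0.0/24 and the parent search returns 10.0.0.0/22, which matches the example in the commit message: only the parent search yields the nexthop that should replace the deleted /23.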