Summary: | dpdk_lpm4 seems to create unsynced RIB/FIB | ||
---|---|---|---|
Product: | Base System | Reporter: | Olivier Cochard <olivier> |
Component: | kern | Assignee: | Alexander V. Chernikov <melifaro> |
Status: | Closed FIXED | ||
Severity: | Affects Some People | CC: | konrad.kreciwilk, zarychtam |
Priority: | --- | ||
Version: | CURRENT | ||
Hardware: | Any | ||
OS: | Any | ||
See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=256882 |
Description
Olivier Cochard
2021-06-25 16:19:49 UTC
It looks like a duplicate of already assigned bug 256834

*** Bug 256834 has been marked as a duplicate of this bug. ***

(In reply to Marek Zarychta from comment #1)
Indeed, it was a double post (no idea how this could happen). So I've set the duplicate status on 256834 (the newest).

It's a bit tough to root cause what has happened here. Is there any chance you can load test_lookup kernel module and run `sysctl net.route.test.run_inet_scan=1` periodically? The module will scan the entire table and print all of the inconsistencies in dmesg.

I've reproduced the problem. The proposed fix: https://reviews.freebsd.org/D31546
Hopefully will land it in a couple of days.

A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=36e15b717eec80047fe7442898b5752101f2fbca

commit 36e15b717eec80047fe7442898b5752101f2fbca
Author: Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-08-15 22:25:21 +0000
Commit: Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-08-17 20:46:22 +0000

routing: Fix crashes with dpdk_lpm[46] algo.

When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know the nexthop of the "parent" prefix to update its internal state. The glue code, which utilises RIB as a backing route store, uses fib[46]_lookup_rt() for the prefix destination after its deletion to fetch the desired nexthop. This approach does not work when deleting less-specific prefixes with most-specific ones are still present. For example, if 10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting 10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search result instead of 10.0.0.0/22. This, in turn, results in the failed datastructure update: part of the deleted /23 prefix will still contain the reference to an old nexthop. This leads to the use-after-free behaviour, ending with the eventual crashes.

Fix the logic flaw by properly fetching the prefix "parent" via newly-created rt_get_inet[6]_parent() helpers.

Differential Revision: https://reviews.freebsd.org/D31546
PR: 256882,256833
MFC after: 1 week

sys/contrib/dpdk_rte_lpm/dpdk_lpm.c | 32 ++++----
sys/contrib/dpdk_rte_lpm/dpdk_lpm6.c | 42 +++++-----
sys/net/radix.c | 14 ++++
sys/net/radix.h | 1 +
sys/net/route/route_ctl.h | 3 +
sys/net/route/route_helpers.c | 150 +++++++++++++++++++++++++++++++++++
6 files changed, 208 insertions(+), 34 deletions(-)
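
For readers who want to see the lookup flaw from the commit message in isolation, here is a minimal userland sketch in Python. It is not the kernel code: the RIB is modelled as a plain set of prefixes, and `lpm_lookup()` and `parent_of()` are hypothetical helpers that only imitate what a longest-prefix-match lookup and a "parent" lookup would return. It uses the same three prefixes as the commit message.

```python
import ipaddress


def lpm_lookup(rib, addr):
    """Longest-prefix match: the most specific prefix in rib that contains addr."""
    addr = ipaddress.ip_address(addr)
    matches = [p for p in rib if addr in p]
    return max(matches, key=lambda p: p.prefixlen, default=None)


def parent_of(rib, prefix):
    """Most specific prefix in rib that is strictly shorter than prefix and covers it."""
    candidates = [
        p for p in rib
        if p.prefixlen < prefix.prefixlen and prefix.subnet_of(p)
    ]
    return max(candidates, key=lambda p: p.prefixlen, default=None)


# Toy RIB holding the three prefixes from the commit message.
rib = {
    ipaddress.ip_network(p)
    for p in ("10.0.0.0/24", "10.0.0.0/23", "10.0.0.0/22")
}

# Delete the middle prefix, then ask "who covers this range now?" in two ways.
deleted = ipaddress.ip_network("10.0.0.0/23")
rib.remove(deleted)

# Looking up the deleted prefix's destination address finds the more
# specific /24, which says nothing about the other half of the /23.
via_lookup = lpm_lookup(rib, deleted.network_address)

# Looking for a strictly less specific covering prefix finds the /22.
via_parent = parent_of(rib, deleted)

print("lookup by destination:", via_lookup)
print("parent of deleted prefix:", via_parent)
print("covers 10.0.1.1 after deletion:", lpm_lookup(rib, "10.0.1.1"))
```

In this model the destination lookup returns 10.0.0.0/24 while the parent search returns 10.0.0.0/22. The upper half of the deleted /23 (for example 10.0.1.1) is covered only by the /22, which is why the parent's nexthop is the one needed for the update.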