I have a Dell R630 with ccX (Chelsio T62100-SO-CR) interfaces aggregated into lagg0 and vlans on top of it. I moved all vlans into a VNET jail, where bird runs and receives a full feed (~900k prefixes).

Sysctls for the jail:
net.route.algo.inet.algo="dxr"
net.route.algo.inet6.algo="dpdk_lpm6"
net.inet.ip.forwarding="1"
net.inet6.ip6.forwarding="1"
net.inet.ip.redirect="0"
net.inet.udp.blackhole="1"
net.inet.tcp.blackhole="2"
net.inet.icmp.drop_redirect="1"

After a few days a crash occurred (console output from several CPUs faulting at once is interleaved):

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x401050168
cpuid = 3; apic id = 06
fault virtual address = 0x0
Fatal trap 12: page fault while in kernel mode
cpuid = 0;
Fatal trap 12: page fault while in kernel mode
apic id = 00
cpuid = 9;
fault virtual address = 0x0
apic id = 12
fault code = supervisor read data, page not present
fault virtual address = 0x0
instruction pointer = 0x20:0xffffffff80d88f94
fault code = supervisor read data, page not present
stack pointer = 0x28:0xfffffe00ff3ed650
instruction pointer = 0x20:0xffffffff80d88f94
stack pointer = 0x28:0xfffffe00ff529650
Fatal trap 12: page fault while in kernel mode
frame pointer = 0x28:0xfffffe00ff529650
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq147: t6nex1:0a9)
trap number = 12
panic: page fault
cpuid = 9
time = 1624957922
KDB: stack backtrace:
#0 0xffffffff80cb0515 at kdb_backtrace+0x65
#1 0xffffffff80c643b1 at vpanic+0x181
#2 0xffffffff80c64223 at panic+0x43
#3 0xffffffff810ee277 at trap_fatal+0x387
#4 0xffffffff810ee2cf at trap_pfault+0x4f
#5 0xffffffff810ed923 at trap+0x253
#6 0xffffffff810c53b8 at calltrap+0x8
#7 0xffffffff80e0ccf9 at ip_tryforward+0x6d9
#8 0xffffffff80e0f066 at ip_input+0x356
#9 0xffffffff80d993fa at netisr_dispatch_src+0xca
#10 0xffffffff80d7d988 at ether_demux+0x148
#11 0xffffffff80d7ed0c at ether_nh_input+0x34c
#12 0xffffffff80d993fa at netisr_dispatch_src+0xca
#13 0xffffffff80d7ddd9 at ether_input+0x69
#14 0xffffffff80d7d971 at ether_demux+0x131
#15 0xffffffff80d7ed0c at ether_nh_input+0x34c
#16 0xffffffff80d993fa at netisr_dispatch_src+0xca
#17 0xffffffff80d7ddd9 at ether_input+0x69
Uptime: 3d9h43m58s
Dumping 5134 out of 32631 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

I used dpdk_lpm4 before and a crash also happened:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x7a
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80df611e
stack pointer = 0x0:0xfffffe00ff38d620
frame pointer = 0x0:0xfffffe00ff38d640
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq98: t6nex0:0a0)
trap number = 12
panic: page fault
cpuid = 0
time = 1624658764
KDB: stack backtrace:
#0 0xffffffff80cab3c5 at kdb_backtrace+0x65
#1 0xffffffff80c5f231 at vpanic+0x181
#2 0xffffffff80c5f0a3 at panic+0x43
#3 0xffffffff810e2277 at trap_fatal+0x387
#4 0xffffffff810e22cf at trap_pfault+0x4f
#5 0xffffffff810e1923 at trap+0x253
#6 0xffffffff810b8ce8 at calltrap+0x8
#7 0xffffffff80e038ee at ip_findroute+0x1e
#8 0xffffffff80e03297 at ip_tryforward+0x247
#9 0xffffffff80e05a96 at ip_input+0x356
#10 0xffffffff80d8fe2a at netisr_dispatch_src+0xca
#11 0xffffffff80d74628 at ether_demux+0x148
#12 0xffffffff80d759ac at ether_nh_input+0x34c
#13 0xffffffff80d8fe2a at netisr_dispatch_src+0xca
#14 0xffffffff80d74a79 at ether_input+0x69
#15 0xffffffff80d74611 at ether_demux+0x131
#16 0xffffffff80d759ac at ether_nh_input+0x34c
#17 0xffffffff80d8fe2a at netisr_dispatch_src+0xca
Uptime: 23h9m10s
Dumping 4852 out of 32631 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%
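For reference, the lookup algorithms above are plain runtime sysctls and can be inspected or switched from the host; a minimal sketch, assuming the vnet jail is called "router" (both the jail name and the exact algo_list leaf are only illustrative and may differ on your build):

# jexec router sysctl net.route.algo.inet.algo_list     # list available IPv4 fib algorithms (e.g. radix4, dxr, dpdk_lpm4)
# jexec router sysctl net.route.algo.inet.algo           # show the currently active IPv4 algorithm
# jexec router sysctl net.route.algo.inet.algo=dxr       # switch the IPv4 algorithm at runtime
# jexec router sysctl net.route.algo.inet6.algo=dpdk_lpm6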
I also noticed that before the crash the server routed traffic for several random prefixes to the wrong interfaces, for example:

# route -vv -n get 8.8.8.8
RTA_DST: inet 8.8.8.8; RTA_IFP: link ; RTM_GET: Report Metrics: len 224, pid: 0, seq 1, errno 0, flags:<UP,GATEWAY,HOST,STATIC>
locks:  inits:
sockaddrs: <DST,IFP>
 8.8.8.8 link#0
   route to: 8.8.8.8
destination: 8.8.8.0
       mask: 255.255.255.0
    gateway: 79.110.194.33
        fib: 0
  interface: vlan2804
      flags: <UP,GATEWAY,DONE,PROTO1>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0
locks:  inits:
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
 8.8.8.0 79.110.194.33 255.255.255.0 vlan2804:0.7.43.64.9e.60 79.110.194.34

I don't see packets to 8.8.8.8 with tcpdump -ni vlan2804; instead I see them counted on another vlan. It happens over time; in the beginning everything looks good.
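A rough way to spot-check this for a handful of destinations is to compare the interface reported by the kernel's route lookup with the interface the packets actually leave on; a sketch only (the destination list and the second vlan name are just examples):

# for dst in 8.8.8.8 1.1.1.1 9.9.9.9; do route -n get "$dst" | awk -v d="$dst" '/interface:/ {print d, "->", $2}'; done
# tcpdump -ni vlan2804 host 8.8.8.8      # the interface the lookup reports
# tcpdump -ni vlan2805 host 8.8.8.8      # a suspect vlan where the packets actually show up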
(In reply to Konrad from comment #1)
> I also notice that before crash server route (affects several random prefixes) to wrong interfaces

It looks like a similar issue; please compare with bug 256833.
Have you configured any non-default net.isr.* settings in /boot/loader.conf?
I have set:
net.isr.maxqlimit="10240"
net.link.ifqmaxlen="10240"
net.isr.defaultqlimit="10240"
(In reply to Konrad from comment #4)
Can you isolate as many configuration variables as possible (lagg / vlan / algo / vnet / ...) to establish a minimal reproduction case?
Is there any chance you can try https://cgit.freebsd.org/src/commit/?id=054948bd81bb9e4e32449cf351b62e501b8831ff and check whether it fixes the crashes?
I applied the patch. I will send you feedback in a few days
Created attachment 226942 [details]
core.txt

After a few hours the system crashed; I attach core.txt. I have set:
net.route.algo.inet.algo=dpdk_lpm4
net.route.algo.inet6.algo=dpdk_lpm6
Okay, so it looks like the returned nexthop has a NULL ifp, which really shouldn't happen. Is there any chance you can privately share the kernel & core?
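For anyone collecting the same data: with a dump device configured, the useful artifacts are the saved crash dump plus the matching kernel and its debug symbols. A rough sketch, assuming the default layout and that the kernel debug files are installed:

# ls /var/crash/                       # vmcore.N, info.N and core.txt.N written by savecore(8)/crashinfo(8)
# ls /usr/lib/debug/boot/kernel/       # kernel.debug and module debug symbols, if installed
# kgdb /boot/kernel/kernel /var/crash/vmcore.0   # open the dump locally (substitute the right dump number)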
I sent you an email with a link
I've reproduced the problem. The proposed fix: https://reviews.freebsd.org/D31546
Hopefully it will land in a couple of days.
(In reply to Alexander V. Chernikov from comment #11)
Q: the first panic mentioned dxr as the IPv4 lookup algo. Was it indeed dxr at the moment of the crash? I'm pretty sure that all the aforementioned panics with dpdk_lpm4 should be fixed by the proposed patch; however, it would be nice to have a bit more clarity on DXR.
Crashes appeared on both dxr and dpdk_lpm4. I've switched to DXR, so I will be able to send crash files.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=36e15b717eec80047fe7442898b5752101f2fbca

commit 36e15b717eec80047fe7442898b5752101f2fbca
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-08-15 22:25:21 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-08-17 20:46:22 +0000

    routing: Fix crashes with dpdk_lpm[46] algo.

    When a prefix gets deleted from the RIB, the dpdk_lpm algo needs to know
    the nexthop of the "parent" prefix to update its internal state.
    The glue code, which utilises the RIB as a backing route store, uses
    fib[46]_lookup_rt() for the prefix destination after its deletion to
    fetch the desired nexthop.

    This approach does not work when deleting less-specific prefixes while
    more-specific ones are still present. For example, if 10.0.0.0/24,
    10.0.0.0/23 and 10.0.0.0/22 exist in the RIB, deleting 10.0.0.0/23
    would result in 10.0.0.0/24 being returned as the search result instead
    of 10.0.0.0/22. This, in turn, results in a failed datastructure update:
    part of the deleted /23 prefix will still contain the reference to an
    old nexthop. This leads to use-after-free behaviour, ending with
    eventual crashes.

    Fix the logic flaw by properly fetching the prefix "parent" via the
    newly-created rt_get_inet[6]_parent() helpers.

    Differential Revision: https://reviews.freebsd.org/D31546
    PR: 256882,256833
    MFC after: 1 week

 sys/contrib/dpdk_rte_lpm/dpdk_lpm.c  |  32 ++++----
 sys/contrib/dpdk_rte_lpm/dpdk_lpm6.c |  42 +++++-----
 sys/net/radix.c                      |  14 ++++
 sys/net/radix.h                      |   1 +
 sys/net/route/route_ctl.h            |   3 +
 sys/net/route/route_helpers.c        | 150 +++++++++++++++++++++++++++++++++++
 6 files changed, 208 insertions(+), 34 deletions(-)
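The flaw described in the commit message can be illustrated with plain route(8) lookups: once the /23 is removed, a longest-prefix-match query for its destination hits the remaining /24 rather than the /22 "parent" that the algo glue code needs. A sketch only; the gateway is a placeholder, and this should be run in a scratch vnet or test FIB rather than on a production table:

# route add -net 10.0.0.0/22 192.0.2.1
# route add -net 10.0.0.0/23 192.0.2.1
# route add -net 10.0.0.0/24 192.0.2.1
# route delete -net 10.0.0.0/23
# route -n get 10.0.0.0        # matches the 10.0.0.0/24 entry, not the /22 parent,
#                              # which is why the old glue code picked up the wrong nexthop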
DXR has been working for 5 days without a crash. Currently I have set:
net.route.algo.inet.algo: dxr
net.route.algo.inet6.algo: radix6

The crash I described in the first post was probably caused by dpdk_lpm6; at that time I had set:
net.route.algo.inet.algo: dxr
net.route.algo.inet6.algo: dpdk_lpm6
(In reply to Alexander V. Chernikov from comment #11)
I applied the patch. After switching the algo to dpdk_lpm4/6 the system crashed immediately:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address = 0x18
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80db0f1d
stack pointer = 0x28:0xfffffe0156c8e310
frame pointer = 0x28:0xfffffe0156c8e320
code segment = base rx0, limit 0xfffff, type 0x1b
             = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12403 (bird)
trap number = 12
panic: page fault
cpuid = 2
time = 1630278779
KDB: stack backtrace:
#0 0xffffffff80cb9475 at kdb_backtrace+0x65
#1 0xffffffff80c6b5e7 at vpanic+0x187
#2 0xffffffff80c6b453 at panic+0x43
#3 0xffffffff810f8577 at trap_fatal+0x387
#4 0xffffffff810f85cf at trap_pfault+0x4f
#5 0xffffffff810f7c4a at trap+0x26a
#6 0xffffffff810cf2e8 at calltrap+0x8
#7 0xffffffff80db360f at rt_get_inet6_parent+0x1af
#8 0xffffffff82549315 at handle_rtable_change_cb+0xc5
#9 0xffffffff80daf61e at handle_rtable_change_cb+0x30e
#10 0xffffffff80db27df at rt_unlinkrte+0x12f
#11 0xffffffff80db187c at rib_del_route+0xbc
#12 0xffffffff80db7991 at route_output+0x12d1
#13 0xffffffff80d0b923 at sosend_generic+0x623
#14 0xffffffff80d0bd90 at sosend+0x50
#15 0xffffffff80ce1be9 at soo_write+0x49
#16 0xffffffff80cd8b78 at dofilewrite+0x88
#17 0xffffffff80cd86cc at sys_write+0xbc
Uptime: 1m39s
Dumping 2863 out of 32631 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

If you need additional files, please let me know.
Thank you for your patience! If you could share core & debug symbols that would be awesome!
(In reply to Alexander V. Chernikov from comment #17)
Should be fixed by f84c30106e8b725774b4e9a32c8dd11c90da8c25; I hadn't tested without the default route and missed that condition.
I applied the patches to a testing environment and it looks good. I will apply them on the target system and send feedback.
I confirm that all your patches resolved the main problem. The system has been working for 3 days without a crash. Thank you!
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5007bc4e13906104163ca78440ffcefb5c126548

commit 5007bc4e13906104163ca78440ffcefb5c126548
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-08-15 22:25:21 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-09-07 21:02:58 +0000

    routing: Fix crashes with dpdk_lpm[46] algo.

    When a prefix gets deleted from the RIB, the dpdk_lpm algo needs to know
    the nexthop of the "parent" prefix to update its internal state.
    The glue code, which utilises the RIB as a backing route store, uses
    fib[46]_lookup_rt() for the prefix destination after its deletion to
    fetch the desired nexthop.

    This approach does not work when deleting less-specific prefixes while
    more-specific ones are still present. For example, if 10.0.0.0/24,
    10.0.0.0/23 and 10.0.0.0/22 exist in the RIB, deleting 10.0.0.0/23
    would result in 10.0.0.0/24 being returned as the search result instead
    of 10.0.0.0/22. This, in turn, results in a failed datastructure update:
    part of the deleted /23 prefix will still contain the reference to an
    old nexthop. This leads to use-after-free behaviour, ending with
    eventual crashes.

    Fix the logic flaw by properly fetching the prefix "parent" via the
    newly-created rt_get_inet[6]_parent() helpers.

    Differential Revision: https://reviews.freebsd.org/D31546
    PR: 256882,256833

    (cherry picked from commit 36e15b717eec80047fe7442898b5752101f2fbca)

 sys/contrib/dpdk_rte_lpm/dpdk_lpm.c  |  32 ++++----
 sys/contrib/dpdk_rte_lpm/dpdk_lpm6.c |  42 +++++-----
 sys/net/radix.c                      |  14 ++++
 sys/net/radix.h                      |   1 +
 sys/net/route/route_ctl.h            |   3 +
 sys/net/route/route_helpers.c        | 150 +++++++++++++++++++++++++++++++++++
 6 files changed, 208 insertions(+), 34 deletions(-)
(In reply to Konrad from comment #20) Has the system been stable since?
Yes, the system has been working stably since then.
(In reply to Konrad from comment #23) Great! I'm going to proceed with the case closure then.
^Triage: the 12 branch is now out of support.