Bug 256882 - cc(4): Panic on DELL R630 with Chelsio T62100-SO-CR, lagg0 and vlans in VNET jails: ip_tryforward / ip_findroute
Status: In Progress
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Alexander V. Chernikov
URL: https://reviews.freebsd.org/D31546
Keywords: crash, needs-qa
Depends on:
Blocks:
 
Reported: 2021-06-29 10:31 UTC by Konrad
Modified: 2021-09-07 21:14 UTC
7 users

See Also:
konrad.kreciwilk: maintainer-feedback+
koobs: mfc-stable13?
koobs: mfc-stable12?


Attachments
core.txt (170.76 KB, text/plain)
2021-08-04 10:33 UTC, Konrad
no flags

Description Konrad 2021-06-29 10:31:42 UTC
I have a DELL R630 with ccX (Chelsio T62100-SO-CR) interfaces aggregated into lagg0, with VLANs on top of it. I moved all the VLANs into a VNET jail. The jail runs BIRD, which receives a full feed (900k prefixes). Sysctl settings for the jail:

net.route.algo.inet.algo="dxr"
net.route.algo.inet6.algo="dpdk_lpm6"
net.inet.ip.forwarding="1"
net.inet6.ip6.forwarding="1"
net.inet.ip.redirect="0"
net.inet.udp.blackhole="1"
net.inet.tcp.blackhole="2"
net.inet.icmp.drop_redirect="1"


After a few days, a crash occurred:


Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02


Fatal trap 12: page fault while in kernel mode

fault virtual address	= 0x401050168
cpuid = 3; apic id = 06



fault virtual address	= 0x0
Fatal trap 12: page fault while in kernel mode

cpuid = 0; Fatal trap 12: page fault while in kernel mode
apic id = 00
cpuid = 9; fault virtual address	= 0x0
apic id = 12
fault code		= supervisor read data, page not present
fault virtual address	= 0x0
instruction pointer	= 0x20:0xffffffff80d88f94
fault code		= supervisor read data, page not present
stack pointer	        = 0x28:0xfffffe00ff3ed650

instruction pointer	= 0x20:0xffffffff80d88f94

stack pointer	        = 0x28:0xfffffe00ff529650
Fatal trap 12: page fault while in kernel mode
frame pointer	        = 0x28:0xfffffe00ff529650
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 12 (irq147: t6nex1:0a9)
trap number		= 12
panic: page fault
cpuid = 9
time = 1624957922
KDB: stack backtrace:
#0 0xffffffff80cb0515 at kdb_backtrace+0x65
#1 0xffffffff80c643b1 at vpanic+0x181
#2 0xffffffff80c64223 at panic+0x43
#3 0xffffffff810ee277 at trap_fatal+0x387
#4 0xffffffff810ee2cf at trap_pfault+0x4f
#5 0xffffffff810ed923 at trap+0x253
#6 0xffffffff810c53b8 at calltrap+0x8
#7 0xffffffff80e0ccf9 at ip_tryforward+0x6d9
#8 0xffffffff80e0f066 at ip_input+0x356
#9 0xffffffff80d993fa at netisr_dispatch_src+0xca
#10 0xffffffff80d7d988 at ether_demux+0x148
#11 0xffffffff80d7ed0c at ether_nh_input+0x34c
#12 0xffffffff80d993fa at netisr_dispatch_src+0xca
#13 0xffffffff80d7ddd9 at ether_input+0x69
#14 0xffffffff80d7d971 at ether_demux+0x131
#15 0xffffffff80d7ed0c at ether_nh_input+0x34c
#16 0xffffffff80d993fa at netisr_dispatch_src+0xca
#17 0xffffffff80d7ddd9 at ether_input+0x69
Uptime: 3d9h43m58s
Dumping 5134 out of 32631 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%



I used dpdk_lpm4 before, and the crash happened with it as well:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0x7a
fault code		= supervisor read data, page not present
instruction pointer	= 0x20:0xffffffff80df611e
stack pointer	        = 0x0:0xfffffe00ff38d620
frame pointer	        = 0x0:0xfffffe00ff38d640
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 12 (irq98: t6nex0:0a0)
trap number		= 12
panic: page fault
cpuid = 0
time = 1624658764
KDB: stack backtrace:
#0 0xffffffff80cab3c5 at kdb_backtrace+0x65
#1 0xffffffff80c5f231 at vpanic+0x181
#2 0xffffffff80c5f0a3 at panic+0x43
#3 0xffffffff810e2277 at trap_fatal+0x387
#4 0xffffffff810e22cf at trap_pfault+0x4f
#5 0xffffffff810e1923 at trap+0x253
#6 0xffffffff810b8ce8 at calltrap+0x8
#7 0xffffffff80e038ee at ip_findroute+0x1e
#8 0xffffffff80e03297 at ip_tryforward+0x247
#9 0xffffffff80e05a96 at ip_input+0x356
#10 0xffffffff80d8fe2a at netisr_dispatch_src+0xca
#11 0xffffffff80d74628 at ether_demux+0x148
#12 0xffffffff80d759ac at ether_nh_input+0x34c
#13 0xffffffff80d8fe2a at netisr_dispatch_src+0xca
#14 0xffffffff80d74a79 at ether_input+0x69
#15 0xffffffff80d74611 at ether_demux+0x131
#16 0xffffffff80d759ac at ether_nh_input+0x34c
#17 0xffffffff80d8fe2a at netisr_dispatch_src+0xca
Uptime: 23h9m10s
Dumping 4852 out of 32631 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

------------------------------------------------------------------------
Comment 1 Konrad 2021-06-29 15:31:47 UTC
I also noticed that, before the crash, the server routes some prefixes (several random ones) to the wrong interfaces, for example:

# route -vv -n get 8.8.8.8
RTA_DST: inet 8.8.8.8; RTA_IFP: link ; RTM_GET: Report Metrics: len 224, pid: 0, seq 1, errno 0, flags:<UP,GATEWAY,HOST,STATIC>
locks:  inits: 
sockaddrs: <DST,IFP>
 8.8.8.8 link#0
   route to: 8.8.8.8
destination: 8.8.8.0
       mask: 255.255.255.0
    gateway: 79.110.194.33
        fib: 0
  interface: vlan2804
      flags: <UP,GATEWAY,DONE,PROTO1>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0 

locks:  inits: 
sockaddrs: <DST,GATEWAY,NETMASK,IFP,IFA>
 8.8.8.0 79.110.194.33 255.255.255.0 vlan2804:0.7.43.64.9e.60 79.110.194.34


I don't see packets to 8.8.8.8 with tcpdump -ni vlan2804; instead, I counted them on another VLAN. This develops over time; in the beginning everything looks good.
Comment 2 Marek Zarychta 2021-06-29 15:35:09 UTC
(In reply to Konrad from comment #1)
>I also notice that before crash server route (affects several random prefixes) to wrong interfaces

It looks similar to bug 256833; please compare.
Comment 3 Marek Zarychta 2021-06-29 15:56:49 UTC
Have you configured any non-default net.isr.* settings in /boot/loader.conf?
Comment 4 Konrad 2021-06-29 17:01:38 UTC
I have set:

net.isr.maxqlimit="10240"
net.link.ifqmaxlen="10240"
net.isr.defaultqlimit="10240"
Comment 5 Kubilay Kocak freebsd_committer freebsd_triage 2021-06-29 23:25:49 UTC
(In reply to Konrad from comment #4)

Can you isolate as many configuration variables as possible (lagg / vlan / algo / vnet) to establish a minimal reproduction case?
Comment 6 Alexander V. Chernikov freebsd_committer 2021-08-01 11:23:49 UTC
Is there any chance you can try https://cgit.freebsd.org/src/commit/?id=054948bd81bb9e4e32449cf351b62e501b8831ff and check whether it fixes the crashes?
Comment 7 Konrad 2021-08-03 13:10:45 UTC
I applied the patch. I will send you feedback in a few days.
Comment 8 Konrad 2021-08-04 10:33:21 UTC
Created attachment 226942
core.txt

After a few hours the system crashed; I attached core.txt.
I have set:
net.route.algo.inet.algo=dpdk_lpm4
net.route.algo.inet6.algo=dpdk_lpm6
Comment 9 Alexander V. Chernikov freebsd_committer 2021-08-07 12:50:22 UTC
Okay, so it looks like the returned nexthop has a NULL ifp, which really shouldn't happen.
Is there any chance you can privately share the kernel & core?
Comment 10 Konrad 2021-08-09 09:58:25 UTC
I sent you an email with a link
Comment 11 Alexander V. Chernikov freebsd_committer 2021-08-15 22:40:51 UTC
I've reproduced the problem.

The proposed fix: https://reviews.freebsd.org/D31546
Hopefully I will land it in a couple of days.
Comment 12 Alexander V. Chernikov freebsd_committer 2021-08-15 22:44:54 UTC
(In reply to Alexander V. Chernikov from comment #11)
Q: the first panic mentioned dxr as the IPv4 lookup algo.
Was it indeed dxr at the moment of crash?
I'm pretty sure that all the aforementioned panics with dpdk_lpm4 should be fixed by the proposed patch; however, it would be nice to have a bit more clarity on DXR.
Comment 13 Konrad 2021-08-16 09:29:46 UTC
The crash appeared with both dxr and dpdk_lpm4.
I've switched to DXR, so I will be able to send crash files.
Comment 14 commit-hook freebsd_committer 2021-08-17 21:12:32 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=36e15b717eec80047fe7442898b5752101f2fbca

commit 36e15b717eec80047fe7442898b5752101f2fbca
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-08-15 22:25:21 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-08-17 20:46:22 +0000

    routing: Fix crashes with dpdk_lpm[46] algo.

    When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know
     the nexthop of the "parent" prefix to update its internal state.
    The glue code, which utilises RIB as a backing route store, uses
     fib[46]_lookup_rt() for the prefix destination after its deletion
     to fetch the desired nexthop.
    This approach does not work when deleting less-specific prefixes
     with most-specific ones are still present. For example, if
     10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting
     10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search
     result instead of 10.0.0.0/22. This, in turn, results in the failed
     datastructure update: part of the deleted /23 prefix will still
     contain the reference to an old nexthop. This leads to the
     use-after-free behaviour, ending with the eventual crashes.

    Fix the logic flaw by properly fetching the prefix "parent" via
     newly-created rt_get_inet[6]_parent() helpers.

    Differential Revision: https://reviews.freebsd.org/D31546
    PR:     256882,256833
    MFC after:      1 week

 sys/contrib/dpdk_rte_lpm/dpdk_lpm.c  |  32 ++++----
 sys/contrib/dpdk_rte_lpm/dpdk_lpm6.c |  42 +++++-----
 sys/net/radix.c                      |  14 ++++
 sys/net/radix.h                      |   1 +
 sys/net/route/route_ctl.h            |   3 +
 sys/net/route/route_helpers.c        | 150 +++++++++++++++++++++++++++++++++++
 6 files changed, 208 insertions(+), 34 deletions(-)
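The lookup flaw described in the commit message can be illustrated with a toy longest-prefix-match table. The sketch below is a hypothetical, simplified model (a linear scan over an array, with made-up names struct prefix, lpm_lookup() and find_parent()); it is not the kernel radix code and not the actual rt_get_inet_parent() implementation. It shows that after 10.0.0.0/23 is deleted, a plain lookup of the address 10.0.0.0 matches the remaining /24, while the correct "parent" is the longest remaining prefix that is strictly shorter than /23 and covers it, i.e. the /22.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified IPv4 RIB entry (not the kernel's struct rtentry). */
struct prefix {
	uint32_t addr;    /* network address, host byte order */
	int      len;     /* prefix length, 0..32 */
	int      nhop;    /* hypothetical nexthop identifier */
	int      present; /* 1 while the prefix is in the table */
};

/* Netmask for a prefix length; len 0 is special-cased to avoid a 32-bit shift. */
static uint32_t
mask_of(int len)
{
	return (len == 0 ? 0 : 0xffffffffu << (32 - len));
}

/* Does prefix p contain address addr? */
static int
covers(const struct prefix *p, uint32_t addr)
{
	return ((addr & mask_of(p->len)) == p->addr);
}

/*
 * Plain longest-prefix match for one address. This models the
 * fib[46]_lookup_rt() approach from the commit message: looking up the
 * deleted prefix's address may hit a MORE specific prefix.
 */
static const struct prefix *
lpm_lookup(const struct prefix *tbl, size_t n, uint32_t addr)
{
	const struct prefix *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].present || !covers(&tbl[i], addr))
			continue;
		if (best == NULL || tbl[i].len > best->len)
			best = &tbl[i];
	}
	return (best);
}

/*
 * Parent of prefix addr/len: the longest remaining prefix that is
 * strictly shorter than len and covers addr. Returns NULL if none exists.
 */
static const struct prefix *
find_parent(const struct prefix *tbl, size_t n, uint32_t addr, int len)
{
	const struct prefix *best = NULL;

	assert(len >= 0 && len <= 32);
	for (size_t i = 0; i < n; i++) {
		if (!tbl[i].present || tbl[i].len >= len ||
		    !covers(&tbl[i], addr))
			continue;
		if (best == NULL || tbl[i].len > best->len)
			best = &tbl[i];
	}
	return (best);
}
```

With the three prefixes from the commit message in the table and the /23 marked as deleted, lpm_lookup(tbl, 3, 0x0a000000) returns the /24 entry, whereas find_parent(tbl, 3, 0x0a000000, 23) returns the /22 entry. Calling find_parent() for the /22 itself in the same table returns NULL, which is the case a caller must handle when no covering route (such as a default route) remains.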
Comment 15 Konrad 2021-08-20 09:59:08 UTC
DXR has been working for 5 days without a crash. Currently I have set:

net.route.algo.inet.algo: dxr
net.route.algo.inet6.algo: radix6

The crash I described in the first post was probably caused by dpdk_lpm6; at that time I had set:

net.route.algo.inet.algo: dxr
net.route.algo.inet6.algo: dpdk_lpm6
Comment 16 Konrad 2021-08-29 23:26:59 UTC
(In reply to Alexander V. Chernikov from comment #11)
I applied the patch. After switching the algo to dpdk_lpm[46], the system crashed immediately:

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address   = 0x18
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80db0f1d
stack pointer           = 0x28:0xfffffe0156c8e310
frame pointer           = 0x28:0xfffffe0156c8e320
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12403 (bird)
trap number             = 12
panic: page fault
cpuid = 2
time = 1630278779
KDB: stack backtrace:
#0 0xffffffff80cb9475 at kdb_backtrace+0x65
#1 0xffffffff80c6b5e7 at vpanic+0x187
#2 0xffffffff80c6b453 at panic+0x43
#3 0xffffffff810f8577 at trap_fatal+0x387
#4 0xffffffff810f85cf at trap_pfault+0x4f
#5 0xffffffff810f7c4a at trap+0x26a
#6 0xffffffff810cf2e8 at calltrap+0x8
#7 0xffffffff80db360f at rt_get_inet6_parent+0x1af
#8 0xffffffff82549315 at handle_rtable_change_cb+0xc5
#9 0xffffffff80daf61e at handle_rtable_change_cb+0x30e
#10 0xffffffff80db27df at rt_unlinkrte+0x12f
#11 0xffffffff80db187c at rib_del_route+0xbc
#12 0xffffffff80db7991 at route_output+0x12d1
#13 0xffffffff80d0b923 at sosend_generic+0x623
#14 0xffffffff80d0bd90 at sosend+0x50
#15 0xffffffff80ce1be9 at soo_write+0x49
#16 0xffffffff80cd8b78 at dofilewrite+0x88
#17 0xffffffff80cd86cc at sys_write+0xbc
Uptime: 1m39s
Dumping 2863 out of 32631 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%


If you need additional files, please let me know.
Comment 17 Alexander V. Chernikov freebsd_committer 2021-08-30 06:52:40 UTC
Thank you for your patience! If you could share core & debug symbols that would be awesome!
Comment 18 Alexander V. Chernikov freebsd_committer 2021-08-31 08:16:23 UTC
(In reply to Alexander V. Chernikov from comment #17)
Should be fixed by f84c30106e8b725774b4e9a32c8dd11c90da8c25: I hadn't tested without the default route and missed that condition.
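Comment 18 attributes the follow-up crash to the case where no default route exists. The backtrace in comment 16 faults inside rt_get_inet6_parent() at virtual address 0x18; a fault address that small typically indicates a field read through a NULL pointer. The hypothetical sketch below (struct rt_entry, parent_of() and nhop_after_delete() are made-up names; this is not the code of commit f84c30106e8b) illustrates the general point: when a prefix is deleted and no shorter covering prefix remains, the parent lookup has to be allowed to return "nothing", and the caller must treat that as "no route" rather than dereference the result.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified route entry. Every entry passed in is assumed
 * to overlap the deleted prefix (it either contains it or is contained
 * in it), so only the prefix length matters here.
 */
struct rt_entry {
	int len;     /* prefix length; 0 means a default route */
	int nhop_id; /* hypothetical nexthop identifier, never 0 */
};

/* Longest entry strictly shorter than len, or NULL if there is none. */
static const struct rt_entry *
parent_of(const struct rt_entry *covering, size_t n, int len)
{
	const struct rt_entry *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (covering[i].len >= len)
			continue;
		if (best == NULL || covering[i].len > best->len)
			best = &covering[i];
	}
	return (best);
}

/*
 * Nexthop that should take over the range of a deleted prefix.
 * Returns 0 ("no route") when there is no parent, e.g. because the
 * table has no default route, instead of dereferencing NULL.
 */
static int
nhop_after_delete(const struct rt_entry *covering, size_t n, int len)
{
	const struct rt_entry *parent = parent_of(covering, n, len);

	if (parent == NULL)
		return (0);
	assert(parent->nhop_id != 0);
	return (parent->nhop_id);
}
```

For example, with entries of length /0, /32 and /64, deleting a /48 hands its range to the /32's nexthop; with only the /64 left, nhop_after_delete() returns 0 instead of crashing.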
Comment 19 Konrad 2021-08-31 10:41:32 UTC
I applied the patches to the testing environment, and it looks good. I will apply them on the target system and send feedback.
Comment 20 Konrad 2021-09-03 10:13:01 UTC
I confirm that all your patches resolved the main problem. The system has been working for 3 days without a crash.
Thank you!
Comment 21 commit-hook freebsd_committer 2021-09-07 21:14:02 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5007bc4e13906104163ca78440ffcefb5c126548

commit 5007bc4e13906104163ca78440ffcefb5c126548
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-08-15 22:25:21 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-09-07 21:02:58 +0000

    routing: Fix crashes with dpdk_lpm[46] algo.

    When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know
     the nexthop of the "parent" prefix to update its internal state.
    The glue code, which utilises RIB as a backing route store, uses
     fib[46]_lookup_rt() for the prefix destination after its deletion
     to fetch the desired nexthop.
    This approach does not work when deleting less-specific prefixes
     with most-specific ones are still present. For example, if
     10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting
     10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search
     result instead of 10.0.0.0/22. This, in turn, results in the failed
     datastructure update: part of the deleted /23 prefix will still
     contain the reference to an old nexthop. This leads to the
     use-after-free behaviour, ending with the eventual crashes.

    Fix the logic flaw by properly fetching the prefix "parent" via
     newly-created rt_get_inet[6]_parent() helpers.

    Differential Revision: https://reviews.freebsd.org/D31546
    PR:     256882,256833

    (cherry picked from commit 36e15b717eec80047fe7442898b5752101f2fbca)

 sys/contrib/dpdk_rte_lpm/dpdk_lpm.c  |  32 ++++----
 sys/contrib/dpdk_rte_lpm/dpdk_lpm6.c |  42 +++++-----
 sys/net/radix.c                      |  14 ++++
 sys/net/radix.h                      |   1 +
 sys/net/route/route_ctl.h            |   3 +
 sys/net/route/route_helpers.c        | 150 +++++++++++++++++++++++++++++++++++
 6 files changed, 208 insertions(+), 34 deletions(-)