Bug 241191 - route flush panic with RADIX_MPATH
Summary: route flush panic with RADIX_MPATH
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.0-STABLE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-net (Nobody)
URL:
Keywords: crash, needs-qa
Depends on:
Blocks: 240700
Reported: 2019-10-11 07:25 UTC by Andrey Linkevich
Modified: 2021-05-07 21:20 UTC
CC List: 6 users

See Also:
koobs: maintainer-feedback? (melifaro)
koobs: maintainer-feedback? (glebius)
koobs: mfc-stable12?
koobs: mfc-stable11?


Attachments
Panic screenshot, 1 (11.52 KB, image/png)
2019-10-11 07:26 UTC, Andrey Linkevich
Panic screenshot, 2 (15.83 KB, image/png)
2019-10-11 07:26 UTC, Andrey Linkevich
Patch (1.45 KB, patch)
2019-10-11 07:33 UTC, Andrey Linkevich

Description Andrey Linkevich 2019-10-11 07:25:02 UTC
Hello, Colleagues.

We use RADIX_MPATH
and have a lot of IPv4 multipath routes.

Our FreeBSD 12.0-STABLE system panics when `route flush` is executed (see the details and screenshots below).

The problem is caused by incorrect handling of the return value of rt_unlinkrte() in sys/net/route.c.

With RADIX_MPATH, rt_unlinkrte() may return NULL with *perror = 0.

rtrequest1_fib() in sys/net/route.c then panics,
as does route_output() in sys/net/rtsock.c.
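
To illustrate the failure pattern described above, here is a minimal userland sketch in C. It is not the attached patch and it is not kernel code: struct fake_rtentry, fake_unlinkrte() and fake_delete_route() are hypothetical stand-ins, and mapping the "NULL with no error" case to ESRCH is an assumption made only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rtentry; NOT the kernel definition. */
struct fake_rtentry {
	int rt_flags;
};

/*
 * Hypothetical stand-in for rt_unlinkrte(). It models the behaviour
 * described in the report: the result may be NULL while *perror is 0.
 */
static struct fake_rtentry *
fake_unlinkrte(struct fake_rtentry *found, int lookup_error, int *perror)
{
	assert(perror != NULL);
	*perror = lookup_error;
	return (found);
}

/*
 * Hypothetical caller. It checks the returned pointer itself instead of
 * relying only on the error code, so a NULL result is never dereferenced.
 * Returning ESRCH for "NULL with no error" is an assumption of this sketch.
 */
static int
fake_delete_route(struct fake_rtentry *found, int lookup_error, int *flags_out)
{
	int error = 0;
	struct fake_rtentry *rt;

	rt = fake_unlinkrte(found, lookup_error, &error);
	if (rt == NULL)
		return (error != 0 ? error : ESRCH);

	/* Safe: rt is known to be non-NULL here. */
	*flags_out = rt->rt_flags;
	return (0);
}
```

A caller that tests only `error != 0` would fall through to the `rt->rt_flags` access with rt == NULL, which is the kind of dereference that would panic in the kernel.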

I have prepared a patch.
Please review it and, if it does not conflict with anything, accept it.

The patch may introduce other problems, but it does solve the original problem.

Thanks.


Additional details.

root@noc-srr01:~ # sysctl kern | grep kern.os
kern.ostype: FreeBSD
kern.osrelease: 12.0-STABLE
kern.osrevision: 199506
kern.osreldate: 1200503


Example 1: WITHOUT kernel options RADIX_MPATH

# A lot of routes
root@noc-srr01:/boot/kernel # netstat -rnW | wc -l
   16132

# 
root@noc-srr01:/boot/kernel # netstat -rnW | more
Routing tables

Internet:
Destination        Gateway            Flags       Use    Mtu      Netif Expire
default            10.169.211.1       UGS        3945   1500        xn0
1.1.2.0/30         10.169.213.234     UG1           0   1500        xn1
1.2.2.0/30         10.169.213.234     UG1           0   1500        xn1
1.2.3.0/30         10.169.213.234     UG1           0   1500        xn1
...

# Flush works correctly
root@noc-srr01:/boot/kernel # netstat -rn | wc -l ; route -qn flush ; netstat -rn | wc -l
   16132
     23


Example 2: WITH kernel options RADIX_MPATH

# A lot of routes (16000 * 4 ifs and routers)
root@noc-srr01:~ # netstat -rn | wc -l
   63595

# 
root@noc-srr01:~ # netstat -rn | more
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            10.169.211.1       UGS         xn0
1.1.2.0/30         10.169.213.234     UG1         xn1
1.1.2.0/30         10.169.213.242     UG1         xn2
1.1.2.0/30         10.169.213.233     UG1         xn1
1.1.2.0/30         10.169.213.241     UG1         xn2
1.2.2.0/30         10.169.213.234     UG1         xn1
1.2.2.0/30         10.169.213.242     UG1         xn2
1.2.2.0/30         10.169.213.233     UG1         xn1
1.2.2.0/30         10.169.213.241     UG1         xn2
1.2.3.0/30         10.169.213.234     UG1         xn1
1.2.3.0/30         10.169.213.242     UG1         xn2
1.2.3.0/30         10.169.213.233     UG1         xn1
1.2.3.0/30         10.169.213.241     UG1         xn2

# flush panics ... see the attached screenshots
root@noc-srr01:~ # netstat -rn | wc -l ; route -qn flush ; netstat -rn | wc -l
   63599
Comment 1 Andrey Linkevich 2019-10-11 07:26:13 UTC
Created attachment 208238
Panic screenshot, 1
Comment 2 Andrey Linkevich 2019-10-11 07:26:43 UTC
Created attachment 208239
Panic screenshot, 2
Comment 3 Andrey Linkevich 2019-10-11 07:33:02 UTC
Created attachment 208240
Patch
Comment 4 Kubilay Kocak freebsd_committer freebsd_triage 2019-10-11 08:38:54 UTC
Crash report with patch on stable/12, potential 12.1-R candidate

CC recent committers around that section of code
Comment 5 Alexander V. Chernikov freebsd_committer 2021-04-23 22:35:11 UTC
Hi Andrey,

I would like to apologise for the extremely belated reply.
Thank you for submitting the fixes for the RADIX_MPATH!
Properly working multipath is a must-have for a modern networking OS.

The routing stack got quite a lot of attention in the last year. As a result, newer FreeBSD versions (starting from 13.0) feature a rewritten routing stack and multipath implementation. Multipath is now enabled by default.

I'd suggest considering trying it out (preferably, 13-stable branch).


Unfortunately, I don't have the cycles to review and merge the patch into the 12-STABLE branch, so I am passing the bug back to -net so that someone else can potentially pick it up.
Comment 6 Andrey Linkevich 2021-04-26 05:28:13 UTC
Hello, Alexander.

We will definitely try 13.

Thanks.
Comment 7 Michael 2021-05-06 14:01:44 UTC
FreeBSD 14.0-CURRENT (GENERIC)
All parameters are default.

In /boot/loader.conf -> if_wg_load="YES"

In /etc/rc.conf -> wireguard_interfaces="wg0 wg1 wg2 wg3"

In wg0.conf ... wg3.conf (the files are almost identical):
[Interface]
Address = 10.127.0.9/30
PrivateKey = xxxx...xxxx=
ListenPort = 46010
Table = off
[Peer]
PublicKey = yyyy...yyyy=
AllowedIPs = 10.18.0.0/22, 10.127.0.8/30, 172.16.42.0/24
Endpoint = A.B.C.D:46010
PersistentKeepalive = 25

#> ifconfig
wg0: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.9 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
wg1: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.13 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
wg2: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.17 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
wg3: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.21 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>

#> netstat -rn4
Routing tables
Internet:
Destination        Gateway            Flags     Netif Expire
default            E.F.G.H            UGS         hn1
10.127.0.8/30      link#7             U           wg0
10.127.0.9         link#7             UHS         lo0
10.127.0.12/30     link#8             U           wg1
10.127.0.13        link#8             UHS         lo0
10.127.0.16/30     link#9             U           wg2
10.127.0.17        link#9             UHS         lo0
10.127.0.20/30     link#10            U           wg3
10.127.0.21        link#10            UHS         lo0
51.83.179.112      link#5             UH          hn1
51.83.236.254      link#5             UHS         hn1
127.0.0.1          link#1             UH          lo0
172.16.42.0/24     link#4             U           hn0
172.16.42.2        link#4             UHS         lo0

Let's try to delete a non-existing route:
#> route delete 10.18.0.0/22 10.127.0.10
route: route has not been found
delete net 10.18.0.0: gateway 10.127.0.10 fib 0: not in table

  At this stage, everything is ok.

Adding routes:
route add 10.18.0.0/22 10.127.0.10
route add 10.18.0.0/22 10.127.0.14
route add 10.18.0.0/22 10.127.0.18
route add 10.18.0.0/22 10.127.0.22

#> netstat -rn4
Routing tables
Internet:
Destination        Gateway            Flags     Netif Expire
default            E.F.G.H            UGS         hn1
10.18.0.0/22       10.127.0.14        UGS         wg1
10.18.0.0/22       10.127.0.10        UGS         wg0
10.18.0.0/22       10.127.0.22        UGS         wg3
10.18.0.0/22       10.127.0.18        UGS         wg2
10.127.0.8/30      link#7             U           wg0
10.127.0.9         link#7             UHS         lo0
10.127.0.12/30     link#8             U           wg1
10.127.0.13        link#8             UHS         lo0
10.127.0.16/30     link#9             U           wg2
10.127.0.17        link#9             UHS         lo0
10.127.0.20/30     link#10            U           wg3
10.127.0.21        link#10            UHS         lo0
51.83.179.112      link#5             UH          hn1
51.83.236.254      link#5             UHS         hn1
127.0.0.1          link#1             UH          lo0
172.16.42.0/24     link#4             U           hn0
172.16.42.2        link#4             UHS         lo0

In /var/log/messages -> the message appears:
kernel: FIB: enabled flowid calculation for locally-originated packets

Let's try to delete one existing route:
#> route delete 10.18.0.0/22 10.127.0.10
delete net 10.18.0.0: gateway 10.127.0.10 fib 0

Let's try to delete a non-existing route (say we mistyped a digit in the gateway address):
#> route delete 10.18.0.0/22 10.127.0.50

kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 1; apic id = 01
kernel: fault virtual address    = 0x18
kernel: fault code               = supervisor read data, page not present
kernel: instruction pointer      = 0x20:0xffffffff80d779f4
kernel: stack pointer            = 0x28:0xfffffe00b54f14f0
kernel: frame pointer            = 0x28:0xfffffe00b54f14f0
kernel: code segment             = base rx0, limit 0xfffff, type 0x1b
kernel:                  = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags = interrupt enabled, resume, IOPL = 0
kernel: current process          = 1648 (route)
kernel: trap number              = 12
kernel: panic: page fault
kernel: cpuid = 1
kernel: time = 1620308885
kernel: KDB: stack backtrace:
kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00b54f11a0
kernel: vpanic() at vpanic+0x181/frame 0xfffffe00b54f11f0
kernel: panic() at panic+0x43/frame 0xfffffe00b54f1250
kernel: trap_fatal() at trap_fatal+0x387/frame 0xfffffe00b54f12b0
kernel: trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00b54f1310
kernel: trap() at trap+0x27d/frame 0xfffffe00b54f1420
kernel: calltrap() at calltrap+0x8/frame 0xfffffe00b54f1420
kernel: --- trap 0xc, rip = 0xffffffff80d779f4, rsp = 0xfffffe00b54f14f0, rbp = 0xfffffe00b54f14f0 ---
kernel: rt_get_inet_prefix_pmask() at rt_get_inet_prefix_pmask+0x4/frame 0xfffffe00b54f14f0
kernel: route_output() at route_output+0x17da/frame 0xfffffe00b54f17d0
kernel: sosend_generic() at sosend_generic+0x633/frame 0xfffffe00b54f1890
kernel: sosend() at sosend+0x50/frame 0xfffffe00b54f18c0
kernel: soo_write() at soo_write+0x49/frame 0xfffffe00b54f1900
kernel: dofilewrite() at dofilewrite+0x88/frame 0xfffffe00b54f1950
kernel: sys_write() at sys_write+0xbc/frame 0xfffffe00b54f19c0
kernel: amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe00b54f1af0
kernel: fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00b54f1af0
kernel: --- syscall (4, FreeBSD ELF64, sys_write), rip = 0x8011ad8ea, rsp = 0x7fffffffe918, rbp = 0x7fffffffe9d0 ---
kernel: KDB: enter: panic
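
A fault virtual address as small as 0x18 on a supervisor read is typically what a load through a NULL structure pointer looks like: the faulting address is the member's offset from a zero base. The sketch below only illustrates that arithmetic. struct hypothetical_entry and member_address() are invented for the example; the real structure and member read in rt_get_inet_prefix_pmask() are not identified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout, NOT a kernel structure: three 8-byte members
 * followed by a fourth, which therefore sits at offset 0x18.
 */
struct hypothetical_entry {
	uint64_t a;
	uint64_t b;
	uint64_t c;
	uint64_t field_at_0x18;
};

/*
 * Address the CPU would access when reading a member located
 * member_offset bytes into a structure that starts at base.
 */
static uintptr_t
member_address(uintptr_t base, size_t member_offset)
{
	/* Guard against wrap-around so the sum is meaningful. */
	assert(base <= UINTPTR_MAX - member_offset);
	return (base + member_offset);
}
```

With a base of 0 (a NULL pointer) and this layout, member_address() yields 0x18, matching the shape of the fault address in the log above. This is consistent with, but does not prove, a NULL dereference in the reported trap.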
Comment 8 Gleb Smirnoff freebsd_committer 2021-05-06 18:01:27 UTC
(In reply to Michael from comment #7)

Michael, I'm pretty sure the bug you see with 14.0-CURRENT is different to the bug Andrey reports. Can you please file a separate bug report?
Comment 9 Michael 2021-05-06 18:35:53 UTC
Created a new bug report (Fatal trap 12 in ROUTE_MPATH variant "route delete"):
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255665
Comment 10 Michael 2021-05-06 19:46:37 UTC
(In reply to Gleb Smirnoff from comment #8)
route -qn flush
is actually the same as
route delete
repeated for all routes.

... and RADIX_MPATH has been renamed to ROUTE_MPATH (as of 13.0):
https://reviews.freebsd.org/D26449
Comment 11 Alexander V. Chernikov freebsd_committer 2021-05-07 21:20:54 UTC
For the record: it indeed was a different bug and it was fixed in https://cgit.FreeBSD.org/src/commit/?id=aad59c79f5f2b1881c6613b1b0b6ac7be8eb474b