Bug 255665 - Fatal trap 12 while "route delete" in ROUTE_MPATH variant - FreeBSD 14.0-CURRENT (GENERIC)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: amd64 Any
Importance: --- Affects Many People
Assignee: Alexander V. Chernikov
 
Reported: 2021-05-06 18:31 UTC by Michael
Modified: 2021-05-30 10:31 UTC

Description Michael 2021-05-06 18:31:39 UTC
FreeBSD 14.0-CURRENT (GENERIC)
All parameters are default.

In /boot/loader.conf -> if_wg_load="YES"

In /etc/rc.conf -> wireguard_interfaces="wg0 wg1 wg2 wg3"

In wg0.conf ... wg3.conf (the files are almost identical):
[Interface]
Address = 10.127.0.9/30
PrivateKey = xxxx...xxxx=
ListenPort = 46010
Table = off
[Peer]
PublicKey = yyyy...yyyy=
AllowedIPs = 10.18.0.0/22, 10.127.0.8/30, 172.16.42.0/24
Endpoint = A.B.C.D:46010
PersistentKeepalive = 25

#> ifconfig
wg0: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.9 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
wg1: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.13 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
wg2: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.17 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
wg3: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.21 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>

#> netstat -rn4
Routing tables
Internet:
Destination        Gateway            Flags     Netif Expire
default            E.F.G.H            UGS         hn1
10.127.0.8/30      link#7             U           wg0
10.127.0.9         link#7             UHS         lo0
10.127.0.12/30     link#8             U           wg1
10.127.0.13        link#8             UHS         lo0
10.127.0.16/30     link#9             U           wg2
10.127.0.17        link#9             UHS         lo0
10.127.0.20/30     link#10            U           wg3
10.127.0.21        link#10            UHS         lo0
A.F.S.112          link#5             UH          hn1
A.F.S.254          link#5             UHS         hn1
127.0.0.1          link#1             UH          lo0
172.16.42.0/24     link#4             U           hn0
172.16.42.2        link#4             UHS         lo0

Let's try to delete a non-existent route:
#> route delete 10.18.0.0/22 10.127.0.10
route: route has not been found
delete net 10.18.0.0: gateway 10.127.0.10 fib 0: not in table

At this stage, everything is OK.

Adding routes:
route add 10.18.0.0/22 10.127.0.10
route add 10.18.0.0/22 10.127.0.14
route add 10.18.0.0/22 10.127.0.18
route add 10.18.0.0/22 10.127.0.22

#> netstat -rn4
Routing tables
Internet:
Destination        Gateway            Flags     Netif Expire
default            E.F.G.H            UGS         hn1
10.18.0.0/22       10.127.0.14        UGS         wg1
10.18.0.0/22       10.127.0.10        UGS         wg0
10.18.0.0/22       10.127.0.22        UGS         wg3
10.18.0.0/22       10.127.0.18        UGS         wg2
10.127.0.8/30      link#7             U           wg0
10.127.0.9         link#7             UHS         lo0
10.127.0.12/30     link#8             U           wg1
10.127.0.13        link#8             UHS         lo0
10.127.0.16/30     link#9             U           wg2
10.127.0.17        link#9             UHS         lo0
10.127.0.20/30     link#10            U           wg3
10.127.0.21        link#10            UHS         lo0
A.F.S.112          link#5             UH          hn1
A.F.S.254          link#5             UHS         hn1
127.0.0.1          link#1             UH          lo0
172.16.42.0/24     link#4             U           hn0
172.16.42.2        link#4             UHS         lo0

The following message appears in /var/log/messages:
kernel: FIB: enabled flowid calculation for locally-originated packets

Let's try to delete an existing route:
#> route delete 10.18.0.0/22 10.127.0.10
delete net 10.18.0.0: gateway 10.127.0.10 fib 0

Let's try to delete a non-existent route (say we mistyped a digit in the gateway address):
#> route delete 10.18.0.0/22 10.127.0.50

kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 1; apic id = 01
kernel: fault virtual address    = 0x18
kernel: fault code               = supervisor read data, page not present
kernel: instruction pointer      = 0x20:0xffffffff80d779f4
kernel: stack pointer            = 0x28:0xfffffe00b54f14f0
kernel: frame pointer            = 0x28:0xfffffe00b54f14f0
kernel: code segment             = base rx0, limit 0xfffff, type 0x1b
kernel:                  = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags = interrupt enabled, resume, IOPL = 0
kernel: current process          = 1648 (route)
kernel: trap number              = 12
kernel: panic: page fault
kernel: cpuid = 1
kernel: time = 1620308885
kernel: KDB: stack backtrace:
kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00b54f11a0
kernel: vpanic() at vpanic+0x181/frame 0xfffffe00b54f11f0
kernel: panic() at panic+0x43/frame 0xfffffe00b54f1250
kernel: trap_fatal() at trap_fatal+0x387/frame 0xfffffe00b54f12b0
kernel: trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00b54f1310
kernel: trap() at trap+0x27d/frame 0xfffffe00b54f1420
kernel: calltrap() at calltrap+0x8/frame 0xfffffe00b54f1420
kernel: --- trap 0xc, rip = 0xffffffff80d779f4, rsp = 0xfffffe00b54f14f0, rbp = 0xfffffe00b54f14f0 ---
kernel: rt_get_inet_prefix_pmask() at rt_get_inet_prefix_pmask+0x4/frame 0xfffffe00b54f14f0
kernel: route_output() at route_output+0x17da/frame 0xfffffe00b54f17d0
kernel: sosend_generic() at sosend_generic+0x633/frame 0xfffffe00b54f1890
kernel: sosend() at sosend+0x50/frame 0xfffffe00b54f18c0
kernel: soo_write() at soo_write+0x49/frame 0xfffffe00b54f1900
kernel: dofilewrite() at dofilewrite+0x88/frame 0xfffffe00b54f1950
kernel: sys_write() at sys_write+0xbc/frame 0xfffffe00b54f19c0
kernel: amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe00b54f1af0
kernel: fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00b54f1af0
kernel: --- syscall (4, FreeBSD ELF64, sys_write), rip = 0x8011ad8ea, rsp = 0x7fffffffe918, rbp = 0x7fffffffe9d0 ---
kernel: KDB: enter: panic
Comment 1 Michael 2021-05-06 19:56:14 UTC
I used the WireGuard driver in this example because the setup is very easy to reproduce in any lab environment, on any platform, which should help in finding the error in the source code.
Comment 2 Michael 2021-05-06 20:04:28 UTC
You don't have to set up the remote side of the WireGuard tunnels: the wg interfaces do not need to pass traffic to reproduce this bug. In my case they do work, and load and traffic distribution across them works very well. However, to fail over to a secondary Internet provider I have to switch routes from a script (using "route add" and "route delete"), and that is where the problem appears: FATAL TRAP 12 :(
Comment 3 commit-hook freebsd_committer freebsd_triage 2021-05-07 20:49:56 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=aad59c79f5f2b1881c6613b1b0b6ac7be8eb474b

commit aad59c79f5f2b1881c6613b1b0b6ac7be8eb474b
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-05-07 20:36:50 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-05-07 20:41:31 +0000

    Fix panic when trying to delete non-existent gateway in multipath route.

    If a non-existent gateway was specified, the code responsible for calculating
     an updated nexthop group, returned the same already-used nexthop group.
    After the route table update, the operation result contained the same
     old & new nexthop groups. Thus, the code responsible for decomposing
     the notification to the list of simple nexthop-level notifications,
     was not able to find any differences. As a result, it hasn't updated any
      of the "simple" notification fields, resulting in empty rtentry pointer.
    This empty pointer was the direct reason of a panic.

    Fix the problem by returning ESRCH when the new nexthop group is the same
     as the old one after applying gateway filter.

    Reported by:    Michael <michael.adm at gmail.com>
    PR:             255665
    MFC after:      3 days

 sys/net/route/mpath_ctl.c | 11 ++++++++---
 sys/net/route/nhgrp_ctl.c |  6 +++---
 2 files changed, 11 insertions(+), 6 deletions(-)
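
The failure mode and the fix described in the commit message above can be illustrated with a small user-space model. This is only a sketch, not the code from sys/net/route: the struct nhgrp_model type and the nhgrp_model_del_gateway() function are hypothetical names, and a nexthop group is reduced to a plain array of IPv4 gateway addresses. It shows the idea of applying a gateway filter and returning ESRCH when the filter removes nothing, so that a caller never continues with identical old and new groups.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NHGRP_MODEL_MAX 8

/* Hypothetical, simplified stand-in for a nexthop group:
 * just a list of IPv4 gateway addresses in host byte order. */
struct nhgrp_model {
	size_t   count;
	uint32_t gw[NHGRP_MODEL_MAX];
};

/*
 * Build in *out the group that remains after dropping every nexthop
 * whose gateway equals gw.
 *
 * Returns 0 if at least one nexthop was dropped, or ESRCH if the
 * filter matched nothing, i.e. the "new" group would be identical to
 * the old one. Reporting ESRCH here is the idea behind the fix; the
 * real kernel data structures and control flow are more involved.
 */
static int
nhgrp_model_del_gateway(const struct nhgrp_model *old, uint32_t gw,
    struct nhgrp_model *out)
{
	assert(old != NULL && out != NULL);
	assert(old->count <= NHGRP_MODEL_MAX);

	out->count = 0;
	for (size_t i = 0; i < old->count; i++) {
		if (old->gw[i] != gw)
			out->gw[out->count++] = old->gw[i];
	}

	if (out->count == old->count)
		return (ESRCH);	/* nothing matched: no such gateway */
	return (0);
}
```

With the three gateways left after the first delete in the report (10.127.0.14, 10.127.0.22, 10.127.0.18), filtering on 10.127.0.50 matches nothing and the model returns ESRCH, while filtering on 10.127.0.14 returns 0 and leaves two nexthops.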
Comment 4 Michael 2021-05-08 08:09:47 UTC
(In reply to commit-hook from comment #3)
Thanks for quickly fixing this error!
Now there is no kernel panic when deleting a route in ROUTE_MPATH variant.
Everything works as it should.
Comment 5 commit-hook freebsd_committer freebsd_triage 2021-05-30 10:31:25 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=f279295521400a36626ea367e83e432f5e99238f

commit f279295521400a36626ea367e83e432f5e99238f
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-05-07 20:36:50 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-05-30 10:30:45 +0000

    Fix panic when trying to delete non-existent gateway in multipath route.

    If a non-existent gateway was specified, the code responsible for calculating
     an updated nexthop group, returned the same already-used nexthop group.
    After the route table update, the operation result contained the same
     old & new nexthop groups. Thus, the code responsible for decomposing
     the notification to the list of simple nexthop-level notifications,
     was not able to find any differences. As a result, it hasn't updated any
      of the "simple" notification fields, resulting in empty rtentry pointer.
    This empty pointer was the direct reason of a panic.

    Fix the problem by returning ESRCH when the new nexthop group is the same
     as the old one after applying gateway filter.

    Reported by:    Michael <michael.adm at gmail.com>
    PR:             255665

 sys/net/route/mpath_ctl.c | 11 ++++++++---
 sys/net/route/nhgrp_ctl.c |  6 +++---
 2 files changed, 11 insertions(+), 6 deletions(-)