Bug 278394 - Reproducible kernel panic related to IPv4 routes populated by bird2 (BGP)
Summary: Reproducible kernel panic related to IPv4 routes populated by bird2 (BGP)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.0-RELEASE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Zhenlei Huang
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2024-04-16 21:41 UTC by Gregory Neil Shapiro
Modified: 2024-05-23 06:49 UTC (History)
CC List: 6 users

See Also:
zlei: mfc-stable14+
zlei: mfc-stable13+


Attachments
dmesg.boot file containing panic and system info (8.19 KB, text/plain)
2024-04-16 21:44 UTC, Gregory Neil Shapiro
core.txt.0 (152.01 KB, text/plain)
2024-04-16 22:55 UTC, Gregory Neil Shapiro
core.txt.6 (223.97 KB, text/plain)
2024-04-17 18:28 UTC, Gregory Neil Shapiro
patch (1.49 KB, patch)
2024-05-14 07:38 UTC, Zhenlei Huang

Description Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-16 21:41:03 UTC
I have a reproducible kernel panic related to IPv4 routes being populated by bird2 (BGP).  It was happening under 13.2-RELEASE-p10 so I upgraded to 14.0-RELEASE-p6.  It still occurs.  First the panic:

  Fatal double fault
  rip 0xffffffff82b45307 rsp 0xfffffe00037f7f10 rbp 0xfffffe00037f8140
  rax 0 rdx 0 rbx 0xfffffe00037f8170
  rcx 0 rsi 0xfffffe00037f8170 rdi 0x4b3cc420
  r8 0 r9 0x700000000 r10 0xfffff80010457880
  r11 0x4 r12 0xfffff8001089e800 r13 0x20000
  r14 0 r15 0xfffffe00037f8174 rflags 0x10282
  cs 0x20 ss 0 ds 0x3b es 0x3b fs 0x13 gs 0x1b
  fsbase 0x878b0a130 gsbase 0xffffffff82a10000 kgsbase 0
  cpuid = 0; apic id = 00
  panic: double fault
  cpuid = 0
  time = 1713229261
  KDB: stack backtrace:
  Uptime: 23d8h18m48s

I'd love to share more about the crash, but savecore fails on boot:

Apr 16 20:59:06 membrane kernel: Starting syslogd.
Apr 16 20:59:06 membrane savecore[984]: /dev/vtbd0p2: Operation not permitted
Apr 16 20:59:06 membrane kernel: No crash dumps in /var/crash.
Apr 16 20:59:06 membrane kernel: Apr 16 20:59:06 membrane savecore[984]: /dev/vtbd0p2: Operation not permitted

I can reproduce it at will by configuring bird2 to export the BGP IPv4 route table (960959 entries) to the OS (already exporting IPv6 route table without an issue).  Note that I do this on multiple other FreeBSD 13.2/14.0 hosts without a problem so it is something specific to this box's configuration.
Comment 1 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-16 21:44:56 UTC
Created attachment 250012 [details]
dmesg.boot file containing panic and system info

Attaching /var/run/dmesg.boot as it gives one more hint.  The first line of the file, right before the panic is:

[fib_algo] inet.0 (radix4_lockless#82) rebuild_fd_flm: switching algo to radix4

Common when growing the IPv4 routing table (see it on other servers that don't panic).

The attachment will also provide you more details about the system in case it helps.
Comment 2 Marek Zarychta 2024-04-16 22:40:59 UTC
(In reply to Gregory Neil Shapiro from comment #0)
>I'd love to share more about the crash, but savecore fails on boot:
>Apr 16 20:59:06 membrane kernel: Starting syslogd.
>Apr 16 20:59:06 membrane savecore[984]: /dev/vtbd0p2: Operation not permitted
I guess it fails due to enabled swap encryption (visible in attached dmesg), but I might be wrong. Anyway please disable encryption of swap.

(In reply to Gregory Neil Shapiro from comment #1)
>[fib_algo] inet.0 (radix4_lockless#82) rebuild_fd_flm: switching algo to radix4
If switching FIB algo is the culprit, try to set it by hand. In case of BGP full view IPv4+IPv6 FIBs, you can add to /etc/sysctl.conf:
net.route.algo.inet.algo=dxr
net.route.algo.inet6.algo=dpdk_lpm6
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2024-04-16 22:52:35 UTC
A double fault could be related to a stack overflow.  The rsp value of 0xfffffe00037f7f10 is fairly close to a page boundary as well.

Since the panic is reproducible, you can try to confirm this theory by increasing the number of stack pages by setting the kern.kstack_pages tunable to, say, 6 or 8.
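
The page-boundary observation can be checked by masking off the page offset of rsp. This is only a sketch: it assumes 4 KiB pages (the amd64 base page size), and the helper name is made up for illustration.

```python
PAGE_SIZE = 4096  # assumption: amd64 base page size


def page_offsets(addr: int, page_size: int = PAGE_SIZE) -> tuple[int, int]:
    """Return (bytes above the start of the page, bytes below the next page)."""
    above = addr & (page_size - 1)
    return above, page_size - above


rsp = 0xfffffe00037f7f10  # rsp from the first double fault in this report
above, below = page_offsets(rsp)
print(f"rsp is {above:#x} bytes into its page, {below:#x} bytes from the next one")
```

A small distance to either boundary is only a hint; it does not prove an overflow by itself.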
Comment 4 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-16 22:55:27 UTC
Created attachment 250014 [details]
core.txt.0

(In reply to Marek Zarychta from comment #2)

Yes, disabling encrypted swap allowed savecore to work (thought that had been addressed in an earlier fix).  Attaching core.txt.0 from crashinfo.
Comment 5 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-16 23:11:50 UTC
(In reply to Mark Johnston from comment #3)

Bumped kern.kstack_pages to 6 in loader.conf and rebooted.  Verified it was now 6 (previously 4) via sysctl.  It still crashed.  I'll try the route algorithm change Marek suggested next.

Mark, if you want the core.txt.1 uploaded or another value tried, let me know.  The new panic info:

 Fatal double fault
 rip 0xffffffff82e3841a rsp 0xfffffe005136e000 rbp 0xfffffe005136e230
 rax 0 rdx 0 rbx 0xfffffe0051590020
 rcx 0x11 rsi 0xfffffe0003df1490 rdi 0xfffffe005136e010
 r8 0 r9 0x5d6 r10 0xfffff8004c927bd8
 r11 0xfffffe005136e260 r12 0xfffffe0003df1490 r13 0xfffffe0003df1460
 r14 0 r15 0xfffffe005136e260 rflags 0x10246
 cs 0x20 ss 0 ds 0x3b es 0x3b fs 0x13 gs 0x1b
 fsbase 0xac6ec290620 gsbase 0xffffffff82a10000 kgsbase 0
 cpuid = 0; apic id = 00
 panic: double fault
 cpuid = 0
 time = 1713308750
 KDB: stack backtrace:
 Uptime: 2m17s
Comment 6 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-16 23:16:33 UTC
(In reply to Marek Zarychta from comment #2)
It doesn't look like changing the algo is going to work:

# kldload dpdk_lpm6

Triggered a constant flow of logging:

Apr 16 23:14:24 membrane kernel: [fib_algo] inet6.0 (radix6#100) rebuild_fd_flm: switching algo to dpdk_lpm6
Apr 16 23:14:24 membrane kernel: [fib_algo] inet6.0 setup_fd_instance: dpdk_lpm6 algo instance setup failed, failures=0
Apr 16 23:14:24 membrane kernel: [fib_algo] inet6.0 (radix6#100) rebuild_fd_flm: table rebuild failed
Apr 16 23:14:31 membrane kernel: [fib_algo] inet6.0 (radix6#100) rebuild_fd_flm: switching algo to dpdk_lpm6
Apr 16 23:14:31 membrane kernel: [fib_algo] inet6.0 setup_fd_instance: dpdk_lpm6 algo instance setup failed, failures=0
Apr 16 23:14:31 membrane kernel: [fib_algo] inet6.0 (radix6#100) rebuild_fd_flm: table rebuild failed
...
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2024-04-16 23:19:23 UTC
(In reply to Gregory Neil Shapiro from comment #5)
Hmm, the fact that rsp is again close to a page boundary still suggests that a stack overflow is happening, but perhaps it's an instance of infinite recursion or something similar.

For some reason we are not getting a stack trace.  I'm not sure why; I think that should generally work in the face of double faults.  Assuming you're running a release kernel, the other quick thing to try is a GENERIC kernel with debugging enabled.
Comment 8 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-16 23:23:28 UTC
I was able to load dxr, set the sysctl, and then enabling IPv4 routes to the OS still crashed the system.
Comment 9 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-16 23:57:12 UTC
(In reply to Mark Johnston from comment #7)

Yeah, even kgdb isn't giving me anything:

# kgdb /boot/kernel/kernel /var/crash/vmcore.1
GNU gdb (GDB) 14.1 [GDB v14.1 for FreeBSD]
...
Reading symbols from /boot/kernel/kernel...
(No debugging symbols found in /boot/kernel/kernel)
'osreldate' has unknown type; cast it to its declared type
(kgdb) bt
No thread selected.
(kgdb) info threads
No threads.

I'll see if I can get a debug GENERIC kernel for 14.0-p6 to boot from and see if that helps.  I hope my upstreams don't hate me for the BGP flaps. :)
Comment 10 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-17 03:17:40 UTC
I got both routing algorithms working by loading them in loader.conf.  I also bumped kern.kstack_pages to 8.  No impact -- still crashes every time I try to export the BGP routing table from bird2 to the kernel (gets around half way through before it crashes).  No backtrace as usual, but I'm pretty sure I'm using a debug kernel.  From the core.txt output:

kernel config

options CONFIG_AUTOGENERATED
ident   GENERIC
machine amd64
cpu     HAMMER
makeoptions     WITH_CTF=1
makeoptions     DEBUG=-g

(note the DEBUG=-g)

and

/boot/kernel/kernel: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /red/herring, BuildID[sha1]=7b7d4b244902bd1c795f110c11cbabe50c40783e, not stripped

Note the "not stripped".
Comment 11 Marek Zarychta 2024-04-17 04:54:26 UTC
(In reply to Gregory Neil Shapiro from comment #10)
The BIRD 2.15.1 package for FreeBSD 14 has two flavours: bird2 (netlink-based) and bird2-rtsock. Please try both of them.
Comment 12 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-17 05:29:45 UTC
Thanks, I've tried the other in the past and had odd errors but I'll try again.  Even if it prevents the crash, I'm guessing a userland program shouldn't be able to panic the kernel so I think it is worth trying to address the root cause.
Comment 13 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-17 05:38:09 UTC
Switching to bird2-rtsock didn't make a difference, same crash.
Comment 14 Mark Johnston freebsd_committer freebsd_triage 2024-04-17 13:50:56 UTC
(In reply to Gregory Neil Shapiro from comment #10)
So, if you're building a kernel from releng/14.0, GENERIC doesn't have debugging options enabled.  DEBUG=-g merely instructs the compiler to include debug symbols in the output, which helps debuggers but doesn't enable assertions, invariants checking, etc.

We don't include a debug kernel config on releng branches (this is a problem to be fixed).  You can add the following lines (taken from GENERIC on main) to enable more checking which will hopefully catch a problem before the double fault occurs:

# For full debugger support use (turn off in stable branch):
options         BUF_TRACKING            # Track buffer history
options         DDB                     # Support DDB.
options         FULL_BUF_TRACKING       # Track more buffer history
options         GDB                     # Support remote GDB.
options         DEADLKRES               # Enable the deadlock resolver
options         INVARIANTS              # Enable calls of extra sanity checking
options         INVARIANT_SUPPORT       # Extra sanity checks of internal structures, required by INVARIANTS
options         QUEUE_MACRO_DEBUG_TRASH # Trash queue(3) internal pointers on invalidation
options         WITNESS                 # Enable checks to detect deadlocks and cycles
options         WITNESS_SKIPSPIN        # Don't run witness on spinlocks for speed
options         MALLOC_DEBUG_MAXZONES=8 # Separate malloc(9) zones
options         VERBOSE_SYSINIT=0       # Support debug.verbose_sysinit, off by default
Comment 15 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-17 18:28:15 UTC
Created attachment 250034 [details]
core.txt.6

The debug kernel was a great improvement in diagnostics.  Attaching core.txt.6 which provides a full backtrace.
Comment 16 Mark Johnston freebsd_committer freebsd_triage 2024-04-17 21:02:55 UTC
So we have a packet loop:

ip_output() at ip_output+0xce7/frame 0xfffffe00517c4610
vxlan_transmit() at vxlan_transmit+0x591/frame 0xfffffe00517c4720
ether_output_frame() at ether_output_frame+0xf9/frame 0xfffffe00517c4750
ether_output() at ether_output+0x6fb/frame 0xfffffe00517c47e0
ip_output() at ip_output+0x1355/frame 0xfffffe00517c48e0
vxlan_transmit() at vxlan_transmit+0x591/frame 0xfffffe00517c49f0
ether_output_frame() at ether_output_frame+0xf9/frame 0xfffffe00517c4a20
ether_output() at ether_output+0x6fb/frame 0xfffffe00517c4ab0
ip_output() at ip_output+0x1355/frame 0xfffffe00517c4bb0
...

I wonder if vxlan_transmit() should perhaps be calling if_tunnel_check_nesting() somewhere?
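
For long traces, the repeating group of frames can be picked out mechanically. The sketch below is illustrative only: it parses the frame lines quoted above and looks for the shortest sequence of function names that repeats back to back. The helper names are invented for this example.

```python
import re

BACKTRACE = """\
ip_output() at ip_output+0xce7/frame 0xfffffe00517c4610
vxlan_transmit() at vxlan_transmit+0x591/frame 0xfffffe00517c4720
ether_output_frame() at ether_output_frame+0xf9/frame 0xfffffe00517c4750
ether_output() at ether_output+0x6fb/frame 0xfffffe00517c47e0
ip_output() at ip_output+0x1355/frame 0xfffffe00517c48e0
vxlan_transmit() at vxlan_transmit+0x591/frame 0xfffffe00517c49f0
ether_output_frame() at ether_output_frame+0xf9/frame 0xfffffe00517c4a20
ether_output() at ether_output+0x6fb/frame 0xfffffe00517c4ab0
ip_output() at ip_output+0x1355/frame 0xfffffe00517c4bb0
"""


def frame_names(text):
    """Extract function names from 'name() at name+off/frame addr' lines."""
    return re.findall(r"^(\w+)\(\) at ", text, flags=re.MULTILINE)


def repeating_cycle(names):
    """Return the shortest run of names that repeats back to back, or None."""
    n = len(names)
    for period in range(1, n // 2 + 1):
        for start in range(n - 2 * period + 1):
            first = names[start:start + period]
            second = names[start + period:start + 2 * period]
            if first == second:
                return first
    return None


cycle = repeating_cycle(frame_names(BACKTRACE))
print(" -> ".join(cycle) if cycle else "no repeating cycle found")
```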
Comment 17 Zhenlei Huang freebsd_committer freebsd_triage 2024-04-18 01:58:10 UTC
(In reply to Mark Johnston from comment #16)
> I wonder if vxlan_transmit() should perhaps be calling if_tunnel_check_nesting() somewhere?

I think that's the right fix.
Comment 18 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-18 02:06:45 UTC
Verified:

#23 ip_output (m=m@entry=0xfffff80007e3e000, opt=opt@entry=0x0, ro=<optimized out>, flags=flags@entry=0, imo=0x0, inp=inp@entry=0x0) at ../../../netinet/ip_output.c:699
699			switch (ip_output_pfil(&m, ifp, flags, inp, dst, &fibnum,
(kgdb) print *dst
$6 = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0, sin_addr = {s_addr = 4017795422}, sin_zero = "\000\000\000\000\000\000\000"}

s_addr is 94.177.122.239, which is the vxlanremote IP for vxlan0.
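
For anyone repeating the conversion: kgdb prints s_addr as a host-order integer, while the field holds the address in network byte order. On little-endian amd64 the dotted quad can be recovered as in this sketch (the function name is just for illustration):

```python
import socket
import struct


def s_addr_to_dotted(value: int) -> str:
    """Convert an s_addr integer, as printed on a little-endian host, to dotted quad."""
    # Packing as little-endian reproduces the in-memory (network order) bytes.
    return socket.inet_ntoa(struct.pack("<I", value))


print(s_addr_to_dotted(4017795422))  # 94.177.122.239
```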

Checking BGP route map:

94.177.122.0/24      unicast [4IXP4RS1 18:24:23.600 from 185.1.125.1] * (100) [AS58057i]
	via 185.1.125.5 on vxlan0

Once that route was exported to the kernel, routing to 94.177.122.239 went out over vxlan0.
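
The effect of the exported route can be illustrated with a toy longest-prefix-match table. This is a sketch, not how the kernel FIB is implemented, and "uplink0" is a hypothetical name for the interface behind the default gateway.

```python
import ipaddress


def lookup(fib, dst):
    """Longest-prefix match: return the interface of the most specific route."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in fib if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best]


ENDPOINT = "94.177.122.239"  # vxlanremote address of vxlan0

# Only a default route: the tunnel endpoint is reached via the uplink.
fib = {ipaddress.ip_network("0.0.0.0/0"): "uplink0"}
print(lookup(fib, ENDPOINT))  # uplink0

# After the BGP /24 is exported, the endpoint resolves through the tunnel itself.
fib[ipaddress.ip_network("94.177.122.0/24")] = "vxlan0"
print(lookup(fib, ENDPOINT))  # vxlan0

# A static /32 for the endpoint wins again over the /24.
fib[ipaddress.ip_network("94.177.122.239/32")] = "uplink0"
print(lookup(fib, ENDPOINT))  # uplink0
```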

Configuration-wise, I can avoid the crash with a static route for 94.177.122.239/32 out of the default gateway instead of the tunnel.  However, it would be good to fix the crash.

Might even be interesting for all of the tunnel interfaces (not just vxlan) to check if they are about to send a packet for the tunnel endpoint inside the tunnel and realize that won't work (e.g., if dst == vxlanremote).  Is that what if_tunnel_check_nesting() does?
Comment 19 Zhenlei Huang freebsd_committer freebsd_triage 2024-04-18 02:19:49 UTC
@Gregory
bird2 learned a more specific route for the remote end of the vxlan tunnel via the vxlan interface itself. Once that route was installed into the kernel FIB, it resulted in recursive encapsulation. Unfortunately, vxlan(4) does not handle the recursion correctly, which results in a kernel stack overflow.

As a workaround, use a dedicated FIB for the vxlan tunnel. Assume x.x.x.x is your tunnel's remote end and y.y.y.y is the original gateway for x.x.x.x.
```
# sysctl -n net.fibs=2
1 -> 2
# route add x.x.x.x y.y.y.y -fib 1
# ifconfig vxlan0 tunnelfib 1
```

Now you can happily start bird2 :)
Comment 20 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-18 02:22:27 UTC
Funny thing happened when I put the config in place to work around the loop.  The system crashed with a panic in the dxr routing algorithm Marek recommended in comment 2.  I'll file a new bug for that crash.
Comment 21 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-18 02:23:39 UTC
(In reply to Zhenlei Huang from comment #19)
That is another good solution that I can apply generically to all tunnels that might end up giving a BGP route to the tunnel endpoint.  Thanks!
Comment 22 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-04-18 02:29:15 UTC
Bug 278422 filed for dxr crash
Comment 23 Gregory Neil Shapiro freebsd_committer freebsd_triage 2024-05-14 04:21:09 UTC
(In reply to Mark Johnston from comment #16)

Who is best to commit the change Mark suggested?  vxlan isn't listed in src/MAINTAINERS but if_vxlan.c lists a Copyright belonging to @bryanv.
Comment 24 Zhenlei Huang freebsd_committer freebsd_triage 2024-05-14 07:38:00 UTC
Created attachment 250637 [details]
patch

(In reply to Gregory Neil Shapiro from comment #23)
> Who is best to commit the change Mark suggested?  vxlan isn't listed in
> src/MAINTAINERS but if_vxlan.c lists a Copyright belonging to @bryanv.

Busy days. I have a simple WIP draft patch but forgot to upload it. Could you please give it a try?
Comment 25 Zhenlei Huang freebsd_committer freebsd_triage 2024-05-14 10:21:48 UTC
@Gregory
Please note that the patch prevents the kernel panic, but packets to be encapsulated that are caught in the route loop will be dropped. So a more specific route entry for the tunnel endpoint, or a dedicated FIB, is still needed.
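
The difference between dropping and delivering can be shown with a toy model. It is not the kernel code and does not reproduce the actual if_tunnel_check_nesting() interface; MAX_NESTING, the function names, and "uplink0" are all assumptions made for illustration.

```python
MAX_NESTING = 3  # assumption: illustrative limit, not the kernel's value


def transmit(ifname, route_for, endpoint, depth=0):
    """Toy model of sending a packet; vxlan0 re-encapsulates toward its endpoint."""
    if ifname != "vxlan0":
        return f"sent via {ifname} after {depth} encapsulation(s)"
    if depth >= MAX_NESTING:
        # With a nesting check the packet is dropped instead of recursing forever.
        return "dropped: nesting limit reached"
    # The outer packet is routed again, this time toward the tunnel endpoint.
    return transmit(route_for(endpoint), route_for, endpoint, depth + 1)


ENDPOINT = "94.177.122.239"


def looped(dst):
    """Route loop: everything, including the endpoint, goes via vxlan0."""
    return "vxlan0"


def fixed(dst):
    """More specific route: the endpoint goes via the (hypothetical) uplink."""
    return "uplink0" if dst == ENDPOINT else "vxlan0"


print(transmit("vxlan0", looped, ENDPOINT))  # dropped: nesting limit reached
print(transmit("vxlan0", fixed, ENDPOINT))   # sent via uplink0 after 1 encapsulation(s)
```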
Comment 26 commit-hook freebsd_committer freebsd_triage 2024-05-20 12:16:07 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=93fbfef0b50354b7a1620822454ef29cd415cb2d

commit 93fbfef0b50354b7a1620822454ef29cd415cb2d
Author:     Zhenlei Huang <zlei@FreeBSD.org>
AuthorDate: 2024-05-20 12:14:07 +0000
Commit:     Zhenlei Huang <zlei@FreeBSD.org>
CommitDate: 2024-05-20 12:14:07 +0000

    if_vxlan(4): Add checking for loops and nesting of tunnels

    User misconfiguration, either tunnel loops, or a large number of
    different nested tunnels, can overflow the kernel stack. Prevent that
    by using if_tunnel_check_nesting().

    PR:             278394
    Diagnosed by:   markj
    Reviewed by:    kp
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D45197

 sys/net/if_vxlan.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
Comment 27 commit-hook freebsd_committer freebsd_triage 2024-05-22 14:00:26 UTC
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3ebd2b1c730834123a53b3eddcf9029fcf414782

commit 3ebd2b1c730834123a53b3eddcf9029fcf414782
Author:     Zhenlei Huang <zlei@FreeBSD.org>
AuthorDate: 2024-05-20 12:14:07 +0000
Commit:     Zhenlei Huang <zlei@FreeBSD.org>
CommitDate: 2024-05-22 13:58:31 +0000

    if_vxlan(4): Add checking for loops and nesting of tunnels

    User misconfiguration, either tunnel loops, or a large number of
    different nested tunnels, can overflow the kernel stack. Prevent that
    by using if_tunnel_check_nesting().

    PR:             278394
    Diagnosed by:   markj
    Reviewed by:    kp
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D45197

    (cherry picked from commit 93fbfef0b50354b7a1620822454ef29cd415cb2d)

 sys/net/if_vxlan.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
Comment 28 commit-hook freebsd_committer freebsd_triage 2024-05-22 14:02:30 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9d653a52f899c420f00e23401c3ff44b493d4d32

commit 9d653a52f899c420f00e23401c3ff44b493d4d32
Author:     Zhenlei Huang <zlei@FreeBSD.org>
AuthorDate: 2024-05-20 12:14:07 +0000
Commit:     Zhenlei Huang <zlei@FreeBSD.org>
CommitDate: 2024-05-22 14:01:00 +0000

    if_vxlan(4): Add checking for loops and nesting of tunnels

    User misconfiguration, either tunnel loops, or a large number of
    different nested tunnels, can overflow the kernel stack. Prevent that
    by using if_tunnel_check_nesting().

    PR:             278394
    Diagnosed by:   markj
    Reviewed by:    kp
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D45197

    (cherry picked from commit 93fbfef0b50354b7a1620822454ef29cd415cb2d)
    (cherry picked from commit 3ebd2b1c730834123a53b3eddcf9029fcf414782)

 sys/net/if_vxlan.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
Comment 29 commit-hook freebsd_committer freebsd_triage 2024-05-22 23:02:43 UTC
A commit in branch releng/14.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=34db75d21876ae406ff57fdc594f151fc4214109

commit 34db75d21876ae406ff57fdc594f151fc4214109
Author:     Zhenlei Huang <zlei@FreeBSD.org>
AuthorDate: 2024-05-20 12:14:07 +0000
Commit:     Zhenlei Huang <zlei@FreeBSD.org>
CommitDate: 2024-05-22 23:00:03 +0000

    if_vxlan(4): Add checking for loops and nesting of tunnels

    User misconfiguration, either tunnel loops, or a large number of
    different nested tunnels, can overflow the kernel stack. Prevent that
    by using if_tunnel_check_nesting().

    PR:             278394
    Diagnosed by:   markj
    Reviewed by:    kp
    Approved by:    re (cperciva)
    MFC after:      1 week
    Differential Revision:  https://reviews.freebsd.org/D45197

    (cherry picked from commit 93fbfef0b50354b7a1620822454ef29cd415cb2d)
    (cherry picked from commit 3ebd2b1c730834123a53b3eddcf9029fcf414782)

 sys/net/if_vxlan.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
Comment 30 Zhenlei Huang freebsd_committer freebsd_triage 2024-05-23 06:49:13 UTC
The fix has landed in the stable branches and will be in the upcoming release candidate 14.1-RC1. Closing now.