I have a reproducible kernel panic related to IPv4 routes being populated by bird2 (BGP). It was happening under 13.2-RELEASE-p10 so I upgraded to 14.0-RELEASE-p6. It still occurs. First the panic:

Fatal double fault
rip 0xffffffff82b45307 rsp 0xfffffe00037f7f10 rbp 0xfffffe00037f8140
rax 0 rdx 0 rbx 0xfffffe00037f8170
rcx 0 rsi 0xfffffe00037f8170 rdi 0x4b3cc420
r8 0 r9 0x700000000 r10 0xfffff80010457880
r11 0x4 r12 0xfffff8001089e800 r13 0x20000
r14 0 r15 0xfffffe00037f8174 rflags 0x10282
cs 0x20 ss 0 ds 0x3b es 0x3b fs 0x13 gs 0x1b
fsbase 0x878b0a130 gsbase 0xffffffff82a10000 kgsbase 0
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
time = 1713229261
KDB: stack backtrace:
Uptime: 23d8h18m48s

I'd love to share more about the crash, but savecore fails on boot:

Apr 16 20:59:06 membrane kernel: Starting syslogd.
Apr 16 20:59:06 membrane savecore[984]: /dev/vtbd0p2: Operation not permitted
Apr 16 20:59:06 membrane kernel: No crash dumps in /var/crash.
Apr 16 20:59:06 membrane kernel:
Apr 16 20:59:06 membrane savecore[984]: /dev/vtbd0p2: Operation not permitted

I can reproduce it at will by configuring bird2 to export the BGP IPv4 route table (960959 entries) to the OS (already exporting IPv6 route table without an issue). Note that I do this on multiple other FreeBSD 13.2/14.0 hosts without a problem so it is something specific to this box's configuration.
Created attachment 250012
dmesg.boot file containing panic and system info

Attaching /var/run/dmesg.boot as it gives one more hint. The first line of the file, right before the panic is:

[fib_algo] inet.0 (radix4_lockless#82) rebuild_fd_flm: switching algo to radix4

Common when growing the IPv4 routing table (see it on other servers that don't panic). The attachment will also provide you more details about the system in case it helps.
(In reply to Gregory Neil Shapiro from comment #0)
>I'd love to share more about the crash, but savecore fails on boot:
>Apr 16 20:59:06 membrane kernel: Starting syslogd.
>Apr 16 20:59:06 membrane savecore[984]: /dev/vtbd0p2: Operation not permitted

I guess it fails due to enabled swap encryption (visible in attached dmesg), but I might be wrong. Anyway please disable encryption of swap.

(In reply to Gregory Neil Shapiro from comment #1)
>[fib_algo] inet.0 (radix4_lockless#82) rebuild_fd_flm: switching algo to radix4

If switching FIB algo is the culprit, try to set it by hand. In case of BGP full view IPv4+IPv6 FIBs, you can add to /etc/sysctl.conf:

net.route.algo.inet.algo=dxr
net.route.algo.inet6.algo=dpdk_lpm6
A double fault could be related to a stack overflow. The rsp value of 0xfffffe00037f7f10 is fairly close to a page boundary as well. Since the panic is reproducible, you can try to confirm this theory by increasing the number of stack pages by setting the kern.kstack_pages tunable to, say, 6 or 8.
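To make the "close to a page boundary" observation concrete, here is a minimal Python sketch of the arithmetic. It assumes the 4 KiB page size used on amd64, and the helper name `page_offsets` is made up for illustration. The rsp value is the one from the panic in comment #0.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on amd64


def page_offsets(addr: int, page_size: int = PAGE_SIZE) -> tuple[int, int]:
    """Return (bytes above the page start, bytes below the next page start)."""
    above = addr % page_size
    below = page_size - above
    return above, below


rsp = 0xFFFFFE00037F7F10  # rsp reported in the first panic
above, below = page_offsets(rsp)
print(f"rsp is {above:#x} bytes into its page, {below} bytes below the next boundary")

# Kernel stack size for a few kern.kstack_pages settings.
for pages in (4, 6, 8):
    print(f"kern.kstack_pages={pages}: {pages * PAGE_SIZE // 1024} KiB of kernel stack")
```

Since the kernel stack grows downward, an rsp sitting only a few hundred bytes under a page boundary is consistent with a stack that has just run off its lowest page. Raising kern.kstack_pages would only help if the depth of the call chain is bounded.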
Created attachment 250014
core.txt.0

(In reply to Marek Zarychta from comment #2)
Yes, disabling encrypted swap allowed savecore to work (thought that had been addressed in an earlier fix). Attaching core.txt.0 from crashinfo.
(In reply to Mark Johnston from comment #3)
Bumped kern.kstack_pages to 6 in loader.conf and rebooted. Verified it was now 6 (previously 4) via sysctl. It still crashed. I'll try the route algorithm change Marek suggested next. Mark, if you want the core.txt.1 uploaded or another value tried, let me know. The new panic info:

Fatal double fault
rip 0xffffffff82e3841a rsp 0xfffffe005136e000 rbp 0xfffffe005136e230
rax 0 rdx 0 rbx 0xfffffe0051590020
rcx 0x11 rsi 0xfffffe0003df1490 rdi 0xfffffe005136e010
r8 0 r9 0x5d6 r10 0xfffff8004c927bd8
r11 0xfffffe005136e260 r12 0xfffffe0003df1490 r13 0xfffffe0003df1460
r14 0 r15 0xfffffe005136e260 rflags 0x10246
cs 0x20 ss 0 ds 0x3b es 0x3b fs 0x13 gs 0x1b
fsbase 0xac6ec290620 gsbase 0xffffffff82a10000 kgsbase 0
cpuid = 0; apic id = 00
panic: double fault
cpuid = 0
time = 1713308750
KDB: stack backtrace:
Uptime: 2m17s
(In reply to Marek Zarychta from comment #2)
It doesn't look like changing the algo is going to work:

# kldload dpdk_lpm6

Triggered a constant flow of logging:

Apr 16 23:14:24 membrane kernel: [fib_algo] inet6.0 (radix6#100) rebuild_fd_flm: switching algo to dpdk_lpm6
Apr 16 23:14:24 membrane kernel: [fib_algo] inet6.0 setup_fd_instance: dpdk_lpm6 algo instance setup failed, failures=0
Apr 16 23:14:24 membrane kernel: [fib_algo] inet6.0 (radix6#100) rebuild_fd_flm: table rebuild failed
Apr 16 23:14:31 membrane kernel: [fib_algo] inet6.0 (radix6#100) rebuild_fd_flm: switching algo to dpdk_lpm6
Apr 16 23:14:31 membrane kernel: [fib_algo] inet6.0 setup_fd_instance: dpdk_lpm6 algo instance setup failed, failures=0
Apr 16 23:14:31 membrane kernel: [fib_algo] inet6.0 (radix6#100) rebuild_fd_flm: table rebuild failed
...
(In reply to Gregory Neil Shapiro from comment #5) Hmm, the fact that rsp is still close to a page boundary still suggests that a stack overflow's happening, but perhaps it's an instance of infinite recursion or so. For some reason we are not getting a stack trace. I'm not sure why, I think that should generally work in the face of double faults. Assuming you're running a release kernel, the other quick thing to try is a GENERIC kernel with debugging enabled.
I was able to load dxr and set the sysctl, but enabling the export of IPv4 routes to the OS still crashed the system.
(In reply to Mark Johnston from comment #7)
Yeah, even kgdb isn't giving me anything:

# kgdb /boot/kernel/kernel /var/crash/vmcore.1
GNU gdb (GDB) 14.1 [GDB v14.1 for FreeBSD]
...
Reading symbols from /boot/kernel/kernel...
(No debugging symbols found in /boot/kernel/kernel)
'osreldate' has unknown type; cast it to its declared type
(kgdb) bt
No thread selected.
(kgdb) info threads
No threads.

I'll see if I can get a debug GENERIC kernel for 14.0-p6 to boot from and see if that helps. I hope my upstreams don't hate me for the BGP flaps. :)
I got both routing algorithms working by loading them in loader.conf. I also bumped kern.kstack_pages to 8. No impact -- still crashes every time I try to export the BGP routing table from bird2 to the kernel (gets around half way through before it crashes). No backtrace as usual, but I'm pretty sure I'm using a debug kernel. From the core.txt output:

kernel config
options CONFIG_AUTOGENERATED
ident GENERIC
machine amd64
cpu HAMMER
makeoptions WITH_CTF=1
makeoptions DEBUG=-g

(note the DEBUG=-g) and

/boot/kernel/kernel: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), dynamically linked, interpreter /red/herring, BuildID[sha1]=7b7d4b244902bd1c795f110c11cbabe50c40783e, not stripped

Note the "not stripped".
(In reply to Gregory Neil Shapiro from comment #10) The BIRD 2.15.1 package for FreeBSD 14 has two flavours: bird2 (netlink-based) and bird2-rtsock. Please try both of them.
Thanks, I've tried the other in the past and had odd errors but I'll try again. Even if it prevents the crash, I'm guessing a userland program shouldn't be able to panic the kernel so I think it is worth trying to address the root cause.
Switching to bird2-rtsock didn't make a difference, same crash.
(In reply to Gregory Neil Shapiro from comment #10)
So, if you're building a kernel from releng/14.0, GENERIC doesn't have debugging options enabled. DEBUG+=-g merely instructs the compiler to include debug symbols in the output, which helps debuggers but doesn't enable assertions, invariants checking, etc. We don't include a debug kernel config on releng branches (this is a problem to be fixed). You can add the following lines (taken from GENERIC on main) to enable more checking which will hopefully catch a problem before the double fault occurs:

# For full debugger support use (turn off in stable branch):
options BUF_TRACKING             # Track buffer history
options DDB                      # Support DDB.
options FULL_BUF_TRACKING        # Track more buffer history
options GDB                      # Support remote GDB.
options DEADLKRES                # Enable the deadlock resolver
options INVARIANTS               # Enable calls of extra sanity checking
options INVARIANT_SUPPORT        # Extra sanity checks of internal structures, required by INVARIANTS
options QUEUE_MACRO_DEBUG_TRASH  # Trash queue(2) internal pointers on invalidation
options WITNESS                  # Enable checks to detect deadlocks and cycles
options WITNESS_SKIPSPIN         # Don't run witness on spinlocks for speed
options MALLOC_DEBUG_MAXZONES=8  # Separate malloc(9) zones
options VERBOSE_SYSINIT=0        # Support debug.verbose_sysinit, off by default
Created attachment 250034
core.txt.6

The debug kernel was a great improvement in diagnostics. Attaching core.txt.6 which provides a full backtrace.
So we have a packet loop:

ip_output() at ip_output+0xce7/frame 0xfffffe00517c4610
vxlan_transmit() at vxlan_transmit+0x591/frame 0xfffffe00517c4720
ether_output_frame() at ether_output_frame+0xf9/frame 0xfffffe00517c4750
ether_output() at ether_output+0x6fb/frame 0xfffffe00517c47e0
ip_output() at ip_output+0x1355/frame 0xfffffe00517c48e0
vxlan_transmit() at vxlan_transmit+0x591/frame 0xfffffe00517c49f0
ether_output_frame() at ether_output_frame+0xf9/frame 0xfffffe00517c4a20
ether_output() at ether_output+0x6fb/frame 0xfffffe00517c4ab0
ip_output() at ip_output+0x1355/frame 0xfffffe00517c4bb0
...

I wonder if vxlan_transmit() should perhaps be calling if_tunnel_check_nesting() somewhere?
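The loop in this backtrace can be modelled in a few lines. The following Python sketch is only a toy model of the idea behind a nesting check, not the kernel's if_tunnel_check_nesting() implementation: the names `transmit`, `MAX_NESTING`, the `trail` list, and the NIC name `vtnet0` are all hypothetical. Each tunnel transmit records its interface on the packet and refuses to proceed if the same interface already appears (a loop) or the trail is too long (excessive nesting).

```python
MAX_NESTING = 3  # hypothetical limit, chosen only for this illustration


def transmit(ifname: str, route: dict[str, str], trail: list[str]) -> str:
    """Send a packet out of `ifname`.

    `route` maps a tunnel interface to the interface that the route lookup
    picks for its outer packet; interfaces missing from `route` are treated
    as physical. `trail` lists the tunnels that already encapsulated this
    packet, playing the role of a per-packet nesting record.
    """
    if ifname not in route:
        return f"sent on {ifname} after {len(trail)} encapsulation(s)"
    if ifname in trail:
        return f"dropped: loop through {ifname}"
    if len(trail) >= MAX_NESTING:
        return "dropped: too many nested tunnels"
    # Encapsulate and hand the outer packet to whatever the lookup picked.
    return transmit(route[ifname], route, trail + [ifname])


# Healthy setup: the tunnel endpoint is reached through a physical NIC.
print(transmit("vxlan0", {"vxlan0": "vtnet0"}, []))
# Setup from this report: the route to the endpoint points back into the tunnel.
print(transmit("vxlan0", {"vxlan0": "vxlan0"}, []))
```

Without the two guard checks the second call would recurse until the interpreter's stack limit is hit, which is the user-space analogue of the repeating ip_output()/vxlan_transmit() frames above.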
(In reply to Mark Johnston from comment #16)
> I wonder if vxlan_transmit() should perhaps be calling if_tunnel_check_nesting() somewhere?

I think that's the right fix.
Verified:

#23 ip_output (m=m@entry=0xfffff80007e3e000, opt=opt@entry=0x0, ro=<optimized out>, flags=flags@entry=0, imo=0x0, inp=inp@entry=0x0) at ../../../netinet/ip_output.c:699
699 switch (ip_output_pfil(&m, ifp, flags, inp, dst, &fibnum,
(kgdb) print *dst
$6 = {sin_len = 16 '\020', sin_family = 2 '\002', sin_port = 0, sin_addr = {s_addr = 4017795422}, sin_zero = "\000\000\000\000\000\000\000"}

s_addr is 94.177.122.239, which is the vxlanremote IP for vxlan0. Checking BGP route map:

94.177.122.0/24 unicast [4IXP4RS1 18:24:23.600 from 185.1.125.1] * (100) [AS58057i]
        via 185.1.125.5 on vxlan0

Once that route was exported to the kernel, routing to 94.177.122.239 went out over vxlan0.

Configuration-wise, I can avoid the crash with a static route for 94.177.122.239/32 out of the default gateway instead of the tunnel. However, it would be good to fix the crash. It might even be interesting for all of the tunnel interfaces (not just vxlan) to check whether they are about to send a packet for the tunnel endpoint inside the tunnel and realize that won't work (e.g., if dst == vxlanremote). Is that what if_tunnel_check_nesting() does?
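As a cross-check of the address conversion in this comment, here is a small Python sketch. It assumes the kgdb value is the raw in-memory `s_addr` (network byte order) read as an integer on a little-endian amd64 host; the function name `s_addr_to_dotted` is made up for this example.

```python
import ipaddress
import socket
import struct


def s_addr_to_dotted(s_addr: int) -> str:
    """Convert an s_addr integer, as printed on a little-endian host, to dotted quad."""
    # Re-create the four bytes as they sit in memory, then format them in order.
    return socket.inet_ntoa(struct.pack("<I", s_addr))


endpoint = s_addr_to_dotted(4017795422)
print(endpoint)

# Check whether the BGP-learned prefix from the route listing covers the endpoint.
covered = ipaddress.ip_address(endpoint) in ipaddress.ip_network("94.177.122.0/24")
print(covered)
```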
@Gregory bird2 learned a more precise route for the remote end of the vxlan tunnel via the vxlan interface itself. When that route was installed into the kernel FIB, it resulted in recursive encapsulation. Unfortunately vxlan(4) does not handle the recursion correctly, which results in a kernel stack overflow.

As a workaround, use a dedicated FIB for the vxlan tunnel. Assume x.x.x.x is your tunnel remote end, and y.y.y.y is the original route for x.x.x.x.

```
# sysctl -n net.fibs=2
1 -> 2
# route add x.x.x.x y.y.y.y -fib 1
# ifconfig vxlan0 tunnelfib 1
```

Now you can happily start bird2 :)
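To illustrate why both a more specific route and a dedicated FIB avoid the loop, here is a hedged Python sketch of longest-prefix matching over toy routing tables. It is not how the kernel FIB is implemented; the `lookup` helper, the gateway interface name `vtnet0`, and the table contents are assumptions for illustration. Only the 94.177.122.0/24 prefix, the endpoint address, and vxlan0 come from this report.

```python
import ipaddress


def lookup(fib: dict[str, str], dst: str) -> str:
    """Return the outgoing interface for `dst` using longest-prefix match.

    Keys of `fib` must be canonical CIDR strings; a destination with no
    matching prefix raises ValueError in this simplified model.
    """
    addr = ipaddress.ip_address(dst)
    matches = [ipaddress.ip_network(p) for p in fib if addr in ipaddress.ip_network(p)]
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[str(best)]


ENDPOINT = "94.177.122.239"  # vxlanremote address from this report

# FIB 0 after bird2 exports the BGP table: the /24 beats the default route,
# so outer packets for the endpoint go back into the tunnel.
fib0 = {"0.0.0.0/0": "vtnet0", "94.177.122.0/24": "vxlan0"}
print(lookup(fib0, ENDPOINT))

# Static host route: more specific than the BGP-learned /24.
fib0_fixed = {**fib0, "94.177.122.239/32": "vtnet0"}
print(lookup(fib0_fixed, ENDPOINT))

# Dedicated FIB: the tunnel does its lookups in FIB 1, which is assumed
# here to contain only the manually added route.
fib1 = {"94.177.122.239/32": "vtnet0"}
print(lookup(fib1, ENDPOINT))
```

The dedicated FIB has the advantage that it stays correct no matter what prefixes BGP later announces, whereas a host route in the shared table only protects the one address it names.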
A funny thing happened when I put the config in place to work around the loop. The system crashed with a panic in the dxr routing algorithm Marek recommended in comment 2. I'll file a new bug for that crash.
(In reply to Zhenlei Huang from comment #19)
That is another good solution that I can apply generically to all tunnels that might end up giving a BGP route to the tunnel endpoint. Thanks!
Bug 278422 filed for dxr crash
(In reply to Mark Johnston from comment #16) Who is best to commit the change Mark suggested? vxlan isn't listed in src/MAINTAINERS but if_vxlan.c lists a Copyright belonging to @bryanv.
Created attachment 250637
patch

(In reply to Gregory Neil Shapiro from comment #23)
> Who is best to commit the change Mark suggested? vxlan isn't listed in
> src/MAINTAINERS but if_vxlan.c lists a Copyright belonging to @bryanv.

Busy days. I have a simple WIP draft patch but forgot to upload it. Could you please give it a try?
@Gregory Please note, the patch prevents the kernel panic, but packets to be encapsulated that are caught in the routing loop will be dropped. So a more precise route entry for the tunnel endpoint, or a dedicated FIB, is still needed.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=93fbfef0b50354b7a1620822454ef29cd415cb2d

commit 93fbfef0b50354b7a1620822454ef29cd415cb2d
Author:     Zhenlei Huang <zlei@FreeBSD.org>
AuthorDate: 2024-05-20 12:14:07 +0000
Commit:     Zhenlei Huang <zlei@FreeBSD.org>
CommitDate: 2024-05-20 12:14:07 +0000

    if_vxlan(4): Add checking for loops and nesting of tunnels

    User misconfiguration, either tunnel loops, or a large number of
    different nested tunnels, can overflow the kernel stack. Prevent that
    by using if_tunnel_check_nesting().

    PR: 278394
    Diagnosed by: markj
    Reviewed by: kp
    MFC after: 1 week
    Differential Revision: https://reviews.freebsd.org/D45197

 sys/net/if_vxlan.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
A commit in branch stable/14 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=3ebd2b1c730834123a53b3eddcf9029fcf414782

commit 3ebd2b1c730834123a53b3eddcf9029fcf414782
Author:     Zhenlei Huang <zlei@FreeBSD.org>
AuthorDate: 2024-05-20 12:14:07 +0000
Commit:     Zhenlei Huang <zlei@FreeBSD.org>
CommitDate: 2024-05-22 13:58:31 +0000

    if_vxlan(4): Add checking for loops and nesting of tunnels

    User misconfiguration, either tunnel loops, or a large number of
    different nested tunnels, can overflow the kernel stack. Prevent that
    by using if_tunnel_check_nesting().

    PR: 278394
    Diagnosed by: markj
    Reviewed by: kp
    MFC after: 1 week
    Differential Revision: https://reviews.freebsd.org/D45197

    (cherry picked from commit 93fbfef0b50354b7a1620822454ef29cd415cb2d)

 sys/net/if_vxlan.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9d653a52f899c420f00e23401c3ff44b493d4d32

commit 9d653a52f899c420f00e23401c3ff44b493d4d32
Author:     Zhenlei Huang <zlei@FreeBSD.org>
AuthorDate: 2024-05-20 12:14:07 +0000
Commit:     Zhenlei Huang <zlei@FreeBSD.org>
CommitDate: 2024-05-22 14:01:00 +0000

    if_vxlan(4): Add checking for loops and nesting of tunnels

    User misconfiguration, either tunnel loops, or a large number of
    different nested tunnels, can overflow the kernel stack. Prevent that
    by using if_tunnel_check_nesting().

    PR: 278394
    Diagnosed by: markj
    Reviewed by: kp
    MFC after: 1 week
    Differential Revision: https://reviews.freebsd.org/D45197

    (cherry picked from commit 93fbfef0b50354b7a1620822454ef29cd415cb2d)
    (cherry picked from commit 3ebd2b1c730834123a53b3eddcf9029fcf414782)

 sys/net/if_vxlan.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
A commit in branch releng/14.1 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=34db75d21876ae406ff57fdc594f151fc4214109

commit 34db75d21876ae406ff57fdc594f151fc4214109
Author:     Zhenlei Huang <zlei@FreeBSD.org>
AuthorDate: 2024-05-20 12:14:07 +0000
Commit:     Zhenlei Huang <zlei@FreeBSD.org>
CommitDate: 2024-05-22 23:00:03 +0000

    if_vxlan(4): Add checking for loops and nesting of tunnels

    User misconfiguration, either tunnel loops, or a large number of
    different nested tunnels, can overflow the kernel stack. Prevent that
    by using if_tunnel_check_nesting().

    PR: 278394
    Diagnosed by: markj
    Reviewed by: kp
    Approved by: re (cperciva)
    MFC after: 1 week
    Differential Revision: https://reviews.freebsd.org/D45197

    (cherry picked from commit 93fbfef0b50354b7a1620822454ef29cd415cb2d)
    (cherry picked from commit 3ebd2b1c730834123a53b3eddcf9029fcf414782)

 sys/net/if_vxlan.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
The fix has landed in the stable branches and will be in the upcoming release candidate 14.1-RC1. Closing now.