Hello, colleagues.

We use RADIX_MPATH and have a lot of IPv4 multipath routes. Our FreeBSD 12.0-STABLE system panics when we execute `route flush` (see the details and screenshots).

The problem is caused by incorrect handling of the return value of rt_unlinkrte() in sys/net/route.c. With RADIX_MPATH, the return value may be NULL while *perror = 0. rtrequest1_fib() in sys/net/route.c then panics, and so does route_output() in sys/net/rtsock.c.

I have prepared a patch. Please review it and, if it does not conflict with anything, accept it. The patch may create other problems, but it does solve the original problem.

Thanks.

Additional details:

root@noc-srr01:~ # sysctl kern | grep kern.os
kern.ostype: FreeBSD
kern.osrelease: 12.0-STABLE
kern.osrevision: 199506
kern.osreldate: 1200503

Example 1: WITHOUT kernel options RADIX_MPATH

# A lot of routes
root@noc-srr01:/boot/kernel # netstat -rnW | wc -l
16132
#
root@noc-srr01:/boot/kernel # netstat -rnW | more
Routing tables

Internet:
Destination        Gateway            Flags     Use    Mtu   Netif Expire
default            10.169.211.1       UGS      3945   1500     xn0
1.1.2.0/30         10.169.213.234     UG1         0   1500     xn1
1.2.2.0/30         10.169.213.234     UG1         0   1500     xn1
1.2.3.0/30         10.169.213.234     UG1         0   1500     xn1
...

# Flush works correctly
root@noc-srr01:/boot/kernel # netstat -rn | wc -l ; route -qn flush ; netstat -rn | wc -l
16132
23

Example 2: WITH kernel options RADIX_MPATH

# A lot of routes (16000 * 4 ifs and routers)
root@noc-srr01:~ # netstat -rn | wc -l
63595
#
root@noc-srr01:~ # netstat -rn | more
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            10.169.211.1       UGS         xn0
1.1.2.0/30         10.169.213.234     UG1         xn1
1.1.2.0/30         10.169.213.242     UG1         xn2
1.1.2.0/30         10.169.213.233     UG1         xn1
1.1.2.0/30         10.169.213.241     UG1         xn2
1.2.2.0/30         10.169.213.234     UG1         xn1
1.2.2.0/30         10.169.213.242     UG1         xn2
1.2.2.0/30         10.169.213.233     UG1         xn1
1.2.2.0/30         10.169.213.241     UG1         xn2
1.2.3.0/30         10.169.213.234     UG1         xn1
1.2.3.0/30         10.169.213.242     UG1         xn2
1.2.3.0/30         10.169.213.233     UG1         xn1
1.2.3.0/30         10.169.213.241     UG1         xn2

# Flush panics; see the attached screenshots
root@noc-srr01:~ # netstat -rn | wc -l ; route -qn flush ; netstat -rn | wc -l
63599
Created attachment 208238: Panic screenshot, 1
Created attachment 208239: Panic screenshot, 2
Created attachment 208240: Patch
Crash report with patch on stable/12, potential 12.1-R candidate. CC recent committers around that section of code.
Hi Andrey, I would like to apologise for the extremely belated reply. Thank you for submitting the fixes for RADIX_MPATH! Properly working multipath is a must-have for a modern networking OS. The routing stack got quite a lot of attention in the last year. As a result, newer FreeBSD versions (starting from 13.0) feature a rewritten routing stack and multipath implementation. Multipath is now enabled by default. I'd suggest trying it out (preferably on the 13-stable branch). Unfortunately, I don't have the cycles to look at the patch and merge it to the 12-S branch, so I'm passing the bug back to -net to allow someone else to potentially pick it up.
Hello, Alexander. We will definitely check 13. Thanks.
FreeBSD 14.0-CURRENT (GENERIC)
All parameters are default.

In /boot/loader.conf -> if_wg_load="YES"
In /etc/rc.conf -> wireguard_interfaces="wg0 wg1 wg2 wg3"

In wg0.conf ... wg3.conf (almost everything is identical):

[Interface]
Address = 10.127.0.9/30
PrivateKey = xxxx...xxxx=
ListenPort = 46010
Table = off

[Peer]
PublicKey = yyyy...yyyy=
AllowedIPs = 10.18.0.0/22, 10.127.0.8/30, 172.16.42.0/24
Endpoint = A.B.C.D:46010
PersistentKeepalive = 25

#> ifconfig
wg0: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.9 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
wg1: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.13 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
wg2: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.17 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>
wg3: flags=80c1<UP,RUNNING,NOARP,MULTICAST> metric 0 mtu 1420
        options=80000<LINKSTATE>
        inet 10.127.0.21 netmask 0xfffffffc
        groups: wg
        nd6 options=109<PERFORMNUD,IFDISABLED,NO_DAD>

#> netstat -rn4
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            E.F.G.H            UGS         hn1
10.127.0.8/30      link#7             U           wg0
10.127.0.9         link#7             UHS         lo0
10.127.0.12/30     link#8             U           wg1
10.127.0.13        link#8             UHS         lo0
10.127.0.16/30     link#9             U           wg2
10.127.0.17        link#9             UHS         lo0
10.127.0.20/30     link#10            U           wg3
10.127.0.21        link#10            UHS         lo0
51.83.179.112      link#5             UH          hn1
51.83.236.254      link#5             UHS         hn1
127.0.0.1          link#1             UH          lo0
172.16.42.0/24     link#4             U           hn0
172.16.42.2        link#4             UHS         lo0

Let's try to delete a non-existing route:

#> route delete 10.18.0.0/22 10.127.0.10
route: route has not been found
delete net 10.18.0.0: gateway 10.127.0.10 fib 0: not in table

At this stage, everything is ok.
Adding routes:

route add 10.18.0.0/22 10.127.0.10
route add 10.18.0.0/22 10.127.0.14
route add 10.18.0.0/22 10.127.0.18
route add 10.18.0.0/22 10.127.0.22

#> netstat -rn4
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            E.F.G.H            UGS         hn1
10.18.0.0/22       10.127.0.14        UGS         wg1
10.18.0.0/22       10.127.0.10        UGS         wg0
10.18.0.0/22       10.127.0.22        UGS         wg3
10.18.0.0/22       10.127.0.18        UGS         wg2
10.127.0.8/30      link#7             U           wg0
10.127.0.9         link#7             UHS         lo0
10.127.0.12/30     link#8             U           wg1
10.127.0.13        link#8             UHS         lo0
10.127.0.16/30     link#9             U           wg2
10.127.0.17        link#9             UHS         lo0
10.127.0.20/30     link#10            U           wg3
10.127.0.21        link#10            UHS         lo0
51.83.179.112      link#5             UH          hn1
51.83.236.254      link#5             UHS         hn1
127.0.0.1          link#1             UH          lo0
172.16.42.0/24     link#4             U           hn0
172.16.42.2        link#4             UHS         lo0

In /var/log/messages, this message appears:

kernel: FIB: enabled flowid calculation for locally-originated packets

Let's try to delete an existing route:

#> route delete 10.18.0.0/22 10.127.0.10
delete net 10.18.0.0: gateway 10.127.0.10 fib 0

Let's try to delete a non-existing route (let's say we mistyped a digit):

#> route delete 10.18.0.0/22 10.127.0.50

kernel: Fatal trap 12: page fault while in kernel mode
kernel: cpuid = 1; apic id = 01
kernel: fault virtual address = 0x18
kernel: fault code = supervisor read data, page not present
kernel: instruction pointer = 0x20:0xffffffff80d779f4
kernel: stack pointer = 0x28:0xfffffe00b54f14f0
kernel: frame pointer = 0x28:0xfffffe00b54f14f0
kernel: code segment = base rx0, limit 0xfffff, type 0x1b
kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
kernel: processor eflags = interrupt enabled, resume, IOPL = 0
kernel: current process = 1648 (route)
kernel: trap number = 12
kernel: panic: page fault
kernel: cpuid = 1
kernel: time = 1620308885
kernel: KDB: stack backtrace:
kernel: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00b54f11a0
kernel: vpanic() at vpanic+0x181/frame 0xfffffe00b54f11f0
kernel: panic() at panic+0x43/frame 0xfffffe00b54f1250
kernel: trap_fatal() at trap_fatal+0x387/frame 0xfffffe00b54f12b0
kernel: trap_pfault() at trap_pfault+0x4f/frame 0xfffffe00b54f1310
kernel: trap() at trap+0x27d/frame 0xfffffe00b54f1420
kernel: calltrap() at calltrap+0x8/frame 0xfffffe00b54f1420
kernel: --- trap 0xc, rip = 0xffffffff80d779f4, rsp = 0xfffffe00b54f14f0, rbp = 0xfffffe00b54f14f0 ---
kernel: rt_get_inet_prefix_pmask() at rt_get_inet_prefix_pmask+0x4/frame 0xfffffe00b54f14f0
kernel: route_output() at route_output+0x17da/frame 0xfffffe00b54f17d0
kernel: sosend_generic() at sosend_generic+0x633/frame 0xfffffe00b54f1890
kernel: sosend() at sosend+0x50/frame 0xfffffe00b54f18c0
kernel: soo_write() at soo_write+0x49/frame 0xfffffe00b54f1900
kernel: dofilewrite() at dofilewrite+0x88/frame 0xfffffe00b54f1950
kernel: sys_write() at sys_write+0xbc/frame 0xfffffe00b54f19c0
kernel: amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe00b54f1af0
kernel: fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe00b54f1af0
kernel: --- syscall (4, FreeBSD ELF64, sys_write), rip = 0x8011ad8ea, rsp = 0x7fffffffe918, rbp = 0x7fffffffe9d0 ---
kernel: KDB: enter: panic
(In reply to Michael from comment #7) Michael, I'm pretty sure the bug you see with 14.0-CURRENT is different to the bug Andrey reports. Can you please file a separate bug report?
Created a new report (Fatal trap 12 in ROUTE_MPATH variant "route delete"): https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=255665
(In reply to Gleb Smirnoff from comment #8) `route -qn flush` is actually the same as a `route delete` of multipath routes, applied to all routes. Also, RADIX_MPATH has been renamed to ROUTE_MPATH (as of 13.0): https://reviews.freebsd.org/D26449
For the record: it indeed was a different bug and it was fixed in https://cgit.FreeBSD.org/src/commit/?id=aad59c79f5f2b1881c6613b1b0b6ac7be8eb474b
(In reply to Andrey Linkevich from comment #6) Do you have any update? Does the bug reproduce in FreeBSD 13 or FreeBSD 14?
I'm going to close it as "won't fix" on Friday, December 30, if no objections arise.
Colleagues, good afternoon. Happy New Year. On FreeBSD 14.0-CURRENT #0 main-n259842-c89209c674f2: Sat Dec 24 03:59:18 UTC 2022, the problem did not reproduce. I agree that the bug can be closed. If the problem reproduces on the new OS in the future, I will definitely let you know. Thank you all. Happy New Year!
Thank you for the feedback and the warm wishes! Closing this one.