Bug 254303 - Fatal trap 12: page fault while in kernel mode ((frr 7.5_1 + Freebsd 13 Beta3) zebra crashes server when routes are populated)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: Unspecified
Hardware: amd64 Any
Importance: --- Affects Only Me
Assignee: Alexander V. Chernikov
URL:
Keywords: panic
Depends on:
Blocks:
 
Reported: 2021-03-15 09:18 UTC by Aleks
Modified: 2021-04-04 13:00 UTC
CC: 5 users

See Also:


Attachments
kgdb backtrace (65.99 KB, text/plain)
2021-03-15 09:18 UTC, Aleks
no flags Details
core2 (97.70 KB, text/plain)
2021-03-24 07:22 UTC, Aleks
no flags Details

Description Aleks 2021-03-15 09:18:32 UTC
Created attachment 223285 [details]
kgdb backtrace

Description:

FreeBSD 13 Beta 3 + FRR frr7-7.5_1

The server has 2 BGP sessions with uplink routers. Each neighbour sends a full IPv4 view (840k+ routes) to this FreeBSD box.
When only one session is up everything is OK; when we bring up the second session, the server crashes (dump attached).

How to reproduce:
1) Install FreeBSD 13 Beta 3 + FRR frr7-7.5_1
2) Send 2 x full view via 2 peers to the FreeBSD box
Comment 1 Aleks 2021-03-15 10:14:33 UTC
Same issue on FreeBSD 13.0-RC2:
13.0-RC2 #0 releng/13.0-n244684-13c22f74953: Fri Mar 12 04:05:19 UTC 2021     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64

FRR is the same:
frr7-7.5_1
Name           : frr7
Version        : 7.5_1
Comment 2 Marek Zarychta 2021-03-15 10:46:41 UTC
Can you upgrade and test 13.0-RC2? Does it also happen with sysctl net.route.multipath=0 set?
Comment 3 Aleks 2021-03-15 10:55:45 UTC
(In reply to Marek Zarychta from comment #2)

It also happens on 13.0-RC2 with the default sysctl settings (multipath=1).
With sysctl net.route.multipath=0 it doesn't crash (on RC2).
Comment 4 Marek Zarychta 2021-03-17 08:20:58 UTC
(In reply to Aleks from comment #3)
If you can build a custom kernel with "options FIB_ALGO", install it, and after reboot load the module dpdk_lpm4 (and dpdk_lpm6 if appropriate), then please give it a try.
Comment 5 Aleks 2021-03-17 12:01:05 UTC
(In reply to Marek Zarychta from comment #4)
So, I took the kernel source from https://download.freebsd.org/ftp/releases/amd64/13.0-RC2/src.txz
and built it with "options FIB_ALGO":
FreeBSD 13.0-RC2 FreeBSD 13.0-RC2 #0: Wed Mar 17 13:23:47 EET 2021     :/usr/obj/usr/src/amd64.amd64/sys/CUSTOM  amd64
Disabled FRR autostart and rebooted the server.

After reboot I've set multipath=1 and loaded dpdk_lpm4/6, and after that started FRR.
[fib_algo] inet.0 (bsearch4#13) rebuild_fd: switching algo to radix4_lockless
[fib_algo] fib_module_register: attaching dpdk_lpm4 to inet
[fib_algo] fib_module_register: attaching dpdk_lpm6 to inet6
[fib_algo] inet.0 (radix4_lockless#114) rebuild_fd: switching algo to dpdk_lpm4

After bringing up the second BGP full-view session, the server still crashed.
Comment 6 Zhenlei Huang 2021-03-18 03:35:14 UTC
CC Alexander V. Chernikov
Comment 7 Alexander V. Chernikov freebsd_committer 2021-03-21 18:46:54 UTC
(In reply to Aleks from comment #5)
Is there any chance you could share kernel&core?
Comment 8 Aleks 2021-03-21 21:30:53 UTC
(In reply to Alexander V. Chernikov from comment #7)
Sure, just tell me what you mean by "share kernel&core"
Comment 9 Alexander V. Chernikov freebsd_committer 2021-03-21 23:35:59 UTC
Obvious code checks and attempts to reproduce this in an easy way have failed, and I don't have a good idea of why it's happening. Setting up multiple live BGP feeds will take some time, so there are multiple ways to proceed:

The fastest one: if you could tar up all of your /boot/kernel AND the coredump for that kernel.
The downside is that a kernel memory dump may contain some private information (passwords, other sensitive stuff in packet memory, etc.). If you could consider sharing it with me (so no one else gets access to this info), that would be awesome.

Otherwise I can either write a list of gdb commands to run on the core or try to repro with the feeds, but that will take more time.
Comment 10 Aleks 2021-03-21 23:49:48 UTC
(In reply to Alexander V. Chernikov from comment #9)
By core dump you mean the vmcore.* file?

P.S. I can even give you access to this server if that helps (it's not in production).
Comment 11 Aleks 2021-03-22 00:14:21 UTC
(In reply to Alexander V. Chernikov from comment #9)
P.S. I can't get it to write a dump file after I compiled the custom kernel (+FIB_ALGO).
When I test dumps with "sysctl debug.kdb.panic=1" the dump is written, but when zebra populates routes and crashes the server, the dump is not there.
Comment 12 Aleks 2021-03-23 16:26:19 UTC
(In reply to Alexander V. Chernikov from comment #9)
I've sent you what you asked for (in email).
Comment 13 Alexander V. Chernikov freebsd_committer 2021-03-24 00:49:05 UTC
Awesome!

Could you also share backtraces from the other panics (if any)?
Comment 14 Aleks 2021-03-24 07:22:25 UTC
Created attachment 223541 [details]
core2
Comment 15 Aleks 2021-03-24 07:23:19 UTC
(In reply to Alexander V. Chernikov from comment #13)
Apart from the main trace and the one I sent you yesterday?
I've attached another one (named core2) to this bug report.
Comment 16 Alexander V. Chernikov freebsd_committer 2021-03-26 23:46:29 UTC
(In reply to Aleks from comment #15)
Thank you!

Short summary:

From the private core.5 you sent me:
* rtentry looks perfectly fine, but the nexthop pointer is (mostly) zeroed

* from the core2: failure to resolve nh_priv pointer
* from the original kgdb_backtrace: nhg has zero pointer to nh_ctl

So far it looks like we're removing an extra reference from the nexthop group in some corner-case scenario, which results in the group being freed while the rtentry still points to it.
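The suspected failure mode can be sketched in plain C. This is an illustrative model only, not FreeBSD's actual nexthop code: `struct nhgrp_sketch` and all function names here are invented, and a `freed` flag stands in for the real `free()` so the state stays observable.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted nexthop group. */
struct nhgrp_sketch {
	int  refcount;
	bool freed;
};

static void nhgrp_ref(struct nhgrp_sketch *g)
{
	g->refcount++;
}

static void nhgrp_unref(struct nhgrp_sketch *g)
{
	if (--g->refcount == 0)
		g->freed = true;	/* stands in for free(g) */
}

/*
 * One reference is held by the routing table and one by the rtentry.
 * Dropping the table reference twice (the suspected corner case)
 * "frees" the group even though the rtentry still holds its reference,
 * leaving the rtentry with a dangling pointer.
 */
bool rtentry_points_at_freed_group(void)
{
	struct nhgrp_sketch g = { .refcount = 0, .freed = false };

	nhgrp_ref(&g);		/* reference held by the routing table */
	nhgrp_ref(&g);		/* reference held by the rtentry */
	nhgrp_unref(&g);	/* legitimate drop of the table reference */
	nhgrp_unref(&g);	/* the erroneous extra drop */
	return (g.freed);	/* true: the group is gone under the rtentry */
}
```

In the real kernel the dangling pointer is only noticed later, when the route is used, which matches the seemingly random nh_priv/nh_ctl pointer failures in the cores.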

Re reproduction: I don't have 2 full-view peers, so I ended up duplicating the feed from a single peer and introducing some delay to mimic propagation delays.
So far I wasn't able to reproduce any panic.
Are there any additional specifics (e.g. links flapping) in the setup?


Is there any chance you could run

`stdbuf -o0 route -n monitor > zebra_log.txt`

at startup (or, actually, at the point in time when all peers are down) and then try to bring up the first and then the second peer?
If you could also run something like
`while true; do date >> nhg.log ; netstat -4OnW >> nhg.log ; sleep 5; done`

and share both files along with the core backtrace, that would be awesome.

If there is any possibility of getting access to the server, that would really speed things up.
Comment 17 Aleks 2021-03-28 16:23:54 UTC
(In reply to Alexander V. Chernikov from comment #16)
I'll give you both files and server access via email.
Comment 18 commit-hook freebsd_committer 2021-03-29 23:07:41 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9095dc7da4cf0c484fb1160b2180b7329b09b107

commit 9095dc7da4cf0c484fb1160b2180b7329b09b107
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-03-29 23:00:17 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-03-29 23:00:17 +0000

    Fix nexhtop group index array scaling.

    The current code has the limit of 127 nexthop groups due to the
     wrongly-checked bitmask_copy() return value.

    PR: 254303
    Reported by:    Aleks <a.ivanov at veesp.com>
    MFC after: 1 day

 sys/net/route/nhgrp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Comment 19 Alexander V. Chernikov freebsd_committer 2021-03-29 23:23:04 UTC
So, it looks like it is a combination of 3 bugs:

The actual thing corrupting memory is https://cgit.freebsd.org/src/commit/?id=42f997d9b721ce5b64c37958f21fa81630f5a224 (in 13.0-RC4).

We get to this code path by having 127 nexthop groups (the number at which we trigger the array resize). This is addressed in https://cgit.freebsd.org/src/commit/?id=9095dc7da4cf0c484fb1160b2180b7329b09b107 (only in HEAD atm).

We get that number of nexthop groups (there should be only one) because not all of the memory in the comparison part of the nexthop group is zeroed. This is addressed in https://cgit.freebsd.org/src/commit/?id=823a80f4f9037b6b9611aaceb21f53115d1e64f1 (in 13-S, not sure if it lands in 13.0-R).
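For illustration, the class of bug fixed by commit 9095dc7 can be sketched as follows. This is a hypothetical model, not the actual sys/net/route/nhgrp.c code: `bitmask_copy_sketch` and the surrounding names are invented, and it only mimics a success-returns-zero helper whose result is tested with the wrong polarity on the index-array resize path.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Invented stand-in for a bitmask copy helper that follows the common
 * kernel convention: returns 0 on success, nonzero on failure (here,
 * when the destination bitmask is smaller than the source).
 */
static int
bitmask_copy_sketch(size_t src_items, size_t dst_items)
{
	return ((dst_items >= src_items) ? 0 : -1);
}

/*
 * Buggy polarity: "if (copy(...))" treats the 0 success return as
 * failure, so a successful resize is never committed and growth stops
 * at the old capacity, while a failed copy would be committed.
 */
bool
resize_committed_buggy(size_t old_items, size_t new_items)
{
	if (bitmask_copy_sketch(old_items, new_items))
		return (true);	/* wrong: this branch fires only on failure */
	return (false);
}

/* Fixed polarity: commit the new map only when the copy succeeds. */
bool
resize_committed_fixed(size_t old_items, size_t new_items)
{
	if (bitmask_copy_sketch(old_items, new_items) == 0)
		return (true);
	return (false);
}
```

With the buggy check, a full-view feed that pushes the group count past the initial capacity keeps operating on an array that was never actually grown, which is how the memory corruption path is reached.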
Comment 20 commit-hook freebsd_committer 2021-03-30 08:01:49 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=923e7f7e12670e97b097a195e69c848a6e8773a2

commit 923e7f7e12670e97b097a195e69c848a6e8773a2
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-03-29 23:00:17 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-03-30 07:34:31 +0000

    Fix nexhtop group index array scaling.

    The current code has the limit of 127 nexthop groups due to the
     wrongly-checked bitmask_copy() return value.

    PR: 254303
    Reported by:    Aleks <a.ivanov at veesp.com>

    (cherry picked from commit 9095dc7da4cf0c484fb1160b2180b7329b09b107)

 sys/net/route/nhgrp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Comment 21 commit-hook freebsd_committer 2021-03-31 20:09:41 UTC
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=b7fbdb5042c619221ee0b97573affcb8bcb59458

commit b7fbdb5042c619221ee0b97573affcb8bcb59458
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-03-29 23:00:17 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-03-31 20:00:10 +0000

    Fix nexhtop group index array scaling.

    The current code has the limit of 127 nexthop groups due to the
     wrongly-checked bitmask_copy() return value.

    PR: 254303
    Reported by:    Aleks <a.ivanov at veesp.com>
    Approved by:    re (gjb)

    (cherry picked from commit 923e7f7e12670e97b097a195e69c848a6e8773a2)

 sys/net/route/nhgrp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Comment 22 Alexander V. Chernikov freebsd_committer 2021-04-04 08:55:43 UTC
All relevant patches are in 13-R.
Does it fix the issue for you?
Comment 23 Aleks 2021-04-04 12:57:35 UTC
(In reply to Alexander V. Chernikov from comment #22)
For me - yes. Thank you very much!