Bug 253800 - [panic] FreeBSD-13.0 (releng/13.0) panic upon duplicate IPv4 detection / page fault while in kernel mode (in function rtsock_routemsg_info)
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 13.0-STABLE
Hardware: amd64 Any
Importance: --- Affects Some People
Assignee: Alexander V. Chernikov
URL:
Keywords: crash
Depends on:
Blocks:
 
Reported: 2021-02-23 17:27 UTC by Frederic Denis
Modified: 2022-10-12 00:49 UTC


Attachments
core.txt output (153.17 KB, text/plain)
2021-02-23 17:27 UTC, Frederic Denis
core.txt for panic of kernel with commit cherry-picked (88.55 KB, text/plain)
2021-02-23 23:11 UTC, Frederic Denis
core.txt for kernel with commit ea10694336b9a07d58d22187052291976f4906b2 (117.13 KB, text/plain)
2021-02-23 23:42 UTC, Frederic Denis
core.txt for panic of kernel with simple patch (144.44 KB, text/plain)
2021-02-24 00:17 UTC, Frederic Denis

Description Frederic Denis 2021-02-23 17:27:04 UTC
Created attachment 222763
core.txt output

Hello,

Kernel version:
Version String: FreeBSD 13.0-BETA3 #7 n244527-9f00cb5fa8a4: Fri Feb 19 23:42:51 CET 2021    root@sparta:/usr/obj/usr/src/amd64.amd64/sys/GENERIC

Panic String: page fault

The system is quite simple (it is used to test the current FreeBSD 13; it was installed from an ALPHA2 image, then updated manually on the releng/13.0 git branch, going through ALPHA3/BETA1/2/3).
With additional modules and configuration removed, the system still panics (I removed the nvidia and linux modules, the associated x11/kde5 startup items, and the custom sysctl.conf entries).

I was testing a pfSense release in a VM sharing the same LAN segment, but forgot to disconnect the VM's NICs.
The VM advertised the same IPv4 address as the FreeBSD 13 test machine, which panicked shortly after:

Feb 23 17:28:25 sparta kernel: arp: 00:0c:29:f6:2d:60 is using my IP address 192.168.1.1 on em0!
Feb 23 17:29:34 sparta syslogd: kernel boot file is /boot/kernel/kernel
Feb 23 17:29:34 sparta kernel:
Feb 23 17:29:34 sparta syslogd: last message repeated 1 times
Feb 23 17:29:34 sparta kernel: Fatal trap 12: page fault while in kernel mode
Feb 23 17:29:34 sparta kernel: cpuid = 7; apic id = 07
Feb 23 17:29:34 sparta kernel: fault virtual address    = 0x300000056
Feb 23 17:29:34 sparta kernel: fault code               = supervisor read data, page not present
Feb 23 17:29:34 sparta kernel: instruction pointer      = 0x20:0xffffffff80d4c716
Feb 23 17:29:34 sparta kernel: stack pointer            = 0x28:0xfffffe006b9d9f20
Feb 23 17:29:34 sparta kernel: frame pointer            = 0x28:0xfffffe006b9d9f50
Feb 23 17:29:34 sparta kernel: code segment             = base rx0, limit 0xfffff, type 0x1b
Feb 23 17:29:34 sparta kernel:                  = DPL 0, pres 1, long 1, def32 0, gran 1
Feb 23 17:29:34 sparta kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Feb 23 17:29:34 sparta kernel: current process          = 0 (if_io_tqg_7)
Feb 23 17:29:34 sparta kernel: trap number              = 12
Feb 23 17:29:34 sparta kernel: panic: page fault
Feb 23 17:29:34 sparta kernel: cpuid = 7
Feb 23 17:29:34 sparta kernel: time = 1614097724
Feb 23 17:29:34 sparta kernel: KDB: stack backtrace:
Feb 23 17:29:34 sparta kernel: #0 0xffffffff80c568c5 at kdb_backtrace+0x65
Feb 23 17:29:34 sparta kernel: #1 0xffffffff80c09491 at vpanic+0x181
Feb 23 17:29:34 sparta kernel: #2 0xffffffff80c09303 at panic+0x43
Feb 23 17:29:34 sparta kernel: #3 0xffffffff810891a7 at trap_fatal+0x387
Feb 23 17:29:34 sparta kernel: #4 0xffffffff810891ff at trap_pfault+0x4f
Feb 23 17:29:34 sparta kernel: #5 0xffffffff8108885d at trap+0x27d
Feb 23 17:29:34 sparta kernel: #6 0xffffffff8105ffc8 at calltrap+0x8
Feb 23 17:29:34 sparta kernel: #7 0xffffffff80d4c676 at rtsock_routemsg+0x1f6
Feb 23 17:29:34 sparta kernel: #8 0xffffffff80e12967 at defrouter_select_fib+0x507
Feb 23 17:29:34 sparta kernel: #9 0xffffffff80e104de at nd6_ra_input+0x76e
Feb 23 17:29:34 sparta kernel: #10 0xffffffff80de5389 at icmp6_input+0x699
Feb 23 17:29:34 sparta kernel: #11 0xffffffff80dfdc0a at ip6_input+0xb3a
Feb 23 17:29:34 sparta kernel: #12 0xffffffff80d3e56a at netisr_dispatch_src+0xca
Feb 23 17:29:34 sparta kernel: #13 0xffffffff80d22d28 at ether_demux+0x148
Feb 23 17:29:34 sparta kernel: #14 0xffffffff80d240ac at ether_nh_input+0x34c
Feb 23 17:29:34 sparta kernel: #15 0xffffffff80d3e56a at netisr_dispatch_src+0xca
Feb 23 17:29:34 sparta kernel: #16 0xffffffff80d23179 at ether_input+0x69
Feb 23 17:29:34 sparta kernel: #17 0xffffffff80d3ab72 at iflib_rxeof+0xb12
Feb 23 17:29:34 sparta kernel: Uptime: 1m57s
Feb 23 17:29:34 sparta kernel: Dumping 2146 out of 65359 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%---<<BOOT>>---

The panic is reproducible at will (I tried it several times with the same outcome).

The test machine is dual-stack IPv4/IPv6, with static addresses:
root@sparta:/var/crash #  cat /etc/rc.conf
hostname="sparta"
ifconfig_em0="inet 192.168.1.1 netmask 255.255.255.0"
defaultrouter="192.168.1.254"
ifconfig_em0_ipv6="inet6 accept_rtadv xxxx:xxx:xxxx:1::2:1/64"
ipv6_defaultrouter="xxxx:xxx:xxxx:1::ffff"
rtsold_enable="YES"
[...]

I'll attach the core.txt ASAP, and will keep the vmcore files for some time in case you need them.

Please let me know if any more details are needed, or if there are actions I can perform to provide additional details.

Kind regards.

-- Fred
Comment 1 commit-hook freebsd_committer freebsd_triage 2021-02-23 22:40:58 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9c4a8d24f0ffd5243fa5c6fe27178f669f16d1f5

commit 9c4a8d24f0ffd5243fa5c6fe27178f669f16d1f5
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-23 22:31:07 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-23 22:40:01 +0000

    Fix nd6 rib_action() handling.

    rib_action() guarantees valid rc filling IFF it returns without error.
    Check rib_action() return code instead of checking rc fields.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml@hecian.net>
    MFC after:      immediately

 sys/netinet6/nd6_rtr.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
Comment 2 commit-hook freebsd_committer freebsd_triage 2021-02-23 22:45:00 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ea10694336b9a07d58d22187052291976f4906b2

commit ea10694336b9a07d58d22187052291976f4906b2
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-23 22:31:07 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-23 22:42:28 +0000

    Fix nd6 rib_action() handling.

    rib_action() guarantees valid rc filling IFF it returns without error.
    Check rib_action() return code instead of checking rc fields.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml@hecian.net>

    (cherry picked from commit 9c4a8d24f0ffd5243fa5c6fe27178f669f16d1f5)

 sys/netinet6/nd6_rtr.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
Comment 3 Alexander V. Chernikov freebsd_committer freebsd_triage 2021-02-23 22:46:28 UTC
Hi Frederic,

Is there any chance you can try to cherry-pick the patch from stable/13 and see if this helps?
Comment 4 Frederic Denis 2021-02-23 23:11:24 UTC
Created attachment 222773
core.txt for panic of kernel with commit cherry-picked
Comment 5 Frederic Denis 2021-02-23 23:12:48 UTC
(In reply to Alexander V. Chernikov from comment #3)
Hello Alexander,

If I did the cherry-pick correctly, I'm sorry to report that the panic is still here (new core.txt attached).

FYI:
root@sparta:/usr/src # uname -a
FreeBSD sparta 13.0-BETA3 FreeBSD 13.0-BETA3 #8 n244528-6e35f3391d82: Tue Feb 23 23:57:09 CET 2021     root@sparta:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
root@sparta:/usr/src # git log -4 --oneline
6e35f3391d82 (HEAD) Fix nd6 rib_action() handling.
9f00cb5fa8a4 MFS jail: Handle a possible race between jail_remove(2) and fork(2)
889cf2bf73a0 ffs_vnops.c: Move opt_*.h includes to the top.
150b4388d3b5 update to 13.0-BETA3

Kind regards.
Comment 6 Frederic Denis 2021-02-23 23:29:05 UTC
Errata: I just saw that there were two different commits; the last report is for the 9c4a8d24f0ffd5243fa5c6fe27178f669f16d1f5 one. I'm currently rebuilding the kernel with ea10694336b9a07d58d22187052291976f4906b2 and will report back.

Sorry for my mistake.

Kind regards.
Comment 7 Alexander V. Chernikov freebsd_committer freebsd_triage 2021-02-23 23:40:35 UTC
It's the same commit, just cherry-picked to stable/13, so it shouldn't change anything.


Would it be possible for you to try the following simple patch:

diff --git a/sys/netinet6/nd6_rtr.c b/sys/netinet6/nd6_rtr.c
index 51b831a956bc..ca2c7255a4d8 100644
--- a/sys/netinet6/nd6_rtr.c
+++ b/sys/netinet6/nd6_rtr.c
@@ -699,7 +699,7 @@ defrouter_addreq(struct nd_defrouter *new)
        NET_EPOCH_ASSERT();
        error = rib_action(fibnum, RTM_ADD, &info, &rc);
        if (error == 0) {
-               struct nhop_object *nh = nhop_select(rc.rc_nh_new, 0);
+               struct nhop_object *nh = nhop_select_func(rc.rc_nh_new, 0);
                rt_routemsg(RTM_ADD, rc.rc_rt, nh, fibnum);
                new->installed = 1;
        }

?
Comment 8 Frederic Denis 2021-02-23 23:42:14 UTC
Created attachment 222775
core.txt for kernel with commit ea10694336b9a07d58d22187052291976f4906b2

Still no luck; the kernel keeps panicking.

Double-checking that I used the proper commit:
root@sparta:~ # uname -a
FreeBSD sparta 13.0-BETA3 FreeBSD 13.0-BETA3 #9 n244528-7583c26e316b: Wed Feb 24 00:30:14 CET 2021     root@sparta:/usr/obj/usr/src/amd64.amd64/sys/GENERIC  amd64
root@sparta:~ # cd /usr/src
root@sparta:/usr/src # git log -4 --oneline
7583c26e316b (HEAD) Fix nd6 rib_action() handling.
9f00cb5fa8a4 MFS jail: Handle a possible race between jail_remove(2) and fork(2)
889cf2bf73a0 ffs_vnops.c: Move opt_*.h includes to the top.
150b4388d3b5 update to 13.0-BETA3
root@sparta:/usr/src # git diff 9f00cb5fa8a4
diff --git a/sys/netinet6/nd6_rtr.c b/sys/netinet6/nd6_rtr.c
index eca704dc2843..51b831a956bc 100644
--- a/sys/netinet6/nd6_rtr.c
+++ b/sys/netinet6/nd6_rtr.c
@@ -698,12 +698,11 @@ defrouter_addreq(struct nd_defrouter *new)

        NET_EPOCH_ASSERT();
        error = rib_action(fibnum, RTM_ADD, &info, &rc);
-       if (rc.rc_rt != NULL) {
+       if (error == 0) {
                struct nhop_object *nh = nhop_select(rc.rc_nh_new, 0);
                rt_routemsg(RTM_ADD, rc.rc_rt, nh, fibnum);
-       }
-       if (error == 0)
                new->installed = 1;
+       }
 }

 /*
@@ -719,6 +718,7 @@ defrouter_delreq(struct nd_defrouter *dr)
        struct rib_cmd_info rc;
        struct epoch_tracker et;
        unsigned int fibnum;
+       int error;

        bzero(&def, sizeof(def));
        bzero(&mask, sizeof(mask));
@@ -737,8 +737,8 @@ defrouter_delreq(struct nd_defrouter *dr)
        info.rti_info[RTAX_NETMASK] = (struct sockaddr *)&mask;

        NET_EPOCH_ENTER(et);
-       rib_action(fibnum, RTM_DELETE, &info, &rc);
-       if (rc.rc_rt != NULL) {
+       error = rib_action(fibnum, RTM_DELETE, &info, &rc);
+       if (error == 0) {
                struct nhop_object *nh = nhop_select(rc.rc_nh_old, 0);
                rt_routemsg(RTM_DELETE, rc.rc_rt, nh, fibnum);
        }
root@sparta:/usr/src #
Comment 9 Frederic Denis 2021-02-24 00:17:59 UTC
Created attachment 222776
core.txt for panic of kernel with simple patch

No more immediate panic (it used to happen a few seconds after the duplicate IP report).
I triggered the duplicate several times, waiting 1-2 minutes in between, with some ssh interaction (checking ifconfig status and netstat -nr). The system lasted ~10 minutes but finally panicked again when the machine with the duplicate IP was disconnected.

New core.txt attached.
Comment 10 Alexander V. Chernikov freebsd_committer freebsd_triage 2021-02-24 00:25:29 UTC
Okay. I know what’s happening here and will fix it tomorrow.

Basically, the kernel creates a multipath route from the static default route and the one resulting from the received rtadv. This interaction was not foreseen, hence the problem.

As a workaround, you can set the net.route.multipath sysctl to 0.

Btw, what are your expectations wrt the rtadv default route?
Should the kernel avoid installing it if a default route is already present?
Comment 11 Frederic Denis 2021-02-24 01:06:14 UTC
(In reply to Alexander V. Chernikov from comment #10)
There's no hurry on my side, I reported the bug mainly to avoid others encountering this panic with production servers.

TBH, seeing your explanation of the issue, it seems I'm to blame for this, as I kept the accept_rtadv option despite having set a static route (I can test further and confirm this by removing either the option or the static route if you wish).
However, I don't quite understand how the duplicate v4 IP triggers changes on the v6 side: the VM that causes the duplicate IP is v4-only and both machines are using static addresses (thus no DHCP)... I think I'll have to capture some network traces to see what happens here.

As for expectations, I would say that configuring a static route expresses the admin's will to override the 'autoconfiguration' set up by rtadv, so the logical action would be to ignore rtadv routes. But again, TBH I'm by no means an IPv6 expert, so my advice should be taken with the usual grain of salt ;)

Many thanks for your time,

Kind regards.
Comment 12 commit-hook freebsd_committer freebsd_triage 2021-02-24 22:47:07 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=cc3fa1e29fda2cc761e793a61cef3bd2522b3468

commit cc3fa1e29fda2cc761e793a61cef3bd2522b3468
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-24 16:42:48 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-24 16:44:10 +0000

    Fix crash with rtadv-originated multipath IPv6 routes.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml at hecian.net>
    MFC after:      immediately

 sys/netinet6/nd6_rtr.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
Comment 13 Alexander V. Chernikov freebsd_committer freebsd_triage 2021-02-24 23:03:43 UTC
(In reply to Frederic Denis from comment #11)
IPv4 has nothing to do with the issue - it's purely IPv6-related.

I've committed a bandaid patch that prevents the kernel from crashing in such a case.
I think we're on the same page w.r.t. the desired system behaviour when the configuration is partially contradictory (user-specified "accept_rtadv" and a user-specified static default): the static route should take precedence.

However, I'm still thinking about the best path forward here.
An ND6-requested route is pretty much the same as a route coming from rtsock. I don't see any attribute that could be used to distinguish one from the other.

Monitoring kernel RIB changes doesn't easily allow us to _always_ install the rtadv default ONLY when there are no other default routes.

Relative route priorities can do the trick. That's something that's mostly implemented (RTF_CONNECTED routes take precedence), but it doesn't look mergeable to 13.0.
Comment 14 Frederic Denis 2021-02-25 09:52:12 UTC
(In reply to Alexander V. Chernikov from comment #13)
Hello Alexander,

I investigated a bit further with network traces and found that the pfSense VM that caused the IPv4 duplicate was also sending ICMPv6 RAs (despite having no IPv6 configured and no services started, as the setup was unfinished, but that's another story).
In other words, not only was there a misconfiguration on my part (accept_rtadv plus a static route), but the kernel also had to handle multiple conflicting RAs.

I tested a new kernel with the two commits cherry-picked, and it looks rock solid, despite the 'hostile' environment ;)

Thanks again for your time,

Best regards.
Comment 15 commit-hook freebsd_committer freebsd_triage 2021-02-25 21:45:23 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7dfdd039a3584885648d33888359032479038dc1

commit 7dfdd039a3584885648d33888359032479038dc1
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-24 16:42:48 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-25 21:43:37 +0000

    Fix crash with rtadv-originated multipath IPv6 routes.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml at hecian.net>
    MFC after:      immediately

    (cherry picked from commit cc3fa1e29fda2cc761e793a61cef3bd2522b3468)

 sys/netinet6/nd6_rtr.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
Comment 16 commit-hook freebsd_committer freebsd_triage 2021-02-25 21:57:26 UTC
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=52d4c9e2fa024c36ea75ad17c50a9e66d2dbdc8c

commit 52d4c9e2fa024c36ea75ad17c50a9e66d2dbdc8c
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-24 16:42:48 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-25 21:55:58 +0000

    Fix crash with rtadv-originated multipath IPv6 routes.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml at hecian.net>
    Approved by:    re(gjb)

    (cherry picked from commit 7dfdd039a3584885648d33888359032479038dc1)

 sys/netinet6/nd6_rtr.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
Comment 17 commit-hook freebsd_committer freebsd_triage 2021-02-25 21:57:26 UTC
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5dde6e460a1f76842f884e1f6a53df8e7648756b

commit 5dde6e460a1f76842f884e1f6a53df8e7648756b
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-23 22:31:07 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-25 21:55:52 +0000

    Fix nd6 rib_action() handling.

    rib_action() guarantees valid rc filling IFF it returns without error.
    Check rib_action() return code instead of checking rc fields.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml@hecian.net>
    Approved by:    re(gjb)

    (cherry picked from commit ea10694336b9a07d58d22187052291976f4906b2)

 sys/netinet6/nd6_rtr.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
Comment 18 Alexander V. Chernikov freebsd_committer freebsd_triage 2021-02-25 22:05:18 UTC
(In reply to Frederic Denis from comment #14)
Hi Frederic,

Thank you for reporting and testing the patches!
Glad that it works as expected in your setup now.

I merged the bandaid patches to releng/13.0, so at least it won't panic.
The proper handling of the clash described above is a different story and most probably won't be solved in 13.0.

Given the original problem is solved - do you mind if I close the PR?

Thank you.
Comment 19 Frederic Denis 2021-02-25 22:20:42 UTC
(In reply to Alexander V. Chernikov from comment #18)
Hi Alexander,

Thank you for pushing the patches to releng/13.0!

No worries about the proper handling; this is clearly an edge case (misconfiguration + not-so-clean network) that shouldn't bite that many people after all.

You can proceed and close the PR, of course.

Cheers.