Created attachment 222763: core.txt output

Hello,

Kernel version:
Version String: FreeBSD 13.0-BETA3 #7 n244527-9f00cb5fa8a4: Fri Feb 19 23:42:51 CET 2021 root@sparta:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
Panic String: page fault

The system is quite simple (it runs the current FreeBSD 13; it was installed from an ALPHA2 image, then updated manually to the git releng/13.0 branch, going through ALPHA3/BETA1/2/3). With additional modules and configuration removed, the system still panics (I removed the nvidia module, the linux module and the associated x11/kde5 startup items, as well as custom sysctl.conf entries).

I was testing a pfSense release in a VM sharing the same LAN segment, but forgot to disconnect the VM's NICs. The VM advertised the same IPv4 address as the FreeBSD 13 test machine, which panicked shortly after:

Feb 23 17:28:25 sparta kernel: arp: 00:0c:29:f6:2d:60 is using my IP address 192.168.1.1 on em0!
Feb 23 17:29:34 sparta syslogd: kernel boot file is /boot/kernel/kernel
Feb 23 17:29:34 sparta kernel:
Feb 23 17:29:34 sparta syslogd: last message repeated 1 times
Feb 23 17:29:34 sparta kernel: Fatal trap 12: page fault while in kernel mode
Feb 23 17:29:34 sparta kernel: cpuid = 7; apic id = 07
Feb 23 17:29:34 sparta kernel: fault virtual address = 0x300000056
Feb 23 17:29:34 sparta kernel: fault code = supervisor read data, page not present
Feb 23 17:29:34 sparta kernel: instruction pointer = 0x20:0xffffffff80d4c716
Feb 23 17:29:34 sparta kernel: stack pointer = 0x28:0xfffffe006b9d9f20
Feb 23 17:29:34 sparta kernel: frame pointer = 0x28:0xfffffe006b9d9f50
Feb 23 17:29:34 sparta kernel: code segment = base rx0, limit 0xfffff, type 0x1b
Feb 23 17:29:34 sparta kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Feb 23 17:29:34 sparta kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Feb 23 17:29:34 sparta kernel: current process = 0 (if_io_tqg_7)
Feb 23 17:29:34 sparta kernel: trap number = 12
Feb 23 17:29:34 sparta kernel: panic: page fault
Feb 23 17:29:34 sparta kernel: cpuid = 7
Feb 23 17:29:34 sparta kernel: time = 1614097724
Feb 23 17:29:34 sparta kernel: KDB: stack backtrace:
Feb 23 17:29:34 sparta kernel: #0 0xffffffff80c568c5 at kdb_backtrace+0x65
Feb 23 17:29:34 sparta kernel: #1 0xffffffff80c09491 at vpanic+0x181
Feb 23 17:29:34 sparta kernel: #2 0xffffffff80c09303 at panic+0x43
Feb 23 17:29:34 sparta kernel: #3 0xffffffff810891a7 at trap_fatal+0x387
Feb 23 17:29:34 sparta kernel: #4 0xffffffff810891ff at trap_pfault+0x4f
Feb 23 17:29:34 sparta kernel: #5 0xffffffff8108885d at trap+0x27d
Feb 23 17:29:34 sparta kernel: #6 0xffffffff8105ffc8 at calltrap+0x8
Feb 23 17:29:34 sparta kernel: #7 0xffffffff80d4c676 at rtsock_routemsg+0x1f6
Feb 23 17:29:34 sparta kernel: #8 0xffffffff80e12967 at defrouter_select_fib+0x507
Feb 23 17:29:34 sparta kernel: #9 0xffffffff80e104de at nd6_ra_input+0x76e
Feb 23 17:29:34 sparta kernel: #10 0xffffffff80de5389 at icmp6_input+0x699
Feb 23 17:29:34 sparta kernel: #11 0xffffffff80dfdc0a at ip6_input+0xb3a
Feb 23 17:29:34 sparta kernel: #12 0xffffffff80d3e56a at netisr_dispatch_src+0xca
Feb 23 17:29:34 sparta kernel: #13 0xffffffff80d22d28 at ether_demux+0x148
Feb 23 17:29:34 sparta kernel: #14 0xffffffff80d240ac at ether_nh_input+0x34c
Feb 23 17:29:34 sparta kernel: #15 0xffffffff80d3e56a at netisr_dispatch_src+0xca
Feb 23 17:29:34 sparta kernel: #16 0xffffffff80d23179 at ether_input+0x69
Feb 23 17:29:34 sparta kernel: #17 0xffffffff80d3ab72 at iflib_rxeof+0xb12
Feb 23 17:29:34 sparta kernel: Uptime: 1m57s
Feb 23 17:29:34 sparta kernel: Dumping 2146 out of 65359 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%---<<BOOT>>---

The panic is reproducible at will (I tried it several times with the same outcome).
The test machine is dual-stack IPv4/v6, with static addresses:

root@sparta:/var/crash # cat /etc/rc.conf
hostname="sparta"
ifconfig_em0="inet 192.168.1.1 netmask 255.255.255.0"
defaultrouter="192.168.1.254"
ifconfig_em0_ipv6="inet6 accept_rtadv xxxx:xxx:xxxx:1::2:1/64"
ipv6_defaultrouter="xxxx:xxx:xxxx:1::ffff"
rtsold_enable="YES"
[...]

I'll attach the core.txt as soon as possible, and will keep the vmcore files for some time in case you need them. Please let me know if any more details are needed, or if there are actions I can perform to provide additional details.

Kind regards.
-- 
Fred
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=9c4a8d24f0ffd5243fa5c6fe27178f669f16d1f5

commit 9c4a8d24f0ffd5243fa5c6fe27178f669f16d1f5
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-23 22:31:07 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-23 22:40:01 +0000

    Fix nd6 rib_action() handling.

    rib_action() guarantees valid rc filling IFF it returns without error.
    Check rib_action() return code instead of checking rc fields.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml@hecian.net>
    MFC after:      immediately

 sys/netinet6/nd6_rtr.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
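The contract in this commit message (the rc structure is only meaningful when rib_action() returns 0) can be illustrated with a minimal userland sketch. Everything in it is hypothetical: fake_rib_action(), struct cmd_info and add_default_route() are illustrative stand-ins, not the kernel's rib_action() or struct rib_cmd_info. The caller deliberately poisons rc to mimic uninitialised stack memory, then only reads it on the success path.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-ins, NOT the kernel's rtentry / rib_cmd_info. */
struct fake_rtentry {
	int id;
};

struct cmd_info {
	struct fake_rtentry *rc_rt;	/* valid only if the call returned 0 */
};

static struct fake_rtentry table_entry = { 42 };

/* Model of the contract: on failure *rc is left untouched. */
static int
fake_rib_action(int fail, struct cmd_info *rc)
{
	if (fail)
		return (EEXIST);
	rc->rc_rt = &table_entry;
	return (0);
}

/* Returns 1 if a routing message would be sent, 0 otherwise. */
static int
add_default_route(int fail)
{
	struct cmd_info rc;
	int error;

	/* Poison rc to mimic uninitialised stack contents. */
	memset(&rc, 0xA5, sizeof(rc));
	error = fake_rib_action(fail, &rc);
	if (error == 0) {
		/* Only on this path is rc guaranteed to be filled in. */
		assert(rc.rc_rt->id == 42);
		return (1);
	}
	/*
	 * Do not test rc.rc_rt != NULL here: after a failed call it can be
	 * a non-NULL garbage pointer.
	 */
	return (0);
}
```

In this model, add_default_route(0) takes the success path and returns 1, while add_default_route(1) returns 0 without ever reading the poisoned rc.rc_rt. Testing the pointer instead of the return code would have treated the 0xA5 bytes as a valid route.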
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=ea10694336b9a07d58d22187052291976f4906b2

commit ea10694336b9a07d58d22187052291976f4906b2
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-23 22:31:07 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-23 22:42:28 +0000

    Fix nd6 rib_action() handling.

    rib_action() guarantees valid rc filling IFF it returns without error.
    Check rib_action() return code instead of checking rc fields.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml@hecian.net>

    (cherry picked from commit 9c4a8d24f0ffd5243fa5c6fe27178f669f16d1f5)

 sys/netinet6/nd6_rtr.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
Hi Frederic, Is there any chance you can try to cherry-pick the patch from stable/13 and see if this helps?
Created attachment 222773: core.txt for panic of kernel with commit cherry-picked
(In reply to Alexander V. Chernikov from comment #3)

Hello Alexander,

If I did the cherry-pick correctly, I'm sorry to report that the panic is still here (new core.txt attached).

FYI:

root@sparta:/usr/src # uname -a
FreeBSD sparta 13.0-BETA3 FreeBSD 13.0-BETA3 #8 n244528-6e35f3391d82: Tue Feb 23 23:57:09 CET 2021 root@sparta:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
root@sparta:/usr/src # git log -4 --oneline
6e35f3391d82 (HEAD) Fix nd6 rib_action() handling.
9f00cb5fa8a4 MFS jail: Handle a possible race between jail_remove(2) and fork(2)
889cf2bf73a0 ffs_vnops.c: Move opt_*.h includes to the top.
150b4388d3b5 update to 13.0-BETA3

Kind regards.
Erratum: I just noticed that there were 2 different commits; the last report is for the 9c4a8d24f0ffd5243fa5c6fe27178f669f16d1f5 one. I'm currently rebuilding the kernel with ea10694336b9a07d58d22187052291976f4906b2 and will report back. Sorry for my mistake.

Kind regards.
It's the same commit, just cherry-picked to stable/13, so it shouldn't change anything.

Would it be possible for you to try the following simple patch:

diff --git a/sys/netinet6/nd6_rtr.c b/sys/netinet6/nd6_rtr.c
index 51b831a956bc..ca2c7255a4d8 100644
--- a/sys/netinet6/nd6_rtr.c
+++ b/sys/netinet6/nd6_rtr.c
@@ -699,7 +699,7 @@ defrouter_addreq(struct nd_defrouter *new)
 	NET_EPOCH_ASSERT();
 	error = rib_action(fibnum, RTM_ADD, &info, &rc);
 	if (error == 0) {
-		struct nhop_object *nh = nhop_select(rc.rc_nh_new, 0);
+		struct nhop_object *nh = nhop_select_func(rc.rc_nh_new, 0);
 		rt_routemsg(RTM_ADD, rc.rc_rt, nh, fibnum);
 		new->installed = 1;
 	}

?
Created attachment 222775: core.txt for kernel with commit ea10694336b9a07d58d22187052291976f4906b2

Still no luck; the kernel keeps panicking. Double-checking that I used the proper commit:

root@sparta:~ # uname -a
FreeBSD sparta 13.0-BETA3 FreeBSD 13.0-BETA3 #9 n244528-7583c26e316b: Wed Feb 24 00:30:14 CET 2021 root@sparta:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64
root@sparta:~ # cd /usr/src
root@sparta:/usr/src # git log -4 --oneline
7583c26e316b (HEAD) Fix nd6 rib_action() handling.
9f00cb5fa8a4 MFS jail: Handle a possible race between jail_remove(2) and fork(2)
889cf2bf73a0 ffs_vnops.c: Move opt_*.h includes to the top.
150b4388d3b5 update to 13.0-BETA3
root@sparta:/usr/src # git diff 9f00cb5fa8a4
diff --git a/sys/netinet6/nd6_rtr.c b/sys/netinet6/nd6_rtr.c
index eca704dc2843..51b831a956bc 100644
--- a/sys/netinet6/nd6_rtr.c
+++ b/sys/netinet6/nd6_rtr.c
@@ -698,12 +698,11 @@ defrouter_addreq(struct nd_defrouter *new)
 
 	NET_EPOCH_ASSERT();
 	error = rib_action(fibnum, RTM_ADD, &info, &rc);
-	if (rc.rc_rt != NULL) {
+	if (error == 0) {
 		struct nhop_object *nh = nhop_select(rc.rc_nh_new, 0);
 		rt_routemsg(RTM_ADD, rc.rc_rt, nh, fibnum);
-	}
-	if (error == 0)
 		new->installed = 1;
+	}
 }
 
 /*
@@ -719,6 +718,7 @@ defrouter_delreq(struct nd_defrouter *dr)
 	struct rib_cmd_info rc;
 	struct epoch_tracker et;
 	unsigned int fibnum;
+	int error;
 
 	bzero(&def, sizeof(def));
 	bzero(&mask, sizeof(mask));
@@ -737,8 +737,8 @@ defrouter_delreq(struct nd_defrouter *dr)
 	info.rti_info[RTAX_NETMASK] = (struct sockaddr *)&mask;
 
 	NET_EPOCH_ENTER(et);
-	rib_action(fibnum, RTM_DELETE, &info, &rc);
-	if (rc.rc_rt != NULL) {
+	error = rib_action(fibnum, RTM_DELETE, &info, &rc);
+	if (error == 0) {
 		struct nhop_object *nh = nhop_select(rc.rc_nh_old, 0);
 		rt_routemsg(RTM_DELETE, rc.rc_rt, nh, fibnum);
 	}
root@sparta:/usr/src #
Created attachment 222776: core.txt for panic of kernel with simple patch

No more immediate panic (previously it happened a few seconds after the duplicate IP report). I triggered the duplicate several times, waiting 1-2 minutes in between, with some ssh interaction (getting ifconfig status/netstat -nr). It lasted for ~10 minutes but finally panicked again when the duplicate IP machine was disconnected. New core.txt attached.
Okay. I know what's happening here and will fix it tomorrow. Basically, the kernel creates a multipath route from the static one and the one resulting from the received rtadv. This interaction was not foreseen, hence the problem. As a workaround, you can set the net.route.multipath sysctl to 0.

Btw, what are your expectations w.r.t. the rtadv default route? Should the kernel avoid installing it if a default route is already present?
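The interaction described here (a static default and an RA-learned default for the same prefix being merged into one multipath route) can be modelled in userland. This is a hypothetical sketch: struct nh, struct default_route, default_route_add() and nh_select() are invented for illustration and do not reproduce the kernel's nhop_object or nexthop-group layout. It only shows why code that treats a route's nexthop slot as a single nexthop gets wrong data once that slot holds a group.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_MEMBERS 4

/* Hypothetical model, NOT the kernel's nhop_object layout. */
struct nh {
	int is_group;				/* 0: plain nexthop, 1: group */
	const char *gw;				/* valid only when is_group == 0 */
	const struct nh *members[MAX_MEMBERS];	/* valid only when is_group == 1 */
	size_t nmembers;
};

/* A default route's nexthop slot, plus storage for a group if one is built. */
struct default_route {
	const struct nh *nh;
	struct nh group;
};

/* Install a nexthop; a second one for the same prefix turns the slot into a group. */
static void
default_route_add(struct default_route *rt, const struct nh *new_nh)
{
	if (rt->nh == NULL) {
		rt->nh = new_nh;
		return;
	}
	if (!rt->nh->is_group) {
		memset(&rt->group, 0, sizeof(rt->group));
		rt->group.is_group = 1;
		rt->group.members[rt->group.nmembers++] = rt->nh;
		rt->nh = &rt->group;
	}
	if (rt->group.nmembers < MAX_MEMBERS)
		rt->group.members[rt->group.nmembers++] = new_nh;
}

/* Consumers must resolve a group to one member before reading nexthop fields. */
static const struct nh *
nh_select(const struct nh *nh, uint32_t flowid)
{
	if (!nh->is_group)
		return (nh);
	return (nh->members[flowid % nh->nmembers]);
}
```

After a static nexthop and an rtadv nexthop are both added, rt.nh points at the group, whose gw field is NULL; a consumer that reads rt.nh->gw directly gets no usable gateway, while nh_select() returns one of the two real members. In this model, turning off group creation (the analogue of the net.route.multipath=0 workaround) would leave the slot holding a single nexthop.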
(In reply to Alexander V. Chernikov from comment #10)

There's no hurry on my side; I reported the bug mainly to keep others from hitting this panic on production servers. TBH, seeing your explanation of the issue, it seems I'm to blame for this, as I kept the accept_rtadv option despite having set a static route (I can test further and confirm this by removing the option or the static route if you wish).

However, I don't quite understand how the duplicate v4 IP triggers changes on the v6 side: the VM that causes the duplicate IP is v4-only, and both machines are using static addresses (thus no DHCP)... I think I'll have to get some network traces to see what happens here.

As for expectations, I would say that configuring a static route expresses the admin's will to overrule the 'autoconfiguration' set up by rtadv, so the logical action would be to ignore rtadv routes. But again, TBH I'm in no way an IPv6 expert, so my advice is to be taken with the usual grain of salt ;)

Many thanks for your time,
Kind regards.
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=cc3fa1e29fda2cc761e793a61cef3bd2522b3468

commit cc3fa1e29fda2cc761e793a61cef3bd2522b3468
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-24 16:42:48 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-24 16:44:10 +0000

    Fix crash with rtadv-originated multipath IPv6 routes.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml at hecian.net>
    MFC after:      immediately

 sys/netinet6/nd6_rtr.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
(In reply to Frederic Denis from comment #11)

IPv4 has nothing to do with the issue - it's purely IPv6-related. I've committed a bandaid patch that prevents the kernel from crashing in such a case.

I think we're on the same page w.r.t. the desired system behaviour when we have a partially contradicting configuration - user-specified "accept_rtadv" and a user-specified static default. The static route should take precedence. However, I'm still thinking about the best path forward here. A static route requested by ND6 is pretty much the same as a route that comes from rtsock; I don't see any attribute one can use to distinguish one from the other. Monitoring kernel RIB changes doesn't easily allow one to _always_ install the rtadv default ONLY when there are no other default routes. Relative route priorities can do the trick - that's something that's mostly implemented (RTF_CONNECTED routes take precedence), but it doesn't look mergeable to 13.0.
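The precedence policy discussed here (a static default wins, and an rtadv default is used only when no other default exists) can be sketched with relative priorities. This is an illustration under stated assumptions: the route_origin enum, the priority values and select_default() are made up for this example. As noted above, the kernel currently has no attribute that marks a default route as static or RA-learned, so this is not how FreeBSD behaves today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical origin tags; the kernel does not tag routes this way. */
enum route_origin { ORIGIN_STATIC, ORIGIN_RTADV };

struct default_candidate {
	const char *gw;
	enum route_origin origin;
};

/* Lower value wins: a static default always beats an rtadv-learned one. */
static int
origin_priority(enum route_origin o)
{
	switch (o) {
	case ORIGIN_STATIC:
		return (0);
	case ORIGIN_RTADV:
		return (10);
	}
	return (100);	/* unknown origin: lowest precedence */
}

/* Pick the single active default; ties keep the earliest candidate. */
static const struct default_candidate *
select_default(const struct default_candidate *c, size_t n)
{
	const struct default_candidate *best = NULL;

	for (size_t i = 0; i < n; i++) {
		if (best == NULL ||
		    origin_priority(c[i].origin) < origin_priority(best->origin))
			best = &c[i];
	}
	return (best);
}
```

With both a static and an rtadv candidate present, select_default() returns the static one regardless of order. With only an RA-learned candidate it returns that one, which matches the "install the rtadv default only when nothing else exists" expectation without merging the two into a multipath group.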
(In reply to Alexander V. Chernikov from comment #13)

Hello Alexander,

I investigated a bit further with network traces and found that my pfSense VM that caused the IPv4 duplicate was also sending ICMP6 RAs... (despite having no v6 configured and no services started, as the setup was unfinished, but that's another story). In other words, not only was there a misconfiguration on my part (accept_rtadv and a static route), but the kernel also had to handle multiple conflicting RAs.

I tested a new kernel with the two commits cherry-picked, and it looks rock solid despite the 'hostile' environment ;)

Thanks again for your time,
Best regards.
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=7dfdd039a3584885648d33888359032479038dc1

commit 7dfdd039a3584885648d33888359032479038dc1
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-24 16:42:48 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-25 21:43:37 +0000

    Fix crash with rtadv-originated multipath IPv6 routes.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml at hecian.net>
    MFC after:      immediately

    (cherry picked from commit cc3fa1e29fda2cc761e793a61cef3bd2522b3468)

 sys/netinet6/nd6_rtr.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=52d4c9e2fa024c36ea75ad17c50a9e66d2dbdc8c

commit 52d4c9e2fa024c36ea75ad17c50a9e66d2dbdc8c
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-24 16:42:48 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-25 21:55:58 +0000

    Fix crash with rtadv-originated multipath IPv6 routes.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml at hecian.net>
    Approved by:    re(gjb)

    (cherry picked from commit 7dfdd039a3584885648d33888359032479038dc1)

 sys/netinet6/nd6_rtr.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
A commit in branch releng/13.0 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=5dde6e460a1f76842f884e1f6a53df8e7648756b

commit 5dde6e460a1f76842f884e1f6a53df8e7648756b
Author:     Alexander V. Chernikov <melifaro@FreeBSD.org>
AuthorDate: 2021-02-23 22:31:07 +0000
Commit:     Alexander V. Chernikov <melifaro@FreeBSD.org>
CommitDate: 2021-02-25 21:55:52 +0000

    Fix nd6 rib_action() handling.

    rib_action() guarantees valid rc filling IFF it returns without error.
    Check rib_action() return code instead of checking rc fields.

    PR:             253800
    Reported by:    Frederic Denis <freebsdml@hecian.net>
    Approved by:    re(gjb)

    (cherry picked from commit ea10694336b9a07d58d22187052291976f4906b2)

 sys/netinet6/nd6_rtr.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
(In reply to Frederic Denis from comment #14)

Hi Frederic,

Thank you for reporting and testing the patches! Glad that it works as expected in your setup now. I merged the bandaid patches to releng/13, so at least it won't panic. The proper clash handling described above is a separate story and most probably won't be solved in 13.0.

Given that the original problem is solved, do you mind if I close the PR?

Thank you.
(In reply to Alexander V. Chernikov from comment #18)

Hi Alexander,

Thank you for pushing the patches to releng/13.0! No worries about the proper handling, as this is clearly an edge case (misconfiguration + not-so-clean network) that shouldn't bite many people after all. You can go ahead and close the PR, of course.

Cheers.