Summary: Fatal trap 12: page fault while in kernel mode in sysctl_dumpentry from sysctl NET_RT_DUMP
Product: Base System
Reporter: ian
Component: kern
Assignee: Andrey V. Elsukov <ae>
Status: Closed DUPLICATE
Severity: Affects Some People
CC: Franck.Rousseau, ae, chris, cy, eugen, markj, mmacy, msl0000023508, net, rstone, w0wkin
Priority: ---
Flags: koobs: mfc-stable12+
Version: 11.2-STABLE
Hardware: amd64
OS: Any
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227720 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233306
Description
ian
2018-08-10 09:17:34 UTC
The point at which it fails appears to be line 1559 in /usr/src/sys/net/rtsock.c:

    info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;

At the point where it runs this:

    (kgdb) print rt->rt_ifp->if_addr
    $3 = (struct ifaddr *) 0x0

It's trying to access memory via a NULL pointer.

Adding some people who have recently worked on rtsock locking to the CC list.

Created attachment 199064
add some checks
Please try this patch and see if it eliminates the panics. Apply it:

    cd /usr/src
    patch < /path/to/patch

Then rebuild and reinstall the kernel.
(In reply to Eugene Grosbein from comment #3)
This patch is not the correct way to fix the problem. I don't think you have any guarantee that all the data is still valid at the time you acquire the lock.

Indeed, this patch does not work. I have given more information in bug #227720, which is linked to this one.

Created attachment 199344
Proposed patch (for stable/12+)
I think this problem can be fixed by this patch, but it is only applicable to FreeBSD 12.0 and later. If you are able to test stable/12 with and without the patch, feedback would be appreciated.
Created attachment 199345
Proposed patch (for stable/12+)
Actually, IF_ADDR_RLOCK() is redundant here.
Created attachment 199372
Proposed patch (for stable/12+)
Sorry, but I think the panic is still possible. The kernel sets the IFF_DYING flag too late; instead, we can check for the presence of IFF_UP. Also, do not reset the ifp->if_addr pointer to NULL in if_detach_internal(). This doesn't look very useful, and keeping the pointer also protects us from a NULL pointer dereference when another thread detaches the interface after we have checked the IFF_UP flag. Accessing if_addr is safe in this case because ifa_free() uses epoch_call().
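The approach in the comment above (check IFF_UP before using the ifnet, and never reset if_addr to NULL on detach) can be modelled in user space. This is a hypothetical, single-threaded C sketch, not kernel code. The struct and function names are invented for illustration, and MODEL_IFF_UP mirrors IFF_UP. The epoch_call()-deferred free of the ifaddr is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for IFF_UP from FreeBSD's <net/if.h>
 * (renamed to avoid clashing with host headers). */
#define MODEL_IFF_UP 0x1

struct model_sockaddr { int family; };
struct model_ifaddr { struct model_sockaddr ifa_addr; };
struct model_ifnet {
    int if_flags;
    struct model_ifaddr *if_addr;
};

/* Detach as suggested: clear IFF_UP but deliberately leave if_addr
 * pointing at the old ifaddr, so a reader never observes NULL. */
static void model_if_detach_keep_addr(struct model_ifnet *ifp)
{
    ifp->if_flags &= ~MODEL_IFF_UP;
    /* if_addr intentionally unchanged; in the kernel the ifaddr memory
     * itself would only be released later, via epoch_call(). */
}

/* Dump one entry: return the interface address to report, or NULL to
 * skip interfaces that are not UP. if_addr is never NULL in this model. */
static const struct model_sockaddr *
model_dump_ifp_addr(const struct model_ifnet *ifp)
{
    if ((ifp->if_flags & MODEL_IFF_UP) == 0)
        return NULL;
    return &ifp->if_addr->ifa_addr;
}
```

Keeping if_addr intact matters for the following case. A reader might pass the IFF_UP check just before a concurrent detach. It then still dereferences a valid, not yet freed pointer instead of NULL. Later in this thread the IFF_UP check itself is dropped, because ppp needs route information for interfaces that are not UP.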
Thanks for the tentative fix. I have just tested it on 11.2 and 12-RC1 kernels; for 11.2 I adapted it by removing the NET_EPOCH_* macros. The behavior changes: there is no more crash, but it looks like something is not cleared as it should be. Setting up ppp + proxy ARP, everything works. Interrupting and restarting ppp used to cause the crash consistently, but with this patch ppp fails with the following error:

    PPp ON localhost> Warning: iface add: ioctl(SIOCAIFADDR, 192.168.0.2 -> 192.168.0.1): File exists
    Error: ipcp_InterfaceUp: unable to set ip address

Sorry, I don't have much time to dig into the route and interface handling code right now.

(In reply to Franck Rousseau from comment #9)
No, without NET_EPOCH the patch won't work. It is the main feature that makes the fix possible, and 11.x does not have it.

Sure, I did the test on 12 as well, as I just wrote; the 11.2 run was just for comparison, since it did not work.

(In reply to Andrey V. Elsukov from comment #10)
Just to clear things up:
- the crash happens both in 11.2 and 12
- the proposed fix breaks ppp

I did more tests with ppp this morning, as explained in bug #227720, and noticed the following:
- if the ppp server has two different addresses on the ethernet and ppp tun interfaces, everything works fine; I can stop and start ppp without a problem
- if I configure the same address on the ethernet interface as the one set up on the tun interface, the next ppp connection works fine, but if I stop the server, restart it and re-open from the client, I consistently get a crash

I made a pretty naive fix for this shortly after reporting it, as the system in question was crashing several times a day. Since applying it I have had no further issues. It does mean the querying application gets back some NULL pointers, but isn't it better for the application to exit (if it does not check for NULL pointers) than for the entire system to crash?

    Index: rtsock.c
    ===================================================================
    --- rtsock.c	(revision 339318)
    +++ rtsock.c	(working copy)
    @@ -1556,8 +1556,10 @@
     		    rt_mask(rt), &ss);
     		info.rti_info[RTAX_GENMASK] = 0;
     		if (rt->rt_ifp) {
    -			info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
    -			info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr;
    +			if (rt->rt_ifp->if_addr)
    +				info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
    +			if (rt->rt_ifa)
    +				info.rti_info[RTAX_IFA] = rt->rt_ifa->ifa_addr;
     			if (rt->rt_ifp->if_flags & IFF_POINTOPOINT)
     				info.rti_info[RTAX_BRD] = rt->rt_ifa->ifa_dstaddr;
     		}

(In reply to Franck Rousseau from comment #12)
OK, I think the problem with ppp is that we don't return the needed info when the interface isn't UP.

Created attachment 199444
Proposed patch (for stable/12+)
Simplify the patch, remove the check for IFF_UP.
The ifnet pointer should be safe to dereference while we are in a NET_EPOCH section. Also, since if_addr is now kept unchanged, it is safe to dereference it too.
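The reasoning above relies on epoch-based deferred freeing. ifa_free() hands the ifaddr to epoch_call(), so a pointer a reader picked up inside a NET_EPOCH section stays valid until that reader leaves the section. Below is a toy, single-threaded C model of that idea. It is not FreeBSD's epoch(9), which is per-CPU and tracks grace periods. This version simply waits until no reader is inside a section, and all names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model of epoch-style deferred reclamation (hypothetical names).
 * Conservative grace period: frees run only when no reader is inside. */
#define TOY_MAX_DEFERRED 16

struct toy_epoch {
    int readers;                      /* readers inside a section */
    void *deferred[TOY_MAX_DEFERRED]; /* objects waiting to be freed */
    int ndeferred;
    int nfreed;                       /* counter for the example */
};

/* Free everything queued, but only if no reader can still hold a pointer. */
static void toy_epoch_reclaim(struct toy_epoch *e)
{
    if (e->readers != 0)
        return;
    for (int i = 0; i < e->ndeferred; i++) {
        free(e->deferred[i]);
        e->nfreed++;
    }
    e->ndeferred = 0;
}

/* Analogue of entering/leaving a NET_EPOCH read section. */
static void toy_epoch_enter(struct toy_epoch *e) { e->readers++; }

static void toy_epoch_exit(struct toy_epoch *e)
{
    e->readers--;
    toy_epoch_reclaim(e);             /* last reader out triggers frees */
}

/* Analogue of epoch_call(): queue the free instead of doing it now.
 * Returns -1 only when the toy's fixed-size queue is full. */
static int toy_epoch_call(struct toy_epoch *e, void *obj)
{
    if (e->ndeferred == TOY_MAX_DEFERRED)
        return -1;
    e->deferred[e->ndeferred++] = obj;
    toy_epoch_reclaim(e);
    return 0;
}
```

In this model, a detach that calls toy_epoch_call() while a dump is between toy_epoch_enter() and toy_epoch_exit() does not free the object. The dump can keep reading it, and the memory is released only when the last reader exits. That is the property the comment relies on, and it is why the patch cannot be backported to 11.x by simply dropping the NET_EPOCH_* macros.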
(In reply to ian from comment #13)
rt->rt_ifa should be safe to dereference, since the rtentry holds a reference to the ifa and it won't be freed. But access to rt_ifp->if_addr is not easy to protect in stable/11. The problem happens because the interface is being destroyed while we are iterating through the routes. Even if you add a NULL check here, there is no guarantee that you won't access already freed memory a bit later in rtsock_msg_buffer(), when info.rti_info[] is accessed. Also, I think an application may expect both the RTAX_IFP and RTAX_IFA pointers to be present.

Created attachment 199449
Proposed patch
I think this patch can be used for both FreeBSD 12 and 11. It uses IFNET_RLOCK_NOSLEEP() to protect against interface destruction during route iteration. In if_detach_internal(), it marks the interface as dying right after the interface is removed from the ifnets list. In sysctl_dumpentry(), it adds a check that the interface was not destroyed before accessing it.
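The locking scheme described above can be sketched in user space under explicit assumptions. This is a hypothetical model, not kernel code:

- A pthread rwlock stands in for the ifnet lock. Readers correspond to IFNET_RLOCK_NOSLEEP(), and the writer corresponds to the lock if_detach_internal() holds while unlinking.
- MODEL_IFF_DYING mirrors IFF_DYING.
- Routes are a plain array rather than a radix tree.
- Resetting if_addr to NULL mimics the detach behaviour that produced the original panic.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in; 0x200000 mirrors IFF_DYING in FreeBSD's <net/if.h>. */
#define MODEL_IFF_DYING 0x200000

struct model_ifaddr { int addr; };
struct model_ifnet {
    int flags;
    struct model_ifaddr *if_addr;
    struct model_ifnet *next;       /* link in the global "ifnets" list */
};
struct model_route { struct model_ifnet *ifp; };

static pthread_rwlock_t model_ifnet_lock = PTHREAD_RWLOCK_INITIALIZER;
static struct model_ifnet *model_ifnets;

/* Detach: unlink and mark DYING under the write lock, so a reader that
 * holds the read lock never sees a half-detached ifnet. */
static void model_if_detach(struct model_ifnet *ifp)
{
    pthread_rwlock_wrlock(&model_ifnet_lock);
    for (struct model_ifnet **pp = &model_ifnets; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == ifp) {
            *pp = ifp->next;        /* unlink from the list ... */
            break;
        }
    }
    ifp->flags |= MODEL_IFF_DYING;  /* ... and mark DYING under the same lock */
    ifp->if_addr = NULL;            /* the pointer the original panic hit */
    pthread_rwlock_unlock(&model_ifnet_lock);
}

/* Dump: hold the read lock across the whole walk (as sysctl_rtsock() does
 * in the patch) and skip routes whose ifnet is DYING. Returns the number
 * of addresses written to out[]. */
static size_t model_dump_routes(const struct model_route *rts, size_t n, int *out)
{
    size_t nout = 0;

    pthread_rwlock_rdlock(&model_ifnet_lock);
    for (size_t i = 0; i < n; i++) {
        const struct model_ifnet *ifp = rts[i].ifp;
        if (ifp == NULL || (ifp->flags & MODEL_IFF_DYING) != 0)
            continue;               /* detached before we took the lock */
        out[nout++] = ifp->if_addr->addr; /* detach cannot run concurrently */
    }
    pthread_rwlock_unlock(&model_ifnet_lock);
    return nout;
}
```

The DYING check covers interfaces detached before the dump took the lock. The read lock covers detaches that would otherwise happen in the middle of the walk. In the model, a non-DYING ifnet always has a non-NULL if_addr, which is the invariant the dereference relies on.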
Created attachment 199450
Proposed patch
Sorry, the wrong patch was attached.
A commit references this bug:

Author: ae
Date: Tue Nov 27 09:04:07 UTC 2018
New revision: 341008
URL: https://svnweb.freebsd.org/changeset/base/341008

Log:
  Fix possible panic during ifnet detach in rtsock.

  The panic can happen, when some application does dump of routing table
  using sysctl interface. To prevent this, set IFF_DYING flag in
  if_detach_internal() function, when ifnet under lock is removed from
  the chain. In sysctl_rtsock() take IFNET_RLOCK_NOSLEEP() to prevent
  ifnet detach during routes enumeration. In case, if some interface was
  detached in the time before we take the lock, add the check, that ifnet
  is not DYING. This prevents access to memory that could be freed after
  ifnet is unlinked.

  PR:  227720, 230498, 233306
  Reviewed by:  bz, eugen
  MFC after:  1 week
  Sponsored by:  Yandex LLC
  Differential Revision:  https://reviews.freebsd.org/D18338

Changes:
  head/sys/net/if.c
  head/sys/net/rtsock.c

(In reply to commit-hook from comment #19)
As mentioned in comment #9 above, this patch breaks ppp. I get this when trying to re-open a second connection, which is the stage at which the crash occurred before:

    PPp ON localhost> Warning: iface add: ioctl(SIOCAIFADDR, 192.168.0.2 -> 192.168.0.1): File exists
    Error: ipcp_InterfaceUp: unable to set ip address

Also, the patch in attachment #199450 does not fix this specific problem: we still crash the kernel with the procedure described earlier in comment #12. As I said, I could narrow down the cause and find a fix for our use case: by using two different IPv4 addresses for the Ethernet and PPP tun interfaces, the kernel does not crash anymore.
About the fix, I suspect that internal structures are corrupted, so any kind of fix at this point will fail. For example, with this patch on 11.2-p4, I keep getting these values after the crash:

    (kgdb) print rt->rt_ifp->if_flags
    $3 = 3
    (kgdb) print rt->rt_ifp->if_index
    $4 = 63488

I will try to set up online debugging to watch the internal structures and see if I can get an idea of what is breaking things.

(In reply to Franck Rousseau from comment #20)
According to if_flags, this patch doesn't affect your case, since if_flags = (IFF_UP | IFF_BROADCAST); there is no IFF_DYING flag. Also, rtsock has several places where it can panic due to a similar issue, but with a different stack trace (for example https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205678). Are you sure that your panic is the same?
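Decoding the if_flags value from the dump above makes the point concrete. 3 is IFF_UP (0x1) | IFF_BROADCAST (0x2), with no IFF_DYING bit, so the committed DYING check would not skip this entry. Below is a small decoder sketch. The bit values are the ones I believe FreeBSD's <net/if.h> uses and are worth checking against your source tree. The names are prefixed so they do not clash with the host's own headers.

```c
#include <assert.h>
#include <string.h>

/* A few if_flags bits (values believed to match FreeBSD's <net/if.h>). */
static const struct { unsigned bit; const char *name; } model_if_flag_names[] = {
    { 0x1,      "IFF_UP" },
    { 0x2,      "IFF_BROADCAST" },
    { 0x8,      "IFF_LOOPBACK" },
    { 0x10,     "IFF_POINTOPOINT" },
    { 0x8000,   "IFF_MULTICAST" },
    { 0x200000, "IFF_DYING" },
};

/* Write a "|"-separated list of the known flags set in 'flags' into buf
 * (len >= 1), truncating safely if buf is too small. */
static void model_decode_if_flags(unsigned flags, char *buf, size_t len)
{
    size_t n = sizeof model_if_flag_names / sizeof model_if_flag_names[0];

    buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if ((flags & model_if_flag_names[i].bit) == 0)
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, model_if_flag_names[i].name, len - strlen(buf) - 1);
    }
}
```

For the value in the dump above, model_decode_if_flags(3, buf, sizeof buf) leaves "IFF_UP|IFF_BROADCAST" in buf.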
Also, if_index has an unusually large value. Please show your backtrace and, in the context of the relevant frame, the output of the "p *rt->rt_ifp" command.

(In reply to Andrey V. Elsukov from comment #21)
The panic is at sys/net/rtsock.c:1559:

    1559	info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;

The stack trace is always pretty much the same as in bug 227720, comments 35 and 37. In the latter you will also find the output of p *rt->rt_ifp.

I'm running a 12.0-STABLE r349024 amd64 system with 2 PPP-over-SSH tunnels (as a server), using the userland ppp(8) implementation. I later noticed that the fix had already been MFCed into the 12-STABLE branch in r341677. This kernel panic still happens when I try to restart those PPP instances (using 'killall ppp', for example).

    # kgdb -c vmcore.3 /boot/kernel/kernel
    GNU gdb (GDB) 8.3 [GDB v8.3 for FreeBSD]
    Reading symbols from /boot/kernel/kernel...
    Reading symbols from /usr/lib/debug//boot/kernel/kernel.debug...
    Unread portion of the kernel message buffer:
    cpuid = 3; apic id = 06
    fault virtual address   = 0x0
    fault code              = supervisor read data, page not present
    instruction pointer     = 0x20:0xffffffff80cf66a3
    stack pointer           = 0x28:0xfffffe002cd084f0
    frame pointer           = 0x28:0xfffffe002cd08630
    code segment            = base rx0, limit 0xfffff, type 0x1b
                            = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags        = interrupt enabled, resume, IOPL = 0
    current process         = 1926 (ppp)
    trap number             = 12
    panic: page fault
    cpuid = 3
    time = 1561396369
    KDB: stack backtrace:
    #0 0xffffffff80c16e77 at kdb_backtrace+0x67
    #1 0xffffffff80bcad3d at vpanic+0x19d
    #2 0xffffffff80bcab93 at panic+0x43
    #3 0xffffffff810a84b5 at trap_fatal+0x395
    #4 0xffffffff810a8519 at trap_pfault+0x49
    #5 0xffffffff810a7aff at trap+0x29f
    #6 0xffffffff81082cf5 at calltrap+0x8
    #7 0xffffffff80cf0110 at rn_walktree+0x80
    #8 0xffffffff80cf5b4b at sysctl_rtsock+0x2db
    #9 0xffffffff80bd9b4b at sysctl_root_handler_locked+0x8b
    #10 0xffffffff80bd91ed at sysctl_root+0x24d
    #11 0xffffffff80bd986a at userland_sysctl+0x17a
    #12 0xffffffff80bd96af at sys___sysctl+0x5f
    #13 0xffffffff810a9084 at amd64_syscall+0x364
    #14 0xffffffff810835dd at fast_syscall_common+0x101
    Uptime: 2h0m31s
    (ada0:ahcich1:0:0:0): spin-down
    Dumping 289 out of 3952 MB: (CTRL-C to abort) ..6%..12%..23%..34%..45%..56%..61%..72%..83%..94%

    __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
    234		__asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (OFFSETOF_CURTHREAD));
    (kgdb) bt
    #0  __curthread () at /usr/src/sys/amd64/include/pcpu.h:234
    #1  doadump (textdump=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:371
    #2  0xffffffff80bca938 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:451
    #3  0xffffffff80bcad99 in vpanic (fmt=<optimized out>, ap=<optimized out>) at /usr/src/sys/kern/kern_shutdown.c:877
    #4  0xffffffff80bcab93 in panic (fmt=<unavailable>) at /usr/src/sys/kern/kern_shutdown.c:804
    #5  0xffffffff810a84b5 in trap_fatal (frame=0xfffffe002cd08430, eva=0) at /usr/src/sys/amd64/amd64/trap.c:948
    #6  0xffffffff810a8519 in trap_pfault (frame=0xfffffe002cd08430, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:767
    #7  0xffffffff810a7aff in trap (frame=0xfffffe002cd08430) at /usr/src/sys/amd64/amd64/trap.c:443
    #8  <signal handler called>
    #9  0xffffffff80cf66a3 in sysctl_dumpentry (rn=0xfffff80004901680, vw=0xfffffe002cd087b8) at /usr/src/sys/net/rtsock.c:1579
    #10 0xffffffff80cf0110 in rn_walktree (h=<optimized out>, f=0xffffffff80cf6500 <sysctl_dumpentry>, w=0xfffffe002cd087b8) at /usr/src/sys/net/radix.c:1096
    #11 0xffffffff80cf5b4b in sysctl_rtsock (oidp=<optimized out>, arg1=<optimized out>, arg2=<optimized out>, req=<optimized out>) at /usr/src/sys/net/rtsock.c:1942
    #12 0xffffffff80bd9b4b in sysctl_root_handler_locked (oid=0xffffffff81b2c960 <sysctl___net_routetable>, arg1=0xfffffe002cd08a38, arg2=4, req=0xfffffe002cd08970, tracker=0xfffffe002cd088e8) at /usr/src/sys/kern/kern_sysctl.c:166
    #13 0xffffffff80bd91ed in sysctl_root (oidp=<optimized out>, arg1=0xfffffe002cd08a38, arg2=4, req=0xfffffe002cd08970) at /usr/src/sys/kern/kern_sysctl.c:2033
    #14 0xffffffff80bd986a in userland_sysctl (td=0xfffff8006a5e2000, name=0xfffffe002cd08a30, namelen=6, old=<optimized out>, oldlenp=<optimized out>, inkernel=<optimized out>, new=0x0, newlen=0, retval=0xfffffe002cd08a98, flags=0) at /usr/src/sys/kern/kern_sysctl.c:2128
    #15 0xffffffff80bd96af in sys___sysctl (td=0xfffff8006a5e2000, uap=0xfffff8006a5e23c0) at /usr/src/sys/kern/kern_sysctl.c:2063
    #16 0xffffffff810a9084 in syscallenter (td=0xfffff8006a5e2000) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:135
    #17 amd64_syscall (td=0xfffff8006a5e2000, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1192
    #18 <signal handler called>
    #19 0x00000008007df91a in ?? ()
    Backtrace stopped: Cannot access memory at address 0x7fffffffdc98
    (kgdb) frame 9
    #9  0xffffffff80cf66a3 in sysctl_dumpentry (rn=0xfffff80004901680, vw=0xfffffe002cd087b8) at /usr/src/sys/net/rtsock.c:1579
    1579			info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
    (kgdb) p rt
    $2 = (struct rtentry *) 0xfffff80004901680
    (kgdb) p rt->rt_ifp
    $3 = (struct ifnet *) 0xfffff80004f9f800
    (kgdb) p rt->rt_ifp->if_addr
    $4 = (struct ifaddr *) 0x0

Closing as a duplicate of the earlier PR #227720. The problem was fixed before 11.3-RELEASE and 12.1-RELEASE.

*** This bug has been marked as a duplicate of bug 227720 ***