Created attachment 199328
This is the core.txt, which is all I have. It was generated with the command crashinfo -k /boot/kernel/kernel vmcore.6
The system panics when there are many (10+) ppp connections.
Architecture Version: 2
Dump Length: 147546112
Dumptime: Sun Nov 18 20:15:11 2018
Magic: FreeBSD Kernel Dump
Version String: FreeBSD 11.2-STABLE #9 r340586M: Sun Nov 18 18:06:09 -02 2018
Panic String: page fault
Dump Parity: 763915341
Dump Status: good
The server is a tunnel server that runs ppp from inetd with the entry:
ppp-in stream tcp nowait/0/1800/0 root /usr/sbin/ppp ppp -direct ppp-in
At some point (after minutes or hours) the system panics while running the ppp program. There seems to be a bug in one of the ng_xxx modules. The same setup works fine on FreeBSD 10.4 with the same kernel. No backtrace (bt) is available.
Created attachment 199337
Another panic; this one seems to be a problem in the routing socket.
Another panic, apparently caused by a bad pointer at /usr/src/sys/net/rtsock.c:1916.
This time I got a full crashinfo with kgdb and gdb.
Created attachment 199339
This patch adds a NULL pointer test at rtsock.c:1559.
Created attachment 199511
Remove the code that panics the system due to an invalid memory access
The panic happens on both i386 and amd64 when this code runs:
info.rti_info[RTAX_IFP] = rt->rt_ifp->if_addr->ifa_addr;
This line is in /usr/src/sys/net/rtsock.c near line 1568. The code tries to access rt->rt_ifp->if_addr->ifa_addr, but rt->rt_ifp points to already freed memory. The pointer is not NULL, probably because the free path does not set it to NULL or because of a race condition in the code, so a NULL test does not catch it and the system panics with a page fault in kernel mode.
The patch removes that line from rtsock.c as a temporary workaround, until someone tracks down the race condition or makes the code set the pointer to NULL after rt->rt_ifp is freed, so that a test can be done.
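To illustrate the kind of test meant here, the following is a minimal user-space sketch, not the kernel code. The `*_stub` structures and the `rt_ifp_addr_checked()` helper are hypothetical stand-ins that only mirror the pointer chain rt->rt_ifp->if_addr->ifa_addr. The guard works only if every path that frees an object also clears the pointer to it; a dangling, non-NULL pointer passes the test and is still dereferenced.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * that take part in the pointer chain are modelled. */
struct sockaddr_stub { int sa_family; };
struct ifaddr_stub   { struct sockaddr_stub *ifa_addr; };
struct ifnet_stub    { struct ifaddr_stub *if_addr; };
struct rtentry_stub  { struct ifnet_stub *rt_ifp; };

/*
 * Return the interface address, or NULL if any link in the chain is NULL.
 * This cannot detect a pointer to memory that was freed but never cleared.
 */
static struct sockaddr_stub *
rt_ifp_addr_checked(const struct rtentry_stub *rt)
{
    if (rt == NULL || rt->rt_ifp == NULL || rt->rt_ifp->if_addr == NULL)
        return (NULL);
    return (rt->rt_ifp->if_addr->ifa_addr);
}
```

A caller would skip filling in the RTAX_IFP slot whenever the helper returns NULL.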
The test server holds many (100+) pppoi connections that change all the time, and it runs routed with flags=-s, so the routing tables are constantly stressed by route additions and deletions. Sometimes it panics within seconds, sometimes after hours. With this patch it has been running 24/7 for some days.
A commit references this bug:
Date: Tue Nov 27 09:04:07 UTC 2018
New revision: 341008
Fix possible panic during ifnet detach in rtsock.
The panic can happen, when some application does dump of routing table
using sysctl interface. To prevent this, set IFF_DYING flag in
if_detach_internal() function, when ifnet under lock is removed from
the chain. In sysctl_rtsock() take IFNET_RLOCK_NOSLEEP() to prevent
ifnet detach during routes enumeration. In case, if some interface was
detached in the time before we take the lock, add the check, that ifnet
is not DYING. This prevents access to memory that could be freed after
ifnet is unlinked.
PR: 227720, 230498, 233306
Reviewed by: bz, eugen
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D18338