Bug 162558 - [dummynet] [panic] seldom dummynet panics
Summary: [dummynet] [panic] seldom dummynet panics
Status: Closed DUPLICATE of bug 220078
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 8.2-STABLE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Eugene Grosbein
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-11-14 10:00 UTC by Eugene Grosbein
Modified: 2017-09-26 06:43 UTC
CC List: 2 users

See Also:


Description Eugene Grosbein 2011-11-14 10:00:23 UTC
	My highly loaded PPPoE servers (mpd-5.5) occasionally panic in the dummynet code.
	They were last updated to 8.2-STABLE/amd64 on 18 October 2011,
	and today I got another panic. These panics generate crash dumps,
	but kgdb cannot read them for an unknown reason:

# kgdb kernel.debug /path/to/vmcore.1
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...
Cannot access memory at address 0x746c75616620
(kgdb) bt
#0  0x0000000000000000 in ?? ()
Cannot access memory at address 0x0

	I also have a remote console, and at the moment of the panic it shows:

ipfw: pullup failed
ipfw: pullup failed
ipfw: pullup failed
ipfw: pullup failed
ipfw: pullup failed
ipfw: pullup failed


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x308
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff802e7657
stack pointer           = 0x28:0xffffff81229e1970
frame pointer           = 0x28:0xffffff81229e1990
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (dummynet)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff801ab1ca = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0xffffffff803294a7 = kdb_backtrace+0x37
panic() at 0xffffffff802f69ee = panic+0x27e
trap_fatal() at 0xffffffff804ddc00 = trap_fatal+0x290
trap_pfault() at 0xffffffff804ddfdf = trap_pfault+0x28f
trap() at 0xffffffff804de4bf = trap+0x3df
calltrap() at 0xffffffff804c5b74 = calltrap+0x8
--- trap 0xc, rip = 0xffffffff802e7657, rsp = 0xffffff81229e1970, rbp = 0xffffff81229e1990 ---
_mtx_lock_sleep() at 0xffffffff802e7657 = _mtx_lock_sleep+0x67
igmp_input() at 0xffffffff803dbc2d = igmp_input+0xf8d
ip_input() at 0xffffffff803fca3d = ip_input+0xad
netisr_dispatch_src() at 0xffffffff803bab7e = netisr_dispatch_src+0x7e
dummynet_send() at 0xffffffff803ee18e = dummynet_send+0x14e
dummynet_task() at 0xffffffff803ee386 = dummynet_task+0x1c6
taskqueue_run_locked() at 0xffffffff80335885 = taskqueue_run_locked+0x85
taskqueue_thread_loop() at 0xffffffff80335a1e = taskqueue_thread_loop+0x4e
fork_exit() at 0xffffffff802cab0f = fork_exit+0x11f
fork_trampoline() at 0xffffffff804c60be = fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff81229e1d00, rbp = 0 ---
Uptime: 27d16h2m57s
Dumping 871 out of 4079 MB:..2%..12%..21%..32%..41%..52%..61%..72%..81%..92%
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...

Fix: 

Unknown.
How-To-Repeat: 	Panics are rare, about 6 weeks apart.
	This panic occurred during the hour of least load.

	I use dummynet actively to rate-limit users:
	every user gets its own pair of dynamic dummynet pipes for local traffic
	and another pair for external traffic:

ipfw add 1010 pipe tablearg ip from $mynets to 'table(13)' in recv $uplink
ipfw add 1020 pipe tablarg ip from 'table(14)' to $mynets in recv 'ng*'
ipfw add 1022 pipe 1 ip from $mynets to any in recv $uplink
ipfw add 1024 pipe 3 ip from any to $mynets in recv 'ng*'
ipfw add 5000 pipe tablearg ip from any to 'table(11)' in
ipfw add 5010 pipe tablearg ip from 'table(12)' to any in

	Here is my /etc/sysctl.conf tuning for dummynet:

net.inet.ip.dummynet.pipe_slot_limit=1000
net.inet.ip.dummynet.io_fast=1
net.inet.ip.fw.one_pass=1

	All my pipes have a queue length of 1000 slots, 64 buckets, and
	distinct bandwidth and burst values.
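	A minimal sketch of how one user's external-traffic pipe pair could be
	set up to match the above: rule 5000 (traffic to the user) looks up
	table 11 and rule 5010 (traffic from the user) looks up table 12, and
	with "pipe tablearg" the table value is used as the pipe number. The
	helper name setup_user_pipes, the pipe numbers, addresses, rates and
	burst sizes are hypothetical (mpd may create such pipes itself).
	Setting IPFW=echo prints the commands instead of running them:

```sh
#!/bin/sh
# Hypothetical helper, not part of mpd or ipfw: configure one user's
# download/upload pipe pair and register the user's address in the tables
# consulted by rules 5000 and 5010 ("pipe tablearg").
# Set IPFW=echo to print the commands instead of executing them.
IPFW=${IPFW:-ipfw}

# Usage: setup_user_pipes ADDR DOWN_PIPE UP_PIPE DOWN_BW UP_BW BURST_BYTES
setup_user_pipes() {
    addr=$1 down=$2 up=$3 down_bw=$4 up_bw=$5 burst=$6

    # Queues of 1000 slots need net.inet.ip.dummynet.pipe_slot_limit >= 1000.
    $IPFW pipe "$down" config bw "$down_bw" burst "$burst" queue 1000 buckets 64
    $IPFW pipe "$up" config bw "$up_bw" burst "$burst" queue 1000 buckets 64

    # With "pipe tablearg", the value stored in the table is the pipe number.
    $IPFW table 11 add "$addr"/32 "$down"   # rule 5000: traffic to the user
    $IPFW table 12 add "$addr"/32 "$up"     # rule 5010: traffic from the user
}
```

	For example, after IPFW=echo, running
	setup_user_pipes 10.0.0.5 200 201 2Mbit/s 512Kbit/s 65536 prints the two
	"pipe ... config" lines and the two "table ... add" lines it would execute.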
Comment 1 Mark Linimon freebsd_committer freebsd_triage 2011-11-14 19:10:55 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

Over to maintainer(s).
Comment 2 Eugene Grosbein 2012-02-15 12:25:43 UTC
Hi!

The source of this problem seems to be the famous 'dangling pointer' problem:
- mbufs with packets from PPPoE users sometimes stall in dummynet queues,
- then the user disconnects and its ngX interface gets destroyed,
- then dummynet attempts to dereference the stale ifp pointer and a panic occurs.

There is a workaround consisting of several tunables that eliminate the races:

- net.isr.bindthreads=1 in /boot/loader.conf;
- net.isr.direct=1 and net.isr.direct_force=1 in /etc/sysctl.conf
  (default)

Also, use a recent 8.2-STABLE, as it contains netgraph fixes for bugs
that led to panics in 8.2-RELEASE and early 8.2-STABLE versions.
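A small check that the netisr settings above are actually in effect on a
running 8.x system might look like this. The function name
check_isr_workaround is hypothetical; it only reads the values with
sysctl -n and reports any that are not 1:

```sh
#!/bin/sh
# Intended settings (FreeBSD 8.x sysctl names):
#   /boot/loader.conf:  net.isr.bindthreads="1"
#   /etc/sysctl.conf:   net.isr.direct=1
#                       net.isr.direct_force=1
#
# Hypothetical helper: report whether each setting is active on the running
# kernel. Returns non-zero if any of them is missing or not equal to 1.
check_isr_workaround() {
    rc=0
    for oid in net.isr.bindthreads net.isr.direct net.isr.direct_force; do
        val=$(sysctl -n "$oid" 2>/dev/null)
        if [ "$val" = "1" ]; then
            echo "$oid=1 ok"
        else
            echo "$oid=${val:-missing} NOT SET"
            rc=1
        fi
    done
    return "$rc"
}
```

After a reboot, check_isr_workaround should print three "ok" lines; a
"NOT SET" line means the corresponding loader.conf or sysctl.conf entry
did not take effect.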

With these precautions my routers have run rock-stable for months.

Eugene Grosbein
Comment 3 Eugene Grosbein 2013-01-22 05:33:51 UTC
Hi!

The same problem recurred on 20 January with FreeBSD 8.3-STABLE
built from late October 2012 sources. This time I got a usable crash dump
with a backtrace. That was with net.isr.bindthreads=1, net.isr.direct=1 and
net.isr.direct_force=1.

Full crashinfo output, kernel.debug and the crash dump are available here:
http://www.grosbein.net/freebsd/crash/20130120/core.0.txt
http://www.grosbein.net/freebsd/crash/20130120/kernel.debug.xz (8.4M)
http://www.grosbein.net/freebsd/crash/20130120/vmcore.0.xz
Comment 4 Eugene Grosbein freebsd_committer 2017-09-19 18:01:38 UTC
Seems to be a duplicate of 220078 (unlocked access to INADDR_TO_IFP in the multicast handling code).

*** This bug has been marked as a duplicate of bug 220078 ***