Bug 133786

Summary: [netinet] [patch] ip_input might cause kernel panic
Product: Base System
Reporter: Ivan Panachev <ivan.panachev>
Component: kern
Assignee: Andre Oppermann <andre>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 7.1-RELEASE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
file.diff none

Description Ivan Panachev 2009-04-16 18:40:07 UTC
Some time ago one of my FreeBSD boxes began to panic. I got two panic dumps and inspected the stack traces. Here is the first one (the other is identical):

$ kgdb kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-marcel-freebsd"...

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xc
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc07f5588
stack pointer           = 0x28:0xe3db8a78
frame pointer           = 0x28:0xe3db8a94
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 14 (swi1: net)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 1d4h3m47s
Physical memory: 973 MB
Dumping 65 MB: 50 34 18 2

#0  doadump () at pcpu.h:196
196     pcpu.h: No such file or directory.
        in pcpu.h

(kgdb) backtrace
#0  doadump () at pcpu.h:196
#1  0xc07a68b7 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xc07a6b89 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xc0af377c in trap_fatal (frame=0xe3db8a38, eva=12) at /usr/src/sys/i386/i386/trap.c:939
#4  0xc0af3a00 in trap_pfault (frame=0xe3db8a38, usermode=0, eva=12) at /usr/src/sys/i386/i386/trap.c:849
#5  0xc0af43bc in trap (frame=0xe3db8a38) at /usr/src/sys/i386/i386/trap.c:528
#6  0xc0ada22b in alltraps_with_regs_pushed () at /usr/src/sys/i386/i386/exception.s:155
#7  0xc0850008 in pfil_head_register (ph=0x30) at /usr/src/sys/net/pfil.c:102
#8  0xc089aff8 in ip_forward (m=0xc46e3100, srcrt=0) at /usr/src/sys/netinet/ip_input.c:1308
#9  0xc089c82c in ip_input (m=0xc46e3100) at /usr/src/sys/netinet/ip_input.c:610
#10 0xc084e5d5 in netisr_dispatch (num=2, m=0xc46e3100) at /usr/src/sys/net/netisr.c:185
#11 0xc08423a1 in ether_demux (ifp=0xc4521c00, m=0xc46e3100) at /usr/src/sys/net/if_ethersubr.c:834
#12 0xc0842793 in ether_input (ifp=0xc4521c00, m=0xc46e3100) at /usr/src/sys/net/if_ethersubr.c:692
#13 0xc084d903 in vlan_input (ifp=0xc400b400, m=0xc46e3100) at /usr/src/sys/net/if_vlan.c:946
#14 0xc08422e7 in ether_demux (ifp=0xc400b400, m=0xc46e3100) at /usr/src/sys/net/if_ethersubr.c:743
#15 0xc0842793 in ether_input (ifp=0xc400b400, m=0xc46e3100) at /usr/src/sys/net/if_ethersubr.c:692
#16 0xc09a375e in sis_rxeof (sc=0xc3fdeb00) at /usr/src/sys/pci/if_sis.c:1476
#17 0xc09a4264 in sis_poll (ifp=0xc400b400, cmd=POLL_ONLY, count=5) at /usr/src/sys/pci/if_sis.c:1589
#18 0xc0799c7b in netisr_poll () at /usr/src/sys/kern/kern_poll.c:432
#19 0xc084e842 in swi_net (dummy=0x0) at /usr/src/sys/net/netisr.c:254
#20 0xc07847fb in ithread_loop (arg=0xc3e92230) at /usr/src/sys/kern/kern_intr.c:1088
#21 0xc0781369 in fork_exit (callout=0xc0784640 <ithread_loop>, arg=0xc3e92230, frame=0xe3db8d38) at /usr/src/sys/kern/kern_fork.c:804
#22 0xc0ada2a0 in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:255
#23 0x00000000 in ?? ()

According to the core dump and the kernel message buffer, the panic occurred in the m_copydata function called from frame #8 (m_copydata isn't reflected in the backtrace; I don't know why).
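
The fault virtual address 0xc would be consistent with a read of the length field through a NULL mbuf pointer. As a rough illustration only, the sketch below models the leading mbuf header fields in the order kgdb prints them in this report, assuming 4-byte pointers as on i386. The struct m_hdr32 and the helper names are hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit model of the leading mbuf header fields, in the
 * order shown by kgdb (mh_next, mh_nextpkt, mh_data, mh_len, ...).
 * Pointers are modelled as uint32_t to mimic i386; this is NOT the
 * kernel's struct m_hdr.
 */
struct m_hdr32 {
	uint32_t mh_next;     /* next mbuf in the chain */
	uint32_t mh_nextpkt;  /* next packet in the queue */
	uint32_t mh_data;     /* pointer to the data */
	int32_t  mh_len;      /* amount of data in this mbuf */
	int32_t  mh_flags;
	int16_t  mh_type;
};

/* Offset of the length field from the start of the modelled header. */
static size_t
mh_len_offset(void)
{
	return offsetof(struct m_hdr32, mh_len);
}

/* Reading mh_len through a NULL pointer touches address 0 + offset. */
static void
check_model(void)
{
	assert(mh_len_offset() == 0xc);
}
```

Under these assumptions the length field sits at offset 0xc, which matches the reported fault address; it does not by itself prove which instruction faulted.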

(kgdb) frame 8
#8  0xc089aff8 in ip_forward (m=0xc46e3100, srcrt=0) at /usr/src/sys/netinet/ip_input.c:1308
1308                    m_copydata(m, 0, mcopy->m_len, mtod(mcopy, caddr_t));

Inspecting the local variables, I found the cause of the problem:

(kgdb) p mcopy->m_hdr.mh_len
$1 = 204
(kgdb) p mcopy->m_hdr.mh_next
$2 = (struct mbuf *) 0x0
(kgdb) p *mcopy
$3 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xc4582a34 "E", mh_len = 212, mh_flags = 2, mh_type = 1, pad = "\000"}, M_dat = {MH = {MH_pkthdr = {rcvif = 0xc4521c00, header = 0x0, len = 204,
        csum_flags = 0, csum_data = 0, tso_segsz = 0, ether_vtag = 807, tags = {slh_first = 0x0}}, MH_dat = {MH_ext = {ext_buf = 0x30000045 <Address 0x30000045 out of bounds>, ext_free = 0x409483,
          ext_args = 0xaf760680, ext_size = 2729879744, ref_cnt = 0xa8af90d9, ext_type = -2029987982},
..

(kgdb) p m->m_hdr.mh_len
$4 = 48
(kgdb) p m->m_hdr.mh_next
$5 = (struct mbuf *) 0x0
(kgdb) p *m
$6 = {m_hdr = {mh_next = 0x0, mh_nextpkt = 0x0, mh_data = 0xc4705012 "E", mh_len = 48, mh_flags = 3, mh_type = 1, pad = "\000"}, M_dat = {MH = {MH_pkthdr = {rcvif = 0xc4521c00, header = 0x0, len = 48,
        csum_flags = 0, csum_data = 0, tso_segsz = 0, ether_vtag = 807, tags = {slh_first = 0x0}}, MH_dat = {MH_ext = {ext_buf = 0xc4705000 "", ext_free = 0, ext_args = 0x0, ext_size = 2048,
..


So the ip_forward function called m_copydata with a source mbuf m that is shorter (48 octets) than expected (204 octets), and this causes the kernel to panic. After that I made a quick fix (see the attached patch); it seems to solve the problem.
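
To illustrate the failure mode in userland, here is a minimal sketch of a chain copy that refuses a request longer than the chain. It is not the attached patch and not the kernel's m_copydata; struct seg, chain_length and chain_copy_checked are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an mbuf: one segment of a data chain. */
struct seg {
	const struct seg *next;
	const char *data;
	int len;
};

/* Total number of bytes held by the chain. */
static int
chain_length(const struct seg *s)
{
	int total = 0;

	for (; s != NULL; s = s->next)
		total += s->len;
	return total;
}

/*
 * Copy len bytes starting at offset off from the chain into out.
 * Returns 0 on success, or -1 without copying anything if the chain
 * does not hold off + len bytes.
 */
static int
chain_copy_checked(const struct seg *s, int off, int len, char *out)
{
	if (off < 0 || len < 0 || chain_length(s) - off < len)
		return -1;

	while (len > 0) {
		int count;

		/* The length check above guarantees the chain has not ended. */
		assert(s != NULL);
		if (off >= s->len) {	/* skip segments before the offset */
			off -= s->len;
			s = s->next;
			continue;
		}
		count = s->len - off;
		if (count > len)
			count = len;
		memcpy(out, s->data + off, (size_t)count);
		out += count;
		len -= count;
		off = 0;
		s = s->next;
	}
	return 0;
}
```

With a single 48-byte segment, a 204-byte request returns -1 instead of walking past the end of the chain, while a 48-byte request succeeds. Whether dropping the copy or clamping the length is the better reaction in ip_forward is a separate design question that this sketch does not settle.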

P.S. If any Core Team member wants a full core dump, the kernel build, or anything else, contact me at the e-mail address specified.

Fix: Patch attached with submission follows:
How-To-Repeat: Never tried, it repeats itself :)
Comment 1 Ed Schouten freebsd_committer freebsd_triage 2009-09-18 14:45:51 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-net

This looks like a networking issue.
Comment 2 Bruce M Simpson 2009-09-18 16:40:20 UTC
Interesting... the input checks in ip_input() should really have 
screened this out, however, if m->m_len is indeed smaller than mcopy 
(temporary mbuf created in the ip_forward() slow path), then 
m_copydata() may well stomp on memory not owned by the mbuf chain.
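
As a sketch of the kind of length screening referred to here, the following userland function compares the total length declared in an IPv4 header with the number of bytes actually available. It is a simplified model under stated assumptions, not the code in ip_input(); the enum and function names are hypothetical, and the 204 and 48 in the usage note only echo the sizes in the dump, without claiming what the header of the offending packet contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for the length screening. */
enum len_check {
	LEN_OK,
	LEN_BAD_HEADER,
	LEN_TOO_SHORT
};

/*
 * Check an IPv4 packet of which avail bytes are present in pkt.
 * The version/IHL byte is at offset 0 and the total length is the
 * big-endian 16-bit field at offsets 2..3.
 */
static enum len_check
check_ipv4_lengths(const uint8_t *pkt, size_t avail)
{
	size_t hlen, total;

	if (avail < 20)			/* smaller than a minimal header */
		return LEN_TOO_SHORT;
	if ((pkt[0] >> 4) != 4)		/* not IPv4 */
		return LEN_BAD_HEADER;
	hlen = (size_t)(pkt[0] & 0x0f) * 4;
	if (hlen < 20)			/* IHL below the minimum */
		return LEN_BAD_HEADER;
	total = ((size_t)pkt[2] << 8) | pkt[3];
	if (total < hlen)		/* declared length smaller than header */
		return LEN_BAD_HEADER;
	if (avail < total)		/* fewer bytes present than declared */
		return LEN_TOO_SHORT;
	return LEN_OK;
}
```

A header declaring 204 bytes with only 48 bytes present yields LEN_TOO_SHORT, so a packet in that state would normally be dropped before any forwarding code copied from it.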
Comment 3 Andre Oppermann freebsd_committer freebsd_triage 2010-08-10 23:21:07 UTC
Responsible Changed
From-To: freebsd-net->andre

Take over.
Comment 4 Andre Oppermann freebsd_committer freebsd_triage 2010-08-15 15:35:05 UTC
Ivan

Thank you for your bug report.  For reproducing this problem I have
two questions:

  a) do you have any firewalls (ipfw, pf, ipfilter) or IPSec tunnels
     active when the panic happens?

  b) can you still reproduce the panic?

Thanks
-- 
Andre
Comment 5 Andre Oppermann freebsd_committer freebsd_triage 2010-08-17 22:11:43 UTC
State Changed
From-To: open->feedback
Comment 6 Ivan Panachev 2010-11-25 16:50:11 UTC
Hello,

Sorry for such a late answer.

In fact this box didn't stop panicking after my workaround, just
panicked less than before.
I've spent a lot of time debugging the issue, tried the same hardware, etc.
Finally I've become convinced that it's bad hardware in this specific box.

I suppose this ticket should be closed.

On Thu, Apr 16, 2009 at 9:40 PM,  <FreeBSD-gnats-submit@freebsd.org> wrote:
> Thank you very much for your problem report.
> It has the internal identification `kern/133786'.
> The individual assigned to look at your
> report is: freebsd-bugs.
>
> You can access the state of your problem report at any time
> via this link:
>
> http://www.freebsd.org/cgi/query-pr.cgi?pr=133786
>
>>Category:       kern
>>Responsible:    freebsd-bugs
>>Synopsis:       ip_input might cause kernel panic
>>Arrival-Date:   Thu Apr 16 17:40:07 UTC 2009
>
Comment 7 Maxim Konovalov freebsd_committer freebsd_triage 2010-11-26 06:30:42 UTC
State Changed
From-To: feedback->closed

Closed at the submitter's request: bad hardware suspected.