Bug 26176

Summary: Kernel panic when using IPsec on high loads
Product: Base System Reporter: gunther <gunther>
Component: kern Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.2-RELEASE   
Hardware: Any   
OS: Any   

Description gunther 2001-03-28 19:10:01 UTC
Kernel panic (see messages below) when the machine is used as a
router for videoconferencing (i.e. under moderate to high load of
UDP streaming, 1.5 to 2 Mb/s) with IPsec tunnel mode enabled
(static keying, *no* IKE/racoon) and IPFW in use (with or
without any firewall rules). The kernel panics approximately 5
minutes after the streaming starts.

Both input and output to the gateway go through the same 
ethernet device (an "xl" device.)

I will reproduce the problem further tomorrow, using the kernel
debugger and more recent STABLE versions. Some feedback that
would narrow my search space for trial-and-error attempts would
be welcome, though.

Two incidents:

Incident 1:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xb5c0a612
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc014dcf8
stack pointer           = 0x10:0xc0201d8c
frame pointer           = 0x10:0xc0201d98
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty
trap number             = 12
panic: page fault

syncing disks...
done
Uptime: 1h48m1s


Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xb6c03812
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc014dcf8
stack pointer           = 0x10:0xc0201b08
frame pointer           = 0x10:0xc0201b14
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty
trap number             = 12
panic: page fault
Uptime: 1h48m2s


-------------------
Incident 2:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xb4c08a00
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0197484
stack pointer           = 0x10:0xc0201ab8
frame pointer           = 0x10:0xc0201ac8
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty
trap number             = 12
panic: page fault

syncing disks...
done
Uptime: 36m55s

How-To-Repeat: The problem consistently occurs about 5 minutes after the high
load begins. I only have one machine, at a remote site, right now, but
will try to reproduce it in a laboratory setting with KDB tomorrow.

PLEASE let me know if this problem (or a similar one) is
known and whether any workaround or fix exists.
Comment 1 gunther 2001-03-29 19:14:05 UTC
Here is more information:

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xb2c04400
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc0199fa0
stack pointer           = 0x10:0xc020c218
frame pointer           = 0x10:0xc020c268
code segment            = base rx0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = Idle
interrupt mask          = net tty
kernel: type 12 trap, code=0
Stopped at      esp_hdrsiz+0x498:       movl    0(%edx),%eax

So the problem seems to be in the IPsec code (sys/netinet6/esp_output.c),
called from sys/netinet6/ipsec.c. Here is the stack trace:

esp_hdrsiz(c0b48500,c0b485f5,c0b3f400,c0ceb800,2) at esp_hdrsiz+0x498
esp4_output(c0b48500,c0ceb800,c0ceba00,0,1) at esp4_output+0x48
ipsec4_output(c020c418,c0ceba00,1,c0ceef00,c0b5af00) at ipsec4_output+0x2e3
ip_output(c0b1be00,0,c0229a50,1,0) at ip_output+0x762
ip_stripoptions(c0b1be00,0,c0b1be00,0,ffffffff) at ip_stripoptions+0x211
ip_input(c0b1be00) at ip_input+0x462
ip_input(c01d374f,0,d0f0010,10,c7a50010) at ip_input+0x7b7
doreti_popl_fs_fault() at doreti_popl_fs_fault+0x91

I assume that upgrading to a more current version of the IPsec
code might fix the problem, but I am not sure. I will report
more later.

thanks
-- 
Gunther Schadow, M.D., Ph.D.                    gschadow@regenstrief.org
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org
Comment 2 iedowse freebsd_committer freebsd_triage 2001-04-12 17:31:02 UTC
State Changed
From-To: open->feedback

I think this may have been fixed in revision 1.130.2.21 of  
src/sys/netinet/ip_input.c. Could you try updating to a more recent 
-stable to see if this problem still exists?
Comment 3 iedowse freebsd_committer freebsd_triage 2001-06-05 20:52:21 UTC
State Changed
From-To: feedback->closed


This bug (icmp_error mbuf corruption) has been fixed. Thanks 
for the bug report!