Bug 131314

Summary: [modules] [panic] large modules fail to load on amd64
Product: Base System
Reporter: Kenneth D. Merry <ken>
Component: amd64
Assignee: freebsd-amd64 (Nobody) <amd64>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Kenneth D. Merry freebsd_committer freebsd_triage 2009-02-02 22:10:01 UTC
Loading a large module (2.5MB in size) on amd64 panics the kernel.  We
found a work-around, which was to remove a static structure that was
700K in size.  After that the module loads without a problem.

The same module worked fine on i386 with the static structure in place.

# kldload ./hasc.module
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xfffffffffb3b6000
fault code              = supervisor write data, page not present
instruction pointer     = 0x8:0xffffffff8044c49a                                
stack pointer           = 0x10:0xfffffffffb11d600                               
frame pointer           = 0x10:0xfffffffffb11d790                               
code segment            = base rx0, limit 0xfffff, type 0x1b                    
                        = DPL 0, pres 1, long 1, def32 0, gran 1                
processor eflags        = interrupt enabled, resume, IOPL = 0                   
current process         = 840 (kldload)                                         
[thread pid 840 tid 100106 ]                                                    
Stopped at      0xffffffff8044c49a = bzero+0xa: repe stosq      %es:(%rdi)      
db> bt                                                                          
Tracing pid 840 tid 100106 td 0xffffff0004c45a50                                
bzero() at 0xffffffff8044c49a = bzero+0xa                                       
linker_load_module() at 0xffffffff802d02cf = linker_load_module+0x8cf           
kern_kldload() at 0xffffffff802d0847 = kern_kldload+0xa7                        
kldload() at 0xffffffff802d0934 = kldload+0x84                                  
syscall() at 0xffffffff8044d65b = syscall+0x1cb                                 
Xfast_syscall() at 0xffffffff8043346b = Xfast_syscall+0xab                      
--- syscall (304, FreeBSD ELF64, kldload), rip = 0x800683c6c, rsp = 0x7fffffffec38, rbp = 0 ---
db>

How-To-Repeat: Come up with a large kernel loadable module.  Try to load it on amd64,
and see the kernel crash.
Comment 1 Andriy Gapon freebsd_committer freebsd_triage 2010-12-05 12:19:07 UTC
Ken,

this is an interesting problem. Are you able to reproduce it still?
I think that something like that should be easily debuggable.

-- 
Andriy Gapon
Comment 2 Kenneth D. Merry freebsd_committer freebsd_triage 2010-12-06 16:11:39 UTC
On Sun, Dec 05, 2010 at 14:19:07 +0200, Andriy Gapon wrote:
> 
> Ken,
> 
> this is an interesting problem. Are you able to reproduce it still?
> I think that something like that should be easily debuggable.

Unfortunately, no.  I no longer have access to the codebase that was
generating the problem.  (It was at my previous job.)

But, you may be able to reproduce the problem by putting a very large
global variable/table in the kernel and see what happens.  As I
mentioned in the bug report, I made the problem go away by taking out that
table.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
Comment 3 Andriy Gapon freebsd_committer freebsd_triage 2010-12-06 17:11:52 UTC
on 06/12/2010 18:11 Kenneth D. Merry said the following:
> But, you may be able to reproduce the problem by putting a very large
> global variable/table in the kernel and see what happens.  As I
> mentioned in the bug report, I made the problem go away by taking out that
> table.

I will try.
BTW, do you remember if it was initialized or not, and how it was used?
I.e., did the structure end up in .data, .rodata or .bss?

-- 
Andriy Gapon
Comment 4 Andriy Gapon freebsd_committer freebsd_triage 2010-12-06 17:49:26 UTC
on 06/12/2010 19:11 Andriy Gapon said the following:
> BTW, do you remember if it was initialized or not, and how it was used?
> I.e., did the structure end up in .data, .rodata or .bss?

I couldn't reproduce the problem with an array of about 800KB in size in any of
the three sections.
I think that gives a ground for closing the PR.
If the problem resurfaces then we could re-open it or open a new one.
Do you agree?

-- 
Andriy Gapon
Comment 5 Kenneth D. Merry freebsd_committer freebsd_triage 2010-12-06 17:52:48 UTC
On Mon, Dec 06, 2010 at 19:11:52 +0200, Andriy Gapon wrote:
> on 06/12/2010 18:11 Kenneth D. Merry said the following:
> > But, you may be able to reproduce the problem by putting a very large
> > global variable/table in the kernel and see what happens.  As I
> > mentioned in the bug report, I made the problem go away by taking out that
> > table.
> 
> I will try.
> BTW, do you remember if it was initialized or not, and how it was used?
> I.e., did the structure end up in .data, .rodata or .bss?

Yes, it was initialized.

It had text and various numeric values in it I think.

e.g. something like this:

struct foo {
	int bar;
	char *baz;
} teststruct[] = {
	{1, "blah"},
	{2, "blahblah"},
	{0, NULL}
};

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
Comment 6 Kenneth D. Merry freebsd_committer freebsd_triage 2010-12-06 17:58:11 UTC
On Mon, Dec 06, 2010 at 19:49:26 +0200, Andriy Gapon wrote:
> on 06/12/2010 19:11 Andriy Gapon said the following:
> > BTW, do you remember if it was initialized or not, and how it was used?
> > I.e., did the structure end up in .data, .rodata or .bss?
> 
> I couldn't reproduce the problem with an array of about 800KB in size in any of
> the three sections.
> I think that gives a ground for closing the PR.
> If the problem resurfaces then we could re-open it or open a new one.
> Do you agree?

I would do one more test, but this time try bumping the size up to say 5MB
or so.

The reason is, the module in question was already very large to start with,
and the table put it over the edge.  So the total amount of static data in
the module was probably a good bit larger than 800KB.

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
Comment 7 Andriy Gapon freebsd_committer freebsd_triage 2010-12-06 18:08:18 UTC
on 06/12/2010 19:58 Kenneth D. Merry said the following:
> The reason is, the module in question was already very large to start with,
> and the table put it over the edge.  So the total amount of static data in
> the module was probably a good bit larger than 800KB.

OK, bumped the array size to ~7MB:
0000000000000000 l     O .rodata        00000000006d6000 large3
0000000000000020 l     O .data  00000000006d6000 large2
0000000000000000 l     O .bss   00000000006d6000 large1
Total module file size is ~14MB.

Everything is OK still.

-- 
Andriy Gapon
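
For reference, a minimal sketch of how three arrays like those in the listing above could be declared so that one lands in each section. This is not the test code used in this PR, which was not posted. It is a userland approximation, and the helper names touch_large_arrays() and large_array_bytes() are hypothetical. Only the symbol names and the 0x6d6000 size come from the listing. In a real test the arrays would live in a loadable module and be referenced from its event handler so that the linker keeps them.

```c
#include <stddef.h>

/* Size taken from the symbol listing above: 0x6d6000 bytes (7,168,000). */
#define LARGE_SIZE ((size_t)0x6d6000)

/* Uninitialized, so a typical ELF toolchain places it in .bss. */
static char large1[LARGE_SIZE];

/* Writable with a non-zero initializer, so it typically lands in .data. */
static char large2[LARGE_SIZE] = { 1 };

/* Const with a non-zero initializer, so it typically lands in .rodata. */
static const char large3[LARGE_SIZE] = { 1 };

/*
 * Hypothetical helper: read one byte from each array through a volatile
 * pointer so the compiler cannot discard the arrays as unused.
 */
int
touch_large_arrays(void)
{
	const volatile char *bss = large1;
	const volatile char *data = large2;
	const volatile char *rodata = large3;

	return (bss[LARGE_SIZE - 1] + data[0] + rodata[0]);
}

/* Hypothetical helper: total bytes of static data across all three arrays. */
size_t
large_array_bytes(void)
{
	return (sizeof(large1) + sizeof(large2) + sizeof(large3));
}
```

Only the .data and .rodata arrays occupy space in the object file, since .bss is zero-filled at load time. Two arrays of about 7MB each are consistent with the reported total file size of about 14MB.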
Comment 8 Kenneth D. Merry freebsd_committer freebsd_triage 2010-12-06 18:12:07 UTC
On Mon, Dec 06, 2010 at 20:08:18 +0200, Andriy Gapon wrote:
> on 06/12/2010 19:58 Kenneth D. Merry said the following:
> > The reason is, the module in question was already very large to start with,
> > and the table put it over the edge.  So the total amount of static data in
> > the module was probably a good bit larger than 800KB.
> 
> OK, bumped the array size to ~7MB:
> 0000000000000000 l     O .rodata        00000000006d6000 large3
> 0000000000000020 l     O .data  00000000006d6000 large2
> 0000000000000000 l     O .bss   00000000006d6000 large1
> Total module file size is ~14MB.
> 
> Everything is OK still.

Fair enough, I'd say close the bug.

If anyone runs into it again they can reopen it.

Thanks!

Ken
-- 
Kenneth Merry
ken@FreeBSD.ORG
Comment 9 Andriy Gapon freebsd_committer freebsd_triage 2010-12-06 18:16:11 UTC
State Changed
From-To: open->closed

The problem doesn't seem to be reproducible with the recent 
code in head.