FreeBSD 8.0 freezes during the kernel panic memory dump.

Fix:
I used self-developed instrumentation code to track down the deadlock and found that the CPU doing the dump is seized by ehci_interrupt, which is locked up. The fix is to add a critical_enter() call in the function doadump.

btw: the following is my quick and dirty instrumentation code:

#define TRACELEVEL 5
/* Character and attribute bytes of VGA text cell i in slot `slot` of a CPU's display area. */
#define TRACE_CHAR(cpu, slot, i) \
    (*((unsigned char *)0xffffffff800b8000 + (cpu)*300 + (slot)*9*2 + (i)*2))
#define TRACE_ATTR(cpu, slot, i) \
    (*((unsigned char *)0xffffffff800b8001 + (cpu)*300 + (slot)*9*2 + (i)*2))

void
lapic_handle_intr(int vector, struct trapframe *frame)
{
    struct intsrc *isrc;

#ifdef TRACELEVEL
    char tfip[20];

    if (1 /* vector == 0x30 */) {
        int i = 0;
        int j = 0;
        int cpuid = PCPU_GET(cpuid);
        struct amd64_frame *frame1;

        if (!INKERNEL(frame->tf_rip))
            goto out;
        frame1 = (struct amd64_frame *)(frame->tf_rbp);

        /* Slot 0: the interrupted instruction pointer (low 32 bits). */
        sprintf(tfip, "%x\n", (u_int)frame->tf_rip);
        for (i = 0; i < 8; i++) {
            TRACE_CHAR(cpuid, j, i) = tfip[i];
            /* Toggle the attribute between 121 and 120 on each update. */
            if (TRACE_ATTR(cpuid, j, i) != 121)
                TRACE_ATTR(cpuid, j, i) = 121;
            else
                TRACE_ATTR(cpuid, j, i) = 120;
        }
        TRACE_CHAR(cpuid, j, 8) = ' ';
        TRACE_ATTR(cpuid, j, 8) = 121;

        /* Slots 1..TRACELEVEL: return addresses up the frame-pointer chain. */
        j = 1;
        while (j <= TRACELEVEL) {
            if (!INKERNEL((long)frame1))
                goto out;
            sprintf(tfip, "%x\n", (u_int)frame1->f_retaddr);
            for (i = 0; i < 8; i++) {
                TRACE_CHAR(cpuid, j, i) = tfip[i];
                if (TRACE_ATTR(cpuid, j, i) != 121)
                    TRACE_ATTR(cpuid, j, i) = 121;
                else
                    TRACE_ATTR(cpuid, j, i) = 120;
            }
            TRACE_CHAR(cpuid, j, 8) = ' ';
            TRACE_ATTR(cpuid, j, 8) = 121;
            frame1 = frame1->f_frame;
            j++;
        }
    }
out:
#endif
    if (vector == -1)
        panic("Couldn't get vector from ISR!");
    isrc = intr_lookup_source(apic_idt_to_irq(PCPU_GET(apic_id), vector));
    intr_execute_handlers(isrc, frame);
}

How-To-Repeat:
Allocate a large amount of memory, trigger a kernel panic, and let it dump.
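The display technique used by the instrumentation code can be tried outside the kernel. The following is only a minimal user-space sketch, not the code from this PR: the array fake_vga stands in for the VGA text buffer that the kernel code reaches at 0xffffffff800b8000, and the names fake_vga, show_addr, SLOT_CELLS and CPU_STRIDE are made up for illustration. It assumes the same layout as the instrumentation code (300 bytes per CPU, 9 cells per address, attribute values 121 and 120), but pads the hex string with "%08x", which the PR code does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Stand-in for the 80x25 VGA text buffer (hypothetical, user space only).
 * Each cell is a character byte followed by an attribute byte.
 */
static unsigned char fake_vga[80 * 25 * 2];

#define SLOT_CELLS 9    /* 8 hex digits plus one blank cell */
#define CPU_STRIDE 300  /* bytes reserved per CPU, as in the PR code */

/*
 * Write the low 32 bits of addr as 8 hex digits into slot `slot` of the
 * display area of CPU `cpu`.  The attribute byte of each digit alternates
 * between 121 and 120 on successive calls, so repeated updates are
 * visible even when the address does not change.
 */
static void
show_addr(int cpu, int slot, uint64_t addr)
{
    char hex[9];
    size_t off;
    unsigned char *cell;
    int i;

    assert(cpu >= 0 && slot >= 0);
    off = (size_t)cpu * CPU_STRIDE + (size_t)slot * SLOT_CELLS * 2;
    assert(off + SLOT_CELLS * 2 <= sizeof(fake_vga));
    cell = fake_vga + off;

    snprintf(hex, sizeof(hex), "%08x", (unsigned int)(addr & 0xffffffffu));
    for (i = 0; i < 8; i++) {
        cell[i * 2] = (unsigned char)hex[i];
        cell[i * 2 + 1] = (cell[i * 2 + 1] != 121) ? 121 : 120;
    }
    cell[8 * 2] = ' ';      /* separator between addresses */
    cell[8 * 2 + 1] = 121;
}
```

Calling show_addr(cpu, 0, rip) followed by show_addr(cpu, j, return_address) for j = 1..5 would mirror what the interrupt handler writes on every interrupt.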
Responsible Changed From-To: freebsd-bugs->freebsd-fs Over to maintainer(s). Apparently the fix is simple (patch doadump).
Responsible Changed From-To: freebsd-fs->avg This PR looks like a duplicate of PR 139614.
Do you have a patch? Have you tested it? Can you explain how it works? Can you also review the following PR http://www.freebsd.org/cgi/query-pr.cgi?pr=amd64/139614 and commits referenced by it? - Do those match your problem? -- Andriy Gapon
Thanks for your attention.

On Wed, Mar 21, 2012 at 6:28 PM, Andriy Gapon <avg@freebsd.org> wrote:
> Do you have a patch?

The following is the patch:

--- kern_shutdown.c~
+++ kern_shutdown.c
@@ -242,6 +242,7 @@
 }

 savectx(&dumppcb);
+ critical_enter();
 dumptid = curthread->td_tid;
 dumping++;
 #ifdef DDB
@@ -263,6 +264,9 @@
 return (0);
 }

> Have you tested it?

I tested it many times; with my patch, dumping no longer freezes.

> Can you explain how it works?

1) How the patch works:
   Let's assume the panicking CPU is CPU 0. Soon after CPU 0 begins to dump, the kernel scheduler preempts the dumping thread on CPU 0 to execute another thread, which soon locks up (USB subsystem). My patch prevents the scheduler from preempting the dumping thread.
2) How my instrumentation code works:
   On each interrupt, it prints the current execution stack of each CPU in the system to VGA memory.
3) What I found in the dumping freeze scenario:
   Every time the dumping froze, the instrumentation code told me that the USB subsystem was locked up.
4) Either my patch or disabling USB in the BIOS prevents the dumping freeze.

> Can you also review the following PR
> http://www.freebsd.org/cgi/query-pr.cgi?pr=amd64/139614 and commits referenced
> by it? - Do those match your problem?

1) My PR is a sub-problem of PR 139614; the commits settle the dumping freeze problem and the dumping mistake problem altogether.
2) I reviewed the current code before submitting the PR and found the stopping of other CPUs and the scheduler treatment. I submitted the PR for the benefit of those who do not want to patch their current work heavily. My patch does not settle the dumping mistake under heavy interrupt load.
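The effect described in 1) can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code: the struct and the toy_* functions are made up for illustration, and the field names merely echo the td_critnest and td_owepreempt fields of the kernel's struct thread. It models the idea that a preemption requested while a thread is inside a critical section is deferred until the section is left.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-thread scheduler state (hypothetical). */
struct toy_thread {
    int  critnest;    /* nesting depth of critical sections */
    bool owepreempt;  /* a preemption was requested while critnest > 0 */
    int  switches;    /* number of times the thread was switched out */
};

/* Enter a critical section: preemption is deferred from now on. */
static void
toy_critical_enter(struct toy_thread *td)
{
    td->critnest++;
}

/*
 * The toy scheduler wants to run another thread.  Returns true if the
 * current thread was switched out, false if the switch was deferred.
 */
static bool
toy_maybe_preempt(struct toy_thread *td)
{
    if (td->critnest > 0) {
        td->owepreempt = true;  /* remember the request for later */
        return (false);
    }
    td->switches++;
    return (true);
}

/* Leave a critical section and honor any deferred preemption. */
static void
toy_critical_exit(struct toy_thread *td)
{
    assert(td->critnest > 0);
    td->critnest--;
    if (td->critnest == 0 && td->owepreempt) {
        td->owepreempt = false;
        td->switches++;         /* the deferred switch happens now */
    }
}
```

In this model, a dumping thread that calls toy_critical_enter() before it starts writing the dump is never switched out by toy_maybe_preempt(), which corresponds to the behavior the patch relies on in doadump.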
on 23/03/2012 06:19 Zhouyi Zhou said the following:
> Thanks for your attention.

Sorry for taking so long to reply...

> On Wed, Mar 21, 2012 at 6:28 PM, Andriy Gapon <avg@freebsd.org> wrote:
>> Do you have a patch?
> The following is the patch:
> --- kern_shutdown.c~
> +++ kern_shutdown.c
> @@ -242,6 +242,7 @@
>  }
>
>  savectx(&dumppcb);
> + critical_enter();
>  dumptid = curthread->td_tid;
>  dumping++;
>  #ifdef DDB
> @@ -263,6 +264,9 @@
>  return (0);
>  }
>> Have you tested it?
> I tested it many times; with my patch, dumping no longer freezes.
>> Can you explain how it works?
> 1) How the patch works:
> Let's assume the panicking CPU is CPU 0. Soon after CPU 0 begins to dump,
> the kernel scheduler preempts the dumping thread on CPU 0 to execute
> another thread, which soon locks up (USB subsystem).
> My patch prevents the scheduler from preempting the dumping thread.

OK. Now I see what your patch does and I think that this is a good workaround. Although it won't help in all cases - e.g. if a thread running on a different CPU does memory-related operations then that can still confuse the dumping code. But at least the panicking/dumping CPU won't get indefinitely stuck.

[snip]

>> Can you also review the following PR
>> http://www.freebsd.org/cgi/query-pr.cgi?pr=amd64/139614 and commits referenced
>> by it? - Do those match your problem?
> 1) My PR is a sub-problem of PR 139614; the commits settle the dumping
> freeze problem and the dumping mistake problem altogether.
> 2) I reviewed the current code before submitting the PR and found the
> stopping of other CPUs and the scheduler treatment. I submitted the PR
> for the benefit of those who do not want to patch their current work
> heavily. My patch does not settle the dumping mistake under heavy
> interrupt load.

Yes, thank you for the investigation and the patch. And sorry for taking too long to act on your report.

Now, I have MFCed the main part of the CPU/scheduler stopping commits to stable/8. The new behavior is disabled by default, but can be enabled via a tunable. In stable/9 the changes are fully MFCed and enabled by default.

What is your opinion - should that be good enough or is your patch still needed? Assuming that you can try stable/8, could you please test whether the latest code there is able to correctly handle your environment (hardware, interrupt load, etc.)?

Thank you!
--
Andriy Gapon
State Changed From-To: open->patched Update to the state of PR 139614
Andriy,

I cvsup'ed the FreeBSD 8 stable branch and set sysctl kern.stop_scheduler_on_panic=1. Dumping succeeded in all 3 rounds of tries (2000M memory dump).

Many thanks
Best Wishes
Zhouyi
State Changed From-To: patched->closed See PR 139614.