One of our production systems has begun kernel panicking for unknown reasons; we're still working out why. Separately, none of our kernel dumps (which are written to disk when we type "panic" at the DDB prompt) end up in /var/crash when savecore runs. Details of our configuration and of what actually happens were posted to freebsd-stable. They show that a kernel core dump is indeed written to the correct device (/dev/ad0s1b), but savecore never detects it:

http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038069.html
http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038569.html

I believe the problem is that /etc/rc.d/swap1 (which runs `swapon -a`) is called _before_ /etc/rc.d/savecore, thus clobbering any core dumps that exist. See the second URL above for additional details.

I'm marking this serious/medium because being able to get vmcore images after a kernel panic is important. :-)

Fix:

I believe the issue can be fixed by adjusting the rcorder(8) dependencies so that savecore runs *before* swap1. I'm not familiar with what needs to be changed to make this work.

How-To-Repeat:

Set dumpdev and dumpdir in /etc/rc.conf, panic the system, and see.
Responsible Changed From-To: freebsd-bugs->freebsd-rc reassign to rc team
There seems to be conflicting information about what constitutes the correct behaviour here.

The original 4.4BSD "Unix System Manager's Manual (SMM)", found here:

http://docs.freebsd.org/44doc/smm/02.config/paper-6.html

indicates the following (under the "System dumps" heading):

- Kernel dumps are written from the end of swap and work backwards
- The kernel uses swap from the front and works forwards
- This way, swapping during the boot process is less likely to overwrite the dump before savecore has run

This somewhat more recent posting suggests that this is still the case:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2005-11/0703.html

However, the FreeBSD Developers' Handbook describes behaviour that does not match the current reality:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html#EXTRACT-DUMP

Can anyone speak with more authority on this?

--Antony
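To make the SMM layout argument concrete, here is a toy model in Python. It assumes the dump, plus one header sector on each side, sits at the very end of the swap device, and that swap is handed out strictly from the front; both are simplifications of the description above, not FreeBSD's actual allocator or dump code, and the 512-byte sector size is an assumption. It also shows the case raised later in the thread, where the swap device is simply too small to hold the dump.

```python
"""Toy model of the swap/dump layout described in the 4.4BSD SMM.

Simplifying assumptions (not FreeBSD's actual allocator or dumper code):
  * the dump plus one header sector on each side sits at the very end
    of the swap device;
  * swap space is handed out strictly from the start of the device.
"""
from typing import Optional, Tuple

SECTOR = 512  # assumed sector size in bytes


def dump_region(device_bytes: int, dump_bytes: int) -> Optional[Tuple[int, int]]:
    """Byte range [start, end) the dump would occupy, or None if it can't fit."""
    needed = dump_bytes + 2 * SECTOR  # leading and trailing header sectors
    if needed > device_bytes:
        return None
    return device_bytes - needed, device_bytes


def swap_reaches_dump(device_bytes: int, dump_bytes: int, swapped_bytes: int) -> bool:
    """True if swap usage growing forward from offset 0 overlaps the dump."""
    region = dump_region(device_bytes, dump_bytes)
    if region is None:
        raise ValueError("dump does not fit on the swap device")
    start, _end = region
    return swapped_bytes > start


if __name__ == "__main__":
    MiB, GiB = 1 << 20, 1 << 30
    print(swap_reaches_dump(2 * GiB, 1 * GiB, 100 * MiB))   # False
    print(swap_reaches_dump(2 * GiB, 1 * GiB, 1536 * MiB))  # True
    print(dump_region(2 * GiB, 4 * GiB))                    # None
```

Under this model, a little early-boot swapping is harmless; only heavy swapping before savecore runs, or a dump larger than the swap device, loses the dump.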
This is great information; thanks for providing it! I found it quite educational.

The documentation Antony provided seems to indicate two things:

1) That savecore(8) should really be run before swapon(8). I don't see any indication that swap needs to be available prior to mounting filesystems, which is what Doug B. said was a necessity.

2) That regardless of item #1, savecore(8) should be working: assuming kernel dumps are still written from the end of the swap device towards the front (i.e. backwards), swapon(8) shouldn't be stomping on them.

I haven't tried changing the rcorder of the /etc/rc.d scripts in question to see if it works. My gut feeling says it probably will, but I'm not sure of the implications.

Doug, can you provide some comments/insight here when time permits?

-- 
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, USA |
| Making life hard for others since 1977. PGP: 4BD6C0CB |
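The implications of reordering can be explored with a small model of how rcorder(8) resolves the PROVIDE/REQUIRE/BEFORE keywords in rc.d script headers: a REQUIRE means the provider runs first, a BEFORE means this script runs ahead of the named condition's provider. The sketch below uses Python's graphlib. The three headers are hypothetical simplifications, not the real /etc/rc.d headers, and mountcritlocal requiring localswap is only a stand-in for the constraint Doug B. described.

```python
"""Toy model of rcorder(8) ordering via PROVIDE/REQUIRE/BEFORE.

The rc.d headers below are HYPOTHETICAL simplifications, not the real
/etc/rc.d/{swap1,mountcritlocal,savecore} headers.
"""
from graphlib import CycleError, TopologicalSorter


def rc_order(scripts):
    """Return script names in an order satisfying every REQUIRE and BEFORE.

    `scripts` maps a script name to a dict of keyword -> list of conditions.
    Raises graphlib.CycleError if the constraints contradict each other.
    """
    # Which scripts provide each condition name.
    providers = {}
    for name, hdr in scripts.items():
        for cond in hdr.get("PROVIDE", []):
            providers.setdefault(cond, []).append(name)

    # preds[node] = scripts that must run before node.
    preds = {name: set() for name in scripts}
    for name, hdr in scripts.items():
        for cond in hdr.get("REQUIRE", []):
            for p in providers.get(cond, []):
                preds[name].add(p)   # REQUIRE: provider runs first
        for cond in hdr.get("BEFORE", []):
            for p in providers.get(cond, []):
                preds[p].add(name)   # BEFORE: this script runs first
    return list(TopologicalSorter(preds).static_order())


# Hypothetical headers: swap1 enables swap, mountcritlocal mounts /var
# (assumed here to need swap), savecore needs /var/crash mounted.
BASE = {
    "swap1": {"PROVIDE": ["localswap"]},
    "mountcritlocal": {"PROVIDE": ["mountcritlocal"], "REQUIRE": ["localswap"]},
    "savecore": {"PROVIDE": ["savecore"], "REQUIRE": ["mountcritlocal"]},
}

# The proposed fix: savecore declares BEFORE: localswap.
PROPOSED = dict(BASE, savecore=dict(BASE["savecore"], BEFORE=["localswap"]))

# Same fix, but assuming filesystems do NOT need swap to be mounted.
RELAXED = dict(PROPOSED, mountcritlocal={"PROVIDE": ["mountcritlocal"]})

if __name__ == "__main__":
    print(rc_order(BASE))     # ['swap1', 'mountcritlocal', 'savecore']
    print(rc_order(RELAXED))  # ['mountcritlocal', 'savecore', 'swap1']
    try:
        rc_order(PROPOSED)
    except CycleError:
        print("PROPOSED: dependency cycle, no valid order")
```

In this model, the reordering only works if swap is not actually needed to mount filesystems; if it is, adding BEFORE to savecore produces a cycle that this sketch reports as a CycleError.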
State Changed From-To: open->feedback

This suggests to me that your swap device may be too small to fit the core dump and the fsck results. How big is it?
(In reply to Chris Rees from comment #4)

Actually, no. The problem is that savecore can't find a core dump on the partition. I'll take a look at this bug...
Mark might have fixed this recently.
swapon will not clobber any kernel dump metadata; the swap pager doesn't store any metadata on the swap device. There's a potential issue if we start swapping after the swap device is enabled but before we attempt to recover the kernel dump. However, this is quite unlikely, and kernel dumps are written in a way that minimizes the likelihood of swap data overwriting a kernel dump header.

It doesn't look like it was ever confirmed that switching the order actually fixes the problem. If the issue occurs again, a useful first step would be to read back the last sector of the dump device with dd(1) after a crash. However, given the age of this PR, I suspect we won't be able to make any progress. I'm sorry that the PR didn't get much attention.
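The dd(1) check suggested above can be sketched in Python: read the final sector of the dump device (or of an image copied off it) and test whether it starts with the kernel dump header magic. This assumes 512-byte sectors and the magic string "FreeBSD Kernel Dump" (KERNELDUMPMAGIC in sys/sys/kerneldump.h) at the start of the header. For a device node, the media size has to be passed in by hand (for example, the mediasize reported by diskinfo(8)), since st_size is only meaningful for regular files.

```python
"""Check the last sector of a dump device (or a dd(1) image of it) for a
kernel dump header, much like reading it back with dd(1) and inspecting it.

Assumptions: 512-byte sectors, and a header that begins with the magic
string "FreeBSD Kernel Dump" (KERNELDUMPMAGIC in sys/sys/kerneldump.h).
"""
import os
import sys
from typing import Optional

SECTOR = 512
MAGIC = b"FreeBSD Kernel Dump"


def read_last_sector(path: str, media_size: Optional[int] = None) -> bytes:
    """Read the final SECTOR bytes of `path`.

    For a device node, pass media_size explicitly (e.g. the mediasize that
    diskinfo(8) reports); st_size is only meaningful for regular files.
    """
    if media_size is None:
        media_size = os.stat(path).st_size
    if media_size < SECTOR:
        raise ValueError(f"{path}: smaller than one sector")
    # Unbuffered, so a single read of one sector is issued
    # (sector-aligned, assuming media_size is a multiple of SECTOR).
    with open(path, "rb", buffering=0) as f:
        f.seek(media_size - SECTOR)
        return f.read(SECTOR)


def has_dump_header(sector: bytes) -> bool:
    """True if the sector starts with the kernel dump header magic."""
    return sector.startswith(MAGIC)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"usage: {sys.argv[0]} PATH [MEDIA_SIZE_BYTES]")
    else:
        size = int(sys.argv[2]) if len(sys.argv) > 2 else None
        found = has_dump_header(read_last_sector(sys.argv[1], size))
        print("dump header found" if found else "no dump header in last sector")
```

If the header is present after a crash but savecore still reports nothing, that would point away from swapon clobbering the dump.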