Bug 118255 - savecore never finding kernel core dumps (rcorder problem)
Status: Closed Not Enough Information
Alias: None
Product: Base System
Classification: Unclassified
Component: conf
Version: 6.3-PRERELEASE
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Mark Johnston
Reported: 2007-11-26 02:30 UTC by Jeremy Chadwick
Modified: 2017-11-05 21:12 UTC

Description Jeremy Chadwick freebsd_committer freebsd_triage 2007-11-26 02:30:01 UTC
	One of our production systems has begun panicking for reasons unknown;
	we're in the process of figuring out why that's happening.  On the other hand,
	none of our kernel core dumps (which are being written to disk when doing
	"panic" from DDB) are being saved into /var/crash when savecore runs.

	Details of our configuration and what actually happens were posted to
	freebsd-stable.  It shows that a kernel core dump is indeed written to the
	correct device (/dev/ad0s1b), but savecore never detects the cores:

	http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038069.html
	http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038569.html

	I believe the problem is that /etc/rc.d/swap1 (which does `swapon -a`) is
	being called _before_ /etc/rc.d/savecore, thus clobbering/stomping over any
	core dumps that exist.  See the 2nd URL above for some additional details.
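
One way to check the ordering claim is to look at the sequence that rcorder(8) computes from the PROVIDE/REQUIRE/BEFORE comments in the rc.d scripts. The helper below is a minimal sketch and not part of the original report: `first_of` is a hypothetical name, and it only reports which of two script names appears first in a list of paths read from stdin.

```shell
# first_of NAME_A NAME_B
# Reads script paths on stdin, one per line, in the order rcorder(8) printed
# them, and prints whichever of the two basenames comes first.
first_of() {
    awk -v a="$1" -v b="$2" '
        { n = $0; sub(/.*\//, "", n) }      # strip the directory part
        n == a || n == b { print n; exit }  # the first match wins
    '
}

# Intended use on a FreeBSD system (not run here):
#   rcorder /etc/rc.d/* 2>/dev/null | first_of swap1 savecore
```

If this prints the swap script's name, swap is enabled before savecore runs on that system.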

	I'm marking this serious/medium because people being able to get vmcore
	images after a kernel panic is important.  :-)

Fix:
	I believe the issue can be fixed by adjusting some of the rcorder(8) values
	so that savecore gets run *before* swap1. I'm not familiar with what needs to
	be changed to make this work.
How-To-Repeat: 	Set dumpdev and dumpdir in /etc/rc.conf, panic system, and see.
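
For reference, the rc.conf settings involved would look roughly like the fragment below. The device and directory are the ones named in this report; the exact values on the affected machine are in the linked freebsd-stable posts, so treat these lines as an illustrative assumption.

```shell
# /etc/rc.conf fragment (illustrative values taken from this report)
dumpdev="/dev/ad0s1b"   # swap partition the kernel writes the dump to
dumpdir="/var/crash"    # directory savecore(8) saves recovered dumps into
```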
Comment 1 Remko Lodder freebsd_committer freebsd_triage 2007-11-26 06:22:07 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-rc

reassign to rc team
Comment 2 Antony Mawer 2007-11-30 01:30:17 UTC
There seems to be conflicting information about what constitutes the 
correct behaviour here. The original 4.4BSD "Unix System Manager's 
Manual (SMM)", found here:

     http://docs.freebsd.org/44doc/smm/02.config/paper-6.html

indicates the following (found under the "System dumps" heading):

     - Kernel dumps write from the end of swap and work backwards
     - The kernel uses swap from the front and works forward
     - This way it reduces the chance of swapping overwriting the dump
        during the boot process until savecore is run

This somewhat more modern posting suggests that is still the case:

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2005-11/0703.html

However the FreeBSD Developers' Handbook suggests a behaviour that does 
not match the current reality:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html#EXTRACT-DUMP

Can anyone speak with more authority on this...?

--Antony
Comment 3 Jeremy Chadwick freebsd_committer freebsd_triage 2007-12-01 10:55:36 UTC
This is great information; thanks for providing it! I found it quite
educational/informational.  The documentation Antony provided seems to
indicate two things:

1) That savecore(8) should really be run before swapon(8) -- I don't see
any indication that swap needs to be made available prior to mounting
filesystems, which is what Doug B. was stating was a necessity.

2) That regardless of item #1, savecore(8) should be working
(assuming that kernel dumps are still written from the end of the swap
device to the front, i.e. backwards), and that swapon(8) shouldn't be
stomping on kernel dumps.

I haven't tried changing the rcorder of the /etc/rc.d scripts in
question to see if it works.  My gut feeling says it probably will, but
I'm not sure of the implications.

Doug, can you provide some comments/insight here when time permits?

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |
Comment 4 Chris Rees freebsd_committer freebsd_triage 2012-10-25 20:36:15 UTC
State Changed
From-To: open->feedback

This indicates to me that your swap device is possibly too small to fit 
the coredump and the fsck results.  How big is it?
Comment 5 Enji Cooper freebsd_committer freebsd_triage 2017-06-09 19:29:05 UTC
(In reply to Chris Rees from comment #4)

Actually, no. The problem is that it can't find a coredump on a partition.

I'll take a look at this bug...
Comment 6 Enji Cooper freebsd_committer freebsd_triage 2017-11-05 20:56:16 UTC
Mark might have fixed this recently.
Comment 7 Mark Johnston freebsd_committer freebsd_triage 2017-11-05 21:12:33 UTC
swapon will not clobber any kernel dump metadata - the swap pager doesn't store any metadata on the swap device. There's a potential issue if we start swapping after the swap device is enabled and before we attempt to recover the kernel dump, but this is quite unlikely, and kernel dumps are written in a way that minimizes the likelihood of swap data overwriting a kernel dump header.

It doesn't look like it was ever confirmed that switching the order does in fact fix the problem. If the issue occurs again, a useful first step would be to read back the last sector of the dump device with dd(1) after a crash. However, given the age of this PR, I suspect we won't be able to make any progress. I'm sorry that the PR didn't get much attention.
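
The dd(1) read-back suggested above could be sketched as follows. This is an assumption-laden illustration, not a procedure confirmed in this PR: `last_sector` is a hypothetical helper, the 512-byte default sector size must be checked against the real device, and the expectation that a valid trailing dump header starts with the string "FreeBSD Kernel Dump" is based on the kernel's dump header magic.

```shell
# last_sector DEVICE SIZE_BYTES [SECTOR_SIZE]
# Writes the final sector of DEVICE to stdout. SIZE_BYTES is the media size
# in bytes; SECTOR_SIZE defaults to 512 (an assumption, check your device).
last_sector() {
    ls_dev=$1
    ls_size=$2
    ls_ss=${3:-512}
    # Skip every sector except the last one, then copy exactly one sector.
    dd if="$ls_dev" bs="$ls_ss" skip=$(( $ls_size / $ls_ss - 1 )) count=1 2>/dev/null
}

# Intended use on FreeBSD after a crash (not run here; assumes diskinfo(8)
# prints the media size in bytes as its third field):
#   size=$(diskinfo /dev/ad0s1b | awk '{ print $3 }')
#   last_sector /dev/ad0s1b "$size" | hexdump -C | head
```

If the hexdump shows no recognizable header, the dump was either never written or was overwritten before savecore ran, which narrows down where to look next.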