Bug 28164

Summary: [PATCH] crashdump can trash disklabel/other partitions
Product: Base System Reporter: Phil Homewood <pdh>
Component: kernAssignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.3-STABLE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
file.diff
none
dmesg.boot
none
disklabel-da1s2 none

Description Phil Homewood 2001-06-15 06:00:07 UTC
	Crashdumps can overwrite the last few blocks of the device
	immediately before the dumpdev, if the dumpdev is approximately
	the same size as physical memory.

Fix: The following patch works but is probably incorrect (in tests
	on this machine, I needed an extra 10 blocks of disk space, and
	one page of physical memory requires 8 blocks of disk).
	Someone with a better understanding than I have should review
	this patch, but it should provide a good starting point.


How-To-Repeat: 	Set up a swap device of the same size as physical memory and
	force a crashdump (eg, from DDB).
Comment 1 Bruce Evans 2001-06-22 13:31:24 UTC
On Fri, 15 Jun 2001, Phil Homewood wrote:

> >Description:
> 	Crashdumps can overwrite the last few blocks of the device
> 	immediately before the dumpdev, if the dumpdev is approximately
> 	the same size as physical memory.
> 
> >How-To-Repeat:
> 	Set up a swap device of the same size as physical memory and
> 	force a crashdump (eg, from DDB).

This did not repeat it for me :-).

> >Fix:
> 
> 	The following patch works but is probably incorrect (in tests
> 	on this machine, I needed an extra 10 blocks of disk space, and
> 	one page of physical memory requires 8 blocks of disk).
> 	Someone with a better understanding than I have should review
> 	this patch, but it should provide a good starting point.
> 
> --- kern/kern_shutdown.c.orig	Mon Jun 11 23:12:10 2001
> +++ kern/kern_shutdown.c	Fri Jun 15 14:46:53 2001
> @@ -421,7 +421,7 @@
>  	/*
>  	 * XXX should clean up checking in dumpsys() to be more like this.
>  	 */
> -	newdumplo = psize - Maxmem * PAGE_SIZE / DEV_BSIZE;
> +	newdumplo = psize - (Maxmem + 2) * PAGE_SIZE / DEV_BSIZE;
>  	if (newdumplo < 0)
>  		return (ENOSPC);
>  	dumpdev = dev;
> --- kern/subr_disk.c.orig	Fri Jun  1 02:47:45 2001
> +++ kern/subr_disk.c	Fri Jun 15 14:46:53 2001
> @@ -91,7 +91,7 @@
>  	dl = dsgetlabel(dev, dp->d_slice);
>  	if (!dl)
>  		return (ENXIO);
> -	*count = (u_long)Maxmem * PAGE_SIZE / dl->d_secsize;
> +	*count = (u_long)(Maxmem + 2) * PAGE_SIZE / dl->d_secsize;
>  	if (dumplo < 0 || 
>  	    (dumplo + *count > dl->d_partitions[dkpart(dev)].p_size))
>  		return (EINVAL);

I don't see how these patches can help.  The first hunk prevents dumping
if the device size (in bytes) is precisely the same as the memory size
(according to Maxmem).  But dumping will still occur if the device size is
2 pages larger, and then the second hunk almost ensures that any overrun
still occurs (since it adjusts the dump size up by the same amount that
the first hunk adjusts the dump start down).  It also has bad side effects:
- it causes 2 nonexistent pages to be dumped.  This might cause NMIs or
  worse.
- it causes overflow on machines with 4GB less 2 pages of memory instead
  of only on machines with 4GB of memory, if Maxmem can reach 4GB.  Better
  original code:

	*count = (u_long)Maxmem * (PAGE_SIZE / dl->d_secsize);

  This assumes that PAGE_SIZE is a multiple of dl->d_secsize, but all dump
  routines already assume this.

The patch might help by avoidng rounding bugs in the dump routines (e.g.,
they might round *count up to a multiple of 128, so it's best to have
*count a multiple of 128 to begin with), but I couldn't see any bugs like
that.

Bruce
Comment 2 pdh 2001-06-25 01:54:23 UTC
Bruce Evans wrote:
> > >How-To-Repeat:
> > 	Set up a swap device of the same size as physical memory and
> > 	force a crashdump (eg, from DDB).
> 
> This did not repeat it for me :-).

Hmm.

OK, more info, see dmesg attached.

> I don't see how these patches can help.  The first hunk prevents dumping
> if the device size (in bytes) is precisely the same as the memory size
> (according to Maxmem).  But dumping will still occur if the device size is
> 2 pages larger, and then the second hunk almost ensures that any overrun
> still occurs (since it adjusts the dump size up by the same amount that
> the first hunk adjusts the dump start down).  It also has bad side effects:
> - it causes 2 nonexistent pages to be dumped.  This might cause NMIs or
>   worse.

Erm, no, it doesn't. Unless I'm missing something, it just causes an extra
two pages of disk to be required for the dump. Yes, this is wrong, there's
no logical reason I can see for that number, but I just wanted to give
some kind of starting point...

> - it causes overflow on machines with 4GB less 2 pages of memory instead
>   of only on machines with 4GB of memory, if Maxmem can reach 4GB.  Better
>   original code:
> 
> 	*count = (u_long)Maxmem * (PAGE_SIZE / dl->d_secsize);
> 
>   This assumes that PAGE_SIZE is a multiple of dl->d_secsize, but all dump
>   routines already assume this.
> 
> The patch might help by avoidng rounding bugs in the dump routines (e.g.,
> they might round *count up to a multiple of 128, so it's best to have
> *count a multiple of 128 to begin with), but I couldn't see any bugs like
> that.

I can probably use this machine for another couple of days if you want
me to do some more testing/debugging of this problem. (It's due to go
production real soon.) I couldn't understand why the dump was
overflowing at all, I just needed it to stop doing so. :-)

Second attachment is a disklabel of the disk containing the dumpdev.
The critical size for da1s2b is 524298 blocks; at that size the dump
works, but at 524297 blocks it trashes the label.
Comment 3 pdh 2001-06-25 01:58:16 UTC
Phil Homewood wrote:
> Second attachment is a disklabel of the disk containing the dumpdev.
> The critical size for da1s2b is 524298 blocks; at that size the dump
> works, but at 524297 blocks it trashes the label.

I forgot to mention: da1s2 is, of course, the second slice on the
disk, if that makes any difference:


******* Working on device /dev/da1 *******
parameters extracted from in-core disklabel are:
cylinders=17501 heads=64 sectors/track=32 (2048 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=17501 heads=64 sectors/track=32 (2048 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165,(FreeBSD/NetBSD/386BSD)
    start 32, size 524256 (255 Meg), flag 0
	beg: cyl 0/ sector 1/ head 1;
	end: cyl 255/ sector 32/ head 63
The data for partition 2 is:
sysid 165,(FreeBSD/NetBSD/386BSD)
    start 524288, size 35317760 (17245 Meg), flag 0
	beg: cyl 256/ sector 1/ head 0;
	end: cyl 1023/ sector 32/ head 63
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
Comment 4 Bruce Evans 2001-06-25 07:02:09 UTC
On Mon, 25 Jun 2001, Phil Homewood wrote:

> Bruce Evans wrote:
> > This did not repeat it for me :-).
> ...
> OK, more info, see dmesg attached.

Thanks.

> > I don't see how these patches can help.  The first hunk prevents dumping
> > ...
> > the first hunk adjusts the dump start down).  It also has bad side effects:
> > - it causes 2 nonexistent pages to be dumped.  This might cause NMIs or
> >   worse.
> 
> Erm, no, it doesn't. Unless I'm missing something, it just causes an extra
> two pages of disk to be required for the dump. Yes, this is wrong, there's

I think you're missing that `*count' is an output parameter.  Something must
be written to the extra space, and that something is extra pages beyond what
`Maxmem' says is the end of memory.

> Second attachment is a disklabel of the disk containing the dumpdev.

> The critical size for da1s2b is 524298 blocks; at that size the dump
> works, but at 524297 blocks it trashes the label.

> # /dev/da1s2c:
> type: SCSI
> disk: da1s2
> ...
> 8 partitions:
> #        size   offset    fstype   [fsize bsize bps/cpg]
>   b:   526336        0      swap                    	# (Cyl.    0 - 256)
>   c: 35317760        0    unused        0     0       	# (Cyl.    0 - 17244)
>   e: 34791424   526336    4.2BSD     1024  8192    16 	# (Cyl.  257 - 17244)

I think you are just trashing the label becauses it is inside da1s2b, not
overrunning da1s2b.  This is certainly a bug.  The label is write protected
in software, but the dump routines don't even know that the label is there.
This bug is not seen in most configurations because the dump partition
usually starts at a nonzero offset.

Untested incomplete fixes (the boot block area should also be protected...):

Index: kern_shutdown.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_shutdown.c,v
retrieving revision 1.101
diff -u -2 -r1.101 kern_shutdown.c
--- kern_shutdown.c	2001/04/29 02:44:48	1.101
+++ kern_shutdown.c	2001/06/25 06:00:08
@@ -431,5 +431,5 @@
 	 */
 	newdumplo = psize - Maxmem * PAGE_SIZE / DEV_BSIZE;
-	if (newdumplo < 0)
+	if (newdumplo <= LABELSECTOR)
 		return (ENOSPC);
 	dumpdev = dev;
Index: subr_disk.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/subr_disk.c,v
retrieving revision 1.41
diff -u -2 -r1.41 subr_disk.c
--- subr_disk.c	2001/05/29 18:19:57	1.41
+++ subr_disk.c	2001/06/25 06:00:24
@@ -156,5 +156,5 @@
 		return (ENXIO);
 	*count = (u_long)Maxmem * PAGE_SIZE / dl->d_secsize;
-	if (dumplo < 0 || 
+	if (dumplo <= LABELSECTOR || 
 	    (dumplo + *count > dl->d_partitions[dkpart(dev)].p_size))
 		return (EINVAL);

Bruce
Comment 5 pdh 2001-06-25 07:15:46 UTC
Bruce Evans wrote:
> I think you are just trashing the label becauses it is inside da1s2b, not
> overrunning da1s2b.

That was a conclusion I was starting to draw, too.

> This is certainly a bug.  The label is write protected
> in software, but the dump routines don't even know that the label is there.

Should it be there?

> This bug is not seen in most configurations because the dump partition
> usually starts at a nonzero offset.

Yep.

Is this configuration acceptable ('b' is a swap partition) or should
it start at non-zero offset? Alas, I can't remember if it was
/stand/sysinstall or /dev/luser who put it there like that.

> Untested incomplete fixes (the boot block area should also be protected...):

Will test 'em tomorrow and let you know. Thanks.
Comment 6 Bruce Evans 2001-06-25 14:31:11 UTC
On Mon, 25 Jun 2001, Phil Homewood wrote:

> Bruce Evans wrote:
> > I think you are just trashing the label becauses it is inside da1s2b, not
> > overrunning da1s2b.
> 
> That was a conclusion I was starting to draw, too.

BTW, I jumped to the conclusion that it was overrunning the partition for
you because of another report that claimed overrunning.

> > This is certainly a bug.  The label is write protected
> > in software, but the dump routines don't even know that the label is there.
> 
> Should it be there?

Yes, they should know, since they bypass the normal block number conversion
and bounds checking and need to understand the label (contents, not where
it is) to do so.

> > This bug is not seen in most configurations because the dump partition
> > usually starts at a nonzero offset.
> 
> Is this configuration acceptable ('b' is a swap partition) or should
> it start at non-zero offset? Alas, I can't remember if it was
> /stand/sysinstall or /dev/luser who put it there like that.

It is acceptable to me (I like corner cases to work prefectly :-), but
is risky because it is unusual.  The same bug used to affect swapping,
but it bit enough to get it worked around a long time ago.  The swap
code just avoids using the first 128K or so.  It really should understand
labels too (the label might be at a strange place, after 128K ...), or
better, a common routine should understand all the metadata at the start
of the device.

Bruce
Comment 7 pdh 2001-06-26 02:25:37 UTC
Bruce Evans wrote:
> Untested incomplete fixes (the boot block area should also be protected...):

This works for me, with the following additional patch, and makes much
more sense than my effort. :-)


--- sys/kern/kern_shutdown.c.orig       Tue Jun 26 09:50:40 2001
+++ sys/kern/kern_shutdown.c    Tue Jun 26 09:51:04 2001
@@ -59,6 +59,7 @@
 #include <sys/conf.h>
 #include <sys/sysproto.h>
 #include <sys/cons.h>
+#include <sys/disklabel.h>

 #include <machine/pcb.h>
 #include <machine/clock.h>
Comment 8 iedowse freebsd_committer freebsd_triage 2001-12-03 00:39:25 UTC
State Changed
From-To: open->closed


Fixed by bde in rev.1.103 of kern_shutdown.c, and MFC'd before 
4.4-RELEASE.