| Summary: | [PATCH] crashdump can trash disklabel/other partitions | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | Phil Homewood <pdh> | ||||||||
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> | ||||||||
| Status: | Closed FIXED | ||||||||||
| Severity: | Affects Only Me | ||||||||||
| Priority: | Normal | ||||||||||
| Version: | 4.3-STABLE | ||||||||||
| Hardware: | Any | ||||||||||
| OS: | Any | ||||||||||
| Attachments: |
|
||||||||||
|
Description
Phil Homewood
2001-06-15 06:00:07 UTC
On Fri, 15 Jun 2001, Phil Homewood wrote: > >Description: > Crashdumps can overwrite the last few blocks of the device > immediately before the dumpdev, if the dumpdev is approximately > the same size as physical memory. > > >How-To-Repeat: > Set up a swap device of the same size as physical memory and > force a crashdump (eg, from DDB). This did not repeat it for me :-). > >Fix: > > The following patch works but is probably incorrect (in tests > on this machine, I needed an extra 10 blocks of disk space, and > one page of physical memory requires 8 blocks of disk). > Someone with a better understanding than I have should review > this patch, but it should provide a good starting point. > > --- kern/kern_shutdown.c.orig Mon Jun 11 23:12:10 2001 > +++ kern/kern_shutdown.c Fri Jun 15 14:46:53 2001 > @@ -421,7 +421,7 @@ > /* > * XXX should clean up checking in dumpsys() to be more like this. > */ > - newdumplo = psize - Maxmem * PAGE_SIZE / DEV_BSIZE; > + newdumplo = psize - (Maxmem + 2) * PAGE_SIZE / DEV_BSIZE; > if (newdumplo < 0) > return (ENOSPC); > dumpdev = dev; > --- kern/subr_disk.c.orig Fri Jun 1 02:47:45 2001 > +++ kern/subr_disk.c Fri Jun 15 14:46:53 2001 > @@ -91,7 +91,7 @@ > dl = dsgetlabel(dev, dp->d_slice); > if (!dl) > return (ENXIO); > - *count = (u_long)Maxmem * PAGE_SIZE / dl->d_secsize; > + *count = (u_long)(Maxmem + 2) * PAGE_SIZE / dl->d_secsize; > if (dumplo < 0 || > (dumplo + *count > dl->d_partitions[dkpart(dev)].p_size)) > return (EINVAL); I don't see how these patches can help. The first hunk prevents dumping if the device size (in bytes) is precisely the same as the memory size (according to Maxmem). But dumping will still occur if the device size is 2 pages larger, and then the second hunk almost ensures that any overrun still occurs (since it adjusts the dump size up by the same amount that the first hunk adjusts the dump start down). It also has bad side effects: - it causes 2 nonexistent pages to be dumped. This might cause NMIs or worse. - it causes overflow on machines with 4GB less 2 pages of memory instead of only on machines with 4GB of memory, if Maxmem can reach 4GB. Better original code: *count = (u_long)Maxmem * (PAGE_SIZE / dl->d_secsize); This assumes that PAGE_SIZE is a multiple of dl->d_secsize, but all dump routines already assume this. The patch might help by avoidng rounding bugs in the dump routines (e.g., they might round *count up to a multiple of 128, so it's best to have *count a multiple of 128 to begin with), but I couldn't see any bugs like that. Bruce Bruce Evans wrote: > > >How-To-Repeat: > > Set up a swap device of the same size as physical memory and > > force a crashdump (eg, from DDB). > > This did not repeat it for me :-). Hmm. OK, more info, see dmesg attached. > I don't see how these patches can help. The first hunk prevents dumping > if the device size (in bytes) is precisely the same as the memory size > (according to Maxmem). But dumping will still occur if the device size is > 2 pages larger, and then the second hunk almost ensures that any overrun > still occurs (since it adjusts the dump size up by the same amount that > the first hunk adjusts the dump start down). It also has bad side effects: > - it causes 2 nonexistent pages to be dumped. This might cause NMIs or > worse. Erm, no, it doesn't. Unless I'm missing something, it just causes an extra two pages of disk to be required for the dump. Yes, this is wrong, there's no logical reason I can see for that number, but I just wanted to give some kind of starting point... > - it causes overflow on machines with 4GB less 2 pages of memory instead > of only on machines with 4GB of memory, if Maxmem can reach 4GB. Better > original code: > > *count = (u_long)Maxmem * (PAGE_SIZE / dl->d_secsize); > > This assumes that PAGE_SIZE is a multiple of dl->d_secsize, but all dump > routines already assume this. > > The patch might help by avoidng rounding bugs in the dump routines (e.g., > they might round *count up to a multiple of 128, so it's best to have > *count a multiple of 128 to begin with), but I couldn't see any bugs like > that. I can probably use this machine for another couple of days if you want me to do some more testing/debugging of this problem. (It's due to go production real soon.) I couldn't understand why the dump was overflowing at all, I just needed it to stop doing so. :-) Second attachment is a disklabel of the disk containing the dumpdev. The critical size for da1s2b is 524298 blocks; at that size the dump works, but at 524297 blocks it trashes the label. Phil Homewood wrote:
> Second attachment is a disklabel of the disk containing the dumpdev.
> The critical size for da1s2b is 524298 blocks; at that size the dump
> works, but at 524297 blocks it trashes the label.
I forgot to mention: da1s2 is, of course, the second slice on the
disk, if that makes any difference:
******* Working on device /dev/da1 *******
parameters extracted from in-core disklabel are:
cylinders=17501 heads=64 sectors/track=32 (2048 blks/cyl)
Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=17501 heads=64 sectors/track=32 (2048 blks/cyl)
Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165,(FreeBSD/NetBSD/386BSD)
start 32, size 524256 (255 Meg), flag 0
beg: cyl 0/ sector 1/ head 1;
end: cyl 255/ sector 32/ head 63
The data for partition 2 is:
sysid 165,(FreeBSD/NetBSD/386BSD)
start 524288, size 35317760 (17245 Meg), flag 0
beg: cyl 256/ sector 1/ head 0;
end: cyl 1023/ sector 32/ head 63
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
On Mon, 25 Jun 2001, Phil Homewood wrote: > Bruce Evans wrote: > > This did not repeat it for me :-). > ... > OK, more info, see dmesg attached. Thanks. > > I don't see how these patches can help. The first hunk prevents dumping > > ... > > the first hunk adjusts the dump start down). It also has bad side effects: > > - it causes 2 nonexistent pages to be dumped. This might cause NMIs or > > worse. > > Erm, no, it doesn't. Unless I'm missing something, it just causes an extra > two pages of disk to be required for the dump. Yes, this is wrong, there's I think you're missing that `*count' is an output parameter. Something must be written to the extra space, and that something is extra pages beyond what `Maxmem' says is the end of memory. > Second attachment is a disklabel of the disk containing the dumpdev. > The critical size for da1s2b is 524298 blocks; at that size the dump > works, but at 524297 blocks it trashes the label. > # /dev/da1s2c: > type: SCSI > disk: da1s2 > ... > 8 partitions: > # size offset fstype [fsize bsize bps/cpg] > b: 526336 0 swap # (Cyl. 0 - 256) > c: 35317760 0 unused 0 0 # (Cyl. 0 - 17244) > e: 34791424 526336 4.2BSD 1024 8192 16 # (Cyl. 257 - 17244) I think you are just trashing the label becauses it is inside da1s2b, not overrunning da1s2b. This is certainly a bug. The label is write protected in software, but the dump routines don't even know that the label is there. This bug is not seen in most configurations because the dump partition usually starts at a nonzero offset. Untested incomplete fixes (the boot block area should also be protected...): Index: kern_shutdown.c =================================================================== RCS file: /home/ncvs/src/sys/kern/kern_shutdown.c,v retrieving revision 1.101 diff -u -2 -r1.101 kern_shutdown.c --- kern_shutdown.c 2001/04/29 02:44:48 1.101 +++ kern_shutdown.c 2001/06/25 06:00:08 @@ -431,5 +431,5 @@ */ newdumplo = psize - Maxmem * PAGE_SIZE / DEV_BSIZE; - if (newdumplo < 0) + if (newdumplo <= LABELSECTOR) return (ENOSPC); dumpdev = dev; Index: subr_disk.c =================================================================== RCS file: /home/ncvs/src/sys/kern/subr_disk.c,v retrieving revision 1.41 diff -u -2 -r1.41 subr_disk.c --- subr_disk.c 2001/05/29 18:19:57 1.41 +++ subr_disk.c 2001/06/25 06:00:24 @@ -156,5 +156,5 @@ return (ENXIO); *count = (u_long)Maxmem * PAGE_SIZE / dl->d_secsize; - if (dumplo < 0 || + if (dumplo <= LABELSECTOR || (dumplo + *count > dl->d_partitions[dkpart(dev)].p_size)) return (EINVAL); Bruce Bruce Evans wrote: > I think you are just trashing the label becauses it is inside da1s2b, not > overrunning da1s2b. That was a conclusion I was starting to draw, too. > This is certainly a bug. The label is write protected > in software, but the dump routines don't even know that the label is there. Should it be there? > This bug is not seen in most configurations because the dump partition > usually starts at a nonzero offset. Yep. Is this configuration acceptable ('b' is a swap partition) or should it start at non-zero offset? Alas, I can't remember if it was /stand/sysinstall or /dev/luser who put it there like that. > Untested incomplete fixes (the boot block area should also be protected...): Will test 'em tomorrow and let you know. Thanks. On Mon, 25 Jun 2001, Phil Homewood wrote: > Bruce Evans wrote: > > I think you are just trashing the label becauses it is inside da1s2b, not > > overrunning da1s2b. > > That was a conclusion I was starting to draw, too. BTW, I jumped to the conclusion that it was overrunning the partition for you because of another report that claimed overrunning. > > This is certainly a bug. The label is write protected > > in software, but the dump routines don't even know that the label is there. > > Should it be there? Yes, they should know, since they bypass the normal block number conversion and bounds checking and need to understand the label (contents, not where it is) to do so. > > This bug is not seen in most configurations because the dump partition > > usually starts at a nonzero offset. > > Is this configuration acceptable ('b' is a swap partition) or should > it start at non-zero offset? Alas, I can't remember if it was > /stand/sysinstall or /dev/luser who put it there like that. It is acceptable to me (I like corner cases to work prefectly :-), but is risky because it is unusual. The same bug used to affect swapping, but it bit enough to get it worked around a long time ago. The swap code just avoids using the first 128K or so. It really should understand labels too (the label might be at a strange place, after 128K ...), or better, a common routine should understand all the metadata at the start of the device. Bruce Bruce Evans wrote:
> Untested incomplete fixes (the boot block area should also be protected...):
This works for me, with the following additional patch, and makes much
more sense than my effort. :-)
--- sys/kern/kern_shutdown.c.orig Tue Jun 26 09:50:40 2001
+++ sys/kern/kern_shutdown.c Tue Jun 26 09:51:04 2001
@@ -59,6 +59,7 @@
#include <sys/conf.h>
#include <sys/sysproto.h>
#include <sys/cons.h>
+#include <sys/disklabel.h>
#include <machine/pcb.h>
#include <machine/clock.h>
State Changed From-To: open->closed Fixed by bde in rev.1.103 of kern_shutdown.c, and MFC'd before 4.4-RELEASE. |