| Summary: | cmp can not compare files lager 2GB but smaller 4GB | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | kazarov <kazarov> | ||||||
| Component: | bin | Assignee: | dwmalone | ||||||
| Status: | Closed FIXED | ||||||||
| Severity: | Affects Only Me | ||||||||
| Priority: | Normal | ||||||||
| Version: | 4.2-STABLE | ||||||||
| Hardware: | Any | ||||||||
| OS: | Any | ||||||||
| Attachments: |
|
||||||||
On Tue, 30 Jan 2001 kazarov@izmiran.rssi.ru wrote: > And tests shows that cmp cannot mmap two files of silze greate about 1213MB (on my system). This seems about right for i386's. Two files have to be mapped below the kernel start address of 3G, so their size is limited to 1.5GB. Big files, at least, have to be mapped after libraries, so there size is further limited. > So constant should be decreased to 1GB This is rather machine-dependent, and probably too large to be optimal anyway. To fix the machine-dependencies, I think cmp should just try to mmap both files and fall back to c_special() if this fails. Note that the current check against SIZE_T_MAX is just to prevent overflow, but it is broken in several ways: (1) It SIZE_T_MAX with MIN(len1, len2), but it needs to compare with MAX(len1, len2) to prevent overflow when the lengths are cast to size_t. (2) It needs comapare SIZE_T_MAX with MAX(len1 + skip1 % pagesize, len2 + skip2 % pagesize) to prevent overflow when the adjusted lengths are passed to mmap(). The casts in (1) are bogus anyway. They are unnecessary if a prototype for mmap() is in scope and are in the wrong place otherwise (the whole adjusted lengths should be cast). Bruce > > So constant should be decreased to 1GB > > This is rather machine-dependent, and probably too large to be optimal > anyway. > > To fix the machine-dependencies, I think cmp should just try to mmap > both files and fall back to c_special() if this fails. Note that > the current check against SIZE_T_MAX is just to prevent overflow, > but it is broken in several ways: > (1) It SIZE_T_MAX with MIN(len1, len2), but it needs to compare with > MAX(len1, len2) to prevent overflow when the lengths are cast to > size_t. > (2) It needs comapare SIZE_T_MAX with > MAX(len1 + skip1 % pagesize, len2 + skip2 % pagesize) to prevent > overflow when the adjusted lengths are passed to mmap(). > The casts in (1) are bogus anyway. They are unnecessary if a prototype > for mmap() is in scope and are in the wrong place otherwise (the whole > adjusted lengths should be cast). Hi, Bruce, Sorry, I do not understand why compare MAX(...) to SIZE_T_MAX. IMHO, mmap maps the file into process memory so both files together should fit into process accessible address space which is less then SIZE_T_MAX (my expirements with cmp showed that is about 2430MB, on Solaris 2.4 it was 2GB only). So it's nessesary to compare (len1+skip1%pagesize + len2+skip2%pagesize) to 2430MB (for my system). Cause, reverting to c_special on mmap failure could be a better way but, IMHO, the best solution would be proposal of Poul-Henning Kamp <phk@FreeBSD.org> ( http://www.freebsd.org/cgi/query-pr.cgi?pr=18589 ): "Add a loop around the mmap/compare operation which operates on some moderate amount of data each iteration." Sincerely Yours Dmitry On Tue, 30 Jan 2001, Dmitry Kazarov wrote: > Sorry, I do not understand why compare MAX(...) to SIZE_T_MAX. > IMHO, mmap maps the file into process memory so both files together should > fit into process accessible address space which is less then SIZE_T_MAX (my > expirements with cmp showed that is about 2430MB, on Solaris 2.4 it was 2GB > only). mmap()'s `len' arg has type size_t, so lengths larger than SIZE_T_MAX can't be passed to mmap() for it to check (and fail on). > So it's nessesary to compare (len1+skip1%pagesize + len2+skip2%pagesize) to > 2430MB (for my system). mmap() will check this if the full lengths can be passed to it. Since their are 2 mmap()s, there is a machine-dependent amount of space between them. PAGE_SIZE is probably sufficient, but you have to look at the implementation to tell. Your 2430MB must have some slop for this. > Cause, reverting to c_special on mmap failure could be a better way but, > IMHO, the best solution would be proposal of Poul-Henning Kamp > <phk@FreeBSD.org> ( http://www.freebsd.org/cgi/query-pr.cgi?pr=18589 ): > "Add a loop around the mmap/compare operation which operates on some > moderate amount of data each iteration." I agree. cp uses 8MB for the moderate amount. So does xinstall for its file comparison function. xinstall also falls back to read() when mmap() fails, and has a -M flag to prevent use of mmap(). Bruce Responsible Changed From-To: freebsd-bugs->dwmalone I'll have a look at this one since I had a look at size limits in tail. Attached patch solves (I hope) this problem. Basic idea of the patch -- to mmap() some moderate amount (8MB) of data rather then entire file (this idea belongs to phk@FreeBSD.org -- see bin/18589). Andrew. P.S. Patch was made against RELENG_4 sources (extern.h v 1.1.1.1.14.1 and regular.c v 1.7.2.2). Unfortunately anoncvs.freebsd.org is down ;-( I've produced a version of Andrew's patch which just mmaps the
things in 8MB chunks. For reasons I don't understand, this actually
seems to make c_regular faster (uses about half as much user time).
I'll commit this later in the week, unless anyone has any objections.
David.
Index: regular.c
===================================================================
RCS file: /cvs/FreeBSD-CVS/src/usr.bin/cmp/regular.c,v
retrieving revision 1.10
diff -u -r1.10 regular.c
--- regular.c 20 Jun 2000 20:28:40 -0000 1.10
+++ regular.c 29 Oct 2001 16:27:07 -0000
@@ -51,6 +51,9 @@
#include "extern.h"
+static u_char *remmap __P((u_char *, int, off_t));
+#define MMAP_CHUNK (8*1024*1024)
+
#define ROUNDPAGE(i) ((i) & ~pagemask)
void
@@ -59,7 +62,7 @@
char *file1, *file2;
off_t skip1, len1, skip2, len2;
{
- u_char ch, *p1, *p2;
+ u_char ch, *p1, *p2, *m1, *m2, *e1, *e2;
off_t byte, length, line;
int dfound;
off_t pagemask, off1, off2;
@@ -81,23 +84,24 @@
off2 = ROUNDPAGE(skip2);
length = MIN(len1, len2);
- if (length > SIZE_T_MAX)
- return (c_special(fd1, file1, skip1, fd2, file2, skip2));
- if ((p1 = (u_char *)mmap(NULL, (size_t)len1 + skip1 % pagesize,
- PROT_READ, MAP_SHARED, fd1, off1)) == (u_char *)MAP_FAILED)
- err(ERR_EXIT, "%s", file1);
-
- madvise(p1, len1 + skip1 % pagesize, MADV_SEQUENTIAL);
- if ((p2 = (u_char *)mmap(NULL, (size_t)len2 + skip2 % pagesize,
- PROT_READ, MAP_SHARED, fd2, off2)) == (u_char *)MAP_FAILED)
- err(ERR_EXIT, "%s", file2);
- madvise(p2, len2 + skip2 % pagesize, MADV_SEQUENTIAL);
+ if ((m1 = remmap(NULL, fd1, off1)) == NULL)
+ c_special(fd1, file1, skip1, fd2, file2, skip2);
+ /* NOTREACHED */
+
+ if ((m2 = remmap(NULL, fd2, off2)) == NULL) {
+ munmap(m1, MMAP_CHUNK);
+ c_special(fd1, file1, skip1, fd2, file2, skip2);
+ /* NOTREACHED */
+ }
dfound = 0;
- p1 += skip1 - off1;
- p2 += skip2 - off2;
- for (byte = line = 1; length--; ++p1, ++p2, ++byte) {
+ e1 = m1 + MMAP_CHUNK;
+ e2 = m2 + MMAP_CHUNK;
+ p1 = m1 + (skip1 - off1);
+ p2 = m2 + (skip2 - off2);
+
+ for (byte = line = 1; length--; ++byte) {
if ((ch = *p1) != *p2) {
if (xflag) {
dfound = 1;
@@ -111,10 +115,41 @@
}
if (ch == '\n')
++line;
+ if (++p1 == e1) {
+ off1 += MMAP_CHUNK;
+ if ((p1 = m1 = remmap(m1, fd1, off1)) == NULL) {
+ munmap(m2, MMAP_CHUNK);
+ err(ERR_EXIT, "remmap %s", file1);
+ }
+ e1 = m1 + MMAP_CHUNK;
+ }
+ if (++p2 == e2) {
+ off2 += MMAP_CHUNK;
+ if ((p2 = m2 = remmap(m2, fd2, off2)) == NULL) {
+ munmap(m1, MMAP_CHUNK);
+ err(ERR_EXIT, "remmap %s", file2);
+ }
+ e2 = m2 + MMAP_CHUNK;
+ }
}
if (len1 != len2)
eofmsg (len1 > len2 ? file2 : file1);
if (dfound)
exit(DIFF_EXIT);
+}
+
+static u_char *
+remmap(mem, fd, offset)
+ u_char *mem;
+ int fd;
+ off_t offset;
+{
+ if (mem != NULL)
+ munmap(mem, MMAP_CHUNK);
+ mem = mmap(NULL, MMAP_CHUNK, PROT_READ, MAP_SHARED, fd, offset);
+ if (mem == MAP_FAILED)
+ return (NULL);
+ madvise(mem, MMAP_CHUNK, MADV_SEQUENTIAL);
+ return (mem);
}
State Changed From-To: open->closed Fix committed to both -current and -stable. Thanks! |
in /usr/src/usr.bin/cmp/regular.c, line 84, length of files is compared with SIZE_T_MAX which is equal to 0xFFFFFFFF (4GB-1). But man mmap says: BUGS len is limited to 2GB. Mmapping slightly more than 2GB doesn't work, but it is possible to map a window of size (filesize % 2GB) for file sizes of slightly less than 2G, 4GB, 6GB and 8GB. And tests shows that cmp cannot mmap two files of silze greate about 1213MB (on my system). So constant should be decreased to 1GB How-To-Repeat: user$ dd if=/dev/zero of=tmp seek=2048k bs=1k count=1 1+0 records in 1+0 records out 1024 bytes transferred in 0.000258 secs (3969471 bytes/sec) user$ ln tmp tmp.l user$ cmp tmp tmp.l cmp: tmp: Invalid argument