| Summary: | buffer leak in cluster_wbuild | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Tor Egge <tegge> |
| Component: | kern | Assignee: | tegge |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | 4.0-CURRENT | ||
| Hardware: | Any | ||
| OS: | Any | ||
With this patch installed, the problem with processes getting stuck in
getblk disappeared.
The spurious SIGBUSes were due to mmap allowing us to map memory
completely after the end of the file. When accessing the pages that
weren't even partially backed by the file, the result was a SIGBUS.
The coredump routines needs some more robustness against the program
having performed incorrect mmap() operations.
---------------
Index: vfs_cluster.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_cluster.c,v
retrieving revision 1.87
diff -u -r1.87 vfs_cluster.c
--- vfs_cluster.c 1999/07/08 06:05:53 1.87
+++ vfs_cluster.c 1999/07/25 05:08:52
@@ -774,6 +774,20 @@
splx(s);
break;
}
+ if (tbp->b_flags & B_VMIO) {
+ vm_page_t m;
+
+ for (j = 0;
+ j < tbp->b_npages; j += 1) {
+ m = tbp->b_pages[j];
+ if (m->flags & PG_BUSY) {
+ BUF_UNLOCK(tbp);
+ splx(s);
+ goto finishcluster;
+ }
+ }
+ }
+
/*
* Ok, it's passed all the tests,
* so remove it from the free list
@@ -798,7 +812,7 @@
for (j = 0; j < tbp->b_npages; j += 1) {
m = tbp->b_pages[j];
if (m->flags & PG_BUSY)
- goto finishcluster;
+ panic("cluster_wbuild: PG_BUSY: m=%p, tbp=%p\n", m, tbp);
}
}
---------------
On FreeBSD it is legal to mmap() regions beyond end of the backing file.
The supplied test program tried to access pages after the end of the
backing file. That was a bug in the test program, and SIGBUS is the
normal expected behavior on FreeBSD when accessing those pages.
Matt Dillon has suggested the following patch which is better than
the one previously suggested by me.
Index: vfs_cluster.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_cluster.c,v
retrieving revision 1.88
diff -u -r1.88 vfs_cluster.c
--- vfs_cluster.c 1999/08/28 00:46:23 1.88
+++ vfs_cluster.c 1999/08/30 03:49:02
@@ -797,8 +797,10 @@
if (i != 0) { /* if not first buffer */
for (j = 0; j < tbp->b_npages; j += 1) {
m = tbp->b_pages[j];
- if (m->flags & PG_BUSY)
+ if (m->flags & PG_BUSY) {
+ bqrelse(tbp);
goto finishcluster;
+ }
}
}
The problem with corrupt coredumps still remains.
- Tor Egge
Responsible Changed From-To: freebsd-bugs->tegge Originator is a committer. Tor, feel free to send this back to freebsd-bugs, if you can't deal with it. State Changed From-To: open->closed Buffer leak fixed in revision 1.89 of sys/kern/vfs_cluster.c. |
If a candidate buffer for clustering contains a page marked BUSY, cluster_wbuild fails to return the buffer to the appropriate previous state. This causes processes to be stuck in getblk, due to the buffer being marked busy, with noone around to unbusy it. When running the program below for reproducing the problem, some other symptomps were also present: - Spurious SIGBUS signals. mmap returned success, but the address area returned by mmap is apparently not always accessible, causing the SIGBUS during the memset operation. - corrupt coredumps. Probably due to problems accessing the area returned by mmap(). How-To-Repeat: On a machine with 512MB memory with > 1GB swap: As root: sysctl -w vm.swap_enabled=0 sysctl -w kern.corefile='%N.core.%P' As a normal user, on a file system with lots of free space: cc -g -O2 -static -o badwrite badwrite.c ulimit -c unlimited ./badwrite 400 Wait 10-15 minutes, then use Ctrl-C to stop the program. There should be some corrupt core files in the current directory, and some stuck processes on the machines. -------------- start of badwrite.c ----- #include <sys/types.h> #include <sys/mman.h> #include <stdio.h> #include <fcntl.h> #include <stdlib.h> #define PAGESIZE 4096 #define TRASHSIZE (2*1024*1024) void writeit(void); int main(int argc, char **argv) { int childcnt, i; pid_t pid; if (argc >= 2) childcnt = atoi(argv[1]); else childcnt = 20; if (childcnt < 0) childcnt = 1; if (childcnt > 800) childcnt = 800; for (i = 0; i < childcnt; i++) { pid = fork(); if (pid < 0) exit(1); if (pid == 0) { writeit(); exit(0); } } while (wait(NULL) >= 0) { } exit(0); } void writeit(void) { char buf[100]; pid_t pid; int fd; int tbuf[8192]; char *trashmem; int trashpos; char *mapos; off_t fd_off, map_off; int pageoffset; int wgot, wtry; pid = getpid(); sprintf(buf, "tmpfile.%d", (int) pid); srandom(time(NULL) + pid); fd = open(buf, O_RDWR | O_CREAT | O_TRUNC | O_APPEND, 0666); if (fd < 0) exit(1); unlink(buf); trashmem = malloc(TRASHSIZE); memset(tbuf, 0, sizeof(tbuf)); fd_off = 0; while (1) { wtry = 1 + (random() % sizeof(tbuf)); wgot = write(fd, tbuf, wtry); if (wgot != wtry) exit(1); usleep(random() % 65536); #if 1 trashpos = (random() % (TRASHSIZE / PAGESIZE)) * PAGESIZE; trashmem[trashpos]++; #endif fd_off += wgot; pageoffset = fd_off & (PAGESIZE - 1); map_off = fd_off - pageoffset; mapos = mmap(NULL, PAGESIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, map_off); if (mapos == NULL || mapos == (char *) -1) exit(1); memset(mapos + pageoffset, 0, PAGESIZE - pageoffset); munmap(mapos, PAGESIZE); } } ----------------