Bug 12800

Summary: buffer leak in cluster_wbuild
Product: Base System Reporter: Tor Egge <tegge>
Component: kernAssignee: tegge
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 4.0-CURRENT   
Hardware: Any   
OS: Any   

Description Tor Egge 1999-07-25 04:20:00 UTC
If a candidate buffer for clustering contains a page marked BUSY,
cluster_wbuild fails to return the buffer to the appropriate previous state.
This causes processes to be stuck in getblk, due to the buffer
being marked busy, with noone around to unbusy it.

When running the program below for reproducing the problem, some other
symptomps were also present:

	- Spurious SIGBUS signals.

	  mmap returned success, but the address area returned by mmap
	  is apparently not always accessible, causing the SIGBUS during
	  the memset operation.

	- corrupt coredumps.  Probably due to problems accessing the
	  area returned by mmap().

How-To-Repeat: 
On a machine with 512MB memory with > 1GB swap:

  As root:

	sysctl -w vm.swap_enabled=0
	sysctl -w kern.corefile='%N.core.%P'

  As a normal user, on a file system with lots of free space:

	cc -g -O2 -static -o badwrite badwrite.c
	ulimit -c unlimited
	./badwrite 400
	
  Wait 10-15 minutes, then use Ctrl-C to stop the program.

  There should be some corrupt core files in the current directory, and
  some stuck processes on the machines.

-------------- start of badwrite.c -----
#include <sys/types.h>
#include <sys/mman.h>
#include <stdio.h>
#include <fcntl.h>
#include <stdlib.h>

#define PAGESIZE 4096

#define TRASHSIZE (2*1024*1024)

void writeit(void);

int main(int argc, char **argv)
{
  int childcnt, i;
  pid_t pid;

  if (argc  >= 2)
    childcnt = atoi(argv[1]);
  else
    childcnt = 20;

  if (childcnt < 0)
    childcnt = 1;

  if (childcnt > 800)
    childcnt = 800;

  for (i = 0; i < childcnt; i++) {
    pid = fork();
    if (pid < 0)
      exit(1);
    if (pid == 0) {
      writeit();
      exit(0);
    }
  }

  while (wait(NULL) >= 0) {
  }
  exit(0);
}


void writeit(void)
{
  char buf[100];
  pid_t pid;
  int fd;
  int tbuf[8192];
  char *trashmem;
  int trashpos;
  char *mapos;
  off_t fd_off, map_off;
  int pageoffset;
  
  int wgot, wtry;

  pid = getpid();
  sprintf(buf, "tmpfile.%d", (int) pid);

  srandom(time(NULL) + pid);
  
  fd = open(buf, O_RDWR | O_CREAT | O_TRUNC | O_APPEND, 0666);
  if (fd < 0)
    exit(1);

  unlink(buf);

  trashmem = malloc(TRASHSIZE);

  memset(tbuf, 0, sizeof(tbuf));

  fd_off = 0;

  while (1) {
    wtry = 1 + (random() % sizeof(tbuf));
    
    wgot = write(fd, tbuf, wtry);
    
    if (wgot != wtry)
      exit(1);

    usleep(random() % 65536);

#if 1
    trashpos = (random() % (TRASHSIZE / PAGESIZE)) * PAGESIZE;
    trashmem[trashpos]++;
#endif

    fd_off += wgot;

    pageoffset = fd_off & (PAGESIZE - 1);

    map_off = fd_off - pageoffset;

    mapos = mmap(NULL, PAGESIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE,
		 fd, map_off);

    if (mapos == NULL || mapos == (char *) -1)
      exit(1);
    
    memset(mapos + pageoffset, 0, PAGESIZE - pageoffset);

    munmap(mapos, PAGESIZE);
  }
  
}
----------------
Comment 1 Tor Egge 1999-07-25 06:23:18 UTC
With this patch installed, the problem with processes getting stuck in
getblk disappeared.

The spurious SIGBUSes were due to mmap allowing us to map memory
completely after the end of the file.  When accessing the pages that
weren't even partially backed by the file, the result was a SIGBUS.

The coredump routines needs some more robustness against the program
having performed incorrect mmap() operations.

---------------
Index: vfs_cluster.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_cluster.c,v
retrieving revision 1.87
diff -u -r1.87 vfs_cluster.c
--- vfs_cluster.c	1999/07/08 06:05:53	1.87
+++ vfs_cluster.c	1999/07/25 05:08:52
@@ -774,6 +774,20 @@
 					splx(s);
 					break;
 				}
+				if (tbp->b_flags & B_VMIO) {
+					vm_page_t m;
+
+					for (j = 0; 
+					     j < tbp->b_npages; j += 1) {
+						m = tbp->b_pages[j];
+						if (m->flags & PG_BUSY) {
+							BUF_UNLOCK(tbp);
+							splx(s);
+							goto finishcluster;
+						}
+					}
+				}
+					
 				/*
 				 * Ok, it's passed all the tests,
 				 * so remove it from the free list
@@ -798,7 +812,7 @@
 					for (j = 0; j < tbp->b_npages; j += 1) {
 						m = tbp->b_pages[j];
 						if (m->flags & PG_BUSY)
-							goto finishcluster;
+							panic("cluster_wbuild: PG_BUSY: m=%p, tbp=%p\n", m, tbp);
 					}
 				}
 					

---------------
Comment 2 Tor Egge 1999-08-30 20:00:22 UTC
On FreeBSD it is legal to mmap() regions beyond end of the backing file.
The supplied test program tried to access pages after the end of the 
backing file.  That was a bug in the test program, and SIGBUS is the
normal expected behavior on FreeBSD when accessing those pages.

Matt Dillon has suggested the following patch which is better than
the one previously suggested by me.

Index: vfs_cluster.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/vfs_cluster.c,v
retrieving revision 1.88
diff -u -r1.88 vfs_cluster.c
--- vfs_cluster.c	1999/08/28 00:46:23	1.88
+++ vfs_cluster.c	1999/08/30 03:49:02
@@ -797,8 +797,10 @@
 				if (i != 0) { /* if not first buffer */
 					for (j = 0; j < tbp->b_npages; j += 1) {
 						m = tbp->b_pages[j];
-						if (m->flags & PG_BUSY)
+						if (m->flags & PG_BUSY) {
+							bqrelse(tbp);
 							goto finishcluster;
+						}
 					}
 				}
 					


The problem with corrupt coredumps still remains.

- Tor Egge
Comment 3 Mike Barcroft freebsd_committer freebsd_triage 2001-07-21 00:57:53 UTC
Responsible Changed
From-To: freebsd-bugs->tegge


Originator is a committer.  Tor, feel free to send this back to 
freebsd-bugs, if you can't deal with it.
Comment 4 tegge freebsd_committer freebsd_triage 2001-11-15 18:19:36 UTC
State Changed
From-To: open->closed

Buffer leak fixed in revision 1.89 of sys/kern/vfs_cluster.c.