Bug 29421

Summary: Updating a file with mmap causes mtime/ctime to change repeatedly
Product: Base System
Reporter: Kachun Lee <kachun>
Component: kern
Assignee: Alan Cox <alc>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   

Description Kachun Lee 2001-08-03 23:20:00 UTC
One of my programs, which used mtime to detect file changes and updated
an index file using mmap, stopped working after 4.2-release. I finally
narrowed it down: the mmap updates cause the mtime to keep changing
(not just a delayed update) even after the file has been closed and
munmap'ed and nothing is writing to it.

I used a small program to create a 16-byte file, mmap it, change a few
bytes, then munmap and close the file. Then I used a Perl script to poll
the mtime every minute. I tried this on more than 12 of our FreeBSD
servers. All of those running -stable from after 4.2-release exhibited
the problem. The one 4.1.1-stable server and the three 4.2-release
servers did not.

An example run with 4.3-stable...

# ./mmapmtime # C program - source included
# ll mmaptest.f
-rw-r--r--  1 root  wheel  16 Aug  3 13:44 mmaptest.f
# ./chkmtime  # perl script - source included
Fri Aug  3 13:44:47 2001
  mtime = Fri Aug  3 13:44:41 2001
  ctime = Fri Aug  3 13:44:41 2001
Fri Aug  3 13:46:47 2001
  mtime = Fri Aug  3 13:45:58 2001
  ctime = Fri Aug  3 13:45:58 2001
Fri Aug  3 13:47:47 2001
  mtime = Fri Aug  3 13:47:17 2001
  ctime = Fri Aug  3 13:47:17 2001
Fri Aug  3 13:48:47 2001
  mtime = Fri Aug  3 13:48:40 2001
  ctime = Fri Aug  3 13:48:40 2001
Fri Aug  3 13:50:47 2001
  mtime = Fri Aug  3 13:50:44 2001
  ctime = Fri Aug  3 13:50:44 2001
Fri Aug  3 13:52:47 2001
  mtime = Fri Aug  3 13:52:38 2001
  ctime = Fri Aug  3 13:52:38 2001
# this would go on forever... mtime/ctime changed every 1-2 minutes

The time between changes could be anywhere from a few minutes to a few
hours (!). That seemed to depend on how busy the system was. I built an
idle system and ran the test on it a few days ago. The mtime/ctime of
the test file on that system has not changed (yet?).

How-To-Repeat:
/* mmapmtime.c */

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>     /* atoi(), exit() */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int debug = 0;
int wait = 0;

void usage (char * prog)
{
        printf("usage: %s [-d] [-t seconds]\n", prog);
}

const char testfile [] = "mmaptest.f";

int
main (int argc, char** argv)
{
        char *prog = argv[0];
        int i;
        int fd;
        char* shm;
        struct stat st;
        size_t size;
        int * idx;

        char data [] = {0x64, 0x0f, 0, 0, 0xfa, 0xff, 0xff, 0xff,
                         0xa3, 0xf0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };


        while ( (i = getopt(argc, argv, "dt:")) != -1 ) {
                switch (i)
                {
                case 'd':
                        debug++;
                        break;

                case 't':
                        wait = atoi(optarg);
                        break;

                default:
                        usage(prog);
                        exit(1);
                }
        }

        unlink(testfile);

        if ( (fd = open(testfile, O_CREAT|O_RDWR, 0644)) == -1 ) {
                perror("open");
                exit(1);
        }

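        /* a short or failed write would leave the file smaller than expected */
        if ( write(fd, data, sizeof(data)) != (ssize_t) sizeof(data) ) {
                perror("write");
                exit(1);
        }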

        if ( fstat(fd, &st) ) {
                perror("stat");
                exit(1);
        }

        size = st.st_size;

        if ( (shm = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0))
                  == MAP_FAILED ) {
                perror("mmap");
                exit(1);
        }

        idx = (int *) shm;
        idx[1] = 0x118;
        idx[2] = 0xffffef84;
        idx[3] = 0;

        if (wait)
                sleep(wait);

        munmap(shm, size);

        //
#if 0
        if ( (shm = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0))
                  == (void*) -1 ) {
                perror("mmap");
                exit(1);
        }

        idx = (int *) shm;
        printf ("%x %x\n", idx[0], idx[1]);

        munmap(shm, size);
#endif

        close(fd);
        exit(0);
}
------------------
#! /usr/bin/perl
# chkmtime

my $file = "mmaptest.f";

my ($ctime, $mtime);

for(;;)
{
        my @stat = stat $file;

        if ($stat[1] <= 0)
        {
                print 'Cannot stat file ', $file, "\n";
                exit 1;
        }

        if ($stat[9] != $mtime || $stat[10] != $ctime)
        {
                print scalar localtime, "\n",
                        '  mtime = ', scalar localtime($stat[9]), "\n",
                        '  ctime = ', scalar localtime($stat[10]), "\n";
                $mtime = $stat[9];
                $ctime = $stat[10];
        }

        sleep(60);
}
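
For anyone who prefers to poll from C, the check at the heart of the Perl script could be sketched roughly as follows. This is only an illustration: `check_times` is a made-up helper name, and it compares the same two fields the script does (`st_mtime` and `st_ctime`).

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the Perl loop body: stat the file and
 * compare its mtime/ctime with the values seen on the previous call.
 * Returns 1 if either changed (and records the new values), 0 if both
 * are unchanged, and -1 if the file cannot be stat'ed.
 */
static int
check_times(const char *path, time_t *last_mtime, time_t *last_ctime)
{
        struct stat st;

        if (stat(path, &st) == -1)
                return (-1);

        if (st.st_mtime == *last_mtime && st.st_ctime == *last_ctime)
                return (0);

        *last_mtime = st.st_mtime;
        *last_ctime = st.st_ctime;
        return (1);
}
```

Calling it once a minute on mmaptest.f with both saved times starting at 0, and printing the times whenever it returns 1, would give the same kind of log as chkmtime.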
Comment 1 Bruce Evans 2001-08-07 17:03:25 UTC
On Fri, 3 Aug 2001, Kachun Lee wrote:

> >Description:
> One of my program that used mtime to detect file changes and updated
> an index file using mmap stopped working after 4.2-release. I finally
> narrowed it down to that the mmap updates would cause the mtime to
> change continously (not delay) even after the file was closed,
> unmmap'ed and had no more writing to it.
>
> I used a small program to create a 16 byte file, mmap it, change few
> bytes, close and unmmap the file. Then I used a perl script to poll
> the mtime every minute. I tried this on over 12 of our FreeBSD servers.
> All the stable after 4.2-release would exhibit this problem.
> The 1 4.1.1-stable and 3 4.2-release servers did not have the problem.
> ...
> The time between changes could be a few minutes to a few hours (!).
> That seems depended on how busy the system was. I built an idle system
> and ran the test on it a few days ago. The mtime/ctime of the test
> file on that system has not been changed (yet?).

To duplicate (at least under -current), run something that creates a lot
of dirty pages.  I used lat_fs from lmbench2.

The bug is caused by dirty pages never becoming clean if they are for
a small mmapped file like the one in your program.  The vm system keeps
setting m->dirty to VM_PAGE_BITS_ALL (0xff), but for small files,
flushing the pages only results in the lower bits being cleared.  E.g.,
if the fs block size is 1K, then your 16-byte test file takes 2 512-byte
subpages, and m->dirty gets "cleared" to 0xfc.  When the page cleaner
looks at such pages, it always finds them completely dirty and flushes
them.  Such pages don't go away until their object is destroyed.  Their
object is associated with the vnode for the file so it doesn't go away
until the vnode is recycled.

Bruce
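
The bit arithmetic in the explanation above can be checked with a small userland sketch. It is not kernel code: the constants assume a 4K page tracked as eight 512-byte subpages (which is what makes "all dirty" equal 0xff), and `flushed_bits` and `dirty_after_flush` are hypothetical helper names.

```c
/* Assumed layout: one 4096-byte page, one dirty bit per 512-byte subpage. */
#define SKETCH_PAGE_BYTES     4096u
#define SKETCH_SUBPAGE_BYTES  512u
#define SKETCH_BITS_ALL       0xffu   /* stands in for VM_PAGE_BITS_ALL */

/*
 * Mask of the subpage bits covered by flushing `len` bytes starting at
 * the beginning of the page (hypothetical helper, not a kernel function).
 */
static unsigned
flushed_bits(unsigned len)
{
        unsigned n = (len + SKETCH_SUBPAGE_BYTES - 1) / SKETCH_SUBPAGE_BYTES;

        if (n >= SKETCH_PAGE_BYTES / SKETCH_SUBPAGE_BYTES)
                return (SKETCH_BITS_ALL);
        return ((1u << n) - 1);
}

/* Dirty mask that remains after one fs block of `bsize` bytes is flushed. */
static unsigned
dirty_after_flush(unsigned dirty, unsigned bsize)
{
        return (dirty & ~flushed_bits(bsize) & SKETCH_BITS_ALL);
}
```

With a 1K block, `dirty_after_flush(0xff, 1024)` leaves 0xfc, matching the value quoted above: the two low bits are cleared and the other six stay set, so the page never looks clean.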
Comment 2 dillon 2001-08-07 17:46:55 UTC
:...
:> The time between changes could be a few minutes to a few hours (!).
:> That seems depended on how busy the system was. I built an idle system
:> and ran the test on it a few days ago. The mtime/ctime of the test
:> file on that system has not been changed (yet?).
:
:To duplicate (at least under -current), run something that creates a lot
:of dirty pages.  I used lat_fs from lmbench2.
:
:The bug is caused by dirty pages never becoming clean if they are for
:a small mmapped file like the one in your program.  The vm system keeps
:setting m->dirty to VM_PAGE_BITS_ALL (0xff), but for small files,
:flushing the pages only results in the lower bits being cleared.  E.g.,
:if the fs block size is 1K, then your 16-byte test file takes 2 512-byte
:subpages, and m->dirty gets "cleared" to 0xfc.  When the page cleaner
:looks at such pages, it always finds them completely dirty and flushes
:them.  Such pages don't go away until their object is destroyed.  Their
:object is associated with the vnode for the file so it doesn't go away
:until the vnode is recycled.
:
:Bruce

    Hmm.  Didn't this come up about a year ago?  I'll have to look in the
    archives. 

						-Matt
Comment 3 dillon 2001-08-08 01:11:21 UTC
    Ok, I looked at this some more.  It's a bit of a sticky issue
    because the buffers in the buffer cache are in fact allowed to
    be page-misaligned and so the cleaning *must* be piecemeal, and the 
    VM fault / VMIO backing code requires the valid/dirty bits to be all
    or nothing to avoid forcing a re-read.  So when the dirty bits get set,
    they *all* have to get set.

    But, that said, I think we may be able to use the vnode size
    to special-case the cleaning code and to truncate the dirty bits
    that occur beyond file EOF.  I can't promise when I'll have time to
    play with it... hopefully in the next few days.

					    -Matt
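
The idea of truncating dirty bits beyond EOF might look roughly like the following userland sketch. It is not the change that was eventually committed. It assumes a 4K page with one dirty bit per 512-byte subpage, and `bits_within_eof` and `clip_dirty_to_eof` are hypothetical names.

```c
/* Assumed layout: one 4096-byte page, one dirty bit per 512-byte subpage. */
#define EOF_PAGE_BYTES     4096ULL
#define EOF_SUBPAGE_BYTES  512ULL
#define EOF_BITS_ALL       0xffu

/*
 * Mask of the subpage bits of the page at byte offset `page_off` that
 * hold file data, given the file size (hypothetical helper).
 */
static unsigned
bits_within_eof(unsigned long long page_off, unsigned long long file_size)
{
        unsigned long long in_page;
        unsigned n;

        if (file_size <= page_off)
                return (0);             /* page lies entirely past EOF */
        in_page = file_size - page_off;
        if (in_page >= EOF_PAGE_BYTES)
                return (EOF_BITS_ALL);  /* page lies entirely before EOF */
        n = (unsigned)((in_page + EOF_SUBPAGE_BYTES - 1) / EOF_SUBPAGE_BYTES);
        return ((1u << n) - 1);
}

/* Drop dirty bits for subpages that lie entirely beyond EOF. */
static unsigned
clip_dirty_to_eof(unsigned dirty, unsigned long long page_off,
    unsigned long long file_size)
{
        return (dirty & bits_within_eof(page_off, file_size));
}
```

For the 16-byte test file, clipping 0xff gives 0x01, so a flush that clears the low two bits leaves nothing dirty and the page can finally be treated as clean.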

Comment 4 K. Macy freebsd_committer freebsd_triage 2007-11-16 02:42:51 UTC
State Changed
From-To: open->feedback


Need to confirm that this is still an issue. 


Comment 5 K. Macy freebsd_committer freebsd_triage 2007-11-16 02:42:51 UTC
Responsible Changed
From-To: freebsd-bugs->alc


Need to confirm that this is still an issue.
Comment 6 Dmitry Sivachenko freebsd_committer freebsd_triage 2009-07-08 12:58:32 UTC
I just checked the test case supplied against FreeBSD-7.2-STABLE.

The problem is still there.

FYI.
Comment 7 Alexander Best 2009-10-31 03:06:06 UTC
running 9-CURRENT (i386 r198677), i'm not able to reproduce this issue. running
`mmapmtime` and `chkmtime` produces the following output:

Sat Oct 31 03:52:33 2009
 mtime = Sat Oct 31 03:52:30 2009
 ctime = Sat Oct 31 03:52:30 2009
Sat Oct 31 03:52:58 2009
 mtime = Sat Oct 31 03:52:58 2009
 ctime = Sat Oct 31 03:52:58 2009

so the mtime/ctime seems to get updated only once, when the page cleaner
flushes the page, and after that the page is marked clean.

don't know what the situation is on 8-STABLE, but it looks like the problem got
fixed at some point.

cheers.
alex
Comment 8 Alexander Best freebsd_committer freebsd_triage 2010-07-21 21:02:11 UTC
i ran some more tests and couldn't reproduce the problem anymore:

1.

time ./chkmtime 
Wed Jul 21 17:09:16 2010
  mtime = Wed Jul 21 17:09:08 2010
  ctime = Wed Jul 21 17:09:08 2010
^C
./chkmtime  0,02s user 0,00s system 0% cpu 2:11:42,88 total
hub% uname -a
FreeBSD hub.freebsd.org 7.3-STABLE FreeBSD 7.3-STABLE #3 r209978: Tue Jul 13 07:05:00 UTC 2010     simon@hub.freebsd.org:/g/obj/g/src/sys/HUB  i386


2.

time ./chkmtime
Wed Jul 21 18:57:01 2010
  mtime = Wed Jul 21 18:55:45 2010
  ctime = Wed Jul 21 18:55:45 2010
^C
./chkmtime  0,02s user 0,02s system 0% cpu 2:24:04,40 total
otaku% uname -a
FreeBSD otaku 9.0-CURRENT FreeBSD 9.0-CURRENT #5 r209887: Sat Jul 10 21:27:23 CEST 2010     root@otaku:/usr/obj/usr/src/sys/ARUNDEL  amd64


3.

time ./chkmtime 
Wed Jul 21 19:21:52 2010
  mtime = Wed Jul 21 19:21:46 2010
  ctime = Wed Jul 21 19:21:46 2010
^C
./chkmtime  0,00s user 0,01s system 0% cpu 22:09,88 total
freefall% uname -a
FreeBSD freefall.freebsd.org 8.1-PRERELEASE FreeBSD 8.1-PRERELEASE #44 r209978: Tue Jul 13 08:42:03 UTC 2010     simon@freefall.freebsd.org:/usr/src/sys/i386/compile/FREEFALL  i386

cheers.
alex
Comment 9 Alexander Best freebsd_committer freebsd_triage 2010-07-22 10:21:46 UTC
State Changed
From-To: feedback->closed

Alan confirmed that this pr can be closed.