Bug 27232

Summary: [nfs] On NFSv3 mounted filesystems, stat returns st_blksize=512
Product: Base System
Reporter: conrad <conrad>
Component: kern
Assignee: freebsd-bugs (Nobody) <bugs>
Status: Closed Overcome By Events
Severity: Affects Only Me
CC: jpaetzel
Priority: Normal
Version: Unspecified
Hardware: Any
OS: Any

Description conrad 2001-05-09 18:20:01 UTC
On NFSv3 mounts, stat returns st_blksize=512 for every regular file.
This in turn is used by libc routines as a default buffer size, as it
should be the 'optimal' io blocksize.

However, this leads to a drastic performance decrease. For example
a mailbox save of a 3MB message (by pine) takes over half a minute
with 512 byte writes whereas it takes only a second or so with a 16kB
buffer.
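
To put rough numbers on this: assuming stdio flushes one full buffer per
write(2) call, the buffer size fixes how many write requests a save
generates. The helper name writes_needed below is hypothetical and only
illustrates the arithmetic; it is not taken from libc.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of write(2) calls needed to flush
 * `total` bytes through a stdio buffer of `bufsize` bytes, assuming
 * one full-buffer write per flush (the last one may be partial).
 */
static size_t
writes_needed(size_t total, size_t bufsize)
{
	assert(bufsize > 0);
	return (total + bufsize - 1) / bufsize;	/* round up */
}
```

For a 3MB file (3 * 1024 * 1024 bytes) this gives 6144 writes with a
512-byte buffer and 192 writes with a 16kB buffer, a factor of 32. On
NFS each of those writes may become a separate request to the server.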

Fix: 

The whole thing can be traced back to the NFS code in the kernel.
The function nfs_loadattrcache of sys/nfs/nfs_subs.c makes the
assignment

		vap->va_blocksize = NFS_FABLKSIZE;

for NFSv3, where NFS_FABLKSIZE is 512.

In my opinion the assignment should be something like

		vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;

i.e. va_blocksize should be assigned the 'optimal' iosize for the
mounted file system. I think that this is the maximum of the read
and write blocksize of the nfs-mount (search for nfs_iosize in
nfs_vfsops.c).

This should solve the problem, but I am no kernel hacker :-)
How-To-Repeat: See above.
Comment 1 Garrett A. Wollman 2001-05-09 20:38:20 UTC
<<On Wed, 9 May 2001 10:11:55 -0700 (PDT), conrad@th.physik.uni-bonn.de said:

> On NFSv3 mounts, stat returns st_blksize=512 for every regular file.
> This in turn is used by libc routines as a default buffer size, as it
> should be the 'optimal' io blocksize.

No.  It should be the block size used by the underlying filesystem's
block allocator, and in which the file's `st_blocks' size-on-disk is
reported.  While SUS describes it as a ``preferred'' block size, and
the FreeBSD manual pages describe it as ``optimal ... for I/O'', the
most important meaning of this field is as a multiplier of st_blocks
to determine the file's size.

-GAWollman

--
Garrett A. Wollman   | O Siem / We are all family / O Siem / We're all the same
wollman@lcs.mit.edu  | O Siem / The fires of freedom 
Opinions not those of| Dance in the burning flame
MIT, LCS, CRS, or NSA|                     - Susan Aglukark and Chad Irschick
Comment 2 Bruce Evans 2001-05-10 09:24:04 UTC
On Wed, 9 May 2001, Garrett Wollman wrote:

>  <<On Wed, 9 May 2001 10:11:55 -0700 (PDT), conrad@th.physik.uni-bonn.de said:
>  
>  > On NFSv3 mounts, stat returns st_blksize=512 for every regular file.
>  > This in turn is used by libc routines as a default buffer size, as it
>  > should be the 'optimal' io blocksize.
>  
>  No.  It should be the block size used by the underlying filesystem's
>  block allocator,

Correct.  Even if there is no underlying filesystem's block allocator,
stat() must fake it, and should fake it as well as possible.  nfs seems
to have regressed to always setting vap->va_blocksize to NFS_FABLKSIZE
(512) in the v3 case (see nfs_subs.c).

>  and in which the file's `st_blocks' size-on-disk is
>  reported.

No.  At least under FreeBSD, st_blocks is in units of blocks with size
S_BLKSIZE (512).  It may count blocks for metadata, so it may be larger
than the file size.

>  While SUS describes it as a ``preferred'' block size, and
>  the FreeBSD manual pages describe it as ``optimal ... for I/O'', the

It is just the best available approximation to the optimal i/o size.
If it is good enough for filesystem blocks, then it can't be very bad
for userland i/o.

>  most important meaning of this field is as a multiplier of st_blocks
>  to determine the file's size.

No.  The multiplier is 512.

Bruce
Comment 3 conrad 2001-05-10 11:18:18 UTC
On Wed, 9 May 2001, Garrett Wollman wrote:

> <<On Wed, 9 May 2001 10:11:55 -0700 (PDT), conrad@th.physik.uni-bonn.de said:
>
> > On NFSv3 mounts, stat returns st_blksize=512 for every regular file.
> > This in turn is used by libc routines as a default buffer size, as it
> > should be the 'optimal' io blocksize.
>
> No.  It should be the block size used by the underlying filesystem's
> block allocator, and in which the file's `st_blocks' size-on-disk is
> reported.  While SUS describes it as a ``preferred'' block size, and
> the FreeBSD manual pages describe it as ``optimal ... for I/O'', the
> most important meaning of this field is as a multiplier of st_blocks
> to determine the file's size.

Hmm - I am sorry, but I can't believe your answer.

If I stat the following file on /var/tmp (newfs'd with -b 8192 -f 1024)
(all with FreeBSD 4.3R)

-rw-r--r--  1 root  wheel  33398 Mar 13 17:40 /var/tmp/dev.out

# my little stat checker (appended to the message) gives me
./stat
/var/tmp/dev.out:
st_mode = 100644
st_blksize = 8192
st_size = 33398
st_blocks = 66

As you can see st_blocks measures the size of the file in 512byte blocks
independent of st_blksize!
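
The numbers above can be checked by hand. A sketch, assuming (as on this
UFS file system) that the tail of a small file is stored in 1024-byte
fragments and that no indirect blocks are counted; the function names
are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define STAT_BLOCK_BYTES 512	/* unit of st_blocks on FreeBSD (S_BLKSIZE) */

/* Bytes allocated on disk for a given st_blocks value. */
static int64_t
allocated_bytes(int64_t st_blocks)
{
	return st_blocks * STAT_BLOCK_BYTES;
}

/*
 * Expected st_blocks for a file of `size` bytes when space is handed
 * out in units of `fragsize` bytes (assumption: no indirect blocks).
 */
static int64_t
expected_st_blocks(int64_t size, int64_t fragsize)
{
	assert(size >= 0 && fragsize > 0 && fragsize % STAT_BLOCK_BYTES == 0);
	int64_t frags = (size + fragsize - 1) / fragsize;	/* round up */
	return frags * (fragsize / STAT_BLOCK_BYTES);
}
```

With size 33398 and fragsize 1024 this gives 33 fragments, i.e. 66
blocks of 512 bytes (33792 bytes allocated), matching st_blocks = 66.
If st_blksize (8192) were the unit, 66 blocks would mean 540672 bytes,
which is clearly not the case.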

And the source of this is ufs_getattr in sys/ufs/ufs/ufs_vnops.c
(this is present in HEAD!)

	vap->va_flags = ip->i_flags;
	vap->va_gen = ip->i_gen;
	vap->va_blocksize = vp->v_mount->mnt_stat.f_iosize;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	vap->va_bytes = dbtob((u_quad_t)ip->i_blocks);
	vap->va_type = IFTOVT(ip->i_mode);

So either ufs or nfs is wrong (or both!)


-Jan



The output is from the following little program


#include <sys/types.h>
#include <sys/stat.h>

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define FILE1 "/milles/home/conrad/src/stat/stat.c"
#define FILE2 "/.amd_mnt/avz109/users/Math_Dictionary/readme.htm"
#define FILE3 "/var/tmp/dev.out"


int
main(void)
{
  struct stat sb;
  const char *file = FILE3;	/* use FILE1 or FILE2 to test other mounts */

  if (stat(file, &sb) < 0) {
    perror(file);
    return (EXIT_FAILURE);
  }

  /* Cast before printing: the widths of the stat fields vary by platform. */
  printf ("%s:\n", file);
  printf ("st_mode = %o\n", (unsigned int)sb.st_mode);
  printf ("st_blksize = %lu\n", (unsigned long)sb.st_blksize);
  printf ("st_size = %jd\n", (intmax_t)sb.st_size);
  printf ("st_blocks = %jd\n", (intmax_t)sb.st_blocks);
  return (EXIT_SUCCESS);
}





-- 
Physikalisches Institut der Universitaet Bonn
Nussallee 12
D-53115 Bonn
GERMANY
Comment 4 conrad 2001-05-10 11:46:40 UTC
On Thu, 10 May 2001, Bruce Evans wrote:

> From: Bruce Evans <bde@zeta.org.au>
> To: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
> Cc: freebsd-gnats-submit@FreeBSD.ORG
> Subject: Re: kern/27232: On NFSv3 mounted filesystems, stat returns st_blksize=512
> Date: Thu, 10 May 2001 18:24:04 +1000 (EST)
>
>  On Wed, 9 May 2001, Garrett Wollman wrote:
>
>  >  <<On Wed, 9 May 2001 10:11:55 -0700 (PDT), conrad@th.physik.uni-bonn.de said:
>  >
>  >  > On NFSv3 mounts, stat returns st_blksize=512 for every regular file.
>  >  > This in turn is used by libc routines as a default buffer size, as it
>  >  > should be the 'optimal' io blocksize.
>  >
>  >  No.  It should be the block size used by the underlying filesystem's
>  >  block allocator,
>
>  Correct.  Even if there is no underlying filesystem's block allocator,
>  stat() must fake it, and should fake it as well as possible.  nfs seems
>  to have regressed to always setting vap->va_blocksize to NFS_FABLKSIZE
>  (512) in the v3 case (see nfs_subs.c).


My question is: Why not set this to mnt_stat.f_iosize of the mount point?
(As ufs does it?)

>
>  >  and in which the file's `st_blocks' size-on-disk is
>  >  reported.
>
>  No.  At least under FreeBSD, st_blocks is in units of blocks with size
>  S_BLKSIZE (512).  It may count blocks for metadata, so it may be larger
>  than the file size.
>
>  >  While SUS describes it as a ``preferred'' block size, and
>  >  the FreeBSD manual pages describe it as ``optimal ... for I/O'', the
>
>  It is just the best available approximation to the optimal i/o size.
>  If it is good enough for filesystem blocks, then it can't be very bad
>  for userland i/o.

It is *very* bad for userland io. Unfortunately we have only very limited
space here so some poor guys have to sit next to our file server!

They can tell by the sound when somebody is saving a large file by stdio
fwrites!!!

It takes *MORE* than ten times as long as with a larger st_blksize!
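
Until the kernel reports a larger st_blksize, a program that controls
its own streams can sidestep the default by asking stdio for a bigger
buffer with setvbuf(3). This is only a userland workaround sketch, not
the fix discussed in this PR; the wrapper name fopen_bigbuf and the
16kB figure are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical wrapper: open a stream and request a fully buffered
 * stdio buffer of `bufsize` bytes instead of the st_blksize default.
 * Returns NULL if the open or the buffer request fails.
 */
static FILE *
fopen_bigbuf(const char *path, const char *mode, size_t bufsize)
{
	FILE *fp;

	assert(bufsize > 0);
	fp = fopen(path, mode);
	if (fp == NULL)
		return NULL;
	/* setvbuf must be called before any other operation on the stream. */
	if (setvbuf(fp, NULL, _IOFBF, bufsize) != 0) {
		fclose(fp);
		return NULL;
	}
	return fp;
}
```

A caller would use fopen_bigbuf(path, "w", 16 * 1024) in place of
fopen(path, "w"). This only helps programs that can be changed and
rebuilt; everything else still gets the 512-byte default.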


-Jan


>
>  >  most important meaning of this field is as a multiplier of st_blocks
>  >  to determine the file's size.
>
>  No.  The multiplier is 512.
>
>  Bruce

-- 
Physikalisches Institut der Universitaet Bonn
Nussallee 12
D-53115 Bonn
GERMANY
Comment 5 cel freebsd_committer freebsd_triage 2006-05-24 19:57:36 UTC
Responsible Changed
From-To: freebsd-bugs->cel

Same problem existed in Linux 2.4.  Will look into it.
Comment 6 cel freebsd_committer freebsd_triage 2007-03-12 15:20:04 UTC
Responsible Changed
From-To: cel->free-bsd

Back to the public pool.
Comment 7 Ceri Davies freebsd_committer freebsd_triage 2007-03-19 11:28:15 UTC
Responsible Changed
From-To: free-bsd->freebsd-bugs

Correct responsible.