Bug 223015

Summary: tmpfs does not support sparse files
Product: Base System
Reporter: Keith White <ksw.childe>
Component: kern
Assignee: Konstantin Belousov <kib>
Status: Closed FIXED
Severity: Affects Some People
CC: cem, chris, emaste, fs, grahamperrin, kib, ota
Priority: ---
Flags: grahamperrin: mfc-stable13+
Version: CURRENT   
Hardware: Any   
OS: Any   
URL: https://cgit.freebsd.org/src/commit/?id=37aea2649ff707f23d35309d882b38e9ac818e42
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210183
https://reviews.freebsd.org/D37097
Attachments:
Description                         Flags
vfs.tmpfs.inactive_percent sysctl   none
POC for using resident_page_count   none

Description Keith White 2017-10-14 21:28:06 UTC
When attempting to create a large sparse file on a tmpfs file system, I get this message: "truncate: ... No space left on device"

It appears to work on a similarly sized mfs file system.

...keith

Sample test script output:

# sh t_tmpfs
=== tmpfs ===
truncate: /tmp/_trunc_/sparse: No space left on device
0 -rw-r--r--  1 root  wheel  0 Oct 14 17:24 /tmp/_trunc_/sparse
=== mfs ===
96 -rw-r--r--  1 root  wheel  2147483648 Oct 14 17:24 /tmp/_trunc_/sparse



Test script follows:

#!/bin/sh
#

DIR=/tmp/_trunc_
TUNIT=101
TFILE=sparse
SSIZE=16m
TSIZE=2g

exec 2>&1
mkdir $DIR || exit 1

echo "=== tmpfs ==="
mount -t tmpfs -osize=$SSIZE /tmp $DIR
truncate -s$TSIZE $DIR/$TFILE
ls -ls $DIR/$TFILE
umount $DIR

echo "=== mfs ==="
mount -t mfs -o-s=$SSIZE md$TUNIT $DIR
truncate -s$TSIZE $DIR/$TFILE
ls -ls $DIR/$TFILE
umount $DIR
mdconfig -du $TUNIT

rmdir $DIR

exit 0
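
The same check can be made from a program instead of truncate(1) and ls(1). The following is only a minimal sketch, assuming POSIX ftruncate(2) and fstat(2); the function name sparse_probe is made up for illustration, and st_blocks is assumed to be counted in 512-byte units, as on FreeBSD. It extends a file without writing data and reports the apparent size next to the allocated block count.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create (or truncate) `path`, extend it to `size`
 * bytes without writing any data, and report the apparent size and the
 * number of allocated blocks.  Returns 0 on success, or -1 with errno
 * set by the failing call.
 */
int
sparse_probe(const char *path, off_t size, off_t *apparent, long long *blocks)
{
	struct stat sb;
	int error, fd, saved_errno;

	assert(path != NULL && apparent != NULL && blocks != NULL);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (-1);
	/* Extending the file creates a hole; this is the step that hit ENOSPC. */
	error = ftruncate(fd, size);
	if (error == 0)
		error = fstat(fd, &sb);
	saved_errno = errno;	/* close(2) may overwrite errno */
	(void)close(fd);
	if (error != 0) {
		errno = saved_errno;
		return (-1);
	}
	*apparent = sb.st_size;
	*blocks = (long long)sb.st_blocks;
	return (0);
}
```

On a filesystem that handles holes, the apparent size should match the requested size while the block count stays small. In the report above, the call corresponding to ftruncate() is the one that returned "No space left on device" on tmpfs.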
Comment 1 Konstantin Belousov freebsd_committer freebsd_triage 2017-10-15 11:41:14 UTC
Well, tmpfs does support sparse files.

What you reported is indeed wrong behavior, but it is caused not by a supposed lack of support for sparse files, but by wrong code that tries to avoid OOM situations due to over-committing the file backing for tmpfs files.  See sys/fs/tmpfs/tmpfs_subr.c: the function tmpfs_pages_check_avail(), and the functions tmpfs_mem_avail() and tmpfs_pages_used() referenced from there.

In particular, tmpfs_mem_avail() is completely wrong: it misinterprets v_free_count.

I think that tmpfs should only check the current page usage by specific mount point against per-mount point limit, if any.  Trying to formulate a limit against some formula involving v_free_count and other VM metrics cannot work, due to the VM algorithms.  The main reason is that we support paging memory to the backing store, so v_free_count indicates wasted memory (as opposed to the free or free-able memory in the common sense of the word).
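
To make the contrast concrete, here is a toy userspace sketch of the two kinds of admission check. It is a deliberately simplified stand-in, not the code in tmpfs_subr.c: struct toy_mount and both function names are hypothetical, and the free-count check is reduced to a single comparison purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one tmpfs mount; not the kernel struct. */
struct toy_mount {
	size_t pages_max;	/* per-mount limit ("-o size"); 0 means no limit */
	size_t pages_used;	/* pages currently charged to this mount */
};

/*
 * Free-count heuristic: refuse the request when it exceeds the number of
 * pages that happen to be on the free list right now.  This ignores that
 * the VM can produce more pages by paging out or reusing clean pages.
 */
bool
toy_check_free_count(size_t free_pages, size_t req_pages)
{
	return (req_pages <= free_pages);
}

/* Per-mount check: compare usage against the administrator's limit only. */
bool
toy_check_mount_limit(const struct toy_mount *tm, size_t req_pages)
{
	assert(tm != NULL);

	if (tm->pages_max == 0)
		return (true);		/* no limit configured */
	if (tm->pages_used > tm->pages_max)
		return (false);
	/* Written as a subtraction so the comparison cannot overflow. */
	return (req_pages <= tm->pages_max - tm->pages_used);
}
```

With 100 pages on the free list, the first check rejects a 200-page request even when the mount's own limit would still allow it; the second check depends only on the mount's counters.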
Comment 2 Keith White 2017-10-15 16:21:01 UTC
Created attachment 187197 [details]
vfs.tmpfs.inactive_percent sysctl
Comment 3 Keith White 2017-10-15 16:23:53 UTC
Certainly not my intention to mislead with the bug subject!

I have no idea how to approach a fix.

Probably related to tmpfs's reluctance to use VM: in order to use tmpfs as /tmp and /usr/obj for a kernel build on a read-only RPI3, the previous patch adds a vfs.tmpfs.inactive_percent sysctl that "allows" tmpfs to use X% of inactive memory (default 0%).  Otherwise builds _may_ fail.

...keith
Comment 4 Konstantin Belousov freebsd_committer freebsd_triage 2017-10-16 07:25:37 UTC
(In reply to Keith White from comment #3)
The tmpfs_mem_avail() function should die.  tmpfs_mount.tm_pages_max should be enough, it already allows administrator to limit the memory usage by mount point, mount -o size.
Comment 5 Keith White 2017-10-22 12:41:49 UTC
(In reply to Konstantin Belousov from comment #4)
After some dtracing (and head scratching), I believe I see a solution for my problem by using resident_page_count instead of size. It will complicate tmpfs_write() though...

...keith
Comment 6 Keith White 2017-10-22 23:53:45 UTC
Created attachment 187384 [details]
POC for using resident_page_count

This patch allows me to use large files with holes. e.g.

# mkdir /tmp/_x
# mount -t tmpfs -osize=10m tmp /tmp/_x
# truncate -s4g /tmp/_x/4g
# ls -ls /tmp/_x/4g
    0 -rw-r--r--  1 root  wheel  4294967296 Oct 22 19:51 /tmp/_x/4g
# df /tmp/_x
    Filesystem 1K-blocks Used Avail Capacity  Mounted on
    tmpfs          10240    4 10236     0%    /tmp/_x
# du /tmp/_x
    0       /tmp/_x
# umount /tmp/_x
# rmdir /tmp/_x
Comment 7 Konstantin Belousov freebsd_committer freebsd_triage 2017-10-23 08:41:18 UTC
(In reply to Keith White from comment #5)
object->resident_page_count is equally meaningless for your purposes.  What is not clear in my comment #4?

The patch does not require any of these dynamic calculations using meaningless (for this purpose) values.  If you want to limit the tmpfs mount memory use, specify an explicit limit to mount_tmpfs.  Attempts to misuse some pagedaemon or object internal counters would not work out; their purpose is very different and they do not match resource limiting for tmpfs.
Comment 8 Keith White 2017-10-23 13:08:38 UTC
I read but probably misunderstood! If I create a file with holes I don't want to be "charged" for it until the holes are filled in.  I may want to create a disk image file, say, that is 4g but I know that I will only be storing 20m in total.  ffs-like filesystems allow this.  tmpfs has a more straightforward idea of a file (allocate up-front, no hole management?).  A "BUGS" section in tmpfs(5) would have guided me away from attempting to use tmpfs for sparse files. Otherwise tmpfs is an excellent fit for me since I'm running diskless+swapless.

I'll re-read your comments. I see a drawing-board over there that I should get back to...

...keith
Comment 9 Konstantin Belousov freebsd_committer freebsd_triage 2017-10-23 14:53:48 UTC
(In reply to Keith White from comment #8)
Let me explain it in full:
1. tmpfs should not try to use the current counters of the active/inactive or free queues, since they are irrelevant to the system's ability to satisfy page requests.  If a page is needed, the queues are scanned and a usable page might appear even if there are no free pages or all swap space is used (e.g. we can write out a dirty file page or reuse a clean file page).
2. tmpfs should provide a global limit on the number of used pages; in fact it already has one, "-o size".  The limit is compared against the maintained counter of the supposedly used pages, tm_pages_used.
3. Your problem arises because tm_pages_used is too harsh.  It just sums up all file sizes, while it really should only count file pages which were actually written to.  In other words, instead of adjusting tm_pages_used in tmpfs_reg_resize(), it should be adjusted in tmpfs_write().  [There is an additional complication, see below].
4. The tmpfs_mem_avail() should be removed.  See item 1.

The complication is due to tmpfs using in-place mapping, i.e. the vm object which contains the pages with the file data directly provides the pages used for file mapping.  This was a highly desirable feature, because it avoids duplicating memory for mmapped tmpfs files, and makes mmap zero-copy.

The problem is that page faults in a sparsely allocated mmapped file range instantiate file pages, which must be accounted for against tm_pages_max.  The tmpfs vm objects are already flagged, so this is not too hard to do; it is just that you cannot limit the patch to fs/tmpfs only.

This is my current opinion on the issue; I hope this is clear enough.
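
The difference between charging by file size and charging by instantiated pages can be sketched as a toy userspace model. Everything here is hypothetical and scaled down (a 64-page bitmap per file, made-up names such as toy_touch_page); it only illustrates the accounting idea in item 3 and is not the tmpfs implementation. In the model, toy_touch_page() stands for any event that instantiates a page, whether a write or a page fault on a mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE	4096u
#define TOY_MAX_PAGES	64u	/* largest file the toy model can track */

struct toy_fs {
	size_t pages_max;	/* mount limit, in pages */
	size_t pages_used;	/* pages charged so far */
};

struct toy_file {
	size_t size;			/* apparent size in bytes */
	bool resident[TOY_MAX_PAGES];	/* which pages actually exist */
};

static size_t
toy_howmany(size_t bytes)
{
	return ((bytes + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE);
}

/* Size-based accounting: every page covered by the new size is charged. */
bool
toy_resize_by_size(struct toy_fs *fs, struct toy_file *f, size_t newsize)
{
	size_t oldpages = toy_howmany(f->size);
	size_t newpages = toy_howmany(newsize);

	assert(fs->pages_used <= fs->pages_max && oldpages <= fs->pages_used);
	if (newpages > oldpages &&
	    newpages - oldpages > fs->pages_max - fs->pages_used)
		return (false);		/* the ENOSPC case from the report */
	fs->pages_used = fs->pages_used - oldpages + newpages;
	f->size = newsize;
	return (true);
}

/* Page-based accounting: resizing only releases pages past the new end. */
void
toy_resize_sparse(struct toy_fs *fs, struct toy_file *f, size_t newsize)
{
	size_t i, newpages = toy_howmany(newsize);

	assert(newpages <= TOY_MAX_PAGES);
	for (i = newpages; i < TOY_MAX_PAGES; i++) {
		if (f->resident[i]) {
			f->resident[i] = false;
			fs->pages_used--;
		}
	}
	f->size = newsize;
}

/* A write or a page fault instantiates one page and charges it once. */
bool
toy_touch_page(struct toy_fs *fs, struct toy_file *f, size_t idx)
{
	assert(idx < toy_howmany(f->size));
	if (f->resident[idx])
		return (true);		/* already charged */
	if (fs->pages_used >= fs->pages_max)
		return (false);		/* limit reached */
	f->resident[idx] = true;
	fs->pages_used++;
	return (true);
}
```

With a 4-page limit, toy_resize_by_size() refuses to grow a file to 32 pages, while toy_resize_sparse() accepts the same size and the limit only takes effect once a fifth distinct page is touched.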
Comment 10 Keith White 2017-10-23 16:03:59 UTC
(In reply to Konstantin Belousov from comment #9)

Yes, this helps to clear things up.  Very many thanks!

...keith
Comment 11 Graham Perrin freebsd_committer freebsd_triage 2022-10-17 12:36:45 UTC
Keyword: 

    patch
or  patch-ready

– in lieu of summary line prefix: 

    [patch]

* bulk change for the keyword
* summary lines may be edited manually (not in bulk). 

Keyword descriptions and search interface: 

    <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>
Comment 12 commit-hook freebsd_committer freebsd_triage 2022-12-09 12:18:17 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=37aea2649ff707f23d35309d882b38e9ac818e42

commit 37aea2649ff707f23d35309d882b38e9ac818e42
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2022-10-20 13:17:43 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2022-12-09 12:17:12 +0000

    tmpfs: for used pages, account really allocated pages, instead of file sizes

    This makes tmpfs size accounting correct for the sparce files. Also
    correct report st_blocks/va_bytes. Previously the reported value did not
    accounted for the swapped out pages.

    PR:     223015
    Reviewed by:    markj
    Tested by:      pho
    Sponsored by:   The FreeBSD Foundation
    MFC after:      1 week
    Differential revision:  https://reviews.freebsd.org/D37097

 sys/fs/tmpfs/tmpfs.h        |  18 ++++++-
 sys/fs/tmpfs/tmpfs_subr.c   | 118 ++++++++++++++++++++++++++++++++++++++++----
 sys/fs/tmpfs/tmpfs_vfsops.c |   6 ++-
 sys/fs/tmpfs/tmpfs_vnops.c  |  17 +++++--
 sys/kern/uipc_shm.c         |   2 +-
 5 files changed, 144 insertions(+), 17 deletions(-)
Comment 13 Graham Perrin freebsd_committer freebsd_triage 2022-12-12 00:12:01 UTC
Thank you. 

Triage: 

* assignment to the committer of 37aea2649ff707f23d35309d882b38e9ac818e42

* status, URL, CC list, see also, flags

* the 'patch' keyword is deprecated

* summary line tags such as [patch] are no longer used.

Please see <https://bugs.freebsd.org/bugzilla/> (changes, to the right); <https://bugs.freebsd.org/bugzilla/describekeywords.cgi> (updated); <https://wiki.freebsd.org/Bugzilla>.
Comment 14 commit-hook freebsd_committer freebsd_triage 2023-01-20 03:25:04 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=06e03b4b0d24bc0c513eb7dc5651664565d0c219

commit 06e03b4b0d24bc0c513eb7dc5651664565d0c219
Author:     Konstantin Belousov <kib@FreeBSD.org>
AuthorDate: 2022-10-20 13:17:43 +0000
Commit:     Konstantin Belousov <kib@FreeBSD.org>
CommitDate: 2023-01-20 03:19:50 +0000

    tmpfs: for used pages, account really allocated pages, instead of file sizes

    PR:     223015
    Tested by:      pho

    (cherry picked from commit 37aea2649ff707f23d35309d882b38e9ac818e42)

 sys/fs/tmpfs/tmpfs.h        |  18 ++++++-
 sys/fs/tmpfs/tmpfs_subr.c   | 118 ++++++++++++++++++++++++++++++++++++++++----
 sys/fs/tmpfs/tmpfs_vfsops.c |   6 ++-
 sys/fs/tmpfs/tmpfs_vnops.c  |  17 +++++--
 sys/kern/uipc_shm.c         |   2 +-
 5 files changed, 144 insertions(+), 17 deletions(-)
Comment 15 Mark Linimon freebsd_committer freebsd_triage 2024-01-10 03:19:33 UTC
^Triage: committed in all supported branches.