Bug 223015 - [tmpfs] [patch] tmpfs does not support sparse files
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-fs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-10-14 21:28 UTC by Keith White
Modified: 2018-03-09 17:51 UTC
CC List: 4 users

See Also:


Attachments
vfs.tmpfs.inactive_percent sysctl (1.64 KB, patch)
2017-10-15 16:21 UTC, Keith White
POC for using resident_page_count (3.20 KB, patch)
2017-10-22 23:53 UTC, Keith White

Description Keith White 2017-10-14 21:28:06 UTC
When attempting to create a large sparse file on a tmpfs file system, I get this message: "truncate: ... No space left on device"

It appears to work on a similarly sized mfs file system.

...keith

Sample test script output:

# sh t_tmpfs
=== tmpfs ===
truncate: /tmp/_trunc_/sparse: No space left on device
0 -rw-r--r--  1 root  wheel  0 Oct 14 17:24 /tmp/_trunc_/sparse
=== mfs ===
96 -rw-r--r--  1 root  wheel  2147483648 Oct 14 17:24 /tmp/_trunc_/sparse



Test script follows:

#!/bin/sh
#

DIR=/tmp/_trunc_
TUNIT=101
TFILE=sparse
SSIZE=16m
TSIZE=2g

exec 2>&1
mkdir $DIR || exit 1

echo "=== tmpfs ==="
mount -t tmpfs -osize=$SSIZE /tmp $DIR
truncate -s$TSIZE $DIR/$TFILE
ls -ls $DIR/$TFILE
umount $DIR

echo "=== mfs ==="
mount -t mfs -o-s=$SSIZE md$TUNIT $DIR
truncate -s$TSIZE $DIR/$TFILE
ls -ls $DIR/$TFILE
umount $DIR
mdconfig -du $TUNIT

rmdir $DIR

exit 0
Comment 1 Konstantin Belousov freebsd_committer 2017-10-15 11:41:14 UTC
Well, tmpfs does support sparse files.

What you reported is indeed wrong behavior, but it is caused not by a supposed lack of support for sparse files, but by incorrect code that tries to avoid OOM situations caused by over-committing the backing store for tmpfs files.  See sys/fs/tmpfs/tmpfs_subr.c: the function tmpfs_pages_check_avail(), and the functions tmpfs_mem_avail() and tmpfs_pages_used() referenced from it.

In particular, tmpfs_mem_avail() is completely wrong: it misinterprets v_free_count.

I think that tmpfs should only check the current page usage of the specific mount point against the per-mount-point limit, if any.  Trying to formulate a limit from some formula involving v_free_count and other VM metrics cannot work, because of how the VM algorithms operate.  The main reason is that we support paging memory out to the backing store, so v_free_count indicates wasted memory (as opposed to free or freeable memory in the common sense of the word).
Comment 2 Keith White 2017-10-15 16:21:01 UTC
Created attachment 187197
vfs.tmpfs.inactive_percent sysctl
Comment 3 Keith White 2017-10-15 16:23:53 UTC
Certainly not my intention to mislead with the bug subject!

I have no idea how to approach a fix.

Probably related to tmpfs's reluctance to use VM: in order to use tmpfs as /tmp and /usr/obj for a kernel build on a read-only RPI3, the attached patch adds a vfs.tmpfs.inactive_percent sysctl that "allows" tmpfs to use X% of inactive memory (default 0%).  Otherwise builds _may_ fail.

...keith
Comment 4 Konstantin Belousov freebsd_committer 2017-10-16 07:25:37 UTC
(In reply to Keith White from comment #3)
The tmpfs_mem_avail() function should die.  tmpfs_mount.tm_pages_max should be enough: it already lets the administrator limit memory usage per mount point with mount -o size.
Comment 5 Keith White 2017-10-22 12:41:49 UTC
(In reply to Konstantin Belousov from comment #4)
After some dtracing (and head scratching), I believe I see a solution for my problem by using resident_page_count instead of size. It will complicate tmpfs_write() though...

...keith
Comment 6 Keith White 2017-10-22 23:53:45 UTC
Created attachment 187384
POC for using resident_page_count

This patch allows me to use large files with holes, e.g.:

# mkdir /tmp/_x
# mount -t tmpfs -osize=10m tmp /tmp/_x
# truncate -s4g /tmp/_x/4g
# ls -ls /tmp/_x/4g
    0 -rw-r--r--  1 root  wheel  4294967296 Oct 22 19:51 /tmp/_x/4g
# df /tmp/_x
    Filesystem 1K-blocks Used Avail Capacity  Mounted on
    tmpfs          10240    4 10236     0%    /tmp/_x
# du /tmp/_x
    0       /tmp/_x
# umount /tmp/_x
# rmdir /tmp/_x
Comment 7 Konstantin Belousov freebsd_committer 2017-10-23 08:41:18 UTC
(In reply to Keith White from comment #5)
object->resident_page_count is equally meaningless for your purposes.  What is not clear in my comment #4?

The patch does not require any of these dynamic calculations using values that are meaningless for this purpose.  If you want to limit the memory use of a tmpfs mount, specify an explicit limit to mount_tmpfs.  Attempts to misuse pagedaemon or object-internal counters will not work out: their purpose is very different, and they do not match resource limiting for tmpfs.
Comment 8 Keith White 2017-10-23 13:08:38 UTC
I read but probably misunderstood! If I create a file with holes, I don't want to be "charged" for it until the holes are filled in.  I may want to create a disk image file, say, that is 4g, knowing that I will only be storing 20m in total.  ffs-like filesystems allow this.  tmpfs has a more straightforward idea of a file (allocate up-front, no hole management?).  A "BUGS" section in tmpfs(5) would have guided me away from attempting to use tmpfs for sparse files. Otherwise tmpfs is an excellent fit for me, since I'm running diskless+swapless.

I'll re-read your comments. I see a drawing-board over there that I should get back to...

...keith
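
One way to see whether a file system "charges" for holes is to compare a file's apparent size with the space allocated to it. The following is only a minimal sketch and is not taken from the thread: the helper name sparse_report and the temporary path are made up for illustration, and it relies only on the ls(1) columns already shown in the test output above.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print apparent size versus
# allocated blocks for one file, using only ls(1) columns.
sparse_report() {
    # "ls -s" prints the allocated block count first;
    # "ls -l" prints the size in bytes in column 5.
    blocks=$(ls -s "$1" | awk '{print $1}')
    bytes=$(ls -l "$1" | awk '{print $5}')
    echo "$1: apparent=$bytes bytes, allocated=$blocks blocks"
}

size=$((64 * 1024 * 1024))              # 64 MiB, kept small on purpose
f=$(mktemp /tmp/sparse.XXXXXX) || exit 1
truncate -s "$size" "$f"                # extends the file without writing data
sparse_report "$f"
rm -f "$f"
```

On a file system that manages holes, the allocated count should stay far below what the apparent size would require. The block unit reported by "ls -s" differs between systems (512-byte blocks by default on FreeBSD, 1 KiB with GNU ls), so only the comparison within one system is meaningful.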
Comment 9 Konstantin Belousov freebsd_committer 2017-10-23 14:53:48 UTC
(In reply to Keith White from comment #8)
Let me explain it in full:
1. tmpfs should not try to use the current counters of the active/inactive or free queues, since they are irrelevant to the system's ability to satisfy page requests.  If a page is needed, the queues are scanned and a usable page might appear even if there are no free pages or all swap space is used (e.g. we can write out a dirty file page or reuse a clean file page).
2. tmpfs should provide a global limit on the number of used pages; in fact it already has one: "-o size".  The limit is compared against tm_pages_used, the maintained counter of supposedly used pages.
3. Your problem arises because tm_pages_used is too harsh.  It just sums up all file sizes, while it should really count only the file pages that were actually written to.  In other words, instead of adjusting tm_pages_used in tmpfs_reg_resize(), it should be adjusted in tmpfs_write().  [There is an additional complication, see below].
4. The tmpfs_mem_avail() function should be removed.  See item 1.

The complication is due to tmpfs using in-place mapping, i.e. the vm object that contains the pages with the file data directly provides the pages used for file mapping.  This is a highly desirable feature, because it avoids duplicating memory for mmapped tmpfs files and makes mmap zero-copy.

The problem is that page faults in a sparsely allocated mmapped file range instantiate file pages, which must be counted against tm_pages_max.  The tmpfs vm objects are already flagged, so this is not too hard to do; it is just that the patch cannot be limited to fs/tmpfs only.

This is my current opinion on the issue; I hope this is clear enough.
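
The difference between the two accounting policies in item 3 can be sketched with a toy model in sh. This is an illustration only, not tmpfs code: pages_max and pages_used loosely mirror tm_pages_max and tm_pages_used, while PAGE_KB (an assumed 4 KiB page), kb_to_pages, and charge are hypothetical names, and the sizes come from the test script in the description.

```shell
#!/bin/sh
# Toy model only; names loosely mirror the thread, nothing here is kernel code.
PAGE_KB=4                               # assumption: 4 KiB pages
pages_max=$((16 * 1024 / PAGE_KB))      # models: mount -o size=16m
pages_used=0

# Round a size in KiB up to whole pages.
kb_to_pages() { echo $(( ($1 + PAGE_KB - 1) / PAGE_KB )); }

# Charge N pages against the per-mount limit; fail (like ENOSPC) if over it.
charge() {
    if [ $((pages_used + $1)) -gt "$pages_max" ]; then
        return 1
    fi
    pages_used=$((pages_used + $1))
}

file_kb=$((2 * 1024 * 1024))            # truncate -s 2g
written_kb=$((8 * 1024))                # data actually written: 8 MiB

# Policy A: charge the full file size when the file is resized.
if charge "$(kb_to_pages "$file_kb")"; then resize_policy=ok; else resize_policy=ENOSPC; fi

# Policy B: resizing is free; charge only the pages a write instantiates.
pages_used=0
if charge "$(kb_to_pages "$written_kb")"; then write_policy=ok; else write_policy=ENOSPC; fi

echo "charge at resize: $resize_policy"
echo "charge at write:  $write_policy (pages_used=$pages_used of $pages_max)"
```

The model deliberately ignores the mmap complication described above: pages instantiated by page faults would need to be charged through the same path as pages instantiated by writes.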
Comment 10 Keith White 2017-10-23 16:03:59 UTC
(In reply to Konstantin Belousov from comment #9)

Yes, this helps to clear things up.  Very many thanks!

...keith