Created attachment 150465 [details]
example code

The system does not free memory after munmap().

Before building, set:

  const char *file_name = (const char *)"/testvn.tmp";
  off_t file_size = (4 * 1024 * mb); /* Set to 2x RAM size. */

Replace the 4 with your host's RAM size multiplied by 2.

Swap is off.
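For reference, a minimal sketch of the test, pieced together from the snippets above and from the code quoted in comment #2; the attachment is the authoritative version, and the mb constant, the chunk size write_size, the loop structure, and the error handling here are my assumptions:

  #include <sys/types.h>
  #include <sys/mman.h>

  #include <err.h>
  #include <fcntl.h>
  #include <string.h>
  #include <unistd.h>

  int
  main(void)
  {
      const off_t mb = (1024 * 1024);
      const char *file_name = (const char *)"/testvn.tmp";
      off_t file_size = (4 * 1024 * mb); /* Set to 2x RAM size. */
      off_t write_size = (1024 * mb);    /* Map and dirty one chunk at a time. */
      void *mem;
      int fd;

      fd = open(file_name, (O_RDWR | O_CREAT), 0644);
      if (fd == -1)
          err(1, "open");
      if (ftruncate(fd, file_size) == -1)
          err(1, "ftruncate");
      for (off_t i = 0; i < (file_size / write_size); i++) {
          mem = mmap(NULL, write_size, (PROT_READ | PROT_WRITE),
              (MAP_SHARED | MAP_NOCORE), fd, (i * write_size));
          if (mem == MAP_FAILED)
              err(1, "mmap");
          memset(mem, 0xff, write_size); /* Dirty every page of the chunk. */
          /* The attachment passes file_size as the munmap() length here;
           * see comment #2 below. */
          if (munmap(mem, write_size) == -1)
              err(1, "munmap");
      }
      close(fd);
      return (0);
  }

Building it with 16 in place of the 4 (as in comment #3) scales file_size to twice the RAM of an 8 GiByte machine.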
  mem = mmap(NULL, write_size, (PROT_READ | PROT_WRITE),
      (MAP_SHARED | MAP_NOCORE), fd, (i * write_size));
  . . .
  //msync(mem, file_size, MS_SYNC);
  //posix_madvise(mem, file_size, MADV_FREE);
  munmap(mem, file_size);

write_size is used as the len for the mmap(), but file_size as the len for the munmap()?

Quoting the man page for munmap():

  The munmap() system call will fail if:

  [EINVAL]  The addr argument was not page aligned, the len argument
            was zero or negative, or some part of the region being
            unmapped is outside the valid address range for a process.

As near as I can tell, the munmap() calls were returning EINVAL and possibly not doing the unmap at all. A correct len would be needed for the munmap() calls to be guaranteed to unmap without leaving any pages mapped.

The denial of service could occur with the munmap() simply commented out, like the msync() and posix_madvise() are. munmap() freeing RAM (or not) is a separate issue. You would probably need distinct submittals for the two issues, if both really apply.
Process memory usage does not grow, so munmap() should be working fine.
(In reply to rozhuk.im from comment #3)

Looks like I misinterpreted the man page's description: munmap() does return 0 in the example. Sorry for the noise on that point. But maybe I can make up for it . . .

Maybe the following example makes the report clearer? I ran an a.out built from the source with 16 instead of 4, so scaled for an 8 GiByte aarch64 system.

Before the a.out run:

  Mem: 13M Active, 1240K Inact, 108M Wired, 28M Buf, 7757M Free
  Swap: 28G Total, 28G Free

After the a.out had finished, from a separate top run:

  Mem: 2197M Active, 4937M Inact, 255M Wired, 186M Buf, 629M Free
  Swap: 28G Total, 11M Used, 28G Free

15+ minutes later, with the system left idle: not much change from what is shown above.
(In reply to Mark Millard from comment #4) This behaviour is expected. The kernel is caching the file's contents even after the file is unmapped.
(In reply to Mark Johnston from comment #5)

On 10.1 this cache ate all free memory and the whole system froze if swap was not enabled. Now it is better, but the disk cache still causes swap usage, or freezes (though less often than on 10.x) if no swap is present.

The disk cache should have very low priority and behave like free memory - available to any application for allocation - or have a hard limit via sysctl, so that it does not consume all free memory.

Another thing: I do not want to use swap, but I need kernel core dumps. How can I do this?
(In reply to rozhuk.im from comment #6)

Your test program is dirtying pages by writing to them, so the OS is forced to flush them to disk before they can be reused. It is easy to dirty pages more quickly than they can be written back and freed, in which case the system will continually be starved for free pages. Currently I don't believe we have any mechanism to restrict the amount of dirty memory mapped into a given process.

Regarding kernel dumps: if you have space on a raw partition, you can simply point dumpon(8) at it. If not, you might consider using netdump(4).
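As a sketch against the per-chunk loop assumed in the description above (not the attachment itself), one way to keep the dirtying in step with writeback is to write each chunk back synchronously before unmapping it - essentially the msync() call that is commented out in the attachment, but with the chunk length rather than file_size:

      memset(mem, 0xff, write_size);             /* Dirty the chunk. */
      if (msync(mem, write_size, MS_SYNC) == -1) /* Wait until the chunk is written back. */
          err(1, "msync");
      if (munmap(mem, write_size) == -1)
          err(1, "munmap");

That trades throughput for bounding how much dirty mapped memory the program has outstanding at any one time.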
(In reply to Mark Johnston from comment #7)

The main problem is that dirty pages do not become free after they are flushed to disk, and that causes swap usage. On 10.1 the system could not allocate memory even after it had stopped writing to disk. Probably I should now count free memory as: free + laundry.

Another bug: I have 6.5 GB free, the program writes 6 GB, I rename the file and restart the program. The program fails some time later, but the system cannot flush the pages to disk - no space left - and the whole time it keeps moving memory around:

  ...
  CPU: 0.0% user, 0.0% nice, 10.5% system, 7.0% interrupt, 82.5% idle
  Mem: 1945M Active, 120K Inact, 1537M Laundry, 315M Wired, 199M Buf, 47M Free
  ...
  CPU: 0.0% user, 0.0% nice, 14.3% system, 6.3% interrupt, 79.5% idle
  Mem: 2765M Active, 36K Inact, 717M Laundry, 315M Wired, 199M Buf, 48M Free

    16 root  49  -  0  72K  CPU3  3  12:26  59.96% [pagedaemon{laundry: dom0}]
  ...

and it kept eating CPU until the file was deleted.

About kernel dumps. I tried:

  swapoff -a
  swapoff: /dev/gptid/0714a812-b98e-11e8-a831-7085c2375722.eli: Cannot allocate memory

The system has 32 GB of RAM, 1.8 GB in swap, and the sum of RES over all running applications is less than 20 GB. Only after I stopped one vbox VM and got 4+ GB of free memory did it work without the error.
(In reply to rozhuk.im from comment #6)

> On 10.1 this cache ate all free memory and the whole system froze if swap was not enabled.

In what I describe below I was testing a head -r339076 based FreeBSD on an aarch64 8 GiByte system, so I used 16 instead of 4 to scale to twice the RAM size. swapoff -a had been used first, so no swap was enabled. First I give the ideas, then comment on the tests of using them.

If the file is to stick around and should be fully updated, an fsync before closing would seem to deal with updating the file at that time. If so, the RAM pages should no longer be dirty, which in turn should allow later conversion of such pages to other uses in any process without I/O at that time. This holds even when started after swapoff -a, so no swap is available.

If the file is to stick around and does not need to be (fully) updated, posix_madvise(mem, write_size, MADV_FREE) before each unmap, plus the fsync before close (to be sure any dirty pages are dealt with if the hint is not taken), might be how to hint the intent to the system. Both Active and Inact might end up containing such clean RAM pages that can be put to direct use for other memory requests, if I understand right. Again, this holds even when started after swapoff -a, so no swap is available.

So I tried those combinations. The MADV_FREE and fsync combination allowed me to (with swapoff -a beforehand): run, rename the file, run, . . . The same was true for the fsync-only variant: run, rename the file, run, . . .

So the presence of multiple such 16 GiByte files from past runs (no reboots in between) did not prevent the next file from being created and used the same way on the 8 GiByte RAM box with no swap space enabled. Clearly the RAM is being made available as needed for this program. In at least the MADV_FREE case, both Active and Inact could be large after a run, but that did not interfere with later runs: the RAM pages became available as needed without needing to be swapped first.

If there is something I missed in the test structure, let me know.
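In code terms, a sketch of those two ideas against the same assumed per-chunk loop (I use madvise() here where the attachment spells the hint posix_madvise() with MADV_FREE; the fsync part applies to both variants):

      /* Per chunk, after dirtying and before unmapping: hint that the
       * chunk's contents need not be preserved (the MADV_FREE variant).
       * It is only a hint, so a failure is not treated as fatal. */
      (void)madvise(mem, write_size, MADV_FREE);
      if (munmap(mem, write_size) == -1)
          err(1, "munmap");

      /* After the loop: make any remaining dirty pages clean before
       * closing the file. */
      if (fsync(fd) == -1)
          err(1, "fsync");
      close(fd);

Whether or not the hint is taken, the fsync leaves the pages clean, which matches what I saw: later runs could reuse the RAM as needed without any swap configured.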
(In reply to rozhuk.im from comment #8)

One difference between my test context and yours appears to be that I'm not using any encryption layer, but the .eli in:

  /dev/gptid/0714a812-b98e-11e8-a831-7085c2375722.eli

suggests that you are using geli-based encryption. My context is basic, simple UFS. In my context the I/O is fairly low latency and fairly high rate (an SSD), so, for example, the fsync activity does not last long. I have no experience with such issues under geli encryption and have no clue how your I/O subsystem latency and bandwidth might compare. I'm also probably less likely to see the file system try to allocate memory during its attempt to fsync or otherwise write out dirty RAM pages (making them clean).

All of this may make it harder for me to replicate the behavior that you would see for the same test program run the same way but in your context.
(In reply to Mark Johnston from comment #5)

Thanks for that note. The caching status after unmap, after close, and after process exit has helped clear out some bad assumptions of mine, including just what top's Active, Inact, and Buf mean in various contexts. My assumptions were tied to the observed behavior of the limited range of my typical workloads.