Bug 275632 - Process that has a big file mmap'ed and performs writes to it is killed
Summary: Process that has a big file mmap'ed and performs writes to it is killed
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 14.0-STABLE
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-12-08 12:42 UTC by Martin Birgmeier
Modified: 2024-01-18 10:24 UTC
CC List: 2 users

See Also:


Description Martin Birgmeier 2023-12-08 12:42:42 UTC
Scenario:
- FreeBSD stable/14 52369c5d29f5f, amd64
- 244 GB file on ZFS
- The file is being mmap'ed in its entirety by a program called "cawic".
- Read and write operations are performed on selected portions of this mapping: the contents are read sequentially from beginning to end in blocks of 128k, and some of these blocks are immediately rewritten with updated data, so that ultimately all blocks have been read and some have been rewritten (see the sketch below).
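
A minimal sketch of this access pattern, for illustration only (the selection of which blocks to rewrite is a placeholder assumption; only the whole-file MAP_SHARED mapping and the 128k sequential-read/selective-rewrite pattern are taken from this report):

    #include <sys/param.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    #define BLKSZ (128 * 1024)  /* 128k blocks */

    int
    walk_mapping(const char *path)
    {
        struct stat st;
        int fd = open(path, O_RDWR);

        if (fd == -1)
            return (-1);
        if (fstat(fd, &st) == -1) {
            close(fd);
            return (-1);
        }
        char *map = mmap(NULL, (size_t)st.st_size,
            PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED) {
            close(fd);
            return (-1);
        }

        volatile unsigned char sink = 0;
        for (off_t off = 0; off < st.st_size; off += BLKSZ) {
            size_t n = MIN((size_t)(st.st_size - off), (size_t)BLKSZ);
            char *blk = map + off;

            /* "Read" the block: touching every byte faults its pages in. */
            for (size_t i = 0; i < n; i++)
                sink += (unsigned char)blk[i];
            /* "Rewrite" some blocks, dirtying their pages.  The
             * every-fourth selection here is purely a placeholder for
             * the program's real changed-data test. */
            if ((off / BLKSZ) % 4 == 0)
                memset(blk, 0xA5, n);
        }
        munmap(map, (size_t)st.st_size);
        close(fd);
        return (0);
    }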

Result:
- The process is killed, with the following message in dmesg:

pid 28386 (cawic), jid 0, uid 20201, was killed: failed to reclaim memory

- Swap space was not used.

Expected result:
- The process should run to completion successfully.

Notes:
- This was working correctly (i.e., the process was never killed) some time before FreeBSD 12; with FreeBSD 12 the same issue occurred.
- The ZFS filesystem on which the file resides has compression on.
- If no blocks are ever written (all are only read), the process is not killed.
Comment 1 Ed Maste 2023-12-08 13:17:30 UTC
> This was working correctly (i.e., the process never was killed) some time before FreeBSD 12; with FreeBSD 12 the same issue occurred.

Do you have any more detail on "some time before FreeBSD 12"? I.e., was it 10.x? 11.x? And to confirm: the issue is reproducible on some version of 12.x, and on 14.0?
Comment 2 Martin Birgmeier 2023-12-08 13:36:28 UTC
Unfortunately not really... the program itself has existed since Sep. 2011, and it was working OK for quite some time. In May 2019 I added sleep(3) calls and removed them immediately afterwards, probably because the problem surfaced then and I tried to lower the write load, seemingly unsuccessfully.

I am running this only every half a year or so. The problem certainly existed on 12.3 and 12.4; I switched to stable/14 from 12.4.

There does not seem to be a problem if the write load is rather small; but if a substantial portion of consecutive stretches of ca. 20 GB needs to be rewritten, the program is killed.

The machine in question has 16 GB main memory.

-- Martin
Comment 3 Martin Birgmeier 2023-12-08 13:53:37 UTC
The issue is certainly reproducible: it happens every time I let the program run.

Some background: "cawic" stands for "compare and write if changed". It is basically a fancy way of copying stdin to a pre-existing file, but only those portions which have changed are actually written. The pre-existing file is a disk image; it is updated from the physical disk every few months as a backup.

The 128k block size was chosen to match the ZFS default record size. Writing only changed portions is intended to minimize the snapshot sizes of the ZFS filesystem the file resides on.
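
A hedged sketch of that compare-and-write step, assuming a MAP_SHARED mapping of the file as in the scenario above (the function and variable names and the short-read handling are assumptions; the report only states that stdin is compared blockwise against the mapping and unchanged blocks are left untouched):

    #include <string.h>
    #include <unistd.h>

    #define BLKSZ (128 * 1024)

    /* Compare one stdin block against the mapped file at curloc and write
     * it back only if it differs, so unchanged ZFS records stay clean. */
    static ssize_t
    compare_and_write(char *curloc, int infd)
    {
        char buf[BLKSZ];
        ssize_t nread = 0;

        /* read(2) may return short counts; accumulate a full block or EOF. */
        while (nread < (ssize_t)sizeof(buf)) {
            ssize_t r = read(infd, buf + nread, sizeof(buf) - nread);
            if (r <= 0)
                break;
            nread += r;
        }
        if (nread > 0 && memcmp(curloc, buf, (size_t)nread) != 0)
            memcpy(curloc, buf, (size_t)nread);  /* rewrite only if changed */
        return (nread);
    }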

The current workaround is to restart the program until it runs to completion: the parts which have already been written will only be read; then the next part is partially rewritten, and the run either fails there or finally succeeds.

-- Martin
Comment 4 Martin Birgmeier 2023-12-08 13:56:09 UTC
... it happens every time I let the program run AND a substantial part is being written.

-- Martin
Comment 5 Martin Birgmeier 2024-01-03 08:33:40 UTC
Update: Not only the program itself but also unrelated programs may be killed; here is an excerpt of today's /var/log/messages:

Jan  3 09:08:55 hal kernel: pid 4892 (cawic), jid 0, uid 20201, was killed: failed to reclaim memory
Jan  3 09:08:57 hal kernel: pid 892 (Xorg), jid 0, uid 0, was killed: failed to reclaim memory
Jan  3 09:08:59 hal kernel: pid 644 (named), jid 0, uid 53, was killed: failed to reclaim memory
Jan  3 09:09:00 hal kernel: pid 2891 (xterm), jid 0, uid 0, was killed: failed to reclaim memory
Jan  3 09:09:01 hal devd[388]: notify_clients: send() failed; dropping unresponsive client
Jan  3 09:09:02 hal kernel: pid 4842 (xterm), jid 0, uid 20201, was killed: failed to reclaim memory
Jan  3 09:09:18 hal kernel: pid 2917 (xterm), jid 0, uid 80, was killed: failed to reclaim memory

The process was running as a regular user, so this could easily be used for a DoS attack.

-- Martin
Comment 6 Martin Birgmeier 2024-01-03 08:47:20 UTC
Output of "vmstat -c 1000" just before the program gets killed:

 procs    memory    page                      disks     faults       cpu
 r  b  w  avm  fre  flt  re  pi  po   fr   sr ad0 ad1   in   sy   cs us sy id
 1  0 21 229G 357M  22K   0 700 376  66K  69K 603 772 5495  97K  51K  1 42 57
 3  0 21 229G 336M  22K   0 703 480  26K  53K 376 441 3907 100K  63K  2 15 84
 0  0 21 229G 418M  23K   0 731  32  44K 4.3K 210 219 3011 105K  61K  2 12 85
 2  0 21 229G 326M  22K   0 702  52  527 5.6K   5   5 1580 101K  53K  2  7 91
 0  0 23 229G 358M  21K 1713 670 572  48K 254K   0   0 1505  96K  51K  3 12 85
 1  0 22 229G 453M  20K 4262 648 1609  91K 166K 426 677 4658  90K  45K  2 48 49
 0  0 22 229G 369M  22K 1504 718 344 4.9K  33K 292 355 3681 103K  63K  2 11 87
 0  0 22 229G 317M  22K   0 702 159  22K  16K 520 610 5071  99K  54K  3 22 75
 0  0 23 229G 161M  22K 4419 712   0  17K  74K 432 747 4953  99K  58K  2 34 64
 0  0 21 229G 435M 7.9K 1742 248   0  88K  15K 355 444 3443  36K  32K  1 17 82
 0  0 21 1.4G 408M    6   0   0 1955  65K  63K 692 803 5242  331  14K  0 45 55
 0  0 21 1.4G 387M    6   0   0   0    3 2.7K 697 973 5925  293  25K  0  5 95
 0  0 21 1.4G 384M    6   0   0 1956    0 2.8K 677 705 4968  283  14K  0 15 85
 0  0 21 1.4G 382M    4   0   0   0    0 2.7K 768 1273 7037  269  26K  0 30 70
 0  0 21 1.4G 382M    6   0   0 1956    0 2.5K 487 488 3809  302 9.9K  0 11 89

The last five lines are after the kill.
Comment 7 Martin Birgmeier 2024-01-03 09:16:49 UTC
Watching the vmstat output, it seems that there is no pageout activity for a long time, even while the cawic process has already started rewriting substantial amounts of the mapped file.

Maybe the OS is busy paging in, completely "forgetting" to also write back the changed pages.

Or perhaps rewritten pages of mapped files do not register as "memory pressure"...

-- Martin
Comment 8 Martin Birgmeier 2024-01-03 09:17:48 UTC
Another kill with associated vmstat output:

 0  0 14  17G 333M  14K   0 434   0  15K  18K   0   0 1373  62K  34K  1  5 94
 0  0 14  17G 335M  14K   0 422   0  15K  18K   0   0 1354  61K  33K  1  6 93
 0  0 14  17G 336M  13K   0 420   0  18K  19K 362 391 3952  60K  38K  1 11 89
 2  0 22  17G 310M  11K 2151 358 439  13K 790K  14  28 1426  51K  29K  1 12 87
 3  0 22  17G 456M  14K 3058 436 1450  97K 133K   0  23 1419  62K  35K  1 13 87
 3  0 22  17G 398M  14K   0 432  39  445 5.2K   0   6 1400  62K  34K  2  6 93
 0  0 22  17G 345M  14K   0 424   0  444 3.5K   1   0 1350  61K  33K  2  6 93
 1  0 22  17G 108M  14K 2408 426  82  14K  63K 1397 1677 11128  58K  39K  2 55 43
 1  0 21  17G 298M  14K 4834 422 106  65K  33K 231 249 3033  60K  39K  2  9 89
 0  0 21  17G 268M  13K 4477 420 1854  67K 171K 154 172 2484  60K  35K  2 20 78
 0  0 21  17G 311M  13K 4010 416   0  51K  24K 1469 1765 11330  58K  43K  2 53 45
 3  0 22  17G 380M  14K 4779 433 655  51K  77K   6  22 1430  62K  34K  1  9 89
 0  0 22  17G 459M  14K 4465 423 469  51K  64K   0   1 1379  61K  34K  1  6 93
 0  0 22  17G 406M  14K 4982 424 468  15K  62K   0   0 1386  61K  33K  1  7 92
 0  0 22  17G 349M  14K 4991 430 381  12K  55K 158 443 2957  62K  33K  2 28 70
 3  0 21  17G 284M  14K 4670 430 256  31K  27K 672 1080 7129  62K  41K  2 38 60
 0  0 21  17G 396M  13K 5051 420 650  58K  93K   0   0 1375  60K  34K  2  9 89
 0  0 22  17G 344M  14K 4239 422 470  15K  60K   0   5 1380  61K  33K  1  8 91
 0  0 21  17G 406M  14K 843 422 240  37K  23K   0   0 1354  61K  34K  1  8 91
 1  0 21  17G 319M  14K   0 427   0  11K 3.5K 479 779 5628  61K  38K  2 27 72
 procs    memory    page                      disks     faults       cpu
 r  b  w  avm  fre  flt  re  pi  po   fr   sr ad0 ad1   in   sy   cs us sy id
 1  0 21  17G 401M  13K 3450 409 649  57K 100K 759 815 6777  59K  41K  1 28 71
 0  0 21  17G 337M  14K   0 428  84  446 7.7K   1   0 1367  62K  34K  2  4 94
 0  0 22  17G 313M  14K 1796 428 523  24K  80K   0   1 1366  61K  34K  0  7 93
 0  0 22  17G 336M  13K   0 416   0  19K 3.5K   0   0 1346  60K  33K  1  5 94
 0  0 22  17G 264M  13K 3718 406 1015  32K 110K 402 474 4009  58K  35K  1 29 70
 1  0 22  17G 139M  14K 5251 436   0  21K  28K 657 951 6062  62K  49K  1 30 69
 1  0 20 1.4G 422M 8.4K 2401 330 1569 132K  65K 215 276 2965  37K  25K  1 18 81
 0  0 20 1.4G 351M  326  17   0   0  500 3.3K 1943 2010 14499  544  32K  0 49 51
 0  0 19 1.4G 429M   19   0   0 1956  20K  24K 1588 1900 12727  174  30K  0 56 44
 0  0 19 1.4G 429M    2   0   0 1956    0 3.6K 1177 1216 8171  269  20K  0 39 61
 0  0 19 1.4G 429M    2   0   0   0    0 3.6K 350 584 3307  179 8.3K  0 21 79
 0  0 19 1.4G 348M    2   0   0 3213  52K  40K 772 1188 6669  162  17K  0 61 39
Comment 9 Martin Birgmeier 2024-01-04 10:09:40 UTC
After adding an msync call for the region just written, the program runs to completion.

    msync(curloc, nread, MS_ASYNC);

This should not be necessary, as the system should flush dirty pages even without it.
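
For context, a sketch of where such a call could sit, continuing the hypothetical compare-and-write step from comment 3 (the surrounding function is an assumption; only the msync line itself is from the actual program):

    #include <string.h>
    #include <sys/mman.h>

    /* After rewriting a block, ask the kernel to schedule an asynchronous
     * writeback of the dirtied pages instead of letting them accumulate.
     * curloc must be page-aligned, which holds for 128k blocks of a
     * page-aligned mapping. */
    static void
    write_block_flushing(char *curloc, const char *buf, size_t nread)
    {
        if (memcmp(curloc, buf, nread) != 0) {
            memcpy(curloc, buf, nread);
            msync(curloc, nread, MS_ASYNC);
        }
    }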

-- Martin
Comment 10 Martin Birgmeier 2024-01-07 07:43:24 UTC
In bug 195972 I used the same cawic program; at that time it did not exhibit the issue described here.