Scenario:
- FreeBSD stable/14 52369c5d29f5f, amd64
- 244 GB file on ZFS
- The file is mmap'ed in its entirety by a program called "cawic".
- Read and write operations are performed on selected portions of this mapping: the contents are read sequentially from beginning to end in blocks of size 128k, and some of these blocks are immediately rewritten with updated data, such that ultimately all blocks have been read and some have been rewritten.

Result:
- The process is killed, with the following message in dmesg:
  pid 28386 (cawic), jid 0, uid 20201, was killed: failed to reclaim memory
- Swap space was not used.

Expected result:
- The process runs to completion successfully.

Notes:
- This was working correctly (i.e., the process was never killed) some time before FreeBSD 12; with FreeBSD 12 the same issue occurred.
- The ZFS filesystem on which the file resides has compression enabled.
- If no blocks are ever written (all are only read), the process is not killed.
> This was working correctly (i.e., the process never was killed) some time before FreeBSD 12; with FreeBSD 12 the same issue occurred.

Do you have any more detail on "some time before FreeBSD 12"? I.e., was it 10.x? 11.x? And to confirm: the issue is reproducible on some version of 12.x, and on 14.0?
Unfortunately not really... the program itself has existed since Sep. 2011, and it worked fine for quite some time. In May 2019 I added sleep(3) calls and removed them again shortly afterwards - probably because the problem surfaced then and I tried, apparently unsuccessfully, to lower the write load. I run this only every half year or so. The problem certainly existed on 12.3 and 12.4; I switched to stable/14 from 12.4. There does not seem to be a problem if the write load is rather small, but if a substantial portion of consecutive stretches of ca. 20 GB needs to be rewritten, the program is killed. The machine in question has 16 GB of main memory. -- Martin
The issue is certainly reproducible - it happens every time I let the program run. Some background: "cawic" stands for "compare and write if changed". It is basically a fancy way of copying stdin to a pre-existing file, but only those portions which have changed are actually written. The pre-existing file is a disk image; it is updated from the physical disk every few months (backup). The 128k block size was chosen to match the ZFS default record size. Writing only changed portions is intended to minimize snapshot sizes on the ZFS filesystem the file resides on. The current workaround is to restart the program until it runs to completion: the parts which have already been written are then only read, after which the next part is partially rewritten, and the run either fails there or finally succeeds. -- Martin
... it happens every time I let the program run AND a substantial part is being written. -- Martin
Update: Not only the program itself but also unrelated programs may be killed - here is an excerpt of today's /var/log/messages:

Jan 3 09:08:55 hal kernel: pid 4892 (cawic), jid 0, uid 20201, was killed: failed to reclaim memory
Jan 3 09:08:57 hal kernel: pid 892 (Xorg), jid 0, uid 0, was killed: failed to reclaim memory
Jan 3 09:08:59 hal kernel: pid 644 (named), jid 0, uid 53, was killed: failed to reclaim memory
Jan 3 09:09:00 hal kernel: pid 2891 (xterm), jid 0, uid 0, was killed: failed to reclaim memory
Jan 3 09:09:01 hal devd[388]: notify_clients: send() failed; dropping unresponsive client
Jan 3 09:09:02 hal kernel: pid 4842 (xterm), jid 0, uid 20201, was killed: failed to reclaim memory
Jan 3 09:09:18 hal kernel: pid 2917 (xterm), jid 0, uid 80, was killed: failed to reclaim memory

The process was running as a regular user, so this could easily be used as a DoS attack. -- Martin
Output of "vmstat -c 1000" just before the program gets killed:

procs    memory    page                       disks      faults       cpu
r b w  avm   fre  flt   re   pi  po    fr   sr  ad0  ad1   in    sy   cs us sy id
1 0 21 229G 357M  22K    0  700 376   66K  69K  603  772 5495   97K  51K  1 42 57
3 0 21 229G 336M  22K    0  703 480   26K  53K  376  441 3907  100K  63K  2 15 84
0 0 21 229G 418M  23K    0  731  32   44K 4.3K  210  219 3011  105K  61K  2 12 85
2 0 21 229G 326M  22K    0  702  52   527 5.6K    5    5 1580  101K  53K  2  7 91
0 0 23 229G 358M  21K 1713  670 572   48K 254K    0    0 1505   96K  51K  3 12 85
1 0 22 229G 453M  20K 4262  648 1609  91K 166K  426  677 4658   90K  45K  2 48 49
0 0 22 229G 369M  22K 1504  718 344  4.9K  33K  292  355 3681  103K  63K  2 11 87
0 0 22 229G 317M  22K    0  702 159   22K  16K  520  610 5071   99K  54K  3 22 75
0 0 23 229G 161M  22K 4419  712   0   17K  74K  432  747 4953   99K  58K  2 34 64
0 0 21 229G 435M 7.9K 1742  248   0   88K  15K  355  444 3443   36K  32K  1 17 82
0 0 21 1.4G 408M    6    0    0 1955  65K  63K  692  803 5242   331  14K  0 45 55
0 0 21 1.4G 387M    6    0    0   0     3 2.7K  697  973 5925   293  25K  0  5 95
0 0 21 1.4G 384M    6    0    0 1956    0 2.8K  677  705 4968   283  14K  0 15 85
0 0 21 1.4G 382M    4    0    0   0     0 2.7K  768 1273 7037   269  26K  0 30 70
0 0 21 1.4G 382M    6    0    0 1956    0 2.5K  487  488 3809   302 9.9K  0 11 89

The last five lines are after the kill.
Watching the vmstat output, it seems there is no pageout activity for a long time, even while the cawic process has already started rewriting substantial amounts of the mapped file. Maybe the OS is busy paging in and completely "forgets" about also writing back the changed pages. Or there is no "memory pressure" for rewritten pages of mapped files... -- Martin
Another kill with associated vmstat output:

0 0 14  17G 333M  14K    0 434   0   15K  18K    0    0  1373  62K 34K  1  5 94
0 0 14  17G 335M  14K    0 422   0   15K  18K    0    0  1354  61K 33K  1  6 93
0 0 14  17G 336M  13K    0 420   0   18K  19K  362  391  3952  60K 38K  1 11 89
2 0 22  17G 310M  11K 2151 358 439   13K 790K   14   28  1426  51K 29K  1 12 87
3 0 22  17G 456M  14K 3058 436 1450  97K 133K    0   23  1419  62K 35K  1 13 87
3 0 22  17G 398M  14K    0 432  39   445 5.2K    0    6  1400  62K 34K  2  6 93
0 0 22  17G 345M  14K    0 424   0   444 3.5K    1    0  1350  61K 33K  2  6 93
1 0 22  17G 108M  14K 2408 426  82   14K  63K 1397 1677 11128  58K 39K  2 55 43
1 0 21  17G 298M  14K 4834 422 106   65K  33K  231  249  3033  60K 39K  2  9 89
0 0 21  17G 268M  13K 4477 420 1854  67K 171K  154  172  2484  60K 35K  2 20 78
0 0 21  17G 311M  13K 4010 416   0   51K  24K 1469 1765 11330  58K 43K  2 53 45
3 0 22  17G 380M  14K 4779 433 655   51K  77K    6   22  1430  62K 34K  1  9 89
0 0 22  17G 459M  14K 4465 423 469   51K  64K    0    1  1379  61K 34K  1  6 93
0 0 22  17G 406M  14K 4982 424 468   15K  62K    0    0  1386  61K 33K  1  7 92
0 0 22  17G 349M  14K 4991 430 381   12K  55K  158  443  2957  62K 33K  2 28 70
3 0 21  17G 284M  14K 4670 430 256   31K  27K  672 1080  7129  62K 41K  2 38 60
0 0 21  17G 396M  13K 5051 420 650   58K  93K    0    0  1375  60K 34K  2  9 89
0 0 22  17G 344M  14K 4239 422 470   15K  60K    0    5  1380  61K 33K  1  8 91
0 0 21  17G 406M  14K  843 422 240   37K  23K    0    0  1354  61K 34K  1  8 91
1 0 21  17G 319M  14K    0 427   0   11K 3.5K  479  779  5628  61K 38K  2 27 72
procs    memory    page                       disks      faults       cpu
r b w  avm   fre  flt   re  pi  po    fr   sr  ad0  ad1    in   sy  cs us sy id
1 0 21  17G 401M  13K 3450 409 649   57K 100K  759  815  6777  59K 41K  1 28 71
0 0 21  17G 337M  14K    0 428  84   446 7.7K    1    0  1367  62K 34K  2  4 94
0 0 22  17G 313M  14K 1796 428 523   24K  80K    0    1  1366  61K 34K  0  7 93
0 0 22  17G 336M  13K    0 416   0   19K 3.5K    0    0  1346  60K 33K  1  5 94
0 0 22  17G 264M  13K 3718 406 1015  32K 110K  402  474  4009  58K 35K  1 29 70
1 0 22  17G 139M  14K 5251 436   0   21K  28K  657  951  6062  62K 49K  1 30 69
1 0 20 1.4G 422M 8.4K 2401 330 1569 132K  65K  215  276  2965  37K 25K  1 18 81
0 0 20 1.4G 351M  326   17   0   0   500 3.3K 1943 2010 14499  544 32K  0 49 51
0 0 19 1.4G 429M   19    0   0 1956  20K  24K 1588 1900 12727  174 30K  0 56 44
0 0 19 1.4G 429M    2    0   0 1956    0 3.6K 1177 1216  8171  269 20K  0 39 61
0 0 19 1.4G 429M    2    0   0   0     0 3.6K  350  584  3307  179 8.3K 0 21 79
0 0 19 1.4G 348M    2    0   0 3213  52K  40K  772 1188  6669  162 17K  0 61 39
After adding an msync call for the region just written, the program runs to completion:

msync(curloc, nread, MS_ASYNC);

This should not be necessary, as the system should flush the dirty pages even without it. -- Martin
In bug 195972 I used the same cawic program; at that time it did not exhibit the issue described here.