A system running buildbot workers for the upstream (C)Python project CI, on 12.2-RELEASE-p7 GENERIC amd64, sees swap usage increase over time (across multiple test runs) until 100% is consumed, at which point the following errors are generated:

Nov 23 10:10:57 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(18): failed
Nov 23 10:10:57 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(9): failed
Nov 23 10:10:57 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(3): failed
Nov 23 10:10:58 122-RELEASE-p10-amd64-9e36 kernel: pid 24131 (python), jid 0, uid 1002, was killed: out of swap space
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: pid 78211 (python3.9), jid 0, uid 1002, was killed: out of swap space
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(32): failed
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 syslogd: last message repeated 26 times
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(24): failed
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(32): failed
Nov 23 10:11:01 122-RELEASE-p10-amd64-9e36 kernel: swap_pager_getswapspace(24): failed

Usually, a single test run consumes 30-40% of the 8GB swap total, and a second or third run consumes the rest. The system is unable to swapoff, and memory/VM resource utilisation does not change after killing processes.

The issue appears not to be reproducible after updating to a stable/12 kernel and installing in-place.

Steps to reproduce:

- Install 12.2-RELEASE and update to the latest patch level (p7 at time of writing)
- Check out CPython (main) sources and run the test suite [1]

Additional References:

Buildbot worker build run history: https://buildbot.python.org/all/#/builders/172 (test failure output may provide additional info)

[1] test command: make buildbottest TESTOPTS=-j4 TESTTIMEOUT=2100

I will follow this report up with `sysctl -a | grep vm` and `vmstat -z` output before and after issue reproduction.
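To put numbers on the growth between runs, swap usage could be sampled after each `make buildbottest`. The following is only a rough sketch, assuming `swapinfo -k` prints a header line followed by one line per device with the columns Device, 1K-blocks, Used, Avail and Capacity. The function names are hypothetical and nothing here was run on the affected worker.

```python
import subprocess


def parse_swapinfo(text):
    """Return (used_kb, total_kb) summed over all swap devices.

    Assumed input: `swapinfo -k` output, i.e. a header line and then one
    line per device (Device, 1K-blocks, Used, Avail, Capacity). A "Total"
    summary line, if present, is skipped to avoid double counting.
    """
    used = total = 0
    for line in text.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) < 5 or fields[0] == "Total":
            continue
        total += int(fields[1])
        used += int(fields[2])
    return used, total


def swap_percent(text):
    """Percentage of swap in use, or 0.0 if no swap device is listed."""
    used, total = parse_swapinfo(text)
    return 100.0 * used / total if total else 0.0


def current_swap_percent():
    """Run swapinfo on the local (FreeBSD) host and return percent used."""
    out = subprocess.run(["swapinfo", "-k"], capture_output=True,
                         text=True, check=True).stdout
    return swap_percent(out)
```

Calling `current_swap_percent()` before the first run and again after each subsequent run would show whether usage really ratchets up by 30-40% per run without being released.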
Created attachment 229931 [details] top output STARTING (first run)
Created attachment 229932 [details] vmstat / sysctl output BEFORE
Created attachment 229933 [details] vmstat / sysctl output DURING (first run)
Created attachment 229934 [details] top output DURING (first run)
Do you get any OOM kills while the tests are running?
I tried building cpython and running the test suite in a 12.2 bhyve VM with 2GB RAM, and it does not swap at all. I suspect that some tests are being skipped due to missing dependencies, or I did not configure/build python in some specific way (I just ran ./configure && make), or the specified test invocation is not right somehow.

(In reply to Mark Johnston from comment #5) Never mind, I missed it in the first comment.
(In reply to Mark Johnston from comment #5) I'll see if I can reproduce outside of the buildbot environment, and watch the tests run. What am I looking for that indicates OOM kills specifically?
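For reference, the "was killed: out of swap space" kernel messages quoted in the first comment are the kind of line to look for. A minimal sketch for pulling them out of a saved system log follows, assuming the log lines have the same shape as those in the first comment. The helper names are hypothetical, and the regular expression is based only on those sample lines.

```python
import re

# Pattern modelled on the messages in the original report, e.g.
#   kernel: pid 24131 (python), jid 0, uid 1002, was killed: out of swap space
KILL_RE = re.compile(
    r"kernel: pid (?P<pid>\d+) \((?P<comm>[^)]+)\), .*was killed: (?P<reason>.+)$"
)


def find_oom_kills(lines):
    """Return a list of (pid, command, reason) for each kill message."""
    kills = []
    for line in lines:
        m = KILL_RE.search(line.rstrip())
        if m:
            kills.append((int(m.group("pid")), m.group("comm"),
                          m.group("reason")))
    return kills


def count_swap_alloc_failures(lines):
    """Count swap_pager_getswapspace failure lines.

    Lines collapsed by syslogd ("last message repeated N times") are not
    expanded, so this is a lower bound.
    """
    return sum(1 for line in lines
               if "swap_pager_getswapspace(" in line and "failed" in line)
```

Feeding it the lines of the system log (for example via `open(path).readlines()`) gives the PIDs and process names that were killed, which can then be matched against the test that was running at that time.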
Created attachment 229947 [details] vmstat / sysctl / top output REPROD

Woke up this morning with all swap used (see the attached file with sysctl vm, vmstat and top output).

Possibly related: buildbot 'leftover' processes remain at the end of the test run. Note: I've noticed these for quite a while, so they may not be related, but they are worth pointing out just in case.
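One way to compare the BEFORE and REPROD `vmstat -z` snapshots is to diff the per-zone usage. This is only a sketch, assuming each zone line has the form `NAME: SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP`. The function names are hypothetical, and swap-backed pages may well not show up in the zone statistics at all.

```python
def parse_vmstat_z(text):
    """Map zone name -> (item_size, used_count).

    Assumed line format: "NAME:  SIZE, LIMIT, USED, ...". The header,
    blank lines and anything that does not parse are skipped.
    """
    zones = {}
    for line in text.splitlines():
        name, sep, rest = line.rpartition(":")
        if not sep:
            continue
        fields = [f.strip() for f in rest.split(",")]
        if len(fields) < 3:
            continue
        if not (fields[0].isdigit() and fields[2].isdigit()):
            continue
        zones[name.strip()] = (int(fields[0]), int(fields[2]))
    return zones


def zone_growth(before, after):
    """Return [(zone, delta_bytes)] for zones that grew, largest first."""
    b, a = parse_vmstat_z(before), parse_vmstat_z(after)
    growth = []
    for zone, (size, used) in a.items():
        old_size, old_used = b.get(zone, (size, 0))
        delta = size * used - old_size * old_used
        if delta > 0:
            growth.append((zone, delta))
    return sorted(growth, key=lambda item: item[1], reverse=True)
```

Passing the text of the two attachments to `zone_growth()` lists the zones whose in-use bytes increased between the snapshots, which narrows down where to look if the growth is in kernel allocations.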
Cloning cpython main onto the same system to attempt reproduction outside of buildbot.

The only additional dependencies I installed for the worker are gdb and sqlite, for their Python modules and tests, which could well be relevant.

Here's a full list of possibly relevant packages installed on the system, with known python dependencies (bundled, tests, or otherwise) marked with an asterisk (*):

autoconf-2.69_3 = up-to-date with index
autoconf-wrapper-20131203 = up-to-date with index
ca_root_nss-3.71 = up-to-date with index
curl-7.80.0 = up-to-date with index
expat-2.4.1 = up-to-date with index *
gdb-11.1_1 = up-to-date with index *
gettext-runtime-0.21 = up-to-date with index
git-2.34.1 = up-to-date with index * (buildbot CI steps use this)
gmake-4.3_2 = up-to-date with index
gmp-6.2.1 = up-to-date with index
indexinfo-0.3.1 = up-to-date with index
libevent-2.1.12 = up-to-date with index
libffi-3.3_1 = up-to-date with index *
libiconv-1.16 = up-to-date with index *
libtextstyle-0.21 = up-to-date with index
m4-1.4.19,1 = up-to-date with index
mpdecimal-2.5.1 = up-to-date with index
mpfr-4.1.0_1 = up-to-date with index
pcre-8.45 = up-to-date with index
perl5-5.32.1_1 = up-to-date with index
pkgconf-1.8.0,1 = up-to-date with index
popt-1.18_1 = up-to-date with index
python39-3.9.8 = up-to-date with index
readline-8.1.1 = up-to-date with index *
sqlite3-3.35.5_4,1 = up-to-date with index *
I also ran a test with the jemalloc library's dirty_decay_ms=0 setting, but this changed nothing. Attaching log.
Sorry for the noise - made comment in wrong browser window, please disregard comment #10.
^Triage: clear stale flags. To submitter: is this PR still relevant?