While implementing *BSD platform support for sysutils/py-psutils, I've encountered strange behaviour when using either the libprocstat or the kvm interface to retrieve the environment of a foreign process on the system: both kvm_getenvv() and procstat_getenvv() fail with ENOMEM on some systems.

This is reliably reproducible, e.g. for the X11 Xorg server process, and it can already be seen with the system utilities as well:

» pgrep Xorg
1401
» sudo ps -e -p 1401
 PID TT  STAT      TIME COMMAND
1401 v0  S    646:37.30 /usr/local/bin/Xorg :0 -listen tcp

(shows no environment and swallows the error)

» sudo procstat -e 1401
  PID COMM             ENVIRONMENT
procstat: sysctl(kern.proc.env): Cannot allocate memory
 1401 Xorg             -

I've tried to track this down and searched for similar bug reports, but none of the suggestions mentioned there work.

1) Both library functions internally use the sysctl() interface to retrieve the information from the kernel. sysctl() uses _locked memory_ for transferring the data, so a resource limit may be hit. But even if I raise the locked memory soft and hard limits (ulimit -l -H ...), the error stays.

2) Even if I raise **vm.max_wired** to make sure no global limit is reached, the error stays.
There is another scenario which triggers the problem: when using the 'Cirrus CI' continuous build platform (which runs on Google Compute Engine, to my knowledge), there is also a process whose environment cannot be retrieved with the system interfaces kvm_getenvv() / procstat_getenvv():

» sysctl vm.max_wired vm.stats.vm.v_wire_count
vm.max_wired: 331490
vm.stats.vm.v_wire_count: 136164

» limits
Resource limits (current):
  cputime          infinity secs
  filesize         infinity kB
  datasize         33554432 kB
  stacksize          524288 kB
  coredumpsize     infinity kB
  memoryuse        infinity kB
  memorylocked       131072 kB
  maxprocesses         8499
  openfiles          116856
  sbsize           infinity bytes
  vmemoryuse       infinity kB
  pseudo-terminals infinity
  swapuse          infinity kB
  kqueues          infinity
  umtxp            infinity

» ps auxm
USER   PID %CPU %MEM    VSZ    RSS TT  STAT STARTED    TIME COMMAND
root 25598  0.0  7.0 990596 291548 u0  S+   23:58   0:12.63 ./cirrus-ci-agent -task-id 5900109814693888 -client-token d5950e87a4cc4ce89436b07589944d60 -server-token abae919382e6411787e4d67fc60dd527 -api-endpoint grpc.cirrus-ci.com:443
root     1  0.0  0.0   9788    284 -   ILs  23:52   0:00.01 /sbin/init --
root   453  0.0  0.0  10456   1460 -   Ss   23:52   0:00.00 /sbin/devd
[....]
root    22  0.0  0.0  12320   1596 u0  Is+  23:52   0:00.04 sh /etc/rc autoboot
root 25548  0.0  0.0  12320   1848 u0  I+   23:58   0:00.00 sh /etc/rc autoboot
[....]
root     0  0.0  0.0      0    320 -   DLs  23:52   0:00.00 [kernel]
[....]

» procstat -e 22
procstat: sysctl(kern.proc.env): Cannot allocate memory
  PID COMM             ENVIRONMENT
   22 sh               -
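The locked-memory limit discussed in this report (the "memorylocked" row printed by limits) can also be inspected from Python, which may be handy when debugging from the psutil side. The following is only a minimal sketch: the helper names memlock_limits_kib() and fits_under_limit() are made up for illustration, and it only reads the limit. It does not reproduce or explain the ENOMEM itself.

```python
import resource


def memlock_limits_kib():
    """Return (soft, hard) RLIMIT_MEMLOCK in KiB; None means unlimited."""
    def to_kib(value):
        return None if value == resource.RLIM_INFINITY else value // 1024

    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return to_kib(soft), to_kib(hard)


def fits_under_limit(request_bytes, soft_kib):
    """True if a locked buffer of request_bytes fits under the soft limit."""
    return soft_kib is None or request_bytes <= soft_kib * 1024


if __name__ == "__main__":
    soft, hard = memlock_limits_kib()
    print("memorylocked soft/hard (KiB):", soft, hard)
```

If the reported soft limit is already far larger than any plausible environment block, the limit is an unlikely cause, which matches the observation that raising it did not help.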
Some possible reasons for ENOMEM from kern.proc.env are:
- env vector corruption, e.g. if the application filled the env vector with invalid (or NULL) pointers
- the application made the env vector or the set of env strings larger than ARG_MAX.

From the kernel PoV, the environment strings exist only at the moment of the execve(2) call, when the strings for args and env are passed through the kernel from the previous program to the new one. Between execs, it is up to usermode to maintain the env strings in whatever way it finds most convenient.

Sysctl kern.proc.env is a hack to satisfy a popular request, and it assumes that the application did not deviate much from the structure passed to the new program on exec. If it did deviate, the kernel cannot do much.
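The two failure conditions listed above can be illustrated with a toy model. This is not the kernel code: ToyProcess, read_env() and the addresses are hypothetical, and the model only assumes that the reader walks a vector of pointers recorded at exec time and gives up with ENOMEM when a pointer cannot be followed or the total size exceeds a limit standing in for ARG_MAX.

```python
import errno


class ToyProcess:
    """Hypothetical stand-in for the address space of a target process."""

    def __init__(self, strings, env_vector):
        self.strings = strings        # address -> bytes of one "NAME=value" string
        self.env_vector = env_vector  # list of addresses, as laid out at exec


def read_env(proc, arg_max):
    """Walk the env vector; return (0, strings) or (errno.ENOMEM, [])."""
    total = 0
    result = []
    for addr in proc.env_vector:
        data = proc.strings.get(addr)
        if data is None:
            # NULL or otherwise unreadable pointer in the vector.
            return errno.ENOMEM, []
        total += len(data) + 1  # account for the terminating NUL
        if total > arg_max:
            # Environment grew beyond the assumed size limit.
            return errno.ENOMEM, []
        result.append(data)
    return 0, result


if __name__ == "__main__":
    intact = ToyProcess({0x1000: b"HOME=/root"}, [0x1000])
    broken = ToyProcess({0x1000: b"HOME=/root"}, [0x1000, 0])
    print(read_env(intact, 4096))
    print(read_env(broken, 4096))
```

In this model the caller cannot tell the two causes apart, since both surface as the same error code.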
(In reply to Konstantin Belousov from comment #1)

Hi Konstantin, thanks for the quick reply and the explanations so far.

However, in the particular cases I've mentioned, I still cannot understand why processes like
- the X11 Xorg server
- /bin/sh launched for system autoboot, as seen on the Cirrus CI box
would deviate from or corrupt their environment?!
(In reply to Armin Gruner from comment #2)

Okay, I think I begin to understand a bit more after thinking about your explanations. Is it that when a process calls setenv() after launch, **envp has to be reallocated so that a new entry fits in, and that this is the deviation you meant?
(In reply to Armin Gruner from comment #3)

Yes, this is one of the most common cases. I highly doubt that it is corruption. It is just a state created internally by libc in response to the app's request.
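The effect discussed in the last two comments can be observed from userland. The sketch below uses ctypes to read the C library's environ pointer before and after adding a variable. It assumes that the running Python exposes the environ symbol through ctypes.CDLL(None), which is platform dependent; the helper names and ENV_DEMO_VAR are made up for illustration. Whether the pointer actually moves depends on the libc, so the script only prints the comparison.

```python
import ctypes
import os

# Handle to the symbols already loaded into this process (includes libc).
_libc = ctypes.CDLL(None)


def environ_address():
    """Return the current value of the C global `char **environ` as an int."""
    return ctypes.c_void_p.in_dll(_libc, "environ").value


def environ_entries():
    """Return the NULL-terminated environ vector as a list of bytes objects."""
    envp = ctypes.POINTER(ctypes.c_char_p).in_dll(_libc, "environ")
    entries = []
    i = 0
    while envp[i] is not None:
        entries.append(envp[i])
        i += 1
    return entries


if __name__ == "__main__":
    before = environ_address()
    os.putenv("ENV_DEMO_VAR", "1")  # hands the change to the C library
    after = environ_address()
    print("environ before:", hex(before))
    print("environ after: ", hex(after))
    print("vector moved:", before != after)
```

If the vector does move, the process is running with an environment that libc manages on its own, away from the layout that was handed over at exec time.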