Bug 248537 - procstat -e/kvm_getenvv() fails for specific processes
Summary: procstat -e/kvm_getenvv() fails for specific processes
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 12.1-RELEASE
Hardware: Any Any
Importance: --- Affects Some People
Assignee: freebsd-bugs (Nobody)
Reported: 2020-08-08 14:41 UTC by Armin Gruner
Modified: 2020-08-08 16:37 UTC

Description Armin Gruner 2020-08-08 14:41:40 UTC
While implementing *BSD platform support for sysutils/py-psutil, I've encountered strange behaviour when using either the libprocstat or the kvm interface to retrieve the environment of a foreign process on the system:

Both kvm_getenvv() and procstat_getenvv() return ENOMEM on some systems.

This is reliably reproducible, e.g. for the X11 Xorg server process; it can already be seen with the system utilities as well:

» pgrep Xorg
1401

» sudo ps -e -p 1401
 PID TT  STAT      TIME COMMAND
1401 v0  S    646:37.30  /usr/local/bin/Xorg :0 -listen tcp
 (shows no environment and swallows the error)

» sudo procstat -e 1401
  PID COMM             ENVIRONMENT
procstat: sysctl(kern.proc.env): Cannot allocate memory
 1401 Xorg             -
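
The error message above names the sysctl that fails. A minimal sketch of issuing that request directly, assuming the standard FreeBSD MIB { CTL_KERN, KERN_PROC, KERN_PROC_ENV, pid } and assuming the result is a packed sequence of NUL-terminated strings. The helper names read_env_block() and count_env_strings() are hypothetical, and on non-FreeBSD systems the first helper simply fails with ENOSYS:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#ifdef __FreeBSD__
#include <sys/sysctl.h>
#endif

/*
 * Hypothetical helper: fetch the raw environment block of `pid` into buf.
 * On entry *len is the buffer size; on success it holds the byte count.
 * Returns 0 on success, or -1 with errno set. An errno of ENOMEM is the
 * failure that procstat -e reports in this bug.
 */
int
read_env_block(pid_t pid, char *buf, size_t *len)
{
#ifdef __FreeBSD__
	int mib[4] = { CTL_KERN, KERN_PROC, KERN_PROC_ENV, (int)pid };

	return (sysctl(mib, 4, buf, len, NULL, 0));
#else
	/* The kern.proc.env sysctl only exists on FreeBSD. */
	(void)pid;
	(void)buf;
	(void)len;
	errno = ENOSYS;
	return (-1);
#endif
}

/*
 * Hypothetical helper: count the strings packed into buf[0..len).
 * Assumes each entry is terminated by a NUL byte; a trailing entry
 * without a terminator is still counted once.
 */
size_t
count_env_strings(const char *buf, size_t len)
{
	size_t off = 0, n = 0;

	while (off < len) {
		const char *nul = memchr(buf + off, '\0', len - off);
		size_t slen = (nul != NULL) ? (size_t)(nul - (buf + off)) : len - off;

		off += slen + 1;	/* skip the string and its terminator */
		n++;
	}
	return (n);
}
```

Calling read_env_block() for the affected PID and printing strerror(errno) on failure should show the same ENOMEM without going through libprocstat or libkvm, which helps rule out the libraries themselves.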


I've tried to track this down and searched for similar bug reports, but none of the suggestions mentioned there work.

1) Both library functions internally use the sysctl() interface to retrieve the information from the kernel. sysctl() uses _locked memory_ for transferring the data. So, a resource limit may be hit.

But even if I raise the locked memory soft and hard limits (``ulimit -l -H ...´´), the error stays.
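
To confirm which locked-memory limits the querying process actually runs with, a minimal sketch using getrlimit(2) with RLIMIT_MEMLOCK can help. The helper names get_memlock_limits() and print_memlock_limit() are hypothetical. Values are reported by the system in bytes, whereas limits(1) prints kB:

```c
#include <stdio.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: read the soft and hard locked-memory limits of
 * the calling process. Returns 0 on success, -1 with errno set.
 */
int
get_memlock_limits(struct rlimit *out)
{
	return (getrlimit(RLIMIT_MEMLOCK, out));
}

/* Hypothetical helper: print one limit in kB, or "infinity". */
void
print_memlock_limit(const char *label, rlim_t value)
{
	if (value == RLIM_INFINITY)
		printf("%s: infinity\n", label);
	else
		printf("%s: %llu kB\n", label, (unsigned long long)(value / 1024));
}
```

Calling get_memlock_limits() and then print_memlock_limit("soft", rl.rlim_cur) and print_memlock_limit("hard", rl.rlim_max) from the same process that issues the sysctl shows whether a raised limit really took effect there.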


2) Even if I raise **vm.max_wired** to make sure no global limit is reached, the error stays.


There is another scenario which triggers the problem:
When using the 'Cirrus CI' continuous build platform (which runs on Google Compute Engine, to my knowledge), there is also a process whose environment cannot be retrieved with the system interfaces kvm_getenvv() / procstat_getenvv():

sysctl vm.max_wired vm.stats.vm.v_wire_count
vm.max_wired: 331490
vm.stats.vm.v_wire_count: 136164

limits
Resource limits (current):
  cputime              infinity secs
  filesize             infinity kB
  datasize             33554432 kB
  stacksize              524288 kB
  coredumpsize         infinity kB
  memoryuse            infinity kB
  memorylocked           131072 kB
  maxprocesses             8499
  openfiles              116856
  sbsize               infinity bytes
  vmemoryuse           infinity kB
  pseudo-terminals     infinity
  swapuse              infinity kB
  kqueues              infinity
  umtxp                infinity

ps auxm
USER    PID  %CPU %MEM    VSZ    RSS TT  STAT STARTED     TIME COMMAND
root  25598   0.0  7.0 990596 291548 u0  S+   23:58    0:12.63 ./cirrus-ci-agent -task-id 5900109814693888 -client-token d5950e87a4cc4ce89436b07589944d60 -server-token abae919382e6411787e4d67fc60dd527 -api-endpoint grpc.cirrus-ci.com:443
root      1   0.0  0.0   9788    284  -  ILs  23:52    0:00.01 /sbin/init --
root    453   0.0  0.0  10456   1460  -  Ss   23:52    0:00.00 /sbin/devd
[....]
root     22   0.0  0.0  12320   1596 u0  Is+  23:52    0:00.04 sh /etc/rc autoboot
root  25548   0.0  0.0  12320   1848 u0  I+   23:58    0:00.00 sh /etc/rc autoboot
[....]
root      0   0.0  0.0      0    320  -  DLs  23:52    0:00.00 [kernel]
[....]

procstat -e 22
procstat: sysctl(kern.proc.env): Cannot allocate memory
  PID COMM             ENVIRONMENT
   22 sh               -
Comment 1 Konstantin Belousov freebsd_committer 2020-08-08 14:51:34 UTC
Some possible reasons for ENOMEM from kern.proc.env are:
- env vector corruption, e.g. if the application filled the env vector with invalid (or NULL) pointers
- the application made the env vector or the set of env strings larger than ARG_MAX.

From the kernel's PoV, the environment strings exist only at the moment of the execve(2) call,
when the strings for args and env are passed through the kernel from the previous program to the new one.
Between execs, it is up to usermode to maintain the env strings in whatever way it finds most convenient.
The kern.proc.env sysctl is a hack to satisfy a popular request, and it assumes that the application did
not deviate much from the structure passed to the new program on exec.  If it did deviate,
the kernel cannot do much.
Comment 2 Armin Gruner 2020-08-08 16:28:39 UTC
(In reply to Konstantin Belousov from comment #1)

Hi Konstantin,
thanks for the quick reply and the explanations so far.

However, in the particular cases I've mentioned, I still cannot understand why processes like

- the X11 Xorg server
- /bin/sh launched for system autoboot like seen in the Cirrus CI box

would deviate or corrupt their environment?!
Comment 3 Armin Gruner 2020-08-08 16:32:34 UTC
(In reply to Armin Gruner from comment #2)

Okay, I think I begin to understand a bit more after thinking about your explanations.

Is it that a process called setenv() after launch, so that **envp had to be reallocated to fit the new entry, and that is the deviation you meant?
Comment 4 Konstantin Belousov freebsd_committer 2020-08-08 16:37:20 UTC
(In reply to Armin Gruner from comment #3)
Yes, this is one of the most common cases.  I highly doubt that it is a corruption.
It is just a state that libc sets up internally at the app's request.
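
The libc-internal state change discussed in comments 3 and 4 can be observed from userland. The following is a minimal sketch, assuming a libc that allocates a fresh vector the first time a previously unset variable is added (an assumption about common libc implementations, not something the kernel guarantees). The helper name environ_moves_on_setenv() is hypothetical:

```c
/* Request POSIX.1-2001 so that setenv() is declared under strict -std modes. */
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

extern char **environ;

/*
 * Hypothetical helper: add a variable that is not yet set and report
 * whether libc replaced the environ vector while doing so.
 * Returns 1 if environ now points elsewhere, 0 if it did not move,
 * and -1 if setenv() itself failed.
 */
int
environ_moves_on_setenv(const char *new_name)
{
	char **before = environ;	/* vector in use before the call */

	if (setenv(new_name, "1", 1) != 0)
		return (-1);
	return (environ != before);
}
```

If the function returns 1 for a process that has not touched its environment before, the vector that libc now maintains is no longer the one that was passed in at exec time, which is the kind of deviation described above.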