Bug 110498 - net-snmp proc monitoring randomly fails
Summary: net-snmp proc monitoring randomly fails
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Jun Kuriyama
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-19 03:40 UTC by Mike Andrews
Modified: 2007-03-25 13:40 UTC (History)
0 users

See Also:


Attachments
file.diff (180 bytes, patch)
2007-03-19 03:40 UTC, Mike Andrews

Description Mike Andrews 2007-03-19 03:40:04 UTC
With net-snmp 5.3.1 and FreeBSD 6.2-RELEASE (i386 or amd64) the "proc"
monitoring facility will randomly indicate alarms that certain processes
are not running (or not enough are running) when in fact they actually are.
The alarms will suddenly start with no warning and then clear themselves
up several hours later.

If you have Nagios checking these alarms, it can be highly annoying. :)

I'm fairly certain net-snmp 5.2.x and earlier don't have this problem
(I've been using them for years).

The problem is that net-snmp uses /bin/ps to get a list of processes
and writes the output of ps to /var/net-snmp/.snmp-exec-cache.  The
file is truncated at 16000 bytes.  This is way too small for systems
with many hundreds of running processes at a time.

Maybe previous versions (5.2.x and earlier) of net-snmp used something
other than /bin/ps to get the process list?  I don't have a procfs
filesystem mounted (I did try mounting one to see if it'd help, and it didn't).

Fix: Try this patch, though only the second half of it seems to actually fix it:


*** acconfig.h.orig     Fri May 26 12:36:06 2006
--- acconfig.h  Sun Mar 18 22:24:27 2007
***************
*** 488,494 ****

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (200*80)   /* roughly 200 lines max */

  /* misc defaults */

--- 488,494 ----

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (1500*80)   /* roughly 1500 lines max */

  /* misc defaults */

***************
*** 1334,1340 ****

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (200*80)   /* roughly 200 lines max */

  /* misc defaults */

--- 1334,1340 ----

  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (1500*80)   /* roughly 1500 lines max */

  /* misc defaults */

How-To-Repeat: 
bourbon# grep proc /usr/local/share/snmp/snmpd.conf
proc syslogd 1 1
proc httpd
proc ntpd 1 1
proc smartd
proc clamd
proc freshclam
bourbon# ps -U vscan | grep clam
84154  ??  Is     0:00.18 /usr/local/bin/freshclam --daemon -p /var/run/clamav/freshclam.pid
84265  ??  Is     0:04.61 /usr/local/sbin/clamd
bourbon# snmpwalk -v 2c -c ___ localhost .1.3.6.1.4.1.2021.2.1
UCD-SNMP-MIB::prIndex.1 = INTEGER: 1
UCD-SNMP-MIB::prIndex.2 = INTEGER: 2
UCD-SNMP-MIB::prIndex.3 = INTEGER: 3
UCD-SNMP-MIB::prIndex.4 = INTEGER: 4
UCD-SNMP-MIB::prIndex.5 = INTEGER: 5
UCD-SNMP-MIB::prIndex.6 = INTEGER: 6
UCD-SNMP-MIB::prNames.1 = STRING: syslogd
UCD-SNMP-MIB::prNames.2 = STRING: httpd
UCD-SNMP-MIB::prNames.3 = STRING: ntpd
UCD-SNMP-MIB::prNames.4 = STRING: smartd
UCD-SNMP-MIB::prNames.5 = STRING: clamd
UCD-SNMP-MIB::prNames.6 = STRING: freshclam
UCD-SNMP-MIB::prMin.1 = INTEGER: 1
UCD-SNMP-MIB::prMin.2 = INTEGER: 0
UCD-SNMP-MIB::prMin.3 = INTEGER: 1
UCD-SNMP-MIB::prMin.4 = INTEGER: 0
UCD-SNMP-MIB::prMin.5 = INTEGER: 0
UCD-SNMP-MIB::prMin.6 = INTEGER: 0
UCD-SNMP-MIB::prMax.1 = INTEGER: 1
UCD-SNMP-MIB::prMax.2 = INTEGER: 0
UCD-SNMP-MIB::prMax.3 = INTEGER: 1
UCD-SNMP-MIB::prMax.4 = INTEGER: 0
UCD-SNMP-MIB::prMax.5 = INTEGER: 0
UCD-SNMP-MIB::prMax.6 = INTEGER: 0
UCD-SNMP-MIB::prCount.1 = INTEGER: 1
UCD-SNMP-MIB::prCount.2 = INTEGER: 345
UCD-SNMP-MIB::prCount.3 = INTEGER: 1
UCD-SNMP-MIB::prCount.4 = INTEGER: 1
UCD-SNMP-MIB::prCount.5 = INTEGER: 0
UCD-SNMP-MIB::prCount.6 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.1 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.2 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.3 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.4 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.5 = INTEGER: 1
UCD-SNMP-MIB::prErrorFlag.6 = INTEGER: 1
UCD-SNMP-MIB::prErrMessage.1 = STRING:
UCD-SNMP-MIB::prErrMessage.2 = STRING:
UCD-SNMP-MIB::prErrMessage.3 = STRING:
UCD-SNMP-MIB::prErrMessage.4 = STRING:
UCD-SNMP-MIB::prErrMessage.5 = STRING: No clamd process running.
UCD-SNMP-MIB::prErrMessage.6 = STRING: No freshclam process running.
UCD-SNMP-MIB::prErrFix.1 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.2 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.3 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.4 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.5 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.6 = INTEGER: 0
UCD-SNMP-MIB::prErrFixCmd.1 = STRING:
UCD-SNMP-MIB::prErrFixCmd.2 = STRING:
UCD-SNMP-MIB::prErrFixCmd.3 = STRING:
UCD-SNMP-MIB::prErrFixCmd.4 = STRING:
UCD-SNMP-MIB::prErrFixCmd.5 = STRING:
UCD-SNMP-MIB::prErrFixCmd.6 = STRING:
bourbon# ps -U vscan | grep clam
84154  ??  Is     0:00.18 /usr/local/bin/freshclam --daemon -p /var/run/clamav/freshclam.pid
84265  ??  Is     0:04.61 /usr/local/sbin/clamd
bourbon# ps -acx | grep httpd | wc
     744    3720   23808

(744 > 345)   ;-)
Comment 1 Edwin Groothuis freebsd_committer freebsd_triage 2007-03-19 06:55:10 UTC
Responsible Changed
From-To: freebsd-ports-bugs->kuriyama

Over to maintainer
Comment 2 dfilter service freebsd_committer freebsd_triage 2007-03-25 13:35:54 UTC
kuriyama    2007-03-25 12:35:46 UTC

  FreeBSD ports repository

  Modified files:
    net-mgmt/net-snmp    Makefile 
    net-mgmt/net-snmp/files snmpd.sh.in 
  Added files:
    net-mgmt/net-snmp/files patch-net-snmp-config.h.in 
  Log:
  - Remove "sig_stop=KILL" in snmpd.sh.in.  This was introduced when
    PR ports/63759 was committed (3 years ago).  Try to use normal TERM
    signal for graceful termination [1].
  - Increase /bin/ps cache size from 16KB to 120KB.  This should fix
    process counter (ex prCount.1) on the server which has large number
    of processes [2].
  
  PR:             ports/103811 [1], ports/110498 [2]
  Reported by:    Yuri Arabadji <yuri@deepunix.net> [1],
                  Mike Andrews <mandrews@bit0.com> [2]
  
  Revision  Changes    Path
  1.141     +1 -1      ports/net-mgmt/net-snmp/Makefile
  1.1       +11 -0     ports/net-mgmt/net-snmp/files/patch-net-snmp-config.h.in (new)
  1.6       +1 -2      ports/net-mgmt/net-snmp/files/snmpd.sh.in
Comment 3 Jun Kuriyama freebsd_committer freebsd_triage 2007-03-25 13:36:35 UTC
State Changed
From-To: open->closed

Increased to 120KB as your patch.  Thanks!