With net-snmp 5.3.1 and FreeBSD 6.2-RELEASE (i386 or amd64), the "proc" monitoring facility will randomly raise alarms that certain processes are not running (or that not enough are running) when in fact they are. The alarms start suddenly with no warning and then clear themselves several hours later. If you have Nagios checking these alarms, it can be highly annoying. :)

I'm fairly certain net-snmp 5.2.x and earlier don't have this problem (I've been using them for years).

The problem is that net-snmp uses /bin/ps to get a list of processes and writes the output of ps to /var/net-snmp/.snmp-exec-cache. The file is truncated at 16000 bytes. This is far too small for systems with many hundreds of processes running at a time.

Maybe previous versions (5.2.x and earlier) of net-snmp used something other than /bin/ps to get the process list? I don't have a procfs filesystem mounted (I did try mounting one to see if it would help, and it didn't).

Fix:

Try this patch, though only the second half of it seems to actually fix it:

*** acconfig.h.orig Fri May 26 12:36:06 2006
--- acconfig.h Sun Mar 18 22:24:27 2007
***************
*** 488,494 ****
  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (200*80) /* roughly 200 lines max */
  /* misc defaults */
--- 488,494 ----
  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (1500*80) /* roughly 1500 lines max */
  /* misc defaults */

(second file; its header did not survive in the attachment)
***************
*** 1334,1340 ****
  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (200*80) /* roughly 200 lines max */
  /* misc defaults */
--- 1334,1340 ----
  #define EXCACHETIME 30
  #define CACHEFILE ".snmp-exec-cache"
! #define MAXCACHESIZE (1500*80) /* roughly 1500 lines max */
  /* misc defaults */

How-To-Repeat:

bourbon# grep proc /usr/local/share/snmp/snmpd.conf
proc syslogd 1 1
proc httpd
proc ntpd 1 1
proc smartd
proc clamd
proc freshclam

bourbon# ps -U vscan | grep clam
84154 ?? Is 0:00.18 /usr/local/bin/freshclam --daemon -p /var/run/clamav/freshclam.pid
84265 ?? Is 0:04.61 /usr/local/sbin/clamd

bourbon# snmpwalk -v 2c -c ___ localhost .1.3.6.1.4.1.2021.2.1
UCD-SNMP-MIB::prIndex.1 = INTEGER: 1
UCD-SNMP-MIB::prIndex.2 = INTEGER: 2
UCD-SNMP-MIB::prIndex.3 = INTEGER: 3
UCD-SNMP-MIB::prIndex.4 = INTEGER: 4
UCD-SNMP-MIB::prIndex.5 = INTEGER: 5
UCD-SNMP-MIB::prIndex.6 = INTEGER: 6
UCD-SNMP-MIB::prNames.1 = STRING: syslogd
UCD-SNMP-MIB::prNames.2 = STRING: httpd
UCD-SNMP-MIB::prNames.3 = STRING: ntpd
UCD-SNMP-MIB::prNames.4 = STRING: smartd
UCD-SNMP-MIB::prNames.5 = STRING: clamd
UCD-SNMP-MIB::prNames.6 = STRING: freshclam
UCD-SNMP-MIB::prMin.1 = INTEGER: 1
UCD-SNMP-MIB::prMin.2 = INTEGER: 0
UCD-SNMP-MIB::prMin.3 = INTEGER: 1
UCD-SNMP-MIB::prMin.4 = INTEGER: 0
UCD-SNMP-MIB::prMin.5 = INTEGER: 0
UCD-SNMP-MIB::prMin.6 = INTEGER: 0
UCD-SNMP-MIB::prMax.1 = INTEGER: 1
UCD-SNMP-MIB::prMax.2 = INTEGER: 0
UCD-SNMP-MIB::prMax.3 = INTEGER: 1
UCD-SNMP-MIB::prMax.4 = INTEGER: 0
UCD-SNMP-MIB::prMax.5 = INTEGER: 0
UCD-SNMP-MIB::prMax.6 = INTEGER: 0
UCD-SNMP-MIB::prCount.1 = INTEGER: 1
UCD-SNMP-MIB::prCount.2 = INTEGER: 345
UCD-SNMP-MIB::prCount.3 = INTEGER: 1
UCD-SNMP-MIB::prCount.4 = INTEGER: 1
UCD-SNMP-MIB::prCount.5 = INTEGER: 0
UCD-SNMP-MIB::prCount.6 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.1 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.2 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.3 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.4 = INTEGER: 0
UCD-SNMP-MIB::prErrorFlag.5 = INTEGER: 1
UCD-SNMP-MIB::prErrorFlag.6 = INTEGER: 1
UCD-SNMP-MIB::prErrMessage.1 = STRING:
UCD-SNMP-MIB::prErrMessage.2 = STRING:
UCD-SNMP-MIB::prErrMessage.3 = STRING:
UCD-SNMP-MIB::prErrMessage.4 = STRING:
UCD-SNMP-MIB::prErrMessage.5 = STRING: No clamd process running.
UCD-SNMP-MIB::prErrMessage.6 = STRING: No freshclam process running.
UCD-SNMP-MIB::prErrFix.1 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.2 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.3 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.4 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.5 = INTEGER: 0
UCD-SNMP-MIB::prErrFix.6 = INTEGER: 0
UCD-SNMP-MIB::prErrFixCmd.1 = STRING:
UCD-SNMP-MIB::prErrFixCmd.2 = STRING:
UCD-SNMP-MIB::prErrFixCmd.3 = STRING:
UCD-SNMP-MIB::prErrFixCmd.4 = STRING:
UCD-SNMP-MIB::prErrFixCmd.5 = STRING:
UCD-SNMP-MIB::prErrFixCmd.6 = STRING:

bourbon# ps -U vscan | grep clam
84154 ?? Is 0:00.18 /usr/local/bin/freshclam --daemon -p /var/run/clamav/freshclam.pid
84265 ?? Is 0:04.61 /usr/local/sbin/clamd

bourbon# ps -acx | grep httpd | wc
     744    3720   23808

(744 > 345) ;-)
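The undercount reported above (prCount of 345 for httpd against 744 real processes, and 0 for clamd and freshclam) is what one would expect if processes are counted from a copy of the ps output that has been cut off at a fixed byte limit. The following is a minimal Python model of that behaviour, not net-snmp's actual code. The function names, the fake ps line format, and the ordering of the process list are hypothetical; only the two byte limits (200*80 and 1500*80) come from the patch.

```python
MAXCACHESIZE_OLD = 200 * 80    # 16000 bytes, the limit before the patch
MAXCACHESIZE_NEW = 1500 * 80   # 120000 bytes, the limit after the patch


def count_in_truncated_cache(ps_lines, name, max_bytes):
    """Count lines whose last field equals `name` after cutting the
    joined ps output at max_bytes, mimicking a size-capped cache file."""
    data = "".join(line + "\n" for line in ps_lines).encode()
    cached = data[:max_bytes].decode(errors="ignore")
    # The last element is either empty or a line cut mid-way; drop it.
    complete = cached.split("\n")[:-1]
    return sum(1 for line in complete if line.split()[-1:] == [name])


def fake_ps_line(pid, name):
    """Hypothetical fixed-width ps line; real ps output differs."""
    return f"{pid:5d} ?? Is 0:00.00 {name}"


# Assumed layout: 744 httpd processes listed first, the two ClamAV
# daemons (higher PIDs in the report) listed after them.
ps_lines = [fake_ps_line(1000 + i, "httpd") for i in range(744)]
ps_lines += [fake_ps_line(84154, "freshclam"), fake_ps_line(84265, "clamd")]

for limit in (MAXCACHESIZE_OLD, MAXCACHESIZE_NEW):
    counts = {
        name: count_in_truncated_cache(ps_lines, name, limit)
        for name in ("httpd", "clamd", "freshclam")
    }
    print(limit, counts)
```

With the smaller limit the model loses part of the httpd lines and every line after them, which matches the shape of the symptoms in the snmpwalk output. The exact numbers differ from the report because real ps lines are longer and other processes share the cache.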
Responsible Changed
From-To: freebsd-ports-bugs->kuriyama

Over to maintainer
kuriyama 2007-03-25 12:35:46 UTC

FreeBSD ports repository

Modified files:
  net-mgmt/net-snmp Makefile
  net-mgmt/net-snmp/files snmpd.sh.in
Added files:
  net-mgmt/net-snmp/files patch-net-snmp-config.h.in

Log:
- Remove "sig_stop=KILL" in snmpd.sh.in. This was introduced when PR ports/63759 was committed (3 years ago). Try to use normal TERM signal for graceful termination [1].
- Increase /bin/ps cache size from 16KB to 120KB. This should fix process counter (ex prCount.1) on the server which has large number of processes [2].

PR: ports/103811 [1], ports/110498 [2]
Reported by: Yuri Arabadji <yuri@deepunix.net> [1], Mike Andrews <mandrews@bit0.com> [2]

Revision Changes Path
1.141 +1 -1 ports/net-mgmt/net-snmp/Makefile
1.1 +11 -0 ports/net-mgmt/net-snmp/files/patch-net-snmp-config.h.in (new)
1.6 +1 -2 ports/net-mgmt/net-snmp/files/snmpd.sh.in
State Changed
From-To: open->closed

Increased to 120KB as your patch. Thanks!
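Whether 120KB leaves enough headroom on a particular host can be estimated by measuring how many bytes the process listing actually produces. The Python sketch below is one hedged way to do that. It assumes `ps -acx`, the command used in the report's transcript, is a fair stand-in for whatever ps invocation the agent runs, which the report does not state. The helper names `cache_usage` and `measure_ps` are hypothetical.

```python
import subprocess


def cache_usage(ps_output, limit=1500 * 80):
    """Return (size_in_bytes, fraction_of_limit_used) for captured ps output."""
    size = len(ps_output)
    return size, size / limit


def measure_ps(cmd=("ps", "-acx")):
    """Run ps and return its raw stdout as bytes.

    The default command is an assumption taken from the report's
    transcript, not from the net-snmp sources.
    """
    return subprocess.run(cmd, capture_output=True, check=True).stdout
```

A typical use would be `size, used = cache_usage(measure_ps())`. A value of `used` close to or above 1.0 suggests the listing would be cut off again. For comparison, the report's `wc` output shows 23808 bytes for the httpd lines alone, which already exceeds the old 16000-byte limit.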