Created attachment 201703 [details] patch check_smartmon I've noticed that smartctl oputput has changed. I am not sure why. The attached patch gets the desired results from check_smartmon. It also adds more debugging output.
We previously patched the code to look for: SMART STATUS RETURN: incomplete response, ATA output registers missing That line is now: SMART Status not supported: Incomplete response, ATA output registers missing It also seems like the temperature has moved from part 190 to 194.
(In reply to Dan Langille from comment #1) I said moved from 194 to 190, that is not accurate. Sometimes the temperature is on 194, sometimes on 190: 190 Airflow_Temperature_Cel 0x0032 066 052 000 Old_age Always - 194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 27 (Min/Max 14/52) NOTE: these differences appeared when I changed HBA. The previous HBA: mps0: <Avago Technologies (LSI) SAS2008> port 0xd000-0xd0ff mem 0xdf600000-0xdf603fff,0xdf580000-0xdf5bffff irq 17 at device 0.0 on pci2 mps0: Firmware: 15.00.00.00, Driver: 21.02.00.00-fbsd mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc> The new HBA: mpr0: <Avago Technologies (LSI) SAS3008> port 0xe000-0xe0ff mem 0xfbd00000-0xfbd0ffff irq 64 at device 0.0 numa-domain 1 on pci12 mpr0: Firmware: 05.00.00.00, Driver: 18.03.00.00-fbsd mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc,FastPath,RDPQArray> The system is running FreeBSD 12.0-RELEASE-p2
(In reply to Dan Langille from comment #1) I think I am confusing my local patches with previously submitted patches...
(In reply to Dan Langille from comment #3) see https://dan.langille.org/2014/03/02/nagios-smartmon-stops-working-after-python-upgrade/ for the local patches I was running with, and which are included in the submitted patch.
I have to check your patch. In my system I have such results: /usr/local/libexec/nagios/check_smartmon -d /dev/ada3 OK: device is functional and stable (temperature: 26) [~]# /usr/local/libexec/nagios/check_smartmon -d ada3 UNKNOWN: no such device found [~]# /usr/local/libexec/nagios/check_smartmon -t ata UNKNOWN: no such device found So for single disk it is working correct, for type of device not. I've checked also that "190 Airflow_Temperature_Cel" is connected with smartarrays (in my example they are: dell and Supercmicro servers). So I will check it and give you replay.
For background: This issue arose when I upgraded a system. Everything but the storage was replaced. It might be easier to say I moved the storage to a new system. After that happened, the Nagios alerts started. The change in the SAS controller may be related. These are SATA drives.
So I've made some tests. Also I refresh my memory how this script is working :-))) So script REQUIRES -d /dev/da0 or similiar. -t flag is "supporting": to show smartctl what type of disk it is. For example: /usr/local/libexec/nagios/check_smartmon -t scsi -d /dev/da0 Traceback (most recent call last): File "/usr/local/libexec/nagios/check_smartmon", line 304, in <module> (healthStatus, temperature) = parseOutput(healthStatusOutput, temperatureOutput, devtype) File "/usr/local/libexec/nagios/check_smartmon", line 216, in parseOutput vprint(3, "Health status: %s" % healthStatus) UnboundLocalError: local variable 'healthStatus' referenced before assignment /usr/local/libexec/nagios/check_smartmon -t ata -d /dev/da0 OK: device is functional and stable (temperature: 37) (there are ATA disks connected to mps0: <Avago Technologies (LSI) SAS2308> raid controler). I've tried your patch, but on latest version I receive message that this patch was applied... It was not me :-))) So for me it is not needed to add anything to current version of this nagios plugin.
[dan@slocum:~] $ sudo /usr/local/libexec/nagios/check_smartmon -t scsi -d /dev/da9 Traceback (most recent call last): File "/usr/local/libexec/nagios/check_smartmon", line 309, in <module> (healthStatus, temperature) = parseOutput(healthStatusOutput, temperatureOutput, devtype) File "/usr/local/libexec/nagios/check_smartmon", line 219, in parseOutput vprint(3, "Health status: %s" % healthStatus) UnboundLocalError: local variable 'healthStatus' referenced before assignment [dan@slocum:~] $ pkg info -x smartmon nagios-check_smartmon-20100318_2 smartmontools-7.0 [dan@slocum:~] $
I have redone the patch and will upload next. Original code: $ sudo /usr/local/libexec/nagios/check_smartmon -d /dev/da7 -v 10 Get device name Device: /dev/da7 Check device Check if /dev/da7 does exist and can be read Check if /usr/local/sbin/smartctl does exist and can be read Path to smartctl: /usr/local/sbin/smartctl Get device type Device type: ata Call smartctl Get device health status: /usr/local/sbin/smartctl -H /dev/da7 Get device temperature: /usr/local/sbin/smartctl -A /dev/da7 Parse smartctl output parseOutput: Device type is ata Health status: missing Temperature: 38 Generate return information CRITICAL: device does not pass health status Patched code without -t NOTE the 'parseOutput: line is' debug lines. [dan@slocum:~/tmp] $ sudo ~/tmp/check_smartmon -d /dev/da7 -v 10 Get device name Device: /dev/da7 Check device Check if /dev/da7 does exist and can be read Check if /usr/local/sbin/smartctl does exist and can be read Path to smartctl: /usr/local/sbin/smartctl Get device type command line supplied device type is: ata Device type: ata Call smartctl Get device health status: /usr/local/sbin/smartctl -H /dev/da7 Get device temperature: /usr/local/sbin/smartctl -A /dev/da7 Parse smartctl output parseOutput: Device type is ata parseOutput: line is: 'smartctl 7.0 2018-12-30 r4883 [FreeBSD 12.0-RELEASE-p2 amd64] (local build)' parseOutput: line is: 'Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org' parseOutput: line is: '' parseOutput: line is: '=== START OF READ SMART DATA SECTION ===' parseOutput: line is: 'SMART Status not supported: Incomplete response, ATA output registers missing' parseOutput: line is: 'SMART overall-health self-assessment test result: PASSED' Health status: PASSED Temperature: 38 Generate return information OK: device is functional and stable (temperature: 38) [patched code with -t [dan@slocum:~/tmp] $ sudo ~/tmp/check_smartmon -d /dev/da7 -v 10 -t scsi Get device name Device: /dev/da7 Check device Check if /dev/da7 does exist and can be read Check if /usr/local/sbin/smartctl does exist and can be read Path to smartctl: /usr/local/sbin/smartctl Get device type command line supplied device type is: scsi Device type: scsi Call smartctl Get device health status: /usr/local/sbin/smartctl -H /dev/da7 Get device temperature: /usr/local/sbin/smartctl -A /dev/da7 Parse smartctl output parseOutput: Device type is scsi Traceback (most recent call last): File "/usr/home/dan/tmp/check_smartmon", line 309, in <module> (healthStatus, temperature) = parseOutput(healthStatusOutput, temperatureOutput, devtype) File "/usr/home/dan/tmp/check_smartmon", line 219, in parseOutput vprint(3, "Health status: %s" % healthStatus) UnboundLocalError: local variable 'healthStatus' referenced before assignment [dan@slocum:~/tmp] $
Created attachment 201732 [details] correctly patches $ patch < ~/patch.smartmon Hmm... Looks like a unified diff to me... The text leading up to this was: -------------------------- |--- check_smartmon.orig 2019-02-01 04:28:13.000000000 +0000 |+++ check_smartmon 2019-02-04 22:27:35.297345000 +0000 -------------------------- Patching file check_smartmon using Plan A... Hunk #1 succeeded at 161. Hunk #2 succeeded at 184. Hunk #3 succeeded at 228. Hunk #4 succeeded at 291. done
Comment on attachment 201732 [details] correctly patches Yes, this patch is correct now
Comment on attachment 201732 [details] correctly patches I forgot to approve?
Sorry, I thought you were going to commit. ;) I can proceed do if you like.
Yes, please. I have no rights to make commits. I wanted to add "+" to patch, but unfortunately it does not work. That's why I confirmed the whole bug. Thanks a lot for your help.
A commit references this bug: Author: dvl Date: Mon Feb 18 15:08:47 UTC 2019 New revision: 493266 URL: https://svnweb.freebsd.org/changeset/ports/493266 Log: Patch check_smartmon to cater for changed smartctl output We previously patched the code to look for: SMART STATUS RETURN: incomplete response, ATA output registers missing That line is now: SMART Status not supported: Incomplete response, ATA output registers missing It also seems like the temperature has moved from part 190 to 194. PR: 235475 Approved by: Krzysztof (maintainer) MFH: 2019Q1 Changes: head/net-mgmt/nagios-check_smartmon/Makefile head/net-mgmt/nagios-check_smartmon/files/patch-check_smartmon
Thank you.
A commit references this bug: Author: dvl Date: Tue Feb 19 13:35:48 UTC 2019 New revision: 493370 URL: https://svnweb.freebsd.org/changeset/ports/493370 Log: MFH: r493266 Patch check_smartmon to cater for changed smartctl output We previously patched the code to look for: SMART STATUS RETURN: incomplete response, ATA output registers missing That line is now: SMART Status not supported: Incomplete response, ATA output registers missing It also seems like the temperature has moved from part 190 to 194. PR: 235475 Approved by: Krzysztof (maintainer) Approved by: ports-secteam (miwi) Changes: _U branches/2019Q1/ branches/2019Q1/net-mgmt/nagios-check_smartmon/Makefile branches/2019Q1/net-mgmt/nagios-check_smartmon/files/patch-check_smartmon