Bug 235475

Summary: [patch] net-mgmt/nagios-check_smartmon for changed smartctl output
Product: Ports & Packages Reporter: Dan Langille <dvl>
Component: Individual Port(s)Assignee: Dan Langille <dvl>
Status: Closed FIXED    
Severity: Affects Only Me CC: ports
Priority: --- Flags: ports: maintainer-feedback+
Version: Latest   
Hardware: Any   
OS: Any   
See Also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=201767
Attachments:
Description Flags
patch check_smartmon
none
correctly patches none

Description Dan Langille freebsd_committer 2019-02-04 03:00:05 UTC
Created attachment 201703 [details]
patch check_smartmon

I've noticed that smartctl oputput has changed. I am not sure why. The attached patch gets the desired results from check_smartmon. It also adds more debugging output.
Comment 1 Dan Langille freebsd_committer 2019-02-04 03:11:10 UTC
We previously patched the code to look for:

SMART STATUS RETURN: incomplete response, ATA output registers missing

That line is now:

SMART Status not supported: Incomplete response, ATA output registers missing

It also seems like the temperature has moved from part 190 to 194.
Comment 2 Dan Langille freebsd_committer 2019-02-04 03:19:45 UTC
(In reply to Dan Langille from comment #1)

I said moved from 194 to 190, that is not accurate.

Sometimes the temperature is on 194, sometimes on 190:

190 Airflow_Temperature_Cel 0x0032   066   052   000    Old_age   Always       -       194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       27 (Min/Max 14/52)


NOTE: these differences appeared when I changed HBA.

The previous HBA:

mps0: <Avago Technologies (LSI) SAS2008> port 0xd000-0xd0ff mem 0xdf600000-0xdf603fff,0xdf580000-0xdf5bffff irq 17 at device 0.0 on pci2
mps0: Firmware: 15.00.00.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>

The new HBA:

mpr0: <Avago Technologies (LSI) SAS3008> port 0xe000-0xe0ff mem 0xfbd00000-0xfbd0ffff irq 64 at device 0.0 numa-domain 1 on pci12
mpr0: Firmware: 05.00.00.00, Driver: 18.03.00.00-fbsd
mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc,FastPath,RDPQArray>

The system is running FreeBSD 12.0-RELEASE-p2
Comment 3 Dan Langille freebsd_committer 2019-02-04 03:35:20 UTC
(In reply to Dan Langille from comment #1)
I think I am confusing my local patches with previously submitted patches...
Comment 4 Dan Langille freebsd_committer 2019-02-04 03:37:15 UTC
(In reply to Dan Langille from comment #3)
see https://dan.langille.org/2014/03/02/nagios-smartmon-stops-working-after-python-upgrade/ for the local patches I was running with, and which are included in the submitted patch.
Comment 5 Krzysztof 2019-02-04 06:44:25 UTC
I have to check your patch. In my system I have such results:

/usr/local/libexec/nagios/check_smartmon -d /dev/ada3
OK: device is functional and stable (temperature: 26)
[~]# /usr/local/libexec/nagios/check_smartmon -d ada3
UNKNOWN: no such device found

[~]# /usr/local/libexec/nagios/check_smartmon -t ata
UNKNOWN: no such device found

So for single disk it is working correct, for type of device not.

I've checked also that "190 Airflow_Temperature_Cel" is connected with smartarrays (in my example they are: dell and Supercmicro servers). 

So I will check it and give you replay.
Comment 6 Dan Langille freebsd_committer 2019-02-04 18:24:12 UTC
For background: This issue arose when I upgraded a system. Everything but the storage was replaced. It might be easier to say I moved the storage to a new system.

After that happened, the Nagios alerts started.

The change in the SAS controller may be related.  These are SATA drives.
Comment 7 Krzysztof 2019-02-04 21:41:56 UTC
So I've made some tests. Also I refresh my memory how this script is working :-)))

So script REQUIRES -d /dev/da0 or similiar. -t flag is "supporting": to show smartctl what type of disk it is. For example:

 /usr/local/libexec/nagios/check_smartmon -t scsi -d /dev/da0
Traceback (most recent call last):
  File "/usr/local/libexec/nagios/check_smartmon", line 304, in <module>
    (healthStatus, temperature) = parseOutput(healthStatusOutput, temperatureOutput, devtype)
  File "/usr/local/libexec/nagios/check_smartmon", line 216, in parseOutput
    vprint(3, "Health status: %s" % healthStatus)
UnboundLocalError: local variable 'healthStatus' referenced before assignment

/usr/local/libexec/nagios/check_smartmon -t ata -d /dev/da0
OK: device is functional and stable (temperature: 37)

(there are ATA disks connected to mps0: <Avago Technologies (LSI) SAS2308> raid controler).

I've tried your patch, but on latest version I receive message that this patch was applied... It was not me :-)))

So for me it is not needed to add anything to current version of this nagios plugin.
Comment 8 Dan Langille freebsd_committer 2019-02-04 22:16:39 UTC
[dan@slocum:~] $ sudo /usr/local/libexec/nagios/check_smartmon -t scsi -d /dev/da9
Traceback (most recent call last):
  File "/usr/local/libexec/nagios/check_smartmon", line 309, in <module>
    (healthStatus, temperature) = parseOutput(healthStatusOutput, temperatureOutput, devtype)
  File "/usr/local/libexec/nagios/check_smartmon", line 219, in parseOutput
    vprint(3, "Health status: %s" % healthStatus)
UnboundLocalError: local variable 'healthStatus' referenced before assignment
[dan@slocum:~] $ pkg info -x smartmon
nagios-check_smartmon-20100318_2
smartmontools-7.0
[dan@slocum:~] $
Comment 9 Dan Langille freebsd_committer 2019-02-04 22:32:46 UTC
I have redone the patch and will upload next.

Original code:

$ sudo /usr/local/libexec/nagios/check_smartmon -d /dev/da7 -v 10
Get device name
Device: /dev/da7
Check device
Check if /dev/da7 does exist and can be read
Check if /usr/local/sbin/smartctl does exist and can be read
Path to smartctl: /usr/local/sbin/smartctl
Get device type
Device type: ata
Call smartctl
Get device health status: /usr/local/sbin/smartctl -H /dev/da7
Get device temperature: /usr/local/sbin/smartctl -A /dev/da7
Parse smartctl output
parseOutput: Device type is ata
Health status: missing
Temperature: 38
Generate return information
CRITICAL: device does not pass health status

Patched code without -t 

NOTE the 'parseOutput: line is' debug lines.

[dan@slocum:~/tmp] $ sudo ~/tmp/check_smartmon -d /dev/da7 -v 10
Get device name
Device: /dev/da7
Check device
Check if /dev/da7 does exist and can be read
Check if /usr/local/sbin/smartctl does exist and can be read
Path to smartctl: /usr/local/sbin/smartctl
Get device type
command line supplied device type is: ata
Device type: ata
Call smartctl
Get device health status: /usr/local/sbin/smartctl -H /dev/da7
Get device temperature: /usr/local/sbin/smartctl -A /dev/da7
Parse smartctl output
parseOutput: Device type is ata
parseOutput: line is: 'smartctl 7.0 2018-12-30 r4883 [FreeBSD 12.0-RELEASE-p2 amd64] (local build)'
parseOutput: line is: 'Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org'
parseOutput: line is: ''
parseOutput: line is: '=== START OF READ SMART DATA SECTION ==='
parseOutput: line is: 'SMART Status not supported: Incomplete response, ATA output registers missing'
parseOutput: line is: 'SMART overall-health self-assessment test result: PASSED'
Health status: PASSED
Temperature: 38
Generate return information
OK: device is functional and stable (temperature: 38)


[patched code with -t

[dan@slocum:~/tmp] $ sudo ~/tmp/check_smartmon -d /dev/da7 -v 10 -t scsi
Get device name
Device: /dev/da7
Check device
Check if /dev/da7 does exist and can be read
Check if /usr/local/sbin/smartctl does exist and can be read
Path to smartctl: /usr/local/sbin/smartctl
Get device type
command line supplied device type is: scsi
Device type: scsi
Call smartctl
Get device health status: /usr/local/sbin/smartctl -H /dev/da7
Get device temperature: /usr/local/sbin/smartctl -A /dev/da7
Parse smartctl output
parseOutput: Device type is scsi
Traceback (most recent call last):
  File "/usr/home/dan/tmp/check_smartmon", line 309, in <module>
    (healthStatus, temperature) = parseOutput(healthStatusOutput, temperatureOutput, devtype)
  File "/usr/home/dan/tmp/check_smartmon", line 219, in parseOutput
    vprint(3, "Health status: %s" % healthStatus)
UnboundLocalError: local variable 'healthStatus' referenced before assignment
[dan@slocum:~/tmp] $
Comment 10 Dan Langille freebsd_committer 2019-02-04 22:35:02 UTC
Created attachment 201732 [details]
correctly patches

$ patch < ~/patch.smartmon 
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|--- check_smartmon.orig	2019-02-01 04:28:13.000000000 +0000
|+++ check_smartmon	2019-02-04 22:27:35.297345000 +0000
--------------------------
Patching file check_smartmon using Plan A...
Hunk #1 succeeded at 161.
Hunk #2 succeeded at 184.
Hunk #3 succeeded at 228.
Hunk #4 succeeded at 291.
done
Comment 11 Krzysztof 2019-02-04 22:40:41 UTC
Comment on attachment 201732 [details]
correctly patches

Yes, this patch is correct now
Comment 12 Krzysztof 2019-02-17 21:43:02 UTC
Comment on attachment 201732 [details]
correctly patches

I forgot to approve?
Comment 13 Dan Langille freebsd_committer 2019-02-17 22:22:47 UTC
Sorry, I thought you were going to commit.  ;)

I can proceed do if you like.
Comment 14 Krzysztof 2019-02-17 23:43:59 UTC
Yes, please.

I have no rights to make commits. I wanted to add "+" to patch, but unfortunately it does not work. That's why I confirmed the whole bug.

Thanks a lot for your help.
Comment 15 commit-hook freebsd_committer 2019-02-18 15:09:02 UTC
A commit references this bug:

Author: dvl
Date: Mon Feb 18 15:08:47 UTC 2019
New revision: 493266
URL: https://svnweb.freebsd.org/changeset/ports/493266

Log:
  Patch check_smartmon to cater for changed smartctl output

  We previously patched the code to look for:

  SMART STATUS RETURN: incomplete response, ATA output registers missing

  That line is now:

  SMART Status not supported: Incomplete response, ATA output registers missing

  It also seems like the temperature has moved from part 190 to 194.

  PR:		235475
  Approved by:	Krzysztof (maintainer)
  MFH:		2019Q1

Changes:
  head/net-mgmt/nagios-check_smartmon/Makefile
  head/net-mgmt/nagios-check_smartmon/files/patch-check_smartmon
Comment 16 Dan Langille freebsd_committer 2019-02-18 15:10:41 UTC
Thank you.
Comment 17 commit-hook freebsd_committer 2019-02-19 13:36:12 UTC
A commit references this bug:

Author: dvl
Date: Tue Feb 19 13:35:48 UTC 2019
New revision: 493370
URL: https://svnweb.freebsd.org/changeset/ports/493370

Log:
  MFH: r493266

  Patch check_smartmon to cater for changed smartctl output

  We previously patched the code to look for:

  SMART STATUS RETURN: incomplete response, ATA output registers missing

  That line is now:

  SMART Status not supported: Incomplete response, ATA output registers missing

  It also seems like the temperature has moved from part 190 to 194.

  PR:		235475
  Approved by:	Krzysztof (maintainer)

  Approved by:	ports-secteam (miwi)

Changes:
_U  branches/2019Q1/
  branches/2019Q1/net-mgmt/nagios-check_smartmon/Makefile
  branches/2019Q1/net-mgmt/nagios-check_smartmon/files/patch-check_smartmon