Summary: | sysutils/smartmontools: Causing controller resets | ||
---|---|---|---|
Product: | Ports & Packages | Reporter: | Danny McGrath <danmcgrath.ca> |
Component: | Individual Port(s) | Assignee: | freebsd-ports-bugs (Nobody) <ports-bugs> |
Status: | Closed Works As Intended | ||
Severity: | Affects Only Me | CC: | fernape, samm |
Priority: | --- | Flags: | bugzilla: maintainer-feedback? (samm) |
Version: | Latest | ||
Hardware: | amd64 | ||
OS: | Any |
Description
Danny McGrath
2019-05-02 03:38:55 UTC
It seems that the system has been much better since removing the line `daily_status_smart_devices="...."` from /etc/periodic.conf. I suspect that past issues with these systems (Zabbix alerts during maintenance times) may actually have been caused by I/O stalls during the periodic runs that invoked the SMART status check. Interestingly, running `smartctl -i /dev/da#` on its own doesn't cause the problem. The errors only appeared after a recent smartmontools update started populating the daily logs, which may have exposed a long-standing bug.

For some extra background, these R410s have hardware-RAID-capable HBA cards, but they are configured as non-RAID, which appeared to pass the disks through fine (at least well enough to give me output in smartctl). My guess is that smartctl doesn't like something about these particular devices. They are technically in IR mode, not IT, yet they still allow SMART to be queried. Maybe there is some issue with this? I don't know; you are the experts, so I can only offer some historical insight and technical info. Hope it helps!

(In reply to Dan McGrath from comment #1)
Hi Dan,
Can I close this PR then?

(In reply to Fernando Apesteguía from comment #2)
Hi,
Honestly, it's up to you. I was able to simply disable the periodic SMART check on those hosts to avoid the issue, but there is clearly some non-ideal behavior that may need to be evaluated. That said, this seems more like an upstream problem than a FreeBSD one, so perhaps that would be best. Your call.

I would recommend closing this PR. Problems with Dell controllers are 100% caused by buggy firmware in the mpt device. Some firmware versions are affected, and most of them do not implement SCSI->SATA tunneling as they should. We do have a few workarounds in the code; however, they may not be sufficient. My recommendation is to update to the latest firmware version and see whether the problem persists. Anyway, there is nothing we can do in smartmontools itself.

OK, closing the PR as requested by the maintainer.