We're looking for tools to be used in monitoring the PERC H800 arrays on a set of database servers running CentOS 5.5.
We've installed most of the OMSA (Dell monitoring) suite.
Our current alerting is happening through SNMP, though it's a bit hit or miss (we apparently missed a couple of earlier predictive failure alerts on one drive).
OMSA conflicts with mega-cli, though we may find that the latter is the more useful package. Both are pretty byzantine, and the Dell stuff simply doesn't have docs (in particular, docs on how to interpret the omconfig log output).
Ideally we'd like something which could be run as a Nagios plugin or cron job providing information on RAID status and/or possible disk errors. Probably both, actually.
Thanks in advance.
If your system supports omreport (it comes with OMSA), then this is a good solution: http://folk.uio.no/trondham/software/check_openmanage.html
-- Eero
on 22:57 Mon 07 Mar, Eero Volotinen (eero.volotinen@iki.fi) wrote:
If your system supports omreport (it comes with OMSA), then this is a good solution: http://folk.uio.no/trondham/software/check_openmanage.html
So ... this slots on top of OMSA to provide reporting?
This plugin parses omreport output and turns it into Nagios output.
The OMSA web server is not required, but a working omreport CLI is. It works great on my servers.
-- Eero
on 23:15 Mon 07 Mar, Eero Volotinen (eero.volotinen@iki.fi) wrote:
This plugin parses omreport output and turns it into Nagios output.
Is it running/invoking omreport or relying on periodic runs? I'll dig through the docs but if you know this off-hand it'd be helpful.
The OMSA web server is not required, but a working omreport CLI is. It works great on my servers.
Good to know, much appreciated.
It runs omreport each time Nagios polls it via NRPE or SNMP.
-- Eero
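For anyone who wants to see the general shape of "run omreport on every poll and map the result to a Nagios state", here is a minimal sketch. It is not how check_openmanage itself is implemented. It assumes that `omreport storage pdisk controller=0` prints `Key : Value` lines with one block per disk starting at an `ID` line, and that the fields `Status` and `Failure Predicted` are present; the function names and the severity mapping are made up for illustration. It also assumes a reasonably recent Python 3 is available on the host.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_pdisks(text):
    """Split 'Key : Value' output into one dict per physical disk.

    Assumption: each disk's block begins with an 'ID' line.
    """
    disks = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # headings and blank lines carry no key/value pair
        key, value = key.strip(), value.strip()
        if key == "ID":
            disks.append({})
        if disks:
            disks[-1][key] = value
    return disks


def evaluate(disks):
    """Return (exit_code, message) for a list of parsed disks."""
    if not disks:
        return UNKNOWN, "UNKNOWN: no physical disks found in omreport output"
    critical = [d["ID"] for d in disks if d.get("Status") == "Critical"]
    # Assumed mapping: a predicted failure or any other non-Ok status warns.
    warning = [
        d["ID"] for d in disks
        if d["ID"] not in critical
        and (d.get("Failure Predicted") == "Yes" or d.get("Status") != "Ok")
    ]
    if critical:
        return CRITICAL, "CRITICAL: disk(s) " + ", ".join(critical)
    if warning:
        return WARNING, "WARNING: check disk(s) " + ", ".join(warning)
    return OK, "OK: %d physical disk(s) healthy" % len(disks)


def main():
    """Run omreport once, print one status line, return the exit code."""
    try:
        out = subprocess.run(
            ["omreport", "storage", "pdisk", "controller=0"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print("UNKNOWN: could not run omreport: %s" % exc)
        return UNKNOWN
    code, message = evaluate(parse_pdisks(out))
    print(message)
    return code
```

To use it as a plugin, the script would end with `sys.exit(main())` (after `import sys`) so that NRPE sees the exit code. Keeping the parsing separate from the omreport call makes it possible to test the logic against saved output.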
-------- Original Message --------
Subject: [CentOS] Dell PERC H800 commandline RAID monitoring tools
From: Dr. Ed Morbius dredmorbius@gmail.com
To: CentOS User list centos@centos.org
Date: Monday, March 07, 2011 2:43:03 PM
We're looking for tools to be used in monitoring the PERC H800 arrays on a set of database servers running CentOS 5.5.
If you purchased the server with an add-in DRAC, the DRAC can provide email alerts if an array becomes degraded (or for just about any other hardware fault). This isn't necessarily a replacement for your current monitoring, but it can be used to supplement or complement it.
--Blake
on 16:04 Mon 07 Mar, Blake Hudson (blake@ispn.net) wrote:
If you purchased the server with an add-in DRAC, the DRAC can provide email alerts if an array becomes degraded (or for just about any other hardware fault). This isn't necessarily a replacement for your current monitoring, but it can be used to supplement or complement it.
The iDRAC /doesn't/ report on RAID / storage configuration or status.
iDRAC 6, Dell r610, onboard PERC H700, offboard PERC H800 (MD1200 array). BIOS version 2.1.15, Firmware 1.54 (Build 15).
We get batteries, fans, intrusion, power, removable flash media, temps, and volts, but not storage.
The iDRAC is pretty good compared with some past Dell offerings. Ability to boot virtual media in particular is very slick (I can specify local removable storage or a drive image and mount it for booting / diagnostics remotely).
But no RAID / storage management or monitoring.
on 12:43 Mon 07 Mar, Dr. Ed Morbius (dredmorbius@gmail.com) wrote:
We're looking for tools to be used in monitoring the PERC H800 arrays on a set of database servers running CentOS 5.5.
Pardoning the self-reply, but one issue we've had is reconciling the omconfig log report with the Dell Server Administrator syslog messages.
omconfig reported a predictive drive failure, but we (and three Dell storage/support techs) had trouble identifying which actual device was being reported as bad.
From 'omconfig storage controller action=exportlog controller=0' output:
03/04/11 21:42:42: EVT#02959-03/04/11 21:42:42: 96=Predictive failure: PD 00(e0x08/s2)
03/05/11 14:28:41: EVT#02961-03/05/11 14:28:41: 112=Removed: PD 00(e0x08/s2)
In /var/log/messages (timestamp/hostname trimmed):
Server Administrator: Storage Service EventID: 2243 The Patrol Read has stopped.: Controller 0 (PERC H800 Adapter)
Server Administrator: Storage Service EventID: 2049 Physical disk removed: Physical Disk 0:0:2 Controller 0, Connector 0
The Server Administrator reports of a slot 2 failure correspond to the drive which was physically replaced.
The OMSA omconfig report is throwing us a bunch of crud about some device, but Dell variously identified it as slot 0 and slot 9. We're now getting from them that "/s2" identifies slot 2.
Dell said point blank "you're not going to have any luck with that" when we asked for documentation of the OMSA log report format and how to parse it. Does anyone have a clue as to WTF it's actually trying to say, or what this tool is based on? (I suspect mega-cli, on a general hunch but not much stronger.)
"Enterprise support" .... indeed.
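Absent any documentation, one way to make the exported log easier to reconcile with syslog is to pull the fields out of each physical-disk event mechanically. The sketch below is built only from the two sample lines quoted above. Reading "/sN" as the slot number follows what Dell eventually said; reading "e0x08" as an enclosure ID in hex is a guess, and the meaning of the "PD 00" prefix is unknown, so it is ignored. The pattern and function name are hypothetical, and events that do not mention a PD are skipped.

```python
import re

# Pattern derived from the two sample lines only; other event types in the
# exported log may not match and are silently skipped.
PD_EVENT = re.compile(
    r"EVT#(?P<seq>\d+)-"
    r"(?P<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}):\s+"
    r"(?P<code>\d+)=(?P<text>[^:]+):\s+"
    r"PD (?P<pd>[0-9a-fA-F]+)\(e(?P<encl>0x[0-9a-fA-F]+)/s(?P<slot>\d+)\)"
)


def parse_pd_events(log_text):
    """Return a list of dicts, one per physical-disk event found."""
    events = []
    for m in PD_EVENT.finditer(log_text):
        events.append({
            "sequence": int(m.group("seq")),
            "timestamp": m.group("stamp"),
            "code": int(m.group("code")),
            "description": m.group("text").strip(),
            # Assumption: the value after 'e' is an enclosure ID in hex.
            "enclosure": int(m.group("encl"), 16),
            # Per Dell's later statement, '/sN' is the slot number.
            "slot": int(m.group("slot")),
        })
    return events


SAMPLE = (
    "03/04/11 21:42:42: EVT#02959-03/04/11 21:42:42: "
    "96=Predictive failure: PD 00(e0x08/s2)\n"
    "03/05/11 14:28:41: EVT#02961-03/05/11 14:28:41: "
    "112=Removed: PD 00(e0x08/s2)\n"
)

for event in parse_pd_events(SAMPLE):
    print("%(timestamp)s slot %(slot)d: %(description)s" % event)
```

With the slot pulled out as a number, the log entries can be lined up against the "Physical Disk 0:0:2" identifiers in /var/log/messages by timestamp and slot.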
On Mar 7, 2011, at 3:43 PM, "Dr. Ed Morbius" dredmorbius@gmail.com wrote:
OMSA conflicts with mega-cli, though we may find that the latter is the more useful package. Both are pretty byzantine, and the Dell stuff simply doesn't have docs (in particular, docs on how to interpret the omconfig log output).
Ideally we'd like something which could be run as a Nagios plugin or cron job providing information on RAID status and/or possible disk errors. Probably both, actually.
I can't speak to Nagios, but I have OMSA set up to send traps and, for critical errors, to also send emails, and it works well for us.
If you link the shared lib (I forget the paths) and install megacli with --nodeps, you can have both installed.
-Ross
On Mon, Mar 07, 2011 at 12:43:03PM -0800, Dr. Ed Morbius wrote:
OMSA conflicts with mega-cli, though we may find that the latter is the more useful package. Both are pretty byzantine, and the Dell stuff simply doesn't have docs (in particular, docs on how to interpret the omconfig log output).
We're using megacli wrapped by perl to provide information about PERC events. It works quite well so far.
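The perl wrapper itself was not posted. As a rough idea of what such a wrapper can look like for the cron-job case, here is a sketch in Python instead. It assumes the binary is invoked as `MegaCli -PDList -aALL` (the binary name and path vary between packages, so it is a parameter), and that the output contains the per-drive fields `Enclosure Device ID`, `Slot Number`, `Media Error Count`, `Predictive Failure Count` and `Firmware state`. The function names are made up for illustration.

```python
import subprocess


def parse_pdlist(text):
    """Split 'Key: Value' output into one dict per drive.

    Assumption: each drive's block begins with 'Enclosure Device ID'.
    """
    drives = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            drives.append({})
        if drives:
            drives[-1][key] = value
    return drives


def suspect_drives(drives):
    """Return (enclosure, slot, reasons) for drives that look unhealthy."""
    suspects = []
    for d in drives:
        reasons = []
        for counter in ("Media Error Count", "Predictive Failure Count"):
            value = d.get(counter, "0")
            if value.isdigit() and int(value) > 0:
                reasons.append("%s=%s" % (counter, value))
        state = d.get("Firmware state", "")
        # Anything not Online is flagged; hot spares would need an exception.
        if state and not state.startswith("Online"):
            reasons.append("Firmware state=%s" % state)
        if reasons:
            suspects.append(
                (d.get("Enclosure Device ID"), d.get("Slot Number"), reasons)
            )
    return suspects


def report(megacli="MegaCli"):
    """Print one line per suspect drive; stay silent when all is well."""
    out = subprocess.run(
        [megacli, "-PDList", "-aALL"],
        capture_output=True, text=True, check=True,
    ).stdout
    for enclosure, slot, reasons in suspect_drives(parse_pdlist(out)):
        print("enclosure %s slot %s: %s" % (enclosure, slot, "; ".join(reasons)))
```

Printing nothing when every drive looks fine suits cron, which only sends mail when a job produces output.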
Dominik Zyla wrote on Thu, 10 Mar 2011 09:10:37 +0100:
We're using megacli wrapped by perl to provide information about PERC events. It works quite well so far.
Do you have a megacli rpm that works with the CentOS-provided drivers, which is MPT 3.something? I googled about this some time ago and there's an rpm mentioned here and there that contains only the megacli utility, but it's not downloadable anymore from anywhere. I got hold of a package that contains version 4, but that doesn't work with the CentOS drivers. LSI themselves provide only the complete MegaRAID driver/package for download, and it's not clear if the single megacli utility is included or if installing it may overwrite the built-in driver.
Kai
On Thu, Mar 10, 2011 at 06:47:09PM +0100, Kai Schaetzl wrote:
Do you have a megacli rpm that works with the CentOS-provided drivers, which is MPT 3.something? I googled about this some time ago and there's an rpm mentioned here and there that contains only the megacli utility, but it's not downloadable anymore from anywhere. I got hold of a package that contains version 4, but that doesn't work with the CentOS drivers. LSI themselves provide only the complete MegaRAID driver/package for download, and it's not clear if the single megacli utility is included or if installing it may overwrite the built-in driver.
It's some single binary version, compiled statically.