[CentOS] Dell PERC H800 commandline RAID monitoring tools

Mon Mar 7 22:28:54 UTC 2011
Dr. Ed Morbius <dredmorbius at gmail.com>

on 12:43 Mon 07 Mar, Dr. Ed Morbius (dredmorbius at gmail.com) wrote:
> We're looking for tools to be used in monitoring the PERC H800 arrays on
> a set of database servers running CentOS 5.5.

Pardoning the self-reply, but one issue we've had is reconciling the
omconfig log report with the Dell Server Manager syslog messages.

omconfig reported a predictive drive failure, but we (and three Dell
storage/support techs) had trouble identifying which actual device was
being reported as bad.

From 'omconfig storage controller action=exportlog controller=0' output:

    03/04/11 21:42:42: EVT#02959-03/04/11 21:42:42:  96=Predictive failure: PD 00(e0x08/s2)
    03/05/11 14:28:41: EVT#02961-03/05/11 14:28:41: 112=Removed: PD 00(e0x08/s2)

In /var/log/messages (timestamp/hostname trimmed):

    Server Administrator: Storage Service EventID: 2243  The Patrol Read has stopped.:  Controller 0 (PERC H800 Adapter) 
    Server Administrator: Storage Service EventID: 2049  Physical disk removed:  Physical Disk 0:0:2 Controller 0, Connector 0

The Server Administrator reports of a slot 2 failure correspond to the
drive which was physically replaced.
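
For anyone wanting to script against the syslog side, here is a rough
Python sketch that pulls the event ID and disk identifier out of lines
like the ones above. The regex is derived only from the two sample lines
in this post. Treating the last field of "Physical Disk 0:0:2" as the
slot is an assumption, based on slot 2 being the drive we actually
replaced. The names (PDISK_RE, parse_syslog_line) are made up for
illustration and are not part of OMSA.

```python
import re

# Pattern inferred from the sample /var/log/messages lines only (assumption,
# not a documented OMSA format).
PDISK_RE = re.compile(
    r"EventID:\s*(?P<event>\d+)\s+(?P<msg>.*?):\s+"
    r"Physical Disk (?P<pdisk>\d+(?::\d+)*)"
)

def parse_syslog_line(line):
    """Return event details for a physical-disk event, or None if the
    line does not mention a physical disk."""
    m = PDISK_RE.search(line)
    if m is None:
        return None
    fields = [int(p) for p in m.group("pdisk").split(":")]
    return {
        "event_id": int(m.group("event")),
        "message": m.group("msg").strip(),
        "pdisk": m.group("pdisk"),
        # Assumption: the last field of the disk ID is the slot number.
        "slot": fields[-1],
    }

if __name__ == "__main__":
    sample = ("Server Administrator: Storage Service EventID: 2049  "
              "Physical disk removed:  Physical Disk 0:0:2 Controller 0, "
              "Connector 0")
    print(parse_syslog_line(sample))
```

The Patrol Read line has no "Physical Disk" identifier, so the function
returns None for it.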

The OMSA omconfig report is throwing us a bunch of crud about some
device, but Dell variously identified it as slot 0 and slot 9.  We're
now getting from them that "/s2" identifies slot 2.
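
Going on Dell's latest word that "/s2" means slot 2, a sketch of
parsing the exportlog lines might look like the following. This is
guesswork built from the two log lines quoted above, not from any
documented format. In particular, reading "e0x08" as a hex enclosure ID
and the slot as a decimal number are both assumptions, and the names
(EXPORTLOG_RE, parse_exportlog_line) are hypothetical.

```python
import re

# Pattern inferred from the two exportlog sample lines (assumption; Dell
# did not provide documentation of this format).
EXPORTLOG_RE = re.compile(
    r"EVT#(?P<seq>\d+)-.*?\s(?P<code>\d+)=(?P<desc>[^:]+):\s+"
    r"PD (?P<pd>[0-9a-fA-F]+)\(e(?P<encl>0x[0-9a-fA-F]+)/s(?P<slot>\d+)\)"
)

def parse_exportlog_line(line):
    """Return the fields of a physical-disk event line, or None if the
    line does not match the inferred pattern."""
    m = EXPORTLOG_RE.search(line)
    if m is None:
        return None
    return {
        "sequence": int(m.group("seq")),
        "code": int(m.group("code")),
        "description": m.group("desc").strip(),
        "pd": m.group("pd"),                      # kept raw; meaning unclear
        "enclosure": int(m.group("encl"), 16),    # assumed hex enclosure ID
        "slot": int(m.group("slot")),             # assumed decimal, per "/s2"
    }

if __name__ == "__main__":
    sample = ("03/04/11 21:42:42: EVT#02959-03/04/11 21:42:42:  "
              "96=Predictive failure: PD 00(e0x08/s2)")
    print(parse_exportlog_line(sample))
```

If the slot reading is right, the slot from this output should match the
slot in the corresponding Server Administrator syslog entry, which is
the cross-check we were missing.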

Dell said point blank "you're not going to have any luck with that" as
far as the OMSA log report format and its parsing being documented.
Does anyone have a clue as to WTF it's actually trying to say, or what
this tool is based off of?  (I'm suspecting mega-cli on a general hunch,
but not much stronger.)

"Enterprise support" .... indeed.

Dr. Ed Morbius, Chief Scientist /            |
  Robot Wrangler / Staff Psychologist        | When you seek unlimited power
Krell Power Systems Unlimited                |                  Go to Krell!