On 12:43 Mon 07 Mar, Dr. Ed Morbius (dredmorbius@gmail.com) wrote:
We're looking for tools to be used in monitoring the PERC H800 arrays on a set of database servers running CentOS 5.5.
Pardon the self-reply, but one issue we've had is reconciling the omconfig log report with the Dell OpenManage Server Administrator syslog messages.
omconfig reported a predictive drive failure, but we (and three Dell storage/support techs) had trouble identifying which physical device was being reported as bad.
From 'omconfig storage controller action=exportlog controller=0' output:
03/04/11 21:42:42: EVT#02959-03/04/11 21:42:42: 96=Predictive failure: PD 00(e0x08/s2)
03/05/11 14:28:41: EVT#02961-03/05/11 14:28:41: 112=Removed: PD 00(e0x08/s2)
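For what it's worth, here is a rough Python sketch of how we're picking those event lines apart. The field layout is inferred purely from the two sample lines above, not from any Dell documentation: the "/s2" suffix as the slot is what Dell support told us, while reading "e0x08" as an enclosure ID in hex is our own guess. The function name parse_exportlog is just something we made up.

```python
import re
from typing import Dict, List

# Assumed layout of one exportlog event, inferred only from sample output:
#   EVT#<seq>-<date> <time>: <code>=<description>: PD <nn>(e<hex>/s<slot>)
# "/s<n>" = slot (per Dell support); "e0x.." = enclosure ID (our guess).
EVENT_RE = re.compile(
    r"EVT#(?P<seq>\d+)-(?P<stamp>\d\d/\d\d/\d\d \d\d:\d\d:\d\d): "
    r"(?P<code>\d+)=(?P<desc>[^:]+): "
    r"PD (?P<pd>[0-9a-fA-F]+)\(e(?P<encl>0x[0-9a-fA-F]+)/s(?P<slot>\d+)\)"
)


def parse_exportlog(text: str) -> List[Dict[str, object]]:
    """Return one dict per physical-disk event found in exportlog output."""
    events = []
    for m in EVENT_RE.finditer(text):
        events.append({
            "seq": int(m.group("seq")),          # event sequence number
            "stamp": m.group("stamp"),           # MM/DD/YY HH:MM:SS as logged
            "code": int(m.group("code")),        # numeric event code, e.g. 96
            "desc": m.group("desc").strip(),     # e.g. "Predictive failure"
            "pd": m.group("pd"),                 # "PD 00" field, meaning unclear
            "enclosure": int(m.group("encl"), 16),  # hex enclosure ID (assumed)
            "slot": int(m.group("slot")),        # slot number
        })
    return events
```

Fed the two lines above, this yields a "Predictive failure" event and a "Removed" event, both pointing at slot 2.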
In /var/log/messages (timestamp/hostname trimmed):
Server Administrator: Storage Service EventID: 2243 The Patrol Read has stopped.: Controller 0 (PERC H800 Adapter)
Server Administrator: Storage Service EventID: 2049 Physical disk removed: Physical Disk 0:0:2 Controller 0, Connector 0
The Server Administrator reports of a slot 2 failure correspond to the drive which was physically replaced.
The OMSA omconfig report is throwing a bunch of crud at us about some device, which Dell variously identified as slot 0 and slot 9. Now they're telling us that "/s2" identifies slot 2.
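If "/s2" really is slot 2, the two logs can at least be lined up by slot. A quick and dirty Python sketch of that cross-reference follows. It rests on two unconfirmed assumptions: that "/s<n>" in exportlog is the slot, and that the last field of "Physical Disk a:b:c" in syslog is also the slot. We've only seen them agree in this one incident. The helper name events_by_slot is hypothetical.

```python
import re
from typing import Dict, List

# exportlog: "<code>=<description>: PD <nn>(e0x<hex>/s<slot>)"
# Assumption: "/s<n>" is the slot number.
EXPORT_RE = re.compile(
    r"(\d+)=([^:]+): PD [0-9a-fA-F]+\(e0x[0-9a-fA-F]+/s(\d+)\)"
)

# syslog: "EventID: <id> <description>: Physical Disk <a>:<b>:<c>"
# Assumption: <c> is the slot; <a> and <b> (connector, enclosure?) are ignored.
SYSLOG_RE = re.compile(
    r"EventID: (\d+) ([^:]+): Physical Disk (\d+):(\d+):(\d+)"
)


def events_by_slot(exportlog: str, syslog: str) -> Dict[int, Dict[str, List[str]]]:
    """Group disk events from both sources under the (assumed) slot number."""
    slots: Dict[int, Dict[str, List[str]]] = {}
    for code, desc, slot in EXPORT_RE.findall(exportlog):
        entry = slots.setdefault(int(slot), {"exportlog": [], "syslog": []})
        entry["exportlog"].append(f"{code}={desc.strip()}")
    for event_id, desc, _first, _middle, slot in SYSLOG_RE.findall(syslog):
        entry = slots.setdefault(int(slot), {"exportlog": [], "syslog": []})
        entry["syslog"].append(f"{event_id} {desc.strip()}")
    return slots
```

Controller-level messages such as the Patrol Read notice carry no "Physical Disk" field, so the syslog pattern simply skips them.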
When we asked whether the OMSA log report format is documented anywhere, or how to parse it, Dell said point blank: "you're not going to have any luck with that." Does anyone have a clue as to WTF it's actually trying to say, or what this tool is based on? (I suspect MegaCli, on a general hunch but not much more.)
"Enterprise support" .... indeed.