[CentOS] Hardware raid health?
Digimer
lists at alteeve.ca
Mon Aug 25 20:08:32 UTC 2014
On 25/08/14 04:03 PM, Les Mikesell wrote:
> I just had an IBM in a remote location with a hardware raid1 have both
> drives go bad. With local machines I probably would have caught it
> from the drive light before the 2nd one died... What is the state of
> the art in Linux software monitoring for this? Long ago when that
> box was set up I think the best I could have done was a Java GUI tool
> that IBM had for their servers - and that seemed like overkill for a
> simple monitor. Is there anything more lightweight that knows about
> the underlying drives in a hardware raid set on IBM's - and also
> recent HP servers?
IBM used LSI-based controllers, I believe.
For our monitoring, we wrote a little script that calls MegaCli64 every
30 seconds and checks for changes. If anything of note changes (drive
health, BBU/FBU issues, temperature issues, etc.), it sends us an email.
It would be fairly easy to do the same for hpacucli, I would imagine.
Unfortunately, though it's all open source, it's part of a package that
monitors a pile of things (including IPMI sensors, APC UPSes, the Red Hat HA
stack, etc.), so it wouldn't be drop-in-and-go. That said, you could
probably fairly easily strip it down if you wanted to use it, too.
If you're curious, I show how to set it up here. If you're comfortable
with Perl, it'll be pretty easy to adapt, I suspect.
https://alteeve.ca/w/AN!Cluster_Tutorial_2#Setting_Up_Alerts
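The poll-and-compare approach described above can be sketched in a few
lines. This is only an illustration in Python, not the Perl code from the
tutorial linked above. Several things here are assumptions to check on your
own system: the MegaCli64 install path, the "Key: Value" layout of the
"-PDList -aALL" output, the list of watched fields, the mail addresses, and
an SMTP server listening on localhost.

```python
import smtplib
import subprocess
import time
from email.message import EmailMessage

# Assumption: typical install path and physical-drive listing flags.
COMMAND = ["/opt/MegaRAID/MegaCli/MegaCli64", "-PDList", "-aALL"]
# Assumption: field names as they appear in "Key: Value" output lines.
WATCHED = ("Firmware state", "Media Error Count", "Predictive Failure Count")
# Hypothetical addresses, for illustration only.
MAIL_FROM = "raid-monitor@example.com"
MAIL_TO = "admin@example.com"


def parse_drives(output):
    """Map (slot, field) -> value for the watched fields of each drive."""
    state = {}
    slot = None
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: Value" line
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            slot = value  # following fields belong to this drive
        elif key in WATCHED and slot is not None:
            state[(slot, key)] = value
    return state


def describe_changes(old, new):
    """Return one readable line per watched field whose value changed."""
    lines = []
    for ident in sorted(set(old) | set(new)):
        before, after = old.get(ident), new.get(ident)
        if before != after:
            slot, field = ident
            lines.append(f"slot {slot}: {field}: {before} -> {after}")
    return lines


def send_alert(lines):
    """Mail the list of changes through the local SMTP server."""
    msg = EmailMessage()
    msg["Subject"] = "RAID status change"
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    msg.set_content("\n".join(lines))
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


def poll():
    """Run the controller CLI once and parse its output."""
    result = subprocess.run(COMMAND, capture_output=True, text=True)
    return parse_drives(result.stdout)


def monitor(interval=30):
    """Poll forever, mailing an alert whenever a watched field changes."""
    previous = poll()
    while True:
        time.sleep(interval)
        current = poll()
        changed = describe_changes(previous, current)
        if changed:
            send_alert(changed)
        previous = current
```

Calling monitor() starts the loop with the 30 second interval mentioned
above. Keeping the parsing and comparison separate from the command itself
means the same loop could be pointed at hpacucli by swapping COMMAND and
parse_drives(), though the hpacucli output format is not covered here.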
Cheers
--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?