Re: [CentOS] Hardware raid health?

25 Aug 2014


On 25/08/14 04:03 PM, Les Mikesell wrote:
> ...
> I just had an IBM in a remote location with a hardware RAID1 have both
> drives go bad. With local machines I probably would have caught it
> from the drive light before the 2nd one died... What is the state of
> the art in Linux software monitoring for this? Long ago when that
> box was set up I think the best I could have done was a Java GUI tool
> that IBM had for their servers - and that seemed like overkill for a
> simple monitor. Is there anything more lightweight that knows about
> the underlying drives in a hardware RAID set on IBMs - and also
> recent HP servers?

IBM used LSI-based controllers, I believe.

For our monitoring, we wrote a little script that calls MegaCli64 every
30 seconds and checks for changes. If anything of note changes (drive
health, BBU/FBU issues, temperature issues, etc.), it sends us an email.
It would be fairly easy to do the same for hpacucli, I would imagine.
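
For illustration, here is a minimal Python sketch of that kind of
poll-and-diff loop. It is NOT the script from our package; the MegaCli64
path, the "-PDList -aALL" invocation, the field names parsed from its
output, the "--run" flag and the local mail server on localhost:25 are all
assumptions to check against your own controller and MegaCli version. It
keys drives by slot number only, so boxes with several enclosures would
need the enclosure ID added to the key. For HP, you would swap COMMAND and
parse_status() for hpacucli equivalents.

```python
"""Hypothetical MegaCli64 change monitor (sketch, not the AN!Cluster tool)."""
import smtplib
import subprocess
import sys
import time
from email.message import EmailMessage

# Assumed install path; adjust for your system.
MEGACLI = "/opt/MegaRAID/MegaCli/MegaCli64"
# List physical drives on all adapters.
COMMAND = [MEGACLI, "-PDList", "-aALL"]
# Field names assumed to appear in the output as "Name: value" lines.
WATCHED_FIELDS = ("Firmware state", "Media Error Count",
                  "Other Error Count", "Predictive Failure Count")
POLL_SECONDS = 30


def parse_status(text):
    """Map 'slot N: field' -> value for each watched field in the output."""
    status = {}
    slot = "?"
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            slot = value  # following fields belong to this drive
        elif key in WATCHED_FIELDS:
            status[f"slot {slot}: {key}"] = value
    return status


def diff_status(old, new):
    """Describe every entry that was added, removed or changed."""
    changes = []
    for key in sorted(old.keys() | new.keys()):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append(f"{key}: {before!r} -> {after!r}")
    return changes


def send_alert(changes, to_addr="root@localhost"):
    """Email the changes through a local MTA (assumed on localhost:25)."""
    msg = EmailMessage()
    msg["Subject"] = "RAID status changed"
    msg["From"] = to_addr
    msg["To"] = to_addr
    msg.set_content("\n".join(changes))
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)


def main():
    previous = None  # first poll only records a baseline
    while True:
        out = subprocess.run(COMMAND, capture_output=True, text=True).stdout
        current = parse_status(out)
        if previous is not None:
            changes = diff_status(previous, current)
            if changes:
                send_alert(changes)
        previous = current
        time.sleep(POLL_SECONDS)


# Poll only when started explicitly, so the helpers can be imported alone.
if __name__ == "__main__" and "--run" in sys.argv[1:]:
    main()
```

Comparing whole snapshots, rather than grepping for specific failure
strings, means any state transition (a drive going "Failed", a rebuild
finishing, an error counter ticking up) produces an email. Temperature
or BBU checks would need their own commands and thresholds.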

Unfortunately, though it's all open source, it's part of a package that
monitors a pile of things (including IPMI sensors, APC UPSes, the Red Hat
HA stack, etc.), so it wouldn't be drop-in-and-go. That said, you could
probably strip it down fairly easily if you wanted to use it, too.

If you're curious, I show how to set it up here. If you're comfortable
with Perl, it'll be pretty easy to adapt, I suspect.
https://alteeve.ca/w/AN!Cluster_Tutorial_2#Setting_Up_Alerts

Cheers
-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?