On 25/08/14 04:03 PM, Les Mikesell wrote:
> I just had an IBM in a remote location with a hardware raid1 have both
> drives go bad. With local machines I probably would have caught it
> from the drive light before the 2nd one died... What is the state of
> the art in linux software monitoring for this? Long ago when that
> box was set up I think the best I could have done was a Java GUI tool
> that IBM had for their servers - and that seemed like overkill for a
> simple monitor. Is there anything more lightweight that knows about
> the underlying drives in a hardware raid set on IBM's - and also
> recent HP servers?

IBM used LSI-based controllers, I believe. For our monitoring, we wrote a
little script that calls MegaCli64 every 30 seconds and checks for
changes. If anything of note changes (drive health, BBU/FBU issues,
temperature issues, etc) it sends us an email. It would be fairly easy
to do the same for hpacucli, I would imagine.

Unfortunately, though it's all open source, it's part of a package that
monitors a pile of things (including IPMI sensors, APC UPSes, Red Hat HA
stack, etc), so it wouldn't be drop-in-and-go. That said, you could
probably fairly easily strip it down if you wanted to use it, too.

If you're curious, I show how to set it up here. If you're comfortable
with perl, it'll be pretty easy to adapt, I suspect.

https://alteeve.ca/w/AN!Cluster_Tutorial_2#Setting_Up_Alerts

Cheers

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
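
For anyone who just wants the gist of the poll-and-compare approach described above, here is a minimal sketch in Python rather than Perl. It is not the script from the linked tutorial. The function names (parse_status, diff_status, monitor), the list of watched fields, and the "Key: value" output layout with a "Slot Number" line per drive are all assumptions for illustration; check what your controller's CLI actually prints and adjust. Alerting is left as a callback so you can plug in email or anything else.

```python
import subprocess
import time
from typing import Callable, Dict, List, Optional

# Assumption: field names worth watching. Adjust to your CLI's real output.
WATCHED_FIELDS = ("Firmware state", "Media Error Count", "Predictive Failure Count")


def parse_status(text: str) -> Dict[str, str]:
    """Reduce 'Key: value' CLI output to {'slot N / field': value}."""
    status: Dict[str, str] = {}
    slot = "?"  # used until the first 'Slot Number' line is seen
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Key: value' line
        key, value = key.strip(), value.strip()
        if key == "Slot Number":
            slot = value
        elif key in WATCHED_FIELDS:
            status[f"slot {slot} / {key}"] = value
    return status


def diff_status(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Describe every watched field whose value differs between two polls."""
    changes = []
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append(f"{key}: {before!r} -> {after!r}")
    return changes


def run_cli(command: List[str]) -> str:
    """Run the RAID CLI and return its standard output as text."""
    return subprocess.run(command, capture_output=True, text=True, check=True).stdout


def monitor(
    command: List[str],
    notify: Callable[[List[str]], None],
    interval: float = 30.0,
    read: Callable[[List[str]], str] = run_cli,
    cycles: Optional[int] = None,
) -> None:
    """Poll every `interval` seconds; call notify(changes) when something changed.

    `cycles` limits the number of polls after the initial one (None = forever).
    """
    previous = parse_status(read(command))
    done = 0
    while cycles is None or done < cycles:
        time.sleep(interval)
        current = parse_status(read(command))
        changes = diff_status(previous, current)
        if changes:
            notify(changes)
        previous = current
        done += 1
```

Something like monitor(["MegaCli64", "-PDList", "-aALL"], notify=print) would be the starting point, assuming MegaCli64 is on your PATH and that its output on your controller contains the fields listed above. Swapping the command for an hpacucli call would also need a different parse_status, since its output layout is not the same. Note that this only reports changes between polls, so a drive that is already failed when the script starts will not trigger anything.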