I'm configuring a storage server with CentOS 6.2. It uses an LSI MegaRAID SAS controller, and I'm using LSI's megacli to configure the storage... Any ideas on how to get drive failure notifications out of this system? I'm configuring hot spares, but I'd still like some sort of notification when a drive has failed so the spare can be replaced.
On Dec 22, 2011, at 1:12 PM, John R Pierce wrote:
i'm configuring a storage server with CentOS 6.2, it uses a LSI MegaRAID SAS controller, I'm using LSI's megacli to configure the storage... Any ideas on how to get drive failure notifications out of this system? I'm configuring hot spares but I'd still like some sort of notification when a drive has failed so the spare can be replaced.
---- I don't know how to do it on CentOS, but on Ubuntu I use the megaclisas-status package, which goes hand in hand with megacli and sends notifications.
If you want, I can e-mail you the megaclisas-status script from /usr/sbin. Beyond that, there's a sysv initscript that periodically checks and sends an e-mail. Simple enough.
Craig
On 12/22/11 1:32 PM, Craig White wrote:
don't know how to do it on CentOS but on Ubuntu, I use megaclisas-status package which goes hand in hand with megacli and it sends notifications.
If you want, I can e-mail you the megaclisas-status script from /usr/sbin and beyond that, there's a sysv initscript that periodically checks and sends an e-mail. Simple enough.
I'm not having much luck locating that megaclisas-status script.
http://hwraid.le-vert.net/wiki/LSIMegaRAIDSAS talks about it, but the source is nowhere to be found.
I found this, which looks moderately interesting, but I'm not a Python programmer: http://windowsmasher.wordpress.com/2011/08/15/using-megacli-to-monitor-openf...
On Jan 4, 2012, at 7:08 PM, John R Pierce wrote:
not having much luck locating that megaclisas-status script
http://hwraid.le-vert.net/wiki/LSIMegaRAIDSAS talks about it, but the source is nowhere to be found
---- It seems to me that megaclisas-statusd is basically 2 script files, 1 in /usr/sbin and the other a sysv initscript, so it should be easy to modify to run on CentOS instead of Debian/Ubuntu, provided you have megacli installed and running. I agree that I couldn't find any RPMs of it.
Craig
On 01/05/12 7:14 AM, Craig White wrote:
It seems to me that the megaclisas-statusd is basically 2 script files, 1 in /usr/sbin and the other a sysv initscript and should be easily modifiable to run on CentOS instead of Debian/Ubuntu with the requirement that you would need to have megacli installed/running. I agree that I couldn't find any rpm's of the same.
Indeed it does. But I can't find those two scripts at the above site or elsewhere. I don't have any Debian systems, and I don't really want to have to install one.
I spent about 2 hours with the code from http://windowsmasher.wordpress.com/2011/08/15/using-megacli-to-monitor-openf... but it's badly broken: the blog seems to have trashed Python's indentation. I fixed enough to get it running, but it's still not working right, as it's not listing any of the LDs or PDs.
sent to you via PM - hope you don't mind.
Craig
On Jan 5, 2012, at 12:53 PM, John R Pierce wrote:
indeed it does. but I can't find those two scripts at the above site or elsewhere. I don't have any debian, and don't really want to have to install it.
I spent about 2 hours with the code from http://windowsmasher.wordpress.com/2011/08/15/using-megacli-to-monitor-openf... but its badly broken, the blog seems to have trashed pythons indentation, and while I fixed enough to get it working, its not working right as its not listing any of the LD or PD's.
On Jan 5, 2012, at 2:02 PM, John R Pierce wrote:
On 01/05/12 12:42 PM, Craig White wrote:
sent to you via PM - hope you don't mind.
thanks, got it.
---- I probably should have figured out a way to send you just a tarball but there really wasn't much else and certainly nothing of significance. This was the full manifest...
# dpkg -L megaclisas-status
/.
/etc
/etc/init.d
/etc/init.d/megaclisas-statusd
/usr
/usr/sbin
/usr/sbin/megaclisas-status
/usr/share
/usr/share/doc
/usr/share/doc/megaclisas-status
/usr/share/doc/megaclisas-status/README.Debian
/usr/share/doc/megaclisas-status/changelog.gz
/usr/share/doc/megaclisas-status/copyright
If there's anything else you want from this, let me know.
Craig
On 01/05/12 1:09 PM, Craig White wrote:
I probably should have figured out a way to send you just a tarball but there really wasn't much else and certainly nothing of significance.
Sigh, it's doing the same thing as the code I 'fixed' from that blog I posted earlier...

# ./megaclisas-status
-- Controller informations --
-- ID | Model
c0 | LSI MegaRAID SAS 9261-8i

-- Arrays informations --
-- ID | Type | Size | Status | InProgress

-- Disks informations
-- ID | Model | Status

(I have 1 array and 36 disks on this controller.) So the output format from MegaCli has probably changed just enough to throw it off, and I need to refactor it. Meh.
I can't believe nobody else is running a late-model MegaRAID SAS card on Linux and needs notifications of error status changes.
On Jan 5, 2012, at 2:35 PM, John R Pierce wrote:
sigh, its doing the same thing as the code I 'fixed' from that blog I posted earlier...
# ./megaclisas-status
-- Controller informations --
-- ID | Model
c0 | LSI MegaRAID SAS 9261-8i

-- Arrays informations --
-- ID | Type | Size | Status | InProgress

-- Disks informations
-- ID | Model | Status
(I have 1 array and 36 disks on this controller) so the output format from MegaCli has probably changed just enough to throw it off, so I need to refactor it. meh.
I can't believe noone is running a late model MegaRAID SAS card with Linux and doesn't require error status change notifications.
---- maybe it's the RAID controller you are using that isn't compatible with megacli or the version of megacli that you are using...
from dpkg -l megacli...
megacli 5.00.12-1 LSI Logic MegaRAID SAS MegaCLI

/usr/sbin/megaclisas-status
-- Controller informations --
-- ID | Model
c0 | Supermicro SMC2108

-- Arrays informations --
-- ID | Type | Size | Status | InProgress
c0u0 | RAID1 | 930G | Optimal | None

-- Disks informations
-- ID | Model | Status
c0u0p0 | 9XG06RH0ST91000640NS SN01 | Online
c0u0p1 | 9XG067L3ST91000640NS SN01 | Online
of course this is on Ubuntu so YMMV
Craig
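For what it's worth, the periodic check described in this thread (run the status script, then mail someone if anything looks wrong) could be sketched roughly as below. This is only an illustration in Python 3, not the megaclisas-statusd code. It assumes the status script prints one pipe-separated row per line, as in the megaclisas-status output above. The function names, the idea that any disk status not starting with "Online" or "Hotspare" is a problem, and the addresses are all assumptions.

```python
import smtplib
from email.message import EmailMessage


def find_problems(report):
    """Return (id, status) pairs for array or disk rows that look unhealthy.

    Assumes rows shaped like the megaclisas-status tables:
      array row: ID | Type | Size | Status | InProgress   (5 fields)
      disk row:  ID | Model | Status                      (3 fields)
    """
    problems = []
    for line in report.splitlines():
        if line.startswith("--") or "|" not in line:
            continue  # section titles, column headers, blank lines
        fields = [f.strip() for f in line.split("|")]
        if len(fields) == 5:
            ident, status = fields[0], fields[3]
            healthy = status == "Optimal"
        elif len(fields) == 3:
            ident, status = fields[0], fields[2]
            # Assumption: anything other than Online/Hotspare needs attention.
            healthy = status.startswith(("Online", "Hotspare"))
        else:
            continue  # e.g. the two-field controller row
        if not healthy:
            problems.append((ident, status))
    return problems


def build_alert(problems, sender, recipient):
    """Build (but do not send) a plain-text alert mail."""
    msg = EmailMessage()
    msg["Subject"] = "RAID alert: %d item(s) need attention" % len(problems)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join("%s: %s" % p for p in problems))
    return msg


def send(msg, host="localhost"):
    """Hand the message to a local MTA; the host is an assumption."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

A cron job or init script would capture the status script's output, pass it to find_problems(), and call send(build_alert(...)) only when the returned list is not empty.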
On 01/05/12 3:23 PM, Craig White wrote:
maybe it's the RAID controller you are using that isn't compatible with megacli or the version of megacli that you are using...
This is a brand new 9261-8i SAS2 controller, and it definitely works 100% with the latest MegaCli64 I got from LSI Logic (installed from an RPM). From what I'm gathering, megacli simply passes the command line to the card's firmware, which does all the processing and generates the output. That output is formatted for humans, so parsing it becomes a nightmare whenever the format changes.
# rpm -qf /opt/MegaRAID/MegaCli/MegaCli64
MegaCli-8.02.16-1.i386

# /opt/MegaRAID/MegaCli/MegaCli64 showsummary a0

System
Operating System: Linux version 2.6.32-220.el6.x86_64
Driver Version: 00.00.05.40-rh2
CLI Version: 8.02.16

Hardware
Controller
ProductName : LSI MegaRAID SAS 9261-8i(Bus 0, Dev 0)
SAS Address : 500605b0032943d0
FW Package Version: 12.12.0-0046
Status : Optimal
(300 more lines of drives and stuff deleted)
Anyway, I think I should take this discussion off the CentOS mailing list, as it's really not CentOS-specific; it's a generic LSI Logic problem. I've got enough info now to figure out how to write my own parser. I believe I'll use that ShowSummary command rather than the LDInfo/PDInfo commands used by the other scripts.
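As a starting point for such a parser, here is a minimal Python 3 sketch. It assumes the summary arrives as one "Key : Value" pair per line, with section names such as System or Hardware on lines of their own; the exact line layout is an assumption, and only the field names and values are taken from the excerpt above. Because keys such as Status can occur many times in the full output, every key maps to a list of values. The name parse_summary is made up.

```python
from collections import defaultdict

# Values copied from the excerpt in this thread; the one-pair-per-line
# layout and the indentation are assumptions.
SAMPLE = """\
System
    Operating System: Linux version 2.6.32-220.el6.x86_64
    Driver Version: 00.00.05.40-rh2
    CLI Version: 8.02.16
Hardware
    Controller
        ProductName : LSI MegaRAID SAS 9261-8i(Bus 0, Dev 0)
        SAS Address : 500605b0032943d0
        FW Package Version: 12.12.0-0046
        Status : Optimal
"""


def parse_summary(text):
    """Collect 'Key : Value' lines into {key: [value, ...]}.

    Lines without a colon (section names, blank lines) are skipped.
    Only the first colon splits, so values may contain colons themselves.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        fields[key.strip()].append(value.strip())
    return dict(fields)
```

This flat version deliberately ignores which section a key belongs to. A real monitor would need to track the section or drive each Status line sits under before it could say which component changed state.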
On 01/05/12 1:35 PM, John R Pierce wrote:
sigh, its doing the same thing as the code I 'fixed' from that blog I posted earlier...
OK, I've figured out the differences between what megaclisas-status expected and what megacli generates for these new SAS cards, and hacked up the code to work with the firmware version I have. But it doesn't really understand SAS enclosure numbering, nor does it list the global hot spares:
# ./megaclisas-status
-- Controller informations --
-- ID | Model
c0 | LSI MegaRAID SAS 9261-8i

-- Arrays informations --
-- ID | Type | Size | Status | InProgress
c0u0 | RAID6 | 73668G | Optimal | Background Initialization: Completed 79%, Taken 329 min.

-- Disks informations
-- ID | Model | Status
c0u0p0 | SEAGATE ST33000650SS 0003Z290SBNR | Online, Spun Up
c0u0p1 | SEAGATE ST33000650SS 0003Z290JX8W | Online, Spun Up
c0u0p2 | SEAGATE ST33000650SS 0003Z290WT5A | Online, Spun Up
c0u0p3 | SEAGATE ST33000650SS 0003Z290T04B | Online, Spun Up
c0u0p4 | SEAGATE ST33000650SS 0003Z290VL94 | Online, Spun Up
c0u0p5 | SEAGATE ST33000650SS 0003Z290VA0W | Online, Spun Up
c0u0p6 | SEAGATE ST33000650SS 0003Z290QGSF | Online, Spun Up
c0u0p7 | SEAGATE ST33000650SS 0003Z290QLYD | Online, Spun Up
c0u0p8 | SEAGATE ST33000650SS 0003Z290ML45 | Online, Spun Up
c0u0p9 | SEAGATE ST33000650SS 0003Z290TCLW | Online, Spun Up
c0u0p10 | SEAGATE ST33000650SS 0003Z290X68R | Online, Spun Up
c0u0p0 | SEAGATE ST33000650SS 0003Z290LC8R | Online, Spun Up
c0u0p1 | SEAGATE ST33000650SS 0003Z290PG2G | Online, Spun Up
c0u0p2 | SEAGATE ST33000650SS 0003Z290N3MF | Online, Spun Up
c0u0p3 | SEAGATE ST33000650SS 0003Z290BD3Q | Online, Spun Up
c0u0p4 | SEAGATE ST33000650SS 0003Z290BDL4 | Online, Spun Up
c0u0p5 | SEAGATE ST33000650SS 0003Z290R7DJ | Online, Spun Up
c0u0p6 | SEAGATE ST33000650SS 0003Z2908KHH | Online, Spun Up
c0u0p7 | SEAGATE ST33000650SS 0003Z290BDCN | Online, Spun Up
c0u0p8 | SEAGATE ST33000650SS 0003Z290QR9Q | Online, Spun Up
c0u0p9 | SEAGATE ST33000650SS 0003Z290TDTE | Online, Spun Up
c0u0p10 | SEAGATE ST33000650SS 0003Z290PTX5 | Online, Spun Up
c0u0p0 | SEAGATE ST33000650SS 0003Z290PSZ2 | Online, Spun Up
c0u0p1 | SEAGATE ST33000650SS 0003Z290S8LH | Online, Spun Up
c0u0p2 | SEAGATE ST33000650SS 0003Z290QYX2 | Online, Spun Up
c0u0p3 | SEAGATE ST33000650SS 0003Z290MY22 | Online, Spun Up
c0u0p4 | SEAGATE ST33000650SS 0003Z290MY43 | Online, Spun Up
c0u0p5 | SEAGATE ST33000650SS 0003Z290LGTG | Online, Spun Up
c0u0p6 | SEAGATE ST33000650SS 0003Z290TXHX | Online, Spun Up
c0u0p7 | SEAGATE ST33000650SS 0003Z290R0AE | Online, Spun Up
c0u0p8 | SEAGATE ST33000650SS 0003Z290L1D5 | Online, Spun Up
c0u0p9 | SEAGATE ST33000650SS 0003Z290TLGX | Online, Spun Up
c0u0p10 | SEAGATE ST33000650SS 0003Z290TQW7 | Online, Spun Up
(Note that all the disks are listed under c0u0, which is way wrong; my disks are 20:0-20:11 and 45:0-45:23.)
So I took the OTHER script I'd found and saw that it WAS set up for SAS enclosure info. On my test system it generates this output...
# ./lsi-raidinfo
-- Controllers --
-- ID | Model
c0 | LSI MegaRAID SAS 9261-8i

-- Volumes --
-- ID | Type | Size | Status | InProgress
volinfo: c0u0 | RAID6 | 73668G | Optimal | Background Initialization: Completed 77%, Taken 303 min.

-- Disks --
-- Encl:Slot | Model | Status
diskinfo: 20:0 | SEAGATE ST33000650SS 0003Z290SBNR | Online, Spun Up
diskinfo: 20:1 | SEAGATE ST33000650SS 0003Z290JX8W | Online, Spun Up
diskinfo: 20:2 | SEAGATE ST33000650SS 0003Z290WT5A | Online, Spun Up
diskinfo: 20:3 | SEAGATE ST33000650SS 0003Z290T04B | Online, Spun Up
diskinfo: 20:4 | SEAGATE ST33000650SS 0003Z290VL94 | Online, Spun Up
diskinfo: 20:5 | SEAGATE ST33000650SS 0003Z290VA0W | Online, Spun Up
diskinfo: 20:6 | SEAGATE ST33000650SS 0003Z290QGSF | Online, Spun Up
diskinfo: 20:7 | SEAGATE ST33000650SS 0003Z290QLYD | Online, Spun Up
diskinfo: 20:8 | SEAGATE ST33000650SS 0003Z290ML45 | Online, Spun Up
diskinfo: 20:9 | SEAGATE ST33000650SS 0003Z290TCLW | Online, Spun Up
diskinfo: 20:10 | SEAGATE ST33000650SS 0003Z290X68R | Online, Spun Up
diskinfo: 45:11 | SEAGATE ST33000650SS 0003Z290V4PZ Hotspare Information | Hotspare, Spun down
diskinfo: 45:0 | SEAGATE ST33000650SS 0003Z290LC8R | Online, Spun Up
diskinfo: 45:1 | SEAGATE ST33000650SS 0003Z290PG2G | Online, Spun Up
diskinfo: 45:2 | SEAGATE ST33000650SS 0003Z290N3MF | Online, Spun Up
diskinfo: 45:3 | SEAGATE ST33000650SS 0003Z290BD3Q | Online, Spun Up
diskinfo: 45:4 | SEAGATE ST33000650SS 0003Z290BDL4 | Online, Spun Up
diskinfo: 45:5 | SEAGATE ST33000650SS 0003Z290R7DJ | Online, Spun Up
diskinfo: 45:6 | SEAGATE ST33000650SS 0003Z2908KHH | Online, Spun Up
diskinfo: 45:7 | SEAGATE ST33000650SS 0003Z290BDCN | Online, Spun Up
diskinfo: 45:8 | SEAGATE ST33000650SS 0003Z290QR9Q | Online, Spun Up
diskinfo: 45:9 | SEAGATE ST33000650SS 0003Z290TDTE | Online, Spun Up
diskinfo: 45:10 | SEAGATE ST33000650SS 0003Z290PTX5 | Online, Spun Up
diskinfo: 45:11 | SEAGATE ST33000650SS 00039XK0EW80 Hotspare Information | Hotspare, Spun down
diskinfo: 45:12 | SEAGATE ST33000650SS 0003Z290PSZ2 | Online, Spun Up
diskinfo: 45:13 | SEAGATE ST33000650SS 0003Z290S8LH | Online, Spun Up
diskinfo: 45:14 | SEAGATE ST33000650SS 0003Z290QYX2 | Online, Spun Up
diskinfo: 45:15 | SEAGATE ST33000650SS 0003Z290MY22 | Online, Spun Up
diskinfo: 45:16 | SEAGATE ST33000650SS 0003Z290MY43 | Online, Spun Up
diskinfo: 45:17 | SEAGATE ST33000650SS 0003Z290LGTG | Online, Spun Up
diskinfo: 45:18 | SEAGATE ST33000650SS 0003Z290TXHX | Online, Spun Up
diskinfo: 45:19 | SEAGATE ST33000650SS 0003Z290R0AE | Online, Spun Up
diskinfo: 45:20 | SEAGATE ST33000650SS 0003Z290L1D5 | Online, Spun Up
diskinfo: 45:21 | SEAGATE ST33000650SS 0003Z290TLGX | Online, Spun Up
diskinfo: 45:22 | SEAGATE ST33000650SS 0003Z290TQW7 | Online, Spun Up
diskinfo: 45:23 | SEAGATE ST33000650SS 0003Z290TKE1 Hotspare Information | Hotspare, Spun down
20: or 45: is the enclosure/backplane, and :nn is the drive slot on that backplane.
MEH, that's still messed up. The first hot spare above should be 20:11. Looking at the incredibly verbose output of the megacli command that this util is parsing, that output is itself a little messed up, and I think the script is taking the enclosure number from the last non-hot-spare drive, since those were listed first. $#@$@#$#$@#$
GAHHH! this is a mess.
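The kind of bug suspected here, a field from one drive leaking into the next drive's record, can be avoided by building a fresh record per drive and never reusing a value across records. The Python 3 sketch below only illustrates that idea; it is not the lsi-raidinfo code and not a confirmed diagnosis. The field names (Enclosure Device ID, Slot Number, Firmware state, Inquiry Data) and the sample text are assumptions about what the per-drive listing looks like. A new record starts either at an enclosure line or when a key repeats, so a drive block that lacks an enclosure line shows up as "?" instead of inheriting the previous drive's enclosure.

```python
# Field names are assumptions about the per-drive listing from megacli.
FIRST_KEY = "Enclosure Device ID"
WANTED = (FIRST_KEY, "Slot Number", "Firmware state", "Inquiry Data")


def parse_drives(text):
    """Split a 'Key: Value' listing into one dict per physical drive."""
    drives, current = [], {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in WANTED:
            continue
        # A new enclosure line, or a key we already hold, means the previous
        # drive's record is complete. Start again with an empty dict so that
        # nothing carries over from one drive to the next.
        if key == FIRST_KEY or key in current:
            if current:
                drives.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        drives.append(current)
    return drives


def label(drive):
    """Format as Encl:Slot, using '?' for any field the record lacks."""
    return "%s:%s" % (drive.get(FIRST_KEY, "?"), drive.get("Slot Number", "?"))
```

With this approach a hot spare whose block is missing its enclosure line would be reported as "?:11", which is at least visibly incomplete rather than silently wrong.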
i'm configuring a storage server with CentOS 6.2, it uses a LSI MegaRAID SAS controller, I'm using LSI's megacli to configure the storage... Any ideas on how to get drive failure notifications out of this system? I'm configuring hot spares but I'd still like some sort of notification when a drive has failed so the spare can be replaced.
I've tried countless times to get the LSI SNMP agent, which is designed for exactly this, to work on CentOS 5 and 6, with no luck.
I resorted to a pathetic snmp extend that runs megacli, sigh...
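For anyone wanting to try the same workaround, a script of the following shape could sit behind an extend line in snmpd.conf. This is a rough Python 3.7+ sketch, not the script mentioned above. The MegaCli64 path is the one quoted elsewhere in this thread, while the -LDInfo -Lall -aALL arguments, the "State : ..." line format, the exit codes and the function names are assumptions to be checked against real output.

```python
import subprocess

# Path as quoted in this thread; the arguments are an assumption.
MEGACLI = ["/opt/MegaRAID/MegaCli/MegaCli64", "-LDInfo", "-Lall", "-aALL"]


def summarize(ldinfo_text):
    """Return (exit_code, message) from text containing 'State : <value>' lines."""
    states = []
    for line in ldinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "State":
            states.append(value.strip())
    if not states:
        return 3, "UNKNOWN: no logical drive state found"
    bad = [s for s in states if s != "Optimal"]
    if bad:
        return 2, "CRITICAL: %d of %d volumes not Optimal (%s)" % (
            len(bad), len(states), ", ".join(bad))
    return 0, "OK: %d volumes Optimal" % len(states)


def main():
    """Run megacli, print a one-line summary, return the exit code."""
    result = subprocess.run(MEGACLI, capture_output=True, text=True)
    code, message = summarize(result.stdout)
    print(message)
    return code

# When installed as a script, finish the file with:
#     import sys; sys.exit(main())
```

The matching snmpd.conf entry would look something like "extend raidstatus /usr/local/bin/raid_status.py", where both the name raidstatus and the script path are made up for this example. snmpd usually needs enough privileges to run megacli for this to work.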
On 12/22/11 3:28 PM, Joseph L. Casale wrote:
I've tried countless times to get the LSI snmp agent designed just for this to work in CentOS 5 and 6, no luck.
Resorted to a pathetic snmp extend that runs megacli, sigh...
I just installed the Linux MSM package from LSI...
For what it's worth, I had to add these packages to a minimal 6.2 install:
yum install libstdc++.i686 compat-libstdc++-33.i686 libX11 libX11.i686 libXi libXi.i686 libXtst libXtst.i686 xauth net-snmp net-snmp-utils csh
The X stuff was required by the GUI manager; I suspect that with an agent-only install it wouldn't be needed.
I'm a total novice when it comes to SNMP, but I'm reading some howtos and stumbling around.