I'm looking to do a bit more monitoring of my 3ware 9550 with smartd, and wanted to see what others were doing with smart for monitoring 3ware hardware.
Do you have the smartd.conf configured to test, or simply monitor health status? Are you monitoring the drive as centos sees it (/dev/sdX) or are you using the 3ware /dev/twaX for monitoring?
Opinions and discussions are welcome :-P
On Tue, 2009-02-10 at 21:42 -0500, Jim Perrin wrote:
> I'm looking to do a bit more monitoring of my 3ware 9550 with smartd, and wanted to see what others were doing with smart for monitoring 3ware hardware.
> Do you have the smartd.conf configured to test, or simply monitor health status? Are you monitoring the drive as centos sees it (/dev/sdX) or are you using the 3ware /dev/twaX for monitoring?
> Opinions and discussions are welcome :-P
This is my smartd.conf for monitoring drives on a 9550SX:
/dev/twa0 -d 3ware,0 -H -m root
/dev/twa0 -d 3ware,1 -H -m root
/dev/twa0 -d 3ware,2 -H -m root
/dev/twa0 -d 3ware,3 -H -m root
Using smartctl is similar:
# smartctl -Hd 3ware,0 /dev/twa0
It's straightforward to do testing with smartctl, but the above -H/--health output gives you some warning that things aren't right before the drive fails, especially the later lines of output (e.g. Current_Pending_Sector, Offline_Uncorrectable). I run it as a weekly cron job.
Steve
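The weekly check described above could be scripted roughly as follows. This is only a sketch: the function names flag_bad_sectors and check_ports are made up for illustration, it reads the attribute table from `smartctl -A` (rather than `-H`) so the raw values are available, and the device and last port number are assumptions to adjust for your controller.

```shell
#!/bin/sh
# Sketch only: function names, device and port range are illustrative assumptions.

# Read `smartctl -A` output on stdin and print NAME=RAW_VALUE for the
# watched attributes whose raw value (10th column) is greater than zero.
flag_bad_sectors() {
    awk '($2 == "Current_Pending_Sector" || $2 == "Offline_Uncorrectable") &&
         $10 + 0 > 0 { print $2 "=" $10 }'
}

# Run the check on ports 0..$2 of the 3ware device $1 and print a line
# for every port that reports pending or uncorrectable sectors.
check_ports() {
    dev=$1
    last=$2
    port=0
    while [ "$port" -le "$last" ]; do
        bad=$(smartctl -A -d 3ware,"$port" "$dev" | flag_bad_sectors)
        if [ -n "$bad" ]; then
            echo "$dev port $port: $bad"
        fi
        port=$((port + 1))
    done
}
```

Calling `check_ports /dev/twa0 3` from a weekly cron job would cover the four ports in the smartd.conf above; cron mails any output to the job's owner, so a quiet week produces no mail.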
On Wed, Feb 11, 2009 at 12:46:24AM -0500, S.Tindall wrote:
> On Tue, 2009-02-10 at 21:42 -0500, Jim Perrin wrote:
> > I'm looking to do a bit more monitoring of my 3ware 9550 with smartd, and wanted to see what others were doing with smart for monitoring 3ware hardware.
> > Do you have the smartd.conf configured to test, or simply monitor health status? Are you monitoring the drive as centos sees it (/dev/sdX) or are you using the 3ware /dev/twaX for monitoring?
> > Opinions and discussions are welcome :-P
> This is my smartd.conf for monitoring drives on a 9550SX:
> /dev/twa0 -d 3ware,0 -H -m root
> /dev/twa0 -d 3ware,1 -H -m root
> /dev/twa0 -d 3ware,2 -H -m root
> /dev/twa0 -d 3ware,3 -H -m root
> Using smartctl is similar:
> # smartctl -Hd 3ware,0 /dev/twa0
> It's straightforward to do testing with smartctl, but the above -H/--health output gives you some warning that things aren't right before the drive fails, especially the later lines of output (e.g. Current_Pending_Sector, Offline_Uncorrectable). I run it as a weekly cron job.
Do you ever run the long/short tests? These are mentioned in the smartd.conf 3ware examples.
I've never enabled them.
Ray
On Tue, 2009-02-10 at 21:52 -0800, Ray Van Dolson wrote:
> On Wed, Feb 11, 2009 at 12:46:24AM -0500, S.Tindall wrote:
> > On Tue, 2009-02-10 at 21:42 -0500, Jim Perrin wrote:
> > > I'm looking to do a bit more monitoring of my 3ware 9550 with smartd, and wanted to see what others were doing with smart for monitoring 3ware hardware.
> > > Do you have the smartd.conf configured to test, or simply monitor health status? Are you monitoring the drive as centos sees it (/dev/sdX) or are you using the 3ware /dev/twaX for monitoring?
> > > Opinions and discussions are welcome :-P
> > This is my smartd.conf for monitoring drives on a 9550SX:
> > /dev/twa0 -d 3ware,0 -H -m root
> > /dev/twa0 -d 3ware,1 -H -m root
> > /dev/twa0 -d 3ware,2 -H -m root
> > /dev/twa0 -d 3ware,3 -H -m root
> > Using smartctl is similar:
> > # smartctl -Hd 3ware,0 /dev/twa0
> > It's straightforward to do testing with smartctl, but the above -H/--health output gives you some warning that things aren't right before the drive fails, especially the later lines of output (e.g. Current_Pending_Sector, Offline_Uncorrectable). I run it as a weekly cron job.
> Do you ever run the long/short tests? These are mentioned in the smartd.conf 3ware examples.
> I've never enabled them.
> Ray
No, I scheduled the selftest through the 3ware controller itself, but it's been so long since I set it up that I would have to get out the manual to remember how to do it.
# /usr/local/sbin/tw_cli show selftest
Selftest Schedule for Controller /c2
========================================================
Slot    Day     Hour        UDMA        SMART
--------------------------------------------------------
1       Sun     12:00am     enabled     enabled
2       Mon     12:00am     enabled     enabled
3       Tue     12:00am     enabled     enabled
4       Wed     12:00am     enabled     enabled
5       Thu     12:00am     enabled     enabled
6       Fri     12:00am     enabled     enabled
7       Sat     12:00am     enabled     enabled
Steve
On Wednesday 11 February 2009, Jim Perrin wrote:
> I'm looking to do a bit more monitoring of my 3ware 9550 with smartd, and wanted to see what others were doing with smart for monitoring 3ware hardware.
> Do you have the smartd.conf configured to test, or simply monitor health status? Are you monitoring the drive as centos sees it (/dev/sdX) or are you using the 3ware /dev/twaX for monitoring?
I don't use smartd against them but I do run smartctl from time to time. I have had issues with using /dev/sdX and now only use /dev/twaX (but I can't really remember what bit me...).
/Peter
> Opinions and discussions are welcome :-P
On Tue, 10 Feb 2009 at 9:42pm, Jim Perrin wrote
> I'm looking to do a bit more monitoring of my 3ware 9550 with smartd, and wanted to see what others were doing with smart for monitoring 3ware hardware.
> Do you have the smartd.conf configured to test, or simply monitor health status? Are you monitoring the drive as centos sees it (/dev/sdX) or are you using the 3ware /dev/twaX for monitoring?
> Opinions and discussions are welcome :-P
Have you thought about tying tw_cli into nagios? That's one of my round-tuit projects. I'm sure there are already plugins for it, and it seems like you may get better info.
On Wed, Feb 11, 2009 at 12:17:09PM -0500, Joshua Baker-LePain wrote:
> On Tue, 10 Feb 2009 at 9:42pm, Jim Perrin wrote
> > I'm looking to do a bit more monitoring of my 3ware 9550 with smartd, and wanted to see what others were doing with smart for monitoring 3ware hardware.
> > Do you have the smartd.conf configured to test, or simply monitor health status? Are you monitoring the drive as centos sees it (/dev/sdX) or are you using the 3ware /dev/twaX for monitoring?
> > Opinions and discussions are welcome :-P
> Have you thought about tying tw_cli into nagios? That's one of my round-tuit projects. I'm sure there are already plugins for it, and it seems like you may get better info.
On a somewhat related note. I haven't yet looked into what 3ware does on the SNMP side. Does it include a MIB file? Anyone generating traps on bad disk events?
Not hard to set up your own OID to run a shell script to check disk status, but traps are cooler. :)
Ray
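For the roll-your-own approach, the script behind the OID only needs to print a short status and set an exit code. A minimal sketch, assuming the drives are checked with `smartctl -H` as earlier in the thread; the function names and status words are invented here, and wiring it into snmpd (for example with net-snmp's extend directive) is left out:

```shell
#!/bin/sh
# Sketch only: names and status words are illustrative, not from 3ware or net-snmp.

# Read `smartctl -H` output on stdin; print OK if the overall health
# self-assessment passed, FAIL otherwise (including when no result is found).
health_word() {
    if grep -q 'test result: PASSED'; then
        echo OK
    else
        echo FAIL
    fi
}

# Print one "port N: WORD" line for ports 0..$2 of device $1;
# return 1 if any port is not OK, 0 otherwise.
disk_status() {
    dev=$1
    last=$2
    port=0
    rc=0
    while [ "$port" -le "$last" ]; do
        word=$(smartctl -H -d 3ware,"$port" "$dev" 2>/dev/null | health_word)
        echo "port $port: $word"
        [ "$word" = OK ] || rc=1
        port=$((port + 1))
    done
    return "$rc"
}
```

Something like `disk_status /dev/twa0 3` would then give one line per drive plus a non-zero exit status when any drive fails its health check. Note that a drive smartctl cannot reach also shows up as FAIL here, which seemed the safer default.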
On Wed, Feb 11, 2009 at 12:17 PM, Joshua Baker-LePain jlb17@duke.edu wrote:
> Have you thought about tying tw_cli into nagios? That's one of my round-tuit projects. I'm sure there are already plugins for it, and it seems like you may get better info.
I actually did think about doing this, but this is for my home network, and I'm a little too lazy/busy currently to set up nagios for personal use. I might do that a bit later on if I'm feeling frisky, but for now I was just looking for quick-fix type checking. There are plugins for this already as you suggested, but I have no idea how well they actually function.
Jim Perrin wrote:
> I'm looking to do a bit more monitoring of my 3ware 9550 with smartd, and wanted to see what others were doing with smart for monitoring 3ware hardware.
> Do you have the smartd.conf configured to test, or simply monitor health status? Are you monitoring the drive as centos sees it (/dev/sdX) or are you using the 3ware /dev/twaX for monitoring?
> Opinions and discussions are welcome :-P
I run smart tests weekly in a staggered fashion during off hours on my 3ware arrays. Like this:
/dev/twa0 -d 3ware,0 -H -l selftest -l error -o on -S on -s (O/../../1/18|S/../../2/22|L/../../3/01) -m root
/dev/twa0 -d 3ware,1 -H -l selftest -l error -o on -S on -s (O/../../2/18|S/../../3/22|L/../../4/01) -m root
I've found that the smart monitors are pretty good about giving me at least some warning about imminent drive failures when I do this.
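To make the -s schedule above easier to read: as I understand the smartd.conf man page, smartd matches the regular expression against a string of the form T/MM/DD/d/HH, where T is the test type (L long, S short, O offline), d is the day of the week (1 = Monday through 7 = Sunday) and HH is the hour. So the first entry runs the offline test Monday at 18:00, the short test Tuesday at 22:00 and the long test Wednesday at 01:00, and the second drive is shifted by one day. The sketch below only imitates that matching with grep so a pattern can be tried out before it goes into smartd.conf; the function names are hypothetical and this is not how smartd itself is implemented.

```shell
#!/bin/sh
# Sketch only: imitates smartd's schedule matching with grep; names are illustrative.

# Print the T/MM/DD/d/HH string for test type $1 (L, S, O, ...) at the
# current time. `date +%u` numbers the weekday 1 (Monday) to 7 (Sunday).
schedule_string() {
    printf '%s/%s\n' "$1" "$(date +%m/%d/%u/%H)"
}

# Succeed if the extended regex $1 matches the whole schedule string $2.
schedule_matches() {
    printf '%s\n' "$2" | grep -Eqx "$1"
}
```

For example, `schedule_matches '(O/../../1/18|S/../../2/22|L/../../3/01)' 'L/02/11/3/01'` succeeds, since 11 February 2009 was a Wednesday and the long test is scheduled for Wednesday at 01:00.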