rkarhuse at gmail.com
Sat Aug 30 08:57:10 UTC 2008
On Sat, Aug 30, 2008 at 4:08 AM, Mag Gam <magawake at gmail.com> wrote:
> At my physics lab we have 30 servers with 1TB disk packs. I am in need
> of monitoring for disk failures. I have been reading about SMART and
> it seems it can help. However, I am not sure what to look for if a
> drive is about to fail. Any thoughts about this? Is anyone using this
> method to predetermine disk failures?
Here are a few references from my archives w.r.t. SMART ...
Hope they help ...
Google Releases Paper on Disk Reliability
"The Google engineers just published a paper on Failure Trends in a Large
Disk Drive Population.
Based on a study of 100,000 disk drives over 5 years they find some
interesting stuff. To quote from the abstract: 'Our analysis identifies
several parameters from the drive's self monitoring facility (SMART) that
correlate highly with failures. Despite this high correlation, we conclude
that models based on SMART parameters alone are unlikely to be useful for
predicting individual drive failures. Surprisingly, we found that
temperature and activity levels were much less correlated with drive
failures than previously reported.'"
Everything You Know About Disks Is Wrong
"Google's wasn't the best storage paper at FAST '07
<http://www.usenix.org/events/fast07/>. Another, more provocative paper
looking at real-world results from 100,000 disk drives got the 'Best Paper'
award. Bianca Schroeder, of CMU's Parallel Data Lab, submitted Disk failures
in the real world: What does an MTTF of 1,000,000 hours mean to you?
<http://www.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html>
The paper crushes a number of (what we now know to be) myths about disks
such as vendor MTBF validity, 'consumer' vs. 'enterprise' drive reliability
(spoiler: no difference), and RAID 5 assumptions. StorageMojo has a good
summary of the paper's key points <http://storagemojo.com/?p=383>."
Monitoring Hard Disks with SMART
By Bruce Allen <http://www.linuxjournal.com/user/801273> on Thu, 2004-01-01
SysAdmin <http://www.linuxjournal.com/taxonomy/term/8>
One of your hard disks might be trying to tell you it's not long for this
world. Install software that lets you know when to replace it.
It's a given that all disks eventually die, and it's easy to see why. The
platters in a modern disk drive rotate more than a hundred times per second,
maintaining submicron tolerances between the disk heads and the magnetic
media that store data. Often they run 24/7 in dusty, overheated
environments, thrashing on heavily loaded or poorly managed machines. So,
it's not surprising that experienced users are all too familiar with the
symptoms of a dying disk. Strange things start happening. Inscrutable kernel
error messages cover the console and then the system becomes unstable and
locks up. Often, entire days are lost repeating recent work, re-installing
the OS and trying to recover data. Even if you have a recent backup, sudden
disk failure is a minor catastrophe.
smartmontools Home Page
Welcome! This is the home page for the smartmontools package.
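As a rough illustration of the kind of check the original question asks about, here is a minimal sketch (not taken from any of the references above) that reads the attribute table printed by `smartctl -A` and flags a few counters with nonzero raw values. The attribute names in `WATCHED`, the helper names, and the "raw value greater than zero" rule are assumptions chosen for illustration; attribute names vary by drive vendor. As the Google abstract quoted above cautions, SMART parameters alone are unlikely to predict individual drive failures, so a hit here is a hint to look closer, not a verdict.

```python
import subprocess

# Attributes to watch. This list is an assumption for illustration only;
# the names a drive reports depend on the vendor and firmware.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_attributes(text):
    """Map attribute name -> integer raw value from `smartctl -A` table text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value begins in the tenth column.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attrs


def suspicious(attrs, watched=WATCHED):
    """Return the watched attributes whose raw value is greater than zero."""
    return {name: attrs[name] for name in watched if attrs.get(name, 0) > 0}


def check_device(device):
    """Run smartctl on one device (usually needs root) and return flagged attributes."""
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return suspicious(parse_attributes(result.stdout))


# Made-up sample in the layout smartctl prints, so the parser can be tried
# without a real disk.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   098   098   036    Pre-fail  Always       -       12
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4821
194 Temperature_Celsius     0x0022   031   045   000    Old_age   Always       -       31 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(suspicious(parse_attributes(SAMPLE)))
```

For 30 servers, something like `check_device("/dev/sda")` could be run from cron on each host and the result mailed when it is non-empty; the smartd daemon that ships with smartmontools is meant for this sort of ongoing monitoring and is probably the better long-term choice.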