[CentOS] [Q} how can O.S. predicate a disk going to failure??

Tue Aug 4 15:50:52 UTC 2009
nate <centos at linuxpowered.net>

mcclnx mcc wrote:

> 1. is this disk really "degrade" or not?

Depends on your point of view; to me it is. I remember two
situations with "predictive" failure on HP Smart Array
controllers a few years ago where the drives were practically
dead but the controller kept using them, dragging performance
down by something like 90%. The drives were detected as about
to fail, but there was no way to remove or disable a disk from
the array remotely, so we had to send someone on site to yank
the disk and force the array to rebuild. HP later said a
firmware update should fix the issue, but we never got around
to applying it before we migrated off those systems onto a
real SAN.

> 2. how O.S. can predicate disk going to failure?

In this case it's not the OS; it's the controller, which keeps
track of a bunch of internal counters on the disk and perhaps
even scrubs it every so often. If the number of soft errors
exceeds a threshold, that triggers the predictive failure logic.
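
To make the threshold idea concrete, here is a minimal sketch in
Python. It is not how any particular controller firmware works:
the counter names, the DiskCounters class and the threshold
values are all made up for illustration, and real controllers
use vendor-specific counters and limits.

```python
from dataclasses import dataclass


@dataclass
class DiskCounters:
    """Hypothetical per-disk counters a controller might track."""
    soft_read_errors: int = 0
    soft_write_errors: int = 0
    reallocated_sectors: int = 0


# Assumed thresholds, chosen only for this example.
THRESHOLDS = {
    "soft_read_errors": 50,
    "soft_write_errors": 50,
    "reallocated_sectors": 10,
}


def exceeded_counters(counters, thresholds=THRESHOLDS):
    """Return the names of counters strictly above their threshold."""
    return [
        name
        for name, limit in thresholds.items()
        if getattr(counters, name) > limit
    ]


def predictive_failure(counters, thresholds=THRESHOLDS):
    """Flag the disk as soon as any single counter exceeds its limit."""
    return bool(exceeded_counters(counters, thresholds))
```

With these assumed limits, a disk reporting 120 soft read errors
would be flagged, while one reporting exactly 10 reallocated
sectors and nothing else would not, because the check is
"exceeds" rather than "reaches".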

> 3. do I need replace this disk now?

Based on my past experience, yes. For comparison, any enterprise
storage array's support contract will trigger an immediate
replacement if the array detects that condition.

nate