[CentOS] OT: SMART warning on hard drive, same warning for 2 1/2 years

Mon May 25 12:11:42 UTC 2009
William L. Maltby <CentOS4Bill at triad.rr.com>

On Sun, 2009-05-24 at 23:39 -0500, Robert Nichols wrote:
> Lanny Marcus wrote:
> > My wife's box has a very intermittent problem,  when booting from the
> > Maxtor IDE hard drive. This has been going on for about 2 1/2
> > years.... The box is a Compaq EVO D300v for the Enterprise. When it
> > boots, there is a SMART advisory from the BIOS that says failure is
> > imminent. Occasionally, it will not boot, because the BIOS does not
> > see the hard drive.  I replaced the EIDE cable, but  the problem of
> > sometimes not seeing the hard drive on boot continues. I suspect it
> > has to do with something loose in the electronics of the drive,
> > because if I press on both ends of the EIDE cable, the problem goes
> > away and then it will boot OK.
> [SNIP]
> > === START OF READ SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: FAILED!
> > Drive failure expected in less than 24 hours. SAVE ALL DATA.
> > Failed Attributes:
> > ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE
> > UPDATED  WHEN_FAILED RAW_VALUE
> >  10 Spin_Retry_Count        0x002b   222   215   223    Pre-fail
> > Always   FAILING_NOW 29
> 
> A spin-up failure could be caused by a weak power supply or a power

If that's the case (spin-up delay), maybe ...

My BIOS has a setting to wait x seconds for the disk to spin up (I don't
need it). If yours has that, maybe "That's the ticket, yeah" (Thanks
SNL).

If it's a weak power supply, keep in mind that there are multiple rails
with different capacities. The overall PS rating may seem sufficient
while one or more rails are still weak. Try using a PS connector from a
different rail to split up the start-up draw.

If pushing on the connector on the drive seems to solve it, there are
several possibilities. Could be cable, could be cable end, could be a
"cold solder joint" on the HD circuit board. Cable and connector can be
easily, and inexpensively, replaced to test that. But it still could be
the "cold solder joint" - replacing the cable might (temporarily)
mimic the effects of your pushing on it.

If it's the "cold solder joint", the most likely spot is where the
connector pins attach to the board. With a magnifying glass, you may be
able to see a hairline crack at one of the pins. The symptoms would be
temperature-sensitive - they would tend to appear (presuming no external
influence such as vibration or torquing of the unit) more consistently
at cooler or warmer ambient temperatures, although if air circulation in
the box is poor they might not seem related to temperature.

If you can see the poor joint, a skilled solderer with appropriate-sized
irons and solder wire might be able to fix it. But that might be more
expensive than buying a new drive. I have repaired these in the past this
way, but that was when the 5.25" form factor was standard and things were
larger. My hands are not really that nimble and I wouldn't try on
today's stuff - it's all so much smaller.
 
> connector that is not making good contact.  It is not likely to be
> a problem with the EIDE cable.  Note that even if you do correct the
> problem, the SMART advisory will likely remain due to the accumulated
> failure count, but the boot failures should stop.
> 

Maybe the failure count gets cleared by a full run with no errors?
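
For what it's worth, here is a rough sketch of how I understand
smartctl arrives at the WHEN_FAILED column, using the numbers from the
output quoted above (VALUE 222, WORST 215, THRESH 223). The function
name is my own invention and the rule is my reading of smartmontools'
behaviour, not something taken from the drive's documentation:

```python
# Sketch only: when_failed() is a made-up name, and the comparison rule
# is an assumption based on how smartctl appears to report attributes.
def when_failed(value, worst, thresh):
    """Return the WHEN_FAILED text for one normalized SMART attribute."""
    if thresh == 0:
        # A threshold of 0 means the attribute can never be flagged.
        return "-"
    if value <= thresh:
        # Current normalized value is at or below the threshold.
        return "FAILING_NOW"
    if worst <= thresh:
        # Value has recovered, but the recorded worst value has not.
        return "In_the_past"
    return "-"


# Spin_Retry_Count from the quoted report: VALUE 222, WORST 215, THRESH 223
print(when_failed(222, 215, 223))
```

If that reading is right, then even if VALUE climbed back above 223
after the underlying problem was fixed, the recorded WORST of 215 would
still be below the threshold. Whether the drive firmware ever resets
WORST is something I don't know.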

-- 
Bill