[CentOS] OT: SMART warning on hard drive, same warning for 2 1/2 years

Sun May 24 22:10:06 UTC 2009
Lanny Marcus <lmmailinglists at gmail.com>

My wife's box has a very intermittent problem when booting from its
Maxtor IDE hard drive. This has been going on for about 2 1/2 years.
The box is a Compaq EVO D300v for the Enterprise. When it boots, there
is a SMART advisory from the BIOS that says failure is imminent.
Occasionally it will not boot, because the BIOS does not see the hard
drive. I replaced the EIDE cable, but the problem of sometimes not
seeing the hard drive on boot continues. I suspect it has to do with
something loose in the electronics of the drive, because if I press on
both ends of the EIDE cable, the problem goes away and it then boots
OK.

The box is currently M$ Windows only. I just booted it from my Knoppix
Live CD and ran smartctl on it; the results are below. I ran the Maxtor
Diagnostics on the hard drive 3 times. Each time, the quick 90-second
SMART test said that I should run the Read test, which takes about one
hour, and each time I ran the Read test, it passed OK. I did not run
the Maxtor Burn In test or the low level format, because I do not want
to reinstall everything on this hard drive.

Should I suggest to my wife that she let me replace the hard drive now,
at her convenience, before it fails completely? Is there anything in
the smartctl results that indicates that is not the appropriate thing
to do, considering the length of time this problem has existed? The
smartctl results certainly seem to indicate something badly awry, which
the Maxtor Diagnostics Read test did not pick up.

TIA! Lanny


root@Knoppix:~# smartctl -d ata -H /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
Failed Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
 10 Spin_Retry_Count        0x002b   222   215   223    Pre-fail  Always   FAILING_NOW 29
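
For anyone reading the WHEN_FAILED column: as far as I understand smartctl, an attribute is flagged FAILING_NOW when its normalized VALUE is at or below a non-zero THRESH. The following is only a minimal Python sketch of that rule. The function name `attribute_state` is hypothetical, and the rule itself is an assumption about smartctl's behaviour, not something stated in the output.

```python
def attribute_state(value, worst, thresh):
    """Classify a SMART attribute from its normalized values.

    Assumed rule (my reading of smartctl's WHEN_FAILED column):
    a threshold of 0 never fails; VALUE <= THRESH is failing now;
    WORST <= THRESH means it failed at some point in the past.
    """
    if thresh == 0:
        return "-"
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


# VALUE, WORST and THRESH copied from this report.
print(attribute_state(222, 215, 223))  # Spin_Retry_Count
print(attribute_state(239, 239, 63))   # Reallocated_Sector_Ct
```

Under that assumption, Spin_Retry_Count (222 against a threshold of 223) is the only attribute in this report that trips the check.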

root@Knoppix:~# smartctl -d ata -a /dev/hda
smartctl version 5.36 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor DiamondMax D540X-4D
Device Model:     Maxtor 4D080H4
Serial Number:    D40SBVYE
Firmware Version: DAH017K0
User Capacity:    81,964,302,336 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 0
Local Time is:    Sun May 24 17:44:28 2009 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  64) The previous self-test completed having
                                        a test element that failed and the test
                                        element that failed is not known.
Total time to complete Offline
data collection:                 (  30) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  51) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   202   199   063    Pre-fail  Always       -       18883
  4 Start_Stop_Count        0x0032   252   252   000    Old_age   Always       -       2809
  5 Reallocated_Sector_Ct   0x0033   239   239   063    Pre-fail  Always       -       37
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   250   243   187    Pre-fail  Always       -       47480
  9 Power_On_Minutes        0x0032   253   250   000    Old_age   Always       -       0h+18m
 10 Spin_Retry_Count        0x002b   222   215   223    Pre-fail  Always   FAILING_NOW 29
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   249   249   000    Old_age   Always       -       1722
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       0
194 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       24
196 Reallocated_Event_Count 0x0008   251   251   000    Old_age   Offline      -       2
197 Current_Pending_Sector  0x0008   253   249   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 TA_Increase_Count       0x000a   253   251   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   239   235   000    Old_age   Always       -       13
208 Spin_Buzz               0x002a   245   242   000    Old_age   Always       -       8
209 Offline_Seek_Performnce 0x0024   253   253   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
Warning: ATA error count 3379 inconsistent with error log pointer 5

ATA Error Count: 3379 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
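
The 49.710-day wrap is what one would expect if the drive keeps this timer as an unsigned 32-bit count of milliseconds, since 2^32 ms is about 49.71 days. That counter width is my assumption; the smartctl text only states the wrap point. A small Python sketch of the arithmetic, with a hypothetical helper `parse_powered_up_time` for the timestamp format described above:

```python
import re

MS_PER_DAY = 24 * 60 * 60 * 1000
WRAP_MS = 2 ** 32  # assumption: unsigned 32-bit millisecond counter


def parse_powered_up_time(text):
    """Convert 'DDd+hh:mm:SS.sss' (day part optional) to milliseconds."""
    m = re.fullmatch(r"(?:(\d+)d\+)?(\d+):(\d+):(\d+)\.(\d{3})", text.strip())
    if m is None:
        raise ValueError("unrecognised timestamp: %r" % text)
    days, hh, mm, ss, ms = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hh) * 60 + mm) * 60 * 1000 + ss * 1000 + ms


print(round(WRAP_MS / MS_PER_DAY, 3))         # wrap point in days
print(parse_powered_up_time("03:14:14.480"))  # a timestamp from this error log
```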

Error 3379 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 01 a5 5a a0

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  08 d6 01 01 a5 5a a0 02      03:14:14.480  DEVICE RESET
  b0 d6 01 9f 4f c2 a0 00      03:12:29.984  SMART WRITE LOG
  b0 d5 01 9f 4f c2 a0 00      03:12:29.968  SMART READ LOG
  b0 d6 01 50 4f c2 a0 00      03:12:26.512  SMART WRITE LOG
  b0 d9 01 00 4f c2 a0 00      03:12:26.480  SMART DISABLE OPERATIONS

Error 3378 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 0b 4f c2 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 9f 4f c2 a0 00      03:12:29.984  SMART WRITE LOG
  b0 d5 01 9f 4f c2 a0 00      03:12:29.968  SMART READ LOG
  b0 d6 01 50 4f c2 a0 00      03:12:26.512  SMART WRITE LOG
  b0 d9 01 00 4f c2 a0 00      03:12:26.480  SMART DISABLE OPERATIONS
  b0 d6 01 50 4f c2 a0 00      03:12:26.416  SMART WRITE LOG

Error 3377 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 0b 4f c2 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d5 01 9f 4f c2 a0 00      03:12:29.968  SMART READ LOG
  b0 d6 01 50 4f c2 a0 00      03:12:26.512  SMART WRITE LOG
  b0 d9 01 00 4f c2 a0 00      03:12:26.480  SMART DISABLE OPERATIONS
  b0 d6 01 50 4f c2 a0 00      03:12:26.416  SMART WRITE LOG
  41 ff 00 00 b9 8a e9 00      03:12:26.416  READ VERIFY SECTOR(S) [OBS-5]

Error 3376 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 0b 4f c2 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 50 4f c2 a0 00      03:12:26.512  SMART WRITE LOG
  b0 d9 01 00 4f c2 a0 00      03:12:26.480  SMART DISABLE OPERATIONS
  b0 d6 01 50 4f c2 a0 00      03:12:26.416  SMART WRITE LOG
  41 ff 00 00 b9 8a e9 00      03:12:26.416  READ VERIFY SECTOR(S) [OBS-5]
  41 ff 00 00 b8 8a e9 00      03:12:26.400  READ VERIFY SECTOR(S) [OBS-5]

Error 3375 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was in an unknown state.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 01 0b 4f c2 a0  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d6 01 50 4f c2 a0 00      03:12:26.416  SMART WRITE LOG
  41 ff 00 00 b9 8a e9 00      03:12:26.416  READ VERIFY SECTOR(S) [OBS-5]
  41 ff 00 00 b8 8a e9 00      03:12:26.400  READ VERIFY SECTOR(S) [OBS-5]
  41 ff 00 00 b7 8a e9 00      03:12:26.400  READ VERIFY SECTOR(S) [OBS-5]
  41 ff 00 00 b6 8a e9 00      03:12:26.400  READ VERIFY SECTOR(S) [OBS-5]
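
The ER and ST columns in the five entries above are bit fields. The sketch below decodes them in Python using the classic ATA register bit names. Those names are my assumption (several are marked obsolete in later ATA revisions), and smartctl itself only prints "ABRT" here. The helper `decode` is hypothetical.

```python
# Bit names follow the classic ATA register layout (an assumption on my
# part; the smartctl output above only spells out "ABRT").
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
              0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "AMNF"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}


def decode(register, names):
    """Return the names of all bits set in an 8-bit register value."""
    return [name for mask, name in names.items() if register & mask]


print(decode(0x04, ERROR_BITS))   # ER column of every entry in this log
print(decode(0x51, STATUS_BITS))  # ST column of every entry in this log
```

Read this way, ER=04 is an aborted command in all five entries, including error 3379, where smartctl did not print the "Error: ABRT" annotation.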

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: unknown failure    00%         0         -
# 2  Short offline       Completed: unknown failure    00%         0         -
# 3  Short offline       Completed: unknown failure    00%       997         -
# 4  Short offline       Completed without error       00%       905         -
# 5  Short offline       Completed without error       00%       664         -
# 6  Short offline       Completed without error       00%       664         -
# 7  Short offline       Completed: unknown failure    00%         1         -
# 8  Short offline       Completed: unknown failure    00%         9         -
# 9  Short offline       Completed: unknown failure    00%       215         -
#10  Short offline       Completed without error       00%       215         -
#11  Extended offline    Completed without error       00%       213         -
#12  Short offline       Completed: read failure       60%       187         80417451
#13  Extended offline    Completed: read failure       20%       184         80417451
#14  Short offline       Completed without error       00%       181         -
#15  Extended offline    Completed: read failure       20%       151         80417451
#16  Short offline       Completed without error       00%       151         -
#17  Short offline       Completed without error       00%       139         -
#18  Short offline       Completed: read failure       60%        45         208052
#19  Short offline       Completed without error       00%         5         -
#20  Extended offline    Completed without error       00%         4         -
#21  Short offline       Completed without error       00%         3         -

Device does not support Selective Self Tests/Logging
root@Knoppix:~#
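
To get a feel for where the failing sectors in the self-test log sit on the disk, the LBA_of_first_error values can be converted to byte offsets and compared with the reported User Capacity. This is only a rough Python sketch: the 512-byte sector size is an assumption (the smartctl output does not state it), and `lba_position` is a hypothetical helper.

```python
SECTOR_BYTES = 512               # assumption: not stated in the smartctl output
CAPACITY_BYTES = 81964302336     # "User Capacity" from the information section


def lba_position(lba):
    """Return (byte offset, fraction of the way through the disk)."""
    offset = lba * SECTOR_BYTES
    return offset, offset / CAPACITY_BYTES


for lba in (80417451, 208052):   # LBA_of_first_error values from the log
    offset, fraction = lba_position(lba)
    print("LBA %d: byte offset %d (%.1f%% into the disk)"
          % (lba, offset, fraction * 100))
```

Under that sector-size assumption, the LBA that failed in three separate tests falls near the middle of the drive, and the other one falls close to the start.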