[CentOS] The amazing smartctl -a /dev/hda

Tue Dec 5 00:13:16 UTC 2006
Paul <unix at bikesn4x4s.com>

I finally fixed my drive error problem.  It had been going on for quite
a while, and my earlier posts here never led to a fix.

These are the errors I was getting:

Dec  4 04:03:10 bikesn4x4s kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Dec  4 04:03:10 bikesn4x4s kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
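
For anyone curious how those two hex values map to the names in braces,
here is a minimal Python sketch.  The bit tables are an assumption based
on the standard ATA status/error register layout and the labels the old
IDE driver prints; they are not taken from the kernel source, and the
driver may print the names in a different order.

```python
# Assumed bit names, following the labels seen in the kernel log lines.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Simplification: bit 0x80 is labelled BadCRC here because it appears
# together with DriveStatusError (abort) in the log above.
ERROR_BITS = {
    0x80: "BadCRC",
    0x40: "UncorrectableError",
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & bit]


print(decode(0x51, STATUS_BITS))  # status=0x51
print(decode(0x84, ERROR_BITS))   # error=0x84
```

The same two bytes show up later in the smartctl error log as ER=84 and
ST=51, where smartctl reports them as "ICRC, ABRT".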

And now for the amazing smartctl -a command.  Its output included a
warning with links to a firmware upgrade for this drive, and the upgrade
looks to have fixed the problem.  Simply amazing.

smartctl -a /dev/hda:

smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     IBM-DTLA-307020
Serial Number:    YH0YHL24292
Firmware Version: TX3OA50C
User Capacity:    20,576,747,520 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
Local Time is:    Mon Dec  4 06:45:29 2006 EST

==> WARNING: IBM Deskstar 40GV and 75GXP drives may need upgraded SMART firmware.
Please see http://www.geocities.com/dtla_update/ and
http://www-3.ibm.com/pc/support/site.wss/document.do?lndocid=MIGR-42215 or
http://www-1.ibm.com/support/docview.wss?uid=psg1MIGR-42215

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		 (1496) seconds.
Offline data collection
capabilities: 			 (0x1b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					No General Purpose Logging support.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  14) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   060    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   115   115   024    Pre-fail  Always       -       188 (Average 194)
  4 Start_Stop_Count        0x0012   099   099   000    Old_age   Always       -       4026
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       9
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       21888
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2693
192 Power-Off_Retract_Count 0x0032   097   097   050    Old_age   Always       -       4026
193 Load_Cycle_Count        0x0012   097   097   050    Old_age   Always       -       4026
194 Temperature_Celsius     0x0002   183   183   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       10
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       367

SMART Error Log Version: 1
ATA Error Count: 431 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 431 occurred at disk power-on lifetime: 21885 hours (911 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 ac 28 f9 e1  Error: ICRC, ABRT at LBA = 0x01f928ac = 33106092

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 a5 28 f9 e1 00   9d+18:33:42.900  READ DMA
  c8 00 68 3d 28 f9 e1 00   9d+18:33:42.900  READ DMA
  c8 00 10 3d 1a f9 e1 00   9d+18:33:42.900  READ DMA
  c8 00 68 d5 19 f9 e1 00   9d+18:33:42.900  READ DMA
  c8 00 08 0d 12 f9 e1 00   9d+18:33:42.900  READ DMA

Error 430 occurred at disk power-on lifetime: 21885 hours (911 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 dc 29 f9 e1  Error: ICRC, ABRT at LBA = 0x01f929dc = 33106396

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 d5 29 f9 e1 00   9d+18:33:41.900  READ DMA
  c8 00 68 6d 29 f9 e1 00   9d+18:33:41.900  READ DMA
  c8 00 60 ad 0d f9 e1 00   9d+18:33:41.900  READ DMA
  c8 00 08 fd 5a fb e1 00   9d+18:33:41.900  READ DMA
  c8 00 68 95 5a fb e1 00   9d+18:33:41.900  READ DMA

Error 429 occurred at disk power-on lifetime: 21885 hours (911 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 1c 27 9c e1  Error: ICRC, ABRT at LBA = 0x019c271c = 27010844

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 18 05 27 9c e1 00   9d+18:33:40.900  READ DMA
  c8 00 08 9d a2 9e e1 00   9d+18:33:40.900  READ DMA
  c8 00 20 b5 a2 9e e1 00   9d+18:33:40.900  READ DMA
  c8 00 18 35 42 9c e1 00   9d+18:33:40.900  READ DMA
  c8 00 18 b5 1e a2 e1 00   9d+18:33:40.900  READ DMA

Error 428 occurred at disk power-on lifetime: 21885 hours (911 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 dc b2 a5 e1  Error: ICRC, ABRT at LBA = 0x01a5b2dc = 27636444

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 30 ad b2 a5 e1 00   9d+18:33:40.400  READ DMA
  c8 00 28 f5 d4 a2 e1 00   9d+18:33:40.400  READ DMA
  c8 00 18 1d d5 a2 e1 00   9d+18:33:40.400  READ DMA
  c8 00 10 3d dc a2 e1 00   9d+18:33:40.400  READ DMA
  c8 00 28 25 db a2 e1 00   9d+18:33:40.400  READ DMA

Error 427 occurred at disk power-on lifetime: 21885 hours (911 days + 21 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 00 0c 41 b0 e1  Error: ICRC, ABRT at LBA = 0x01b0410c = 28328204

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 20 ed 40 b0 e1 00   9d+18:33:37.900  READ DMA
  c8 00 50 d5 29 b0 e1 00   9d+18:33:37.900  READ DMA
  c8 00 30 75 42 b0 e1 00   9d+18:33:37.900  READ DMA
  c8 00 28 25 4a b0 e1 00   9d+18:33:37.900  READ DMA
  c8 00 18 45 d5 ab e1 00   9d+18:33:37.800  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     17736         -
# 2  Short offline       Completed without error       00%     15777         -
# 3  Short offline       Completed without error       00%     15776         -

Device does not support Selective Self Tests/Logging
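
A side note on reading the error log entries above: the "LBA = ..."
figure smartctl prints can be rebuilt from the register bytes on the
same line, and the "911 days + 21 hours" is just the power-on hours
split into days.  A small Python sketch, assuming 28-bit LBA addressing
(low four bits of DH are the top LBA bits); the function names are mine:

```python
def lba28(sn, cl, ch, dh):
    """Combine the ATA task-file registers into a 28-bit LBA.

    SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low
    nibble of DH holds bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def hours_to_days(hours):
    """Split a power-on hour count into (days, leftover hours)."""
    return divmod(hours, 24)


# Register bytes from error 431: SN=ac CL=28 CH=f9 DH=e1
lba = lba28(0xAC, 0x28, 0xF9, 0xE1)
print(hex(lba), lba)         # 0x1f928ac 33106092
print(hours_to_days(21885))  # (911, 21)
```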



And now after upgrading the firmware:

smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     IBM-DTLA-307020
Serial Number:    YH0YHL24292
Firmware Version: TX3OA5AA
User Capacity:    20,576,747,520 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   5
ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
Local Time is:    Mon Dec  4 18:56:58 2006 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		 (1496) seconds.
Offline data collection
capabilities: 			 (0x1b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					No General Purpose Logging support.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  14) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   060    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0007   100   100   024    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       9
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       0
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   100   100   050    Old_age   Always       -       1
193 Load_Cycle_Count        0x0012   100   100   050    Old_age   Always       -       1
194 Temperature_Celsius     0x0002   177   177   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       10
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


Device does not support Selective Self Tests/Logging
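
To compare the two reports without eyeballing them, something like the
following Python sketch could pull out the raw values and list what
changed.  It assumes each attribute row sits on a single line (the rows
below are a few from the reports above, joined back onto one line), and
the regular expression and function names are my own, not part of
smartmontools.

```python
import re

# Columns: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, raw value.  Only ID, name and raw value are captured.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-f]{4}"
    r"\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def raw_values(report):
    """Map attribute name -> raw value for every table row in report."""
    values = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            values[m.group(2)] = int(m.group(3))
    return values


def changed(before, after):
    """Return {name: (old, new)} for attributes whose raw value differs."""
    old, new = raw_values(before), raw_values(after)
    return {name: (old[name], new[name])
            for name in old
            if name in new and old[name] != new[name]}


BEFORE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       9
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       21888
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2693
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       367
"""

AFTER = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       9
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
"""

for name, (old, new) in changed(BEFORE, AFTER).items():
    print(f"{name}: {old} -> {new}")
```

Note that Power_On_Hours and Power_Cycle_Count also went back to 0 and
1, so the counters appear to have restarted with the new firmware; the
CRC count will need to stay at 0 over time to confirm the fix.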