[CentOS] Re: Disk near failure

Thu Oct 27 17:38:09 UTC 2016
Yamaban <foerster at lisas.de>

On Thu, 27 Oct 2016 11:25, Alessandro Baggi wrote:
> On 24/10/2016 14:05, Leonard den Ottolander wrote:
>>  On Mon, 2016-10-24 at 12:07 +0200, Alessandro Baggi wrote:
>> >  === START OF READ SMART DATA SECTION ===
>> >  SMART Error Log not supported
>>
>>  I reckon there's a <snip> between those lines. The line right after the
>>  first should read something like:
>>
>>  SMART overall-health self-assessment test result: PASSED
>>
>>  or "FAILED" for that matter. If not try running
>>
>>  smartctl -t short /dev/sda
>>
>>  , wait for the indicated time to expire, then check the output of
>>  smartctl -a (or -x) again.
>>
>>  Regards,
>>  Leonard.
>> 
> Hi Leonard,
> after a smart short test, the output of smartctl -a /dev/... is
>
> === START OF INFORMATION SECTION ===
> Model Family:     SandForce Driven SSDs
> Device Model:     Corsair Force GT
> Serial Number:    12297948000015020A81
> LU WWN Device Id: 0 000000 000000000
> Firmware Version: 5.02
> User Capacity:    120,034,123,776 bytes [120 GB]
> Sector Size:      512 bytes logical/physical
> Rotation Rate:    Solid State Device
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Thu Oct 27 11:22:22 2016 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status:  (0x02) Offline data collection activity
>                                         was completed without error.
>                                         Auto Offline Data Collection: Disabled.
> Self-test execution status:      (   0) The previous self-test routine completed
>                                         without error or no self-test has ever
>                                         been run.
> Total time to complete Offline
> data collection:                (    0) seconds.
> Offline data collection
> capabilities:                    (0x7b) SMART execute Offline immediate.
>                                         Auto Offline data collection on/off support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         Conveyance Self-test supported.
>                                         Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                        General Purpose Logging supported.
> Short self-test routine
> recommended polling time:        (   1) minutes.
> Extended self-test routine
> recommended polling time:        (  48) minutes.
> Conveyance self-test routine
> recommended polling time:        (   2) minutes.
> SCT capabilities:              (0x0021) SCT Status supported.
>                                         SCT Data Table supported.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail  Always       -       0/0
>   5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
>   9 Power_On_Hours_and_Msec 0x0032   000   000   000    Old_age   Always       -       17394h+07m+56.840s
>  12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1974
> 171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
> 172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
> 174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       780
> 177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       3
> 181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
> 182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
> 194 Temperature_Celsius     0x0022   029   042   000    Old_age   Always       -       29 (Min/Max 15/42)
> 195 ECC_Uncorr_Error_Count  0x001c   100   100   000    Old_age   Offline      -       0/0
> 196 Reallocated_Event_Ct    0x0033   100   100   003    Pre-fail  Always       -       0
> 201 Unc_Soft_Read_Err_Rate  0x001c   100   100   000    Old_age   Offline      -       0/0
> 204 Soft_ECC_Correct_Rate   0x001c   100   100   000    Old_age   Offline      -       0/0
> 230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
> 231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
> 233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       6599
> 234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       6894
> 241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       6894
> 242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       6326
>
> SMART Error Log not supported
>
> SMART Self-test Log not supported
>
> SMART Selective self-test log data structure revision number 1
>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>     1        0        0  Not_testing
>     2        0        0  Not_testing
>     3        0        0  Not_testing
>     4        0        0  Not_testing
>     5        0        0  Not_testing
> Selective self-test flags (0x0):
>  After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.

Hmm, let's do some math:
17394 hours of "on" time equals roughly 724.7 days (if powered on continuously).
6894 GiB written to a 120 GB drive gives roughly 57.4 drive writes,
ignoring the GiB/GB difference
(with optimal wear leveling every cell would have been written 57-58 times).
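
For anyone who wants to redo that arithmetic with their own numbers, here
is a minimal Python sketch. The variable names are my own, the values are
copied from the smartctl output quoted above, and the drive size is treated
as if it were in the same unit as the write counter, which is only
approximately true.

```python
# Values copied from the quoted smartctl output.
power_on_hours = 17394       # attribute 9, hours part only
lifetime_writes_gib = 6894   # attribute 241
drive_capacity = 120         # nominal size; GiB/GB difference ignored (assumption)

# Power-on time expressed in days of continuous operation.
days_on = power_on_hours / 24

# How many times the full nominal capacity has been written.
drive_writes = lifetime_writes_gib / drive_capacity

print(f"{days_on:.2f} days powered on")
print(f"{drive_writes:.2f} full drive writes")
```

Swap in your own attribute 9 and 241 raw values to get the same two figures
for another drive.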

The SandForce controller used here (likely an SF-2281) is not the best at
wear leveling, so the actual write count per cell will most likely be more
than double that.
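
To see what "more than double" would mean in numbers, here is a small
Python sketch. The function name and the factors are hypothetical and only
for illustration; nothing here is measured on the drive, and the real
wear-leveling behaviour of the controller is not known from the SMART
output.

```python
def estimated_cell_writes(host_writes_gib, capacity, factor):
    """Ideal drive-write count scaled by an assumed wear-leveling factor."""
    return host_writes_gib / capacity * factor

lifetime_writes_gib = 6894   # attribute 241 from the quoted output
drive_capacity = 120         # nominal size; GiB/GB difference ignored (assumption)

# The factors are guesses for illustration, not measured values.
for factor in (1.0, 2.0, 3.0):
    writes = estimated_cell_writes(lifetime_writes_gib, drive_capacity, factor)
    print(f"factor {factor}: about {writes:.0f} writes per cell")
```

A factor of 1.0 is the ideal case from the math above; 2.0 is the "double"
lower bound I mentioned.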

For my personal use I would replace that drive ASAP.
- It is out of warranty by now (given the time since purchase)
- You can't buy it new anymore (discontinued)
- There are more reliable drives available.

I'd go for a Samsung 850 EVO, which comes with a five-year warranty.

But it's your drive, you make the decisions.

  - Yamaban.