[CentOS] Disk near failure

Fri Oct 28 07:42:39 UTC 2016
Alessandro Baggi <alessandro.baggi at gmail.com>

On 27/10/2016 19:38, Yamaban wrote:
> On Thu, 27 Oct 2016 11:25, Alessandro Baggi wrote:
>> On 24/10/2016 14:05, Leonard den Ottolander wrote:
>>>  On Mon, 2016-10-24 at 12:07 +0200, Alessandro Baggi wrote:
>>> >  === START OF READ SMART DATA SECTION ===
>>> >  SMART Error Log not supported
>>>
>>>  I reckon there's a <snip> between those lines. The line right after the
>>>  first should read something like:
>>>
>>>  SMART overall-health self-assessment test result: PASSED
>>>
>>>  or "FAILED" for that matter. If not try running
>>>
>>>  smartctl -t short /dev/sda
>>>
>>>  , wait for the indicated time to expire, then check the output of
>>>  smartctl -a (or -x) again.
>>>
>>>  Regards,
>>>  Leonard.
>>>
>> Hi Leonard,
>> after a SMART short test, the output of smartctl -a /dev/... is:
>>
>> === START OF INFORMATION SECTION ===
>> Model Family:     SandForce Driven SSDs
>> Device Model:     Corsair Force GT
>> Serial Number:    12297948000015020A81
>> LU WWN Device Id: 0 000000 000000000
>> Firmware Version: 5.02
>> User Capacity:    120,034,123,776 bytes [120 GB]
>> Sector Size:      512 bytes logical/physical
>> Rotation Rate:    Solid State Device
>> Device is:        In smartctl database [for details use: -P show]
>> ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
>> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is:    Thu Oct 27 11:22:22 2016 CEST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> General SMART Values:
>> Offline data collection status:  (0x02) Offline data collection activity
>>                                         was completed without error.
>>                                         Auto Offline Data Collection: Disabled.
>> Self-test execution status:      (   0) The previous self-test routine completed
>>                                         without error or no self-test has ever
>>                                         been run.
>> Total time to complete Offline
>> data collection:                (    0) seconds.
>> Offline data collection
>> capabilities:                    (0x7b) SMART execute Offline immediate.
>>                                         Auto Offline data collection on/off support.
>>                                         Suspend Offline collection upon new
>>                                         command.
>>                                         Offline surface scan supported.
>>                                         Self-test supported.
>>                                         Conveyance Self-test supported.
>>                                         Selective Self-test supported.
>> SMART capabilities:            (0x0003) Saves SMART data before entering
>>                                         power-saving mode.
>>                                         Supports SMART auto save timer.
>> Error logging capability:        (0x01) Error logging supported.
>>                                        General Purpose Logging supported.
>> Short self-test routine
>> recommended polling time:        (   1) minutes.
>> Extended self-test routine
>> recommended polling time:        (  48) minutes.
>> Conveyance self-test routine
>> recommended polling time:        (   2) minutes.
>> SCT capabilities:              (0x0021) SCT Status supported.
>>                                         SCT Data Table supported.
>>
>> SMART Attributes Data Structure revision number: 10
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail  Always   -           0/0
>>   5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always   -           0
>>   9 Power_On_Hours_and_Msec 0x0032   000   000   000    Old_age   Always   -           17394h+07m+56.840s
>>  12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always   -           1974
>> 171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always   -           0
>> 172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always   -           0
>> 174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline  -           780
>> 177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline  -           3
>> 181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always   -           0
>> 182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always   -           0
>> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
>> 194 Temperature_Celsius     0x0022   029   042   000    Old_age   Always   -           29 (Min/Max 15/42)
>> 195 ECC_Uncorr_Error_Count  0x001c   100   100   000    Old_age   Offline  -           0/0
>> 196 Reallocated_Event_Ct    0x0033   100   100   003    Pre-fail  Always   -           0
>> 201 Unc_Soft_Read_Err_Rate  0x001c   100   100   000    Old_age   Offline  -           0/0
>> 204 Soft_ECC_Correct_Rate   0x001c   100   100   000    Old_age   Offline  -           0/0
>> 230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always   -           100
>> 231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always   -           0
>> 233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline  -           6599
>> 234 SandForce_Internal      0x0032   000   000   000    Old_age   Always   -           6894
>> 241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always   -           6894
>> 242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always   -           6326
>>
>> SMART Error Log not supported
>>
>> SMART Self-test Log not supported
>>
>> SMART Selective self-test log data structure revision number 1
>>  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
>>     1        0        0  Not_testing
>>     2        0        0  Not_testing
>>     3        0        0  Not_testing
>>     4        0        0  Not_testing
>>     5        0        0  Not_testing
>> Selective self-test flags (0x0):
>>  After scanning selected spans, do NOT read-scan remainder of disk.
>> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> Hmm, let's do some math:
> 17394 hours of "on" time equals 724.7 days (if continuously on).
> 6894 GiB written at a 120 GiB drive size gives 57.4 drive writes
> (with optimal wear leveling, every cell would have been written 57-58 times).
>
> The SandForce controller used (likely an SF-2281) is not the best at
> wear leveling, so the "use" count per cell will most likely be more
> than double that.
>
> For my personal use I would replace that drive ASAP.
> - There is no warranty for it anymore (time since purchase)
> - You can't buy it new anymore (discontinued)
> - There are more reliable drives available.
>
> I'd go for a Samsung Evo 850, which will give you a five-year warranty.
>
> But it's your drive; you make the decisions.
>
>  - Yamaban.
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos
>

Thank you for your suggestion.

What do you think about the Corsair Neutron XTi 240 MLC?
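
For reference, the back-of-the-envelope figures in Yamaban's reply can be
reproduced with a short Python sketch. The function names are made up for
illustration; the inputs are the raw values of attributes 9 and 241 from the
smartctl output quoted above, and the 120 GiB capacity is the nominal figure
used in the reply, not the exact size smartctl reports. The factor of 2 for
uneven wear is only the rough guess from the reply, not a measured value.

```python
# Sketch of the rough estimates from the reply; names are illustrative only.

def power_on_days(hours: float) -> float:
    """Convert SMART power-on hours into days of continuous operation."""
    return hours / 24.0


def drive_writes(written_gib: float, capacity_gib: float) -> float:
    """Number of full-drive writes, assuming ideal wear leveling."""
    return written_gib / capacity_gib


if __name__ == "__main__":
    days = power_on_days(17394)        # attribute 9, Power_On_Hours_and_Msec
    writes = drive_writes(6894, 120)   # attribute 241 over the nominal size
    uneven = 2 * writes                # assumed factor for poor wear leveling
    print(f"{days:.2f} days powered on")
    print(f"{writes:.2f} drive writes with ideal wear leveling")
    print(f"{uneven:.1f} writes per cell if wear is twice as uneven")
```

These are rough estimates only; the SMART attributes themselves (for example
230 and 231) remain the drive's own view of its remaining life.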