[CentOS] Disk near failure
Alessandro Baggi
alessandro.baggi at gmail.com
Fri Oct 28 07:42:39 UTC 2016
On 27/10/2016 19:38, Yamaban wrote:
> On Thu, 27 Oct 2016 11:25, Alessandro Baggi wrote:
>> On 24/10/2016 14:05, Leonard den Ottolander wrote:
>>> On Mon, 2016-10-24 at 12:07 +0200, Alessandro Baggi wrote:
>>> > === START OF READ SMART DATA SECTION ===
>>> > SMART Error Log not supported
>>>
>>> I reckon there's a <snip> between those lines. The line right after the
>>> first should read something like:
>>>
>>> SMART overall-health self-assessment test result: PASSED
>>>
>>> or "FAILED" for that matter. If not, try running
>>>
>>> smartctl -t short /dev/sda
>>>
>>> then wait for the indicated time to expire and check the output of
>>> smartctl -a (or -x) again.
>>>
>>> Regards,
>>> Leonard.
>>>
>> Hi Leonard,
>> after a SMART short test, the output of smartctl -a /dev/... is
>>
>> === START OF INFORMATION SECTION ===
>> Model Family: SandForce Driven SSDs
>> Device Model: Corsair Force GT
>> Serial Number: 12297948000015020A81
>> LU WWN Device Id: 0 000000 000000000
>> Firmware Version: 5.02
>> User Capacity: 120,034,123,776 bytes [120 GB]
>> Sector Size: 512 bytes logical/physical
>> Rotation Rate: Solid State Device
>> Device is: In smartctl database [for details use: -P show]
>> ATA Version is: ATA8-ACS, ACS-2 T13/2015-D revision 3
>> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
>> Local Time is: Thu Oct 27 11:22:22 2016 CEST
>> SMART support is: Available - device has SMART capability.
>> SMART support is: Enabled
>>
>> === START OF READ SMART DATA SECTION ===
>> SMART overall-health self-assessment test result: PASSED
>>
>> General SMART Values:
>> Offline data collection status:   (0x02) Offline data collection activity
>>                                          was completed without error.
>>                                          Auto Offline Data Collection: Disabled.
>> Self-test execution status:       (   0) The previous self-test routine completed
>>                                          without error or no self-test has ever
>>                                          been run.
>> Total time to complete Offline
>> data collection:                  (   0) seconds.
>> Offline data collection
>> capabilities:                     (0x7b) SMART execute Offline immediate.
>>                                          Auto Offline data collection on/off support.
>>                                          Suspend Offline collection upon new
>>                                          command.
>>                                          Offline surface scan supported.
>>                                          Self-test supported.
>>                                          Conveyance Self-test supported.
>>                                          Selective Self-test supported.
>> SMART capabilities:             (0x0003) Saves SMART data before entering
>>                                          power-saving mode.
>>                                          Supports SMART auto save timer.
>> Error logging capability:         (0x01) Error logging supported.
>>                                          General Purpose Logging supported.
>> Short self-test routine
>> recommended polling time:         (   1) minutes.
>> Extended self-test routine
>> recommended polling time:         (  48) minutes.
>> Conveyance self-test routine
>> recommended polling time:         (   2) minutes.
>> SCT capabilities:               (0x0021) SCT Status supported.
>>                                          SCT Data Table supported.
>>
>>
>> SMART Attributes Data Structure revision number: 10
>> Vendor Specific SMART Attributes with Thresholds:
>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>   1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail  Always       -       0/0
>>   5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
>>   9 Power_On_Hours_and_Msec 0x0032   000   000   000    Old_age   Always       -       17394h+07m+56.840s
>>  12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1974
>> 171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
>> 172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
>> 174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       780
>> 177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       3
>> 181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
>> 182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
>> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
>> 194 Temperature_Celsius     0x0022   029   042   000    Old_age   Always       -       29 (Min/Max 15/42)
>> 195 ECC_Uncorr_Error_Count  0x001c   100   100   000    Old_age   Offline      -       0/0
>> 196 Reallocated_Event_Ct    0x0033   100   100   003    Pre-fail  Always       -       0
>> 201 Unc_Soft_Read_Err_Rate  0x001c   100   100   000    Old_age   Offline      -       0/0
>> 204 Soft_ECC_Correct_Rate   0x001c   100   100   000    Old_age   Offline      -       0/0
>> 230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
>> 231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
>> 233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       6599
>> 234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       6894
>> 241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       6894
>> 242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       6326
>>
>> SMART Error Log not supported
>>
>> SMART Self-test Log not supported
>>
>> SMART Selective self-test log data structure revision number 1
>> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
>> 1 0 0 Not_testing
>> 2 0 0 Not_testing
>> 3 0 0 Not_testing
>> 4 0 0 Not_testing
>> 5 0 0 Not_testing
>> Selective self-test flags (0x0):
>> After scanning selected spans, do NOT read-scan remainder of disk.
>> If Selective self-test is pending on power-up, resume after 0 minute delay.
>
> Hmm, let's do some math:
> 17394 hours of "on" time equals about 724.75 days (if continuously on).
> 6894 GiB written to a 120 GiB drive gives about 57.45 drive writes
> (with optimal wear leveling, every cell would have been written 57-58 times).
>
> The SandForce controller used (likely an SF-2281) is not the best at
> wear leveling, so the "use" count per cell will most likely be more
> than double that.
>
> For my personal use, I would replace that drive ASAP.
> - There is no warranty on it anymore (given the time since purchase)
> - You can't buy it new anymore (discontinued)
> - There are more reliable drives available.
>
> I'd go for a Samsung 850 EVO, which will give you five years of warranty.
>
> But it's your drive; you make the decisions.
>
> - Yamaban.
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> https://lists.centos.org/mailman/listinfo/centos
>
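For reference, the arithmetic in the quoted message can be reproduced with
a short Python sketch. The figures are copied from the smartctl output
above; the function and constant names are only illustrative. The quoted
estimate treats the drive as 120 GiB, while the reported capacity of
120,034,123,776 bytes is closer to 111.8 GiB, so the sketch computes the
drive-write count both ways. It says nothing about how evenly the
controller actually spreads the writes.

```python
# Figures copied from the smartctl output quoted above.
POWER_ON_HOURS = 17394              # attribute 9, hours part of the raw value
LIFETIME_WRITES_GIB = 6894          # attribute 241, raw value
CAPACITY_BYTES = 120_034_123_776    # "User Capacity" line

# Assumption taken from the quoted estimate: drive size rounded to 120 GiB.
NOMINAL_CAPACITY_GIB = 120


def power_on_days(hours: float) -> float:
    """Convert power-on hours to days of continuous operation."""
    return hours / 24


def drive_writes(written_gib: float, capacity_gib: float) -> float:
    """Number of times the full capacity has been written, on average."""
    return written_gib / capacity_gib


def bytes_to_gib(size_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return size_bytes / 2**30


if __name__ == "__main__":
    print(f"Power-on time: {power_on_days(POWER_ON_HOURS):.2f} days")
    print("Drive writes (120 GiB assumed): "
          f"{drive_writes(LIFETIME_WRITES_GIB, NOMINAL_CAPACITY_GIB):.2f}")
    actual_gib = bytes_to_gib(CAPACITY_BYTES)
    print(f"Drive writes ({actual_gib:.1f} GiB reported): "
          f"{drive_writes(LIFETIME_WRITES_GIB, actual_gib):.2f}")
```

Using the reported capacity instead of the rounded 120 GiB raises the
average write count slightly, but the conclusion stays in the same range.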
Thank you for your suggestion.
What do you think about the Corsair Neutron XTi 240 MLC?