[CentOS] Re: Disk near failure
Yamaban
foerster at lisas.de
Thu Oct 27 17:38:09 UTC 2016
On Thu, 27 Oct 2016 11:25, Alessandro Baggi wrote:
> On 24/10/2016 14:05, Leonard den Ottolander wrote:
>> On Mon, 2016-10-24 at 12:07 +0200, Alessandro Baggi wrote:
>> > === START OF READ SMART DATA SECTION ===
>> > SMART Error Log not supported
>>
>> I reckon there's a <snip> between those lines. The line right after the
>> first should read something like:
>>
>> SMART overall-health self-assessment test result: PASSED
>>
>> or "FAILED" for that matter. If not try running
>>
>> smartctl -t short /dev/sda
>>
>> , wait for the indicated time to expire, then check the output of
>> smartctl -a (or -x) again.
>>
>> Regards,
>> Leonard.
>>
> Hi Leonard,
> after a smart short test, the output of smartctl -a /dev/... is
>
> === START OF INFORMATION SECTION ===
> Model Family: SandForce Driven SSDs
> Device Model: Corsair Force GT
> Serial Number: 12297948000015020A81
> LU WWN Device Id: 0 000000 000000000
> Firmware Version: 5.02
> User Capacity: 120,034,123,776 bytes [120 GB]
> Sector Size: 512 bytes logical/physical
> Rotation Rate: Solid State Device
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: ATA8-ACS, ACS-2 T13/2015-D revision 3
> SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is: Thu Oct 27 11:22:22 2016 CEST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> General SMART Values:
> Offline data collection status: (0x02) Offline data collection activity
> was completed without error.
> Auto Offline Data Collection:
> Disabled.
> Self-test execution status: ( 0) The previous self-test routine completed
> without error or no self-test has ever been run.
> Total time to complete Offline
> data collection: ( 0) seconds.
> Offline data collection
> capabilities: (0x7b) SMART execute Offline immediate.
> Auto Offline data collection on/off
> support.
> Suspend Offline collection upon new
> command.
> Offline surface scan supported.
> Self-test supported.
> Conveyance Self-test supported.
> Selective Self-test supported.
> SMART capabilities: (0x0003) Saves SMART data before entering
> power-saving mode.
> Supports SMART auto save timer.
> Error logging capability: (0x01) Error logging supported.
> General Purpose Logging supported.
> Short self-test routine
> recommended polling time: ( 1) minutes.
> Extended self-test routine
> recommended polling time: ( 48) minutes.
> Conveyance self-test routine
> recommended polling time: ( 2) minutes.
> SCT capabilities: (0x0021) SCT Status supported.
> SCT Data Table supported.
>
> SMART Attributes Data Structure revision number: 10
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x000f 120 120 050 Pre-fail Always - 0/0
> 5 Retired_Block_Count 0x0033 100 100 003 Pre-fail Always - 0
> 9 Power_On_Hours_and_Msec 0x0032 000 000 000 Old_age Always - 17394h+07m+56.840s
> 12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 1974
> 171 Program_Fail_Count 0x0032 000 000 000 Old_age Always - 0
> 172 Erase_Fail_Count 0x0032 000 000 000 Old_age Always - 0
> 174 Unexpect_Power_Loss_Ct 0x0030 000 000 000 Old_age Offline - 780
> 177 Wear_Range_Delta 0x0000 000 000 000 Old_age Offline - 3
> 181 Program_Fail_Count 0x0032 000 000 000 Old_age Always - 0
> 182 Erase_Fail_Count 0x0032 000 000 000 Old_age Always - 0
> 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
> 194 Temperature_Celsius 0x0022 029 042 000 Old_age Always - 29 (Min/Max 15/42)
> 195 ECC_Uncorr_Error_Count 0x001c 100 100 000 Old_age Offline - 0/0
> 196 Reallocated_Event_Ct 0x0033 100 100 003 Pre-fail Always - 0
> 201 Unc_Soft_Read_Err_Rate 0x001c 100 100 000 Old_age Offline - 0/0
> 204 Soft_ECC_Correct_Rate 0x001c 100 100 000 Old_age Offline - 0/0
> 230 Life_Curve_Status 0x0013 100 100 000 Pre-fail Always - 100
> 231 SSD_Life_Left 0x0013 100 100 010 Pre-fail Always - 0
> 233 SandForce_Internal 0x0000 000 000 000 Old_age Offline - 6599
> 234 SandForce_Internal 0x0032 000 000 000 Old_age Always - 6894
> 241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 6894
> 242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 6326
>
> SMART Error Log not supported
>
> SMART Self-test Log not supported
>
> SMART Selective self-test log data structure revision number 1
> SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
> 1 0 0 Not_testing
> 2 0 0 Not_testing
> 3 0 0 Not_testing
> 4 0 0 Not_testing
> 5 0 0 Not_testing
> Selective self-test flags (0x0):
> After scanning selected spans, do NOT read-scan remainder of disk.
> If Selective self-test is pending on power-up, resume after 0 minute delay.
Hmm, let's do some math:
17394 hours of "on" time equals 724.7 days (if powered on continuously).
6894 GiB written to a nominal 120 GB drive gives roughly 57.4 drive writes
(with optimal wear leveling, every cell would have been written 57-58 times).
The SandForce controller used (likely an SF-2281) is not the best at
wear leveling, so the "use" count per cell will most likely be more
than double that.
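For anyone who wants to redo this arithmetic with their own numbers, here is
a minimal sketch in Python. The input figures are copied from the smartctl
output quoted above; the function names and the wear factor of 2 are my own
assumptions for illustration (the factor just stands in for "more than
double" and is not a measured value).

```python
# Figures copied from the quoted smartctl output.
power_on_hours = 17394        # attribute 9, hours part only
lifetime_writes_gib = 6894    # attribute 241, Lifetime_Writes_GiB
nominal_drive_size = 120      # nominal capacity; GB vs. GiB is ignored here

# Assumption: a placeholder for poor wear leveling, not a measured figure.
assumed_wear_factor = 2.0


def days_powered_on(hours):
    """Convert power-on hours to days of continuous operation."""
    return hours / 24


def drive_writes(written, size):
    """Number of times the full drive capacity has been written."""
    return written / size


days = days_powered_on(power_on_hours)
writes = drive_writes(lifetime_writes_gib, nominal_drive_size)
per_cell_estimate = writes * assumed_wear_factor

print(f"{days:.2f} days powered on")
print(f"{writes:.2f} full drive writes")
print(f"about {per_cell_estimate:.0f} writes per cell at factor {assumed_wear_factor}")
```

Swap in the raw values of attributes 9 and 241 from your own drive; other
vendors may report lifetime writes under a different attribute or unit.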
For my personal use I would replace that drive ASAP:
- There is no warranty on it anymore (time since purchase).
- You can't buy it new anymore (discontinued).
- There are more reliable drives available.
I'd go for a Samsung 850 EVO, which will give you five years of warranty.
But it's your drive; you make the decisions.
- Yamaban.