[CentOS] Disk near failure

Fri Oct 21 11:29:15 UTC 2016

Hello Alessandro,

On Fri, 2016-10-21 at 11:03 +0200, Alessandro Baggi wrote:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x000f   100   100   050    Pre-fail Always  -           0/4754882

smartctl -A only shows a total error count for my disks, but I suppose
this means 0 errors on 4754882 reads...

Note that "Pre-fail" does not indicate that your disk is about to
fail; it is an indication of the type of issue that causes this
particular class of errors.

>   5 Retired_Block_Count     0x0033   100   100   003    Pre-fail Always  -           0

No retired blocks, that seems alright...
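
For what it's worth, the comparison that matters in that table is VALUE
against THRESH: an attribute counts as failed once its normalized value
drops to or below the threshold. Below is a minimal sketch of that check,
assuming the ten-column layout quoted above. The function name
check_smart_attrs is made up for illustration, and treating a threshold
of 0 as "no threshold" is a choice made by this sketch.

```shell
#!/bin/bash
# Sketch only: reads `smartctl -A` style output on stdin and prints every
# attribute whose normalized VALUE (column 4) is at or below THRESH (column 6).
# check_smart_attrs is a hypothetical helper name, not part of smartmontools.
check_smart_attrs() {
    awk '
        # Attribute rows start with a numeric ID and carry a numeric threshold.
        # A threshold of 0 is treated here as "no threshold" and skipped.
        $1 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ && ($6 + 0) > 0 && ($4 + 0) <= ($6 + 0) {
            print $2, $7, "value=" $4, "thresh=" $6
        }
    '
}

# Sample rows copied from the quoted output, so this runs without a disk.
check_smart_attrs <<'EOF'
  1 Raw_Read_Error_Rate     0x000f   100   100   050    Pre-fail Always  -  0/4754882
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail Always  -  0
EOF
```

On a real system you would pipe the live table in instead, for example
smartctl -A /dev/sda | check_smart_attrs, and treat any output as a
reason to look closer.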

> My ssds are failing?

The easiest way to test for disk errors is by issuing

smartctl -l xerror /dev/sda

If the output contains "No Errors Logged", the drive has recorded no
errors in its SMART extended error log and your disks should be fine.

It is quite easy to put this in a (daily) cron job that greps the output
of smartctl for that string and, if it does not find a match, sends a
mail warning you about disk errors:

#!/bin/bash
# Mail the full SMART report for every disk whose extended error log
# does not contain "No Errors Logged".

SMARTCTL=/usr/sbin/smartctl
GREP=/bin/grep

DEVICES='sda sdb'
HOST=$(hostname)
TO='a@example.com'
CC='b@example.com'

for d in $DEVICES ; do
    # grep -q exits non-zero when the string is absent
    if ! "$SMARTCTL" -l xerror "/dev/$d" | "$GREP" -q 'No Errors Logged'; then
        # ERRORS FOUND
        "$SMARTCTL" -x "/dev/$d" | mail -c "$CC" -s "$HOST /dev/$d SMART errors" "$TO"
    fi
done

Regards,
Leonard.

-- 
mount -t life -o ro /dev/dna /genetic/research