[CentOS] Failing Hard Disk?

On Tue, Oct 6, 2009 at 11:25 AM, Stewart Williams
lists-at-pinkyboots.co.uk wrote:

> I am fairly certain that this disk is failing in my server, and I am
> replacing it straight away anyway.

Good idea.  Looks like it's dying.

> Oct  5 08:34:47 server1 kernel:          res 41/40:00:40:1f:71/a0:00:14:00:00/00 Emask 0x409 (media error) <F>
> Oct  5 08:34:47 server1 kernel: ata1.00: status: { DRDY ERR }
> Oct  5 08:34:47 server1 kernel: ata1.00: error: { UNC }
> Oct  5 08:34:47 server1 kernel: ata1.00: cmd

I've yet to see a media error that wasn't from a dying drive.

> Oct  5 08:35:13 server1 kernel: SCSI device sda: drive cache: write through

Nice.  I didn't realize that Linux would disable the unsafe-but-faster
write-back cache in favor of a slower-but-safer write-through cache
when errors were detected.  Or perhaps your drive is just a bit fancier
than mine.  :-)

> I am also getting these errors every 30 minutes:
>
> Oct  5 06:22:06 server1 smartd[3118]: Device: /dev/sda, 12 Offline uncorrectable sectors

Not good.

> Below is the smart selftest log:

> SMART overall-health self-assessment test result: PASSED

I always get a kick out of that, especially given...

> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure        90%             5689              526673
> # 2  Extended offline    Completed: read failure        90%             5685              526673

Two self-tests show read failures yet the status is PASSED?  Ridiculous...
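
For what it's worth, here is a rough sketch of checking the self-test
log directly instead of trusting the overall verdict.  It is only an
illustration: the names (ROW, HEALTH, assess) are made up for this
example, it works on smartctl text pasted as in the message above
(quote markers and wrapped rows included), and treating any status
containing "failure" as a failed test is my assumption, not an official
rule.

```python
import re

# One row of the "SMART Self-test log" table, e.g.
# "# 1  Extended offline    Completed: read failure  90%  5689  526673".
# Columns are assumed to be separated by two or more spaces.
ROW = re.compile(
    r"#\s*(?P<num>\d+)\s+"
    r"(?P<test>.+?)\s{2,}"
    r"(?P<status>.+?)\s{2,}"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)"
)

HEALTH = re.compile(r"overall-health self-assessment test result:\s*(\w+)")


def assess(report: str):
    """Return (overall_health, failed_tests) for pasted smartctl text.

    failed_tests is a list of (test_number, status, lba_of_first_error).
    """
    # Drop email quote markers and re-join rows that a mail client wrapped.
    text = " ".join(line.lstrip("> ").rstrip() for line in report.splitlines())

    match = HEALTH.search(text)
    health = match.group(1) if match else None

    failed = [
        (int(m.group("num")), m.group("status"), m.group("lba"))
        for m in ROW.finditer(text)
        # Assumption: any status mentioning "failure" counts as a failed test.
        if "failure" in m.group("status").lower()
    ]
    return health, failed


SAMPLE = """\
> SMART overall-health self-assessment test result: PASSED
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining
> LifeTime(hours)  LBA_of_first_error
> # 1  Extended offline    Completed: read failure       90%      5689
>     526673
> # 2  Extended offline    Completed: read failure       90%      5685
>     526673
"""

if __name__ == "__main__":
    health, failed = assess(SAMPLE)
    if failed:
        print(f"overall health says {health}, but {len(failed)} self-test(s) failed")
```

The point is simply that the self-test log is worth reading on its own:
a check like this flags the drive as soon as any logged test reports a
failure, whatever the overall-health line claims.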

  -- Steve