On Tue, Oct 6, 2009 at 11:25 AM, Stewart Williams lists-at-pinkyboots.co.uk wrote:
I am fairly certain that this disk is failing in my server, and I am replacing it straight away anyway.
Good idea. Looks like it's dying.
Oct 5 08:34:47 server1 kernel: res 41/40:00:40:1f:71/a0:00:14:00:00/00 Emask 0x409 (media error) <F>
Oct 5 08:34:47 server1 kernel: ata1.00: status: { DRDY ERR }
Oct 5 08:34:47 server1 kernel: ata1.00: error: { UNC }
Oct 5 08:34:47 server1 kernel: ata1.00: cmd
I've yet to see a media error that wasn't from a dying drive.
Oct 5 08:35:13 server1 kernel: SCSI device sda: drive cache: write through
Nice. I didn't realize that Linux would disable the unsafe-but-faster write back cache for a slower-but-safer write through cache when errors were detected. Or perhaps your drive is just a bit fancier than mine. :-)
I am also getting these errors every 30 minutes:
Oct 5 06:22:06 server1 smartd[3118]: Device: /dev/sda, 12 Offline uncorrectable sectors
Not good.
Below is the smart selftest log:
SMART overall-health self-assessment test result: PASSED
I always get a kick out of that, especially given...
SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure        90%             5689  526673
# 2  Extended offline    Completed: read failure        90%             5685  526673
Two self-tests show read failures yet the status is PASSED? Ridiculous...
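If you want something that looks past the PASSED line, here is a minimal Python sketch that checks the self-test rows themselves. It assumes the log text has one row per line in the layout quoted above; the function name failed_self_tests and the regular expression are made up for this illustration and are not part of smartmontools.

```python
import re

# One self-test row, e.g.
# "# 1 Extended offline Completed: read failure 90% 5689 526673"
# Assumption: the last three fields are always Remaining, LifeTime(hours), LBA.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+"      # entry number
    r"(?P<desc>.+?)\s+"          # test description plus status text
    r"(?P<remaining>\d+)%\s+"    # percentage of the test remaining
    r"(?P<hours>\d+)\s+"         # power-on hours when the test ran
    r"(?P<lba>\S+)$"             # LBA of first error, or "-" if none
)


def failed_self_tests(log_text):
    """Return the self-test rows whose status mentions a read failure."""
    failures = []
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match and "read failure" in match.group("desc"):
            failures.append(
                {
                    "num": int(match.group("num")),
                    "lifetime_hours": int(match.group("hours")),
                    "lba": match.group("lba"),
                }
            )
    return failures
```

Fed the two rows from the log above, this reports both entries with the same first-error LBA (526673), which is a much better hint than the overall verdict.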
-- Steve