[CentOS] Re: Hard drive errors

Fri Sep 22 13:33:58 UTC 2006
Bowie Bailey <Bowie_Bailey at BUC.com>

Peter Farrow wrote:
> This can be caused by overheating.  If the drives are mounted close
> together as some chassis configurations permit, then it makes sense
> they may both exhibit the same problem, but not necessarily
> simultaneously.   
> 
> Are these log entries adjacent to each other?

The drives have an open slot between them and a 120mm fan right in
front of them, so I don't think overheating is an issue.  At one
point, I shut the machine down and let it sit for an hour or so.  The
errors started back up immediately during the boot process when I
turned it back on.

The log entries are adjacent, usually with the exact same timestamp.
There are no other errors mixed in with them.

I was able to get the system to boot yesterday.  It triggered a RAID
rebuild which finished last night.  The last batch of errors happened
at 4am and the system appears to be running normally now.

Here's a snip from the logs showing the errors and the rebuild:
(I've cut out a few columns to shorten the lines)

----------------------------------
22:36:21 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24
22:36:21 kernel: ata3: status=0x50 { DriveReady SeekComplete }
22:36:21 kernel: Current sda: sense key No Sense
22:36:21 kernel: ata4: command 0x35 timeout, stat 0x50 host_stat 0x24
22:36:21 kernel: ata4: status=0x50 { DriveReady SeekComplete }
22:36:21 kernel: Current sdb: sense key No Sense
22:37:09 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24
22:37:09 kernel: ata3: status=0x50 { DriveReady SeekComplete }
22:37:09 kernel: Current sda: sense key No Sense
22:38:03 kernel: md: md1: sync done.
22:38:03 kernel: RAID1 conf printout:
22:38:03 kernel:  --- wd:2 rd:2
22:38:03 kernel:  disk 0, wo:0, o:1, dev:sda2
22:38:03 kernel:  disk 1, wo:0, o:1, dev:sdb2
23:25:39 kernel: ata3: command 0xca timeout, stat 0x50 host_stat 0x24
23:25:39 kernel: ata3: status=0x50 { DriveReady SeekComplete }
23:25:39 kernel: Info fld=0x662270, Current sda: sense key No Sense
23:25:39 kernel: ata4: command 0xca timeout, stat 0x50 host_stat 0x24
23:25:39 kernel: ata4: status=0x50 { DriveReady SeekComplete }
23:25:39 kernel: Info fld=0x662270, Current sdb: sense key No Sense
----------------------------------
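
For anyone reading along, the hex values in those timeout lines can be
decoded by hand.  Below is a rough Python sketch that does it.  The
status bit names follow the labels the kernel prints, and the opcode
table covers only the three commands seen above plus READ DMA.  The
function names (decode_status, decode_timeout) are made up for this
example, and host_stat is left undecoded because its meaning depends on
the controller driver.

```python
import re

# ATA status register bits, highest bit first, using the kernel's labels.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# ATA command opcodes that appear in the log excerpt (plus READ DMA).
COMMANDS = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
}

# Matches lines like: "ata3: command 0x25 timeout, stat 0x50 host_stat 0x24"
LINE_RE = re.compile(
    r"(ata\d+): command (0x[0-9a-fA-F]+) timeout, stat (0x[0-9a-fA-F]+)"
)


def decode_status(value):
    """Return the names of the status bits set in value."""
    return [name for bit, name in STATUS_BITS.items() if value & bit]


def decode_timeout(line):
    """Return (port, command name, status flags), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    port, cmd, stat = m.groups()
    name = COMMANDS.get(int(cmd, 16), "unknown")
    return port, name, decode_status(int(stat, 16))


if __name__ == "__main__":
    sample = "22:36:21 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24"
    print(decode_timeout(sample))
```

Read this way, the timeouts hit both reads (0x25) and writes (0x35,
0xca), and in every case the drive reports itself ready with no Error
bit set.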

These errors show ata3 and ata4 because I moved the drives to
different SATA connectors on the motherboard.  These are the same
drives that I showed previously with errors on ata1 and ata2.
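
To check how often both ports time out at the same moment, the log can
be grouped by timestamp.  This is only a sketch: it assumes the
trimmed line format shown above (time, then "kernel:", then the ata
port), and ports_by_timestamp is a name invented for the example.

```python
import re
from collections import defaultdict

# Matches the trimmed format above, e.g.
# "22:36:21 kernel: ata3: command 0x25 timeout, ..."
TIMEOUT_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) kernel: (ata\d+): command 0x[0-9a-fA-F]+ timeout"
)


def ports_by_timestamp(lines):
    """Map each timestamp to the sorted list of ports that timed out then."""
    seen = defaultdict(set)
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            seen[m.group(1)].add(m.group(2))
    return {ts: sorted(ports) for ts, ports in seen.items()}


if __name__ == "__main__":
    log = [
        "22:36:21 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24",
        "22:36:21 kernel: ata4: command 0x35 timeout, stat 0x50 host_stat 0x24",
        "22:37:09 kernel: ata3: command 0x25 timeout, stat 0x50 host_stat 0x24",
        "23:25:39 kernel: ata3: command 0xca timeout, stat 0x50 host_stat 0x24",
        "23:25:39 kernel: ata4: command 0xca timeout, stat 0x50 host_stat 0x24",
    ]
    for ts, ports in ports_by_timestamp(log).items():
        print(ts, ports)
```

On the excerpt above, two of the three timestamps show both ata3 and
ata4 timing out in the same second.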

-- 
Bowie