[CentOS] hardware issues? driver issues?

Wed Mar 7 16:43:38 UTC 2012
Peter Kjellström <cap at nsc.liu.se>

On Wednesday 07 March 2012 11.17.15 m.roth at 5-cent.us wrote:
> Got a bunch of servers from Penguin. Supermicro m/b's H8QG6. We put a 3tb
> drive in for additional workspace for the users, and some of them won't
> read, others will go for weeks, then spit out DRDY errors. lshw shows the
> controller as an ATI SB7x0/SB8x0/SB9x0 SATA.
> Now, I've been working on one with Penguin. I noticed one thing, that it
> was set to native IDE. After googling, I saw that the most recent spec,
> which included EIDE, should be good to petabytes... but I tried resetting
> it to AHCI anyway.
> The user ran one job, ok... then another last night, and it's spitting the
> same errors.
> Mar  7 00:53:28 <server> kernel: ata2.00: failed command: WRITE FPDMA QUEUED
> 40/00:04:20:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
> Mar  7 00:53:28 <server> kernel: ata2: hard resetting link

While writing, the drive timed out and the link to it was then subjected to a
hard reset. This is not normal and usually points to a bad drive or buggy

Have you had a look at the SMART data for the drive(s)? (You may want to run
the SMART self-tests.)
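
A minimal sketch of those SMART checks, assuming smartmontools (smartctl) is
installed. The function name smart_check and the DRY_RUN switch are
hypothetical helpers added for illustration, and /dev/sdX is a placeholder for
the real device. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: smart_check and DRY_RUN are illustrative names, not part of
# smartmontools. With DRY_RUN=1 (the default) the commands are just printed.
smart_check() {
    dev="$1"
    # -H: overall health, -A: vendor attributes,
    # -l error: drive error log, -t long: start an extended self-test
    for args in "-H" "-A" "-l error" "-t long"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "smartctl $args $dev"
        else
            # $args is deliberately unquoted so "-l error" splits into two words
            smartctl $args "$dev"
        fi
    done
    # Once the long test has finished (it can take hours on a 3 TB drive),
    # its result can be read with: smartctl -l selftest "$dev"
}
```

To actually run the checks as root, something like
`DRY_RUN=0 smart_check /dev/sdX` would do, with sdX replaced by the drive in
question.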

Also, I'd suggest you test it in a controlled environment. For example, can 
any of your drives survive a full surface write? (dd if=/dev/zero bs=1M of=..) 
Full surface read? Run the tests against the raw device (/dev/sdX) so that
partitioning, filesystems, volume management, etc. are ruled out.

Do note that writing your drive full of zeros _will_ destroy your data (I 
really hope that's stating the obvious...).
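
The write and read tests above could be sketched roughly as follows. The
function name surface_test and the DRY_RUN switch are hypothetical, /dev/sdX
is a placeholder, and the grep pattern for the kernel log is only a guess at
what is worth looking for. By default nothing is written; the commands are
only printed.

```shell
#!/bin/sh
# Sketch only: surface_test and DRY_RUN are illustrative names.
# WARNING: with DRY_RUN=0 the write pass destroys everything on the device.
surface_test() {
    dev="$1"
    write_cmd="dd if=/dev/zero of=$dev bs=1M"
    read_cmd="dd if=$dev of=/dev/null bs=1M"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show what would be executed, touch nothing
        echo "$write_cmd"
        echo "$read_cmd"
    else
        # Full surface write; dd is expected to stop with
        # "No space left on device" once the end of the disk is reached
        $write_cmd
        # Full surface read back from the raw device
        $read_cmd
        # Look for link resets or failed commands logged during the run
        dmesg | grep -i 'ata[0-9]' | tail -n 20
    fi
}
```

A real run would be `DRY_RUN=0 surface_test /dev/sdX` as root, on a drive that
holds no data you care about.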
