[CentOS] SATA errors in log

Fri Jun 22 14:10:42 UTC 2012
Steve Brooks <steveb at mcs.st-and.ac.uk>

On Fri, 22 Jun 2012, m.roth at 5-cent.us wrote:

> Steve Brooks wrote:
>>
>> I have a SATA PCIe 6Gbps 4 port controller card made by Startech. The
>> kernel (Linux viz1 2.6.32-220.4.1.el6.x86_64) sees it as
>>
>>   Marvell Technology Group Ltd. 88SE9123
>>
>> I use it to provide extra SATA ports to a raid system.
>> The HD's are all "WD2003FYYS" and so run at 3Gbps on the 6Gbps controller.
>> However I am seeing lots of instances of errors like this
>>
>> -----------------------------------------
>>
>> Jun 22 03:13:23 viz1 kernel: ata13.00: exception Emask 0x10 SAct 0x4 SErr
>> 0x400000 action 0x6 frozen
>> Jun 22 03:13:23 viz1 kernel: ata13.00: irq_stat 0x08000000, interface
>> fatal error
>> Jun 22 03:13:23 viz1 kernel: ata13: SError: { Handshk }
>> Jun 22 03:13:23 viz1 kernel: ata13.00: failed command: WRITE FPDMA QUEUED
>> Jun 22 03:13:23 viz1 kernel: ata13.00: cmd
>> 61/e8:10:98:05:1b/01:00:66:00:00/40 tag 2 ncq 249856 out
>> Jun 22 03:13:23 viz1 kernel: ata13.00: status: { DRDY }
>> Jun 22 03:13:23 viz1 kernel: ata13: hard resetting link
> <snip>
> Crap. First question: what make & model are the drives on it? If they're
> Caviar Green, you're hosed. WD, and *maybe* Seagate as well, disabled a
> certain function you used to be able to set on the lower cost,
> consumer-grade models (in '09, I believe), and so when a server controller
> is trying to do i/o, and has a problem, in server-grade drives, it gives
> up after something like 6 sec, and does error handling, I *think* to other
> sectors. The consumer ones, on the other hand, keep trying for 1? 2?
> *minutes*; the disabled function allowed a user to tell it to give up in a
> shorter time. Meanwhile, a hardware controller will, as I said, have fits.
>
>        mark "you'd think I just spent months dealing with this...."
>

As mentioned in the original post, the drives are all "WD2003FYYS". I am
convinced this has nothing to do with TLER being enabled on the WD drives,
as we run hundreds of them under Linux mdadm RAID on motherboard SATA
controllers and have had no problems in the last eight or so years. The
errors appear to be specific to the SATA PCIe 6Gbps 4 port controller card
made by Startech. There are four other HDs (WD2003FYYS) in the machine
running on an onboard "Intel Corporation Patsburg 6-Port SATA AHCI
Controller" with no problems.
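
One way to back up the claim that the errors are confined to one controller
is to tally the libata exceptions per ATA port. The following is only a
minimal sketch: it assumes the kernel messages land in /var/log/messages
(the path is an assumption, as is the helper name count_ata_exceptions) and
that each exception sits on a single unwrapped line, unlike the wrapped
excerpt quoted above. Which ataN ports belong to which controller is not
shown here and would need to be checked separately, for example from the
boot messages.

```python
import re
import sys
from collections import Counter

# Matches libata exception lines such as:
#   "Jun 22 03:13:23 viz1 kernel: ata13.00: exception Emask 0x10 ..."
# and captures only the port part ("ata13"), dropping the ".00" device suffix.
EXCEPTION_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: exception Emask")


def count_ata_exceptions(lines):
    """Return a Counter mapping ATA port name (e.g. 'ata13') to exception count."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Assumed default log location; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as log:
        for port, n in count_ata_exceptions(log).most_common():
            print(f"{port}: {n}")
```

If every port it reports turns out to sit on the Startech card and none on
the onboard Intel controller, that would be consistent with the problem
being the card rather than the drives.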

Steve