[CentOS] SATA errors in log

Fri Jun 22 14:32:56 UTC 2012
Steve Brooks <steveb at mcs.st-and.ac.uk>

On Fri, 22 Jun 2012, m.roth at 5-cent.us wrote:

> Steve Brooks wrote:
>> On Fri, 22 Jun 2012, m.roth at 5-cent.us wrote:
>>> Steve Brooks wrote:
>>>>
>>>> I have a SATA PCIe 6Gbps 4 port controller card made by Startech. The
>>>> kernel (Linux viz1 2.6.32-220.4.1.el6.x86_64) sees it as
>>>>
>>>>   Marvell Technology Group Ltd. 88SE9123
>>>>
>>>> I use it to provide extra SATA ports to a raid system.
>>>> The HD's are all "WD2003FYYS" and so run at 3Gbps on the 6Gbps
>>>> controller. However I am seeing lots of instances of errors like this
>>>>
>>>> Jun 22 03:13:23 viz1 kernel: ata13.00: exception Emask 0x10 SAct 0x4 SErr 0x400000 action 0x6 frozen
>>>> Jun 22 03:13:23 viz1 kernel: ata13.00: irq_stat 0x08000000, interface fatal error
>>>> Jun 22 03:13:23 viz1 kernel: ata13: SError: { Handshk }
>>>> Jun 22 03:13:23 viz1 kernel: ata13.00: failed command: WRITE FPDMA QUEUED
>>>> Jun 22 03:13:23 viz1 kernel: ata13.00: cmd 61/e8:10:98:05:1b/01:00:66:00:00/40 tag 2 ncq 249856 out
>>>> Jun 22 03:13:23 viz1 kernel: ata13.00: status: { DRDY }
>>>> Jun 22 03:13:23 viz1 kernel: ata13: hard resetting link
>>> <snip>
>>> Crap. First question: what make & model are the drives on it? If they're
>>> Caviar Green, you're hosed. WD, and *maybe* Seagate as well, disabled a
>>> certain function you used to be able to set on the lower cost,
>>> consumer-grade models (in '09, I believe), and so when a server
>>> controller is trying to do i/o and has a problem, a server-grade drive
>>> gives up after something like 6 sec and does error handling, I *think*
>>> to other sectors. The consumer ones, on the other hand, keep trying
>>> for 1? 2? *minutes*; the disabled function allowed a user to tell it to
>>> give up in a shorter time. Meanwhile, a hardware controller will, as I
>>> said, have fits.
>>>
>>>        mark "you'd think I just spent months dealing with this...."
>>>
>>
>> As mentioned in the original post the drives are all "WD2003FYYS". I am
>
> Missed the original post; sorry.
>
>> convinced it has nothing to do with TLER enabled on the WD drives as we
>
> Thanks, that was the acronym I was trying to remember.
>
>> run hundreds of them using linux mdadm raid on motherboard SATA
>> controllers with no problems in the last eight or so years. This appears
>> to be specific to the SATA PCIe 6Gbps 4 port controller card made by
>> Startech. There are four other HD's (WD2003FYYS) in the machine running on
>> an onboard "Intel Corporation Patsburg 6-Port SATA AHCI Controller" with
>> no problems.
>
> I also see those are "enterprise" drives, not consumer grade, which
> implies that they ought to work. It still looks to me as though it's
> timing out, which I'd think is a function of the RAID card. You might see
> if it has any firmware configuration options.


Thanks for the reply. The card is purely JBOD; there is no RAID or other
configuration available. It simply presents the attached SATA devices to
the OS. I am wondering if it could be a strange symptom of running 3Gbps
SATA drives on this particular 6Gbps controller, but that is just a stab
in the dark.
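
One way to check that the errors really are confined to the ports on the
Startech card would be to tally them per ata port. Below is a minimal
Python sketch, assuming the kernel lines look like the excerpt quoted
above. The function name summarize_ata_errors and the regular expressions
are made up for illustration and are not part of any existing tool.

```python
import re
from collections import Counter

# Matches libata kernel lines such as
#   "... kernel: ata13.00: exception Emask 0x10 ..."
#   "... kernel: ata13: SError: { Handshk }"
# (pattern is an assumption based on the excerpt in this thread)
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")
SERROR = re.compile(r"SError: \{ (.*?) \}")


def summarize_ata_errors(lines):
    """Count exceptions and SError flags per ATA port (hypothetical helper)."""
    exceptions = Counter()
    flags = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue
        port, message = match.groups()
        if message.startswith("exception "):
            exceptions[port] += 1
        serror = SERROR.search(message)
        if serror:
            # An SError line may list several flags separated by spaces.
            for flag in serror.group(1).split():
                flags[(port, flag)] += 1
    return exceptions, flags


# Sample input copied from the log excerpt in this thread.
sample = [
    "Jun 22 03:13:23 viz1 kernel: ata13.00: exception Emask 0x10 SAct 0x4 "
    "SErr 0x400000 action 0x6 frozen",
    "Jun 22 03:13:23 viz1 kernel: ata13.00: irq_stat 0x08000000, interface "
    "fatal error",
    "Jun 22 03:13:23 viz1 kernel: ata13: SError: { Handshk }",
    "Jun 22 03:13:23 viz1 kernel: ata13: hard resetting link",
]

exceptions, flags = summarize_ata_errors(sample)
print(exceptions)
print(flags)
```

To run it against a real log, the same function can be given an open file
(for example /var/log/messages, if that is where the kernel messages end
up on the machine). Working out which ataN numbers belong to which
controller would still have to be done separately, probably from the
boot-time kernel messages.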