[CentOS] SATA errors in log

Fri Jun 22 13:57:19 UTC 2012

Steve Brooks wrote:
>
> I have a SATA PCIe 6Gbps 4 port controller card made by Startech. The
> kernel (Linux viz1 2.6.32-220.4.1.el6.x86_64) sees it as
>
>   Marvell Technology Group Ltd. 88SE9123
>
> I use it to provide extra SATA ports to a raid system.
> The HD's are all "WD2003FYYS" and so run at 3Gbps on the 6Gbps controller.
> However I am seeing lots of instances of errors like this
>
> -----------------------------------------
>
> Jun 22 03:13:23 viz1 kernel: ata13.00: exception Emask 0x10 SAct 0x4 SErr
> 0x400000 action 0x6 frozen
> Jun 22 03:13:23 viz1 kernel: ata13.00: irq_stat 0x08000000, interface
> fatal error
> Jun 22 03:13:23 viz1 kernel: ata13: SError: { Handshk }
> Jun 22 03:13:23 viz1 kernel: ata13.00: failed command: WRITE FPDMA QUEUED
> Jun 22 03:13:23 viz1 kernel: ata13.00: cmd
> 61/e8:10:98:05:1b/01:00:66:00:00/40 tag 2 ncq 249856 out
> Jun 22 03:13:23 viz1 kernel: ata13.00: status: { DRDY }
> Jun 22 03:13:23 viz1 kernel: ata13: hard resetting link
<snip>
Crap. First question: what make & model are the drives on it? If they're
Caviar Green, you're hosed. WD, and *maybe* Seagate as well, disabled a
certain function you used to be able to set on the lower-cost,
consumer-grade models (in '09, I believe). When a server controller is
doing i/o and a drive hits a problem, a server-grade drive gives up after
something like 6 sec and does its error handling, I *think* to other
sectors. The consumer ones, on the other hand, keep trying for 1? 2?
*minutes*; the disabled function allowed a user to tell the drive to give
up in a shorter time. Meanwhile, a hardware controller will, as I said,
have fits.

        mark "you'd think I just spent months dealing with this...."
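
For anyone reading the quoted log later: the SErr value in the "exception"
line is a bitmask, and the kernel prints the decoded names on the
"SError: { ... }" line. Below is a minimal sketch in Python of that decoding,
assuming the bit positions of the SATA SError register and the short labels
the kernel uses. The helper names decode_serr and parse_exception are made up
for illustration, and the log line must be unwrapped onto a single line before
it is parsed.

```python
import re

# Assumed mapping: bit position in the SError register -> short label as
# printed by the kernel inside "SError: { ... }".
SERR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}

# Matches the three hex fields of an "exception" line once the mail
# client's line wrapping has been undone.
EXCEPTION_RE = re.compile(
    r"Emask (0x[0-9a-fA-F]+) SAct (0x[0-9a-fA-F]+) SErr\s+(0x[0-9a-fA-F]+)"
)


def decode_serr(value):
    """Return the labels of all bits set in an SErr value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


def parse_exception(line):
    """Extract Emask, SAct and SErr as integers, or None if the line differs."""
    match = EXCEPTION_RE.search(line)
    if match is None:
        return None
    emask, sact, serr = (int(field, 16) for field in match.groups())
    return {"emask": emask, "sact": sact, "serr": serr}


if __name__ == "__main__":
    line = ("Jun 22 03:13:23 viz1 kernel: ata13.00: exception Emask 0x10 "
            "SAct 0x4 SErr 0x400000 action 0x6 frozen")
    fields = parse_exception(line)
    print(fields, decode_serr(fields["serr"]))
```

For the line quoted above, 0x400000 has only bit 22 set, which agrees with the
"SError: { Handshk }" line in the same log.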