[CentOS] VT6420 sata_via problem

Mon May 28 21:16:44 UTC 2007
Oren Held <oren at held.org.il>

Hello,

A message about this issue has already been posted here.
I am seeing the same problem on 6 machines, all with a Hitachi hard
drive and a VIA motherboard that uses the sata_via driver:
once every few days, each machine freezes or simply stops reading from
the HD and throws I/O error messages like the ones pasted below.

Note that the first "ata exception Emask" errors appear right after
booting, while the hard disks are still working smoothly. The I/O errors
appear only after a few hours to a few days.

Is this a known problem? (There are already two posts about it.) Is
there any solution, or at least a way to check it?

(I'm using CentOS 5 with the latest updates.)
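
As a rough way to check how often each kind of failure occurs, the kernel
messages can be tallied with a short script. The following is only a
sketch: the regular expressions are modelled on the message shapes in the
log pasted in this post and may not match other kernel versions, the
function name summarize and the SAMPLE lines are made up for illustration,
and it assumes Python 3 (CentOS 5 ships an older Python, so it may have to
run on another machine against a saved copy of the log).

```python
import re
from collections import Counter

# Patterns modelled on the lines in this post; other kernels may word them differently.
CMD_FAIL_RE = re.compile(
    r"(ata\d+(?:\.\d+)?): tag \d+ cmd (0x[0-9a-f]+) Emask 0x[0-9a-f]+ "
    r"stat 0x[0-9a-f]+ err 0x[0-9a-f]+ \((.+)\)$"
)
IO_ERROR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)$")
DISABLED_RE = re.compile(r"(ata\d+\.\d+): disabled$")


def summarize(lines):
    """Hypothetical helper: count failed ATA commands and block-layer I/O errors."""
    failures = Counter()   # (device, command, reason) -> occurrences
    io_errors = Counter()  # block device name -> occurrences
    disabled = []          # ATA devices the kernel gave up on
    for line in lines:
        line = line.strip()
        m = CMD_FAIL_RE.search(line)
        if m:
            failures[m.groups()] += 1
            continue
        m = IO_ERROR_RE.search(line)
        if m:
            io_errors[m.group(1)] += 1
            continue
        m = DISABLED_RE.search(line)
        if m:
            disabled.append(m.group(1))
    return failures, io_errors, disabled


# A few lines copied from the log in this post, used as sample input.
SAMPLE = """\
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
ata1: EH complete
ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1.00: disabled
end_request: I/O error, dev sda, sector 4840645
end_request: I/O error, dev sda, sector 5065613
""".splitlines()

failures, io_errors, disabled = summarize(SAMPLE)
for (dev, cmd, reason), count in sorted(failures.items()):
    print("%s cmd %s (%s): %d" % (dev, cmd, reason, count))
for dev, count in sorted(io_errors.items()):
    print("%s I/O errors: %d" % (dev, count))
print("disabled:", ", ".join(disabled) or "none")
```

To run it against a real machine, the SAMPLE list could be replaced with
the lines of a saved dmesg output or of /var/log/messages. Comparing the
counts across the 6 machines might show whether they all fail in the same
way.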

===============
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
ata1: EH complete
SCSI device sda: 160834367 512-byte hdwr sectors (82347 MB)
sda: Write Protect is off
sda: Mode Sense: 00 3a 00 00
SCSI device sda: drive cache: write back
nfsd: last server has exited
nfsd: unexporting all filesystems
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: (BMDMA stat 0x24)
ata1.00: tag 0 cmd 0xca Emask 0x4 stat 0x40 err 0x0 (timeout)
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1: soft resetting port
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1: soft resetting port
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: failed to recover some devices, retrying in 5 secs
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1: soft resetting port
ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1.00: disabled
ata1: EH complete
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 4840645
Buffer I/O error on device sda2, logical block 578975
lost page write due to I/O error on sda2
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 5065613
Buffer I/O error on device sda2, logical block 607096
lost page write due to I/O error on sda2
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 17800565
Buffer I/O error on device sda2, logical block 2198965
lost page write due to I/O error on sda2
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 4835797
Buffer I/O error on device sda2, logical block 578369
lost page write due to I/O error on sda2
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 4840653
Buffer I/O error on device sda2, logical block 578976
lost page write due to I/O error on sda2
Buffer I/O error on device sda2, logical block 578977
lost page write due to I/O error on sda2
Buffer I/O error on device sda2, logical block 578978
lost page write due to I/O error on sda2
Buffer I/O error on device sda2, logical block 578979
lost page write due to I/O error on sda2
Buffer I/O error on device sda2, logical block 578980
lost page write due to I/O error on sda2
Buffer I/O error on device sda2, logical block 578981
lost page write due to I/O error on sda2
sd 0:0:0:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sda, sector 4841541
=========================
etc., etc.
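
The cmd, stat and err fields in the lines above can be decoded by hand.
The sketch below does this for the three command codes that occur in this
log, using the standard ATA status and error register bit names. The
tables are deliberately partial, and the helper names decode_bits and
describe are made up for illustration.

```python
# Partial tables: only the commands seen in this log, plus the common
# ATA status/error register bits.
COMMANDS = {0xB0: "SMART", 0xCA: "WRITE DMA", 0xEC: "IDENTIFY DEVICE"}
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF",
               0x10: "DSC", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_bits(value, names):
    """Return the names of the bits set in value, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True)
            if value & mask]


def describe(cmd, stat, err):
    """Hypothetical helper: render one 'cmd ... stat ... err ...' triple as text."""
    name = COMMANDS.get(cmd, "unknown command 0x%02x" % cmd)
    status = " ".join(decode_bits(stat, STATUS_BITS)) or "none"
    error = " ".join(decode_bits(err, ERROR_BITS)) or "none"
    return "%s: status %s, error %s" % (name, status, error)


# The two distinct failures from the log above.
print(describe(0xB0, 0x51, 0x04))
print(describe(0xCA, 0x40, 0x00))
```

Read this way, the repeated cmd 0xb0 lines are SMART commands that the
drive aborted (ABRT), while the later failure is a WRITE DMA (0xca) that
timed out, followed by IDENTIFY DEVICE (0xec) timeouts during recovery.
This would suggest, though the log alone does not prove it, that the
boot-time exceptions are rejected SMART requests (for example from a
monitoring daemon) and may be unrelated to the later timeouts.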

Thanks

 - Oren