MHR wrote:
The one problem I've seen and posted here was w.r.t. smartd error reports showing 2^32 - 1 errors on one of the disks (probably my system disk) every few minutes. I thought this was more than just a bit suspicious, since there are only about 586 million 512-byte sectors on a 300GB disk, and a count of 4,294,967,295 errors (more than seven times the number of sectors) is hardly plausible unless the whole system is crashing a lot (it's not). It's a Seagate 300GB, so I ran Seagate's SeaTools on it in lightweight mode, and no problems were reported, which is good because the disk is only about a year and a half old and has my CentOS root, swap, boot and home partitions on it.
Precisely which error counters are alarming you? If these are the raw numbers for Raw_Read_Error_Rate, Hardware_ECC_Recovered, and Seek_Error_Rate, that is normal for Seagate drives. Look at the normalized values for these attributes instead. As long as they are not approaching their failure thresholds, the drive is OK. For further reassurance you can run the SMART long offline test ("smartctl -t long /dev/whatever" -- see the smartctl manpage for details) on the drive.
You need to understand something about modern drives. In the past, drives achieved the first level of redundancy by recording each bit in a large enough area to include many magnetic domains. If some percentage of the domains failed to hold the data (a highly likely situation), that was OK because the read head would get enough signal from the rest of the domains so that the bit would be detected correctly. Fast forward to today. That multi-domain redundancy is all but gone, having been replaced by more advanced error correcting codes implemented in hardware. Seagate has elected to have the raw number for Raw_Read_Error_Rate report each instance of sectors needing this level of correction and let the normalized values reflect whether these corrections are occurring at a rate higher than expected.
A similar situation exists for Seek_Error_Rate. When a drive performs a seek, there is a trade-off between speed and accuracy. You can make it more likely that the heads go directly to the right track by moving them more slowly and allowing more settling time. Performance can be improved significantly by moving the heads more abruptly and accepting that some percentage of the time a subsequent small adjustment will be needed to get to the right track. Again, it is the normalized value for Seek_Error_Rate that reports whether these adjustments are becoming necessary more often than expected.
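If you would rather not eyeball the table, the "compare the normalized value against the threshold" check can be scripted. What follows is only a rough sketch: it assumes the usual column layout printed by "smartctl -A" (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, ...), the sample numbers are made up for illustration, and the helper names parse_attributes and failing are my own, not part of smartmontools.

```python
import re

# Made-up sample in the column layout printed by "smartctl -A".
# Note the huge RAW_VALUE numbers next to healthy normalized values.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       72534968
  7 Seek_Error_Rate         0x000f   084   060   030    Pre-fail  Always       -       270561234
195 Hardware_ECC_Recovered  0x001a   056   049   000    Old_age   Always       -       72534968
"""

# ID, name, flag, then the three normalized columns: VALUE, WORST, THRESH.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)\s+"
)


def parse_attributes(text):
    """Return one dict per attribute row; header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
            })
    return rows


def failing(rows):
    """Names of attributes whose normalized value is at or below threshold.

    A threshold of 0 is skipped here because such an attribute is not
    expected to drop to it.
    """
    return [r["name"] for r in rows
            if r["thresh"] > 0 and r["value"] <= r["thresh"]]


if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    for a in attrs:
        print(f'{a["name"]:<24} value={a["value"]:3d} thresh={a["thresh"]:3d}')
    print("failing:", failing(attrs) or "none")
```

To try it on a real drive you would feed it the output of "smartctl -A /dev/whatever" in place of SAMPLE. The raw column is deliberately ignored, since for these Seagate attributes it says nothing useful about drive health.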