On Sun, Jan 22, 2012 at 9:06 AM, Boris Epstein borepstein@gmail.com wrote:
Hello all,
I have a CentOS 5.7 machine hosting a 16 TB XFS partition used to house backups. The backups are run via rsync/rsnapshot and are large in terms of the number of files: over 10 million each.
Now the machine is not particularly powerful: it is 64-bit machine, dual core CPU, 3 GB RAM. So perhaps this is a factor in why I am having the following problem: once in awhile that XFS partition starts generating multiple I/O errors, files that had content become 0 byte, directories disappear, etc. Every time a reboot fixes that, however. So far I've looked at logs but could not find a cause of precipitating event.
Hence the question: has anyone experienced anything along those lines? What could be the cause of this?
Thanks.
Boris.
Correction to the above: the XFS partition is 26TB, not 16 TB (not that it should matter in the context of this particular situation).
Also, here's somethine else I have discovered. Apparently there is an potential intermittent RAID disk trouble. At least I found the following in the system log:
Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x0026): Drive ECC error reported:port=4, unit=0. Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x002D): Source drive error occurred:port=4, unit=0. Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: ERROR (0x04:0x0004): Rebuild failed:unit=0. Jan 22 09:17:53 nrims-bs kernel: 3w-9xxx: scsi6: AEN: INFO (0x04:0x003B): Rebuild paused:unit=0.
...
Jan 22 09:55:23 nrims-bs kernel: 3w-9xxx: scsi6: AEN: WARNING (0x04:0x000F): SMART threshold exceeded:port=9. Jan 22 09:55:23 nrims-bs kernel: 3w-9xxx: scsi6: AEN: WARNING (0x04:0x000F): SMART threshold exceeded:port=9. Jan 22 09:56:17 nrims-bs kernel: 3w-9xxx: scsi6: AEN: INFO (0x04:0x000B): Rebuild started:unit=0.
Even if a disk is misbehaving in a RAID6 that should not be causing I/O errors. Plus, why is it never straight after a rebbot and is always fixed by a reboot?
Be that as it may, I am still puzzled.
Boris.