I have a CentOS 5.7 machine hosting a 16 TB XFS partition used to house backups. The backups are run via rsync/rsnapshot and are large in terms of the number of files: over 10 million each.
Now the machine is not particularly powerful: it is a 64-bit machine with a dual-core CPU and 3 GB RAM. So perhaps this is a factor in why I am having the following problem: once in a while that XFS partition starts generating multiple I/O errors, files that had content become 0 bytes, directories disappear, etc. Every time, a reboot fixes that, however. So far I've looked at the logs but could not find a cause or precipitating event.
Hence the question: has anyone experienced anything along those lines? What could be the cause of this?
In every situation like this that I have seen, it was hardware that never had adequate memory provisioned.
Another consideration is that you almost certainly won't be able to run a repair on that filesystem with so little RAM.
Finally, it would be interesting to know how you architected the storage hardware: hardware RAID, BBC (battery-backed cache), drive cache status, barrier status, etc.
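If it helps with gathering the barrier status, here is a minimal sketch in Python 3 (so it would need something newer than the stock CentOS 5 interpreter, or a copy of the mounts table taken to another machine). The function name `xfs_barrier_status` is made up for illustration. It assumes that an XFS mount without the `nobarrier` option has barriers enabled, which is only a rough approximation: the kernel can still turn barriers off at mount time if the underlying device does not support them, and that would show up in the kernel log rather than in the mount options.

```python
def xfs_barrier_status(mounts_text):
    """Map each XFS mount point to True (no 'nobarrier' option) or False.

    mounts_text is the content of a mounts table in /proc/mounts format:
    device, mount point, filesystem type, options, dump, pass.
    """
    status = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip blank or malformed lines and anything that is not XFS.
        if len(fields) < 4 or fields[2] != "xfs":
            continue
        options = fields[3].split(",")
        # Assumption: barriers are on unless 'nobarrier' was requested.
        status[fields[1]] = "nobarrier" not in options
    return status
```

On the backup machine itself, you would feed it the live table, e.g. `xfs_barrier_status(open("/proc/mounts").read())`, and look for mount points that come back as `False`.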