On Sun, Jan 22, 2012 at 2:56 PM, Joseph L. Casale <jcasale@activenetwerx.com> wrote:
I have a CentOS 5.7 machine hosting a 16 TB XFS partition used to house backups. The backups are run via rsync/rsnapshot and are large in terms of the number of files: over 10 million each.
Now the machine is not particularly powerful: it is a 64-bit machine with a dual-core CPU and 3 GB RAM. So perhaps this is a factor in why I am having the following problem: once in a while that XFS partition starts generating multiple I/O errors, files that had content become 0 bytes, directories disappear, etc. Every time a reboot fixes that, however. So far I've looked at logs but could not find a cause or precipitating event.
Hence the question: has anyone experienced anything along those lines?
What could be the cause of this?
In every situation like this that I have seen, it was hardware that never had adequate memory provisioned.
Another consideration is that you almost certainly won't be able to run a repair on that fs with so little RAM.
Finally, it would be interesting to know how you architected the storage hardware: hardware RAID, BBC, drive cache status, barrier status, etc.
Joseph,
If I remember correctly I pretty much went with the defaults when I created this XFS on top of a 16-drive RAID6 configuration.
Now as far as memory - I think for the purpose of XFS repair RAM and swap ought to be the same. And I've got plenty of swap on this system. I also host a 5 TB XFS in a file there; I ran XFS repair on it and it finished in no more than 5 minutes. Now this is 20% of the larger XFS, roughly speaking.
I should try to collect the info you mentioned, though - that was a good thought, some clue might be contained in there for sure.
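As a starting point for the barrier part of that, here is a minimal Python sketch that lists the mount options of every XFS filesystem and flags an explicit nobarrier option. It is only a sketch under assumptions: it assumes a Python 3 interpreter (which may have to be installed separately on CentOS 5) and the usual Linux /proc/mounts layout, and the function names are made up for illustration. It says nothing about the RAID controller cache or the drive caches, and it will not notice barriers that the kernel turned off on its own; for that the kernel log is the place to look.

```python
def xfs_mount_options(mounts_text):
    """Return {mount_point: [options]} for each XFS entry in /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "xfs":
            result[fields[1]] = fields[3].split(",")
    return result


def barriers_disabled(options):
    """True if the mount was explicitly given the 'nobarrier' option."""
    return "nobarrier" in options


def report(path="/proc/mounts"):
    """Print one line per mounted XFS filesystem (hypothetical helper)."""
    with open(path) as f:
        mounts = xfs_mount_options(f.read())
    for mount_point, options in mounts.items():
        if barriers_disabled(options):
            state = "explicitly disabled (nobarrier)"
        else:
            state = "not explicitly disabled"
        print("%s: barriers %s; options: %s" % (mount_point, state, ",".join(options)))
```

Calling report() on the backup server prints the XFS mounts it finds; the parsing is kept separate from the file access so it can be tried on pasted text as well.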
Thanks for your input.
Boris.