On 07/25/2014 03:06 PM, Les Mikesell wrote:
On Fri, Jul 25, 2014 at 3:08 PM, Benjamin Smith <lists@benjamindsmith.com> wrote:
On 07/25/2014 12:12 PM, Michael Hennebry wrote:
Is there some reason that the existing files cannot be accessed while they are being copied to the RAID?
Sheer volume. With something in the range of 100,000,000 small files, it takes a good day or two to rsync. This means that getting a consistent image without significant downtime is impossible. I can handle a few minutes, maybe an hour. Much more than that and I have to explore other options. (In this case, it looks like we'll be biting the bullet and switching to ZFS)
Rsync is really pretty good at that, especially the 3.x versions. If you've just done a live rsync (or a few, so there won't be much left to change during the last live run), the final pass with the system idle shouldn't take much more time than a 'find' traversing the same tree. If you have the space and time to test, I'd time the third pass or so before deciding it won't work (unless even a find would take too long).
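Something roughly like this, repeated until the deltas get small (source and destination paths here are just placeholders):

rsync -aH --delete /path/to/data/ /mnt/newraid/data/   # first pass, system live, slow
rsync -aH --delete /path/to/data/ /mnt/newraid/data/   # second pass, only re-copies what changed
# quiesce writers, then one last short pass for a consistent image:
rsync -aH --delete /path/to/data/ /mnt/newraid/data/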
Thanks for your feedback - it's advice I would have given myself just a few years ago. We have *literally* on the order of one hundred million small PDF documents. The simple command
find /path/to/data > /dev/null
takes between 1 and 2 days, depending on system load. We had to give up on rsync for backups in this context a while ago - we just couldn't get a "daily" backup more often than about twice per week. Now we're using ZFS + send/receive to get daily backup times down into the sub-60-minute range, and I'm just going to bite the bullet and synchronize everything at the application level over the next week.
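The nightly cycle is roughly along these lines (pool, dataset, and host names are made up for illustration):

zfs snapshot tank/data@2014-07-25
# incremental send of just the blocks changed since the previous snapshot:
zfs send -i tank/data@2014-07-24 tank/data@2014-07-25 | ssh backuphost zfs receive -F backup/data

Since the incremental stream only carries changed blocks, it never has to walk the hundred million files the way find or rsync does.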
Was just looking for a shortcut...