[CentOS] recommendations for copying large filesystems

Tue Jun 24 17:17:54 UTC 2008
Jerry Franz <jfranz at freerun.com>

Mag Gam wrote:
> I need to copy over 100TB of data from one server to another via 
> network. What is the best option to do this? I am planning to use 
> rsync but is there a better tool or better way of doing this?
>
> For example, I plan on doing
> rsync -azv /largefs /targetfs
>
> /targetfs is a NFS mounted filesystem.
>
> Any thoughts?
>
> TIA

You are going to pay a large performance penalty for the simplicity of
running rsync against an NFS mount. Between the substantial overhead of
rsync itself and of NFS, you won't come anywhere near your maximum
possible speed, and you will probably need a lot of memory if you have
a lot of files (rsync builds an in-memory list of every file it
handles). When I'm serious about moving large amounts of data at the
highest speed, I use tar tunneled through ssh. The rough invocation to
pull from a remote machine looks like this:

ssh -2 -c arcfour -T -x sourcemachine.com 'tar --directory=/data -Scpf - .' | tar --directory=/local-data-dir -Spxf -
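
(-c arcfour picks a fast cipher, -T and -x disable pty allocation and
X11 forwarding, and tar's -S and -p keep sparse files sparse and
preserve permissions.)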

That should pull the contents of sourcemachine's /data directory into
an already existing local /local-data-dir. On reasonably fast machines
(better than 3 GHz CPUs) it tends to approach the limit of either your
hard drives' speed or your network capacity.
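
To push instead of pull, you can rearrange the same pieces; roughly
(with targetmachine.com and /remote-data-dir standing in for your real
target host and directory):

tar --directory=/data -Scpf - . | ssh -2 -c arcfour -T -x targetmachine.com 'tar --directory=/remote-data-dir -Spxf -'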

If you don't like the ssh tunnel, you can strip it down to just the two
tars (one to throw and one to catch) and copy over the NFS mount, as
sketched below. It will still be faster than what you are proposing. Or
you can use cpio.
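
Assuming /largefs and /targetfs from your example (with /targetfs being
the NFS mount), that would look roughly like:

tar --directory=/largefs -Scpf - . | tar --directory=/targetfs -Spxf -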

Rsync is best at synchronizing two already nearly identical trees; it
is not so good as a bulk copier.
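
Where it does earn its keep is the catch-up pass: once the bulk tar
copy is done, something roughly like this (using the paths from your
example) will bring over whatever changed in the meantime:

rsync -aHv /largefs/ /targetfs/

The trailing slash on the source matters; without it rsync will create
/targetfs/largefs instead of syncing into /targetfs.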

-- 
Benjamin Franz
