I need to copy over 100TB of data from one server to another via network. What is the best option to do this? I am planning to use rsync but is there a better tool or better way of doing this?
For example, I plan on doing rsync -azv /largefs /targetfs
/targetfs is a NFS mounted filesystem.
Any thoughts?
TIA
Mag Gam wrote:
I need to copy over 100TB of data from one server to another via network. What is the best option to do this? I am planning to use rsync but is there a better tool or better way of doing this?
For example, I plan on doing rsync -azv /largefs /targetfs
/targetfs is a NFS mounted filesystem.
Any thoughts?
rsync would probably work better if you ran it in client server mode rather than over NFS, especially if you have to restart it.
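Something along these lines, pushed straight to the other box over ssh instead of through the NFS mount (hostname and paths below are placeholders):

rsync -az /largefs/ targethost:/targetfs/

That way the remote rsync handles its end of the filesystem locally, and you can simply re-run the same command if the transfer dies partway through.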
Mag Gam wrote:
I need to copy over 100TB of data from one server to another via network. What is the best option to do this? I am planning to use rsync but is there a better tool or better way of doing this?
For example, I plan on doing rsync -azv /largefs /targetfs
/targetfs is a NFS mounted filesystem.
The only problem you are likely to have is that rsync builds the entire file list in RAM before starting, then walks the list fixing the differences. If you have a huge number of files and a small amount of RAM, it may slow down due to swapping. 'cp -a' can be faster if the target doesn't already have any matching files. Also, the -v flag to display the names can take longer than the file transfer itself on small files. Running rsync over ssh instead of NFS is a tradeoff: the remote end does part of the work, but you lose some speed to ssh encryption. If the filesystem is live, you might make an initial run copying the larger directories with rsync or cp, then do whatever you can to stop the files from changing and make another pass with 'rsync -av --delete', which should go fairly quickly and fix any remaining differences.
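For example, the two-pass approach might look roughly like this (directory names and hostname are placeholders):

# first pass while the filesystem is still live
rsync -a /largefs/bigdir1 /largefs/bigdir2 targethost:/targetfs/
# ...stop the writers, then a final cleanup pass
rsync -a --delete /largefs/ targethost:/targetfs/

The first run does the bulk of the copying; the second only has to move whatever changed in the meantime and delete anything that shouldn't be there.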
On 21.06.2008 at 15:33, Mag Gam wrote:
I need to copy over 100TB of data from one server to another via network. What is the best option to do this? I am planning to use rsync but is there a better tool or better way of doing this?
For example, I plan on doing rsync -azv /largefs /targetfs
/targetfs is a NFS mounted filesystem.
What network link is there between these hosts?
Are these 1 or 2 million small files or bigger ones?
Does the data change a lot?
Is it a SAN or JBOD?
cheers, Rainer
Network is a 10/100.
1 million large files.
No SAN, JBOD.
On Sat, Jun 21, 2008 at 1:19 PM, Rainer Duffner rainer@ultra-secure.de wrote:
On 21.06.2008 at 15:33, Mag Gam wrote:
I need to copy over 100TB of data from one server to another via network.
What is the best option to do this? I am planning to use rsync but is there a better tool or better way of doing this?
For example, I plan on doing rsync -azv /largefs /targetfs
/targetfs is a NFS mounted filesystem.
What network link is there between these hosts?
Are these 1 or 2 million small files or bigger ones?
Does the data change a lot?
Is it a SAN or JBOD?
cheers, Rainer
On 21.06.2008 at 21:51, Mag Gam wrote:
Network is a 10/100
You're kidding?
1 million large files. No SAN, JBOD.
Move the data by moving the storage itself. It will take months to transfer 100 TB via FastEthernet.
cheers, Rainer
Mag Gam wrote:
Network is a 10/100. 1 million large files. No SAN, JBOD.
Assuming a 100baseT wire speed of about 10 Mbyte/sec, moving 100TB will take a minimum of 100TB / 10MB/s = 10,000,000 seconds, roughly 2,800 hours, or about 4 months. Even on a gigE network, this would still take about 2 weeks or more.
Can you add a fiber network card to each server, and a fiber switch? If not, try putting a gigabit ethernet card in each server, connect them with a crossover cable and start rsync... I did that with 1TB of photos and it took a lot of time... keep the power supply working and cross your fingers...
I hope this helps.
2008/6/21 John R Pierce pierce@hogranch.com:
Mag Gam wrote:
Network is a 10/100. 1 million large files. No SAN, JBOD.
Assuming a 100baseT wire speed of about 10 Mbyte/sec, moving 100TB will take a minimum of 100TB / 10MB/s = 10,000,000 seconds, roughly 2,800 hours, or about 4 months. Even on a gigE network, this would still take about 2 weeks or more.
On Sat, Jun 21, 2008 at 5:12 PM, nightduke nightduke2005@gmail.com wrote:
Can you add a fiber network card to each server, and a fiber switch? If not, try putting a gigabit ethernet card in each server, connect them with a crossover cable and start rsync... I did that with 1TB of photos and it took a lot of time... keep the power supply working and cross your fingers...
I hope this helps.
2008/6/21 John R Pierce pierce@hogranch.com:
Mag Gam wrote:
Network is a 10/100. 1 million large files. No SAN, JBOD.
Assuming a 100baseT wire speed of about 10 Mbyte/sec, moving 100TB will take a minimum of 100TB / 10MB/s = 10,000,000 seconds, roughly 2,800 hours, or about 4 months. Even on a gigE network, this would still take about 2 weeks or more.
Then if you get the network sorted out, the fastest & most reliable way I know to copy lots of files is
star --copy
You can get star with
yum install star
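If I remember the syntax right, copying one tree into another looks roughly like this (directories are placeholders; check 'man star' for the exact options on your version):

star -copy -p -sparse -C /largefs . /targetfs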
--Matt
On 21.06.2008 at 23:44, Matt Morgan wrote:
Then if you get the network sorted out, the fastest & most reliable way I know to copy lots of files is
star --copy
You can get star with
yum install star
Now that I know the details - I don't think this is going to work. Not with 100 TB of data. It kind-of works with 1 TB. Can anybody comment on the feasibility of rsync on 1 million files? Maybe DRBD would be a solution, if you can retrofit DRBD to an existing setup...
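If you did go the DRBD route, the resource definition is roughly of this shape (hostnames, devices and addresses below are made up, and both ends need suitably sized backing devices):

resource r0 {
  protocol C;
  on host1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on host2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}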
If not, it's faster to move the drives physically - believe me, this will create far fewer problems. In a SAN, you would have the possibility of syncing the data outside of the filesystem, during normal operations.
100 TB is a lot of data. How do you back that up, BTW? What is your estimated time to restore it from the medium you back it up to?
cheers, Rainer
On Sun, Jun 22, 2008 at 3:36 AM, Rainer Duffner rainer@ultra-secure.de wrote:
Now that I know the details - I don't think this is going to work. Not with 100 TB of data. It kind-of works with 1 TB. Can anybody comment on the feasibility of rsync on 1 million files?
rsync always broke on my filesystems with 200-300k files due to out of memory errors (my box had 2GB RAM).
- Raja
So it just stops the sync when that happens?
2008/6/22 Raja Subramanian rajasuperman@gmail.com:
On Sun, Jun 22, 2008 at 3:36 AM, Rainer Duffner rainer@ultra-secure.de wrote:
Now that I know the details - I don't think this is going to work. Not with 100 TB of data. It kind-of works with 1 TB. Can anybody comment on the feasibility of rsync on 1 million files?
rsync always broke on my filesystems with 200-300k files due to out of memory errors (my box had 2GB RAM).
- Raja
On Sun, 22 Jun 2008, Raja Subramanian wrote:
On Sun, Jun 22, 2008 at 3:36 AM, Rainer Duffner rainer@ultra-secure.de wrote:
Now that I know the details - I don't think this is going to work. Not with 100 TB of data. It kind-of works with 1 TB. Can anybody comment on the feasibility of rsync on 1 million files?
rsync always broke on my filesystems with 200-300k files due to out of memory errors (my box had 2GB RAM).
I have done 700k and 800k file transfers (including hardlinks), but indeed it could take a while to compute the transfer list. Newer rsync versions bring down the amount of memory needed drastically. That is one of the reasons I offer a recent rsync in RPMforge. There is almost never a good reason to use a dated rsync.
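If you are stuck with an older rsync and memory is the limit, one workaround is to split the job so each run only holds part of the file list, e.g. one run per top-level directory (paths and hostname are placeholders):

cd /largefs
for d in */; do
    rsync -a "$d" targethost:/targetfs/"$d"
done

Each rsync then only builds the list for one subtree instead of the whole filesystem.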
On Sun, Jun 22, 2008 at 4:32 PM, Dag Wieers dag@centos.org wrote:
I have done 700k and 800k file transfers (including hardlinks), but indeed it could take a while to compute the transfer list. Newer rsync versions bring down the amount of memory needed drastically. That is one of the reasons I offer a recent rsync in RPMforge. There is almost never a good reason to use a dated rsync.
I just thought I'd de-lurk and chime in that there are some patches for ssh to allow better performance:
http://www.psc.edu/networking/projects/hpn-ssh/
If you do end up using rsync for something like this via ssh, you might want to look at some of the Pittsburgh Supercomputing Center's patches. The high-performance patches can allow you to see dramatic increases in throughput.
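Even without the HPN patches, you can claw back some of the encryption overhead by telling rsync to use a cheaper ssh cipher, something like (hostname and paths are placeholders):

rsync -a -e 'ssh -c arcfour' /largefs/ targethost:/targetfs/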
Or, if it's over a secure network, drop ssh entirely and use the rsync protocol.
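A bare-bones daemon setup on the receiving side would be something like this in /etc/rsyncd.conf (module name and path are placeholders):

[targetfs]
    path = /targetfs
    read only = no
    # needed if you want -a to preserve ownership
    uid = root
    gid = root

Then run 'rsync --daemon' on that host and push from the source with:

rsync -a /largefs/ targethost::targetfs/

No ssh in the path, so no encryption overhead.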
--Erek
I have done 700k and 800k file transfers (including hardlinks), but indeed it could take a while to compute the transfer list. Newer rsync versions bring down the amount of memory needed drastically. That is one of the reasons I offer a recent rsync in RPMforge. There is almost never a good reason to use a dated rsync.
I have used rsync on ~16 million files with a filesystem size of about 1.5TB - it worked fine but took a while.
Rainer Duffner wrote: ...
Can anybody comment on the feasibility of rsync on 1 million files?
I rsync 2.6M files daily. No problem.
It takes 15 minutes if there are only a few changes.
For fast transfer of files between two machines I usually use ttcp:
On the sending ("from") machine:
tar cf - . | ttcp -l5120 -t to_machine
On the receiving ("to") machine:
cd /whatever; ttcp -l5120 -r | tar xf -
I get ~100Mbytes/sec on a gigabit connection.
Note this is insecure, with no way of restarting, etc.
Mogens
On Sat, 2008-06-21 at 09:33 -0400, Mag Gam wrote:
I need to copy over 100TB of data from one server to another via network. What is the best option to do this? I am planning to use rsync but is there a better tool or better way of doing this?
At gigabit speeds, you're looking at over a week of transfer time: 1 gigabit/s is at most 125 MB/sec, so 100TB / 125MB/s = 800,000 seconds, about 9.25 days, not counting protocol overhead. You could speed this up with link bonding, which from previous threads sounds like something you're working on already.
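For the record, a rough sketch of bonding two gigE ports on a CentOS 5 box (interface names, mode and addresses are examples only; check the bonding documentation for what your NICs and switch support):

/etc/modprobe.conf:
alias bond0 bonding
options bond0 mode=balance-rr miimon=100

/etc/sysconfig/network-scripts/ifcfg-bond0:
DEVICE=bond0
IPADDR=10.0.0.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

/etc/sysconfig/network-scripts/ifcfg-eth0 (and likewise for eth1):
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none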
If it's a one-off transfer and you can afford downtime while you're fiddling with hardware, you may consider directly attaching both sets of storage to the same machine and doing a local copy.
Mag Gam wrote:
I need to copy over 100TB of data from one server to another via network. What is the best option to do this? I am planning to use rsync but is there a better tool or better way of doing this?
For example, I plan on doing rsync -azv /largefs /targetfs
/targetfs is a NFS mounted filesystem.
Any thoughts?
You are going to pay a large performance penalty for the simplicity of running rsync locally against an NFS mount. Between the substantial overheads of rsync itself and NFS, you are not going to come anywhere near your maximum possible speed, and you will probably need a lot of memory if you have a lot of files (rsync uses a lot of memory to track all the files). When I'm serious about moving large amounts of data at the highest speed, I use tar tunneled through ssh. The rough invocation to pull from a remote machine looks like this:
ssh -2 -c arcfour -T -x sourcemachine.com 'tar --directory=/data -Scpf - .' | tar --directory=/local-data-dir -Spxf -
That should pull the contents of the source machine's /data directory into an already existing local /local-data-dir. On reasonably fast machines (better than 3 GHz CPUs) it tends to approach the limit of either your hard drives' speed or your network capacity.
If you don't like the ssh tunnel, you can strip it down to just the two tars (one to throw and one to catch) and copy it over NFS. It will still be faster than what you are proposing. Or you can use cpio.
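Stripped down that way, with the NFS mount in place, it's roughly (directories are placeholders):

tar --directory=/largefs -Scpf - . | tar --directory=/targetfs -Spxf -

Same pair of tars, just with the pipe staying on one machine and NFS carrying the data on the write side.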
Rsync is best at synchronizing two already nearly identical trees; it is not so good as a bulk copier.