I'm trying to rsync an 8TB data folder containing squillions of small files and it's taking forever (i.e. weeks) to get anywhere. I'm assuming the slow bit is checksumming everything with a single CPU (even though it's on a 12-core server ;-( ). Is it possible to do something simple like scp the whole dir in one go so they're duplicates in the first instance, then get rsync to just keep them in sync without an initial transfer?
Or is there a better way?
Thanx,
Russell Smithies
On 07/30/12 10:05 PM, Smithies, Russell wrote:
> I'm trying to rsync a 8TB data folder containing squillions of small files and it's taking forever (i.e. weeks) to get anywhere. I'm assuming the slow bit is check-summing everything with a single CPU (even though it's on a 12-core server ;-( ) Is it possible to do something simple like scp the whole dir in one go so they're duplicates in the first instance, then get rsync to just keep them in sync without an initial transfer?
Use the rsync mode that goes off file timestamp and size. The block checksumming algorithm is only useful on large files that get small, random block changes.
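A minimal sketch of the difference, in case it helps. The function names and the RSYNC override are made up for illustration; the flags themselves (-a, --checksum, --whole-file) are standard rsync options.

```shell
# RSYNC can be set to "echo" to preview the command line instead of running it.

# Default quick check: a file is skipped when size and modification time match.
sync_quick() { "${RSYNC:-rsync}" -a "$1" "$2"; }

# Forces a full read and checksum of every file on both sides (slow on huge trees).
sync_checksum() { "${RSYNC:-rsync}" -a --checksum "$1" "$2"; }

# Quick check as above, but changed files are sent whole, without the
# block-level delta algorithm.
sync_whole() { "${RSYNC:-rsync}" -a --whole-file "$1" "$2"; }
```

For example, `RSYNC=echo sync_quick /data/ receiver:/data/` only prints the arguments that would be passed (the host name "receiver" is a placeholder).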
As far as I can see, timestamp and size is the default. I've turned off compression and I think I'm getting better throughput. I'm running 4 rsync tasks and getting sustained transfers of just over 800Mb/sec for several hours :-)
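Roughly what running several rsync tasks side by side can look like, as a sketch only. The function name parallel_rsync, the paths and the host name are hypothetical, and it assumes GNU find and xargs. It starts one rsync per top-level entry of the source directory, with no -z so nothing is compressed.

```shell
# Run up to $3 (default 4) rsync processes at once, one per top-level entry.
parallel_rsync() {
    src=$1          # e.g. /data
    dest=$2         # e.g. receiver:/data
    jobs=${3:-4}
    # RSYNC can be overridden (RSYNC=echo) to preview the commands.
    rsync_cmd=${RSYNC:-rsync}
    (
        cd "$src" || exit 1
        # -print0 / -0 keep odd file names intact; -P sets the job count.
        find . -mindepth 1 -maxdepth 1 -print0 |
            xargs -0 -P "$jobs" -I{} "$rsync_cmd" -a {} "$dest/"
    )
}
```

Usage would be something like `parallel_rsync /data receiver:/data 4`. This only balances well when the top-level entries are of similar size; one huge subdirectory still ends up in a single rsync.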
--Russell
-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of John R Pierce
Sent: Tuesday, 31 July 2012 5:16 p.m.
To: centos@centos.org
Subject: Re: [CentOS] rsync question

> use the rsync mode that goes off file timestamp and size. the checksumming block algorithm is only useful on large files that get small random block changes.
On 07/31/2012 07:05 AM, Smithies, Russell wrote:
> Is it possible to do something simple like scp the whole dir in one go so they're duplicates in the first instance, then get rsync to just keep them in sync without an initial transfer?
> Or is there a better way?
I use tar and ttcp for an initial transfer:
On the receiving end:
ttcp -l5120 -r | tar xf -
On the transmitter:
tar cf - . | ttcp -l5120 -t name-of-receiver
Note: The files are transmitted without encryption.
I easily get 110 Mbytes/sec. on a gigabit connection.
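For what it's worth, here is a minimal sketch of the same tar pipe wrapped in a function, so it can be tried on a local directory first. The function name tar_pipe_copy and the paths are made up for illustration. The ssh variant in the comment is a swapped-in transport for hosts without ttcp, not the method above, and it is encrypted, so it will likely be slower.

```shell
# Copy a directory tree through a tar pipe, keeping permissions and timestamps.
tar_pipe_copy() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # The left tar writes the archive to stdout, the right one unpacks it from stdin.
    tar -cf - -C "$src" . | tar -xpf - -C "$dst"
}

# Remote variant over ssh (hypothetical host name "receiver"):
#   tar -cf - -C /data . | ssh receiver 'tar -xpf - -C /data'
```

tar restores modification times on extraction by default, so a later rsync run should find size and timestamp already matching and skip those files.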
If you need encryption, and your transfer is CPU limited, you should investigate which cipher to use. In my case arcfour128 is the fastest, so I use:
rsync --rsh='/usr/bin/ssh -c arcfour128' ...
after the initial transfer with ttcp.
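One rough way to investigate the ciphers is to push a fixed amount of data through ssh with each one and compare the times. This is only a sketch: the function name bench_ciphers, the SSH and BENCH_MB overrides and the host name are assumptions, and which cipher names are accepted depends on the OpenSSH version on both ends (see the Ciphers entry in man ssh_config).

```shell
# Time BENCH_MB megabytes of zeros through ssh for each cipher given.
bench_ciphers() {
    host=$1; shift
    ssh_cmd=${SSH:-ssh}         # override only for testing
    mb=${BENCH_MB:-1024}        # amount of data per run, in MiB
    for cipher in "$@"; do
        start=$(date +%s)
        # Data is discarded on the remote side; a cipher that ssh rejects is skipped.
        dd if=/dev/zero bs=1048576 count="$mb" 2>/dev/null |
            "$ssh_cmd" -c "$cipher" "$host" 'cat > /dev/null' || continue
        end=$(date +%s)
        echo "$cipher $((end - start))s"
    done
}
```

For example: `bench_ciphers receiver arcfour128 aes128-ctr aes128-cbc`. Timing is in whole seconds, so use enough data for the differences to show.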
Mogens