[CentOS] network copy performance is poor (rsync) - debugging suggestions?

Thu Jan 29 13:05:13 UTC 2015
Joseph L. Brunner <joe at affirmedsystems.com>

We routinely have to sync 4 TB, which is about 2 million files.

rsync never does well for us; it just can't push anywhere near line rate.

So, this may or may not work for you, but this is a huge problem for us, so we tried a whole Excel spreadsheet's worth of combinations, every protocol imaginable, to make this happen.

In the end, after a year of constant work on this, we found that if we map a network share from the source server to the destination server using CIFS (i.e. "map a drive") and then sync, say,

/srv/www -> /mnt/shadow-www

it ran at 99% of line rate, but ONLY if we used the cp command to sync the source and destination:

cd /srv/www

root@pas01# cp -R -u * /mnt/shadow-www/
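A minimal sketch of the same approach as a reusable function, assuming the destination share is already mounted at /mnt/shadow-www. The mount line in the comment, the host/share/credentials-file names and the function name `update_copy` are hypothetical. One deliberate change from the command above: copying `"$src/."` instead of `*` also picks up dotfiles, which the shell glob `*` silently skips.

```shell
#!/bin/sh
# Sketch of "mount the destination over CIFS, then cp -u".
# Assumes the share is already mounted, e.g. (hypothetical host/share/credentials file):
#   mount -t cifs //dest-server/www /mnt/shadow-www -o credentials=/root/.smbcredentials
# update_copy is a hypothetical name.

update_copy() {
    src="$1"
    dst="$2"
    # -R: recurse into directories.
    # -u: copy a file only if the source is newer than the destination
    #     copy, or the destination copy is missing.
    # "$src/." copies the directory's contents (dotfiles included) into
    # "$dst" instead of creating "$dst/src".
    cp -R -u "$src/." "$dst/"
}

# Usage: update_copy /srv/www /mnt/shadow-www
```

Because plain `cp -R` doesn't preserve timestamps, the destination copies carry the time they were written, which is what lets `-u` skip unchanged files on later runs. Unlike `rsync --delete`, `cp -u` never removes files that have disappeared from the source.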

Something to consider if you find yourself not getting line rate.

Our investigation showed that the rsync process, even with every switch we found, has to "open" each file a bit before it copies it. So rsync does badly with this kind of workload: with 2 MILLION small files it never gets going, because it has to keep reading each one. There is a switch that says not to do that, but it never really helped :)
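If you want to reproduce the many-small-files case on a test box before touching production, a rough harness like this may help. It's a sketch: the file counts, paths and function names are hypothetical, and the switch mentioned above is assumed to be `--whole-file` (`-W`), which turns off rsync's delta-transfer algorithm (the post doesn't name it). rsync already defaults to whole-file mode when both paths are local, so this local comparison mostly measures per-file overhead rather than the delta algorithm.

```shell
#!/usr/bin/env bash
# Rough harness for the "millions of small files" case.
# Function names, paths and counts are hypothetical; scale N toward your workload.
# rsync --whole-file is assumed to be the "don't read the file first" switch.

# Create N small files under DIR, spread over subdirectories of 1000 files each.
make_small_files() {
    dir="$1"; n="$2"
    i=0
    while [ "$i" -lt "$n" ]; do
        d="$dir/d$((i / 1000))"
        if [ $((i % 1000)) -eq 0 ]; then
            mkdir -p "$d"
        fi
        printf 'file %d\n' "$i" > "$d/f$i"
        i=$((i + 1))
    done
}

# Time cp -R -u and (if installed) rsync -a --whole-file, each copying SRC
# into a fresh directory (DST.cp and DST.rsync).
compare_copy() {
    src="$1"; dst="$2"
    rm -rf "$dst.cp" "$dst.rsync"
    mkdir -p "$dst.cp" "$dst.rsync"
    echo "cp -R -u:"
    time cp -R -u "$src/." "$dst.cp/"
    if command -v rsync >/dev/null 2>&1; then
        echo "rsync -a --whole-file:"
        time rsync -a --whole-file "$src/" "$dst.rsync/"
    fi
}

# Usage (hypothetical paths):
#   make_small_files /tmp/smallfiles 200000
#   compare_copy /tmp/smallfiles /tmp/copytest
```

For the network case, the same comparison would point rsync at a remote destination over ssh and cp at the mounted CIFS share.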


-----Original Message-----
From: centos-bounces at centos.org [mailto:centos-bounces at centos.org] On Behalf Of Gordon Messmer
Sent: Wednesday, January 28, 2015 06:40 PM
To: CentOS mailing list
Subject: Re: [CentOS] network copy performance is poor (rsync) - debugging suggestions?

On 01/23/2015 01:44 AM, Götz Reinicke - IT Koordinator wrote:

> I do have two CentOS 6.6 servers. With a "performance optimized" rsync
> I get a speed of 15 - 20 MB/s

That *is* pretty slow for sustained writes.  Does the same rate hold true for individual large files as it does for lots of small ones?  What filesystem are you using on each side?

> rsync -aHAXxv --numeric-ids --progress -e "ssh -T -c arcfour -o Compression=no -x"

It's worth noting that -X and -A are going to perform filesystem IO that you don't see on SMB, because SMB isn't going to preserve/set ACLs and extended attributes (IIRC).  So one possibility is that you're seeing a difference in rate because you're copying lots of small files and filesystem operations are relatively slow.

You might drop those two options and see how that affects the rate.  If you determine that those are the cause of the performance difference, you can turn them back on, understanding that there's a cost associated with preserving that data.
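One way to run that A/B test is to wrap the quoted command in a small function and time it with and without `-A`/`-X`. This is a sketch: `rsync_flags`, `run_trial` and the host and paths in the usage lines are hypothetical; the remaining options are copied from the command quoted above.

```shell
#!/usr/bin/env bash
# A/B test: the same transfer with and without -A (ACLs) and -X (xattrs).
# rsync_flags, run_trial and the host/paths in the usage lines are hypothetical;
# the other options are copied from the quoted command.

# Print the short-option bundle: "meta" keeps -A/-X, anything else drops them.
rsync_flags() {
    if [ "$1" = "meta" ]; then
        printf '%s\n' "-aHAXxv"
    else
        printf '%s\n' "-aHxv"
    fi
}

# Time one transfer of SRC to DEST in the given mode.
run_trial() {
    mode="$1"; src="$2"; dest="$3"
    time rsync "$(rsync_flags "$mode")" --numeric-ids --progress \
        -e "ssh -T -c arcfour -o Compression=no -x" "$src" "$dest"
}

# Usage (hypothetical host and paths; a separate empty destination per run):
#   run_trial meta   /srv/www/ backuphost:/srv/test-meta/
#   run_trial nometa /srv/www/ backuphost:/srv/test-nometa/
```

Each trial should start from an empty destination; otherwise the second run only compares files the first one already copied and the timings aren't comparable.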

> Both servers have plenty of memory and cpu usage looks low.

Define low.  If you're using top and press '1' to expand the CPU lines, you'll probably see one cpu with higher "us" percentage, which is SSH encrypting the data.  What percentage is that?  Is there a large value in "sy" or "hi" on any CPU?  Probably not since you see good rates using 'dd' and smb copies, but I've seen systems where interrupt processing was a major bottleneck, so I make it a standard check.
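If you'd rather log these numbers than watch top, here's a minimal sketch that derives per-CPU us, sy and hi percentages from two /proc/stat samples. It assumes Linux and its /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal); the function name `cpu_breakdown` is hypothetical. If the sysstat package is installed, `mpstat -P ALL 1` reports the same breakdown, with %irq corresponding to top's "hi".

```shell
#!/bin/sh
# Per-CPU us/sy/hi percentages over an interval, similar to top after pressing '1'.
# Assumes Linux /proc/stat; cpu_breakdown is a hypothetical name.
# Fields: $2 user, $3 nice, $4 system, $5 idle, $6 iowait, $7 irq, $8 softirq, $9 steal

cpu_breakdown() {
    interval="${1:-1}"
    # Emit the per-CPU lines twice, INTERVAL seconds apart, into one awk program.
    { grep '^cpu[0-9]' /proc/stat; sleep "$interval"; grep '^cpu[0-9]' /proc/stat; } |
    awk '
        # First time a CPU name appears: the "before" sample.
        !($1 in seen) {
            seen[$1] = 1; order[++ncpu] = $1
            for (i = 2; i <= 9; i++) before[$1, i] = $i
            next
        }
        # Second time: the "after" sample.
        { for (i = 2; i <= 9; i++) after[$1, i] = $i }
        END {
            for (k = 1; k <= ncpu; k++) {
                c = order[k]; total = 0
                for (i = 2; i <= 9; i++) { d[i] = after[c, i] - before[c, i]; total += d[i] }
                if (total <= 0) total = 1   # avoid dividing by zero if no ticks elapsed
                printf "%s us=%.1f%% sy=%.1f%% hi=%.1f%%\n", c, 100 * d[2] / total, 100 * d[4] / total, 100 * d[7] / total
            }
        }'
}

# Usage: cpu_breakdown 5
```

Running it during a transfer should show the one busy CPU (ssh encryption) in "us", and a high "hi" on any CPU would point at interrupt handling.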


CentOS mailing list

CentOS at centos.org