On Thu, Jan 29, 2015 at 7:05 AM, Joseph L. Brunner joe@affirmedsystems.com wrote:
our investigation showed that the rsync process, even with every switch we tried, has to "open" each file a bit before it copies it... so rsync is poor for this kind of job with 2 MILLION small files - it never really gets going; it just keeps reading. There's a switch that says don't do that, but it never really helped :)
Rsync is going to read the directory tree first, then walk it on both sides comparing timestamps (for incremental transfers) and block checksums. Pre-3.0 versions would read the entire directory tree before even starting the transfer; 3.0 and later build the file list incrementally. So there is quite a bit of per-file overhead, the point of which is to avoid using network bandwidth when the source and destination are already mostly identical. Splitting the work along some sensible directory-tree structure might help a lot. Or, if you know the trees are mostly different, just tar the data up and stream it.
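A rough sketch of both approaches (all paths and the `desthost` name are hypothetical examples, not from the thread). The commented-out fragment shows splitting rsync by top-level directory; the runnable part demonstrates the tar-stream idea locally, which over the network you would simply pipe through ssh:

```shell
#!/bin/sh
set -e

# Option 1 (sketch): split the job per top-level directory so each rsync
# walks a smaller tree; run them in parallel if the hardware allows.
#   for d in /data/*/; do
#       rsync -a "$d" "desthost:/data/$(basename "$d")/" &
#   done
#   wait

# Option 2: when most files differ anyway, skip the comparison entirely
# and stream a tar archive. Demonstrated here between two local temp
# directories standing in for source and destination.
SRC=$(mktemp -d)
DST=$(mktemp -d)

mkdir -p "$SRC/a/b"
echo one > "$SRC/a/file1"
echo two > "$SRC/a/b/file2"

# Create the archive on stdout and unpack it on the other end in one pass;
# no per-file timestamp/checksum comparison happens.
tar -C "$SRC" -cf - . | tar -C "$DST" -xf -

# Across the network the same pipeline would look like:
#   tar -C /data -cf - . | ssh desthost 'tar -C /data -xf -'
```

The win is that tar streams the files sequentially at full speed instead of doing a round of metadata work per file, at the cost of always sending everything.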