Anyone got any actual comparisons between unison and rsync specifically related to the performance of synchronization of large data sets over slow links?
I have a huge tree to start replicating on Friday, and I know that if I sync from the root paths it will take ages; with no overall indication of progress that won't be optimal, since it's likely to fail for one reason or another before it can finish. Initially I thought I would just break it down into several smaller jobs, but that becomes a burden to maintain...
We use bacula internally but sending the diffs would be cumbersome as the individual files would be rather large...
Thanks! jlc
On Wed, Jan 13, 2010 at 6:54 PM, Joseph L. Casale jcasale@activenetwerx.com wrote:
Use rsync. It's used far more than unison, so it has been tested better. Unison has always been slow for me.
One thing you might want to look at is performing the initial copy, or some chunks of it, with tar over a netcat link, then rsync after that. Since rsync normally runs over SSH, the encryption overhead can make it noticeably slower (on the order of 33%) than a raw data transfer. Netcat won't give you encryption though, so make sure you're on a local/trusted link.
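The tar-over-netcat seed might look something like this (a sketch; hosts, port, and paths are hypothetical, and some netcat builds want `nc -l -p 7000` instead of `nc -l 7000`). The local pipe at the end just exercises the same tar flags without the network:

```shell
# On the receiver (untar whatever arrives on port 7000):
#   nc -l 7000 | tar -C /data/dest -xpf -
# On the sender (stream the tree to the receiver):
#   tar -C /data/src -cf - . | nc desthost 7000

# The same tar pipe can be verified locally without netcat:
mkdir -p /tmp/seed-src /tmp/seed-dst
echo hello > /tmp/seed-src/file.txt
tar -C /tmp/seed-src -cf - . | tar -C /tmp/seed-dst -xpf -
cat /tmp/seed-dst/file.txt   # prints "hello"
```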
On 1/13/2010 5:54 PM, Joseph L. Casale wrote:
I didn't think unison was maintained any more - and I wouldn't expect anything to beat rsync with the -z option on a slow link. I'd just use the -P option and restart it when/if it fails. It wouldn't hurt to do subsets first since they will be quickly skipped when you repeat from the root. If you have a huge number of files it might be worth finding a way to update rsync to a 3.x version which will not need to xfer the entire directory listing before starting.
Looks like rf has 3.0.7, thanks for that tip. Frankly, I abhor the thought of even using rsync for this; it's over a VPN, so there is absolutely no need for encryption, but I don't know of another tool that can transfer only the diffs?
Thanks guys, jlc
Joseph L. Casale wrote:
Check out HPN-SSH, I use it extensively to transfer files over ssh, it provides a null cipher which you can use to disable encryption of data, while still maintaining encryption of authentication credentials.
http://www.psc.edu/networking/projects/hpn-ssh/
I transfer over a terabyte of data a day using rsync+hpnssh.
nate
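With HPN-SSH on both ends, the none cipher can be requested per connection; a sketch of what that might look like (host, port, and paths are hypothetical, and the server's sshd_config must also have `NoneEnabled yes` for this to work):

```shell
# Bulk transfer with authentication still encrypted but the
# data stream sent in the clear. Requires HPN-patched OpenSSH
# on both client and server; plain OpenSSH rejects these options.
rsync -az -e 'ssh -p 2222 -oNoneEnabled=yes -oNoneSwitch=yes' \
    /data/tree/ user@remotehost:/data/tree/
```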
Nate, That looks impressive, I would love to use that for other needs as well. How exactly do you go about installing this under CentOS? I can pretty well assume that patching the stock rpm would not work:)
Thanks! jlc
Joseph L. Casale wrote:
For me, I just built it from source and patched it, then built custom RPMs (I use alien to build my RPMs). I install it to /usr/local/hpn-ssh and have it listen on a special high port so it doesn't interfere with anything CentOS-related. I only use it for file transfers.
nate
Am I missing something, or does it only matter when you have a very high bandwidth connection with some latency?
I would imagine, but I have a server that takes rsync/ssh connections from multiple Windows boxes every day for differential updates to copies of databases, and the load on that machine is really high.
I never thought to use a daemon (looked quickly at `man rsyncd.conf`) and plan to look at this tomorrow. I was intrigued at the patch for no encryption.
Nate, care to share those packages:)
nate wrote:
Sure I can post them somewhere tomorrow probably, nothing fancy..
Put them here: http://rpms.linuxpowered.net/hpn-ssh/
All the usual disclaimers apply; I have had these running on a few dozen systems at different data centers doing file transfers 24/7 for the past year, now that I think about it.
nate
Nate, That's great! I am not yet convinced that an rsync daemon will suit my needs from the authentication standpoint. That being said, from the Windows side I have to figure out how to get hpn-ssh on Windows to do simple rsync/ssh without encryption. The only CopSSH implementation I have found is based on OpenSSH_4.1p1-hpn OpenSSL 0.9.8 05 Jul 2005...
For the sake of being ready Saturday, what is the most secure way (not using ssh) to authenticate in a script with an rsync daemon? It looks like it only does user:pass pairs (not really good for scripting) or host-based, wrapper-style security?
Thanks everyone! jlc
Joseph L. Casale wrote:
If the IPs are static then probably a firewall approach?
You may be able to patch openssh with HPN on cygwin, not sure though.
nate
Joseph L. Casale wrote:
Processes in iowait are counted in the load average. Your real problem may be that rsync copies the unchanged portions of the (probably huge) original file while merging in the changes, then renames to the original name when complete. Do top/sar show the CPU pegged?
Les Mikesell wrote:
That is why I use it: high bandwidth and some latency. I mentioned it because it also has the none cipher which disables encryption; that might be more flexible than using rsync in daemon mode.
nate
When you are over a tunnel such as a VPN connection, you can run it in daemon mode on one side and rsync to or from that side from the other. It works like a server/client application.
2010-01-14
xufengnju
Joseph L. Casale wrote:
If your bandwidth is limited you shouldn't have any trouble encrypting fast enough to fill it anyway, but setting your ssh encryption to blowfish might help some. You can run rsync in daemon mode if you really want to avoid ssh, though.
On Thu, Jan 14, 2010, Joseph L. Casale wrote:
If you use rsync modules, the transfer can be done without encryption, and you can restrict access to directories and to specific IPs and CIDR blocks.
We use this extensively to allow remote clients to update things like DNS files which go to client-specific directories, and are restricted to the IP address(es) of the client's system(s).
Another feature of rsync modules that can be useful is that each module can specify a user and group thus one can rsync user directories between systems where the user names are the same but uid and gid may differ.
Rsync does not use ssh when doing module transfers so if the data is sensitive, I do the transfers through OpenVPN tunnels. This also eliminates the problems of ssh authentication between trusted systems.
Given the ability of rsync modules to restrict access by IP address, I have never bothered with additional authentication for this type of transfer.
Bill
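The module setup Bill describes might look roughly like this in rsyncd.conf (a sketch; the module name, path, user/group, and addresses are all hypothetical):

```ini
# /etc/rsyncd.conf -- per-module access control
[clientdns]
    path = /srv/dns/clientA
    # each module can run as its own user/group, which lets
    # names match across systems even when uid/gid differ
    uid = dnsuser
    gid = dnsgroup
    read only = false
    # restrict this module to the client's address(es)
    hosts allow = 203.0.113.10 198.51.100.0/24
    hosts deny = *
    # optional extra authentication (see "auth users" in the man page):
    # auth users = clienta
    # secrets file = /etc/rsyncd.secrets
```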
I have been looking at this all morning. Is there any way to authenticate with keys or something unique so I can script this securely? IIUC, the only auth is done through these rsync user/pass pairs, unless you do it with hosts etc.
Thanks everyone! jlc
Joseph L. Casale wrote:
I was also looking at unison/rsync to solve a problem, came across this, has potential for me.
http://samba.anu.edu.au/rsync/firewall.html
I may have to connect to a Windows box - I'm not excited about that. I've made it work on Windows before - just dislike the inherent extra layer of setup glop one has to go through to do it.
On Thu, Jan 14, 2010, Joseph L. Casale wrote:
Using rsync in daemon mode with modules requires no authentication if you are comfortable with restricting access to each module by IP address or CIDR block. The rsync man page also says:
Some modules on the remote daemon may require authentication. If so, you will receive a password prompt when you connect. You can avoid the password prompt by setting the environment variable RSYNC_PASSWORD to the password you want to use or using the --password-file option. This may be useful when scripting rsync.
Bill
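The man-page approach above, sketched for an unattended script (host, module, and user are hypothetical; a real run needs a reachable rsync daemon with a matching secrets file):

```shell
# Pull a module without a password prompt; the daemon checks
# the password against its configured secrets file.
export RSYNC_PASSWORD='s3cret'
rsync -az rsync://backupuser@remotehost/bigtree/ /data/bigtree/

# Or keep the password out of the environment entirely:
#   echo 's3cret' > /root/rsync.pass && chmod 600 /root/rsync.pass
#   rsync -az --password-file=/root/rsync.pass \
#       backupuser@remotehost::bigtree /data/bigtree/
# (rsync refuses a password file that is world-readable)
```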
Joseph L. Casale schrieb:
rsync already defaults to ssh as transport, but with `rsync -e 'ssh -i keyfile'` you can use rsync with an ssh key.
Rainer
Yes, that was what I had been doing, but I wanted to avoid the use of ssh completely; the connection is secured over a VPN, so it's silly to incur the overhead of encryption twice.
An rsync daemon was used and the passwords automated with a password file and this has been working well for a few days...
On 1/14/2010 12:27 PM, Bill Campbell wrote:
If you are running as root, rsync should normally map the user/group ids locally by names unless you turn the feature off with the --numeric-ids option (more or less like tar does).