Guys,
I've set up an rsync between two directories that I've mounted locally on a jump box. Long story short, the two directories are both NFS shares from two different hosts. Our security dept won't allow us to SSH between the two data centers directly, but the jump host can contact both. So what I've done is mount the NFS shares from one host in each data center on the jump box using sshfs.
The directory I'm trying to rsync from has 111GB of data in it. I don't think I've ever set up an rsync for quite so much data before.
But I started the rsync at approx. 7pm last night, and as of now the rsync is still building its file list.
[root@sshproxygw ~]# rsync -avzp /mnt/db_space/timd/www1/ /mnt/db_space/timd/www2/svn2/
building file list ...
So my question to you is: is this a normal amount of time to wait for the file list to be built, considering the amount of data involved? I have it running in a screen session that I can attach to whenever I want to check what's going on.
Thanks,
Tim
2014-10-19 18:55 GMT+03:00 Tim Dunphy bluethundr@gmail.com:
Generating the rsync file list can take a very long time :/
Make sure that you are using a fast link on both ends and at least version 3 of rsync.
-- Eero
2014-10-19 20:03 GMT+03:00 Eero Volotinen eero.volotinen@iki.fi:
... and remember to use tcp for nfs transfer ;)
-- Eero
... and remember to use tcp for nfs transfer ;)
Hmm, you mean specify TCP for rsync? I thought that was the default. But holy crap, you were right about it taking a long time to build a file list! The transfer just started a few minutes ago...!
dumps/dotmedia.031237.svndmp
dumps/dotmedia.031238.svndmp
dumps/dotmedia.031239.svndmp
dumps/dotmedia.031240.svndmp
dumps/dotmedia.031241.svndmp
dumps/dotmedia.031242.svndmp
dumps/dotmedia.031243.svndmp
dumps/dotmedia.031244.svndmp
dumps/dotmedia.031245.svndmp
2014-10-19 20:49 GMT+03:00 Tim Dunphy bluethundr@gmail.com:
No, TCP for the NFS mounts, as UDP is a bit complicated :)
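If those really are plain NFS mounts somewhere in the chain, forcing TCP is just a mount option; roughly something like this (server and export names are made up):

  mount -t nfs -o proto=tcp,vers=3 nfsserver:/export/db_space /mnt/db_space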
-- Eero
On 2014-10-19, Tim Dunphy bluethundr@gmail.com wrote:
... and remember to use tcp for nfs transfer ;)
Hmm you mean specify tcp for rsync? I thought that's default.
No, he means use TCP for NFS (which is also the default).
I suspect that sshfs's relatively poor performance is having an impact on your transfer. I have a 30TB filesystem which I rsync over an OpenVPN link, and building the file list doesn't take that long (maybe an hour?). (The links themselves are reasonably fast; if yours are not that would have a negative impact too.)
If you have the space on the jump host, it may end up being faster to rsync over ssh (not using NFS or sshfs) from node 1 to the jump host, then from the jump host to node 2.
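Roughly something like this, with host names and paths made up for the example:

  # stage a copy on the jump host (pull from data center 1)
  rsync -avz user@node1:/srv/data/ /var/staging/data/

  # then push the staged copy on to data center 2
  rsync -avz /var/staging/data/ user@node2:/srv/data/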
--keith
On Sun, 19 Oct 2014, Keith Keller wrote:
Don't forget that the time taken to build the file list is a function of the number of files present, and not their size. If you have many millions of small files, it will indeed take a very long time. Over sshfs with a slowish link, it could be days.
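A quick way to see what you're dealing with on the source side (using the path from your rsync command) is just to count the files, though the walk itself will of course be slow over sshfs:

  find /mnt/db_space/timd/www1 -type f | wc -l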
Steve
On 2014-10-19, Steve Thompson smt@vgersoft.com wrote:
Well, sure. My assumption is that the OP's ~120GB of storage was likely not more files than my 30TB. (I have a lot of large files, but a lot of small files too.)
--keith
On Mon, Oct 20, 2014 at 7:57 AM, Steve Thompson smt@vgersoft.com wrote:
....and it may end up failing silently or noisily anyway.
Cheers,
Cliff
....and it may end up failing silently or noisily anyway.
Ahhh, but isn't that part of the beauty of adventure that being a linux admin is all about? *twitch*
On Sun, Oct 19, 2014 at 9:05 PM, Tim Dunphy bluethundr@gmail.com wrote:
There's not that much magic involved. The time it takes rsync to read a directory tree to get started should approximate something like 'find /path -links +0' (i.e. something that has to read the directory tree and the associated inodes). Pre-3.0 rsync versions read and transfer the whole file list before starting the comparison, and might trigger swapping if you are low on RAM.
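For example, timing that against the OP's source mount should give a rough lower bound for the file-list scan:

  time find /mnt/db_space/timd/www1 -links +0 > /dev/null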
So, you probably want the 'source' side of the transfer to be local for faster startup. But... in what universe is NFS mounting across data centers considered more secure than ssh? Or even a reasonable thing to do? How about a VPN between the two hosts?
On 2014-10-20, Les Mikesell lesmikesell@gmail.com wrote:
The OP said he was mounting the NFS over sshfs.
--keith
On Mon, Oct 20, 2014 at 1:12 PM, Keith Keller kkeller@wombat.san-francisco.ca.us wrote:
OK, I don't see how that is possible, because something would either be mounted as nfs or sshfs, not one over the other. But if the intermediate host is allowed to ssh (as it must be for sshfs to work), I'd throw a bunch of disk space at the problem and rsync a copy to the intermediate host, then rsync from there to the target at the other data center. Or work out a way to do port-forwarding over ssh connections from the intermediate host.
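As a rough sketch of the port-forwarding idea (host names, port and paths are made up): from the intermediate host, give node1 a loopback port that reaches node2's sshd, then run rsync on node1 through it so the filesystems are local on both ends:

  # run on the intermediate host; leaves a listener on node1's localhost:2222
  ssh -N -R 2222:node2.example.com:22 user@node1.example.com

  # then on node1; ssh to localhost:2222 actually lands on node2
  rsync -avz -e 'ssh -p 2222' /srv/data/ user@localhost:/srv/data/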
On 2014-10-20, Les Mikesell lesmikesell@gmail.com wrote:
OK, I don't see how that is possible because something would either be mounted as nfs or sshfs, not one over the other.
I'm just repeating what he wrote; perhaps the OP can elaborate.
But if the intermediate host is allowed to ssh (as it must for sshfs to work) I'd throw a bunch of disk space at the problem and rsync a copy to the intermediate host, then rsync from there to the target at the other data center.
That was one of my suggestions earlier in the thread.
Or work out a way to do port-forwarding over ssh connections from the intermediate host.
Or (as you and perhaps I suggested) some sort of OpenVPN link.
--keith
On Mon, Oct 20, 2014 at 3:05 PM, Tim Dunphy bluethundr@gmail.com wrote:
Ahhh, but isn't that part of the beauty of adventure that being a linux admin is all about? *twitch*
Adventure? Nah, that's why my rsync scripts rsync chunks of the filesystem rather than all of it in one go, and why it gets to run twice each time. Once bitten, twice shy.
Cheers,
Cliff
On 2014/10/19 08:01, Keith Keller wrote:
Another option that might help is to break the transfer up into smaller pieces. We have a 3TB filesystem with a lot of small data files in some of the subdirectories, and building the file list used to take a long time (close to an hour) and impacted fs performance. But since the volume mount point has only directories beneath it, we were able to tweak our rsync script to iterate over the subdirectories as individual rsyncs. Not only did that isolate the specific directories with the large number of files to their own rsync instances, but an added bonus is that if there is an error in a given rsync attempt, the script picks up at the same area and tries again (a couple of times), and does not then need to restart the entire filesystem rsync.
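A stripped-down sketch of what that loop looks like (our real script has more logging and error handling, and the paths here are just placeholders):

  for dir in /mnt/source/*/ ; do
      name=$(basename "$dir")
      # retry each subdirectory a couple of times before giving up on it
      for attempt in 1 2 3; do
          rsync -az "$dir" "/mnt/dest/$name/" && break
      done
  done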
Hope this helps! Miranda
On 10/19/2014 8:55 AM, Tim Dunphy wrote:
Can this 'jump host' ssh to either of the servers? It might be worth using the rsync-over-ssh protocol on one side of this xfer, probably the destination.
so rsync -avh... /sourcenfs/path/to... user@desthost:path
Are you "allowed" to temporarily run an ssh tunnel (or stunnel) on your jumpbox? So connecting from host1 to jumpbox on port XXX would be tunneled to ssh port on host2...
Or with netcat (if you can mkfifo)?

  mkfifo backpipe
  nc -l 12345 0<backpipe | nc host2 22 1>backpipe

But you will have to trick ssh into accepting your jumpbox "fingerprint"...
JD
-----Original Message-----
From: John Doe [mailto:jdmls@yahoo.com]
Sent: Monday, October 20, 2014 5:30 AM
To: CentOS mailing list; Tim Dunphy
Subject: Re: [CentOS] rsync question: building list taking forever
Are you "allowed" to temporarily run an ssh tunnel (or stunnel) on your jumpbox? So connecting from host1 to jumpbox on port XXX would be tunneled to ssh port on host2...
Or with netcat (if you can mkfifo)? mkfifo backpipe nc -l 12345 0<backpipe | nc host2 22 1>backpipeBut you will have to trick ssh into accepting your jumpbox "fingerprint"...
JD
Or perhaps easier (depending on how paranoid the sshd configs are): use ProxyCommand in ssh/config, i.e., set up the config so one ssh command can get you logged onto the final target, and then use rsync across ssh as per normal:
http://sshmenu.sourceforge.net/articles/transparent-mulithop.html
Then rsync will be running on both ends, where the data (filesystem information) is LOCAL, i.e., fast.
I would use, if possible/allowed, key[s] with ssh(-agent) to make the whole connect to multiple hosts thing easier (i.e., fewer passphrase requests).
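For example, something along these lines in ~/.ssh/config on the source host (host names are placeholders; this is the classic netcat-style ProxyCommand described in the article above):

  Host jumpbox
      HostName jumpbox.example.com

  Host node2
      HostName node2.example.com
      ProxyCommand ssh jumpbox nc %h %p

and then from the source host a plain rsync over ssh goes end to end:

  rsync -avz /srv/data/ user@node2:/srv/data/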
[OP: `they don't allow ssh between the datacenters` ...but... they nfs between them...??? ME: much head scratching.]