Considering using rsync on a couple systems for backup, I was wondering if it's possible, and if so how difficult is it, to delete files which have been backed up (in order to save space on the backup media).
Anyone with experience doing this?
On Fri, Jan 11, 2013 at 11:29 AM, ken gebser@mousecar.com wrote:
<snip>
Can you be more specific about the problem you are trying to solve? Backuppc normally expires/deletes backups at a specified rate by itself and it only stores one copy of any identical file regardless of how many times it is backed up. You aren't going to save any space by deleting old copies of something that is still on any target you are backing up.
On 01/11/2013 12:36 PM Les Mikesell wrote:
<snip>
Les, thanks for replying. Yeah, I guess I need to clarify.
I've got a system which is due for an upgrade and, at the same time, I'd like to clean up (delete) files and, in some instances, entire directories. Insurance against sudden disk failure is another concern.
If I delete files and entire directories on that (source) machine, will rsync then subsequently automatically delete them on the destination (backup) system? Or would I also need to run an rsync command to delete the same on the destination system? And, if yes, what rsync command would do that?
I remember you speaking well of Backuppc previously and so am open to using that in the future. At the moment though, I'm looking for the simplest possible solution for those three current concerns.
Thanks again.
On 11.01.2013 at 19:53, ken wrote:
<snip>
If I delete files and entire directories on that (source) machine, will rsync then subsequently automatically delete them on the destination (backup) system?
What does the rsync command you execute look like?
If you specify --delete: yes.
e.g. rsync --delete /sourceroot destination:/srv/backup/machinex/
Or would I need also to run an rsync command to delete the same on the destination system? And, if yes, what rsync command would do that?
-- LF
ken wrote:
<snip>
We use rsync here. Actually, we've got a home-rolled system: we create timestamped backups, and a configuration-file setting controls how many days or weeks of them to keep before they're removed. Note that we *heavily* use rsync's parm to use hard links, which saves a lot of space.
mark
On 01/11/2013 02:33 PM m.roth@5-cent.us wrote:
<snip>
Cool. Thanks for mentioning time-stamps. I've been assuming that rsync would maintain the source files' original permissions and timestamps. (Heck, even tar from decades past would do that.) I hope that wasn't an unwarranted assumption. It's good to hear too that I can configure how long to keep files on the destination which have been deleted from the source (if that's what you meant).
Mark, maybe you could explain what a "parm" is and how using hard links saves space.
tia, ken
ken wrote:
<snip>
Cool. Thanks for mentioning time-stamps. I've been assuming that rsync would maintain the source files' original permissions and timestamps.
It does, if you use the right parm (parameter). We timestamp the backup directories that we create, like spiderman.2013-01-11-17:01
(Heck, even tar from decades past would do that.) I hope that wasn't an unwarranted assumption. It's good to hear too that I can configure how long to keep files on the destination which have been deleted from the source (if that's what you meant).
Mark, maybe you could explain what a "parm" is and how using hard links saves space.
A hard link isn't easy: it's an inode that is referenced by more than one other inode. In effect, it's a pointer, rather than a reference, so that it really, in effect, acts like the real file, and is almost undistinguishable from one. You don't actually delete the real file until all hard links pointing to it are gone.
Google it a bit - it really is hard to wrap your head around.
A symlink is like a reference: "this file is found over there", where a hard link is like "I point to that location, the same way the inode that was created when the file was created points to the location."
I know that's not right. As I said, try googling.
mark
On Fri, Jan 11, 2013 at 4:05 PM, m.roth@5-cent.us wrote:
Mark, maybe you could explain what a "parm" is and how using hard links saves space.
A hard link isn't easy: it's an inode that is referenced by more than one other inode. In effect, it's a pointer, rather than a reference, so that it really, in effect, acts like the real file, and is almost undistinguishable from one. You don't actually delete the real file until all hard links pointing to it are gone.
Close... A directory entry has a pointer to an inode, and the inode has the information about the file attributes and location. A directory entry is a 'link' with a name. There can be 0 or more links to an inode, and a link count is maintained atomically in the inode as links are added or removed. A 'hard link' is the scenario where two or more directory entries (names) point to the same inode. The file data is not removed and the space freed until the inode link count is zero _and_ there are no open file handles that reference it.
Google it a bit - it really is hard to wrap your head around.
A symlink is like a reference: "this file is found over there", where a hard link is like "I point to that location, the same way the inode that was created when the file was created points to the location."
I know that's not right. As I said, try googling.
A symlink references another file name, which is found in another directory entry so it is not the same concept at all. Hardlinks can only exist within the same filesystem - symlinks can reference other mount points even to places that don't exist, and operations on it and the file it references aren't atomic.
Les Mikesell wrote:
<snip>
Thanks, Les.
At any rate, the point is that the hard links point to *exactly* the same file on the disk, so it *looks* as though they take up equal space, but in reality, there's only one copy.
So, if you're copying a directory to a timestamped directory, and all the files being backed up except one or two haven't changed, then there are entries in both backup directories, but there's really only one physical copy of each unchanged file; only the one or two that changed get new copies.
mark
On Fri, Jan 11, 2013 at 4:34 PM, m.roth@5-cent.us wrote:
<snip>
Rsync itself knows something about hardlinking against previous runs if you use --link-dest=. Backuppc goes it one (or several) better by making an additional hardlink with a name based on a hash of the file contents in a pool area. Then any files with identical content are linked to the same copy even if found in different places or from different machines. And it can store the data compressed and still work with standard rsync clients on the target machines.
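(A minimal sketch of a --link-dest run, with made-up host and paths; each dated directory ends up looking like a full backup, but unchanged files become hard links rather than new copies:)
PREV=/srv/backup/host1/2013-01-10
NEW=/srv/backup/host1/2013-01-11
# -a preserves permissions/times, --delete mirrors deletions,
# --link-dest hard-links anything unchanged since PREV
rsync -a --delete --link-dest="$PREV" host1:/home/ "$NEW/"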
On 01/11/2013 02:05 PM, m.roth@5-cent.us wrote:
A hard link isn't easy: it's an inode that is referenced by more than one other inode.
Don't make it complicated. All regular files are hard links to an inode. The inode contains information about the owner, group, permissions, and modification times (among other things) and a description of the blocks holding the associated data. If you create a new file, the filesystem allocates a new inode and creates one hard link to it.
In the output of 'ls -l', you'll see a number just after the file permissions. That is the number of hard links to the referenced file. As others have stated, when you "rm" (which calls unlink() in libc), the kernel removes the link and decrements the link count. If the count is 0 and the file is not open, the filesystem will release the data blocks, and 'df' will report the change in disk space used.
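(A quick way to watch the link count in action, using a hypothetical scratch file; the inode numbers you see will differ:)
echo data > original
ln original second       # create a second hard link to the same inode
ls -li original second   # same inode number, link count is 2
rm original
cat second               # data is still there; link count is back to 1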
On 01/11/2013 01:47 PM, ken wrote:
Cool. Thanks for mentioning time-stamps. I've been assuming that rsync would maintain the source files' original permissions and timestamps.
It will if you specify -t/--times and -p/--perms (or -a, which implies both).
It's good to hear too that I can configure how long to keep files on the destination which have been deleted from the source (if that's what you meant).
rsync doesn't do that on its own. Mark was referring to his home-rolled system. If you want that, it's best to use rsnapshot or backuppc (or something like them).
Mark, maybe you could explain what a "parm" is and how using hard links saves space.
Consider a simple configuration of rsnapshot, in which there are 7 daily backups. When rsnapshot runs, it will check the directory to find out how many snapshots exist and remove the oldest if there are already 7. It then renames daily.1 through daily.5, such that they are daily.2 through daily.6 afterward. It then runs "cp -al daily.0 daily.1". At that point, daily.0 and daily.1 are identical trees of hard links to the same files. They are both complete, full backups of the source data, but the disk space used is only slightly more than the space used for daily.0. rsnapshot then runs rsync with daily.0 as the destination. Any files which are updated are overwritten by new files rather than updating in place, in order to keep the data in daily.1 from being modified.
The space saving referenced is the ability to have full backups on a regular filesystem without duplicating data for files that have not changed.
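(The rotation itself is ordinary shell work; a simplified sketch of what rsnapshot does, ignoring its locking and error handling, with made-up paths:)
cd /srv/snapshots
rm -rf daily.6                   # drop the oldest snapshot
for i in 5 4 3 2 1; do
    mv daily.$i daily.$((i+1))   # shift the remaining snapshots down
done
cp -al daily.0 daily.1           # tree of hard links: almost no extra space
rsync -a --delete source:/home/ daily.0/home/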
On Fri, Jan 11, 2013 at 12:53 PM, ken gebser@mousecar.com wrote:
Les, thanks for replying. Yeah, I guess I need to clarify.
I've got a system which is due for an upgrade and, at the same time, would like to clean up (delete) files and, in some instances, entire directories. Insurance against sudden disk failure is one other concern.
As a special-case 'restore from scratch' backup, look at clonezilla-live (a bootable iso) or ReaR (a package you can run while the machine is still working). Either of these will make a backup copy that you can restore on bare metal and are good to use before big changes for a quick way to recover from any mistake.
If I delete files and entire directories on that (source) machine, will rsync then subsequently automatically delete them on the destination (backup) system?
If you do a simple rsync, you can add the --delete option to tell it to delete files on the destination if they don't exist in the source. You can use rsync -avn --delete ... to see what will happen if you aren't sure about the source/dest args (-n = --dry-run so it won't actually do anything), then repeat without the -n to do it.
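(For example, with made-up paths; the trailing slashes matter to rsync:)
# dry run first: report what would be copied and deleted
rsync -avn --delete /home/ backuphost:/srv/backup/home/
# then the same command without -n to actually do it
rsync -av --delete /home/ backuphost:/srv/backup/home/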
I remember you speaking well of Backuppc previously and so am open to using that in future. At the moment though, I'm looking for the simplest possible solution for those three current concerns.
Backuppc is pretty simple since you can install via yum and fill in the details in the web interface (and I'll answer questions on its mail list...). But it is somewhat hard to move its archive once you get started, so you might want to think about how you want it to work first.
On Fri, Jan 11, 2013 at 1:53 PM, ken gebser@mousecar.com wrote:
If I delete files and entire directories on that (source) machine, will rsync then subsequently automatically delete them on the destination
Not automatically - you need the --delete flag, as others have mentioned. You delete on the source, and with --delete the old files are removed by rsync the next time it runs.
Set up a small test (even just on your local machine between two directories) and try it out. And --dry-run is your best friend when testing!
(backup) system? Or would I need also to run an rsync command to delete the same on the destination system? And, if yes, what rsync command would do that?
You mentioned it running while other people are changing files ... it works OK for me. I have gigabytes of backups that get rsynced from early to late morning ... the backups aren't always completely finished when rsync scans the files. So it picks them up when the cronjob runs the sync again a few hours later.
If you're going to place one box off-site, sync it up before placing it off-site ... it takes some real time to sync up a few terabytes of backups/archives. :)
*** You may have to run rsync as root with sudo to preserve all permissions/ownership. *** At work we have it locked down in sudoers to do so. It was a setup that predated my employment there, so I don't know if running it as root was necessary. Using SSH keys for auth.
Call me out on rsync as root if it isn't necessary to preserve all permissions. I think I was told ownership got whacked if it wasn't run that way.
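(For what it's worth: preserving arbitrary ownership does require root on the receiving side. A common pattern, sketched here with made-up names, is to run the backup as root locally and elevate the remote rsync with sudo; this assumes a sudoers entry allowing the backup user to run rsync:)
# local side runs as root (e.g. from root's crontab) so -a can set owners;
# --rsync-path="sudo rsync" lets the remote side read every user's files
rsync -a -e ssh --rsync-path="sudo rsync" backup@source:/home/ /srv/backup/home/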
On Fri, Jan 11, 2013 at 2:33 PM, m.roth@5-cent.us wrote:
<snip>
At work I've got a setup similar to Mark's - time stamped from network storage to a backup box. I have scripts on the on-site backup server that purge the old time stamped directories and various other items. And then an off-site server that rsyncs everything to a LUKS encrypted volume.
On 2013-01-12, SilverTip257 silvertip257@gmail.com wrote:
You mentioned it running while other people are changing files ... it works OK for me. I have gigabytes of backups that get rsynced from early to late morning ... the backups aren't always completely finished when rsync scans the files. So it picks them up when the cronjob runs the sync again a few hours later.
Since rsnapshot uses rsync under the hood, this strategy works for rsnapshot as well. The only real hiccup is if a user deletes a file between when it's scheduled to be synced and when rsync actually reaches it to sync; rsync might produce a harmless error message.
*** You may have to run rsync as root with sudo to preserve all permissions/ownership. *** <snip>
You can also use an OpenVPN tunnel and NFS mount with no_root_squash. I like this method a lot because the mount can be made read-only, to ensure that no source data ever gets accidentally clobbered. With an ssh key there's a risk (probably minimal, but nonzero) that a fumblefingers might delete some data on the wrong side.
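(A sketch of that export on the machine being backed up, with a made-up VPN peer address; ro keeps the backup host from writing, and no_root_squash lets root on the backup host read files regardless of owner:)
# /etc/exports on the source machine
/home  10.8.0.2(ro,no_root_squash)
# on the backup host, mount it and rsync locally
mount -t nfs 10.8.0.1:/home /mnt/source
rsync -a --delete /mnt/source/ /srv/backup/home/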
--keith
On Fri, Jan 11, 2013 at 10:26 PM, Keith Keller <kkeller@wombat.san-francisco.ca.us> wrote:
<snip>
Since rsnapshot uses rsync under the hood, this strategy works for rsnapshot as well. The only real hiccup is if a user deletes a file between when it's scheduled to be synced and when rsync actually reaches it to sync; rsync might produce a harmless error message.
Yep, a harmless error message.
<snip>
NFS over a VPN tunnel isn't a bad idea -- being able to make the mount read-only can be beneficial.
True, a risk is present if one is manually syncing the data. I run my routine/daily rsyncs via a cronjob, so once it's set it is not going to get fumbled. ;) --dry-run is important to test before clobbering.
On Jan 11, 2013, at 10:29 AM, ken wrote:
<snip>
rsync has --delete, --delete-before, and --delete-after options, which I do use sometimes. Do a 'man rsync' for details.
Craig
On Fri, Jan 11, 2013 at 12:29 PM, ken gebser@mousecar.com wrote:
<snip>
it's certainly feasible for a fairly lackluster backup solution (e.g. gonna rebuild machine, want all of /home saved to another machine, rsync then reinstall to try $new distro!) but I wouldn't recommend rsync for production-grade backups; it'd get very complex very quickly trying to figure out a way to do versioning (rsync would be really good for 'oops, I removed X file, but I'd copied it over to M machine, so I can recover', not very good at 'someone changed this file 4 days ago and now it doesn't do what I want, I'd like to go back to a previous version'). At least in my estimation.
On Fri, Jan 11, 2013 at 11:54 AM, zGreenfelder zgreenfelder@gmail.com wrote:
<snip>
Urk, insufficient coffee this morning. In my previous reply I thought this was the backuppc list. Backuppc does in fact do a very good job of storing backups in minimal space - and can use rsync to do it while also maintaining versioning so it is great as a generic backup solution. But, it doesn't have anything built-in to delete target files after the copy. There is an option to run post-backup scripts that might work.
On 11.01.2013 at 19:29, Les Mikesell wrote:
<snip>
alternative: check rsnapshot.
-- LF
On 01/11/2013 12:54 PM zGreenfelder wrote:
<snip>
Thanks for your expression of caution. But, yeah, I'm looking for a solution just to back up a few LVs on this one machine. Keeping prior versions of files isn't really necessary in this particular instance. (I guess my Subject line may have been a bit misleading on that count.)
I've admin'd both extremes of the backup world, a lot of Veritas Netbackup and a lot of tar+gzip, and I prefer something which is easier to understand - not the grand package with so many bells and whistles that it takes days (or longer) with the manual to set it up. From what I've read, rsync will work in the background just fine while users (and several system processes) are copying, deleting, and editing files - things which would confuse tar. Plus, it handles the network stuff. If it does all that well, plus the other stuff mentioned previously, I'm in.
Probably not too far into the future I'll be back to talk about a more sophisticated backup system with versioning etc. But for right now, for the too-quickly-approaching future, simpler is going to be better.
Thanks for the input.
On 01/11/2013 01:37 PM, ken wrote:
From what I've read, rsync will work in the background just fine while users (and several system processes) are copying, deleting, and editing files - things which would confuse tar.
No, it doesn't. rsync does not have any magic powder for making or keeping files consistent. If you rsync a file that is actively being written to, the destination will probably be inconsistent (i.e. corrupt). Don't use rsync, alone, for backing up any files that are open for writing.
If you want to make a backup that's consistent across a filesystem, you need to make a snapshot, mount it, and back up the snapshot content. If there are files open for writing, you need to make them consistent while the snapshot is made. While I rarely say nice things about Windows, this is an area where Linux falls far short. There is no common mechanism for making files and databases consistent and making a snapshot for backups. Admins must do this on their own. If you aren't actively taking steps to make your backups consistent, they aren't.
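(A rough illustration of the snapshot approach on LVM; volume, mount point, and host names are made up, and any open databases would still need to be quiesced before the snapshot:)
# snapshot the LV holding /home, mount it read-only, back it up, clean up
lvcreate --snapshot --size 1G --name home-snap /dev/vg0/home
mount -o ro /dev/vg0/home-snap /mnt/snap
rsync -a --delete /mnt/snap/ backuphost:/srv/backup/home/
umount /mnt/snap
lvremove -f /dev/vg0/home-snap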
Plus, it handles the network stuff.
It doesn't really do that on its own, either. Most people probably use rsync with ssh for a network link; that's the default behavior when the source or destination is written as host:path. It can be changed with the -e flag, or you can handle the pipe on your own. rsync can also run as a daemon, but I think that's most commonly used for publishing public data.
On 01/13/2013 12:30 AM, Gordon Messmer wrote:
<snip>
After publicly bitching about Linux's poor backup infrastructure for the hundredth time, I decided to write a system largely similar to VSS. I've written the first iteration in bash. It took one day to do most of the work, and then a few hours of testing and fixing to get things working reasonably well.
https://bitbucket.org/gordonmessmer/dragonsdawn-snapshot
At this point, there's enough working for other people to start looking at. Systems with ext3/4 filesystems on LVM are supported. btrfs will follow. PostgreSQL has a script to make its data consistent, but other common systems like MySQL, OpenLDAP, and 389 DS need similar support. Documentation needs to be written. A few architectural issues need to be ironed out.
If you're interested in improving the state of backups on GNU/Linux, please have a look and contact me if you want to help with code, testing, documentation, packaging, or maintaining packages in distributions so that this becomes a standard feature.
On Fri, Jan 11, 2013 12:29:48 PM -0500, ken wrote:
Considering using rsync on a couple systems for backup, I was wondering if it's possible, and if so how difficult is it...
sorry to step in so late, but I have another question on this very topic.
I have noticed that if I just _change_ the name of a folder, rsync doesn't realize it. That is, if folder holidays_2013 contains, say, 1000 pictures of 10 MB each, I rsync it to a remote computer and then change its name locally to family_holidays_2013, on the next run rsync:
- deletes the remote holidays_2013 and all its content
- creates a remote family_holidays_2013
- uploads again to it ALL the 1000 pictures of 10 MB each
even if all the "rsyncing" needed would be something equivalent to "mv holidays_2013 family_holidays_2013" on the remote server. Is it possible to tell rsync to behave in that way? I think not, but I'd like to be proven wrong on this.
TIA, Marco
rsync -v -d root@192.168.200.10:/var/lib/ .
Use rsync -d option to synchronize only directory tree from source to the destination. The below example, synchronize only directory tree in recursive manner, not the files in the directories
On Fri, Jan 18, 2013 at 5:25 AM, keshab mahapatra ping2kpm@gmail.com wrote:
<snip>
From the rsync manpage:
-d, --dirs transfer directories without recursing
How does the rsync -d option help us here? Transferring directories _only_ and not recursing isn't all that useful. Generally there are files in directories that are necessary to back up.
I don't see an example here.
On Fri, Jan 18, 2013 at 4:59 AM, M. Fioretti mfioretti@nexaima.net wrote:
<snip>
Yes, that's the way it works. If you change a directory name, rsync has no way of knowing that you moved it. The old name no longer exists on the source, so it gets deleted on the destination, and the contents of the renamed directory are rsynced again from scratch.
Bottom-line: Change things on the source and don't fiddle with them on the destination. Or if you really want to eliminate that data being transferred, I suppose you could do the extra work and rename the directory at the same time on the source and destination. Not ideal in the least.
Rsync compares files - or, more precisely, blocks of the two files - between source and destination.
On Fri, Jan 18, 2013 08:07:40 AM -0500, SilverTip257 wrote:
Yes, that's the way it works. If you change a directory name, rsync has no way of knowing that you moved it.
I was almost sure that this was the case, but it didn't hurt to ask for confirmation. Thanks to you, Reindl and all the others who answered.
if you really want to eliminate that data being transferred, I suppose you could do the extra work and rename the directory at the same time on the source and destination. Not ideal in the least.
Not ideal indeed, but I'll probably do it that way next time some renaming like this happens on very large folders. I assume that after that, I'd also have to launch rsync with the option that tells it not to consider modification time. I'll try.
All the other solutions mentioned in this sub-thread would require more work. Rsync is already configured and works, with just this limit, and in practice it doesn't happen too often on large directories, so I'll live with it.
Thanks, Marco
M. Fioretti wrote:
On Fri, Jan 18, 2013 08:07:40 AM -0500, SilverTip257 wrote:
if you really want to eliminate that data being transferred, I suppose you could do the extra work and rename the directory at the same time on the source and destination. Not ideal in the least.
Not ideal indeed, but I'll probably do it that way next time some renaming like this happens on very large folders. I assume that after that, I'd also have to launch rsync with the option that tells it not to consider modification time.
No, I don't think you will, since the file modification times won't have changed.
On Fri, 18 Jan 2013 at 08:07 -0000, SilverTip257 wrote:
If you change a directory name, rsync has no way of knowing that you moved it. The old name no longer exists on the source, so it gets deleted on the destination, and the contents of the renamed directory are rsynced again from scratch.
Bottom-line: Change things on the source and don't fiddle with them on the destination. Or if you really want to eliminate that data being transferred, I suppose you could do the extra work and rename the directory at the same time on the source and destination. Not ideal in the least.
I've used a home-grown rsync-based backup process for 15+ years.
When traveling or otherwise worried about large directory tree renames, I will sometimes do a 'cp -rpl' instead of a rename. This allows rsync to notice the new hard links, so it just creates the new directory structure without transferring the data again. After the rsync is complete I can then remove the original directory tree and rsync again. The rsync needs to use --hard-links for this to work.
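(A sketch of that workflow using the folder names from Marco's example; the /photos path is made up. cp -rpl copies the tree as hard links rather than duplicating data:)
cd /photos
cp -rpl holidays_2013 family_holidays_2013   # new names, same inodes
rsync -aH --delete /photos/ remote:/photos/  # -H recreates the hard links remotely
rm -rf holidays_2013                         # now drop the old name
rsync -aH --delete /photos/ remote:/photos/  # second pass removes it remotely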
I only recommend doing this if you understand and are comfortable with hard links and rsync.
Stuart
On 01/18/2013 01:59 AM, M. Fioretti wrote:
even if all the "rsyncing" needed would be something equivalent to "mv holidays_2013 family_holidays_2013" on the remote server. Is it possible to tell rsync to behave in that way?
No, which is why some people use Mercurial or git to keep data in sync in multiple places. It's not very space efficient, but can be more network efficient than rsync in transferring changes and renames.
On Fri, Jan 18, 2013 at 11:53 AM, Gordon Messmer yinyang@eburg.com wrote:
<snip>
I think an incremental 'dump' can catch renames. zfs's incremental send/receive would track just the changed disk blocks. Not sure what else would handle it better. Backuppc would transfer a new copy but if you were keeping more than one backup the matching files would end up being stored as links to each other and not take additional space.
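(For the curious, a sketch of the zfs approach with made-up pool and dataset names, assuming the destination already holds the earlier snapshot; a rename inside the dataset travels as a small block-level diff:)
zfs snapshot tank/photos@before
# ... mv holidays_2013 family_holidays_2013 happens here ...
zfs snapshot tank/photos@after
# send only the blocks that changed between the two snapshots
zfs send -i tank/photos@before tank/photos@after | ssh backuphost zfs receive backup/photos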