Hi list, how can I perform a differential backup using rsync?
On the web there is a lot of confusion about what a differential backup means when searching in connection with rsync.
Some users say diff because rsync copies only differences. For me, differential means a backup of everything changed since the last full backup.
Other users say that to perform a differential backup I must include --backup --backup-dir=/some/path in the rsync command, but from the manual page of rsync:
############# --backup-dir=DIR In combination with the --backup option, this tells rsync to store all backups in the specified directory on the receiving side. This can be used for incremental backups. You can additionally specify a backup suffix using the --suffix option (otherwise the files backed up in the specified directory will keep their original filenames). .... ###################
So at this point I can perform a full backup by copying the base dir after the last incremental, and I can perform an incremental backup by saving the changes to a specified dest dir (using --backup-dir).
But how can I perform a diff backup?
I know that rsync checks differences using "the base dir", a directory with "the same content" as the backed-up source; incrementals are made against this base. Suppose I have 500 GB of data on the source. I make/sync the 500 GB base dir. Running a full backup (the result must be a fullbackup.tar.gz), at the end of the process I have a 500 GB base dir and a .tar.gz of roughly 500 GB compressed. Is it correct, for a full backup, to first run an incremental sync onto the base dir and then compress it into a .tar.gz? Or is it better to resync the whole source into an alternative dest dir?
In this example I have spent double the space for a full backup plus a base dir: 500 GB of source versus 1 TB for the base dir and the full.tar.gz. Is there a way to perform the other operations (incr and diff) without using the base dir, to save disk space?
Thanks in advance.
Hi
For backups with rsync I recommend you follow the approach discussed on this website. It gives you everything you need to make a full backup and then incremental ones (deltas) using rsync. The only thing you need in order to do that is that the hosting filesystem supports hard links:
http://www.mikerubel.org/computers/rsync_snapshots/
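A minimal sketch of that rotate-and-hardlink scheme (directory names are only placeholders, and it assumes the daily.N directories already exist from earlier runs; Mike's page has the complete, safer version):

rm -rf /backup/daily.3                        # oldest snapshot falls off the end
mv /backup/daily.2 /backup/daily.3
mv /backup/daily.1 /backup/daily.2
cp -al /backup/daily.0 /backup/daily.1        # hard-link copy: almost no extra disk space
rsync -a --delete /source/ /backup/daily.0/   # changed files get new inodes, unchanged ones stay shared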
Cheers, Roberto Nebot
2015-11-09 17:01 GMT+01:00 Alessandro Baggi alessandro.baggi@gmail.com:
<snip>
On Mon, November 9, 2015 10:01 am, Alessandro Baggi wrote:
Hi list, how can I perform a differential backup using rsync?
Differential comes from real backup systems. Rsync is much simpler IMHO; the "-b" backup flag only keeps the older version of a changed or deleted file/directory with an extra "~" (or whatever you define) in its name. Making rsync behave as a full-blown backup system is too time consuming. It is much less time consuming to just install some backup software. I would recommend backuppc for a simple case like I understand yours to be. Bacula would be my choice when I need an enterprise-level system.
Just my $0.02.
Valeri
++++++++++++++++++++++++++++++++++++++++ Valeri Galtsev Sr System Administrator Department of Astronomy and Astrophysics Kavli Institute for Cosmological Physics University of Chicago Phone: 773-702-4247 ++++++++++++++++++++++++++++++++++++++++
On 11/09/2015 08:01 AM, Alessandro Baggi wrote:
how can I perform a differential backup using rsync?
rsync backups are always incremental against the most recent backup (assuming you're copying to the same location).
Some users say diff because rsync copies only differences. For me, differential means a backup of everything changed since the last full backup.
I don't see the distinction you're making.
rsync examines each file. If you specify --delete, files that are in the destination but not the source will be removed. Generally, files that match last-modified-time and size will not be copied, but flags like -c change the criteria for determining whether a file needs to be copied. Files which do not match will be copied using an efficient algorithm to send the minimum amount of data (just the changes in the file) from the source to the destination.
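For instance, a plain mirror run might look like this (paths are only examples):

rsync -a --delete /home/ /backup/home-mirror/     # default comparison: mtime + size
rsync -ac --delete /home/ /backup/home-mirror/    # -c: compare whole-file checksums instead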
Other users say that to perform a differential backup I must include --backup --backup-dir=/some/path in the rsync command, but from the manual page of rsync:
You probably only need to use --backup-dir on systems which don't have GNU cp. On systems with GNU cp, differential backups normally do something like:
cp -a daily.0 daily.1
rsync -a --delete source/ daily.0/
Whereas with --backup-dir, you can use rsync to do both tasks in one command, but your directory layout is a little messier.
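A rough sketch, assuming /backup/current holds the mirror and a dated directory collects whatever was changed or deleted on each run (paths are placeholders):

TODAY=$(date +%Y-%m-%d)
rsync -a --delete --backup --backup-dir=/backup/changed-$TODAY /source/ /backup/current/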
How can I perform a diff backup?
Save yourself a lot of trouble and use a front-end like rsnapshot or backuppc.
On 11/9/2015 9:50 AM, Gordon Messmer wrote:
I don't see the distinction you're making.
An incremental backup copies everything since the last incremental.
A differential copies everything since the last full.
rsync is NOT a backup system, it's just an incremental file copy.
With the full/incremental/differential approach, a restore to a given date would need to restore the last full, then the last differential, then any incrementals since that differential -- for instance, if you do monthly fulls, weekly differentials and daily incrementals. If you don't use differentials, then you'd have to restore every incremental since that last full, which in a monthly-full, daily-incremental scenario could be as many as 30 incrementals.
On 11/09/2015 09:59 AM, John R Pierce wrote:
On 11/9/2015 9:50 AM, Gordon Messmer wrote:
I don't see the distinction you're making.
An incremental backup copies everything since the last incremental; a differential copies everything since the last full.
I guess that makes sense, but in backup systems based on rsync and hard links (such as rsnapshot), *every* backup on the backup volume is a "full" backup, so incremental and differential are the same thing.
rsync is NOT a backup system, it's just an incremental file copy
...which can be used as a component of a backup system, such as rsnapshot or backuppc.
Gordon Messmer wrote:
<snip>
rsync is NOT a backup system, it's just an incremental file copy
...which can be used as a component of a backup system, such as rsnapshot or backuppc.
Actually, we use rsync for backups. We have a script that creates a new daily directory... and uses hard links to previous dates. That way, it looks like a full backup... but you can go to a previous date to restore an older version of a file (as in: ACK! I saved that file full of garbage over my Great American Novel! <g>).
And if you aren't familiar with hard links, which rsync happily creates, they were certainly hard enough for me to wrap my head around until I got it... and I really like them. Just note that they *must* be on one filesystem, as opposed to symlinks, which can cross filesystems.
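A quick illustration, in case it helps (any scratch directory will do):

echo "chapter one" > novel.txt
ln novel.txt novel-copy.txt        # a second name for the same inode
ls -li novel.txt novel-copy.txt    # same inode number, link count is now 2
rm novel.txt
cat novel-copy.txt                 # the data is still there
ln -s /etc/hosts hosts-link        # a symlink, by contrast, can cross filesystems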
mark
On Mon, 9 Nov 2015 13:42:08 -0500 m.roth@5-cent.us wrote:
And if you aren't familiar with hard links, which rsync happily creates, they were certainly hard enough to wrap my head around, until I got it...
More than one filename for a particular file. What's difficult about that?
and really like them. Just note that they *must* be on one filesystem, as opposed to symlinks, which can cross filesystems.
Obviously, since a hard link is part of the file and directory structure of the filesystem.
On 11/09/2015 11:10 AM, Frank Cox wrote:
And if you aren't familiar with hard links, which rsync happily creates, they were certainly hard enough to wrap my head around, until I got it...
More than one filename for a particular file. What's difficult about that?
I think the difficult part is that so many people don't understand that EVERY regular file is a hard link. It doesn't mean "more than one" at all. A hard link is the association between a directory entry (filename) and an inode in the filesystem.
On Mon, 9 Nov 2015 11:36:18 -0800 Gordon Messmer wrote:
I think the difficult part is that so many people don't understand that EVERY regular file is a hard link. It doesn't mean "more than one" at all. A hard link is the association between a directory entry (filename) and an inode in the filesystem.
Now that you point that out, I agree. I never thought about it that way before since I've always looked at a hard link as a link that you create after you create the initial file, though they become interchangeable after that.
But you're absolutely right and I've learned something today. Thanks!
On 11/9/2015 12:02 PM, Frank Cox wrote:
Now that you point that out, I agree. I never thought about it that way before since I've always looked at a hard link as a link that you create after you create the initial file, though they become interchangeable after that.
On Unix systems, the actual 'file' is known as an inode, and is identified by an inode number. Directories are themselves files that contain indexed directory entries with filenames pointing to these inodes.
the tricky thing with hard links is, you have to walk the whole directory tree of a given file system to find every entry pointing to the same inode if you want to identify these links.
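find can do that walk for you; for example (the file name and inode number here are only placeholders):

ls -li /backup/daily.0/report.txt                         # first column is the inode number
find /backup -xdev -samefile /backup/daily.0/report.txt   # every name sharing that inode
find /backup -xdev -inum 1234567                          # same idea, by inode number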
On Mon, November 9, 2015 12:42 pm, m.roth@5-cent.us wrote:
<snip>
Actually, we use rsync for backups. We have a script that creates a new daily directory... and uses hard links to previous dates. That way, it looks like a full b/u... but you can go to a previous date to restore an older version of the file (aka ACK! I saved that file full of garbage to my Great American Novel filename! <g>).
I wonder how the filesystem behaves when almost every file has some 400 hard links to it (thinking in terms of a year's worth of daily backups).
Valeri
On 11/09/2015 11:34 AM, Valeri Galtsev wrote:
I wonder how filesystem behaves when almost every file has some 400 hard links to it. (thinking in terms of a year worth of daily backups).
Why do you think that would be a problem?
Most inodes have one hard link. When that link is removed, the link count in the inode is decremented (inodes are reference-counted, you can see their ref count in "ls -l" output). When the link count reaches 0 and no open file descriptors exist, the inode is removed.
Creating more hard links just increases the ref count. That's it. It's not a weird special case.
On Mon, November 9, 2015 1:41 pm, Gordon Messmer wrote:
On 11/09/2015 11:34 AM, Valeri Galtsev wrote:
I wonder how filesystem behaves when almost every file has some 400 hard links to it. (thinking in terms of a year worth of daily backups).
Why do you think that would be a problem?
Probably not. You are not impacting something that has a notably finite count (like the inode count on a given fs). You just use a bit more disk space for metadata, which is nothing (space-wise) compared to the data (the files themselves). Thanks!
Most inodes have one hard link. When that link is removed, the link count in the inode is decremented (inodes are reference-counted, you can see their ref count in "ls -l" output). When the link count reaches 0 and no open file descriptors exist, the inode is removed.
Creating more hard links just increases the ref count. That's it. It's not a weird special case.
On 11/9/2015 11:34 AM, Valeri Galtsev wrote:
I wonder how filesystem behaves when almost every file has some 400 hard links to it. (thinking in terms of a year worth of daily backups).
XFS handles this fine. I have a backuppc storage pool with backups of 27 servers going back a year... now, I just have 30 days of incrementals, and 12 months of fulls, but in backuppc's implementation the distinction between incremental and full is quite blurred as both are fully deduped across the whole pool via use of hard links.
* Pool is 5510.40GB comprising 9993293 files and 4369 directories (as of 11/9 02:08),
* Pool hashing gives 3452 repeated files with longest chain 54,
* Nightly cleanup removed 737 files of size 1.64GB (around 11/9 02:08),
* Pool file system was recently at 35% (11/9 11:44), today's max is 35% (11/9 01:00) and yesterday's max was 36%.
There are 27 hosts that have been backed up, for a total of:
* 441 full backups of total size 71125.43GB (prior to pooling and compression),
* 623 incr backups of total size 20775.88GB (prior to pooling and compression).
so 90+TB of backups take 5.5TB of actual space.
On 2015-11-09, John R Pierce pierce@hogranch.com wrote:
XFS handles this fine. I have a backuppc storage pool with backups of 27 servers going back a year... now, I just have 30 days of incrementals, and 12 months of fulls,
I'm sure you know this already, but for those who may not, be sure to mount your XFS filesystem with the inode64 option. Otherwise XFS will try to save all of its inodes in the first 1TB of space, and with so many inodes needed, you may run out more quickly than you anticipate. Then you'll have "no space left on device" errors when df reports plenty of space (at least till you do df -i; actually I'm not 100% sure df -i will show it).
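For example (device and mount point are placeholders; on newer kernels inode64 has since become the default):

mount -o inode64 /dev/sdb1 /srv/backuppc
# or permanently, in /etc/fstab:
# /dev/sdb1  /srv/backuppc  xfs  defaults,inode64  0 0
df -i /srv/backuppc    # keep an eye on inode usage as well as block usage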
--keith
On Mon, November 9, 2015 7:52 pm, Keith Keller wrote:
On 2015-11-09, John R Pierce pierce@hogranch.com wrote:
XFS handles this fine. I have a backuppc storage pool with backups of 27 servers going back a year... now, I just have 30 days of incrementals, and 12 months of fulls,
I'm sure you know this already, but for those who may not, be sure to mount your XFS filesystem with the inode64 option. Otherwise XFS will try to save all of its inodes in the first 1TB of space, and with so many inodes needed, you may run out more quickly than you anticipate. Then you'll have "no space left on device" errors when df reports plenty of space (at least till you do df -i; actually I'm not 100% sure df -i will show it).
I'm fully with you on -o inode64, but I would think it is not the inode count that grows with extensive use of hard links, but rather the space used by directory data, requiring these directories to be relocated once they exceed some size, so ultimately some of them will be pushed beyond the 1 TB border (depending on how the filesystem is used). Someone correct me if I'm wrong.
Valeri
On 2015-11-10, Valeri Galtsev galtsev@kicp.uchicago.edu wrote:
I'm fully with you on -o inode64, but I would think it is not inode number that becomes large with extensive use of hard links, but the space used by directory data, thus requiring to relocate these once they exceed some size so ultimately some of them will be pushed beyond 1 TB border (depending on how the filesystem is used). Someone, correct me if I'm wrong.
Does this answer the question you're asking? I think so but I'm not sure.
http://www.xfs.org/index.php/XFS_FAQ#Q:_What_is_the_inode64_mount_option_for...
--keith
Valeri Galtsev wrote:
On Mon, November 9, 2015 12:42 pm, m.roth@5-cent.us wrote:
<snip>
I wonder how filesystem behaves when almost every file has some 400 hard links to it. (thinking in terms of a year worth of daily backups).
That, I can't answer - what we have is "disaster recovery", not "archive", so we only keep them for no more than five weeks.
On the other hand... a reasonable approach would be, for anything over maybe two months old, to keep the first backup of the month and rm everything else for that month.
mark
I beg to differ.
The rsync command is a fantastic backup system. It may not meet your needs, but it works really great to make different types of backups for me. I have a script I use (automate everything) to perform nightly backups with rsync. Using rsync with USB external hard drives works far better than any other backup system I have ever tried.
As for your other statements, they may be meaningful to you and that is OK, but to me are just so much irrelevant semantics. If one's backup system works, terminology and which commands used to achieve it are beside the point - it is a true backup system.
On 11/09/2015 12:59 PM, John R Pierce wrote:
rsync is NOT a backup system, it's just an incremental file copy
<snip>
--
David P. Both, RHCE Millennium Technology Consulting LLC Raleigh, NC, USA 919-389-8678
dboth@millennium-technology.com
www.millennium-technology.com www.databook.bz - Home of the DataBook for Linux DataBook is a Registered Trademark of David Both
cp -a daily.0 daily.1
That should be: cp -al daily.0 daily.1 (the -l makes hard links rather than copying the data).
All these can be combined with an rsyncd module to allow read only root access to a remote system excluding the dirs you don't normally want to be backed up like /proc, /var/lib/mysql, /var/lib/libvirt, ...
On Monday, November 09, 2015 09:50:52 AM Gordon Messmer wrote:
How I can perform a diff backup?
Save yourself a lot of trouble and use a front-end like rsnapshot or backuppc.
If I may, I'd like to put in a plug for ZFS:
Combining rsync and ZFS, you can rsync, then make a ZFS snapshot, which gives you the best of both worlds:
1) No messy filesystem with multiple directories full of hardlinks to manage.
2) Immutable backups.
3) Crazy efficient storage space, including built-in compression. Much more efficient than rsync + hard links.
4) Ability to send the entire filesystem (binary perfect) to another system.
5) Ability to upgrade and add storage space without taking it offline.
6) Ability to "restore" a snapshot to read/write status in seconds with a clone that you can throw away later just as easily.
7) Or you can skip rsync, do the snapshots on the source server, and replicate the snapshots with send/receive.
8) Uses inexpensive, commodity hardware.
... and on and on....
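As a rough sketch of the rsync-then-snapshot workflow (pool and dataset names are placeholders):

rsync -a --delete /source/ /tank/backups/host1/
zfs snapshot tank/backups/host1@$(date +%Y-%m-%d)    # immutable, near-instant snapshot
zfs list -t snapshot -r tank/backups/host1
# single files can be pulled straight out of a snapshot:
# cp /tank/backups/host1/.zfs/snapshot/2015-11-09/path/to/file /tmp/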
We've moved *all* our backups to ZFS, the benefits are just too many. I'd like to plug BTRFS in a similar vein, but it's "not yet production ready" and it's been that way for a long, long time...
Ben S
Ciao Alessandro,
On 11/09/2015 05:01 PM, Alessandro Baggi wrote:
Hi list, how can I perform a differential backup using rsync?
On the web there is a lot of confusion about what a differential backup means when searching in connection with rsync.
Some users say diff because rsync copies only differences. For me, differential means a backup of everything changed since the last full backup.
Which is basically the same... if you always use your last full backup as the "base" directory. Use rsync's --link-dest option to achieve this. Nice thing: unchanged files will just be hardlinked to the original files and won't use additional disk space, but each dataset is still a complete backup. There is no need to combine several incremental or differential backups to restore a certain state.
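A rough sketch of such a --link-dest run (paths are only placeholders):

# link unchanged files against the last full backup; only changed files use new space
rsync -a --delete --link-dest=/backup/full /source/ /backup/$(date +%Y-%m-%d)/
# whenever you declare a new full backup, repoint the symlink:
# ln -sfn /backup/2015-11-09 /backup/full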
Mike Rubel's page has already been mentioned. On http://www.drosera.ch/frank/computer/rsync.html I describe an alternate mechanism (using above mentioned --link-dest and an rsync-server) which overcomes some of the - imho - shortcomings of Mike's setup.
And: rsync is a fan-tas-tic backup tool ;-)
HTH Frank
On Mon, Nov 9, 2015 at 9:31 PM, Alessandro Baggi alessandro.baggi@gmail.com wrote:
Hi list, how can I perform a differential backup using rsync?
On the web there is a lot of confusion about what a differential backup means when searching in connection with rsync.
Some users say diff because rsync copies only differences. For me, differential means a backup of everything changed since the last full backup.
You can use the "-newer" option of the find command and pass the file list to rsync or scp to "back up" only those files that have changed since the last run. You can keep a file like .lastbackup and timestamp it (touch) at the start of the backup process. On the next backup you compare the current timestamp with the timestamp on this file.
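A very rough sketch of that idea (timestamp file and paths are placeholders; note the caveats raised later in this thread):

cd /source
touch /backup/.thisrun
find . -newer /backup/.lastbackup -type f > /tmp/changed.list
rsync -a --files-from=/tmp/changed.list /source/ /backup/incr-$(date +%F)/
mv /backup/.thisrun /backup/.lastbackup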
HTH, -- Arun Khan
Folks
I have been using rsnapshot for years now. The only problem I've found is that it is possible to run out of inodes. So my heads-up is that when you create the file system, ensure you have more than the default number of inodes - I usually multiply the default by 10. Otherwise you can find your 1Tb USB drive failing after 259Mb, and you then can't recover the files. Rather embarrassing.
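With ext3/ext4 the inode count is fixed when the filesystem is created, so that means something like (device name is only a placeholder):

mkfs.ext4 -i 4096 /dev/sdc1      # one inode per 4 KiB of space instead of the 16 KiB default
# or ask for an absolute number of inodes:
# mkfs.ext4 -N 60000000 /dev/sdc1
df -i /mnt/backup                # watch inode usage, not just block usage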
Best wishes
John
John Logsdon Quantex Research Ltd +44 161 445 4951/+44 7717758675
On 11/10/2015 12:18 AM, John Logsdon wrote:
I have been using rsnapshot for years now. The only problem I've found is that it is possible to run out of inodes. So my heads-up is that when you create the file system, ensure you have more than the default inodes - I usually multiply the default by 10. Otherwise you can find your 1Tb USB drive failing after 259Mb and you can't then recover the files. Rather embarrassing.
or use a file system, like xfs, that has no static allocations.
Thanks John - I haven't used XFS.
This issue arose on ext3 I think some years ago on a rather elderly system. If XFS avoids this that's great but if someone is still using legacy systems, they need to be warned!
On Tue, Nov 10, 2015 at 10:52 AM, Arun Khan knura9@gmail.com wrote:
On Mon, Nov 9, 2015 at 9:31 PM, Alessandro Baggi alessandro.baggi@gmail.com wrote:
<snip>
You can use "newer" options of the find command and pass the file list to rsync or scp to "backup" only those files that have changed since the last run. You can keep a file like .lastbackup and timestamp it (touch) at the start of the backup process. Next backup you compare the current timestamp with the timestamp on this file.
Clarification -- for differential backups, you should touch the file only when you do the *full* backup.
-- Arun Khan
On 11/09/2015 09:22 PM, Arun Khan wrote:
You can use "newer" options of the find command and pass the file list to rsync or scp to "backup" only those files that have changed since the last run. You can keep a file like .lastbackup and timestamp it (touch) at the start of the backup process. Next backup you compare the current timestamp with the timestamp on this file.
Absolutely none of that is necessary with rsync, and the process you described is likely to miss files that are modified while "find" runs.
If you're going to use rsync to make backups, just use a frontend like rsnapshot or backuppc.
On Nov 10, 2015, at 8:46 AM, Gordon Messmer gordon.messmer@gmail.com wrote:
On 11/09/2015 09:22 PM, Arun Khan wrote:
You can use "newer" options of the find command and pass the file list
the process you described is likely to miss files that are modified while "find" runs.
Well, be fair, rsync can also miss files if files are changing while the backup occurs. Once rsync has passed through a given section of the tree, it will not see any subsequent changes.
If you need guaranteed-complete filesystem-level snapshots, you need to be using something at the kernel level that can atomically collect the set of modified blocks/files, rather than something that crawls the tree in user space.
On the BSD Now podcast, they recently told a war story about moving one of the main FreeBSD servers to a new data center. rsync was taking 21 hours in back-to-back runs purely due to the amount of files on that server, which gave plenty of time for files to change since the last run.
Solution? ZFS send:
http://128bitstudios.com/2010/07/23/fun-with-zfs-send-and-receive/
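The pattern is roughly (pool, dataset and host names are just placeholders):

zfs snapshot tank/data@2015-11-09
zfs send tank/data@2015-11-09 | ssh backuphost zfs receive pool/backup/data
# later runs send only the blocks changed since the previous snapshot:
zfs snapshot tank/data@2015-11-10
zfs send -i tank/data@2015-11-09 tank/data@2015-11-10 | ssh backuphost zfs receive -F pool/backup/data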
On 11/10/2015 12:16 PM, Warren Young wrote:
Well, be fair, rsync can also miss files if files are changing while the backup occurs. Once rsync has passed through a given section of the tree, it will not see any subsequent changes.
I think you miss my meaning. Consider this sequence of events:
* "find" begins and processes dirA and then dirB
* another application writes files in dirA
* "find" completes
* a new timestamp file is written
Now, the new file in dirA wasn't seen by find during this run, and it won't be seen on the next run either. That's what I mean by missed. Not temporarily missed, but permanently. That file won't ever be backed up in this very naïve process.
There's no benefit to the process, either. rsync can efficiently examine and synchronize filesystems without using find. And while it may miss files that are written while it's running, it *will* get them on the next run, unlike using "find".
If you need guaranteed-complete filesystem-level snapshots, you need to be using something at the kernel level that can atomically collect the set of modified blocks/files, rather than something that crawls the tree in user space.
Generally, I agree with you. In fact: https://bitbucket.org/gordonmessmer/dragonsdawn-snapshot https://github.com/rsnapshot/rsnapshot/pull/44
Doing block-level differentials is nice, if you're using ZFS. But not everyone wants to run ZFS on Linux. I do think that backing up snapshots is important, though.
On 10/11/15 21:05, Gordon Messmer wrote:
On 11/10/2015 12:16 PM, Warren Young wrote:
Well, be fair, rsync can also miss files if files are changing while the backup occurs. Once rsync has passed through a given section of the tree, it will not see any subsequent changes.
I think you miss my meaning. Consider this sequence of events:
* "find" begins and processes dirA and then dirB
* another application writes files in dirA
* "find" completes
* a new timestamp file is written
Now, the new file in dirA wasn't seen by find during this run, and it won't be seen on the next run either. That's what I mean by missed. Not temporarily missed, but permanently. That file won't ever be backed up in this very naïve process.
That's plain bad system analysis. Read the start date, record the current date and THEN start processing. You will get the odd extra file but will not lose any.
On 11/10/2015 03:38 PM, J Martin Rushton wrote:
That's plain bad system analysis. Read the start date, record the current date and THEN start processing. You will get the odd extra file but will not loose any.
That's my point. "find" doesn't do that, and naïve implementations of the original suggestion are likely to work poorly. For no reason. Just don't use "find" to feed rsync a list of files to sync. It's not more efficient, it might miss files, it won't sync deleted files, etc. etc. rsync is designed to synchronize two directory trees. It doesn't need external helpers (except for a pipe, like ssh).
On Wed, Nov 11, 2015 at 5:39 AM, Gordon Messmer gordon.messmer@gmail.com wrote:
On 11/10/2015 03:38 PM, J Martin Rushton wrote:
That's plain bad system analysis. Read the start date, record the current date and THEN start processing. You will get the odd extra file but will not lose any.
That's my point. "find" doesn't do that, and naïve implementations of the original suggestion are likely to work poorly.
<.... snip ...>
A good systems analysis is a must in whatever one does. Be it system admin, software developer, accountant, lawyer etc.
My suggestion about using "find" was in response to the OP's question/clarification on incremental/differential backup, and I assumed due diligence with respect to designing the script.
<quote> how to perform a differential backup using rsync?
On web there is a great confusion about diff backup concept when searched with rsync. </quote>
rsync will do incremental backup as already discussed earlier in this thread.
Please suggest how to achieve a differential backup with rsync (the original query).
Thanks, -- Arun Khan
On 11/10/2015 11:27 PM, Arun Khan wrote:
rsync will do incremental backup as already discussed earlier in this thread.
Please suggest how to achieve a differential backup with rsync (the original query).
Already answered. Under rsync based backup systems like rsnapshot, every backup is a full backup. Therefore, incremental and differential backups are the same thing. As you already understand that rsync will do incremental backups without using find, you also understand that it will do differential backups without using find.
On Wed, Nov 11, 2015 at 5:08 AM, J Martin Rushton martinrushton56@btinternet.com wrote:
<snip>
That's plain bad system analysis. Read the start date, record the current date and THEN start processing. You will get the odd extra file but will not lose any.
Heartily agree. I was about to post my response but saw yours.
Cheers, -- Arun Khan
I did exactly this with ZFS on Linux and cut over 24 hours of backup lag to just minutes.
If you're managing data at scale, ZFS just rocks...
On 13/11/15 01:52, Benjamin Smith wrote:
I did exactly this with ZFS on Linux and cut over 24 hours of backup lag to just minutes.
If you're managing data at scale, ZFS just rocks...
<snip>
If you really _need_ the guarantee of a snapshot, consider either LVM or RAID1. Break out a volume from the RAID set, back it up, then rebuild. If you are paranoid you might want to consider a 3-way RAID1 to ensure you have full shadowing during the backup. Some commercial filesystems (such as IBM's GPFS) also include a snapshot command, but you may need deep pockets.
Other than that, accept as harmless the fact that your backup takes a finite time. Provided that you record the time before starting the sweep, and do the next incremental from that time, then you will catch all files eventually. The time lag shouldn't be much though, decent backup systems scan the sources and generate a work list before starting to move data.
OT - is ZFS part of the CentOS distro? I did a quick yum list | grep -i zfs and got nothing on 7.1.1503.
On 11/13/2015 01:46 AM, J Martin Rushton wrote:
If you really_need_ the guarantee of a snapshot, consider either LVM or RAID1. Break out a volume from the RAID set, back it up, then rebuild.
FFS, don't do the latter. LVM is the standard filesystem backing for Red Hat and CentOS systems, and fully supports consistent snapshots without doing half-ass shit like breaking a RAID volume.
Breaking a RAID volume doesn't make filesystems consistent, so when you try to mount it, you might have a corrupt filesystem, or corrupt data. Breaking the RAID will duplicate UUIDs of filesystems and the name of volume groups. There are a whole bunch of configurations where it just won't work. At best, it's unreliable. Never do this. Don't advise other people to do it. Use LVM snapshots (or ZFS if that's an option for you).
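A minimal LVM snapshot sequence for a backup run, as a sketch (VG/LV names and the snapshot size are placeholders):

lvcreate -L 10G -s -n data-snap /dev/vg0/data   # copy-on-write snapshot, effectively atomic
mount -o ro /dev/vg0/data-snap /mnt/snap        # add ,nouuid for XFS
rsync -a --delete /mnt/snap/ /backup/host1/
umount /mnt/snap
lvremove -f /dev/vg0/data-snap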
On 13/11/15 17:55, Gordon Messmer wrote:
<snip>
Maybe I should have been clearer: use (LVM) OR (RAID1 and break). Don't use LVM and break, that would be silly.
I hope I'm wrong, but you wouldn't be thinking of mounting the broken-out copy on the same system, would you? You must never do that, not even during disaster recovery. Use dd or similar on the disk, not the mounted partitions - isn't that obvious? I wasn't trying to give step-by-step instructions.
Way before LVM existed we used this technique to back up VAXes (and later Alphas) under VMS using "volume shadowing" (i.e. RAID1). It worked quite happily for several years with disks shared across the cluster. IIRC it was actually recommended by DEC, indeed a selling point, but I don't have any manuals to hand to confirm that nowadays! One thing I did omit was that you MUST sync first (there was an equivalent VMS command, don't ask me now), and also ensure that as the disks are added back a full catch-up copy occurs. You may consider it half a mule's droppings, but it is, after all, what happens if you lose a spindle and hot replace.
On 11/13/2015 12:59 PM, J Martin Rushton wrote:
Maybe I should have been clearer: use (LVM) OR (RAID1 and break).
I took your meaning. I'm saying that's a terrible backup strategy, for a list of reasons.
For instance, it only works if you mirror a single disk. It doesn't work if you use RAID10 or RAID5, or RAID6, or RAIDZ, etc. Breaking RAID doesn't make the data consistent, so you might have corrupt files (especially if the system runs any kind of database. SQL, LDAP, etc). It doesn't make the filesystem consistent, so you might have a corrupt filesystem.
Even if you ignore the potential for corruption, you have a backup process that only works on some specific hardware configurations. Everything else has to have a different backup solution. That's insane. Use one backup process that works for everything. You're much more likely to consistently back up your data that way.
I hope I'm wrong, but you wouldn't be thinking of mounting the broken-out copy on the same system, would you? You must never do that, not even during disaster recovery. Use dd or similar on the disk, not the mounted partitions - isn't that obvious? I wasn't trying to give step-by-step instructions.
Well, that's *one* of the problems with your advice. Even if we ignore the fact that it doesn't work reliably (and IMO, it therefore doesn't work), it's far more complicated than you pretend it is.
Because now you're talking about quiescing your services, breaking your RAID, physically removing the drive, connecting it to another system, fsck the filesystems, mount them, and backing up the data. For each backup. Every day.
Or using 'dd' and... backing up the whole image? No incremental or differentials?
Your process involves a human being doing physical tasks as part of the backup. Maybe I'm the only one, but I want my backups fully automated. People make mistakes. I don't want them involved in regular processes. In fact, the entire point of computing is that the computer should do the work so that I don't have to.
Way before LVM existed we used this technique to back up VAXes (and later Alphas) under VMS using "volume shadowing" (ie RAID1). It worked quite happily for several years with disks shared across the cluster. IIRC it was actually recommended by DEC, indeed a selling point, but I don't have any manuals to hand to confirm that nowadays! One thing I did omit was you MUST sync first
sync flushes the OS data buffers to disk, but it does not sync application data buffers, it does not flush the journal, it doesn't make filesystems "clean", and even if you break the RAID volume immediately after "sync" there's no guarantee that there weren't cached writes from other processes in between those two steps.
There is absolutely no way to make this a reliable process without a full shutdown.
Have a coffee or a beer, breathe deeply, then:
On 14/11/15 00:42, Gordon Messmer wrote:
On 11/13/2015 12:59 PM, J Martin Rushton wrote:
Maybe I should have been clearer: use (LVM) OR (RAID1 and break).
I took your meaning. I'm saying that's a terrible backup strategy, for a list of reasons.
For instance, it only works if you mirror a single disk. It doesn't work if you use RAID10 or RAID5, or RAID6, or RAIDZ, etc.
That of course is exactly why I said RAID1.
Breaking RAID
doesn't make the data consistent, so you might have corrupt files (especially if the system runs any kind of database. SQL, LDAP, etc). It doesn't make the filesystem consistent, so you might have a corrupt filesystem.
Possibly, but that is another problem altogether. Any low-level backup will do the same. You need an understanding of the filesystem to handle filesystem problems. Even if the utility understands the filesystem, you still have problems with open files such as databases.
More generally, for anything except a trivial database you should use the database to dump itself; for instance using mysqldump. Have a look at the page https://mariadb.com/kb/en/mariadb/backup-and-restore-overview/ for (as it says) an overview. Try running a database backup timed to complete before your normal filesystem backups run, whatever method you use.
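For example, something like this (output path is a placeholder; --single-transaction gives a consistent dump of InnoDB tables without locking everything):

mysqldump --single-transaction --routines --all-databases | gzip > /backup/sql/all-$(date +%F).sql.gz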
Even if you ignore the potential for corruption, you have a backup process that only works on some specific hardware configurations. Everything else has to have a different backup solution. That's insane. Use one backup process that works for everything. You're much more likely to consistently back up your data that way.
Remember that this is a last resort if (1) the user can't accept more sensible backups and handle (or let the backup handle) the dates safely; (2) the user insists on a snapshot; (3) the user can't use a filesystem snapshot (ZFS, GPFS etc); and (4) the user can't/won't use LVM. You can't refuse to use "better solutions" and then complain that the last resort is not as good as the "better solutions"!
I hope I'm wrong, but you wouldn't be thinking of mounting the broken out copy on a the same system would you? You must never do that, not even during disaster recovery. Use dd or similar on the disk, not the mounted partitions - isn't that obvious? I wasn't trying to give step by step instructions.
Well, that's *one* of the problems with your advice. Even if we ignore the fact that it doesn't work reliably (and IMO, it therefore doesn't work), it's far more complicated than you pretend it is.
Because now you're talking about quiescing your services, breaking your RAID, physically removing the drive, connecting it to another system, fsck the filesystems, mount them, and backing up the data. For each backup. Every day.
No need to remove it if you handle the whole disk. When we used this technique we only did it monthly - it would be pretty crazy to do level 0 backups daily.
Or using 'dd' and... backing up the whole image? No incremental or differentials?
See the previous.
Your process involves a human being doing physical tasks as part of the backup. Maybe I'm the only one, but I want my backups fully automated. People make mistakes. I don't want them involved in regular processes. In fact, the entire point of computing is that the computer should do the work so that I don't have to.
See the comments about using better solutions. I'd be worried though if you use a solution that doesn't remove the backup media from the vicinity of the machine. Fine if you have a remote site, but otherwise you still need a person to physically take the tapes (or whatever) out of the machine room to fireproof storage. That's pretty manual.
Way before LVM existed we used this technique to back up VAXes (and later Alphas) under VMS using "volume shadowing" (ie RAID1). It worked quite happily for several years with disks shared across the cluster. IIRC it was actually recommended by DEC, indeed a selling point, but I don't have any manuals to hand to confirm that nowadays! One thing I did omit was you MUST sync first
sync flushes the OS data buffers to disk, but it does not sync application data buffers, it does not flush the journal, it doesn't make filesystems "clean", and even if you break the RAID volume immediately after "sync" there's no guarantee that there weren't cached writes from other processes in between those two steps.
The journal is a fair point if it is stored on a separate spindle, as is possible under XFS, for instance.
There is absolutely no way to make this a reliable process without a full shutdown.
Not IME. At that date the preferred method for monthly backups was a shutdown and standalone utility for disk-disk copies, but that was not always possible. The technique worked.
On 11/14/2015 03:04 AM, J Martin Rushton wrote:
On 14/11/15 00:42, Gordon Messmer wrote:
For instance, it only works if you mirror a single disk. It doesn't work if you use RAID10 or RAID5, or RAID6, or RAIDZ, etc.
That of course is exactly why I said RAID1.
I know. And I was trying to make the point that the process of breaking RAID1 for backup purposes is inflexible in addition to being unreliable. Users should not have to re-engineer their backup system for every hardware configuration.
Breaking RAID
doesn't make the data consistent, so you might have corrupt files (especially if the system runs any kind of database. SQL, LDAP, etc). It doesn't make the filesystem consistent, so you might have a corrupt filesystem.
Possibly, but that is another problem altogether. Any low level backup will do the same.
If you were to attempt a block-level backup of the raw device, then yes, you would have similar problems. But since that is insane, and no one is suggesting that process, I didn't feel the need to address it.
You need to have an understanding of the filesystem to handle filesystem problems. Even if the utility understands the filesystem you have problems with open files such as databases.
There *are* tools that exist to dump filesystems, but they're not intended to be used for backup, and they won't operate on mounted filesystems. For instance, clonezilla includes tools to dump ext4 and ntfs filesystems for the purpose of cloning a system. You could treat that as a backup, but you have to shut down the host OS to boot clonezilla.
More generally, for anything except a trivial database you should use the database to dump itself; for instance using mysqldump.
Uhh.... no. I'd argue the opposite. You should only use a DB dump tools for trivial databases (or in some cases, such as PostgreSQL, upgrades). Dumping a database is *slow*. The only thing slower than dumping a database is restoring a database dump. If you have a non-trivial database, you definitely want to quiesce, snapshot, resume, and back up the snapshot.
Have a look at the page https://mariadb.com/kb/en/mariadb/backup-and-restore-overview/ for (as it says) an overview. Try running a database backup timed to complete before your normal filesystem backups run, whatever method you use.
Again, you seem entirely too willing to accept unreliable processes. Timing? You should absolutely, under no circumstances, trust the timing of two processes to not overlap. If you're dumping data, you should either trigger the backup from the dump job, after it completes, or you should employ a locking system so that only one of the two processes can operate simultaneously.
Remember that this is a last resort if (1) the user can't accept more sensible backups and handle (or let the backup handle) the dates safely; (2) the user insists on a snapshot; (3) the user can't use a filesytem snapshot (ZFS, GPFS etc) and (4) the user can't/won't use LVM. You can't refuse to use better solutions" and then complain that last resort is not as good as the better solutions"!
No one is refusing better solutions. You are tilting at windmills.
See the comments about using better solutions. I'd be worried though if you use a solution that doesn't remove the backup media from the vicinity of the machine. Fine if you have a remote site
We agree, there. You should have backups in a physically separate location.
On Fri, 13 Nov 2015, Gordon Messmer wrote:
Breaking a RAID volume doesn't make filesystems consistent,
While using LVM arranges for some filesystems to be consistent (it is not always possible), it does nothing to ensure application consistency which can be just as important. Linux doesn't have a widely deployed analog to Windows' VSS, which provides both though only for those that cooperate. On Linux you must arrange to quiesce applications yourself, which is seldom possible.
Breaking the RAID will duplicate UUIDs of filesystems and the name of volume groups.
Making an LVM snapshot duplicates UUIDs (and LABELs) too, the whole LV is the same in the snapshot as it was in the source. There are ways to cope with that for XFS (I usually use mount -ro nouuid) -- ext2/3/4 doesn't care (so just mount -r for them). If the original filesystem isn't yet mounted then a mount by uuid (or label) would not be pretty for either. And that's just two filesystems, others are supported and they too will potentially have issues.
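i.e. roughly (device and mount point are placeholders):

mount -o ro,nouuid /dev/vg0/data-snap /mnt/snap    # XFS: skip the duplicate-UUID check
mount -o ro /dev/vg0/data-snap /mnt/snap           # ext2/3/4 doesn't check, plain read-only is enough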
/mark
On 11/14/2015 09:01 AM, Mark Milhollan wrote:
On Fri, 13 Nov 2015, Gordon Messmer wrote:
Breaking a RAID volume doesn't make filesystems consistent,
While using LVM arranges for some filesystems to be consistent (it is not always possible)
Can you explain what you mean? The standard filesytems, ext4 and XFS, both will be made consistent when making an LVM snapshot.
, it does nothing to ensure application consistency which can be just as important. Linux doesn't have a widely deployed analog to Windows' VSS, which provides both though only for those that cooperate.
I know. That's why I wrote snapshot: https://bitbucket.org/gordonmessmer/dragonsdawn-snapshot
On Linux you must arrange to quiesce applications yourself, which is seldom possible.
I have not found that to be true. Examples?
Breaking the RAID will duplicate UUIDs of filesystems and the name of volume groups.
Making an LVM snapshot duplicates UUIDs (and LABELs) too, the whole LV is the same in the snapshot as it was in the source.
The VG name is the bigger problem. If you tried to activate the VG in the broken RAID1 component, Very Bad Things(TM) would happen.
On 11/11/15 02:46, Gordon Messmer wrote:
... the process you described is likely to miss files that are modified while "find" runs.
That's just being picky for the sake of it. A backup is a *point-in-time* snapshot of the files being backed up. It will not capture files modified after that point.
So, saying that find won't find files modified while the backup is running is frankly the same as saying it won't find files modified anytime in the future after that *point-in-time* when the backup started!
If there's a point to be made by the quoted statement above, I missed it and I surely deserve to be educated!
ak.