Les Mikesell wrote:
On Mon, 2006-11-06 at 18:42 +0000, Peter Crighton wrote:
You wrote "Hardlinks are key to this backup strategy. Using cp -al creates hardlinks to files, and this simple command is what does all the heavy lifting for daily and weekly backups. Wikipedia has a very good explanation on how hardlinks work. In a nutshell, when there's a hardlink pointing to a file from the hourly directory, to a file in the current directory, and that current file gets deleted, all the links that point to that now deleted current file gets the file data 'pushed' back towards all the links. I'll have to think how to explain this better."
Do you mean that the hourly files are written when created, the hardlink for the daily doesn't actually copy the file (it simply makes a link), but if the file is set to be deleted from its location (because it's gone from the server) then it is actually moved so that it still exists in the daily backup but is removed from the hourly?
Think of all directory entries as links. The real entries that map disk space to files are inodes and links are names pointing to the inodes. There can be any number - including 0 - of links to an inode. The space is not released for re-use until the link count goes to 0 and no process has the file open. So hardlinks are just multiple names pointing to the same data, and the data doesn't go away until the last name is removed.
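A quick way to see this at a shell prompt (the file names here are just examples):

    echo hello > original
    ln original extra        # a second name (hardlink) for the same inode
    ls -li original extra    # same inode number, link count of 2 on each
    rm original              # removes one name, not the data
    cat extra                # still prints "hello"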
You did much better explaining what's going on with hardlinks than I did. I'm going to have to rewrite that part of the blog a few times before it reads better. I can picture it all in my head, but describing how it works is another matter.
Note that this only works as a backup if the original filename is removed. If it is overwritten or truncated instead, all links now point to the changed version.
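A small shell illustration of that caveat (again, made-up file names):

    echo "version 1" > original
    ln original snapshot          # snapshot shares the same inode
    echo "version 2" > original   # overwrites/truncates the same inode in place
    cat snapshot                  # prints "version 2" -- the "backup" changed too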
This is true if you're doing it with only filesystem tools, but this system is using rsync. What's happening is the cp -al occurs first, filling the hourly directory with hardlinks that point at the same files as the current directory; then rsync is run to update current. Because rsync creates a new temp file whenever a file changes, the original name is deleted and its data is 'pushed' back to any hardlinks still pointing at it. Rsync then renames the temp file to the original file name, thereby ensuring the hardlinks always hold the previous copy of any changed file. With rsync running with --delete, any files deleted on the source server also get deleted out of current on the backup server, and again the snapshots' hardlinks keep the deleted files' data. That's how this system creates incremental backups of only the changed data, yet with the hardlinks it looks like full backups are made each and every time. Really saves disk space, that's for sure!
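Roughly, the sequence looks like this (the paths, host name, and rotation depth are made up for illustration; the blog's actual script may differ):

    rm -rf hourly.3                           # drop the oldest snapshot
    mv hourly.2 hourly.3                      # shuffle the others down
    mv hourly.1 hourly.2
    cp -al current hourly.1                   # hardlink "copy" of current -- nearly free
    rsync -a --delete source:/data/ current/  # update current in place
    # rsync writes each changed file to a temp file and renames it over the old
    # name, so the hardlinks in hourly.* keep the previous versions of the files.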
Hope this clears things up...
Mark
On Mon, 2006-11-06 at 14:54 -0800, Mark Schoonover wrote:
That's how this system creates incremental backups of only the changed data, yet with the hardlinks it looks like full backups are made each and every time. Really saves disk space, that's for sure!
BackupPC is even more extreme in the space savings. It first compresses the files, then detects duplicates using an efficient hashing scheme and links all duplicates to one pooled copy, whether they came from the same source or not. It includes a custom rsync on the server side that understands the compressed storage format but works with stock versions on the remote side, so you don't need any special client software. And it has a nice web interface for browsing the backup archive and doing restores.
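Stripped way down, the pooling idea is something like this (a rough sketch of hash-based deduplication with hardlinks, not BackupPC's actual code; the paths are made up):

    pool=/backups/pool
    mkdir -p "$pool"
    for f in /backups/current/*; do
        [ -f "$f" ] || continue                   # only regular files
        hash=$(sha1sum "$f" | awk '{print $1}')   # name pool entries by content hash
        if [ -e "$pool/$hash" ]; then
            ln -f "$pool/$hash" "$f"              # duplicate: relink to the pooled copy
        else
            ln "$f" "$pool/$hash"                 # first occurrence: add it to the pool
        fi
    done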