On Mon, 2006-11-06 at 18:42 +0000, Peter Crighton wrote:
You wrote "Hardlinks are key to this backup strategy. Using cp -al creates hardlinks to files, and this simple command is what does all the heavy lifting for daily and weekly backups. Wikipedia has a very good explanation on how hardlinks work. In a nutshell, when there's a hardlink pointing to a file from the hourly directory, to a file in the current directory, and that current file gets deleted, all the links that point to that now deleted current file gets the file data 'pushed' back towards all the links. I'll have to think how to explain this better."
Do you mean that the hourly files are written when created, and the hardlink for the daily doesn't actually copy the file (it simply makes a link), but if the file is set to be deleted from its location (because it's gone from the server) then it is actually moved, so that it still exists in the daily backup but is removed from the hourly?
Think of all directory entries as links. The real entries that map disk space to files are inodes, and links are names pointing to the inodes. There can be any number - including 0 - of links to an inode. The space is not released for re-use until the link count goes to 0 and no process has the file open. So hardlinks are just multiple names pointing to the same data, and the data doesn't go away until the last name is removed. Nothing is copied or moved when one name is deleted; the remaining names simply keep pointing to the same inode. Note that this only works as a backup if the original filename is removed. If the file is overwritten or truncated in place instead, all links now point to the changed version.
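
If it helps to see this in action, here is a small Python sketch of the two cases. It is only an illustration: the file names current.txt and hourly.txt and the function hardlink_demo are made up for the example, and it assumes a filesystem that supports hardlinks (os.link does the same job for one file that cp -al does for a whole tree).

```python
import os
import tempfile


def hardlink_demo():
    """Show what happens to a hardlinked name on removal vs. in-place overwrite."""
    results = {}
    with tempfile.TemporaryDirectory() as d:
        # Hypothetical names standing in for the "current" and "hourly" copies.
        current = os.path.join(d, "current.txt")
        hourly = os.path.join(d, "hourly.txt")

        with open(current, "w") as f:
            f.write("version 1")

        # Create a second name for the same inode; no data is copied.
        os.link(current, hourly)
        results["same_inode"] = os.stat(current).st_ino == os.stat(hourly).st_ino
        results["links_before"] = os.stat(current).st_nlink

        # Case 1: remove the original name. The link count drops,
        # but the data is still reachable through the other name.
        os.remove(current)
        results["links_after"] = os.stat(hourly).st_nlink
        with open(hourly) as f:
            results["after_remove"] = f.read()

        # Case 2: link again, then overwrite in place. Opening with "w"
        # truncates the existing inode, so both names see the new data.
        os.link(hourly, current)
        with open(current, "w") as f:
            f.write("version 2")
        with open(hourly) as f:
            results["after_overwrite"] = f.read()
    return results


if __name__ == "__main__":
    for key, value in hardlink_demo().items():
        print(key, "=", value)
```

The first case is the one the backup scheme relies on: the old name goes away and the backup name keeps the data. The second case is the caveat: a tool that rewrites files in place, rather than removing or replacing them, changes every hardlinked copy at once.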