[CentOS] Deduplicated archives via hardlinks [Was: XFS or EXT3 ?]

Bowie Bailey Bowie_Bailey at BUC.com
Fri Dec 3 21:36:51 UTC 2010



On 12/3/2010 4:14 PM, Adam Tauno Williams wrote:
> On Fri, 2010-12-03 at 12:51 -0800, John R Pierce wrote: 
>> On 12/03/10 12:25 PM, Les Mikesell wrote:
>>> Whenever anyone mentions backups, I like to plug the backuppc program
>>> (http://backuppc.sourceforge.net/index.html and packaged in EPEL).  It
>>> uses compression and hardlinks all duplicate files to keep much more
>>> history than you'd expect online, with a nice web interface - and does
>>> pretty much everything automatically.
>> I'm curious how you back up backuppc, like for disaster recovery,
> I know nothing about backuppc;  I don't use it.  But we use rsync with
> the same concept for a deduplicated archive.
>
>> archival, etc?   since all the files are in a giant mess of symlinks
> No, they are not symbolic links - they are *hard links*.   That they are
> hard links is the actual magic.  Symbolic links would not provide the
> automatic deallocation of expired files.
>
>> (for deduplication) with versioning, I'd have to assume the archive 
>> volume gets really messy after a while, and further, something like that 
>> is pretty darn hard to make a replica of.
> I don't see why;  only the archive is deduplicated in this manner, and
> it certainly isn't "messy".  One simply makes a backup [for us that
> means to tape - a disk is not a backup] of the most current snapshot.

Actually, making a backup of BackupPC's data pool (or just moving it to
new disks) does get messy.  With a large pool there are so many
hardlinks that rsync, which needs -H to preserve them and has to keep
track of every multiply-linked inode it sees, eats all your memory and
takes forever.  This is a frequent topic of conversation on the
BackupPC list.  However, the next major version of BackupPC is supposed
to use a different method of deduplication that will not use hardlinks
and will be much easier to back up.
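
To make the memory problem concrete, here is a minimal Python sketch.
It is not how rsync or BackupPC is implemented; the function names
(link_count, copy_preserving_hardlinks, demo) and the snap1/snap2
layout are made up for illustration.  It shows two things: removing
one name of a hardlinked file only drops the link count, and a copy
that wants to preserve hardlinks has to remember every multiply-linked
inode it has seen, so the table grows with the size of the pool.

```python
import os
import shutil
import tempfile


def link_count(path):
    """Return how many directory entries (names) point at this file's inode."""
    return os.stat(path).st_nlink


def copy_preserving_hardlinks(src_root, dst_root):
    """Copy regular files under src_root, recreating hardlinks in dst_root.

    Illustrative only: returns the inode table so its size can be inspected.
    Every multiply-linked inode needs an entry, which is the memory cost.
    """
    seen = {}  # (st_dev, st_ino) -> destination path of the first copy
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            st = os.lstat(src)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                # Data already copied under another name: just add a link.
                os.link(seen[key], dst)
                continue
            with open(src, "rb") as fin, open(dst, "wb") as fout:
                shutil.copyfileobj(fin, fout)
            if st.st_nlink > 1:
                seen[key] = dst
    return seen


def demo():
    """Build a two-snapshot pool, expire one name, then copy the pool."""
    with tempfile.TemporaryDirectory() as tmp:
        pool = os.path.join(tmp, "pool")
        os.makedirs(os.path.join(pool, "snap1"))
        os.makedirs(os.path.join(pool, "snap2"))
        old = os.path.join(pool, "snap1", "file.txt")
        new = os.path.join(pool, "snap2", "file.txt")
        with open(old, "w") as f:
            f.write("same content in both snapshots\n")
        os.link(old, new)           # deduplicate: second name, same inode
        before = link_count(old)
        os.remove(old)              # "expire" the older snapshot's name
        after = link_count(new)     # data survives under the other name
        os.link(new, old)           # put it back for the copy test
        copy = os.path.join(tmp, "copy")
        table = copy_preserving_hardlinks(pool, copy)
        still_linked = os.path.samefile(
            os.path.join(copy, "snap1", "file.txt"),
            os.path.join(copy, "snap2", "file.txt"),
        )
        return before, after, len(table), still_linked


if __name__ == "__main__":
    print(demo())
```

With one shared file the table has a single entry.  A pool with
millions of hardlinked files needs millions of entries, and each one
has to stay in memory until the copy is done.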

-- 
Bowie


