[CentOS] Deduplicated archives via hardlinks [Was: XFS or EXT3 ?]

Fri Dec 3 22:54:50 UTC 2010
Les Mikesell <lesmikesell at gmail.com>

On 12/3/2010 4:32 PM, Gavin Carr wrote:
> On Fri, Dec 03, 2010 at 04:07:06PM -0600, Les Mikesell wrote:
>> The backuppc scheme works pretty well in normal usage, but most
>> file-oriented approaches to copy the whole backuppc archive have scaling
>> problems because they have to track all the inodes and names to match up
>> the hard links.
>
> That's been my experience with other hard-linked based backup schemes as
> well. For 'normal' sized backups they work pretty well, but for some
> value of 'large' backups the number of inodes and the tree traversal
> time starts to cause real performance problems.
>
> I'd be interested to know how large people's backups are where they're
> still seeing decent performance using approaches like this? I believe we
> started seeing problems once we hit a few TB (on ext3)?

You should probably ask this on the backuppc list.  But note that the
performance issue is not in using backuppc itself; it only shows up
when you try to copy the whole archive by some file-oriented method.
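
To make the scaling problem concrete, here is a minimal sketch of what
any file-oriented copy has to do to preserve hard links.  It is not the
code of rsync -H, cp -a, or backuppc; the function name and layout are
made up for illustration.  The point is the `seen` table: it needs one
entry per multiply-linked inode, and in a hard-link pooled archive that
is nearly every file.

```python
import os
import shutil


def copy_tree_preserving_links(src, dst):
    """Copy src to dst, recreating hard links (illustrative sketch only).

    Returns the number of entries the inode table ended up holding.
    """
    # (st_dev, st_ino) -> first destination path copied for that inode.
    # This table is the scaling problem: it grows with the number of
    # multiply-linked inodes and must be kept for the whole run.
    seen = {}
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = dst if rel == "." else os.path.join(dst, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target_dir, name)
            st = os.lstat(s)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                # Already copied under another name: link, do not copy.
                os.link(seen[key], d)
            else:
                shutil.copy2(s, d, follow_symlinks=False)
                if st.st_nlink > 1:
                    seen[key] = d
    return len(seen)
```

With a few thousand linked files this is harmless.  With many millions
of them the table, plus the seek-heavy traversal needed to stat every
name, is where the time and memory go.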

> We've moved to brackup (http://code.google.com/p/brackup/) for these
> reasons, and are doing nightly backups of 18TB of data quite happily.
> Brackup does fancy chunk-based deduplication (somewhat like git), and so
> avoids the hard link approach entirely.

Brackup looks more like a 'push out a backup from a single host' concept 
as opposed to backuppc's 'pull all backups from many targets to a common 
server with appropriate scheduling' so you'd probably use them in 
different scenarios.  Or did you mean you are backing up backuppc's 
archive with brackup?

The author has announced plans to re-do the storage scheme in backuppc,
but I'm not sure if it will be chunked.  One downside of the current
scheme is that a small change in a big file results in a complete,
separate copy being stored.  The rsync-based transfer will only send the
differences, but the server ends up (like normal rsync) reconstructing a
complete copy of the modified file.
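
A rough sketch of the difference, under some loud assumptions: the
pools are plain dicts, the chunks are fixed-size, and SHA-1 and the
64 KiB chunk size are arbitrary choices of mine.  This is not
backuppc's pool format (which also compresses) and not brackup's actual
chunking; it only shows why pooling by whole-file content stores a
second full copy after a one-byte change while a chunked store adds
one chunk.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # arbitrary size chosen for illustration


def store_whole(pool, data):
    """Pool a file under the hash of its entire content."""
    digest = hashlib.sha1(data).hexdigest()
    pool.setdefault(digest, data)
    return digest


def store_chunked(pool, data, chunk_size=CHUNK_SIZE):
    """Pool a file as fixed-size chunks, each under its own hash."""
    digests = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha1(chunk).hexdigest()
        pool.setdefault(digest, chunk)
        digests.append(digest)
    return digests


def pool_bytes(pool):
    """Total bytes actually held in a pool."""
    return sum(len(blob) for blob in pool.values())


# A 1 MiB "file" made of 16 distinct chunks, then the same file with
# its first byte changed.
original = b"".join(i.to_bytes(4, "big") * (CHUNK_SIZE // 4) for i in range(16))
modified = b"\xff" + original[1:]

whole_pool, chunk_pool = {}, {}
for version in (original, modified):
    store_whole(whole_pool, version)
    store_chunked(chunk_pool, version)

print("whole-file pool:", pool_bytes(whole_pool), "bytes")
print("chunked pool:   ", pool_bytes(chunk_pool), "bytes")
```

Fixed-size chunks only help when a change does not shift the rest of
the file; an insertion near the start would change every later chunk,
which is why real chunking schemes tend to be more elaborate than this.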

-- 
   Les Mikesell
    lesmikesell at gmail.com