On 12/3/2010 4:32 PM, Gavin Carr wrote:
> On Fri, Dec 03, 2010 at 04:07:06PM -0600, Les Mikesell wrote:
>> The backuppc scheme works pretty well in normal usage, but most
>> file-oriented approaches to copy the whole backuppc archive have scaling
>> problems because they have to track all the inodes and names to match up
>> the hard links.
>
> That's been my experience with other hard-link based backup schemes as
> well. For 'normal' sized backups they work pretty well, but for some
> value of 'large' backups the number of inodes and the tree traversal
> time starts to cause real performance problems.
>
> I'd be interested to know how large people's backups are where they're
> still seeing decent performance using approaches like this? I believe we
> started seeing problems once we hit a few TB (on ext3)?

You should probably ask this on the backuppc list. But note that the
performance issue is not with backuppc itself; it only shows up when you
try to copy the whole archive by some file-oriented method (a rough sketch
of why is appended below).

> We've moved to brackup (http://code.google.com/p/brackup/) for these
> reasons, and are doing nightly backups of 18TB of data quite happily.
> Brackup does fancy chunk-based deduplication (somewhat like git), and so
> avoids the hard link approach entirely.

Brackup looks more like a 'push out a backup from a single host' concept,
as opposed to backuppc's 'pull all backups from many targets to a common
server with appropriate scheduling', so you'd probably use them in
different scenarios. Or did you mean you are backing up backuppc's archive
with brackup?

The author has announced plans to redo the storage scheme in backuppc, but
I'm not sure whether it will be chunked. One downside of the current
scheme is that a small change in a big file results in a complete separate
copy being stored. The rsync-based transfer will only send the
differences, but the server ends up (like normal rsync) reconstructing a
complete copy of the modified file (a chunk-storage sketch is also
appended below for comparison).

--
Les Mikesell
lesmikesell at gmail.com
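
P.S. To make the inode-tracking point concrete, here's a rough Python
sketch (purely illustrative, not BackupPC's code or any particular tool's)
of what any file-oriented copy that wants to preserve hard links has to do:

# Minimal sketch (hypothetical): copying a tree while preserving hard links
# means remembering every multiply-linked inode you have already written.
# Symlinks, permissions, sparse files, etc. are ignored for brevity.
import os
import shutil

def copy_preserving_hardlinks(src_root, dst_root):
    seen = {}  # (st_dev, st_ino) -> path already written at the destination
    for dirpath, dirnames, filenames in os.walk(src_root):
        dst_dir = os.path.join(dst_root, os.path.relpath(dirpath, src_root))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            st = os.lstat(src)
            key = (st.st_dev, st.st_ino)
            if st.st_nlink > 1 and key in seen:
                os.link(seen[key], dst)    # re-create the hard link
            else:
                shutil.copy2(src, dst)     # first time this inode is seen
                if st.st_nlink > 1:
                    seen[key] = dst
    # 'seen' ends up with one entry per pooled inode; with tens of millions
    # of files, that map plus the stat() on every path is what hurts.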
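
P.P.S. And for comparison, a rough sketch of chunk-based, content-addressed
storage in the spirit of brackup or git. The chunk size, hash, and on-disk
layout here are made-up assumptions for illustration, not their actual
formats:

# Minimal sketch of chunk-based deduplicated storage (hypothetical layout).
import hashlib
import os

CHUNK_SIZE = 1 << 20  # 1 MiB, arbitrary

def store_file(path, pool_dir):
    """Store 'path' as a list of chunk digests; return that manifest."""
    os.makedirs(pool_dir, exist_ok=True)
    manifest = []
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            digest = hashlib.sha256(chunk).hexdigest()
            chunk_path = os.path.join(pool_dir, digest)
            if not os.path.exists(chunk_path):  # only new content hits disk
                with open(chunk_path, 'wb') as out:
                    out.write(chunk)
            manifest.append(digest)
    return manifest

# A small edit to a big file changes only the chunks it touches, so the next
# backup stores just those chunks instead of a second full copy, and no hard
# links are needed to share identical data between backups.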