Hey, Les,
Thanks for changing the subject to OT.
Les Mikesell wrote:
On Tue, Nov 5, 2013 at 1:28 PM, m.roth@5-cent.us wrote:
As I noted, we make sure rsync uses hard links... but we have a good
number of individual people and projects who *each* have a good number of terabytes of data and generated data. Some of our 2TB drives are
over 90% full, and then there's the honkin' huge RAID, and at least one
14TB partition is over 9TB full....
If you have database dumps or big text files that aren't compressed,
backuppc could be a big win. I think it is the only thing that can keep a compressed copy on the server side and work directly with a stock rsync and uncompressed files on the target hosts (and it can cache the block-checksums so it doesn't have to uncompress and
recompute them every run). While it is 'just a perl script' it's not
quite what you expect from simple scripting...
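For reference, in a stock BackupPC 3.x install those pieces are all plain configuration settings (a sketch; the config path and the exact values shown are assumptions, not a known setup):

    # per-host transfer method, server-side compression level, and rsync
    # arguments all live in config.pl (path assumed; often
    # /etc/BackupPC/config.pl on CentOS)
    grep -E 'XferMethod|CompressLevel|RsyncArgs' /etc/BackupPC/config.pl
    # block-checksum caching is enabled by adding --checksum-seed=32761
    # to $Conf{RsyncArgs} and $Conf{RsyncRestoreArgs}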
We have a *bunch* of d/bs. Oracle. MySQL. Postgresql. All with about a week's dumps from every night, and then backups of them to the b/u servers. I can't imagine how they'd be a win - don't remember just off the top of my head if they're compressed or not.
A *lot* of our data is not huge text files - lots and lots of pure datafiles, output from things like Matlab, R, and some local programs, like the one for modeling protein folding.
mark
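A quick way to check whether the nightly dumps are already compressed (just a sketch; the dump directory here is a placeholder, not the real layout):

    # 'file' reports e.g. "ASCII text" for a plain SQL dump and
    # "gzip compressed data" for a compressed one; path is hypothetical
    file /backup/dumps/* | head
    ls -lh /backup/dumps | head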
On 11/5/2013 12:41 PM, m.roth@5-cent.us wrote:
We have a *bunch* of d/bs. Oracle. MySQL. Postgresql. All with about a week's dumps from every night, and then backups of them to the b/u servers. I can't imagine how they'd be a win - don't remember just off the top of my head if they're compressed or not.
A *lot* of our data is not huge text files - lots and lots of pure datafiles, output from things like Matlab, R, and some local programs, like the one for modeling protein folding.
Lots of binary data files are full of zeros and/or repetitive patterns that compress quite easily.
On Tue, Nov 5, 2013 at 2:41 PM, m.roth@5-cent.us wrote:
Hey, Les,
Thanks for changing the subject to OT.
Errr... I just replied in gmail - I think it has been there all along.
We have a *bunch* of d/bs. Oracle. MySQL. Postgresql. All with about a week's dumps from every night, and then backups of them to the b/u servers. I can't imagine how they'd be a win - don't remember just off the top of my head if they're compressed or not.
If the dumps aren't pre-compressed, they would be compressed on the backuppc side. And if there are unchanged copies on the target hosts (i.e. more than the current night's dumps) that would still be recognized by the rsync run as unchanged, even though backuppc is looking at the compressed copy. If you already compress on the target host, there's not much more you can do.
A *lot* of our data is not huge text files - lots and lots of pure datafiles, output from things like Matlab, R, and some local programs, like the one for modeling protein folding.
Anything that isn't already compressed, encrypted, or intentionally random is likely to compress 2 to 10x. Just poking through the 'compression summary' on my backuppc servers, I don't see anything less than 55%, and most of the bigger targets are closer to 80% compression. One that has 50GB of logfiles is around 90%.
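A rough way to estimate the compression win on one representative data file before committing to anything (a sketch; the sample path is hypothetical):

    # compare raw vs. gzip'd size for a single file
    f=/data/project/sample_output.dat   # hypothetical sample file
    raw=$(wc -c < "$f")
    gz=$(gzip -c "$f" | wc -c)
    awk -v r="$raw" -v g="$gz" 'BEGIN { printf "raw=%d compressed=%d ratio=%.1fx\n", r, g, r/g }'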
Les Mikesell wrote:
On Tue, Nov 5, 2013 at 2:41 PM, m.roth@5-cent.us wrote:
<snip>
We have a *bunch* of d/bs. Oracle. MySQL. Postgresql. All with about a week's dumps from every night, and then backups of them to the b/u servers. I can't imagine how they'd be a win - don't remember just off the top of my head if they're compressed or not.
If the dumps aren't pre-compressed, they would be compressed on the backuppc side. And if there are unchanged copies on the target hosts
Right, but
(i.e. more than the current night's dumps) that would still be recognized by the rsync run as unchanged, even though backuppc is looking at the compressed copy. If you already compress on the target host, there's not much more you can do.
A *lot* of our data is not huge text files - lots and lots of pure datafiles, output from things like Matlab, R, and some local programs, like the one for modeling protein folding.
Anything that isn't already compressed, encrypted, or intentionally random is likely to compress 2 to 10x. Just poking through the 'compression summary' on my backuppc servers, I don't see anything less than 55%, and most of the bigger targets are closer to 80% compression. One that has 50GB of logfiles is around 90%.
Oh, please - I see a filesystem fill up, and I start looking for what did it so suddenly... just the other week, I had one of our interns run Matlab and create a 35G nohup.out in his home directory... which was on the same filesystem as mine, and I was Not Amused when that blew out the filesystem.
Yeah, I know, we're trying to move stuff around, that's not infrequent, given the amount of data my folks generate.
mark
On Tue, Nov 5, 2013 at 3:45 PM, m.roth@5-cent.us wrote:
Yeah, I know, we're trying to move stuff around, that's not infrequent, given the amount of data my folks generate.
And that's the other place that backuppc will help. If you move a file that is already in an existing backup, backuppc's rsync will copy it over the network because it doesn't have a match in that location, but when it goes to add the compressed copy to the pool it will notice that there is already a file with identical content there and use a hardlink instead of needing additional space.
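The pooling idea itself is simple to sketch in shell (this is only a conceptual illustration of content-based pooling via hard links, not BackupPC's actual code, which also compresses pool files and handles hash collisions):

    # one copy per unique content in pool/, named by hash;
    # each backup tree hard-links into it (must be on the same filesystem)
    mkdir -p pool backups/2013-11-05
    store() {
        src=$1; dest=$2
        sum=$(md5sum "$src" | cut -d' ' -f1)
        [ -e "pool/$sum" ] || cp "$src" "pool/$sum"
        ln "pool/$sum" "$dest"
    }
    store incoming/bigfile.dat backups/2013-11-05/bigfile.dat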
Les Mikesell wrote:
On Tue, Nov 5, 2013 at 3:45 PM, m.roth@5-cent.us wrote:
Yeah, I know, we're trying to move stuff around, that's not infrequent, given the amount of data my folks generate.
And that's the other place that backuppc will help. If you move a file that is already in an existing backup, backuppc's rsync will copy it over the network because it doesn't have a match in that location, but when it goes to add the compressed copy to the pool it will notice that there is already a file with identical content there and use a hardlink instead of needing additional space.
Um, but rsync will already do that. Anyway, when I say move things, I mean whole backups to a less-full drive, or the much rarer times that we need to move a user who's using a *large* amount of space.
mark
On Tue, Nov 5, 2013 at 4:25 PM, m.roth@5-cent.us wrote:
Yeah, I know, we're trying to move stuff around, that's not infrequent, given the amount of data my folks generate.
And that's the other place that backuppc will help. If you move a file that is already in an existing backup, backuppc's rsync will copy it over the network because it doesn't have a match in that location, but when it goes to add the compressed copy to the pool it will notice that there is already a file with identical content there and use a hardlink instead of needing additional space.
Um, but rsync will already do that.
No, rsync itself will only do it when the identical file is still in the identical path from the identical host.
Anyway, when I say move things, I mean whole backups to a less-full drive, or the much rarer times that we need to move a user who's using a *large* amount of space.
Backuppc will match up identical content, no matter where it finds it. If it is a different copy or moved to a different location it does have to transfer it to the backuppc server, but then it will be discarded and replaced with a link to the existing pooled copy.
Les Mikesell wrote:
On Tue, Nov 5, 2013 at 4:25 PM, m.roth@5-cent.us wrote:
Yeah, I know, we're trying to move stuff around, that's not infrequent, given the amount of data my folks generate.
And that's the other place that backuppc will help. If you move a file that is already in an existing backup, backuppc's rsync will copy it over the network because it doesn't have a match in that location, but when it goes to add the compressed copy to the pool it will notice that there is already a file with identical content there and use a hardlink instead of needing additional space.
Um, but rsync will already do that.
No, rsync itself will only do it when the identical file is still in the identical path from the identical host.
Unless you tell it a path to compare to - as I said, we point it to <backupdirectory>/<symlink "latest">.
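Something along these lines (a minimal sketch; the host name, paths, and the name of the 'latest' symlink are placeholders rather than the actual script):

    # hard-link files that are unchanged since the previous run,
    # then repoint 'latest' at the new snapshot
    today=$(date +%F)
    rsync -aH --delete \
          --link-dest=/backup/host1/latest \
          host1:/home/ "/backup/host1/$today/" \
      && ln -snf "/backup/host1/$today" /backup/host1/latest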
Anyway, when I say move things, I mean whole backups to a less-full drive, or the much rarer times that we need to move a user who's using a *large* amount of space.
Backuppc will match up identical content, no matter where it finds it. If it is a different copy or moved to a different location it does have to transfer it to the backuppc server, but then it will be discarded and replaced with a link to the existing pooled copy.
Right. Moving things, though, for us is manual, esp. since it can sometimes take days (like the 700+G I've been trying to copy from a 3TB drive that was defective to another that seems ok...)
mark
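For a multi-day copy like that, driving it with rsync instead of cp makes it restartable if the flaky drive acts up (a sketch; the mount points are placeholders):

    # -aH preserves attributes and hard links; --partial keeps partially
    # transferred files so an interrupted run resumes instead of starting over
    rsync -aH --partial --progress /mnt/old_3tb/ /mnt/new_3tb/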
On Tue, Nov 5, 2013 at 4:42 PM, m.roth@5-cent.us wrote:
Backuppc will match up identical content, no matter where it finds it. If it is a different copy or moved to a different location it does have to transfer it to the backuppc server, but then it will be discarded and replaced with a link to the existing pooled copy.
Right. Moving things, though, for us is manual, esp. since it can sometimes take days (like the 700+G I've been trying to copy from a 3TB drive that was defective to another that seems ok...)
But even little automated things like logfile rotation can add up when you catch it across a bunch of noisy hosts. You don't really need to store the whole contents of yesterday's messages.1 and today's messages.2 separately when they are the same thing, just renamed.
Les Mikesell wrote:
On Tue, Nov 5, 2013 at 4:42 PM, m.roth@5-cent.us wrote:
Backuppc will match up identical content, no matter where it finds it. If it is a different copy or moved to a different location it does have to transfer it to the backuppc server, but then it will be discarded and replaced with a link to the existing pooled copy.
Right. Moving things, though, for us is manual, esp. since it can sometimes take days (like the 700+G I've been trying to copy from a 3TB drive that was defective to another that seems ok...)
But even little automated things like logfile rotation can add up when you catch it across a bunch of noisy hosts. You don't really need to store the whole contents of yesterday's messages.1 and today's messages.2 separately when they are the same thing, just renamed.
We don't back them up, except for /var/log on the central logging host.
But to return to the first para, there's no identical content. There's similar content on development and prod servers for each team, but that's not identical, so it's really not an issue.
mark
On Wed, Nov 6, 2013 at 8:34 AM, m.roth@5-cent.us wrote:
But even little automated things like logfile rotation can add up when you catch it across a bunch of noisy hosts. You don't really need to store the whole contents of yesterday's messages.1 and today's messages.2 separately when they are the same thing, just renamed.
We don't back them up, except for /var/log on the central logging host.
Are they rotated by renaming there?
But to return to the first para, there's no identical content. There's similar content on development and prod servers for each team, but that's not identical, so it's really not an issue.
If the data is compressible, you'd still likely get 2x+ space savings from compression on the backup server side. If the data sets are something like time-series data that just change as additional samples are added, it might be worth working out a scheme to chunk it up so that only the 'current' time range changes and all of the historic instances stay identical.
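Such a scheme can be as simple as having the data producers append to month-named chunks, so only the current chunk ever differs between backup runs (a sketch; 'collect_samples' and the layout are hypothetical):

    # older chunks never change, so the backup pool keeps a single copy of each
    outdir=/data/project/timeseries
    mkdir -p "$outdir"
    collect_samples >> "$outdir/samples-$(date +%Y-%m).csv"   # hypothetical producer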
-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Les Mikesell
Sent: 5 November 2013 22:10
To: CentOS mailing list
Subject: Re: [CentOS] [OT] Building a new backup server
Thanks for changing the subject to OT.
Errr... I just replied in gmail - I think it has been there all along.
I did it from the beginning - I wasn't sure if this topic was strictly CentOS-related.
-- //Sorin