CentOS and LessFS

...
We have been looking at implementing deduplication on a

backup server.

...
If not LessFS can you suggest an alternate deduplication software?

http://openindiana.org/ Solaris 11 Express http://www.freebsd.org/releases/9.0R/announce.html

These being different OSs would not be viable for us as we need to maintain RHEL compatibility.

...

(ZFS pool version >= 28)

This looks promising but the latest Linux version (0.7.0) only has pool version 23. I will check this out further.

Thanks for your response.

Regards, Hugh

-- Hugh E Cruickshank, Forward Software, www.forward-software.com

Les Mikesell

10:24 a.m.

On Mon, Jan 16, 2012 at 5:50 PM, Hugh E Cruickshank hugh@forsoft.com wrote:

...

We have been looking at implementing deduplication on a backup server.

...
From what I have been able to find the available documentation is

pretty thin. I ended up trying to install LessFS on this CentOS 5.7 box but we have now encountered problems with fuse version.

Has anyone out there been able to get LessFS running on CentOS 5.7 and can provide some pointers?

If not LessFS can you suggest an alternate deduplication software?

Backuppc dedups (and compresses) at the file level using hardlinks. Not quite as effective as a block level if you have frequent small changes in large files, but still very good with no unusual filesystem requirements other than keeping the whole archive on one filesystem. It will link all identical content, whether from the same or different systems and it's rsync implementation can work with local compressed copies while chatting with a stock remote version.

-- Les Mikesell lesmikesell@gmail.com

Hugh E Cruickshank

10:56 a.m.

From: Les Mikesell Sent: January 16, 2012 20:55

...

On Mon, Jan 16, 2012 at 5:50 PM, Hugh E Cruickshank wrote:

...
If not LessFS can you suggest an alternate deduplication software?

Backuppc dedups (and compresses) at the file level using hardlinks. Not quite as effective as a block level if you have frequent small changes in large files, but still very good with no unusual filesystem requirements other than keeping the whole archive on one filesystem. It will link all identical content, whether from the same or different systems and it's rsync implementation can work with local compressed copies while chatting with a stock remote version.

Hi Les:

Trust you to always come up with an interesting suggestion or two. I will have a further look at this but, on first blush, I do not think that this will be very effective in our environment. We will be backing up several small databases 1-8 GB each along with the related programs from our development system, out users home directories which include their Outlook PST files, Word/Excel files, etc. While the compression should work for all files I can not see the dedup working for much beyond the Word/Excel files. We will definitely have a look at it.

Thanks for you suggestion.

Regards, Hugh

-- Hugh E Cruickshank, Forward Software, www.forward-software.com

John R Pierce

11:15 a.m.

On 01/16/12 9:26 PM, Hugh E Cruickshank wrote:

...

Trust you to always come up with an interesting suggestion or two. I will have a further look at this but, on first blush, I do not think that this will be very effective in our environment. We will be backing up several small databases 1-8 GB each along with the related programs from our development system, out users home directories which include their Outlook PST files, Word/Excel files, etc. While the compression should work for all files I can not see the dedup working for much beyond the Word/Excel files. We will definitely have a look at it.

I hope you know, dedup systems rarely scale well, as the corpus of files get bigger and bigger, they can really grind to a halt.

-- john r pierce N 37, W 122 santa cruz ca mid-left coast

Hugh E Cruickshank

18 Jan 18 Jan

1:54 a.m.

From: John R Pierce Sent: January 16, 2012 21:45

...

I hope you know, dedup systems rarely scale well, as the corpus of files get bigger and bigger, they can really grind to a halt.

Thanks, I have read that but I have not seen any quantitative qualifications on this so I was planning on doing some testing to see if our requirements would be practical or not.

Regards, Hugh

-- Hugh E Cruickshank, Forward Software, www.forward-software.com

Les Mikesell

17 Jan 17 Jan

7:26 p.m.

On Mon, Jan 16, 2012 at 11:26 PM, Hugh E Cruickshank hugh@forsoft.com wrote:

...

...
...
If not LessFS can you suggest an alternate deduplication software?

Backuppc dedups (and compresses) at the file level using hardlinks.

Trust you to always come up with an interesting suggestion or two. I will have a further look at this but, on first blush, I do not think that this will be very effective in our environment. We will be backing up several small databases 1-8 GB each along with the related programs from our development system, out users home directories which include their Outlook PST files, Word/Excel files, etc. While the compression should work for all files I can not see the dedup working for much beyond the Word/Excel files. We will definitely have a look at it.

Big disks are cheap these days - I wouldn't worry that much about the total space that much and you'll still be able to keep a lot online. The db's are probably best handled in a pre-backup script that dumps/compresses them, then excluding the live files - and then even block de-dup won't help. Pst's are a problem any way you look at them but more because of Outlook's locking than their size. Backuppc is packaged in EPEL so it's easy to install and shows the compression and file re-use stats so you'll know in a few runs how it will handle your data.

-- Les Mikesell lesmikesell@gmail.com

Hugh E Cruickshank

18 Jan 18 Jan

2:30 a.m.

From: Les Mikesell Sent: January 17, 2012 05:56

...

Big disks are cheap these days - I wouldn't worry that much about the total space that much and you'll still be able to keep a lot online.

This is true for current hardware however I am attempting to reuse our existing hardware that has been pulled from our production systems. It tends to be older technology but still usable. In this case, it is a set of disk arrays using SCSI3 drives.

...

The db's are probably best handled in a pre-backup script that dumps/compresses them, then excluding the live files - and then even block de-dup won't help. Pst's are a problem any way you look at them but more because of Outlook's locking than their size. Backuppc is packaged in EPEL so it's easy to install and shows the compression and file re-use stats so you'll know in a few runs how it will handle your data.

While all of this is true I was kind of hoping that I could come up with something that was more "plug and play". The LessFS looked promising. I will continue to check this concept out further (be it LessFS, ZFS, or something else) but I am going to be avoiding the bleeding edge and can only afford to spend a limited amount of time chasing this down before I have to bite the bullet and go with what we have.

Thanks again of your feedback and to all the others who have responded. Everyone's comments have been greatly appreciated.

Regards, Hugh

-- Hugh E Cruickshank, Forward Software, www.forward-software.com

John R Pierce

2:47 a.m.

On 01/17/12 1:00 PM, Hugh E Cruickshank wrote:

...

From: Les Mikesell Sent: January 17, 2012 05:56

...
...
Big disks are cheap these days - I wouldn't worry that much about the total space that much and you'll still be able to keep a lot online.

This is true for current hardware however I am attempting to reuse our existing hardware that has been pulled from our production systems. It tends to be older technology but still usable. In this case, it is a set of disk arrays using SCSI3 drives.

penny wise, and pound foolish comes to mind here. that older server probably has 1-2 single core processors, too, right? a 2 socket modern 2U could virtualize a dozen of those and outperform each one.

-- john r pierce N 37, W 122 santa cruz ca mid-left coast

Hugh E Cruickshank

3:15 a.m.

From: John R Pierce Sent: January 17, 2012 13:17

...

penny wise, and pound foolish comes to mind here. that older server probably has 1-2 single core processors, too, right? a 2 socket modern 2U could virtualize a dozen of those and outperform each one.

This may be true in your environment but I have hardware that is capable of doing the job that I am looking for so why should I buy new hardware? I would never get approval for the purchase because there is no way that I could justify the expenditure.

Regards, Hugh

-- Hugh E Cruickshank, Forward Software, www.forward-software.com

Les Mikesell

5:06 a.m.

On Tue, Jan 17, 2012 at 3:00 PM, Hugh E Cruickshank hugh@forsoft.com wrote:

...

...
Big disks are cheap these days - I wouldn't worry that much about the total space that much and you'll still be able to keep a lot online.

This is true for current hardware however I am attempting to reuse our existing hardware that has been pulled from our production systems. It tends to be older technology but still usable. In this case, it is a set of disk arrays using SCSI3 drives.

If they have a backplane and hotswap bays you'd have to use an external case, but stuff in a sata controller and move on.

...

...
The db's are probably best handled in a pre-backup script that dumps/compresses them, then excluding the live files - and then even block de-dup won't help. Pst's are a problem any way you look at them but more because of Outlook's locking than their size. Backuppc is packaged in EPEL so it's easy to install and shows the compression and file re-use stats so you'll know in a few runs how it will handle your data.

While all of this is true I was kind of hoping that I could come up with something that was more "plug and play".

If you haven't used backuppc, try it. Other than setting up the ssh keys it is as easy as it gets. There are even web forms where you can fill in the pre/post backup scripts - and you aren't going to get reliable database snapshots without them using any system.

...

The LessFS looked promising. I will continue to check this concept out further (be it LessFS, ZFS, or something else) but I am going to be avoiding the bleeding edge and can only afford to spend a limited amount of time chasing this down before I have to bite the bullet and go with what we have.

I wouldn't trust any of the software block-dedup systems with my only copy of something important - plus they need a lot of RAM which your old systems probably don't have either.

-- Les Mikesell lesmikesell@gmail.com

Nataraj

6:11 a.m.

On 01/17/2012 03:36 PM, Les Mikesell wrote:

...

I wouldn't trust any of the software block-dedup systems with my only copy of something important - plus they need a lot of RAM which your old systems probably don't have either.

I am interested in backuppc, however from what I read online it appears that zfs is a very featureful robust high performance filesystem that is heavily used in production environments. It has features that allow you to specify that if the reference count for a block goes above certain levels it should keep two or three copies of that block and that could be on separate storage devices within the pool. It also supports compression. With backuppc deduplication, your still hosed if your only copy of the file goes bad. Why should block level deduplication be any worse than file level deduplication?

Furthermore, zfs has very high redundancy and recovery ability for the internal filesystem data structures. Here's a video describing ZFS's deduplication implementation: http://blogs.oracle.com/video/entry/zfs_dedup

At this point I am only reading the experience of others, but I am inclined to try it. I backup a mediawiki/mysql database and the new records are added to the database largely by appending. Even with compression, it's a pain to backup the whole thing every day. Block level dedup seems like it would be a good solution for that.

I'm not a big fan of Oracle, but from a technical standpoint zfs sounds quite good. I'm thinking of trying it on my laptop, because it's supposed to work well for storing things like virtual machines, and if a decent implementation runs on CentOS, Why not?

Les, do you run backuppc on ext3 or ext4 filesystems? I remember a while back, someone saying that a filesystem with more inodes was required for substantial backuppc deployment.

Nataraj

John R Pierce

6:29 a.m.

On 01/17/12 4:41 PM, Nataraj wrote:

...

On 01/17/2012 03:36 PM, Les Mikesell wrote:

...
...
I wouldn't trust any of the software block-dedup systems with my only copy of something important - plus they need a lot of RAM which your old systems probably don't have either.

I am interested in backuppc, however from what I read online it appears that zfs is a very featureful robust high performance filesystem that is heavily used in production environments.

ZFS is very memory intensive on larger file systems. I believe they recommend on the order of 1GB ram per terabyte of storage for decent performance.

Personally, I would only run ZFS for any sort of production application on a Solaris 10/11 system where its natively supported, and then only with a support contract from Oracle.

When its good, its very good, when its bad, its reformat and restore from backup time...

-- john r pierce N 37, W 122 santa cruz ca mid-left coast

Nataraj

8:59 a.m.

On 01/17/2012 04:59 PM, John R Pierce wrote:

...

On 01/17/12 4:41 PM, Nataraj wrote:

...
On 01/17/2012 03:36 PM, Les Mikesell wrote:

...
...
I wouldn't trust any of the software block-dedup systems with my only copy of something important - plus they need a lot of RAM which your old systems probably don't have either.

I am interested in backuppc, however from what I read online it appears that zfs is a very featureful robust high performance filesystem that is heavily used in production environments.

ZFS is very memory intensive on larger file systems. I believe they recommend on the order of 1GB ram per terabyte of storage for decent performance.

I think that is not so unreasonable for the features you are getting. I wonder if it would be possible to put the file system data structures on an SSD? I also have read that it is a good idea to use ECC memory on such a fileserver, but that's really true of any computer. Undetected memory errors will cause data loss.

...

Personally, I would only run ZFS for any sort of production application on a Solaris 10/11 system where its natively supported, and then only with a support contract from Oracle.

I am inclined to agree. If I was setting it up for a serious production environment, I would bite the bullet and run Solaris as well.

...

When its good, its very good, when its bad, its reformat and restore from backup time...

We'll maybe I'll live with backuppc for now.

Nataraj

Les Mikesell

9:01 a.m.

On Tue, Jan 17, 2012 at 6:41 PM, Nataraj incoming-centos@rjl.com wrote:

...

...
I wouldn't trust any of the software block-dedup systems with my only copy of something important - plus they need a lot of RAM which your old systems probably don't have either.

I am interested in backuppc, however from what I read online it appears that zfs is a very featureful robust high performance filesystem that is heavily used in production environments. It has features that allow you to specify that if the reference count for a block goes above certain levels it should keep two or three copies of that block and that could be on separate storage devices within the pool. It also supports compression.

It's probably fine on Solaris where it has had years of development and testing. But I don't expect the linux ports to be very mature yet - hence the lack of trust.

...

With backuppc deduplication, your still hosed if your only copy of the file goes bad. Why should block level deduplication be any worse than file level deduplication?

Nothing will fix a file if the disk underneath goes bad and you aren't running raid. And in my case I run raid1 and regularly swap disks out for offsite copies and resync. But, backuppc makes the links based on an actual comparison, so if an old copy is somehow corrupted, the next full will be stored separately, not linked.

...

Furthermore, zfs has very high redundancy and recovery ability for the internal filesystem data structures. Here's a video describing ZFS's deduplication implementation: http://blogs.oracle.com/video/entry/zfs_dedup

I agree that the design sounds good and I'd probably be using it if I used solaris - or maybe even the freebsd.

...

At this point I am only reading the experience of others, but I am inclined to try it. I backup a mediawiki/mysql database and the new records are added to the database largely by appending. Even with compression, it's a pain to backup the whole thing every day. Block level dedup seems like it would be a good solution for that.

You are still going to have to go through the motions of copying the whole thing and letting the receiving filesystem do hash comparisons on each block to accomplish the dedup.

...

Les, do you run backuppc on ext3 or ext4 filesystems? I remember a while back, someone saying that a filesystem with more inodes was required for substantial backuppc deployment.

That really depends on the size of the files you back up and how much churn there is in the history you keep. I wouldn't expect it to be a problem unless you have a lot of users with big maildir type directories. Eons ago when I used it with smaller drives and the alternative was ext2 I used reiserfs, but more recently I just use ext3 (and 4 in the newest setup) with the defaults. Some people on the backuppc mail list prefer xfs, though.

-- Les Mikesell lesmikesell@gmail.com > > > Nataraj > > _______________________________________________ > CentOS mailing list > CentOS@centos.org > http://lists.centos.org/mailman/listinfo/centos

Nataraj

9:13 a.m.

On 01/17/2012 07:31 PM, Les Mikesell wrote:

...

Nothing will fix a file if the disk underneath goes bad and you aren't running raid. And in my case I run raid1 and regularly swap disks out for offsite copies and resync. But, backuppc makes the links based on an actual comparison, so if an old copy is somehow corrupted, the next full will be stored separately, not linked.

ZFS has an option to turn on full data comparison instead of just checksums.

...

...
At this point I am only reading the experience of others, but I am inclined to try it. I backup a mediawiki/mysql database and the new records are added to the database largely by appending. Even with compression, it's a pain to backup the whole thing every day. Block level dedup seems like it would be a good solution for that.

You are still going to have to go through the motions of copying the whole thing and letting the receiving filesystem do hash comparisons on each block to accomplish the dedup.

I'm not sure about that. They support deduplication over the network. There is a command somethink like 'zfs send', but maybe it requires that the filesystem you are backing up is also zfs.

Nataraj

Les Mikesell

9:32 a.m.

On Tue, Jan 17, 2012 at 9:43 PM, Nataraj incoming-centos@rjl.com wrote:

...

...
...
At this point I am only reading the experience of others, but I am inclined to try it. I backup a mediawiki/mysql database and the new records are added to the database largely by appending. Even with compression, it's a pain to backup the whole thing every day. Block level dedup seems like it would be a good solution for that.

You are still going to have to go through the motions of copying the whole thing and letting the receiving filesystem do hash comparisons on each block to accomplish the dedup.

I'm not sure about that. They support deduplication over the network. There is a command somethink like 'zfs send', but maybe it requires that the filesystem you are backing up is also zfs.

Yes, you can make a filesystem snapshot on zfs and do an incremental 'send' to a remote copy of the previous snapshot where the receive operation will merge the changed blocks. That does sound efficient in terms of bandwidth, but would require a one-to-one setup for every filesystem you want to back up, and I'm not sure what kind of contortions it takes to get the whole snapshot back and revert it to the live filesystem. If you run backuppc over low bandwidth connections you might come out ahead copying an uncompressed database dump with rsync as the transport because it may match up some existing data and avoid the network hop. However, the way backuppc works if the file has changed at all, the server side will end up reconstructing the whole file and saving a complete new copy. On a fast local connection you are probably better off compressing the db dump (and they usually compress a lot) and letting it copy the whole thing.

-- Les Mikesell lesmikesell@gmail.com

Ross Walker

22 Jan 22 Jan

11:30 p.m.

On Jan 17, 2012, at 4:00 PM, "Hugh E Cruickshank" hugh@forsoft.com wrote:

...

From: Les Mikesell Sent: January 17, 2012 05:56

...
Big disks are cheap these days - I wouldn't worry that much about the total space that much and you'll still be able to keep a lot online.

This is true for current hardware however I am attempting to reuse our existing hardware that has been pulled from our production systems. It tends to be older technology but still usable. In this case, it is a set of disk arrays using SCSI3 drives.

...
The db's are probably best handled in a pre-backup script that dumps/compresses them, then excluding the live files - and then even block de-dup won't help. Pst's are a problem any way you look at them but more because of Outlook's locking than their size. Backuppc is packaged in EPEL so it's easy to install and shows the compression and file re-use stats so you'll know in a few runs how it will handle your data.

While all of this is true I was kind of hoping that I could come up with something that was more "plug and play". The LessFS looked promising. I will continue to check this concept out further (be it LessFS, ZFS, or something else) but I am going to be avoiding the bleeding edge and can only afford to spend a limited amount of time chasing this down before I have to bite the bullet and go with what we have.

Thanks again of your feedback and to all the others who have responded. Everyone's comments have been greatly appreciated.

If this is only a 1-2 year temporary solution and the backups will be discarded once a permanent solution is obtained then I'm sure it will be OK.

If your thinking of building a long-term backup solution this way then your building your castles on a foundation of sand. As backup sets grow and hardware/software ages you may find yourself in a technological dead-end unable to migrate the data off and unable to continue going forward.

If it is such an essential thing as backups (it's backup data right not redundant systems?) then I suggest telling the client to open their wallet cause when the shit hits the fan you either have solid backups or you have bankruptcy courts.

Buy a Data Domain, Exagrid or Falconstor backup storage appliance with builtin compression/de-duplication that is fully supported and has a viable upgrade path. Use a good centralized backup platform such as netbackup, networker, etc. The investment made in backup is an investment in the business' future.

-Ross

Les Mikesell

23 Jan 23 Jan

2:21 a.m.

On Sun, Jan 22, 2012 at 12:00 PM, Ross Walker rswwalker@gmail.com wrote:

...

...
If this is only a 1-2 year temporary solution and the backups will be discarded once a permanent solution is obtained then I'm sure it will be OK.

If your thinking of building a long-term backup solution this way then your building your castles on a foundation of sand. As backup sets grow and hardware/software ages you may find yourself in a technological dead-end unable to migrate the data off and unable to continue going forward.

On the other hand, if you have a predictable churn of high performance production boxes being replaced every few years, tossing a few new big cheap drives into a still nice but retired server and starting over is a very attractive option. You don't need to migrate anything - just keep the old box around until the replacement has the history you need to keep.

...

Buy a Data Domain, Exagrid or Falconstor backup storage appliance with builtin compression/de-duplication that is fully supported and has a viable upgrade path. Use a good centralized backup platform such as netbackup, networker, etc. The investment made in backup is an investment in the business' future.

There's a place for those, but probably not for someone who doesn't even want to buy new drives.

-- Les Mikesell lesmikesell@gmail.com

David Hrbáč

17 Jan 17 Jan

12:25 p.m.

Dne 17.1.2012 0:50, Hugh E Cruickshank napsal(a):

...

Hi All:

We have been looking at implementing deduplication on a backup server.

...
From what I have been able to find the available documentation is

pretty thin. I ended up trying to install LessFS on this CentOS 5.7 box but we have now encountered problems with fuse version.

Has anyone out there been able to get LessFS running on CentOS 5.7 and can provide some pointers?

If not LessFS can you suggest an alternate deduplication software?

TIA

Regards, Hugh

Hi Hugh, I've got something in my repo http://fs12.vsb.cz/hrb33/el5/hrb/stable/i386/repoview/fuse-lessfs.html. Might be somewhat outdated. You can try it and we can build new versions. As to alternatives I'm happy with rdiff-backup. DH

Hugh E Cruickshank

18 Jan 18 Jan

1:59 a.m.

From: David Hrbác Sent: January 16, 2012 22:55

...

I've got something in my repo http://fs12.vsb.cz/hrb33/el5/hrb/stable/i386/repoview/fuse-les

sfs.html.

...

Might be somewhat outdated. You can try it and we can build new versions. As to alternatives I'm happy with rdiff-backup.

Hi David:

Both suggestions look interesting and we will check them both out.

Thanks, Hugh

-- Hugh E Cruickshank, Forward Software, www.forward-software.com

Nataraj

17 Jan 17 Jan

1:25 p.m.

On 01/16/2012 03:50 PM, Hugh E Cruickshank wrote:

...

Hi All:

We have been looking at implementing deduplication on a backup server.

...
From what I have been able to find the available documentation is

pretty thin. I ended up trying to install LessFS on this CentOS 5.7 box but we have now encountered problems with fuse version.

Has anyone out there been able to get LessFS running on CentOS 5.7 and can provide some pointers?

If not LessFS can you suggest an alternate deduplication software?

TIA

Regards, Hugh

The ZFSonlinux project from LLNL looks promising (native mode kernel implementation, pool version 28), although the version that supports mountable filesystems is still in the RC stage. I would want some solid testing before deploying in a backup system.

http://zfsonlinux.org/

Nataraj

Hugh E Cruickshank

18 Jan 18 Jan

1:59 a.m.

From: Nataraj Sent: January 16, 2012 23:56

...

The ZFSonlinux project from LLNL looks promising (native mode kernel implementation, pool version 28), although the version that supports mountable filesystems is still in the RC stage. I would want some solid testing before deploying in a backup system.

http://zfsonlinux.org/

Hi Nataraj:

Thanks. I had not seen this one. It does look more promising than the zfs-fuse package.

Regards, Hugh

-- Hugh E Cruickshank, Forward Software, www.forward-software.com

Ljubomir Ljubojevic

4:06 a.m.

On 01/17/2012 09:29 PM, Hugh E Cruickshank wrote:

...

From: Nataraj Sent: January 16, 2012 23:56

...
The ZFSonlinux project from LLNL looks promising (native mode kernel implementation, pool version 28), although the version that supports mountable filesystems is still in the RC stage. I would want some solid testing before deploying in a backup system.

http://zfsonlinux.org/

Hi Nataraj:

Thanks. I had not seen this one. It does look more promising than the zfs-fuse package.

As much as I could deduce, Btrfs outperforms ZFS, and it is at the moment only missing btrfsck (in development). And it supports (almost) all features.

I was really hot for ZFS, but I have seen one thorough test with various sizes of data and in some cases Btrfs outperformed ZFS, but I cleaned my Firefox cache and history for the first time in at least a year :( and I can not find it now.

Btrfs is pushed and sponsored by Oracle, for their uses, and since ZFS is also theirs, I guess they will implement all ZFS's good featuries.

-- Ljubomir Ljubojevic (Love is in the Air) PL Computers Serbia, Europe Google is the Mother, Google is the Father, and traceroute is your trusty Spiderman... StarOS, Mikrotik and CentOS/RHEL/Linux consultant

Nataraj

6:16 a.m.

On 01/17/2012 02:36 PM, Ljubomir Ljubojevic wrote:

...

On 01/17/2012 09:29 PM, Hugh E Cruickshank wrote:

...
From: Nataraj Sent: January 16, 2012 23:56

...
The ZFSonlinux project from LLNL looks promising (native mode kernel implementation, pool version 28), although the version that supports mountable filesystems is still in the RC stage. I would want some solid testing before deploying in a backup system.

http://zfsonlinux.org/

Hi Nataraj:

Thanks. I had not seen this one. It does look more promising than the zfs-fuse package.

As much as I could deduce, Btrfs outperforms ZFS, and it is at the moment only missing btrfsck (in development). And it supports (almost) all features.

I was really hot for ZFS, but I have seen one thorough test with various sizes of data and in some cases Btrfs outperformed ZFS, but I cleaned my Firefox cache and history for the first time in at least a year :( and I can not find it now.

Btrfs is pushed and sponsored by Oracle, for their uses, and since ZFS is also theirs, I guess they will implement all ZFS's good featuries.

Is btrfs widely deployed and running solidly in production environments? I thought the dedup code for btrfs was still a bunch of patches that had to be applied and not in the mainstream implementation yet. The LLNL zfs port is a loadable kernel module.

Nataraj

Ljubomir Ljubojevic

6:24 a.m.

On 01/18/2012 01:46 AM, Nataraj wrote:

...

On 01/17/2012 02:36 PM, Ljubomir Ljubojevic wrote:

...
On 01/17/2012 09:29 PM, Hugh E Cruickshank wrote:

...
From: Nataraj Sent: January 16, 2012 23:56

...
The ZFSonlinux project from LLNL looks promising (native mode kernel implementation, pool version 28), although the version that supports mountable filesystems is still in the RC stage. I would want some solid testing before deploying in a backup system.

http://zfsonlinux.org/

Hi Nataraj:

Thanks. I had not seen this one. It does look more promising than the zfs-fuse package.

As much as I could deduce, Btrfs outperforms ZFS, and it is at the moment only missing btrfsck (in development). And it supports (almost) all features.

I was really hot for ZFS, but I have seen one thorough test with various sizes of data and in some cases Btrfs outperformed ZFS, but I cleaned my Firefox cache and history for the first time in at least a year :( and I can not find it now.

Btrfs is pushed and sponsored by Oracle, for their uses, and since ZFS is also theirs, I guess they will implement all ZFS's good featuries.

Is btrfs widely deployed and running solidly in production environments? I thought the dedup code for btrfs was still a bunch of patches that had to be applied and not in the mainstream implementation yet. The LLNL zfs port is a loadable kernel module.

Nataraj

No, Btrfs is still not production worthy. But ZFS is not either. It is still missing a lot of stuff, and stability??? I do not think so (This is only what I have read about it). I should have been more clear, I think Btrfs will much faster reach it's goal, since both Oracle and (Red Hat) want it as their default FS, as soon as possible.

At the moment, if you want ZFS you better install Solaris.

Lars Hecking

17 Jan 17 Jan

3:20 p.m.

Hugh E Cruickshank writes:

...

Hi All:

We have been looking at implementing deduplication on a backup server. From what I have been able to find the available documentation is pretty thin. I ended up trying to install LessFS on this CentOS 5.7 box but we have now encountered problems with fuse version.

Maybe try CentOS6. We've had numerous fuse issues with other software on CentOS5 and one recommendation was to use a newer kernel, which essentially means a newer distro.

Hugh E Cruickshank

18 Jan 18 Jan

2:10 a.m.

From: Lars Hecking Sent: January 17, 2012 01:51

...

Maybe try CentOS6. We've had numerous fuse issues with other software on CentOS5 and one recommendation was to use a newer kernel, which essentially means a newer distro.

I had considered this but I have been avoiding it. All our production servers are currently running RHEL5 and I have been specifically using CentOS5 on all our backup and development systems in order to maintain as much consistency between servers as possible.

Later this year or early next year we will replacing all our production servers and use the latest RHEL available at the time (probably RHEL6). We will then look at upgrading all the backup and development servers to the corresponding CentOS version (CentOS6?).

Regards, Hugh

-- Hugh E Cruickshank, Forward Software, www.forward-software.com

Les Mikesell

2:19 a.m.

On Tue, Jan 17, 2012 at 2:40 PM, Hugh E Cruickshank hugh@forsoft.com wrote:

...

Later this year or early next year we will replacing all our production servers and use the latest RHEL available at the time (probably RHEL6). We will then look at upgrading all the backup and development servers to the corresponding CentOS version (CentOS6?).

Don't you usually get some experience with things on the development side first?

-- Les Mikesell lesmikesell@gmail.com

Adam Tauno Williams

23 Jan 23 Jan

7:25 a.m.

On Mon, 2012-01-16 at 15:50 -0800, Hugh E Cruickshank wrote:

...

We have been looking at implementing deduplication on a backup server.

...
From what I have been able to find the available documentation is

pretty thin. I ended up trying to install LessFS on this CentOS 5.7 box but we have now encountered problems with fuse version Has anyone out there been able to get LessFS running on CentOS 5.7 and can provide some pointers If not LessFS can you suggest an alternate deduplication software?

SFDS / openDEDUPE http://wmmi.net/documents/OpenDedup.pdf http://www.opendedup.org/

-- Adam Tauno Williams http://www.whitemiceconsulting.com System Administrator, OpenGroupware Developer, LPI / CNA Fingerprint 8C08 209A FBE3 C41A DD2F A270 2D17 8FA4 D95E D383

Lists

24 Jan 24 Jan

12:26 p.m.

This thread has been beat to death, so perhaps my $0.02 isn't so meaningful, but I wrote a set of rsync scripts in php that I've used for years to manage terabytes of backups going back years of time. It's called TINBackupBuddy and you can get it at http://www.effortlessis.com/thisisnotbackupbuddy/ - it is a set of scripts that allow you to manage and back up numerous hosts, called via cron on a regular, graceful failure basis, via rsync. It de-duplicates files that have not changed between backup sets, so depending on the churn on your servers, you can get an astonishing number of backups onto a single drive...

I've managed backups for a rather large cluster (now over 200 schools and school districts) of data automatically, on a 24 hour basis using these scripts, for years, so they really do work. And for our development team, we recover from these backups in order to replicate reported issues, so these backups are verified numerous times per day.

Get a computer with some big disks in it. (We have about 20 TB of disk space on our backups server right now) Set up TinBackupBuddy and point to the big disks, use symlinks where it makes sense. Set a few options, call bbbackup.php via cron, and you're golden. Been doing it for close to 10 years now....

Good luck!

On 01/16/2012 03:50 PM, Hugh E Cruickshank wrote:

...

Hi All:

We have been looking at implementing deduplication on a backup server.

...
From what I have been able to find the available documentation is

pretty thin. I ended up trying to install LessFS on this CentOS 5.7 box but we have now encountered problems with fuse version.

Has anyone out there been able to get LessFS running on CentOS 5.7 and can provide some pointers?

If not LessFS can you suggest an alternate deduplication software?

TIA

Regards, Hugh

-- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

4910

Age (days ago)

4918

Last active (days ago)

discuss@lists.centos.org

31 comments

11 participants

tags (0)

participants (11)

Adam Tauno Williams
David Hrbáč
Hugh E Cruickshank
John R Pierce
Ken godee
Lars Hecking
Les Mikesell
Lists
Ljubomir Ljubojevic
Nataraj
Ross Walker