Hi list,
Is there any working solution for deduplication of data for CentOS? We are trying to find a solution for our backup server, which runs a bash script invoking xdelta(3). But having this functionality in the filesystem would be much friendlier...
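For context, here is a stripped-down sketch of what that script does (file names and paths are made up for illustration):

  #!/bin/bash
  # Keep one full copy and store daily xdelta3 deltas against it.
  FULL=/backup/domino/full/names.nsf
  CURRENT=/data/domino/names.nsf
  DELTA=/backup/domino/deltas/names.nsf.$(date +%F).xd3

  xdelta3 -e -s "$FULL" "$CURRENT" "$DELTA"
  # A file can later be rebuilt with: xdelta3 -d -s "$FULL" "$DELTA" restored.nsf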
We have looked into lessfs, SDFS and ddar. Are these filesystems ready to use (on CentOS)? ddar is something different, I know.
Thx Rainer
From: Rainer Traut tr.ml@gmx.de
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
We have looked into lessfs, sdfs and ddar. Are these filesystems ready to use (on centos)? ddar is sthg different, I know.
Never tried it, but what about ZFS?
JD
Am 27.08.2012 14:15, schrieb John Doe:
From: Rainer Traut tr.ml@gmx.de
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
We have looked into lessfs, sdfs and ddar. Are these filesystems ready to use (on centos)? ddar is sthg different, I know.
Never tried but what about zfs?
Yeah, I know it has this feature, but is there a working ZFS implementation for Linux? Linux is a must, because the data we are backing up are Domino databases, and it is also a customer's requirement.
And btrfs has not implemented this feature yet, I think.
On 08/27/2012 07:23 PM, Rainer Traut wrote:
Yeah I know it has this feature, but is there a working zfs implementation for linux?
I have heard some positive feedback about http://zfsonlinux.org/ but I have not had time to test myself yet. It probably depends on your intended usage. It is a new in-kernel ZFS implementation (different from the old FUSE implementation).
RHEL 6.2 x86_64 is listed as one of the supported OSes, so it probably works fine with CentOS too.
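If it does, enabling dedup should just be a matter of setting a dataset property; a minimal sketch (pool layout and names are only an example, I have not verified this on CentOS myself):

  zpool create backup mirror /dev/sdb /dev/sdc
  zfs create backup/dedup
  zfs set dedup=on backup/dedup        # applies to data written from now on
  zfs set compression=on backup/dedup
  zpool list backup                    # the DEDUP column shows the achieved ratio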
There is some positive and negative feedback in the following links:
https://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/browse_thread/t...
http://pingd.org/2012/installing-zfs-raid-z-on-centos-6-2-with-ssd-caching.h...
Please share your results if you do any testing :)
Am 27.08.2012 16:04, schrieb Janne Snabb:
On 08/27/2012 07:23 PM, Rainer Traut wrote:
Yeah I know it has this feature, but is there a working zfs implementation for linux?
I have heard some positive feedback about http://zfsonlinux.org/ but I have not had time to test myself yet. It probably depends on your intended usage. It is a new in-kernel ZFS implementation (different from the old FUSE implementation).
RHEL 6.2 x86_64 is listed as one of the supported OSes, so it probably works fine with CentOS too.
There is some positive and negative feedback in the following links:
https://groups.google.com/a/zfsonlinux.org/group/zfs-discuss/browse_thread/t...
http://pingd.org/2012/installing-zfs-raid-z-on-centos-6-2-with-ssd-caching.h...
Please share your results if you do any testing :)
The website looks promising. They are using a thing called SPL, the Solaris Porting Layer, to be able to use the Solaris ZFS code. But there is no OpenSolaris anymore, is there? Does that mean they have to stay with the ZFS code from when it was still open?
On 08/28/12 12:58 AM, Rainer Traut wrote:
The website looks promising. They are using a thing called SPL, Sun/Solaris Porting Layer to be able to use the Solaris ZFS code. But there is no more OpenSolaris, isn't it? Means they have to stay with the ZFS code from when it was open?
OpenSolaris spawned illumos (the kernel) and OpenIndiana (a complete OS based on illumos and OpenSolaris), as well as some other illumos-based distributions like Nexenta.
On 08/27/12 4:55 AM, Rainer Traut wrote:
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
BackupPC does exactly this. It's not a generalized solution for deduplicating a filesystem; instead, it's a backup system, designed to back up multiple targets, that implements deduplication on the backup tree it maintains.
On Mon, Aug 27, 2012 at 9:23 AM, John R Pierce pierce@hogranch.com wrote:
On 08/27/12 4:55 AM, Rainer Traut wrote:
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
BackupPC does exactly this. its not a generalized solution to deduplication of a file system, instead, its a backup system, designed to backup multiple targets, that implements deduplication on the backup tree it maintains.
Not _exactly_, but maybe close enough, and it is very easy to install and try. BackupPC will use rsync for transfers and thus only uses bandwidth for the differences, but it uses hardlinks to files to dedup the storage. It will find and link duplicate content even from different sources, but the complete file must be identical. It does not store deltas, so large files that change even slightly between backups end up stored as complete copies (with optional compression).
Deduplication with ZFS takes a lot of RAM.
I would not yet trust any of the linux zfs projects for data that I wanted to keep long term.
On Mon, Aug 27, 2012 at 8:26 AM, Les Mikesell lesmikesell@gmail.com wrote:
On Mon, Aug 27, 2012 at 9:23 AM, John R Pierce pierce@hogranch.com wrote:
On 08/27/12 4:55 AM, Rainer Traut wrote:
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
BackupPC does exactly this. its not a generalized solution to deduplication of a file system, instead, its a backup system, designed to backup multiple targets, that implements deduplication on the backup tree it maintains.
Not _exactly_, but maybe close enough and it is very easy to install and try. Backuppc will use rsync for transfers and thus only uses bandwidth for the differences, but it uses hardlinks to files to dedup the storage. It will find and link duplicate content even from different sources, but the complete file must be identical. It does not store deltas, so large files that change even slightly between backups end up stored as complete copies (with optional compression).
The better option for ZFS would be to get an SSD and move the dedupe table onto that drive instead of having it in RAM, because it can become massive.
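In ZFS terms that means adding the SSD as a cache (L2ARC) device, so the dedup table can spill onto it instead of living entirely in RAM; roughly (pool and device names are just examples):

  zpool add tank cache /dev/disk/by-id/ata-SOME_SSD
  zdb -S tank     # simulates dedup on the existing pool and prints an estimated table size/ratio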
Thank you,
Ryan Palamara ZAIS Group, LLC 2 Bridge Avenue, Suite 322 Red Bank, New Jersey 07701 Phone: (732) 450-7444 Ryan.palamara@zaisgroup.com
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Dean Jones Sent: Monday, August 27, 2012 11:45 AM To: CentOS mailing list Subject: Re: [CentOS] Deduplication data for CentOS?
Deduplication with ZFS takes a lot of RAM.
I would not yet trust any of the linux zfs projects for data that I wanted to keep long term.
On Mon, Aug 27, 2012 at 8:26 AM, Les Mikesell lesmikesell@gmail.com wrote:
On Mon, Aug 27, 2012 at 9:23 AM, John R Pierce pierce@hogranch.com wrote:
On 08/27/12 4:55 AM, Rainer Traut wrote:
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
BackupPC does exactly this. its not a generalized solution to deduplication of a file system, instead, its a backup system, designed to backup multiple targets, that implements deduplication on the backup tree it maintains.
Not _exactly_, but maybe close enough and it is very easy to install and try. Backuppc will use rsync for transfers and thus only uses bandwidth for the differences, but it uses hardlinks to files to dedup the storage. It will find and link duplicate content even from different sources, but the complete file must be identical. It does not store deltas, so large files that change even slightly between backups end up stored as complete copies (with optional compression).
On Thu, Sep 13, 2012 at 12:06 PM, Ryan Palamara Ryan.Palamara@zaisgroup.com wrote:
The better option for ZFS would be to get a SSD and move the dedupe table onto that drive instead of having it in RAM, because it can become massive.
What's 'massive' in dollars these days?
It depends on the size of the data you are storing and the block size. Here is a good primer on it: http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe
As a quick estimate, about 5GB of SSD per 1TB of storage. However, I believe you would need even more RAM, since only a quarter of the RAM will be used for the dedupe table with ZFS.
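A back-of-the-envelope version of that estimate, assuming the usual rule of thumb of about 320 bytes per unique block and a 64KB average block size:

  DATA=$((1024**4))        # 1TB of unique data
  BLOCK=$((64*1024))       # 64KB average block size
  ENTRY=320                # approximate bytes per dedupe-table entry
  echo "$(( DATA / BLOCK * ENTRY / 1024**2 )) MB"    # ~5120 MB, i.e. ~5GB per TB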
Thank you,
Ryan Palamara ZAIS Group, LLC 2 Bridge Avenue, Suite 322 Red Bank, New Jersey 07701 Phone: (732) 450-7444 Ryan.palamara@zaisgroup.com
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Les Mikesell Sent: Thursday, September 13, 2012 3:09 PM To: CentOS mailing list Subject: Re: [CentOS] Deduplication data for CentOS?
On Thu, Sep 13, 2012 at 12:06 PM, Ryan Palamara Ryan.Palamara@zaisgroup.com wrote:
The better option for ZFS would be to get a SSD and move the dedupe table onto that drive instead of having it in RAM, because it can become massive.
What's 'massive' in dollars these days?
At our shop we have used QUADStor - http://www.quadstor.com - with a good amount of success. But our use is specifically for VMware environments over a SAN. However, it is possible (I have tried this a couple of times) to use the QUADStor virtual disks as a local block device, format them with ext4 or btrfs etc., and get the benefits of deduplication, compression etc. Yes, btrfs deduplication is possible :-), I have tried it. You might need to check on the memory requirements for NAS/local filesystems. We use 8 GB in our SAN box and so far things are fine.
- jb
Rainer Traut <tr.ml@...> writes:
Hi list,
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
We have looked into lessfs, sdfs and ddar. Are these filesystems ready to use (on centos)? ddar is sthg different, I know.
Thx Rainer
Am 27.08.2012 um 16:23 schrieb John R Pierce:
On 08/27/12 4:55 AM, Rainer Traut wrote:
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
BackupPC does exactly this. its not a generalized solution to deduplication of a file system, instead, its a backup system, designed to backup multiple targets, that implements deduplication on the backup tree it maintains.
AFAIK, Bacula has deduplication capabilities.
-- LF
On Mon, Aug 27, 2012 at 6:55 AM, Rainer Traut tr.ml@gmx.de wrote:
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
Below forwarded on behalf of mroth:
Les,
A favor, please? Could you post this for me? Spamhaus is bouncing me again, this time because *they* have a bug (see below). I tried asking Karanbir, but I guess he's not online yet....
Thanks in advance.
John R Pierce wrote:
On 08/27/12 4:55 AM, Rainer Traut wrote:
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
BackupPC does exactly this. its not a generalized solution to deduplication of a file system, instead, its a backup system, designed to backup multiple targets, that implements deduplication on the backup tree it maintains.
I've tried, twice, to suggest that a workaround that doesn't involve a new, and possibly experimental, filesystem would be to use rsync with hard links, which is what we do. There's no way we have enough disk space for 5 weeks of terabytes of data otherwise....
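Something along these lines -- a trimmed-down sketch, not our actual script (host names and paths are made up):

  # Each day gets its own tree; files unchanged since yesterday become hardlinks.
  TODAY=$(date +%F)
  YESTERDAY=$(date -d yesterday +%F)
  rsync -a --link-dest=/backup/$YESTERDAY server:/data/ /backup/$TODAY/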
However, the reason I haven't been able to suggest it is that I'm being blocked by Spamhaus. And when I go there, it asserts I'm listed in the CBL. And when I go *THERE*, it tells me I'm not.
Oh, and now, when I try to go to the CBL, it's down.
I don't suppose the CentOS list has a whitelist....
mark
Am 27.08.2012 18:04, schrieb Les Mikesell:
On Mon, Aug 27, 2012 at 6:55 AM, Rainer Traut tr.ml@gmx.de wrote:
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
Below forwarded on behalf of mroth:
Les,
A favor, please? Could you post this for me? Spamhouse is bouncing me again, this time because *they* have a bug (see below). I tried asking Karanbir, but I guess he's not online yet....
Thanks in advance.
John R Pierce wrote:
On 08/27/12 4:55 AM, Rainer Traut wrote:
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
I've tried, twice, to suggest that a workaround that doesn't involve a new, and possibly experimental f/s would be to use rsync with hard links, which is what we do. There's no way we have enough disk space for 5 weeks of terabytes of data....
Rsync is of no use for us. We have mainly big Domino .nsf files which only change slightly. So rsync would not be able to make many hardlinks. :)
On 08/28/12 1:03 AM, Rainer Traut wrote:
Rsync is of no use for us. We have mainly big Domino .nsf files which only change slightly. So rsync would not be able to make many hardlinks. :)
So you need block-level dedup? Good luck with that. I've never seen a scheme yet that wasn't either full of issues or saddled with really bad performance.
On Tue, Aug 28, 2012 at 3:03 AM, Rainer Traut tr.ml@gmx.de wrote:
Rsync is of no use for us. We have mainly big Domino .nsf files which only change slightly. So rsync would not be able to make many hardlinks. :)
Rdiff-backup might work for this since it stores deltas. Are you doing something to snapshot the filesystem during the copy or are these just growing logs where consistency doesn't matter?
I'd probably look at FreeBSD with ZFS on a machine with a boatload of RAM if I needed dedup in the filesystem right now. Or put together some scripts that would copy and split the large files into chunks in a directory and let BackupPC take it from there.
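The chunking idea could be as simple as this sketch (chunk size and paths made up):

  mkdir -p /staging/names.nsf.d
  split -d -a 4 -b 32M /snapshot/names.nsf /staging/names.nsf.d/chunk.
  # Unchanged chunks hash identically, so BackupPC pools them; the file can be
  # rebuilt later with: cat /staging/names.nsf.d/chunk.* > names.nsf

Appends dedup nicely that way; inserts that shift everything after them will not.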
On 08/28/12 11:41 AM, Les Mikesell wrote:
On Tue, Aug 28, 2012 at 3:03 AM, Rainer Trauttr.ml@gmx.de wrote:
Rsync is of no use for us. We have mainly big Domino .nsf files which only change slightly. So rsync would not be able to make many hardlinks. :)
Rdiff-backup might work for this since it stores deltas. Are you doing something to snapshot the filesystem during the copy or are these just growing logs where consistency doesn't matter?
NSF files are a proprietary database format used by Lotus Notes and Domino; they're very complex, there's a pile of versions, and they are totally opaque. I'm pretty sure that if they are being accessed or updated while being copied, the copy is invalid, so yes, some form of snapshotting is required.
commercial backup software uses Domino/Notes APIs to do incremental backups, for example http://www.symantec.com/business/support/index?page=content&id=TECH46513
On Tue, Aug 28, 2012 at 2:04 PM, John R Pierce pierce@hogranch.com wrote:
On 08/28/12 11:41 AM, Les Mikesell wrote:
On Tue, Aug 28, 2012 at 3:03 AM, Rainer Trauttr.ml@gmx.de wrote:
Rsync is of no use for us. We have mainly big Domino .nsf files which only change slightly. So rsync would not be able to make many hardlinks. :)
Rdiff-backup might work for this since it stores deltas. Are you doing something to snapshot the filesystem during the copy or are these just growing logs where consistency doesn't matter?
NSF files are a proprietary database format used by Lotus Notes and Domino, very complex, there's a pile of versions, and they are totally opaque. Pretty sure that if they are being accessed or updated while being copied the copy is invalid, so yes, some form of snapshotting is required.
commercial backup software uses Domino/Notes APIs to do incremental backups, for example http://www.symantec.com/business/support/index?page=content&id=TECH46513
If there is a command-line way to generate an incremental backup file, backuppc could run it via ssh as a pre-backup command.
Am 28.08.2012 21:26, schrieb Les Mikesell:
On Tue, Aug 28, 2012 at 2:04 PM, John R Pierce pierce@hogranch.com wrote:
On 08/28/12 11:41 AM, Les Mikesell wrote:
On Tue, Aug 28, 2012 at 3:03 AM, Rainer Trauttr.ml@gmx.de wrote:
Rsync is of no use for us. We have mainly big Domino .nsf files which only change slightly. So rsync would not be able to make many hardlinks. :)
Rdiff-backup might work for this since it stores deltas. Are you doing something to snapshot the filesystem during the copy or are these just growing logs where consistency doesn't matter?
NSF files are a proprietary database format used by Lotus Notes and Domino, very complex, there's a pile of versions, and they are totally opaque. Pretty sure that if they are being accessed or updated while being copied the copy is invalid, so yes, some form of snapshotting is required.
commercial backup software uses Domino/Notes APIs to do incremental backups, for example http://www.symantec.com/business/support/index?page=content&id=TECH46513
If there is a command-line way to generate an incremental backup file, backuppc could run it via ssh as a pre-backup command.
Yes, there is commercial software to do incremental backups, but I do not know of command-line options to do this. Maybe anyone does?
Les is right: I stop the server, take the snapshot, start the server, and do the xdelta on the snapshot NSF files. Having that minimal downtime is OK and acknowledged by the customer.
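For anyone curious, the sequence is roughly this (a sketch assuming LVM snapshots; the real volume names, sizes and init script are different):

  /etc/init.d/domino stop                                    # init script name is a placeholder
  lvcreate -s -L 20G -n domino_snap /dev/vg_data/lv_domino   # snapshot of the Domino data LV
  /etc/init.d/domino start
  mount -o ro /dev/vg_data/domino_snap /mnt/snap
  # ... run xdelta3 against the .nsf files under /mnt/snap ...
  umount /mnt/snap
  lvremove -f /dev/vg_data/domino_snap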
On 08/29/12 2:43 AM, Rainer Traut wrote:
Yes, there is commercial software to do incremental backups but I do not know of commandline options to do this. Maybe anyone?
Les is right, I stop the server, take the snapshot, start the server and do the xdelta on the snapshot NSF files. Having that minimal downtime is ok and acknowledged by the customer.
I found some more stuff on an IBM site talking about the API (it has to be called from software, not the command line) to generate and keep track of transaction log files, which the backup software archives. Nothing about de-dup.
----- Original Message -----
From: "Rainer Traut" tr.ml@gmx.de To: centos@centos.org Sent: Monday, August 27, 2012 4:55:03 AM Subject: [CentOS] Deduplication data for CentOS?
Hi list,
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
We have looked into lessfs, sdfs and ddar. Are these filesystems ready to use (on centos)? ddar is sthg different, I know.
Thx Rainer
Although not open source, CrashPlan PROe only costs $365 for a perpetual five-client license. I use it to back up some of my Linux boxes. It has very good deduplication, compression, and encryption. For example, I have 1.7TB of data on one Linux system and another system that has 1.5TB. I NFS-mount one of the systems to the other and only use one CrashPlan client to back up both data sets to a single backup archive. The backup archive is only 1.2TB, and that also spans 90 days' worth of file modifications and deletions I can recover.
David.
On Mon, Aug 27, 2012 at 7:55 AM, Rainer Traut tr.ml@gmx.de wrote:
Hi list,
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
We have looked into lessfs, sdfs and ddar. Are these filesystems ready to use (on centos)? ddar is sthg different, I know.
Thx Rainer
This is something I have been thinking about peripherally for a while now. What are your impressions of SDFS (OpenDedup)? I had been hoping it would be pretty good. Any issues with it on CentOS?
❧ Brian Mathis
On Mon, 2012-08-27 at 14:32 -0400, Brian Mathis wrote:
On Mon, Aug 27, 2012 at 7:55 AM, Rainer Traut tr.ml@gmx.de wrote:
We have looked into lessfs, sdfs and ddar. Are these filesystems ready to use (on centos)? ddar is sthg different, I know.
This is something I have been thinking about peripherally for a while now. What are your impressions of SDFS (OpenDedupe)? I had been hoping it would be pretty good. Any issues with it on CentOS?
I've used it for backups; it works reliably. It is memory hungry however [sort of the nature of block-level deduplication]. http://www.wmmi.net/documents/OpenDedup.pdf
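Setup is basically a mkfs plus a mount; from memory it is something like the following, but double-check the exact flags against the current SDFS docs (volume name, size and mount point are made up):

  mkfs.sdfs --volume-name=backup --volume-capacity=2TB --io-chunk-size=128
  mkdir -p /mnt/sdfs
  mount.sdfs backup /mnt/sdfs      # older releases use -v <volume> -m <mountpoint> instead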
Am 27.08.2012 22:55, schrieb Adam Tauno Williams:
On Mon, 2012-08-27 at 14:32 -0400, Brian Mathis wrote:
On Mon, Aug 27, 2012 at 7:55 AM, Rainer Traut tr.ml@gmx.de wrote:
We have looked into lessfs, sdfs and ddar. Are these filesystems ready to use (on centos)? ddar is sthg different, I know.
This is something I have been thinking about peripherally for a while now. What are your impressions of SDFS (OpenDedupe)? I had been hoping it would be pretty good. Any issues with it on CentOS?
I've used it for backups; it works reliably. It is memory hungry however [sort of the nature of block-level deduplication]. http://www.wmmi.net/documents/OpenDedup.pdf
I have read the PDF and one thing strikes me:
--io-chunk-size <SIZE in kB; use 4 for VMDKs, defaults to 128>
and later:
● Memory
● 2GB allocation OK for:
  ● 200GB @ 4KB chunks
  ● 6TB @ 128KB chunks
... so 32TB of data at 128KB chunks requires 8GB of RAM, and 1TB at 4KB chunks needs the same 8GB.
We are using ESXi5 in a SAN environment, right now with a 2TB backup volume. You are right, 16GB of RAM is still a lot... And why a 4KB chunk size for VMDKs?
Sorry for the top posting. Dedup is just hype. After a while, the table that manages the deduped data will simply be too big. Don't use it for the long term.
Sent from Samsung Galaxy ^^
Rainer Traut <tr.ml@...> writes:
Hi list,
is there any working solution for deduplication of data for centos? We are trying to find a solution for our backup server which runs a bash script invoking xdelta(3). But having this functionality in fs is much more friendly...
We have looked into lessfs, sdfs and ddar. Are these filesystems ready to use (on centos)? ddar is sthg different, I know.
Thx Rainer
Not sure if it's already been mentioned, but storeBackup uses rsync and hardlinks to minimise storage - and it breaks up big files and backs up the fragments separately. May help ... http://www.nongnu.org/storebackup/en/node2.html