[CentOS] Re: Filesystem that doesn't store duplicate data
Scott Silva
ssilva at sgvwater.com
Thu Dec 6 19:59:51 UTC 2007
on 12/5/2007 8:57 PM Ruslan Sivak spake the following:
> Luke Dudney wrote:
>> NetApp's WAFL with A-SIS (advanced single instance storage) does this.
>> From a quick google:
>>
>> http://searchstorage.techtarget.com/originalContent/0,289142,sid5_gci1255018,00.html
>> says:
>> ... calculates a 16-bit checksum for each block of data it stores.
>> For data deduplication, the hashes are pulled into a database and
>> "redundancy candidates" that look similar are identified. Those blocks
>> are then compared bit by bit, and if they are identical, the new block
>> is discarded.
>>
>> The pre-sales engineer I spoke to regarding this said that it's not
>> done on demand but rather by a periodic background process. It's
>> pitched for backup and archiving functions. If you have NetApp kit it
>> can apply this to any of your data on the Filer, be it via CIFS, NFS,
>> FC or iSCSI.
>>
>> While this isn't available on Linux it proves that there is market
>> demand for it, that it can be done and probably also appears to some
>> kernel hackers as a challenge...
>>
>> cheers
>> Luke
>>
> Yeah, I originally got the idea from the NetApp marketing materials.
> Would be cool if this were available for free for Linux.
> Russ
But the NetApp appliance has a dedicated processor. It isn't doing any
other tasks and has lots of free time to handle the work. And I wouldn't
be surprised if there were a few ASICs or PLCs doing much of the
checksumming and block compares.
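
For anyone who wants to experiment, the checksum-then-compare scheme
described in the quoted article can be sketched in a few lines of Python.
This is only a rough illustration, not NetApp's implementation: the
4096-byte block size, the use of the low 16 bits of CRC-32 as the
checksum, and the names checksum16(), deduplicate() and restore() are all
assumptions made up for the example.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def checksum16(block: bytes) -> int:
    """Weak 16-bit checksum (low 16 bits of CRC-32); a stand-in only."""
    return zlib.crc32(block) & 0xFFFF


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Return (unique_blocks, block_map).

    unique_blocks holds each distinct block once; block_map[i] is the
    index into unique_blocks for the i-th block of the input.
    """
    unique_blocks = []  # blocks actually kept
    candidates = {}     # checksum -> indices into unique_blocks
    block_map = []

    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        key = checksum16(block)
        match = None
        # A checksum match only marks a "redundancy candidate". With 16
        # bits, collisions are expected, so confirm with a full compare.
        for idx in candidates.get(key, []):
            if unique_blocks[idx] == block:
                match = idx
                break
        if match is None:
            # New content: keep the block and register its checksum.
            unique_blocks.append(block)
            match = len(unique_blocks) - 1
            candidates.setdefault(key, []).append(match)
        block_map.append(match)
    return unique_blocks, block_map


def restore(unique_blocks, block_map) -> bytes:
    """Rebuild the original data from the deduplicated form."""
    return b"".join(unique_blocks[i] for i in block_map)
```

Even in this toy form, every block costs a checksum plus, on a hit, a
full compare, which is the work I'd expect the appliance to push onto
idle CPU time or dedicated hardware.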
--
MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!