[CentOS] Re: Filesystem that doesn't store duplicate data

Scott Silva ssilva at sgvwater.com
Thu Dec 6 19:59:51 UTC 2007


on 12/5/2007 8:57 PM Ruslan Sivak spake the following:
> Luke Dudney wrote:
>> NetApp's WAFL with A-SIS (advanced single instance storage) does this. 
>> From a quick google:
>>
>> http://searchstorage.techtarget.com/originalContent/0,289142,sid5_gci1255018,00.html 
>> says:
>> ...  calculates a 16-bit checksum for each block of data it stores. 
>> For data deduplication, the hashes are pulled into a database and 
>> "redundancy candidates" that look similar are identified. Those blocks 
>> are then compared bit by bit, and if they are identical, the new block 
>> is discarded.
>>
>> The pre-sales engineer I spoke to regarding this said that it's not 
>> done on demand but rather by a periodic background process. It's 
>> pitched for backup and archiving functions. If you have NetApp kit it 
>> can apply this to any of your data on the Filer, be it via CIFS, NFS, 
>> FC or iSCSI.
>>
>> While this isn't available on Linux it proves that there is market 
>> demand for it, that it can be done and probably also appears to some 
>> kernel hackers as a challenge...
>>
>> cheers
>> Luke
>>
> Yeah, I originally got the idea from the NetApp marketing materials.
> Would be cool if this were available for free for Linux.
> Russ
But the NetApp appliance has a processor that only has so much to do. It isn't 
doing any other tasks and has lots of free time to handle the work. And I 
wouldn't be surprised if there were a few ASICs or PLCs doing much of the 
checksumming and block compares.
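
For anyone wondering what the work described in the quoted article amounts to 
in software, here is a rough sketch of the checksum-then-compare idea. It is 
not NetApp's implementation, just an illustration under my own assumptions: 
the 4096-byte block size, the function names, and the use of the low 16 bits 
of Adler-32 as the "16-bit checksum" are all made up for the example.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def weak_checksum(block: bytes) -> int:
    """Cheap 16-bit checksum (low 16 bits of Adler-32); collisions are expected."""
    return zlib.adler32(block) & 0xFFFF


def dedupe(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and store each distinct block only once.

    Returns (unique_blocks, block_map), where block_map[i] is the index in
    unique_blocks of the i-th block of the original data.
    """
    unique_blocks = []
    candidates = defaultdict(list)  # checksum -> indices into unique_blocks
    block_map = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        key = weak_checksum(block)
        # A matching checksum only makes a stored block a "redundancy
        # candidate"; the full compare decides whether it is a duplicate.
        for idx in candidates[key]:
            if unique_blocks[idx] == block:
                block_map.append(idx)  # duplicate: reference the stored copy
                break
        else:
            unique_blocks.append(block)
            candidates[key].append(len(unique_blocks) - 1)
            block_map.append(len(unique_blocks) - 1)
    return unique_blocks, block_map


def restore(unique_blocks, block_map) -> bytes:
    """Rebuild the original data from the deduplicated blocks."""
    return b"".join(unique_blocks[i] for i in block_map)
```

Feeding it three blocks where the first and third are identical stores two 
blocks and maps the third back to the first. Every block costs a checksum, 
and every checksum hit costs a full compare, which is the part I would expect 
dedicated hardware or an otherwise idle CPU to absorb.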

-- 
MailScanner is like deodorant...
You hope everybody uses it, and
you notice quickly if they don't!!!!



