[CentOS] Filesystem that doesn't store duplicate data

Luke Dudney listmail at lukedudney.com
Thu Dec 6 01:41:03 UTC 2007


NetApp's WAFL with A-SIS (advanced single instance storage) does  
this. From a quick google:

http://searchstorage.techtarget.com/originalContent/0,289142,sid5_gci1255018,00.html
says:
...  calculates a 16-bit checksum for each block of data it stores.  
For data deduplication, the hashes are pulled into a database and  
"redundancy candidates" that look similar are identified. Those  
blocks are then compared bit by bit, and if they are identical, the  
new block is discarded.
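
To make that concrete, here is a rough Python sketch of the same idea: a cheap checksum to find candidates, then a full comparison before a block is treated as a duplicate. This is only an illustration and not NetApp's implementation. The 4096-byte block size, the use of the low 16 bits of CRC-32 as the checksum, and the names checksum16 and dedup_pass are all my own assumptions.

```python
import zlib
from collections import defaultdict

BLOCK_SIZE = 4096  # assumed block size, not taken from the article


def checksum16(block: bytes) -> int:
    # Stand-in 16-bit checksum (assumption): low 16 bits of CRC-32.
    return zlib.crc32(block) & 0xFFFF


def dedup_pass(blocks):
    """Batch pass over a list of blocks.

    Returns (store, block_map): store maps a block id to its bytes for
    each unique block, and block_map[i] is the id that logical block i
    now points to.
    """
    candidates = defaultdict(list)  # checksum -> ids of stored blocks
    store = {}
    block_map = []
    for i, block in enumerate(blocks):
        key = checksum16(block)
        match = None
        for j in candidates[key]:
            # A 16-bit checksum collides often, so compare the full
            # contents before treating the block as a duplicate.
            if store[j] == block:
                match = j
                break
        if match is None:
            store[i] = block
            candidates[key].append(i)
            match = i
        block_map.append(match)
    return store, block_map


if __name__ == "__main__":
    data = [b"a" * BLOCK_SIZE, b"b" * BLOCK_SIZE, b"a" * BLOCK_SIZE]
    store, block_map = dedup_pass(data)
    print(len(store), block_map)
```

The checksum only narrows down which blocks are worth comparing; the byte-for-byte comparison is what decides whether a block is really a duplicate.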

The pre-sales engineer I spoke to about this said that it's not
done on demand but rather by a periodic background process. It's
pitched for backup and archiving functions. If you have NetApp kit, it
can apply this to any of your data on the Filer, whether it is accessed
via CIFS, NFS, FC or iSCSI.

While this isn't available on Linux, it shows that there is market
demand for it and that it can be done, and it probably also appeals to
some kernel hackers as a challenge...

cheers
Luke

On 06/12/2007, at 1:33 AM, rsivak at istandfor.com wrote:

> Is there such a filesystem available?  It seems like it wouldn't be  
> too hard to implement...  Basically do things on a block by block  
> basis.  Store md5 of a block in the table, and when writing a new  
> block, check if the md5 already exists and then point the new block  
> to the old block.  Since md5 is not guaranteed unique, might need  
> to do a diff between the 2 blocks and if the blocks are indeed  
> different, handle it somehow.
>
> When modifying an existing block that has multiple pointers, copy  
> the block and modify the new block.
>
> I know I'm oversimplifying things a lot, but something like this  
> could work, no?  Would be a great filesystem to store backups on,  
> or things like vmware volumes...
>
> Russ
> Sent from my Verizon Wireless BlackBerry
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos
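
For what it's worth, the scheme in the quoted message (an md5 table, a full comparison to guard against collisions, and copy-on-write for shared blocks) can be sketched in a few lines of Python. This is a toy in-memory model under my own assumptions, not a filesystem: the class name DedupStore and its methods are hypothetical, and real code would also have to deal with on-disk layout, crash safety and concurrency.

```python
import hashlib


class DedupStore:
    """Toy in-memory block store with inline dedup and copy-on-write."""

    def __init__(self):
        self.blocks = {}   # block id -> bytes
        self.refs = {}     # block id -> number of pointers to the block
        self.by_hash = {}  # md5 digest -> ids of blocks with that digest
        self.next_id = 0

    def write(self, data: bytes) -> int:
        digest = hashlib.md5(data).digest()
        for bid in self.by_hash.get(digest, []):
            # md5 is not guaranteed unique, so compare the contents too.
            if self.blocks[bid] == data:
                self.refs[bid] += 1
                return bid
        # No identical block exists (or only a colliding one): store it.
        bid = self.next_id
        self.next_id += 1
        self.blocks[bid] = data
        self.refs[bid] = 1
        self.by_hash.setdefault(digest, []).append(bid)
        return bid

    def release(self, bid: int) -> None:
        # Drop one pointer; free the block when nothing points to it.
        self.refs[bid] -= 1
        if self.refs[bid] == 0:
            data = self.blocks.pop(bid)
            del self.refs[bid]
            digest = hashlib.md5(data).digest()
            self.by_hash[digest].remove(bid)
            if not self.by_hash[digest]:
                del self.by_hash[digest]

    def modify(self, bid: int, new_data: bytes) -> int:
        # Copy-on-write: never change a block in place. Drop our pointer
        # and write the new contents, which may themselves be shared.
        self.release(bid)
        return self.write(new_data)
```

Two blocks with the same digest but different contents simply end up as separate entries under that digest, which is one way of "handling it somehow".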



