[CentOS] Filesystem that doesn't store duplicate data

Les Mikesell lesmikesell at gmail.com
Thu Dec 6 05:59:53 UTC 2007


John R Pierce wrote:
> Ross S. W. Walker wrote:
>> How about a FUSE file system (userland, like NTFS-3G) that layers
>> on top of any file system that supports hard links, intercepts
>> the FS API, stores all files in a hidden directory named after
>> their MD5 hash, and hard links each one to the file name in
>> the user directory structure. When the # of links drops to 1
>> the hashed file is removed; when a new file is copied in and its
>> hash matches an existing one, the data is discarded and
>> only a hard link is made.
>>
>> Of course it will be a little more involved than this, but the
>> idea is to keep it really simple so it's less likely to break.
>>   
> 
> yeah, be REAL fun when an app does random (in-place) updates to one
> of said files.

BackupPC stores its backup archive this way: all files are compressed
and all duplicate content is hard-linked to a single pooled copy. It
also knows how to run a remote rsync against this storage so that only
changes are transferred. You could probably write a FUSE filesystem
that would allow direct read-only access to the archive, although the
web interface isn't bad.
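
For anyone who wants to experiment with the pooling idea, here is a
rough sketch in Python. It is not BackupPC's code: it skips
compression, trusts the MD5 digest without comparing bytes, and the
names store(), prune() and the pool directory are made up for
illustration. The pool and the destination must be on the same
filesystem for the hard link to work.

```python
import hashlib
import os
import shutil


def store(src, dest, pool):
    """Place the content of src at dest, sharing storage through the pool.

    Hypothetical helper: the pooled file is named after the MD5 digest of
    the content, and dest becomes a hard link to it.
    """
    os.makedirs(pool, exist_ok=True)
    digest = hashlib.md5()
    with open(src, "rb") as f:
        # Hash in 1 MiB chunks so large files are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    pooled = os.path.join(pool, digest.hexdigest())
    if not os.path.exists(pooled):
        # First time this content is seen: keep one real copy in the pool.
        shutil.copyfile(src, pooled)
    # Duplicate or not, the visible name is only a hard link to the pool.
    os.link(pooled, dest)
    return pooled


def prune(pool):
    """Remove pooled files whose link count has dropped to 1.

    A count of 1 means only the pool entry itself still refers to the data.
    """
    removed = []
    for name in os.listdir(pool):
        path = os.path.join(pool, name)
        if os.stat(path).st_nlink == 1:
            os.unlink(path)
            removed.append(name)
    return removed
```

Every name that shares a pooled file shares the same data blocks, so
writing to one of them in place changes all of them. That is fine for
a backup archive that is never modified after it is written, and it is
the reason a read-only view is the safe way to expose it.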

-- 
   Les Mikesell
    lesmikesell at gmail.com


