[CentOS] Filesystem that doesn't store duplicate data

Thu Dec 6 17:20:04 UTC 2007

Ruslan Sivak wrote:

> This is a bit different from what I was proposing.  I know that BackupPC 
> already does this at the file level, but I want a filesystem that does it 
> at the block level.  File-level deduplication only helps if you're backing 
> up multiple systems and they all have exactly the same files. 

It also collapses multiple runs of the same machine so you can keep a 
long history without taking a lot of additional space.
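
To make the file-level case concrete, here is a rough sketch in Python of 
pooling by content hash.  It is not BackupPC's actual code; the FilePool 
class and its method names are made up for illustration, and everything is 
kept in memory rather than on disk.

```python
import hashlib


class FilePool:
    """Toy file-level pool: identical file contents are stored once."""

    def __init__(self):
        self.pool = {}      # content digest -> file contents
        self.backups = []   # one {path: digest} manifest per backup run

    def backup(self, files):
        """Record one backup run; `files` maps path -> bytes."""
        manifest = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            # Only store the contents if no earlier run already pooled them.
            self.pool.setdefault(digest, data)
            manifest[path] = digest
        self.backups.append(manifest)
        return manifest

    def stored_bytes(self):
        """Total bytes actually held in the pool."""
        return sum(len(data) for data in self.pool.values())
```

Backing up an unchanged machine a second time adds a manifest but no new 
pool data.  A file that changes by even one byte gets a new digest and is 
stored again in full, which is the limitation of working at the file level.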

> Block level would help 
> a lot more I think.  You'd be able to do a full backup every night and 
> have it only take up around the same space as a differential backup.  

Agreed, but the main difference would be in big files that change 
slightly, like growing log files, or mailboxes in the Unix mbox format.  
These are often compressible, so there is some tradeoff, and you can make 
things more backup-friendly by switching to a one-file-per-message mail 
format like maildir and rotating log files often.
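
For comparison, a minimal sketch of the block-level idea, assuming 
fixed-size 4096-byte blocks and SHA-256 as the block identifier (both are 
my assumptions, and BlockStore is a hypothetical name, not an existing 
filesystem or tool):

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


class BlockStore:
    """Toy block-level store: identical blocks are kept once."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # block digest -> block contents

    def put(self, data):
        """Store `data`; return the list of block digests that rebuilds it."""
        recipe = []
        for offset in range(0, len(data), self.block_size):
            chunk = data[offset:offset + self.block_size]
            digest = hashlib.sha256(chunk).digest()
            # A block already seen in any earlier file costs no extra space.
            self.blocks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def get(self, recipe):
        """Reassemble a file from its list of block digests."""
        return b"".join(self.blocks[digest] for digest in recipe)

    def stored_bytes(self):
        """Total bytes actually held in the store."""
        return sum(len(block) for block in self.blocks.values())
```

With this layout, a log file that has only grown since the previous night 
shares all of its earlier full blocks with the old copy, so the second 
full backup costs roughly the size of the appended data.  Fixed-size blocks 
only help with appends and in-place changes, though; an insertion near the 
start of a file shifts every later block boundary and defeats the matching.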

> Things like virtual machine disk images, which are often clones 
> of each other, could take up only a small additional amount of space for 
> each clone, proportional to the changes that are made to that disk image.

Some virtual machine managers (e.g. VMware Workstation) already provide 
this facility - and you might be able to get it with one of the existing 
overlay filesystems if you are willing to start with an immutable base. 
However, I agree that it would be really nice to have a filesystem that 
overlaid blocks of identical content with full copy-on-write semantics 
to permit any instance to be modified transparently.  If it had some 
concept of large/small block sizes and had a good hit ratio on the large 
blocks it might not add that much overhead.
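
The copy-on-write part could look something like the sketch below.  It is 
only an illustration of the semantics, assuming per-block reference counts 
and images represented as plain lists of block digests; CowStore and its 
methods are invented names, and the large/small block size idea is not 
modeled at all.

```python
import hashlib


class CowStore:
    """Toy shared-block store with copy-on-write and reference counts."""

    def __init__(self):
        self.blocks = {}  # block digest -> [contents, reference count]

    def _ref(self, data):
        """Take a reference on the block holding `data`, adding it if new."""
        digest = hashlib.sha256(data).digest()
        entry = self.blocks.setdefault(digest, [data, 0])
        entry[1] += 1
        return digest

    def _unref(self, digest):
        """Drop a reference; free the block when nobody uses it."""
        entry = self.blocks[digest]
        entry[1] -= 1
        if entry[1] == 0:
            del self.blocks[digest]

    def create(self, chunks):
        """Build an image (a list of block digests) from a list of blocks."""
        return [self._ref(chunk) for chunk in chunks]

    def clone(self, image):
        """Clone an image by sharing every block; no data is copied."""
        for digest in image:
            self.blocks[digest][1] += 1
        return list(image)

    def write(self, image, index, data):
        """Overwrite one block of one image without touching other images."""
        new_digest = self._ref(data)   # take the new reference first
        self._unref(image[index])      # then release the old block
        image[index] = new_digest

    def read(self, image, index):
        """Return the contents of one block of an image."""
        return self.blocks[image[index]][0]
```

A clone costs only a list of digests until it is written to, and a write 
to one image leaves every other image reading the old block.  If two 
images later converge on the same contents, they share the block again.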

-- 
   Les Mikesell
    lesmikesell at gmail.com