This is a bit different from what I was proposing. I know that BackupPC
already does this at the file level, but I want a filesystem that does it
at the block level. File level only helps if you're backing up multiple
systems and they all have exactly the same files. Block level would help a
lot more, I think: you'd be able to do a full backup every night and have
it take up only about the same space as a differential backup. Things like
virtual machine disk images, which are often clones of each other, could
take up only a small additional amount of space for each clone,
proportional to the changes made to that disk image.

Nobody really answered this, so I'll ask again: is there a Windows version
of FUSE?

How does one test a FUSE filesystem while developing it? It would be nice
to just run something from Eclipse once you've made your changes and have
a drive mounted and ready to test. Being able to debug a filesystem while
it's running would be great too. Anyone here with experience building FUSE
filesystems?

Russ

Ross S. W. Walker wrote:
>
> These are all good and valid issues.
>
> Thinking about it some more, I might just implement it as a system
> service that scans given disk volumes in the background and keeps a
> hidden directory where it stores its state information and hardlinks
> named after the MD5 hash of the files on the volume. If a collision
> occurs with an existing MD5 hash, the new file is unlinked and
> re-linked to the MD5 hash file; if an MD5 hash file exists with no
> secondary links, it is removed. Maybe monitor the journal or use
> inotify to just get new files, and do a full volume scan once a week.
>
> This way the file system performs as well as it normally does, and as
> things go forward duplicate files are eliminated (combined). Of course
> the problem arises of what to do when one duplicate is modified, but
> the other should remain the same...
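The hash-and-hardlink service described above could be sketched roughly as
follows in Python. This is only an illustration of the idea, not anything
BackupPC actually does; the function names, the `.dedup` state-directory
layout, and the single-pass scan are my own assumptions:

```python
import hashlib
import os

def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def dedup_volume(root, state_dir=".dedup"):
    """Scan 'root' and hardlink duplicate files through a hidden
    per-hash store, so identical files share one set of data blocks."""
    store = os.path.join(root, state_dir)
    os.makedirs(store, exist_ok=True)
    for dirpath, dirnames, filenames in os.walk(root):
        # don't descend into the hidden state directory itself
        dirnames[:] = [d for d in dirnames if d != state_dir]
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = file_md5(path)
            canon = os.path.join(store, digest)
            if not os.path.exists(canon):
                os.link(path, canon)   # first copy becomes the hash file
            elif os.stat(path).st_ino != os.stat(canon).st_ino:
                os.unlink(path)        # duplicate: re-link to the hash file
                os.link(canon, path)
```

A real service would also need the cleanup pass mentioned above (removing
hash files with no secondary links) and copy-on-write handling for the
case where one of the linked duplicates is later modified, which plain
hardlinks cannot give you.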
>
> Of course what you said about revisions that differ just a little
> won't take advantage of this, but it's file level so it only works
> with whole files; still better than nothing.
>
> -Ross
>
> -----Original Message-----
> From: centos-bounces at centos.org <centos-bounces at centos.org>
> To: CentOS mailing list <centos at centos.org>
> Sent: Thu Dec 06 08:10:38 2007
> Subject: Re: [CentOS] Filesystem that doesn't store duplicate data
>
> On Thursday 06 December 2007, Ross S. W. Walker wrote:
> > How about a FUSE file system (userland, i.e. NTFS-3G) that layers
> > on top of any file system that supports hard links
>
> That would be easy, but I can see a few issues with that approach:
>
> 1) At the file level rather than the block level you're going to be
> much less efficient. I for one have gigabytes of revisions of files
> that have changed a little between each file.
>
> 2) You have to write all data blocks to disk and then erase them again
> if you find a match. That will slow you down and create some weird
> behavior: you know the FS shouldn't store duplicate data, yet you
> can't use cp to copy a 10G file if only 9G are free. If you copy an 8G
> file, you see the usage increase until only 1G is free; then, when
> your app closes the file, you go back to 9G free...
>
> 3) Rather than continuously looking for matches at the block level,
> you have to search for matches on files that can be any size. That is
> fine if you have a 100K file, but if you have a 100M or larger file,
> the checksum calculation will take forever. This means that rather
> than adding a specific, small penalty to every write call, you add an
> unknown penalty, proportional to file size, when closing the file.
> Also, the fact that most C coders don't check the return code of
> close doesn't make me happy there...
>
> Peter.
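For contrast with the file-level approach being criticized, here is a
minimal sketch of what block-level deduplication means: each fixed-size
block is hashed as it is written, and a file becomes a list of references
into a block store, so the cost is a small, known penalty per write rather
than one large checksum at close. The `BLOCK_SIZE` constant and the
in-memory dict are illustrative assumptions, not how any real filesystem
stores its block index:

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size; real filesystems vary

def dedup_blocks(data):
    """Split 'data' into fixed-size blocks, keeping one stored copy per
    unique block. Returns (refs, stored_bytes): the per-block digest
    list the "file" now consists of, and the bytes actually stored."""
    store = {}
    refs = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        digest = hashlib.sha1(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy
        refs.append(digest)              # file = list of block references
    return refs, sum(len(b) for b in store.values())
```

With this scheme, two cloned disk images that differ in a handful of
blocks would share almost all their storage, which is exactly the virtual
machine use case from the top of the thread.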
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos