Peter Arremann wrote:
>> How about a FUSE file system (userland, ie NTFS 3G) that layers
>> on top of any file system that supports hard links
>
> That would be easy but I can see a few issues with that approach:
>
> 1) On file level rather than block level you're going to be much more
> inefficient. I for one have gigabytes of revisions of files that have changed
> a little between each file.

That is a problem for the way backuppc stores things - but at least it can compress the files.

> 2) You have to write all datablocks to disk and then erase them again if you
> find a match. That will slow you down and create some weird behavior. I.e.
> you know the FS shouldn't store duplicate data, yet you can't use cp to copy
> a 10G file if only 9G are free. If you copy a 8G file, you see the usage
> increase till only 1G is free, then when your app closes the file, you are
> going to go back to 9G free...

If you only use it for backup storage, that is a special case where this is not so bad. Backuppc also has a way to rsync against the stored copy, so matching files (or parts of them) may not need to be transferred at all.

> 3) Rather than continuously looking for matches on block level, you have to
> search for matches on files that can be any size. That is fine if you have a
> 100K file - but if you have a 100M or larger file, the checksum calculations
> will take you forever.

The backuppc scheme is to use a hash of some amount of the uncompressed file as the pooled filename for the link. That quickly weeds out most possibilities, and because the hash is taken over uncompressed data it permits the compression level to be changed. The full check then only has to be done on collisions.

> This means rather than adding a specific, small
> penalty to every write call, you add a unknown penalty, proportional to file
> size when closing the file. Also, the fact that most C coders don't check the
> return code of close doesn't make me happy there...

In backuppc, the writer understands the scheme - and the linking is somewhat decoupled from the transfers.
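To make the pooling idea above concrete, here is a rough sketch in Python. It is not backuppc's actual code: the names `pool_name` and `store`, the 1 MiB prefix, the choice of SHA-256, and the `_0`, `_1`, ... suffix for colliding entries are all assumptions made for illustration. It hashes only the start of the file to pick a pool filename, does a full byte-for-byte compare only when that name is already taken, and hard-links on a match.

```python
import filecmp
import hashlib
import os

PREFIX_BYTES = 1 << 20  # assumption: hash only the first 1 MiB of the file


def pool_name(path, prefix_bytes=PREFIX_BYTES):
    """Cheap pool key: digest of the first prefix_bytes of the file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(prefix_bytes)).hexdigest()


def store(path, pool_dir, prefix_bytes=PREFIX_BYTES):
    """Deduplicate path against pool_dir using hard links.

    Returns True if path was replaced by a link to an existing pool
    entry, False if path was added to the pool as a new entry.
    """
    os.makedirs(pool_dir, exist_ok=True)
    key = pool_name(path, prefix_bytes)
    n = 0
    while True:
        # Hypothetical naming: entries with the same key get _0, _1, ...
        candidate = os.path.join(pool_dir, f"{key}_{n}")
        if not os.path.exists(candidate):
            # No entry under this name yet: the file becomes the pool copy.
            os.link(path, candidate)
            return False
        # Same key: only now pay for the full comparison.
        if filecmp.cmp(path, candidate, shallow=False):
            tmp = path + ".dedup-tmp"
            os.link(candidate, tmp)
            os.replace(tmp, path)  # swap in the link to the pooled copy
            return True
        n += 1
```

Two files that share their first megabyte but differ later get the same key, fail the full compare, and end up as separate pool entries, so the expensive comparison is only paid when the cheap key already matches. The pool and the stored files have to live on the same filesystem for the hard links to work.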
But even in a normal filesystem, writes are buffered, and if you don't fsync there is a lot that can go wrong after close() reports success.

--
  Les Mikesell
  lesmikesell at gmail.com
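As an illustration of the close()/fsync point, here is a minimal Python sketch of a writer that does not ignore those results. The helper name `durable_write` is made up for this example. In Python the underlying calls raise OSError instead of returning a code, so the "check" amounts to not swallowing the exception.

```python
import os


def durable_write(path, data):
    """Write data to path; raise OSError if write, fsync, or close fails."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than asked; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to flush its buffers for this file to storage.
        os.fsync(fd)
    finally:
        # A failing close raises OSError here rather than passing silently.
        os.close(fd)
```

Without the fsync, a successful close only means the data reached the kernel's buffers, which is the gap described above.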