Peter Arremann wrote:
> On Wednesday 05 December 2007, redhat at mckerrs.net wrote:
>
>> You'd think that using this technology on a live filesystem could incur a
>> significant performance penalty due to all those calculations (FUSE
>> module, anyone?). Imagine a hardware-optimized data de-duplication disk
>> controller, similar to the XOR-optimized processors on RAID controllers.
>> Now that would be cool. All it would need to store is metadata once it
>> had already seen the exact same block. I think it is fundamentally
>> similar in result to on-the-fly disk compression.
>>
>
> Actually, the impact - if the filesystem is designed correctly - shouldn't
> be that horrible. After all, Sun has managed to integrate checksums into
> ZFS and still get great performance. In addition, ZFS doesn't overwrite
> data in place but writes to a new data block each time...
>
> What you would have to do then is keep a lookup table of the checksums so
> you can find possible matches quickly. When you find one, do a full
> byte-for-byte compare to be 100% sure you didn't have a collision on your
> checksums. If the blocks really are identical, you can just reference the
> existing data block instead of writing a new one.
>
> It is still a lot of work, but as Sun showed, on-the-fly compares and
> checksums are doable without too much of a hit.
>
> Peter.

I'm not very knowledgeable about how filesystems work. Is there a primer I
can brush up on somewhere? I'm thinking about implementing a proof of
concept using Java and FUSE.

Russ
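
For reference, here is a minimal sketch in Java (the language Russ mentions)
of the checksum-table scheme Peter describes: hash each incoming block, look
the checksum up in a table, and on a hit verify byte-for-byte before storing
only a reference. The BlockStore class and its methods are made up for
illustration; it is an in-memory toy, not a FUSE filesystem or any real
on-disk format.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Toy in-memory block store illustrating checksum-based deduplication:
 * hash each block, consult a lookup table of checksums, and on a hit do
 * a full byte compare to rule out hash collisions before deduplicating.
 */
public class BlockStore {
    // Map from hex-encoded SHA-256 checksum to the stored block data.
    private final Map<String, byte[]> blocksByChecksum = new HashMap<>();
    private long logicalBlocks = 0;   // blocks written by callers
    private long physicalBlocks = 0;  // blocks actually stored

    /** Stores a block and returns the checksum that identifies it. */
    public String write(byte[] block) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(block);
        String key = toHex(digest);
        logicalBlocks++;

        byte[] existing = blocksByChecksum.get(key);
        if (existing != null) {
            // Checksum match: verify byte-for-byte before deduplicating,
            // so a (vanishingly unlikely) collision can't corrupt data.
            if (Arrays.equals(existing, block)) {
                return key; // duplicate: store only the reference
            }
            // Collision: a real implementation would chain blocks here.
            throw new IllegalStateException("SHA-256 collision on " + key);
        }

        blocksByChecksum.put(key, block.clone());
        physicalBlocks++;
        return key;
    }

    /** Returns a copy of the block identified by the given checksum. */
    public byte[] read(String key) {
        return blocksByChecksum.get(key).clone();
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        BlockStore store = new BlockStore();
        byte[] a = "hello world".getBytes();
        byte[] b = "hello world".getBytes(); // identical content
        byte[] c = "other data".getBytes();
        store.write(a);
        store.write(b); // deduplicated against a
        store.write(c);
        System.out.println(store.logicalBlocks + " logical, "
                + store.physicalBlocks + " physical blocks");
        // Prints: 3 logical, 2 physical blocks
    }
}

Running main shows three logical writes landing in only two physical blocks.
A real filesystem would also need reference counts per block, so storage can
be freed when the last file pointing at a block goes away.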