Peter Arremann wrote:
On Wednesday 05 December 2007, redhat@mckerrs.net wrote:
You'd think that using this technology on a live filesystem would incur a significant performance penalty due to all those checksum calculations (FUSE module, anyone?). Imagine a hardware-optimized data de-duplication disk controller, similar to the XOR-optimized CPUs on RAID controllers. Now that would be cool. Once it had already seen the exact same block, all it would need to store would be metadata. I think it is fundamentally similar in result to on-the-fly disk compression.
Actually, the impact, if the filesystem is designed correctly, shouldn't be that horrible. After all, Sun has managed to integrate checksums into ZFS and still get great performance. In addition, ZFS never overwrites data in place; it writes each change to a new data block (copy-on-write).
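The checksum side of that is conceptually simple. Here is a rough Java sketch of the idea of keeping a checksum alongside each block and verifying it on read; this is only an illustration, not ZFS's actual on-disk format, and SHA-256 stands in for whichever checksum the filesystem really uses:

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/**
 * Illustrative sketch of storing a checksum with each block and
 * verifying it on every read. Names are made up for this example.
 */
public class ChecksummedBlock {
    private final byte[] data;
    private final byte[] checksum; // computed once, at write time

    public ChecksummedBlock(byte[] data) throws NoSuchAlgorithmException {
        this.data = data.clone();
        this.checksum = digest(this.data);
    }

    /** Re-hash on read; a mismatch means the block was corrupted. */
    public byte[] read() throws NoSuchAlgorithmException {
        if (!Arrays.equals(digest(data), checksum)) {
            throw new IllegalStateException("checksum mismatch: block is corrupt");
        }
        return data.clone();
    }

    private static byte[] digest(byte[] bytes) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(bytes);
    }
}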
What you would have to do then is keep a lookup table of the checksums so you can find possible matches quickly. When you find one, do a full byte-for-byte compare to be 100% sure you didn't hit a checksum collision. If the blocks really are identical, you can reference the existing data block instead of writing a new one.
It is still a lot of work, but as Sun showed, on-the-fly compares and checksums are doable without too much of a hit.
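As a rough in-memory sketch of that lookup-then-verify step (all class and method names are made up, and a HashMap stands in for what would really be an on-disk structure):

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * In-memory sketch of checksum-based block deduplication: hash each
 * block, use the hash as a lookup key, and byte-compare before sharing
 * so a hash collision can never silently merge two different blocks.
 */
public class DedupStore {
    // Lookup table: checksum -> all blocks seen with that checksum.
    // (A list, so that even a genuine collision is handled correctly.)
    private final Map<String, List<byte[]>> table = new HashMap<>();

    /** Stores a block, returning "checksum:index" as its reference. */
    public String store(byte[] block) throws NoSuchAlgorithmException {
        String key = toHex(MessageDigest.getInstance("SHA-256").digest(block));
        List<byte[]> candidates = table.computeIfAbsent(key, k -> new ArrayList<>());
        for (int i = 0; i < candidates.size(); i++) {
            // Possible match: verify byte-for-byte to rule out a collision.
            if (Arrays.equals(candidates.get(i), block)) {
                return key + ":" + i; // duplicate -- reference the existing block
            }
        }
        candidates.add(block.clone()); // genuinely new data
        return key + ":" + (candidates.size() - 1);
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }
}

Writing the same block twice then yields the same reference, so only metadata needs to be recorded for the duplicate, which is exactly the behaviour the original poster describes.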
Peter.
I'm not very knowledgeable about how filesystems work. Is there a primer somewhere I can use to brush up? I'm thinking about implementing a proof of concept using Java and FUSE.
Russ