redhat at mckerrs.net wrote:
>
> ----- Original Message -----
> From: rsivak at istandfor.com
> To: "CentOS Mailing list" <centos at centos.org>
> Sent: Thursday, December 6, 2007 11:18:16 AM (GMT+1000) Australia/Brisbane
> Subject: [CentOS] Filesystem that doesn't store duplicate data
>
> Is there such a filesystem available? It seems like it wouldn't be
> too hard to implement... Basically do things on a block-by-block
> basis. Store the MD5 of each block in a table, and when writing a new
> block, check whether the MD5 already exists and, if so, point the new
> block to the old block. Since MD5 is not guaranteed unique, it might
> be necessary to diff the 2 blocks and, if the blocks are indeed
> different, handle it somehow.
>
> When modifying an existing block that has multiple pointers, copy the
> block and modify the new block.
>
> I know I'm oversimplifying things a lot, but something like this could
> work, no? Would be a great filesystem to store backups on, or things
> like VMware volumes...
>
> Russ
>
>
> You are describing what I understand to be "data de-duplication". It
> is all the rage for backups as it has the potential to decrease backup
> times and volumes by significant amounts. I went to a presentation by
> Avamar (a partner of EMC ?) regarding this technology and it seemed
> really nice for your typical Windows file server. I suppose it
> effectively turns your data into 'single-instance' storage, which is
> no bad thing. I suppose it could be useful for large database backups
> as well.
>
> You'd think that using this technology on a live filesystem could
> incur a significant performance penalty due to all those calculations
> (FUSE module anyone ?). Imagine a hardware-optimized data
> de-duplication disk controller, similar to RAID XOR-optimized CPUs.
> Now that would be cool. All it would need to store is metadata when
> it has already seen the exact same block. I think fundamentally it is
> similar in result to on-the-fly disk compression.
>
> Let us know when you have a beta to test !
>
> 8^)
>

I'm not sure it would be possible to do this on a disk controller, as I
don't think a controller can store the amount of data needed for the
hashes. I am thinking of maybe writing it as a FUSE module. I'm most
familiar with Java, and there are FUSE bindings for Java. I would love
to make at least a proof-of-concept FS that does this.

Does FUSE exist for Windows? How does one test a FUSE module while
developing it?

Russ
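
For anyone who wants to play with the idea before touching FUSE, here is
a rough in-memory sketch of the block-level scheme discussed in this
thread: hash each block with MD5, compare the bytes when a hash matches
(since MD5 is not guaranteed unique), and never modify a shared block in
place. It is Python rather than Java only to keep it short. The class
and method names (DedupStore, write_block, read_block,
physical_block_count) are made up for illustration, and the 4096-byte
default block size is an assumption. It is not a filesystem and does
nothing about persistence or performance.

```python
import hashlib


class DedupStore:
    """Toy in-memory block store that keeps one copy of each distinct block.

    All names here are hypothetical; this only illustrates the idea.
    """

    def __init__(self, block_size=4096):
        self.block_size = block_size  # assumed fixed block size
        self.blocks = {}     # physical id -> block contents (immutable bytes)
        self.refcount = {}   # physical id -> number of logical blocks using it
        self.by_hash = {}    # MD5 digest -> list of physical ids with that digest
        self.logical = {}    # logical block number -> physical id
        self._next_id = 0

    def _find_or_store(self, data):
        digest = hashlib.md5(data).digest()
        candidates = self.by_hash.setdefault(digest, [])
        for pid in candidates:
            # A matching hash is not proof of equality, so compare the bytes.
            if self.blocks[pid] == data:
                self.refcount[pid] += 1
                return pid
        # New content (or a hash collision): store it as a separate block.
        pid = self._next_id
        self._next_id += 1
        self.blocks[pid] = data
        self.refcount[pid] = 1
        candidates.append(pid)
        return pid

    def _release(self, pid):
        # Drop one reference; free the physical block when nobody uses it.
        self.refcount[pid] -= 1
        if self.refcount[pid] == 0:
            digest = hashlib.md5(self.blocks[pid]).digest()
            self.by_hash[digest].remove(pid)
            if not self.by_hash[digest]:
                del self.by_hash[digest]
            del self.blocks[pid]
            del self.refcount[pid]

    def write_block(self, lbn, data):
        data = bytes(data)
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        # Shared blocks are never modified in place: a write always points
        # the logical block at (possibly new) contents, so other logical
        # blocks that shared the old contents are unaffected.
        new_pid = self._find_or_store(data)
        old_pid = self.logical.get(lbn)
        self.logical[lbn] = new_pid
        if old_pid is not None:
            self._release(old_pid)

    def read_block(self, lbn):
        return self.blocks[self.logical[lbn]]

    def physical_block_count(self):
        return len(self.blocks)
```

Writing the same contents to two logical blocks stores one physical
block with a reference count of 2. Overwriting one of them leaves the
other readable with its old contents. The hash table maps each digest to
a list of blocks so that two different blocks with the same MD5 can
coexist, which is one way to "handle it somehow".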