On 8/4/10, JohnS <jses27 at gmail.com> wrote:
> Would this be it?
> http://www.mail-archive.com/lustre-discuss@lists.lustre.org/msg06952.html

Yes, that is my thread :)

> This is something like Virtual Storage like Covalent and IBM have. Very
> costly to implement the right way and interesting also.

That's what I ended up concluding as well, based on the replies.
Effectively, I would need to double up each storage node with a
dedicated failover node if I really want to guard against machine
failure. It seems a lot cheaper to use gluster, where the failover
machine can also be an active node.

So with the criss-cross arrangement suggested by one of the gluster
experts, I could get machine redundancy with only half the physical
servers (rough sketch at the end of this mail). E.g. S1, S2, S3 with
two RAID 1 block devices each: S1-A stores data1 and S1-B replicates
S2-A, then S2-A stores data2 and S2-B replicates S3-A, etc. Not as
fully redundant as 1-for-1 failover, but I could achieve that by
replicating onto another cheap server with N+1 RAID 5 for every N
machines.

So gluster seems a lot more flexible and cost effective to me,
especially without the need for a dedicated metadata server. Last but
most importantly, it seems easier to recover from, since it works on
top of the underlying fs, so I figure I can always pull the drives
from a dead machine and read the files directly off the disks if
really necessary.

My only concerns now are the usual split-brain issue and whether ZFS
on Linux is mature enough to be used as the underlying fs on CentOS 5.
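
To make the criss-cross layout concrete, here is a rough sketch of how
the chained pairing could be expressed with the gluster CLI. Untested;
the volume name, hostnames and brick paths are just placeholders, and
it assumes a gluster version where consecutive bricks listed after
"replica 2" form a mirror pair.

    # Sketch only -- placeholder names/paths, untested.
    # With "replica 2", each consecutive pair of bricks becomes a mirror:
    #   (S1-A, S3-B), (S2-A, S1-B), (S3-A, S2-B)
    # so every server holds one primary brick plus a replica of a
    # neighbour's brick, matching the layout described above.
    gluster volume create datavol replica 2 \
        s1:/export/brick-a  s3:/export/brick-b \
        s2:/export/brick-a  s1:/export/brick-b \
        s3:/export/brick-a  s2:/export/brick-b

    # Check the replica sets before putting any data on the volume:
    gluster volume info datavol

If any one of S1-S3 dies, each of its two bricks still has a live copy
on a different server, which is the redundancy I'm after.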