On Sat, May 17, 2014 at 10:30 AM, Steve Thompson <smt at vgersoft.com> wrote:

> This idea is intriguing...
>
> Suppose one has a set of file servers called A, B, C, D, and so forth,
> all running CentOS 6.5 64-bit, all interconnected with 10GbE. These
> file servers can be divided into identical pairs, so A is the same
> configuration (disks, processors, etc.) as B, C the same as D, and so
> forth (because this is what I have; there are ten servers in all).
> Each file server has four Xeon 3GHz processors and 16GB memory. File
> server A acts as an iSCSI target for logical volumes A1, A2,...An, and
> file server B acts as an iSCSI target for logical volumes B1, B2,...Bn,
> where each LVM volume is 10 TB in size (a RAID-5 set of six 2TB NL-SAS
> disks). There are no file systems directly built on any of the LVM
> volumes. The members of a server pair (A,B) are in different cabinets
> (albeit in the same machine room), are on different power circuits,
> and have UPS protection.
>
> A server system called S (which has six processors and 48 GB memory,
> and is not one of the file servers) acts as the iSCSI initiator for
> all targets. On S, A1 and B1 are combined into the software RAID-1
> volume /dev/md101.

Sounds like you might be reinventing the wheel. DRBD [0] does what it
sounds like you're trying to accomplish [1], especially since you have
two nodes A+B or C+D that are RAIDed over iSCSI.

It's rather painless to set up two nodes with DRBD (a minimal sketch is
further down in this reply). But once you want to sync three [2] or more
nodes with each other, the number of resources (DRBD block devices) you
have to manage grows quickly. Linbit, the developers behind DRBD, call
it resource stacking.

[0] http://www.drbd.org/
[1] http://www.drbd.org/users-guide-emb/ch-configure.html
[2] http://www.drbd.org/users-guide-emb/s-three-nodes.html

> Similarly, A2 and B2 are combined into /dev/md102, and so forth for as
> many target pairs as one has. The initial sync of /dev/md101 takes
> about 6 hours, with the sync speed being around 400 MB/sec for a 10TB
> volume. I realize that only half of the 10-gig bandwidth is available
> while writing, since the data is being written twice.
>
> All of the /dev/md10X volumes are LVM PVs and are members of the same
> volume group, and there is one logical volume that occupies the entire
> volume group. An XFS file system (-i size=512, inode64) is built on top
> of this logical volume, and S NFS-exports that to the world (an HPC
> cluster of about 200 systems). In my case, the size of the resulting
> file system will ultimately be around 80 TB. The I/O performance of the
> XFS file system is most excellent, and exceeds by a large amount the
> performance of the equivalent file systems built with such packages as
> MooseFS and GlusterFS: I get about 350 MB/sec write speed through the
> file system, and up to 800 MB/sec read.
>
> I have built something like this, and by performing tests such as
> sending a SIGKILL to one of the tgtd's, I have been unable to kill
> access to the file system. Obviously one has to manually intervene on
> the return of the tgtd in order to fail/hot-remove/hot-add the relevant
> target(s) to the md device. Presumably this will be made easier by
> using persistent device names for the targets on S.
>
> One could probably expand this to supplement the server S with a second
> server T to allow the possibility of failover of the service should S
> croak. I haven't tackled that part yet.
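Coming back to the DRBD suggestion: a minimal two-node resource is only a
few lines. Treat the following as a sketch, not a drop-in config -- the
hostnames, backing LV paths, and replication addresses are all invented,
and the syntax shown is DRBD 8.4-style.

  # /etc/drbd.d/r0.res, identical on both nodes (names are made up)
  resource r0 {
      protocol C;                     # synchronous replication
      on filesrv-a {
          device    /dev/drbd0;
          disk      /dev/vg_a/lv_a1;  # backing LV on the first node
          address   10.0.0.1:7789;
          meta-disk internal;
      }
      on filesrv-b {
          device    /dev/drbd0;
          disk      /dev/vg_b/lv_b1;  # backing LV on the second node
          address   10.0.0.2:7789;
          meta-disk internal;
      }
  }

  # bring it up: run on both nodes, then promote one side for the initial sync
  drbdadm create-md r0
  drbdadm up r0
  drbdadm primary --force r0          # only on the node that should start as primary

On whichever node is primary, /dev/drbd0 then plays the role your md101
plays on S, and DRBD keeps the resync bookkeeping for you instead of you
doing it by hand with mdadm.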
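For the setup you describe (tgtd on A/B, md RAID-1 on S), the moving
parts look roughly like this. It's only a sketch: the IQNs, portal
addresses, and LV names are invented, and on S you'd use the
/dev/disk/by-path/ names (the persistent names you mention) so the md
members stay stable across reboots and re-logins.

  # on file server A: export the backing LV via tgtd
  # (snippet for /etc/tgt/targets.conf; same idea on B for b1)
  <target iqn.2014-05.com.example.a:a1>
      backing-store /dev/vg_a/lv_a1
  </target>

  # on S: discover and log in to both targets
  iscsiadm -m discovery -t sendtargets -p 10.0.0.1
  iscsiadm -m discovery -t sendtargets -p 10.0.0.2
  iscsiadm -m node -T iqn.2014-05.com.example.a:a1 -p 10.0.0.1 --login
  iscsiadm -m node -T iqn.2014-05.com.example.b:b1 -p 10.0.0.2 --login

  # on S: mirror the pair; an internal write-intent bitmap makes the
  # resync after a dropped target incremental instead of a full 10TB pass
  mdadm --create /dev/md101 --level=1 --raid-devices=2 --bitmap=internal \
        /dev/disk/by-path/ip-10.0.0.1:3260-iscsi-iqn.2014-05.com.example.a:a1-lun-1 \
        /dev/disk/by-path/ip-10.0.0.2:3260-iscsi-iqn.2014-05.com.example.b:b1-lun-1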
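And the aggregation step on S, again only as a sketch (VG/LV names, mount
point, and export subnet are made up). Note that -i size=512 is a
mkfs-time option while inode64 is a mount option, which is why they show
up in different places:

  # on S: one VG across all the md mirrors, one LV spanning it
  pvcreate /dev/md101 /dev/md102
  vgcreate vg_export /dev/md101 /dev/md102
  lvcreate -l 100%FREE -n lv_export vg_export

  mkfs.xfs -i size=512 /dev/vg_export/lv_export
  mkdir -p /export/hpc
  mount -o inode64 /dev/vg_export/lv_export /export/hpc

  # /etc/exports (subnet is an example), then re-export:
  #   /export/hpc  10.10.0.0/16(rw,no_root_squash)
  exportfs -ra

Growing it later is just pvcreate on the next /dev/md10X, vgextend,
lvextend, and xfs_growfs, all of which can be done online.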
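The manual intervention after a tgtd (or a whole file server) comes back
is the usual mdadm fail/remove/re-add dance; the member name below is a
placeholder. With an internal bitmap the re-add should only copy the
blocks that changed while the member was missing.

  # on S, once the iSCSI session to the returned target is logged in again:
  mdadm /dev/md101 --fail   /dev/sdX   # only if md hasn't already failed it
  mdadm /dev/md101 --remove /dev/sdX
  mdadm /dev/md101 --re-add /dev/sdX   # incremental resync via the bitmap; use --add if re-add refuses
  cat /proc/mdstat                     # watch the recovery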
> So, what failure scenarios can take out the entire file system,
> assuming that both members of a pair (A,B) or (C,D) don't go down at
> the same time? There's no doubt that I haven't thought of something.
>
> Steve
> --
> ----------------------------------------------------------------------------
> Steve Thompson                 E-mail:      smt AT vgersoft DOT com
> Voyager Software LLC           Web:         http://www DOT vgersoft DOT com
> 39 Smugglers Path              VSW Support: support AT vgersoft DOT com
> Ithaca, NY 14850
>   "186,282 miles per second: it's not just a good idea, it's the law"
> ----------------------------------------------------------------------------

--
---~~.~~---
Mike // SilverTip257 //