[CentOS] Large file system idea

SilverTip257 silvertip257 at gmail.com
Sat May 17 15:19:26 UTC 2014


On Sat, May 17, 2014 at 10:30 AM, Steve Thompson <smt at vgersoft.com> wrote:

> This idea is intriguing...
>
> Suppose one has a set of file servers called A, B, C, D, and so forth, all
> running CentOS 6.5 64-bit, all being interconnected with 10GbE. These file
> servers can be divided into identical pairs, so A is the same
> configuration (disks, processors, etc.) as B, C the same as D, and so forth
> (because this is what I have; there are ten servers in all). Each file
> server has four Xeon 3GHz processors and 16GB memory. File server A acts
> as an iscsi target for logical volumes A1, A2,...An, and file server B
> acts as an iscsi target for logical volumes B1, B2,...Bn, where each LVM
> volume is 10 TB in size (a RAID-5 set of six 2TB NL-SAS disks). There are
> no file systems directly built on any of the LVM volumes. The members of a
> server pair (A,B) are in different cabinets (albeit in the same machine
> room), are on different power circuits, and have UPS protection.
>
> A server system called S (which has six processors and 48 GB memory, and
> is not one of the file servers), acts as iscsi initiator for all targets.
> On S, A1 and B1 are combined into the software RAID-1 volume /dev/md101.
>
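
For anyone following along, the target side of that might look roughly
like this in /etc/tgt/targets.conf on A (the IQN, LV name, and initiator
address below are my guesses, not Steve's actual values):

    # serve LVM volume A1 (10 TB) to server S only
    <target iqn.2014-05.com.example:serverA.a1>
        backing-store /dev/vgA/a1
        initiator-address 10.0.0.100
    </target>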

Sounds like you might be reinventing the wheel.
DRBD [0] does what it sounds like you're trying to accomplish [1].

Especially since you have two nodes A+B or C+D that are RAIDed over iSCSI.

It's rather painless to set up two nodes with DRBD.
But once you want to sync three [2] or more nodes with each other, the
number of resources (DRBD block devices) you have to manage grows
quickly, because resources must be layered on top of one another.
Linbit, the developers behind DRBD, call this resource stacking.

[0] http://www.drbd.org/
[1] http://www.drbd.org/users-guide-emb/ch-configure.html
[2] http://www.drbd.org/users-guide-emb/s-three-nodes.html
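
For comparison, a two-node DRBD resource is just a short config file [1].
A minimal sketch (hostnames, device paths, and IPs are assumptions):

    # /etc/drbd.d/r0.res -- replicate one backing LVM volume
    resource r0 {
        device    /dev/drbd0;
        disk      /dev/vgA/a1;    # local backing device on each node
        meta-disk internal;
        on nodeA { address 10.0.0.1:7789; }
        on nodeB { address 10.0.0.2:7789; }
    }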


> Similarly, A2 and B2 are combined into /dev/md102, and so forth for as
> many target pairs as one has. The initial sync of /dev/md101 takes about 6
> hours, with the sync speed being around 400 MB/sec for a 10TB volume. I
> realize that only half of the 10-gig bandwidth is available while writing,
> since the data is being written twice.
>
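
Assembling one of those mirrors on S would go something like this (the
device names are placeholders; the iSCSI LUNs from A and B appear as
ordinary SCSI disks on the initiator):

    # /dev/sdb = LUN A1 from server A, /dev/sdc = LUN B1 from server B
    mdadm --create /dev/md101 --level=1 --raid-devices=2 \
        --bitmap=internal /dev/sdb /dev/sdc
    # a write-intent bitmap speeds up re-sync after a short target outage
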
> All of the /dev/md10X volumes are LVM PV's and are members of the same
> volume group, and there is one logical volume that occupies the entire
> volume group. An XFS file system (-i size=512, inode64) is built on top of
> this logical volume, and S NFS-exports that to the world (an HPC cluster
> of about 200 systems). In my case, the size of the resulting file system
> will ultimately be around 80 TB. The I/O performance of the xfs file
> system is most excellent, and far exceeds that of equivalent file
> systems built with packages such as MooseFS and GlusterFS: I get about
> 350 MB/sec write speed through the file system, and up to 800 MB/sec read.
>
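
Spelled out, that layering looks something like the following (the VG/LV
names and mount point are made up):

    pvcreate /dev/md101 /dev/md102
    vgcreate bigvg /dev/md101 /dev/md102
    lvcreate -l 100%FREE -n biglv bigvg
    mkfs.xfs -i size=512 /dev/bigvg/biglv
    mount -o inode64 /dev/bigvg/biglv /export/big
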
> I have built something like this, and by performing tests such as sending
> a SIGKILL to one of the tgtd's, I have been unable to kill access to the
> file system. Obviously one has to manually intervene on the return of the
> tgtd in order to fail/hot-remove/hot-add the relevant target(s) to the md
> device. Presumably this will be made easier by using persistent device
> names for the targets on S.
>
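
That manual intervention is the usual mdadm dance once the returned
target's LUN is visible again (device name is a placeholder):

    mdadm /dev/md101 --fail /dev/sdb      # mark the stale member failed
    mdadm /dev/md101 --remove /dev/sdb    # hot-remove it from the array
    mdadm /dev/md101 --add /dev/sdb      # hot-add it; a resync follows
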
> One could probably expand this by supplementing server S with a second
> server T, allowing failover of the service should S croak. I haven't
> tackled that part yet.
>
> So, what failure scenarios can take out the entire file system, assuming
> that both members of a pair (A,B) or (C,D) don't go down at the same time?
> There's no doubt that I haven't thought of something.
>
> Steve
> --
>
> ----------------------------------------------------------------------------
> Steve Thompson                 E-mail:      smt AT vgersoft DOT com
> Voyager Software LLC           Web:         http://www DOT vgersoft DOT com
> 39 Smugglers Path              VSW Support: support AT vgersoft DOT com
> Ithaca, NY 14850
>    "186,282 miles per second: it's not just a good idea, it's the law"
>
> ----------------------------------------------------------------------------



-- 
---~~.~~---
Mike
//  SilverTip257  //


