On Sat, May 17, 2014 at 10:30 AM, Steve Thompson smt@vgersoft.com wrote:
This idea is intriguing...
Suppose one has a set of file servers called A, B, C, D, and so forth, all running CentOS 6.5 64-bit and all interconnected with 10GbE. These file servers can be divided into identical pairs, so A is the same configuration (disks, processors, etc.) as B, C the same as D, and so forth (because this is what I have; there are ten servers in all). Each file server has four Xeon 3GHz processors and 16GB memory. File server A acts as an iSCSI target for logical volumes A1, A2, ..., An, and file server B acts as an iSCSI target for logical volumes B1, B2, ..., Bn, where each LVM volume is 10 TB in size (a RAID-5 set of six 2TB NL-SAS disks). There are no file systems built directly on any of the LVM volumes. The members of each server pair (A,B) are in different cabinets (albeit in the same machine room), are on different power circuits, and have UPS protection.
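For reference, the export of one such volume from A with scsi-target-utils looks roughly like the sketch below (the IQN, volume group, and LV names are placeholders, not my real ones); B carries an equivalent definition for each Bn:

  # /etc/tgt/targets.conf on file server A
  <target iqn.2014-05.com.example:a1>
      # export the 10 TB logical volume A1 as a single LUN
      backing-store /dev/vg_a/a1
      # only the initiator host S may log in
      initiator-address 10.0.0.100
  </target>

  # apply the configuration to the running tgtd
  tgt-admin --update ALL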
A server system called S (which has six processors and 48 GB memory, and is not one of the file servers) acts as the iSCSI initiator for all targets. On S, A1 and B1 are combined into the software RAID-1 volume /dev/md101.
Sounds like you might be reinventing the wheel. DRBD [0] does what it sounds like you're trying to accomplish [1].
Especially since you have two nodes A+B or C+D that are RAIDed over iSCSI.
It's rather painless to set up two nodes with DRBD. But once you want to sync three [2] or more nodes with each other, the number of resources (DRBD block devices) grows quickly, because resources have to be layered on top of one another; Linbit, the developers behind DRBD, call this resource stacking.
[0] http://www.drbd.org/
[1] http://www.drbd.org/users-guide-emb/ch-configure.html
[2] http://www.drbd.org/users-guide-emb/s-three-nodes.html
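For comparison, a two-node DRBD resource mirroring one of the 10 TB volumes over the 10GbE link would look roughly like this (hostnames, addresses, and device paths are placeholders; DRBD 8.4 syntax; protocol C, synchronous replication, is the default):

  # /etc/drbd.d/a1.res, identical on both A and B
  resource a1 {
      device    /dev/drbd1;
      disk      /dev/vg_a/a1;      # local backing LV on each node
      meta-disk internal;
      on A { address 10.0.0.1:7789; }
      on B { address 10.0.0.2:7789; }
  }

  # on both nodes:
  drbdadm create-md a1
  drbdadm up a1
  # on one node only, to start the initial full sync:
  drbdadm primary --force a1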
Similarly, A2 and B2 are combined into /dev/md102, and so forth for as many target pairs as one has. The initial sync of /dev/md101 takes about 6 hours at around 400 MB/sec for a 10TB volume. I realize that only half of the 10-gig bandwidth is available while writing, since the data is written twice.
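On S, the login and array creation for one pair amount to something like the following (the IQNs and the resulting /dev/sdb and /dev/sdc names are assumptions; the write-intent bitmap is an addition worth considering so that a returning target resyncs only the blocks changed while it was away):

  # discover and log in to the targets on A and B
  iscsiadm -m discovery -t sendtargets -p 10.0.0.1
  iscsiadm -m discovery -t sendtargets -p 10.0.0.2
  iscsiadm -m node -T iqn.2014-05.com.example:a1 -p 10.0.0.1 --login
  iscsiadm -m node -T iqn.2014-05.com.example:b1 -p 10.0.0.2 --login

  # mirror the two remote LUNs into md101
  mdadm --create /dev/md101 --level=1 --raid-devices=2 \
        --bitmap=internal /dev/sdb /dev/sdc

  # watch the initial sync
  cat /proc/mdstat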
All of the /dev/md10X volumes are LVM PVs in the same volume group, and a single logical volume occupies the entire volume group. An XFS file system (-i size=512, inode64) is built on top of this logical volume, and S NFS-exports it to the world (an HPC cluster of about 200 systems). In my case, the size of the resulting file system will ultimately be around 80 TB. The I/O performance of the XFS file system is most excellent, and exceeds by a large margin the performance of equivalent file systems built with packages such as MooseFS and GlusterFS: I get about 350 MB/sec write speed through the file system, and up to 800 MB/sec read.
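For the record, the LVM/XFS/NFS layering amounts to something like this (volume group name, mount point, and export options are placeholders):

  # turn the mirrors into PVs and build one big VG with a single LV
  pvcreate /dev/md101 /dev/md102        # ... and so on for md103, md104, ...
  vgcreate vg_nfs /dev/md101 /dev/md102
  lvcreate -l 100%FREE -n lv_nfs vg_nfs

  # XFS with 512-byte inodes, mounted with 64-bit inode numbers
  mkfs.xfs -i size=512 /dev/vg_nfs/lv_nfs
  mount -o inode64 /dev/vg_nfs/lv_nfs /export/data

  # /etc/exports entry (options are illustrative), then publish it:
  #   /export/data  10.0.0.0/16(rw,async,no_root_squash)
  exportfs -ra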
I have built something like this, and by performing tests such as sending a SIGKILL to one of the tgtd's, I have been unable to kill access to the file system. Obviously one has to intervene manually when the tgtd returns, in order to fail/hot-remove/hot-add the relevant target(s) to the md device. Presumably this will be made easier by using persistent device names for the targets on S.
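The recovery step is the usual mdadm fail/remove/add sequence; using the persistent /dev/disk/by-path names for the iSCSI LUNs instead of the bare /dev/sdX names (which can move around across logins) makes it harder to grab the wrong disk. Roughly (the by-path name is illustrative):

  # after the target on A comes back, put its LUN back into the mirror
  DEV=/dev/disk/by-path/ip-10.0.0.1:3260-iscsi-iqn.2014-05.com.example:a1-lun-1
  mdadm /dev/md101 --fail $DEV --remove $DEV
  mdadm /dev/md101 --add $DEV
  # with an internal write-intent bitmap, only blocks dirtied while A
  # was away get resynced rather than the whole 10 TB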
One could probably expand this by supplementing the server S with a second server T, to allow failover of the service should S croak. I haven't tackled that part yet.
So, what failure scenarios can take out the entire file system, assuming that both members of a pair (A,B) or (C,D) don't go down at the same time? There's no doubt that I haven't thought of something.
Steve
Steve Thompson             E-mail:      smt AT vgersoft DOT com
Voyager Software LLC       Web:         http://www DOT vgersoft DOT com
39 Smugglers Path          VSW Support: support AT vgersoft DOT com
Ithaca, NY 14850
  "186,282 miles per second: it's not just a good idea, it's the law"