Hi all,
I am just trying to consider my options for storing a large mass of data (tens of terabytes of files), and one idea is to build a clustered FS of some kind. Has anybody had any experience with that? Any recommendations?
Thanks in advance for any and all advice.
Boris.
Boris wrote:
...
We've been looking at GlusterFS here. It's under active development and has some problems, but it does work, and it is in use at a number of places around the world.
mark
On Wed, Jun 16, 2010 at 4:05 PM, m.roth@5-cent.us wrote:
...
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
Thanks Mark,
Will surely check GlusterFS out. What are your thoughts on GPFS: http://en.wikipedia.org/wiki/GPFS ?
Boris.
Boris wrote:
...
Will surely check GlusterFS out. What are your thoughts on GPFS: http://en.wikipedia.org/wiki/GPFS ?
No idea, never used it. GlusterFS is the first clustered f/s I've ever needed to look at... but then, where I'm at now is the first place I've ever worked with HPC clusters, as opposed to HA clusters.
mark
On 16/06/2010 21:11, Boris Epstein wrote:
Will surely check GlusterFS out. What are your thoughts on GPFS: http://en.wikipedia.org/wiki/GPFS ?
I've used GPFS in the past, but it was a long time back. It works; it mostly just does what it needs to do and stays out of your way. When we were using it, we needed an AIX node for some of the director stuff, but I've seen it run in a pure Linux environment recently (on CentOS-4!).
If I inherited a GPFS-based stack, I wouldn't complain about it. But if I were doing something new, I'd look elsewhere; e.g., Ceph is interesting.
- KB
On Wednesday 16 June 2010, Boris Epstein wrote:
...
Will surely check GlusterFS out. What are your thoughts on GPFS: http://en.wikipedia.org/wiki/GPFS ?
We run GPFS (and Lustre) on CentOS-5 (x86_64). GPFS is quite nice and very flexible but costs money. Lustre, on the other hand, is free and very scalable but lacks many of the features of GPFS.
We've never tried GlusterFS, and Ceph is not even close to mature enough for actual use (from what I've seen).
/Peter
Boris Epstein wrote, On 06/16/2010 03:33 PM:
...
I have not used a cluster FS, but I have seen some discussions of them over on the DRBD list[1], and you did not mention what kind of backing devices you were going to have for the filesystem. The DRBD documentation[2] has some discussion of GFS and OCFS2, which may be of some help.
In short, if you are considering DRBD as a backing device, definitely ask on their mailing list; I suspect its population has a higher percentage of folks who use cluster FSs.
[1] http://lists.linbit.com/mailman/listinfo/drbd-user
[2] http://www.drbd.org/docs/applications/
    http://www.drbd.org/users-guide-emb/ch-gfs.html#s-gfs-primer
    http://www.drbd.org/users-guide-emb/ch-ocfs2.html#s-ocfs2-primer
On 16/06/2010 21:12, Todd Denniston wrote:
In short, if you are considering DRBD as a backing device, definitely ask on their mailing list; I suspect its population has a higher percentage of folks who use cluster FSs.
DRBD is only worth looking at if you have something very small, or are in an edge case where distributing the application itself isn't an option. To be honest, those edge cases are drying up a bit these days.
I'd start by looking at the app and seeing if I can just distribute that. If not, then look at a distributed store (Riak, anyone?), and failing that, look at clustering a filesystem for legacy-type use.
Let the app and deployment role define what sort of a hammer you want to use here :)
- KB
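KB's ladder above can be read as a simple decision procedure, tried in order. The sketch below is a hypothetical illustration, not anything posted to the thread: the two yes/no inputs stand for questions you would have to answer about your own application, and Riak is only the example store KB mentions.

```python
from enum import Enum


class Approach(Enum):
    """The three rungs of the ladder, tried in order."""
    DISTRIBUTE_APP = "distribute the application itself"
    DISTRIBUTED_STORE = "use a distributed store (e.g. Riak)"
    CLUSTERED_FS = "cluster a filesystem for legacy-type use"


def choose_approach(app_can_be_distributed: bool,
                    app_can_use_distributed_store: bool) -> Approach:
    """Return the first rung that fits.

    Both arguments are hypothetical answers you supply after looking at
    your own application; nothing here inspects a real system.
    """
    if app_can_be_distributed:
        return Approach.DISTRIBUTE_APP
    if app_can_use_distributed_store:
        return Approach.DISTRIBUTED_STORE
    # Only fall back to a clustered filesystem when neither option works.
    return Approach.CLUSTERED_FS


if __name__ == "__main__":
    # A legacy app that expects plain file paths and can't be changed:
    print(choose_approach(False, False).value)
```

The ordering is the point: in this reading a clustered filesystem is the last resort, not the default.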
...
You haven't actually stated whether you want the backing devices distributed or want the filesystem to support more than one mount.
You likely don't need a cluster-aware FS; if you just need to access the data in more than one place, any of several file-sharing methods will work.
I suspect that, as your storage need is large, you need to distribute it across more than one block device, probably on several servers? DRBD is of no use here.
Clarify what you're after...
jlc
On Jun 16, 2010, at 4:17 PM, "Joseph L. Casale" <jcasale@activenetwerx.com> wrote:
...
You are probably looking to have multiple iSCSI/FC storage servers/appliances in the backend, with one or more NAS head servers serving it up via NFS/CIFS.
If the head servers will be serving the same file systems simultaneously, then you need a cluster file system and clustering software. If each head server will be serving a distinct file system, then you probably just need some HA software like Heartbeat or Pacemaker to fail those exports over to the other head server(s) in the event of a head server failure.
-Ross
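Ross's rule of thumb for the NAS head servers comes down to one question: is any filesystem served by more than one head at the same time? A minimal Python sketch of that check follows. The head names and export paths are hypothetical, and the code only encodes the rule; it does not talk to Heartbeat, Pacemaker, or any real cluster software.

```python
from collections import Counter
from typing import Dict, Set


def needs_cluster_fs(exports: Dict[str, Set[str]]) -> bool:
    """True if any filesystem is served by more than one head server at once."""
    counts = Counter(fs for served in exports.values() for fs in served)
    return any(n > 1 for n in counts.values())


def recommend(exports: Dict[str, Set[str]]) -> str:
    if needs_cluster_fs(exports):
        # The same filesystem is served by several heads simultaneously.
        return "cluster filesystem + clustering software"
    # Each filesystem lives on exactly one head; only failover is needed.
    return "HA failover (e.g. heartbeat or pacemaker)"


if __name__ == "__main__":
    # Hypothetical layouts for illustration only.
    shared = {"head1": {"/export/data"}, "head2": {"/export/data"}}
    split = {"head1": {"/export/a"}, "head2": {"/export/b"}}
    print(recommend(shared))
    print(recommend(split))
```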
On Thu, Jun 17, 2010 at 1:03 AM, Boris Epstein borepstein@gmail.com wrote:
...
You need a shared SAN back end to run traditional cluster file systems.
If your environment is all Linux, then Lustre (lustre.org) works well.
If you need other OS support, commercial alternatives like Quantum StorNext and IBRIX (now acquired by HP) are good options.
- Raja
Give GFS a chance; it works very well for us, and CentOS ships it.
On 06/17/2010 10:19 AM, Raja Subramanian wrote:
...
On 17/06/2010 09:28, Juergen Gotteswinter wrote:
Give GFS a chance; it works very well for us, and CentOS ships it.
Yes, seconded. The GFS stack works really well too. I'm running two instances and have not really had any major issues. Production-grade CLVM snapshots would be nice to have, but not everyone needs those.
- KB
Boris Epstein sent a missive on 2010-06-16:
...
Take a look at Hadoop (http://hadoop.apache.org), and specifically HDFS, the Hadoop Distributed File System (http://hadoop.apache.org/hdfs/). I've used it in conjunction with Nutch across 20-odd servers (circa 10 TB). When I used it, the downside was a single metadata node, but this may have changed by now. The data is stored redundantly across the nodes, and it doesn't seem to require any special hardware (I ran it on Dell 1425s).
HTH
Simon.
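To make Simon's description of HDFS concrete: files are split into blocks, each block is copied to several nodes, and a single metadata node keeps the block-to-node map. The Python sketch below mimics that with simple round-robin placement. It is an illustration under that assumption only; real HDFS placement is rack-aware, and the node names here are hypothetical.

```python
from typing import Dict, List


def place_replicas(num_blocks: int, nodes: List[str],
                   replicas: int) -> Dict[int, List[str]]:
    """Map each block to `replicas` distinct nodes using round-robin.

    The returned dict plays the role of the metadata node's block map.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and the number of nodes")
    placement = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        # Consecutive nodes starting at `start` are always distinct
        # because replicas <= len(nodes).
        placement[block] = [nodes[(start + i) % len(nodes)]
                            for i in range(replicas)]
    return placement


def readable_after_failure(placement: Dict[int, List[str]],
                           failed: str) -> List[int]:
    """Blocks that still have at least one replica on a live node."""
    return [b for b, where in placement.items()
            if any(node != failed for node in where)]


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical hosts
    block_map = place_replicas(8, nodes, replicas=2)
    print(block_map[0], block_map[3])
    print(len(readable_after_failure(block_map, "node2")), "of 8 blocks readable")
```

With one copy per block, losing a node loses data; with two or more distinct copies, any single data node can fail. The block map itself is still kept in one place, which is the single-metadata-node weakness Simon mentions.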
On Wed, 16 Jun 2010 15:33:02 -0400 Boris Epstein borepstein@gmail.com wrote:
...
Hi,
You can take a look at http://www.moosefs.org. It is a fault-tolerant, POSIX-compliant network FS that supports snapshots and uses FUSE, so your code doesn't need to be changed to access it. You can easily choose the number of replicas you want for files and directories. It is easy to deploy and runs in user space. Some people run it successfully on 500+ TB.
Plus, I've made a CentOS repo here: http://centos.kodros.fr/moosefs.repo
Regards,
Laurent
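Being able to choose the number of replicas, as Laurent describes for MooseFS (and as Simon describes for HDFS), has a direct cost in raw disk, which matters at "tens of terabytes". Below is a back-of-the-envelope helper, assuming every replica is a full copy and ignoring filesystem overhead. The 30 TB figure and the headroom fraction are hypothetical inputs, not numbers from the thread.

```python
def raw_capacity_tb(data_tb: float, copies: int, headroom: float = 0.0) -> float:
    """Raw disk (TB) needed to store `data_tb` of files as `copies` full replicas.

    `headroom` is the fraction of raw space you want to keep free
    (0.2 means 20%); it is a planning assumption, not a MooseFS setting.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    return data_tb * copies / (1.0 - headroom)


if __name__ == "__main__":
    # Hypothetical: 30 TB of files, kept as 2 and as 3 copies, with 20% free space.
    for copies in (2, 3):
        print(copies, "copies:", round(raw_capacity_tb(30, copies, headroom=0.2), 1), "TB raw")
```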