Hello listmates,
This is not specifically CentOS-related - though I will probably execute this design on CentOS if I decide to do so. It will certainly be some kind of Linux.
At any rate, here's my situation. I would like to build a fairly large storage solution (let us say, 100 TB). I want this solution to be distributed and redundant. I want to be able to lose some of the machines involved and still stay operational (the more I can lose, the better). I would prefer to avoid having to buy large servers to accomplish this task.
What I am soliciting here are thoughts, reports from experience, recommendations, etc.
Thanks in advance.
Boris.
Hello Boris,
I'm on a similar search for a scalable and resilient solution. So far I like GlusterFS: relatively easy to set up, no meta-server required, decent performance, but I haven't tested it thoroughly. I've been playing with their latest beta release in a raid0+1 setup; I haven't managed to lose any data yet.
I'll also be interested in opinions from other people.
-- Nux! www.nux.ro
Hi,
I've been happily running MooseFS (packages available in the rpmforge repo) for a year and a half: 120 TB, soon 200. It is so easy to set up and grow it's indecent :)
Laurent.
Hello Laurent,
Thanks! Very useful info; I had never even heard of MooseFS, and it sounds very nice.
One question: what happens if you lose your master server, as they call it? Or is it possible to make the master server redundant as well?
Boris.
On 04/02/2012 18:39, Boris Epstein wrote:
One question: what happens if you lose your master server, as they call it? Or is it possible to make the master server redundant as well?
Master HA is not yet possible from MooseFS itself. You can use one (or more) metalogger(s) to keep backups of the metadata, so you can start another master to replace the failing one. The master (ECC RAM, redundant PSU) has never failed here, fingers crossed :) Laurent.
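For what it's worth, here is a rough sketch of what that manual failover looks like with the stock MooseFS tools; the hostnames, paths and file names below are from memory, so double-check them against the MooseFS documentation for your version:

    # On the box that has been running mfsmetalogger against the old master,
    # rebuild the master metadata from the metalogger's backup plus changelogs:
    cd /var/lib/mfs
    mfsmetarestore -m metadata_ml.mfs.back -o metadata.mfs changelog_ml.*.mfs

    # Start a master here and repoint the "mfsmaster" name/IP at this host,
    # so the chunkservers and clients reconnect to it:
    mfsmaster start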
Boris, you said cloud and machines ... then you described something that you can do on one box with a bunch of drives.
Do you really want a cloud (a bunch of machines with their own drives) or a large RAID array?
You are getting answers for both now.
If you really do want some kind of cloud storage system and you are putting the machines in one datacenter ... I would recommend GlusterFS:
GlusterFS has been bought by Red Hat, and they offer it in a storage solution right now ... and they have CentOS RPMs for CentOS 5 and CentOS 6 here:
http://download.gluster.com/pub/gluster/glusterfs/LATEST/CentOS/
If you use the replicated volumes, you can lose bunches of machines and still have functioning service:
http://download.gluster.com/pub/gluster/glusterfs/3.2/Documentation/AG/html/...
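To give a flavour of it, here is a minimal sketch of creating a purely replicated volume from the CLI; the hostnames and brick paths are made up, and the exact syntax may differ between GlusterFS releases:

    # run on one node once glusterd is up on all of them
    gluster peer probe server2
    gluster peer probe server3

    # keep a full copy of every file on each of the three servers
    gluster volume create bigvol replica 3 transport tcp \
        server1:/export/brick1 server2:/export/brick1 server3:/export/brick1
    gluster volume start bigvol

    # clients mount it with the native client
    mount -t glusterfs server1:/bigvol /mnt/bigvol

With replica 3 you can, in principle, lose two of the three servers and still serve data, at the cost of storing everything three times.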
Nux,
Thanks for your response. I have looked into glusterfs and I like it too. I just haven't found the hardware to try it on.
What is RAID0+1? The flat RAID with one parity disk?
Boris.
Boris Epstein writes:
Nux,
Thanks for your response. I have looked into glusterfs and I like it too. I just haven't found the hardware to try it on.
I "tested" it on 4 VMs.. The performance was crap as expected, but wanted to see how it behaves when I suddenly remove a node from the setup and so on. (it went well, the setup froze for a second but after that kept working at normal parameters)
What is RAID0+1? The flat RAID with one parity disk?
No, I should've rephrased this: I meant the likes of RAID10, of course, in GlusterFS "speak". Basically I had 2 pairs of replicated nodes and files striped across all of this. I even ran a VM on top of this VM-based GlusterFS setup.. not the speediest VM, but it was usable. :-)
-- Nux! www.nux.ro
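In GlusterFS terms that "raid10-like" layout is a distributed-replicated volume. Roughly, with placeholder names and the same caveat that the syntax may vary by release:

    # two replica pairs, with files distributed across the pairs:
    # (node1,node2) mirror each other, (node3,node4) mirror each other
    gluster volume create testvol replica 2 transport tcp \
        node1:/export/brick node2:/export/brick \
        node3:/export/brick node4:/export/brick
    gluster volume start testvol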
Boris Epstein wrote on 02/04/2012 11:57 AM:
What is RAID0+1?
Nested RAID. Paraphrasing http://en.wikipedia.org/wiki/RAID :
For a RAID 0+1, drives are first combined into multiple level 0 RAIDs that are themselves treated as single drives to be combined into a single RAID 1.
Phil
Thanks Phil!
Google (or other search engine) and Wikipedia are truly a wonder :D
RAID 0+1 is probably the worst setup: a failure on both sides of the mirror means total loss, and with the number of disks on each side of this setup the chance of that is much greater. Recovery from a failure also takes a lot longer, because the whole stripe needs to re-mirror. While read performance is equal to 1+0, writes are equal to a single mirror, because both sides need to complete before the next operation can run; in other words, only one write operation on the array at a time.
A much better RAID level is 1+0, which is a series of mirrors striped together. While a failure on both sides of any one mirror is still fatal for the array, there is only one disk on either side, so the odds are lower, and recovery from a failure is faster as well, because only one disk needs to be re-mirrored. Read and write performance are equal, because each mirror can perform writes independently of the others: the number of concurrent write operations equals the number of mirrors.
-Ross
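To put rough numbers on those odds (this is just back-of-the-envelope arithmetic with an example disk count, not figures from Ross): with one disk already dead, what is the chance that the next random failure takes out the array?

    # N disks per side/stripe, 2*N disks total, one already failed, 2*N-1 left
    N=10
    # RAID 0+1: the whole failed stripe is already gone, so any of the N disks
    # in the surviving stripe is fatal
    echo "scale=3; $N / (2*$N - 1)" | bc      # about 10/19
    # RAID 1+0: only the dead disk's mirror partner is fatal
    echo "scale=3; 1 / (2*$N - 1)" | bc       # about 1/19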
Ross,
What you are saying actually seems to make sense. I wonder how well a RAID6 with a few spares would do. If we are talking about a large number of disks, then RAID 6 + 2 spares means overpaying for only 5 disks. Not a lot if the total number of them is, say, 20.
Boris.
Except that you don't want more than about 12 disks in a single RAID5/6 group, or the performance penalties become enormous and the rebuild times become astronomical.
Don't approach it as purely a cost analysis, but as a matter of what you require for your application.
If you have a write-mostly transactional application, then RAID10 makes sense; if you have a 50/50 app, then maybe a RAID50 built out of several small RAID5s; if you have read-mostly or long-term archival storage, then a RAID6.
I wouldn't create an array out of more than 12 disks unless it was a RAID10, because rebuild times would put the array in jeopardy of a cascading failure. You could create a RAID50 out of three 6-disk RAID5s with 2 hot spares. That's 15 disks of usable space, with 3 disks of parity and 2 spares, and it would give decent performance with the ability to handle 3 disk failures (spread across different RAID5s). When setting it up, put every third disk into the same RAID5, just because I have seen double failures, and for some reason they were side by side for me.
It might be easier to do the striping in software, because that's a zero-overhead operation; it makes the hardware RAID easier to set up and maintain, and it can make rebuilds less painful, depending on the controller.
-Ross
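Spelling out the disk arithmetic in that RAID50 suggestion (the same numbers Ross gives, just made explicit):

    disks=20
    groups=3          # three RAID5 groups
    per_group=6       # six disks in each RAID5
    spares=2
    data=$(( groups * (per_group - 1) ))    # 15 disks of usable space
    parity=$groups                          # 3 disks of parity, one per RAID5
    echo "$data data + $parity parity + $spares spares = $(( data + parity + spares )) of $disks disks"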
On 02/05/12 3:24 PM, Ross Walker wrote:
It might be easier to do the striping in software, because that's a zero-overhead operation; it makes the hardware RAID easier to set up and maintain, and it can make rebuilds less painful, depending on the controller.
I just tried a bunch of combinations on a 3 x 11 raid60 configuration plus 3 global hotspares, and decided that letting the controller (LSI 9260-8i MegaSAS2) do it was easier all the way around. Of course, with other controllers, your mileage may vary. And yes, megacli64 is an ugly tool to tame.
With 3TB SAS drives, single-drive failures rebuild in 12 hours, double failures in 18 hours. (Failures were forced by disabling drives via megacli.)
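For a sense of scale, the sustained throughput implied by those rebuild times (simple arithmetic based on the figures above, nothing measured separately):

    # one 3 TB drive rebuilt in 12 hours, expressed in MB/s
    echo "scale=1; 3 * 1000 * 1000 / (12 * 3600)" | bc   # roughly 69 MB/s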
What about Software RAID 10 (far)? It gives 2 x read speed and 1 x write speed (speed of single HDD).
On 02/05/12 3:49 PM, Ljubomir Ljubojevic wrote:
What about Software RAID 10 (far)? It gives 2 x read speed and 1 x write speed (speed of single HDD).
We use RAID10 for all our database servers, often with as many as 20 disks in a single RAID set.
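For anyone who wants to try the "far" layout Ljubomir mentions, the md version looks roughly like this; the device names are placeholders:

    # 4-disk md RAID10 with the "far 2" layout: reads can stripe across all
    # the disks, while every write still has to land on two of them
    mdadm --create /dev/md0 --level=10 --layout=f2 --raid-devices=4 /dev/sd[bcde]
    mkfs.xfs /dev/md0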
On Feb 5, 2012, at 6:33 PM, John R Pierce pierce@hogranch.com wrote:
I just tried a bunch of combinations on a 3 x 11 raid60 configuration plus 3 global hotspares, and decided that letting the controller (LSI 9260-8i MegaSAS2) do it was easier all the way around. Of course, with other controllers, your mileage may vary. And yes, megacli64 is an ugly tool to tame.
Some controllers are better.
Software-based stripes do allow you to span RAID controllers, though, which provides a lot of flexibility.
When I do software striping I do it within LVM instead of creating a RAID0, as I found it easier to manage long term.
-Ross
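A minimal sketch of that approach, striping a logical volume across two hardware-RAID LUNs that sit on different controllers; the device names and sizes here are made up:

    # each device is a RAID6/RAID10 LUN exported by its own controller
    pvcreate /dev/sdb /dev/sdc
    vgcreate storage /dev/sdb /dev/sdc

    # -i 2: stripe across both PVs, -I 256: 256 KiB stripe size
    lvcreate -i 2 -I 256 -L 20T -n data storage
    mkfs.xfs /dev/storage/data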