Hi All,
I am about to embark on a project that deals with archiving information over time and seeing how it changes over time. I can explain it in much more detail, but I would certainly talk your ear off. I don't have a lot of money to throw at the initial concept, but I have some. This device will host all of the operations for the first few months, until I can afford to build a duplicate device. I already have a few parts of the idea done and ready to go live.
I am contemplating building a BackBlaze-style pod. The goal of the device is to act as the place where the crawls store information; it then massages the data, gets it into databases, and notifies the user that the task is done so they can start looking at the results.
For reference here are a few links:
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
and
http://cleanenergy.harvard.edu/index.php?ira=Jabba&tipoContenido=sidebar&sidebar=science
There is room for 45 drives in the case (technically a few more).
45 x 1TB 7200rpm drives are really cheap, about $60 each.
45 x 1.5TB 7200rpm drives are about $70 each.
45 x 2TB 7200rpm drives are about $120 each.
45 x 3TB 7200rpm drives are about $180-$230 each (or more; some are almost $400).
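For comparison, here is a quick back-of-the-envelope cost-per-terabyte calculation (just a sketch in Python; the prices are the midpoints of the figures above, counting raw capacity with no RAID overhead):

# Rough $/TB for a 45-drive pod, using the midpoint of each price range above.
drives = {1.0: 60, 1.5: 70, 2.0: 120, 3.0: 205}  # size in TB -> approx. price per drive (USD)

for size_tb, price in sorted(drives.items()):
    raw_tb = 45 * size_tb
    total = 45 * price
    print(f"45 x {size_tb:g}TB: {raw_tb:g}TB raw for ${total:,} (about ${total / raw_tb:.0f}/TB)")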
I have a few questions before I commit to building one, and I was hoping to get advice.
1. Can anyone recommend a motherboard/processor setup that can hold lots of RAM, like 24GB, 64GB, or more?
2. Hardware RAID or Software RAID for this?
3. Would CentOS be a good choice? I have never used CentOS on a device this massive, just ordinary servers, so to speak. I assume it can handle this many drives and a large, expanding file system.
4. Someone recommended ZFS, but I don't recall that being available on CentOS; it is available on FreeBSD, which I have little experience with.
5. How would someone realistically back something like this up?
Ultimately I know that over time I will need to distribute my architecture and have a number of web servers, load balancing, etc., but to get started I think this device with good backups might fit the bill.
I can be way more detailed if it helps, I just didn't want to clutter with information that might not be relevant.
-----Original Message----- From: Jason Sent: Sunday, May 08, 2011 14:04 To: CentOS mailing list Subject: [CentOS] Building a Back Blaze style POD
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
and
http://cleanenergy.harvard.edu/index.php?ira=Jabba&tipoContenido=sidebar&sidebar=science
Disturbing, I was on the same pages a few hours ago.
- Hardware RAID or Software RAID for this?
Hardware is too costly in dollars; software is too costly in CPU. Either way, try for redundancy.
- Would CentOS be a good choice? I have never used CentOS on a device so massive. Just ordinary servers, so to speak. I assume that it could handle so many drives, a large, expanding file system.
Multiple file systems, or GFS?
- How would someone realistically back something like this up?
You don't. You replicate it. We are looking at using it as an online cache of our backup media.
--
Jason Pyeron, Principal Consultant, PD Inc. - http://www.pdinc.us
10 West 24th Street #100, Baltimore, Maryland 21218 - +1 (443) 269-1555 x333
This message is copyright PD Inc, subject to license 20080407P00.
Hi Jason,
Disturbing, I was on the same pages a few hours ago.
The Internet is a small place!
BackBlaze actually sent me the Harvard link when I inquired. They also told me they are coming out with an updated article based on new specs, etc., but they are not sure when it will be available.
Multiple file systems, or GFS?
I don't quite know whether file systems like this are available for CentOS; I don't see GFS when I install, at least IIRC. I will need to research GFS more.
-Jason
On Sun, May 8, 2011 at 8:03 PM, Jason slackmoehrle.lists@gmail.com wrote:
- Can anyone recommend a mobo/processor setup that can hold lots of RAM? Like 24gb or 64gb or more?
Any brand of server motherboard will do. I prefer Supermicro, but you can use Dell, HP, Intel, etc.
- Hardware RAID or Software RAID for this?
Hardware RAID will be expensive on 45 drives. If you can, split the 45 drives into a few smaller RAID arrays. Rebuilding one large 45-drive RAID array, with either hardware or software RAID, would probably take a week or more, depending on which RAID type you use - i.e. RAID 5, 6, or 10. I prefer RAID 10 since it's best for speed and the rebuilds are the quickest, but you lose half the space: 45x 1TB drives will give you about 22TB of space, while 45x 2TB drives would give you about 44TB.
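For a concrete sense of that trade-off, here is a rough sketch (my own approximations, not figures from anyone in this thread) of usable space for 45 drives under RAID 5, 6, and 10; with 1TB drives, RAID 10 lands at roughly the 22TB mentioned above:

# Approximate usable capacity of a single 45-drive array; real deployments usually
# split into several smaller arrays and reserve some space, so treat these as upper bounds.
def usable_tb(n_drives, drive_tb, level):
    if level == "raid5":   # one drive's worth of parity
        return (n_drives - 1) * drive_tb
    if level == "raid6":   # two drives' worth of parity
        return (n_drives - 2) * drive_tb
    if level == "raid10":  # mirrored pairs; the odd 45th drive is left over as a spare
        return (n_drives // 2) * drive_tb
    raise ValueError(level)

for drive_tb in (1, 2, 3):
    for level in ("raid5", "raid6", "raid10"):
        print(f"45 x {drive_tb}TB, {level}: ~{usable_tb(45, drive_tb, level)} TB usable")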
- Would CentOS be a good choice? I have never used CentOS on a device so massive. Just ordinary servers, so to speak. I assume that it could handle so many drives, a large, expanding file system.
Yes it would be fine.
- Someone recommended ZFS but I dont recall that being available on CentOS, but it is on FreeBSD which I have little experience with.
I would also prefer ZFS for this type of setup. Use one 128GB SLC-type SSD as a cache drive to speed things up, and two log drives to help with recovery. With ZFS you would be able to use one large RAID array if you have the log drives, since it recovers from drive failure much better than other file systems. You can install ZFS on Linux as user-land tools, but that will be slower than running it in the kernel, so it would be better to use Solaris or FreeBSD for this - look at Nexenta / FreeNAS / OpenIndiana.
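To illustrate that layout (only a sketch; the pool name and device names are placeholders, and on CentOS at the time this would mean the slower user-land ZFS rather than a native kernel module), the pool might be put together something like this:

# Sketch of a ZFS pool with a raidz2 data vdev, a mirrored log (SLOG) and an L2ARC cache SSD.
# Device names are hypothetical; substitute the real ones before running anything.
data_disks = [f"/dev/sd{c}" for c in "bcdefghij"]  # nine spinning disks, for example
log_disks = ["/dev/sdk", "/dev/sdl"]               # two small SSDs, mirrored, for the intent log
cache_disk = "/dev/sdm"                            # one larger SSD for the read cache (L2ARC)

cmd = (["zpool", "create", "tank", "raidz2"] + data_disks
       + ["log", "mirror"] + log_disks
       + ["cache", cache_disk])

print(" ".join(cmd))  # review first; to actually create the pool, run the printed command as root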
- How would someone realistically back something like this up?
To another one as large :)
Or, more realistically: if you already have some backup servers and the full 45TB isn't filled with data yet, then simply back up what you have. By the sounds of it your project is still new, so your data won't be that much. I would simply build a Gluster / CLVM cluster of smaller, cheaper servers, which basically allows you to add, say, 4TB or 8TB at a time to the backup cluster (depending on what chassis you use and how many drives it can take). That will be cheaper than buying another machine identical to this right now.
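If the Gluster route sounds interesting, the general shape of it (a sketch only, with hypothetical host and brick names, not something tested by anyone in this thread) is that each small server contributes a brick and you grow the volume later by adding more bricks:

# Hypothetical GlusterFS setup: two replicated servers to start, grown a pair at a time.
# Printed for review rather than executed; run the commands by hand once the hosts/paths are real.
steps = [
    "gluster peer probe backup2",
    "gluster volume create backupvol replica 2 backup1:/data/brick1 backup2:/data/brick1",
    "gluster volume start backupvol",
    # later, when more space is needed, add bricks in multiples of the replica count:
    "gluster volume add-brick backupvol backup3:/data/brick1 backup4:/data/brick1",
]
for step in steps:
    print(step)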
Ultimately I know over time I need to distribute my architecture out and have a number of web-servers, balancing, etc but to get started I think this device with good backups might fit the bill.
If this device will be used for web + mail + SQL, then you should probably look at using 4 quad-core CPUs + 128GB RAM. With this many drives (or rather, this much data) you'll probably run out of RAM / CPU / network resources before you run out of HDD space.
With a device this big (in terms of storage) I would rather have 2 separate "processing" servers which just mount LUNs from this pod (exported as NFS / iSCSI / FCoE / etc.) and then have a few faster SAS / SSD drives for SQL / log processing.
Rudi,
Do you have a recommendation of a motherboard?
I am still reading the rest of your post. Thanks!
-Jason
On Sun, May 8, 2011 at 9:06 PM, Jason slackmoehrle.lists@gmail.com wrote:
Rudi,
Do you have a recommendation of a motherboard?
Well, choose one here: http://www.supermicro.com/products/motherboard/matrix/
I don't have specific recommendations, but we've had great success with all our SuperMicro servers, in both single and dual CPU configurations, ranging from 4GB to 128GB of RAM.
I am still reading the rest of your post. Thanks!
-Jason
Thanks Rudi, that helps, since you have had good luck with all of them. I see they have some boards that go up to 192GB (but not DDR3), and some do 144GB as well. I just need to find out if the pod supports extended ATX; I see others have just used regular ATX boards.
On 05/08/11 12:06 PM, Jason wrote:
Rudi,
Do you have a recommendation of a motherboard?
I am still reading the rest of your post. Thanks!
Most any server board that supports dual Intel Xeon 5500/5600 will let you pretty easily add 24GB per CPU socket while using relatively affordable 4GB DIMMs.
http://www.supermicro.com/products/motherboard/QPI/5500/X8DA6.cfm?SAS=N or whatever
You might look at these chassis, which are, IMHO, better engineered than that Backblaze thing: http://www.supermicro.com/products/chassis/4U/847/SC847E16-R1400U.cfm
This supports 36 SAS/SATA drives in 4U (24 in front, 12 in back) and has SAS2 backplane multiplexers, so you don't need nearly as many SAS/SATA cards.
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of John R Pierce Sent: Sunday, May 08, 2011 15:24 To: centos@centos.org Subject: Re: [CentOS] Building a Back Blaze style POD
you might look at these chassis, which are, IMHO, better engineered than that backblaze thing http://www.supermicro.com/products/chassis/4U/847/SC847E16-R1400U.cfm
If you can use fewer drives, this would be more cost effective (in both build time and repair time):
http://www.newegg.com/Product/Product.aspx?Item=N82E16811219038 [$400]
And then if you want RAID: http://www.newegg.com/Product/Product.aspx?Item=N82E16816118141 [$1,300] or http://www.newegg.com/Product/Product.aspx?Item=N82E16816115095 [$700]
-----Original Message----- From: Jason Pyeron Sent: Sunday, May 08, 2011 16:04 To: 'CentOS mailing list' Subject: Re: [CentOS] Building a Back Blaze style POD
You might look at these chassis, which are, IMHO, better engineered than that Backblaze thing: http://www.supermicro.com/products/chassis/4U/847/SC847E16-R1400U.cfm
And http://www.avsforum.com/avs-vb/showthread.php?t=1149005
ps, I hate Outlook.
On 05/08/11 1:03 PM, Jason Pyeron wrote:
If you can use fewer drives, this would be more cost effective (in both build time and repair time):
http://www.newegg.com/Product/Product.aspx?Item=N82E16811219038 [$400]
Multiple reports online indicate that the Norco case is very flimsy and poorly made.
ooops, hit send too fast.
Also, that Norco case appears to require a separate SATA channel for each of the 24 drives, while the Supermicro case has SAS2 multiplexed backplanes that will let you put 24 SATA drives on a single 4-channel SAS port, or 24 dual-ported SAS drives on two 4-channel SAS ports (using MPIO). These backplanes have SES controllers on them for power and hot-swap management (the SES functionality is integrated into the LSI SAS multiplexer chip used). Note that SAS supports N:M multiplexing, where any one of the N controller channels can address any of the M devices; plain SATA only supports 1:M simple expanders.
And a significant problem in large drive arrays is mechanical resonance: when an array of 24 or more disks is all being hammered at once in a RAID environment, the mechanical vibrations can interact in ways that increase the error rate, and this is greatly compounded by a flimsy chassis.
On Sunday, May 08, 2011 04:23:23 PM John R Pierce wrote:
note that SAS supports N:M multiplexing where any one of the N controller channels can address any of the M devices.... plain SATA only supports 1:M simple expanders
Hmm, that explains how SAS can effectively replace Fibre Channel at the DAE (EMC has gone SAS on their newest midrange storage). For true Fibre Channel replacement you need dual-attach at the drive; N:M multiplexing looks like dual-attach++, at least to my eye.
And, a significant problem in large drive arrays is mechanical resonance.... you get an array of 24 or whatever disks all being hammered at once in a RAID environment, and the mechanical vibrations can cause interactions which can increase the error rate, this is greatly compounded by a flimsy chassis.
I like the EMC DAE design; it is most definitely not flimsy. However, I had often wondered about some of the design features of the DAE chassis, and thinking about mechanical resonance makes some things 'click' in my mind that didn't before, such as the thick cast rack ears instead of extending the chassis sheet metal and folding an ear.
And those EMC DAEs are 15-drive enclosures. I wonder how much the custom EMC drive firmware impacts mechanical resonance, especially on large RAID groups.
Hi John,
you might look at these chassis, which are, IMHO, better engineered than that backblaze thing http://www.supermicro.com/products/chassis/4U/847/SC847E16-R1400U.cfm
this supports 36 SAS/SATA drives in a 4U (24 in front, 12 in back) and has SAS2 backplane multiplexers so you don't need nearly as many SAS/SATA cards
The only thing that confuses me about chassis like these is that I always miss something I needed to order to complete the machine; they all come with different things. With the chassis you mention, I would obviously still need to buy a motherboard, processor, RAM, RAID cards...
With cases like this, do I need to buy SATA cables too?
I can do a side by side comparison and see what works out to a reliable deal.
SuperMicro seems to be a great company (historically, over the years), but the chassis seems expensive for what you get (about $1,500). Again, I should run the numbers for each idea and see how it shakes out. I can report back.
-Jason
On 05/09/11 8:10 AM, Jason wrote:
The only thing that confuses me about chassis like these is that I always miss something I needed to order to complete the machine; they all come with different things. With the chassis you mention, I would obviously still need to buy a motherboard, processor, RAM, RAID cards...
I would use mdraid mirroring on a system like that. Cabling-wise, you would need a non-RAID SAS card for the Supermicro, one with 1-2 SFF-8087 4-channel SAS connectors, as the Supermicro case has expander backplanes.
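A minimal sketch of what that mdraid setup could look like (device names and the drive count are hypothetical, and RAID 10 is just one way to read "mirroring"):

# Print an mdadm command that mirrors/stripes the front-bay drives as one RAID 10 array.
disks = [f"/dev/sd{c}" for c in "bcdefghijklmnopqrstuvwxy"]  # 24 example drives

cmd = f"mdadm --create /dev/md0 --level=10 --raid-devices={len(disks)} {' '.join(disks)}"
print(cmd)  # review, then run as root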
If that's all too much, then buy a complete system from a VAR who integrates it for you, or something like...
http://www.sgi.com/products/storage/servers/iss3500.html (which sure looks like the exact same thing to me)