Hi All,
I started a 501(c)(3) (not-for-profit) organization back in February 2011 to deal with information archival. A long vision here; I won't bore you with the details (if you really want to know, e-mail me privately), but the gist is I need to build an infrastructure to accommodate about 2PB of data: database stuff, stored video, crawl data, static data sets, etc. Right now in my testing of the software I can easily bang down 300+GB a month of data. I have a Comcast business circuit and so far so good with them. I am investigating Sonic.net for a "Business T" solution, as they call it.
As part of their deal, they want to lease me a "Managed Cisco Router". I know, I know: which one? Well, none of the sales people know and they have to find out for me! They also told me that with this router, there is no reason to run my own dedicated firewall, which I have been investigating recently as well. I do have Cisco PIX experience, but I am not sure how much of that translates to real-world use nowadays. I have not touched a PIX in 5 years.
So I am confused and I would appreciate some advice.
So there is this Cisco device they want to put in front of everything. Behind it, I then wanted to run my own dedicated firewall (probably a custom-built box, thanks to John Pierce's recent advice about pfSense). Coming off that dedicated firewall, I need a DMZ for web serving, a private VLAN for database servers, and a private VLAN for the computers here that I use to do all the work behind the NPO.
Here is where I draw some confusion. Where do items such as Varnish Cache, HAProxy go in relationship to firewall, DMZ, etc?
HAProxy is a load balancer, so it should go in front of the web servers so it can decide which web server to send the traffic to?
Varnish Cache is all about caching commonly used resources, so it seems that this has to go in front too?
Can this be the same box realistically? How does one spec this box out?
Database servers and storage servers would go on the private VLAN? I am building a box to store all the data (MySQL, video, crawl data, static datasets) and I strongly think it might be a Backblaze Pod running CentOS.
I know this is not the best list to ask these types of questions on, so if there is a better place besides ServerFault or SuperUser.com, I would appreciate knowing. I just find the folks here have so much knowledge besides CentOS.
I look at some of these organizations that talk about their infrastructure, like the Wikimedia Foundation and Stack Overflow, and I quickly realize that I could easily fill the garage in my house with equipment, and my wife won't like that!
-Jason
On 26.01.2012, at 00:53, Jason T. Slack-Moehrle wrote:
Hi All,
I started a 501(c)(3) (not-for-profit) organization back in February 2011 to deal with information archival. A long vision here; I won't bore you with the details (if you really want to know, e-mail me privately), but the gist is I need to build an infrastructure to accommodate about 2PB of data
2PB? At home?
http://www.youtube.com/watch?v=Eu430bqbK5w
Rent a rack somewhere, or three. Unless nobody is retrieving the data and you are just archiving it.
Hi,
2PB? At home?
http://www.youtube.com/watch?v=Eu430bqbK5w
Rent a rack somewhere, or three. Unless nobody is retrieving the data and you are just archiving it.
Well, people will be retrieving the data, analyzing it, etc. Plus we allow them to conduct crawls for potentially relevant data as well.
At home, sort of: my garage. We really have no need for office space with everything that can be done remotely nowadays.
I really don't want to rent a rack someplace, as the cost is way up there, and if I do it myself I always know the status of the equipment, the quality of the hardware, etc. I also get the burden of every problem too, I realize. Yes, I am thinking about dedicated power, backup generators, cooling, etc.
Thanks for the YouTube link, looks helpful.
-Jason
Hi,
On 01/25/2012 11:53 PM, Jason T. Slack-Moehrle wrote:
Hi All,
I started a 501(c)(3) (not-for-profit) organization back in February 2011 to deal with information archival. A long vision here; I won't bore you with the details (if you really want to know, e-mail me privately), but the gist is I need to build an infrastructure to accommodate about 2PB of data: database stuff, stored video, crawl data, static data sets, etc. Right now in my testing of the software I can easily bang down 300+GB a month of data.
300GB a month is barely 2Mbps...
2PiB is a whole different ballgame.
Most of how you set up, network, maintain, and then grow/manage this into the future will depend on what you want to do with the data, how you want to expose it to users, and how much money you want to throw at the problem.
Even using the most commodity of hardware, with 95%-efficient PSUs, your garage is unlikely to have enough electricity to power a 2PiB store. Or cool it.
Can you explain the calculation to determine that 300GB is 2Mbps?
What if it is 300GB a day? Comcast has told me that in the last two days I went through 127GB.
On 01/25/12 4:27 PM, Jason T. Slack-Moehrle wrote:
Can you explain the calculation to determine that 300GB is 2Mbps?
300GB (big B for byte) / 30 days / 24 hours/day / 3600 seconds/hour gives me 0.12MB/second; multiplying by 10 to get bits, allowing for basic protocol overhead, I come up with 1.2Mbit/sec sustained average.
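John's arithmetic can be checked in a few lines of Python (the factor of 10 bits per byte is his rough allowance for protocol overhead, not an exact conversion):

```python
# Sustained line rate implied by a monthly transfer volume.
GB_PER_MONTH = 300
SECONDS_PER_MONTH = 30 * 24 * 3600          # 2,592,000 seconds in 30 days

mbytes_per_sec = GB_PER_MONTH * 1000 / SECONDS_PER_MONTH  # ~0.12 MB/s
mbits_per_sec = mbytes_per_sec * 10         # x10: 8 bits/byte + overhead

print(round(mbytes_per_sec, 2), round(mbits_per_sec, 1))  # 0.12 1.2
```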
Racking 2PiB (or 2048TiB) of nearline-grade storage will require about 1000 3.5" 3TB drives, allowing for a reasonable RAID level and a suitable number of hot spares. If it's frequently updated transactional database storage, I'd want to use RAID10. Using something like the Supermicro 847 chassis, you can get 36 drives plus a server in 4U, drawing about 700 watts actual in use... I estimate you'll want about 28 of these servers, which will take two full racks and draw about 20KW, or 180 amps off 120V household circuits (realistically, you'll need 208V for this many servers). You'll also need about 10-15KW worth of air conditioning equipment to deal with the roughly 68,000 BTU/hr of generated heat. HVAC will push your power usage up to the 30-40KW range, or about 25,000 kWh/month; at $0.20/kWh typical residential rates, you're looking at a $5000/month power bill, give or take.
Those 28 Supermicro servers will cost about $200,000 for the 1000 3TB enterprise nearline disks, plus another $200,000 or so for reasonably well configured servers. 20KVA of redundant UPS and 70,000 BTU/hr worth of computer-room A/C will add a good chunk more $$$$ to this.
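For anyone who wants to poke at these estimates, they reproduce in a few lines; every constant below is one of John's rough figures (except the HVAC multiplier, which is my own assumption to stand in for his 10-15KW cooling estimate):

```python
# Back-of-the-envelope power/cost model for a ~2PiB, 28-chassis build.
DRIVES = 1000               # 3TB nearline drives incl. RAID + hot spares
DRIVES_PER_CHASSIS = 36     # Supermicro 847: 36 x 3.5" bays in 4U
WATTS_PER_CHASSIS = 700     # estimated in-use draw per loaded server
HVAC_FACTOR = 1.75          # assumption: cooling adds ~75% to the IT load
USD_PER_KWH = 0.20          # typical residential rate

servers = -(-DRIVES // DRIVES_PER_CHASSIS)     # ceiling division -> 28
it_kw = servers * WATTS_PER_CHASSIS / 1000     # 19.6 kW of IT load
total_kw = it_kw * HVAC_FACTOR                 # ~34.3 kW including HVAC
kwh_month = total_kw * 24 * 30                 # ~24,700 kWh/month
bill = kwh_month * USD_PER_KWH                 # ~$4,900/month

print(servers, it_kw, round(kwh_month), round(bill))
```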
Are you serious?
On Jan 25, 2012, at 4:50 PM, John R Pierce wrote:
Nice analysis.
Yeah, the heat footprint alone will require some good AC. I'm open-minded and intrigued as to how this will be pulled off, but still, it sounds crazy and not too well thought out.
I do like the not for profit spin which helps the cause out.
A quick search found this:
http://bioteam.net/2011/08/why-you-should-never-build-a-backblaze-pod/
Basically it's a sort of "why not to use a Backblaze," but we sort of did...
- aurf
I will read this tonight.
I have a meeting with Drobo tomorrow, and I think this is the same article one of their guys sent me.
A major stumbling block is that the disk drive industry still hasn't fully recovered from the Thai floods, and 3TB nearline server-grade drives are on allocation and commanding high prices. You might find you can't just order 1000 of these from Newegg or wherever; they don't have them. Using 2TB drives, you'd need 1500, which means another rack and that much more power and HVAC.
Hi John,
Yes, our (meaning yours and mine) calculations are different, and I am probably wrong.
I think I am drawn to the Backblaze Pod for reasons like this: 135TiB in a single enclosure, and that is not even using 4TB drives.
2PiB is an estimate for the next 2 years; currently there is a little over 480TiB, according to adding up various calculations (databases, du, app data, static files, etc.).
I see your calculations about power, and yes, currently power for just the UPS, computers, fans and such runs me about $400 a month, and I only have a few boxes handling the demo of the product. This will be my first summer in my new location in Cupertino, and I know I will need to act on cooling really soon.
Are you using Comcast in Santa Cruz?
-Jason
On 01/25/12 6:23 PM, Jason T. Slack-Moehrle wrote:
Are you using Comcast in Santa Cruz?
Absolutely not, the local cable system blows. My home is on a Sonic.net ADSL circuit resold by another ISP; television is on satellite.
In my personal opinion, the Backblaze is a little too funky. $job has used the Sun X4500 'thumpers', but those are being obsoleted by Oracle with no follow-on product. I've evaluated a Supermicro 847 box and was
I would not put the kind of storage you're talking about in any sort of residential environment. I'd be far more likely to rent rack space (and connectivity) from a local nethaus. We have some folks downtown who have rental rack space and office space in the same building, with gigabit links; very attractive.
Hi John,
Absolutely not, the local cable system blows. My home is on a Sonic.net ADSL circuit resold by another ISP; television is on satellite.
I am looking at Sonic.net and am awaiting a call from a sales rep (it has been 2 days).
They are offering a "Business T" for $308 per month, and I also see they have the bonded ADSL2+ option.
They advertise the starting "Business T" at 1.5Mbps.
They advertise the ADSL2+ over 2 lines at up to 40Mbps.
Am I misunderstanding that the cost for a T seems high, but it is a better option for me than getting their ADSL2+ service? I mean, is the "T" faster overall, given it is all my traffic and I am not sharing?
Can you explain a bit so I can develop a better understanding of how they advertise speeds, etc?
-Jason
On 01/26/2012 05:09 PM, Jason T. Slack-Moehrle wrote:
Can you explain a bit so I can develop a better understanding of how they advertise speeds, etc?
Have you considered taking your questions to the LOPSA lists? That would be far more topical (or even a local LUG list) than the CentOS lists.
Hi Karanbir,
I have no idea what LOPSA is, so let me look it up. I know my questions are surely off-topic here.
-Jason
On 01/26/2012 09:09 AM, Jason T. Slack-Moehrle wrote:
They advertise the starting "Business T" at 1.5Mbps. They advertise the ADSL2+ 2 lines at up to 40Mbps. Am I misunderstanding that the cost for a T seems high, but a better option for me than getting their ADSL2+ service? I mean, is the "T" faster overall given it is all my traffic and I am not sharing? Can you explain a bit so I can develop a better understanding of how they advertise speeds, etc?
Yes, the cost for a T1 will seem very high. It is antiquated telco tech. T1s are generally very reliable, but very very slow.
1.5Mbps is not faster than 40Mbps. There's nothing hidden in the way they advertise speeds.
DSL and DOCSIS technologies have advanced and matured over the last couple of decades. T1 has not. A T1 connection is the same now as it has always been.
Hi Gordon.
Your timing is perfect with this reply. I was just on the phone with Sonic.net, and the rep told me that the T1 was better due to it being all my traffic and much more reliable.
They told me that most companies buying internet service to host their infrastructure internally are not happy with 40Mbps.
With Comcast we currently have a 20/5 circuit, and they are offering us a 50/10 circuit for $123/month.
-Jason
DSL and DOCSIS technologies have advanced and matured over the last couple of decades. T1 has not. A T1 connection is the same now as it has always been.
It's not so much that T1 hasn't matured, but that it's capable of some other technologies besides internet access that the local CO can set up, like channelizing and different types of signaling, not to mention a dedicated circuit to the CO.
I might compare the SLAs of the two. You might find a drastic difference.
On 01/26/2012 03:57 PM, Ken godee wrote:
Not so much haven't matured but are capable of some other technologies besides internet access that the local CO could setup, like channelizing and different types of signaling, not to mention a dedicated circuit to the CO.
...which they always have been. T1 *is* a mature technology, but it hasn't improved the way that DSL and DOCSIS have.
I might compare the SLAs of the two. You might find a drastic difference.
You might, but you can't count on it. SLAs tend to vary more by vendor than they do by wiring technology. In many areas, I can get Ethernet over copper for less money than a T1, with a greater connection speed and an equivalent SLA.
On 01/26/12 3:43 PM, Gordon Messmer wrote:
Yes, the cost for a T1 will seem very high. It is antiquated telco tech. T1s are generally very reliable, but very very slow.
1.5Mbps is not faster than 40Mbps. There's nothing hidden in the way they advertise speeds.
DSL and DOCSIS technologies have advanced and matured over the last couple of decades. T1 has not. A T1 connection is the same now as it has always been.
A modern T1 (aka DS0) is likely delivered to the end premises over HDSL using 2 pairs. While it's slower than those consumer-oriented technologies you mention, it's far more reliable, has a guaranteed SLA (service level agreement) you won't get from DOCSIS (cable) or end-user ADSL, and tends to have very deterministic latencies...
On Jan 26, 2012, at 4:59 PM, John R Pierce wrote:
Wow, that's just ... wrong.
There's nothing to "mature" in a T1. It's a telco transport standard that is well known and utilized everywhere as part of the Bell System standards for multiplexing and demultiplexing from smaller circuits to larger and back down, ratified by the ITU for decades.
"T1" is a channelized synchronous telecommunications circuit type first designed in the late 60s, updated in the 70s. After removing framing bits, 1.544 Mb/s.
"DS0" is a sub-channel of a T1 when broken up into frames. Extended SuperFrame being the typical method these days. 24 of them at 64K per channel.
"HDSL" is a completely different technology than T1.
"DOCSIS" is the name of the standard utilized to deliver data services over a Cable Modem.
"ADSL" is a single-pair high speed connection that's very distance limited from the origination point.
"SLA" is a Service Level AGREEMENT. The key word being AGREEMENT. Your businesspeople are free to negotiate with any provider of ANY of the above technologies for anything they're willing to pay for. TYPICAL SLA's might be as stated above, but it's a contract... negotiate whatever you like.
What you might want SLA's on when ordering IP bandwidth:
- Maximum CONTINUOUS data rate upstream AND downstream simultaneously, and what thresholds are considered an OUTAGE on the SLA even if traffic is still flowing.
- Latency from your end of the circuit to a known point will never EXCEED "X" amount or it will be considered an outage under your SLA.
- Whether or not an UPSTREAM routing outage will be considered an SLA OUTAGE by your local carrier/ISP in terms of your bill. (In other words, how many backbone connections do they have and can they route around a problem, or are you stuck waiting for their one piddly edge router to be fixed in the case of fly-by-night providers.)
- In the case of a cable cut, are trucks rolled 24/7, or only during business hours?
Etc etc etc... there's more. Read up.
SLA's are themselves a playground for lawyers and businesspeople to dicker over.
Now the real world:
- Any company relying on a single IP connection via a single route... is so far down the food chain they're not going to get service during a larger scale outage anyway.
And... remember...
- An SLA just gives you a refund of your money for the outage. It doesn't keep you in business if the service provider doesn't keep their side of the bargain.
- If you have something that must be connected to the Internet 24/7 or you're out of business... buy more than one connection. An SLA won't matter at all when the backhoe cuts the only path out of your building.
- Or... host it in a data center that has far more than one backbone connection via more than one physical route.
Let's not mix all the technical details up with the business ones. That posting was the most misleading post I've read in quite a while, and shows a lot of the misconceptions out there.
*** ANY of the above technologies can deliver a certain number of bits, at a certain latency, a certain direction, across a certain type of physical media, to some network at the other end. ***
Whether that upstream provider has oversubscribed upstream connectivity, has latency issues, doesn't respond to fix their circuits in the middle of the night, pays you back for outages, scratches your back at the beach after signing that multi-million dollar bandwidth contract with giant SLA attached large enough to fund their entire fleet of trucks for a year...
That's all up to the contract...
On 01/26/12 4:32 PM, Nate Duehr wrote:
"T1" is a channelized synchronous telecommunications circuit type first designed in the late 60s, updated in the 70s. After removing framing bits, 1.544 Mb/s.
"DS0" is a sub-channel of a T1 when broken up into frames. Extended SuperFrame being the typical method these days. 24 of them at 64K per channel.
eek, I meant to say DS1 not DS0.
Quite often these days, what people refer to as a T1 is in fact a DS1 delivered over HDSL. For all practical purposes, except the electrical signalling on the copper, the two services are equivalent: same speed, same framing. HDSL is self-tuning, while a classic T1 required the NIUs at each end to be tuned for the circuit; HDSL also requires fewer repeaters for longer-distance circuits.
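The DS1 framing arithmetic above is easy to verify: 24 DS0 channels at 64Kb/s of payload, plus 8Kb/s of framing (one framing bit per 193-bit frame, 8000 frames per second), gives the 1.544Mb/s line rate quoted earlier in the thread:

```python
# DS1/T1 line rate: 24 DS0 channels of 64 Kb/s payload plus framing.
CHANNELS = 24
DS0_BPS = 64_000          # bits/s per DS0 channel
FRAMING_BPS = 8_000       # one framing bit per 193-bit frame, 8000 frames/s

payload_bps = CHANNELS * DS0_BPS           # 1,536,000 b/s of payload
line_rate_bps = payload_bps + FRAMING_BPS  # 1,544,000 b/s = 1.544 Mb/s

print(payload_bps, line_rate_bps)
```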
On 01/26/2012 03:43 PM, Gordon Messmer wrote:
Yes, the cost for a T1 will seem very high. It is antiquated telco tech. T1s are generally very reliable, but very very slow.
Yes, they are indeed slow and reliable. That said, on the rare occasion they do go out, they get repaired quickly. This may not be true in your case, but usually T1 lines are tariffed with guaranteed uptimes, if you ask the right questions and read the fine print.
I have had clients on DSL be down for a few days while the telco got a round tuit.
There are two reasons a T1 is more expensive. T1 requires 2 copper pairs in the cable, and those 2 pairs are not available for voice traffic. The other reason is the uptime requirements.
DSL, while faster, uses a single copper pair, does not preclude using the pair for voice traffic, and has no uptime commitments.
If T1 can meet your bandwidth needs and budget constraints, it could still be your best solution. The good news is, you get to decide.
On Thu, Jan 26, 2012, Raymond Lillard wrote:
Yes, the cost for a T1 will seem very high. It is antiquated telco tech. T1s are generally very reliable, but very very slow.
Slow is relative. Our T1 is infinitely faster than a cable or DSL circuit when the power is out, which happens quite frequently here. Every time the Comcast/Xfinity folks come around trying to sell their services, I note that when we had a week-long outage about 14 months ago, our generator kept the computers going and USWorst's T1 never faltered. Comcast was down for that week, and another after the power came back up.
Yes, they are indeed slow and reliable. That said, on the rare occasion they do go out, they get repaired quickly. This may not be true in your case, but usually T1 lines are tariffed with guaranteed uptimes, if you ask the right questions and read the fine print.
We are a bit more than 20,000 feet from the local CO, and have had a couple of occasions in the last 13 years where they replaced the entire circuit when having problems with repeaters and such. For a while there were incidents where a telco tech buggered our T1 while trying to grab pairs in a terminal block for voice lines.
I have had clients on DSL be down for a few days while the telco got a round tuit.
Same here, even in commercial areas of Seattle where one would expect the infrastructure to be solid.
There are two reasons a T1 is more expensive. T1 requires 2 copper pairs in the cable, and those 2 pairs are not available for voice traffic. The other reason is the uptime requirements.
DSL, while faster, does not preclude using the pair for voice traffic, uses a single copper pair and has no uptime commitments.
You can also share voice and data over a single T1. We have a couple of voice lines on our T1 which are split out with an Adtran channel bank that our provider supplies. I like this, as it replaced the old Linux box we had with an (expensive) Sangoma card connecting to the T1.
Another option, which someone else mentioned, is a direct Ethernet connection. We have a client in an industrial area of South Seattle that got that recently and has been quite happy with it.
Bill
On 01/27/2012 12:14 AM, Raymond Lillard wrote:
There are two reasons a T1 is more expensive. T1 requires 2 copper pairs in the cable, and those 2 pairs are not available for voice traffic. The other reason is the uptime requirements.
The DS-x or the old T-x standards do not deliver better performance, reliability, or maintainability/manageability than a network cable plugged into the rack switch at a hosting facility. Also, with these sorts of telco links (not consumer-grade ADSL etc.), one needs fairly expensive and hard-to-manage LTE (line termination) kit. Then there are the point-of-failure issues.
In this day and age, building out a facility in-house only makes sense if you have hundreds or more servers, or need a facility that has no internet links (very, very few of those these days), or you are a hosting company looking to build 30,000-odd sq ft of space to lease out and are going to plumb in multiple inbound fibres.
Think about it: do you even want Inergen canisters floating around a residential facility? Halon 1301? Or are you going to put in a sprinkler system and hope the insurance man does not test-run it and kill every bit of electrical equipment you have on site?
Just my 2bits.
On Thursday, January 26, 2012 06:43:55 PM Gordon Messmer wrote:
1.5Mbps is not faster than 40Mbps. There's nothing hidden in the way they advertise speeds.
Speed != bandwidth.
That '40Mb/s' connection is surely massively oversubscribed, whereas the 1.5Mb/s DS1 won't be (the tariff here states clearly that a DS1 data connection cannot be oversubscribed).
This infrastructure thread is pretty amusing... I especially enjoyed the 30,000 square feet number Karanbir quoted, since that's exactly how much aggregate raised-floor space I have on campus. It reminded me of the day I was asked about providing a 1PB array for a user who had no clue how much such a thing would cost, how much room it would occupy, how much power it would use, or how much it would weigh. He chose instead to use rotated LTO-3 tapes in multiple changers, keeping only the 'interesting' data he generated. As it happened, his project in its lifetime did generate close to a PB of data at 2-4TB per day, IIRC (but it has been a few years).
On Jan 25, 2012, at 3:53 PM, Jason T. Slack-Moehrle wrote:
Hi All,
I started a 501c3 (not-for-profit) organization back in February 2011 to deal with information archival. Database servers and storage servers would go on the private VLAN? I am building a box to store all the data (mysql, video, crawl data, static datasets) and I strongly think it might be a backBlaze POD running CentOS.
Hi Jason,
Not to be one of those guys who answers a question with a question, but... why Backblaze for archival?
Are you building in some safeguards/redundancy not found in the current Backblaze implementation?
Just curious, not a challenge or anything.
- aurf
Hi Aurf,
I am seeing a lot of solutions that are not at all perfect and are just insanely expensive. Backblaze seems like a pretty decent solution; I have control of all the hardware and software, to do with as I please.
If you have ideas, please talk to me about them!
-Jason
On Thu, Jan 26, 2012 at 12:53 AM, Jason T. Slack-Moehrle < slackmoehrle@gmail.com> wrote:
HAProxy is a load balancer, so it should go in front of the web servers so it can decide which web server to send the traffic to?
Varnish Cache is all about caching commonly used resources, so it seems that this has to go in front too?
Can this be the same box realistically? How does one spec this box out?
Varnish will do the load balancing for you as well. What you need to figure out is the failover scenario from one Varnish to another, IF you really need more than 99.9 percent uptime.
A Varnish machine should have LOTS of memory and a fair bit of fast disk with a BIG swap file on it. Basically, Varnish treats the entire virtual memory space as its cache storage and lets the VFS worry about what should be in memory and what can be swapped out.
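A minimal sketch of that setup, assuming Varnish's file storage backend (the backend address, cache path, and size below are illustrative placeholders, not recommendations):

```shell
# Varnish fronting a local web server with a large file-backed cache.
# The kernel's VM/page cache decides what stays in RAM and what gets
# paged out to disk, as described above.
varnishd -a :80 \
         -b 127.0.0.1:8080 \
         -s file,/var/lib/varnish/cache.bin,200G
```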
BR Bent
From: Jason T. Slack-Moehrle slackmoehrle@gmail.com
Here is where I draw some confusion. Where do items such as Varnish Cache, HAProxy go in relationship to firewall, DMZ, etc?
Here, we use 2 keepalived/LVS servers in direct routing mode for HA, then n cache servers running nginx (for consistent hashing plus some basic HTTP/PHP) and, behind those, Varnish (or Squid, not decided yet: Varnish's memory/disk handling seems "cleaner" and a little bit faster, but on the other hand Squid's cache will survive a restart; maybe the new Varnish 3.x implemented that, not sure).
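For the keepalived/LVS pair JD describes, the failover piece is a VRRP instance on each director; a minimal configuration sketch (the interface name, router ID, and VIP are placeholders):

```
vrrp_instance CACHE_VIP {
    state MASTER             # set to BACKUP on the second director
    interface eth0           # placeholder NIC name
    virtual_router_id 51
    priority 100             # use a lower priority on the backup
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24        # placeholder VIP fronting the cache tier
    }
}
```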
JD