Hi All,
I started a 501(c)(3) (not-for-profit) organization back in February 2011 to deal with information archival. It is a long-term vision and I won't bore you with the details (if you really want to know, e-mail me privately), but the gist is that I need to build an infrastructure to accommodate about 2 PB of data: database content, stored video, crawl data, static data sets, etc. Right now, in my testing of the software, I can easily pull down 300+ GB of data a month. I have a Comcast business circuit, and so far so good with them. I am also investigating Sonic.net for a "Business T" solution, as they call it.
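To put rough numbers on the gap between my current test rate and the 2 PB target, here is a back-of-the-envelope sketch in Python. The decimal units (1 PB = 1,000,000 GB), the 30-day month, and the function names are my own assumptions for illustration, not measurements.

```python
# Back-of-the-envelope sizing. Assumptions (not measurements):
# decimal units (1 PB = 1,000,000 GB) and a 30-day month.
GB_PER_PB = 1_000_000
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def months_to_fill(target_pb: float, gb_per_month: float) -> float:
    """Months needed to accumulate target_pb at a steady gb_per_month."""
    return target_pb * GB_PER_PB / gb_per_month


def average_mbit_per_s(gb_per_month: float) -> float:
    """Sustained average line rate implied by a monthly transfer volume."""
    bits_per_month = gb_per_month * 1e9 * 8
    return bits_per_month / SECONDS_PER_MONTH / 1e6


if __name__ == "__main__":
    months = months_to_fill(2, 300)
    print(f"{months:.0f} months (~{months / 12:.0f} years) to reach 2 PB")
    print(f"{average_mbit_per_s(300):.2f} Mbit/s sustained average")
```

Under those assumptions, 300 GB a month is a bit under 1 Mbit/s sustained and would take roughly 6,667 months to reach 2 PB, so the real ingest rate will have to be far higher than what I see in testing.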
As part of their deal, they want to lease me a "Managed Cisco Router". I know, I know: which one? Well, none of the sales people know, and they have to find out for me! They also told me that with this router there is no reason to run my own dedicated firewall, which is something I have been investigating recently as well. I do have Cisco PIX experience, but I have not touched a PIX in 5 years and I am not sure how much of that translates to real-world use nowadays.
So I am confused and I would appreciate some advice.
So they want to put this Cisco device in front of everything. I then want to run my own dedicated firewall (probably a custom-built box, thanks to John Pierce's recent advice about pfSense). Coming off that dedicated firewall, I need a DMZ for web serving, a private VLAN for database servers, etc., and a private VLAN for the computers here that I use to do all the work behind the NPO.
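To make the segmentation I have in mind concrete, here is a minimal Python sketch of the zones as a table of allowed flows. The zone names and the specific flows are hypothetical placeholders of mine, not a recommended policy and not anything pfSense-specific.

```python
# Hypothetical zones and allowed flows, for illustration only.
# Each pair is (source zone, destination zone).
ALLOWED_FLOWS = {
    ("internet", "dmz"),         # public traffic reaches the web servers
    ("dmz", "db_vlan"),          # web servers query the database servers
    ("office_vlan", "dmz"),      # my workstations administer the web servers
    ("office_vlan", "db_vlan"),  # my workstations administer the databases
    ("office_vlan", "internet"), # ordinary outbound access
}


def is_allowed(src: str, dst: str) -> bool:
    """Default deny: traffic passes only within a zone or on a listed flow."""
    return src == dst or (src, dst) in ALLOWED_FLOWS


if __name__ == "__main__":
    for flow in [("internet", "dmz"), ("internet", "db_vlan")]:
        print(flow, "allowed" if is_allowed(*flow) else "denied")
```

The point of the sketch is only that nothing from the Internet should reach the database VLAN directly; the actual rule set is exactly what I am asking for advice on.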
Here is where I get confused. Where do items such as Varnish Cache and HAProxy go in relation to the firewall, DMZ, etc.?
HAProxy is a load balancer, so it should go in front of the web servers so it can decide which web server to send the traffic to?
Varnish Cache is all about caching commonly used resources so it seems that this has to go in front too?
Can these realistically run on the same box? How does one spec this box out?
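Here is a toy Python sketch of how I currently picture the request path: a cache sitting in front of a round-robin load balancer, which sits in front of the web servers. The class names, backend names, and this particular ordering are my own assumptions for illustration; it does not model real Varnish or HAProxy behavior (no TTLs, no health checks), and the opposite ordering may well be just as valid.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy stand-in for a load balancer: picks backends in rotation."""

    def __init__(self, backends):
        self._backends = cycle(backends)

    def handle(self, path):
        backend = next(self._backends)
        return f"{backend} served {path}"


class Cache:
    """Toy stand-in for a caching proxy: remembers responses by path."""

    def __init__(self, upstream):
        self._upstream = upstream
        self._store = {}
        self.hits = 0

    def handle(self, path):
        if path in self._store:
            self.hits += 1  # served from cache, upstream never sees it
            return self._store[path]
        response = self._upstream.handle(path)
        self._store[path] = response
        return response


if __name__ == "__main__":
    # Hypothetical backend names.
    edge = Cache(RoundRobinBalancer(["web1", "web2"]))
    for path in ["/index", "/index", "/about"]:
        print(edge.handle(path))
    print("cache hits:", edge.hits)
```

In this arrangement a repeated request never reaches the balancer or the web servers at all, which is why it seems to me the cache belongs in front.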
Database servers and storage servers would go on the private VLAN? I am building a box to store all the data (MySQL, video, crawl data, static data sets), and I strongly think it might be a Backblaze Storage Pod running CentOS.
I know this is not the best list to ask these types of questions on, so if there is a better place besides ServerFault or SuperUser.com, I would appreciate knowing. I just find that the folks here have so much knowledge beyond CentOS.
I look at organizations that talk about their infrastructure, like the Wikimedia Foundation and Stack Overflow, and I am quickly amazed: I could easily fill the garage in my house with equipment, and my wife won't like that!
-Jason