[CentOS] OT: CentOS server with 2 GbE links to 2 GbE switches

Fri Aug 26 17:23:54 UTC 2005
Patrick <centos at puzzled.xs4all.nl>

On Fri, 2005-08-26 at 12:04 -0400, Scot L. Harris wrote:
> On Fri, 2005-08-26 at 09:36, Patrick wrote:
> > Hi all,
> > 
> > I am trying to come up with an architecture that has some redundancy.
> > The idea is to hook up the two GbE LAN interfaces of a CentOS server to
> > two Gigabit Ethernet switches. In case one switch goes down, there is a
> > redundant path (the server is redundant too). Here is the idea:
> > 
> >                              -----------
> >                             |    GbE    |   
> >  PCs            ------------|   switch  |------------
> >   |            |             -----------             |
> >   |   ------------------                       -----------------
> >   ---| Workgroup Switch |                     | CentOS/Asterisk |
> >   |   ------------------                       -----------------
> >   |            |             -----------             |
> >  VoIP           ------------|    GbE    |------------
> > Phones                      |   switch  |   
> >                              -----------
> > 
> > How would I accomplish this? Can I use IP addresses from one IP network
> > (say 10.0.0.0/24) to assign to the 2 LAN ports on the CentOS server and
> > a port on each of the GbE switches and then use something like OSPF on
> > the switches and the CentOS box to do the routing? Any other ideas?
> > 
> > Many thanks for your suggestions.
> 
> The setup you describe has several single points of failure.  Are the
> GbE switches you are using that fragile and likely to fail?
> 
> In the network you describe above the workgroup switch and the Asterisk
> box are single points of failure.  If you want a redundant system then
> you need to eliminate the single points of failure.  You may want to
> look at using HSRP or VRRP (HSRP is Cisco specific, VRRP is more
> generic) for HA type network solutions.

Yes, the workgroup switch is a SPoF, but there are cold spares in case
the active one blows up. There is also room to add a 2nd workgroup switch
and use HSRP to cover that SPoF. Also, there is a second active Asterisk
box, which I left out of the picture for simplicity, so that's not a SPoF.
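
To put rough numbers on the SPoF discussion, here is a minimal sketch of the usual series/parallel availability arithmetic for the path in the diagram. The 0.999 figures are hypothetical placeholders, not measurements, and the model assumes independent failures; it does not capture the extra failure modes that added complexity can bring.

```python
import math


def series(*availabilities):
    """Every component must be up, so availabilities multiply."""
    return math.prod(availabilities)


def parallel(*availabilities):
    """One working component is enough, so unavailabilities multiply."""
    return 1.0 - math.prod(1.0 - a for a in availabilities)


# Hypothetical per-device availabilities, for illustration only.
WORKGROUP_SWITCH = 0.999
GBE_SWITCH = 0.999
SERVER = 0.999

# Path with a single GbE switch between workgroup switch and server.
single_core = series(WORKGROUP_SWITCH, GBE_SWITCH, SERVER)

# Path from the diagram: two GbE switches in parallel.
dual_core = series(WORKGROUP_SWITCH, parallel(GBE_SWITCH, GBE_SWITCH), SERVER)

# Upper bound set by the remaining single points of failure.
ceiling = series(WORKGROUP_SWITCH, SERVER)

print(f"one GbE switch:   {single_core:.6f}")
print(f"two GbE switches: {dual_core:.6f}")
print(f"SPoF ceiling:     {ceiling:.6f}")
```

Under these assumptions the second GbE switch helps, but the result stays below the ceiling set by whatever is still not duplicated.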

> For the server you will need to look at cluster solutions.

Afaik VoIP servers cannot be clustered. The reason (I think) is this:
once a call is active, it depends on a specific path and on the UDP/RTP
ports opened by Asterisk on one or more boxes. If that box goes down, the
call cannot be rerouted in real time through another Asterisk box in the
cluster, because the second box never knew the call existed (it missed
the SIP call setup, its RTP ports are closed, etc.). I'd love to hear
that the opposite is true (and to get some pointers on how to do it :).
The Asterisk Realtime Architecture (ARA) might be able to solve this,
but I would need to investigate whether that's the case.
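
The state problem described above can be illustrated with a toy model. This is only a sketch: `SipServer`, `setup_call` and `handle_rtp` are hypothetical names invented for the illustration and have nothing to do with Asterisk's real internals.

```python
class SipServer:
    """Toy stand-in for one Asterisk box; it only knows calls it set up itself."""

    def __init__(self, name):
        self.name = name
        self.calls = {}  # call_id -> RTP port opened for that call

    def setup_call(self, call_id, rtp_port):
        # SIP call setup: the server learns the call and opens an RTP port.
        self.calls[call_id] = rtp_port

    def handle_rtp(self, call_id):
        # Media for a call this server never set up has no open port to use.
        return call_id in self.calls


primary = SipServer("asterisk1")
standby = SipServer("asterisk2")

# The call is set up on the first box only.
primary.setup_call("call-42", rtp_port=10000)

# If asterisk1 dies and the media is redirected, asterisk2 has no record of it.
print(primary.handle_rtp("call-42"))  # True
print(standby.handle_rtp("call-42"))  # False
```

Any real failover scheme would have to replicate that per-call state to the second box before the first one fails.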

> As others have mentioned you can try bonding the interfaces on the
> server to provide higher bandwidth but I believe you need to have a
> switch that understands bonding as well.

Totally agree. I think they use Cisco kit so I guess it would be a
3560G-24TS which is a relatively new model with current IOS.
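
Once the interfaces are bonded, the Linux bonding driver reports its state as text under /proc/net/bonding/. The sketch below parses that kind of text to show which slave is active. The bond layout (eth0 and eth1 in active-backup mode) and the sample text are assumptions made up for illustration, not output captured from a real machine.

```python
def parse_bond_status(text):
    """Return (active_slave, {slave_name: mii_status}) from bonding status text."""
    active = None
    slaves = {}
    current = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            active = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # Only record link status lines that belong to a slave section.
            slaves[current] = value
    return active, slaves


# Illustrative sample in the style of /proc/net/bonding/bond0.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""

active, slaves = parse_bond_status(SAMPLE)
print(active, slaves)
```

On a real server the same function could be fed the contents of the status file instead of `SAMPLE`.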

> When designing for redundancy and high availability start by identifying
> the critical parts of your infrastructure and determine the type of
> disasters you want to protect against as well as the likelihood of such
> a disaster.  Many things, while possible, are unlikely or have little or
> no impact.  Concentrate on those things that are likely to happen and
> have major impact to your systems.

Sure. Power supplies, fans and hard disks will all fail at some point, so
they must be 1+1 redundant and hot-swappable. Cables, Ethernet ports and
GBICs in the core switches can fail too, so they must also be redundant.
The same goes for the telco interface cards (E1/PRI), and of course
there is room in the rack for a few E1/PRI failover switches. At the
software level everything is redundant (dns, smtp, www, ntp, syslog,
asterisk, postgresql etc.). Afaict these are the failures that are
likely to happen, or that would take critical services down if there
were no redundancy.

> And remember that adding more hardware or making your network more
> complex can sometimes increase the likelihood of having a failure cause
> service interruptions.

Agreed, but sometimes the application requires you to go a long way.

> Depending on the costs of taking an outage you may be better off having
> a cold spare handy to replace the switch or device that fails.

The organization has simply decided there shall not be an outage of the
service (which means individual parts can blow up as long as the service
remains up), so the cost of adding redundancy till you drop is not an
issue. Obviously, next to the active redundancy, we could always add a
few cold spares :)

Thanks for your comments and suggestions.

Regards,
Patrick