Load balancing...


Todd

3 Mar 2011

11:43 p.m.

Hi All,

Can anyone help me hash out how best to load balance a website that is getting considerable traffic? In the past I only have experience with BigIP, where a load balancing device keeps track of the servers and sends traffic to the best one available at the time. This was a proprietary system that I think was something Dell rebranded.

Right now the whole site (400 GB of video, HTML5, Apache, PHP, MySQL) runs on a single box with 16 GB of RAM and a mirrored /var/www/html (2x 1 TB RAID drives). I have a Comcast 50/10 connection with 5 static IPs, and I am seeing about 125 unique visitors a day. The site runs fine, but in anticipation of more traffic, and as a learning experience, I would like to load balance.

Obviously I need a second server just like the one it is running on now. I will probably spec something out that is capable of 32 GB of RAM.

What about a dedicated load balancing device? What specs should it have? How much RAM, HD, processor? Is it sufficient to buy something with a gigabit NIC and, say, 4 GB of RAM? Can one go slower but with more RAM and a small HD? I don't really know how intensive the decision-making process is for the load balancer.

Right now, as an example, I have an Untangle firewall that runs on an old AMD with 2 GB of RAM and a gigabit NIC, and it seems to do just fine.

My local computer store has several P4 2.8ghz with 2GB of RAM for like $99....

Can anyone enlighten me on specs, proper setup, caveats....?

-Jason


aurfalien＠gmail.com

3 Mar

11:51 p.m.

On Mar 3, 2011, at 3:43 PM, Todd wrote:

...


Well a bit outside what I know which isn't much, but...

What about external DNS provider with round robin DNS?

Or if you have control over your DNS, then you can easily do round robin.

Quick and easy FAQ on round robin:

http://www.zytrax.com/books/dns/ch9/rr.html

Hope this helps.

I do this for my mail servers.

- aurf

Sean Hart

4 Mar

12:05 a.m.

On 3/3/11 3:51 PM, aurfalien@gmail.com wrote:

...


Hello,

Building a high-throughput, highly available site is a tough job, and there's a reason good sysadmins get paid what they do. But here is some direction on load balancers.

BigIP (made by F5) is the hands-down leader of the load balancer world. You will pay dearly for it (20K each, minimum), but depending on your needs, it may very well be the best choice for you. http://www.google.com/url?sa=t&source=web&cd=1&sqi=2&ved=0CC...

Zeus also makes a decent product, made to run as software. The software will run you ~9K, I think, but it is pretty feature rich. It requires hardware to go with it. http://www.zeus.com/products/load-balancer/

IPVS or LVS can work as a really simple/free solution: http://www.linuxvirtualserver.org/software/ipvs.html

Round robin DNS would balance load, but will cause problems if one of the servers goes down. You could also set up Apache or Squid to do proxying...

Cheers, Sean

aurfalien＠gmail.com

12:12 a.m.

...

Round robin DNS would balance load, but will cause problems if one of them goes down.

Hi Sean,

Can you explain? I may be planning this for a site.

So if I have 2 identical servers, each with their own IP, how will one of them going down cause issues?

I'm assuming multiple A records for the same host will be handled fine by the client lookup?

- aurf

John R Pierce

12:24 a.m.

On 03/03/11 4:12 PM, aurfalien@gmail.com wrote:

...

Can you explain? I may be planning this for a site.

So if I have 2 identical servers, each with their own IP, how will one of them going down cause issues?

I'm assuming multiple A records for the same host will be handled fine by the client lookup?

A given client will latch onto one of the IPs, try it, and get a timeout error (or at best an ICMP "Host Unreachable" if ICMP isn't blocked by a firewall). It won't go trying other IPs.

aurfalien＠gmail.com

12:27 a.m.

On Mar 3, 2011, at 4:24 PM, John R Pierce wrote:

...


Thanks John.

Ryan Ordway

12:25 a.m.

On Mar 3, 2011, at 4:12 PM, aurfalien@gmail.com wrote:

...


example.com resolves to:

  host1.example.com - A.B.C.D
  host2.example.com - W.X.Y.Z

1. Client performs DNS lookup and gets pointed to host2. All is well.
2. host2 goes down. DNS for example.com still resolves to host2, which is unreachable. Site is down.

Now, you can work around this by using a HA/failover system like heartbeat to have host1 and host2 communicating with each other and if one host goes down the other automatically takes over its IP address(es) and services. If you have control over your own DNS you can manage your zone's Time To Live so that records are less aggressively cached, etc.
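The failure scenario above can be put into numbers with a small sketch. It is only an illustration under stated assumptions: the DNS server rotates strictly through the listed addresses, every client uses the one address it is handed and never retries, and the addresses and function names are made up for the example.

```python
import itertools


def round_robin_dns(addresses):
    """Hand out addresses in strict rotation, like a naive round-robin DNS server."""
    return itertools.cycle(addresses)


def failed_fraction(addresses, down, lookups):
    """Fraction of lookups that point a client at a host listed in `down`.

    Assumption: each client uses only the address it receives and never retries.
    """
    rotation = round_robin_dns(addresses)
    failures = sum(1 for _ in range(lookups) if next(rotation) in down)
    return failures / lookups


if __name__ == "__main__":
    # Hypothetical documentation addresses standing in for host1 and host2.
    hosts = ["192.0.2.10", "192.0.2.20"]
    print(failed_fraction(hosts, down={"192.0.2.20"}, lookups=1000))
```

With two hosts and one of them down, half of the lookups in this model land on the dead address, because DNS itself has no idea that the host is gone.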

-- Ryan Ordway E-mail: rordway@oregonstate.edu Unix Systems Administrator rordway@library.oregonstate.edu OSU Libraries, Corvallis, OR 97331 Office: Valley Library #4657

aurfalien＠gmail.com

12:35 a.m.

On Mar 3, 2011, at 4:25 PM, Ryan Ordway wrote:

...


Yes, I usually keep TTLs pretty aggressive, but you're right.

DNS round robin is more of a poor man's load balancer and probably not appropriate for the OP.

- aurf

Sean Hart

1:56 a.m.

...


Yeah, what they said! I've done a few of these myself if you want to chat further off the list about your specific needs and so forth. I don't contract or anything, but I'm down to give advice.

~Sean Hart

Les Mikesell

4:14 a.m.

On 3/3/11 7:56 PM, Sean Hart wrote:

...


Browsers actually handle this pretty well. If you give out multiple IPs and some are unreachable, most browsers will almost instantly connect to one that works. And they'll recover even if one goes down after you have a connection. This only works if you give out all of the addresses in the first place, though. You can't count on changes to DNS to take effect quickly.
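The client-side behaviour described here, trying each published address until one answers, can be sketched roughly as follows. This is not how any particular browser is implemented; the function names are made up for the example, and the `connect` parameter exists only so the logic can be exercised without a network.

```python
import socket


def resolve_all(hostname, port):
    """Return every distinct address the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_first_reachable(addresses, port, timeout=2.0,
                            connect=socket.create_connection):
    """Try each address in order; return (address, connection) for the first that answers."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:
            # Refused, unreachable, or timed out: fall through to the next address.
            last_error = exc
    raise ConnectionError(f"no address reachable: {last_error}")
```

A caller would pass the result of `resolve_all("example.com", 80)` to `connect_first_reachable`. Note that this only helps when DNS hands out all of the addresses up front, which is the condition stated above.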

-- Les Mikesell lesmikesell@gmail.com

Jay Leafey

6:03 a.m.

I've used round-robin DNS with good success, but I added some additional tweaks using Heartbeat to manage the actual addresses. A typical case is where you have two systems that will be used to offer a service.

Each machine has its own IP address, but in addition there is a pair of IPs for the SERVICE that are managed by Heartbeat. The round-robin DNS entry points to the service addresses, not the "primary" addresses of each node.

When one node goes down, Heartbeat on the other node causes it to take over the failed node's service address. This minimizes the time during which the resolved address points to a dead node, so the window for failure is narrowed significantly. We've used this for DNS servers, LDAP servers, and simple web servers with good results.

This is NOT an absolute fail-proof way of doing it, but it's easy to implement and is "good enough" in many cases. We've had some situations where Heartbeat didn't detect node failure quickly, but overall we've gotten acceptable results.

Your mileage may vary!
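The takeover idea can be modelled in a few lines. This is a toy model of the decision only, not Heartbeat's configuration or API; the node names, addresses, and the function itself are hypothetical.

```python
def assign_service_addresses(preferred_owner, alive):
    """Map each service address to the node that should currently hold it.

    preferred_owner: dict of service address -> node that normally holds it
    alive: set of nodes currently considered up
    """
    survivors = sorted(alive)
    if not survivors:
        raise RuntimeError("no node is alive to hold the service addresses")
    assignment = {}
    for address, owner in preferred_owner.items():
        # Keep the address on its usual node if it is up, else move it to a survivor.
        assignment[address] = owner if owner in alive else survivors[0]
    return assignment


if __name__ == "__main__":
    services = {"192.0.2.101": "node1", "192.0.2.102": "node2"}
    print(assign_service_addresses(services, alive={"node1"}))
```

Because the round-robin DNS entry lists the service addresses rather than the nodes' own addresses, both addresses keep answering after node2 fails; they simply both live on node1 until node2 returns.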

-- Jay Leafey - jay.leafey@mindless.com Memphis, TN

Nico Kadel-Garcia

3:55 a.m.

On Thu, Mar 3, 2011 at 7:25 PM, Ryan Ordway rordway@oregonstate.edu wrote:

...


Now, you can work around this by using a HA/failover system like heartbeat to have host1 and host2 communicating with each other and if one host goes down the other automatically takes over its IP address(es) and services. If you have control over your own DNS you can manage your zone's Time To Live so that records are less aggressively cached, etc.

Or "wackamole", which I've used in the past very successfully.

But the heavy video and MySQL use are, themselves, issues. One of the keys is to distribute the *content* to distinct hosts: video to something optimized for video, MySQL to a host right next to the database server, flat text and simple images wherever possible, and avoid JavaScript oddities, which slow down everything and cause problems for ADA compliance.

If you can view the website with Lynx, you're probably doing the right things to reduce unnecessary and extraneous *content* that will burden your server.

aurfalien＠gmail.com

12:22 a.m.

On Mar 3, 2011, at 4:05 PM, Sean Hart wrote:

...

IPVS or LVS can work as a really simple/free solution: http://www.linuxvirtualserver.org/software/ipvs.html

This was a very cool link.

I see that if a server goes down mid-stream while someone is on it, problems could arise, as failover won't be seamless.

- aurf

m.roth＠5-cent.us

2:05 p.m.

aurfalien@gmail.com wrote:

...

What about a dedicated load balancing device? What specs should this

<snip>

If you're talking about a load-balancing appliance, they get pricey. When I was at AT&T a few years ago, for a small group, we got one from Radware (a competitor of F5, the Big Name), but it was still several thousand dollars. It worked quite well, btw (and if you are interested, I can contact the vendor engineer I worked with....)

A warning: round robin can be problematical. At the same job, before we got the Radware box, we were going through IBM's WebSeal (part of the Tivoli suite). We went to upgrade one of four boxes from a Perl website to PHP... and then discovered that the stupid thing did *not* do persistent connections, so someone would get the Perl site (3 times out of 4), then get a 404 the next time because the request landed on the PHP box, or vice versa.

mark

Charles Polisher

7 Mar

2:22 a.m.

m.roth@5-cent.us wrote:

...

A warning: round robin can be problematical.

Amen. Consider what happens with round-robin DNS when one host stops working. Round-robin DNS will hand out the address of the failed host just as often as it did when it was all working. Some clients (applications) will attempt to use another one of the IP addresses they get from the DNS, some won't.

A load balancer checks the health of the hosts and doesn't^Wshouldn't route traffic to hosts that aren't serving requests.
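As a rough sketch of that difference, here is a round-robin pool that skips backends failing a health check. It is a minimal illustration, not the behaviour of any particular product; the class name, backend names, and the `is_healthy` callback are assumptions made for the example.

```python
class HealthCheckedPool:
    """Round-robin over backends, skipping any that fail the health check."""

    def __init__(self, backends, is_healthy):
        self.backends = list(backends)
        self.is_healthy = is_healthy  # callable: backend -> bool
        self._next = 0

    def pick(self):
        """Return the next healthy backend in rotation."""
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if self.is_healthy(backend):
                return backend
        raise RuntimeError("no healthy backend available")


if __name__ == "__main__":
    # Pretend web2 has stopped answering its health check.
    pool = HealthCheckedPool(["web1", "web2", "web3"], lambda b: b != "web2")
    print([pool.pick() for _ in range(4)])
```

In a real balancer the health check would be a periodic probe (a TCP connect or an HTTP request) rather than a callback evaluated on every pick, but the routing decision is the same: a failed host drops out of the rotation instead of continuing to receive its share of traffic.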

There are other considerations for round-robin DNS. As you'll want to make the TTLs very small, you must expect much more DNS traffic, and so more load on the DNS servers.
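A back-of-the-envelope estimate shows how quickly that load grows. The model is deliberately crude and is an assumption of this sketch: each caching resolver re-queries the record as soon as its cached copy expires, all day long, and the resolver count is a made-up figure.

```python
SECONDS_PER_DAY = 86400


def max_queries_per_day(resolvers, ttl_seconds):
    """Upper bound on daily queries if every caching resolver re-asks at each TTL expiry."""
    return resolvers * SECONDS_PER_DAY / ttl_seconds


if __name__ == "__main__":
    for ttl in (3600, 300, 60):
        print(ttl, max_queries_per_day(1000, ttl))
```

Under these assumptions, dropping the TTL from one hour to one minute multiplies the worst-case query volume by 60.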

-- Charles Polisher

Tim Dunphy

4:40 a.m.

An interesting choice for low-cost hardware load balancing appliances is Coyote Point:

http://www.coyotepoint.com/products/?gclid=CI6ri9jQu6cCFQbc4Aodmi1V4Q

However, for my purposes, open and free HAProxy remains the best choice!!


-- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

Lucian

5:42 a.m.

On Mon, Mar 7, 2011 at 4:40 AM, Tim Dunphy bluethundr@gmail.com wrote:

...

however for my purpose open and free HAProxy remains best choice!!

+1 for HAProxy; excellent piece of software.

David Brian Chait

6:36 a.m.

...

On Mon, Mar 7, 2011 at 4:40 AM, Tim Dunphy bluethundr@gmail.com wrote: however for my purpose open and free HAProxy remains best choice!!

...

+1 for HAProxy; excellent piece of software.

It really depends on your needs. If you are building a production ops environment, then the last thing you would want is an unsupported/home-grown solution. You need to weigh the potential risks of implementing a poorly understood, virtually unsupported solution that in all likelihood only you would understand against a standard solution with an SLA behind it and an upgrade path going forward.

Nico Kadel-Garcia

11:44 a.m.

On Mon, Mar 7, 2011 at 1:36 AM, David Brian Chait dchait@invenda.com wrote:

...


It really depends on your needs, if you are building a production ops environment then the last thing that you would want would be an unsupported/home grown solution. You need to consider the potential risks involved in implementing a poorly understood / virtually unsupported solution that in all likelihood only you would understand vs. a standard solution with an SLA behind it and an upgrade path going forward.

Or in implementing an expensive, single-point-of-failure third-party device that requires a centralized control infrastructure. It can turn out to be a *very* expensive single point of failure, easily screwed up by a single upgrade, a single power supply issue, or a failure to set up failover networking to that device properly.

Round-robin DNS is also, unfortunately, often mishandled. People mistake changing the ordering of listed A records for round-robin and, to quote Wikipedia:

> There is no standard procedure for deciding which address will be used by the requesting application.

No such procedure. Zip, zero, nada, it's all client dependent. And if one of the IPs is on the same VLAN as the requesting host, you're *especially* likely to get all the traffic locked to that host. DNS caches can also take rather unpredictable amounts of time to expire when you disable an IP, because every smart aleck downstream is doing their own caching and passing it along.

Iain Morris

8 Mar

8:26 p.m.

I'm surprised to see so many choosing HAProxy over LVS, which seems fairly well integrated into Red Hat's offerings, with full documentation and RPMs in CentOS and RHN. I've set up LVS before for an internal Java application, and it seemed straightforward after understanding arptables, etc. Is HAProxy worth considering as a better option for this scenario?

Regards,

-Iain


-- -- - Iain Morris iain.t.morris@gmail.com

Brian Mathis

8:41 p.m.

On Tue, Mar 8, 2011 at 3:26 PM, Iain Morris iain.t.morris@gmail.com wrote:

...


I'm surprised to see so many choosing HAProxy over LVS, which seems fairly integrated into Red Hat's offerings, with full documentation and rpms in CentOS and RHN. I've set up LVS before for an internal java application and it seemed straightforward after understanding arptables, etc. Is HAProxy worth considering as a better option for this scenario?

Regards, -Iain

I believe my post outlined a lot of the issues. LVS works at the IP level, and as a result it cannot do intelligent things based on the content of the connections. A layer 7 load balancer has a much better ability to handle real sticky sessions and to make all kinds of intelligent decisions based on the content, like serving images from one server while sending the dynamic app requests to another.

I had initially looked at LVS (Piranha) specifically for the reasons you mentioned, but in the current Internet landscape it has challenges that just cannot be overcome. For us the big issue was a client who was load-balancing outgoing requests over multiple Class A subnets, which completely destroyed any ability for LVS to support sticky sessions.
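To make the "decisions based on the content" point concrete, here is a minimal sketch of a layer 7 routing rule that looks only at the request path. The pool names and the list of extensions are assumptions for illustration; a real proxy would express the same rule in its own configuration language.

```python
import posixpath

# Hypothetical set of extensions treated as static content.
STATIC_EXTENSIONS = {".png", ".jpg", ".gif", ".css", ".js"}


def choose_pool(path):
    """Pick a backend pool name from the request path alone."""
    base = path.split("?", 1)[0]  # ignore the query string
    extension = posixpath.splitext(base)[1].lower()
    if extension in STATIC_EXTENSIONS:
        return "static"
    return "app"


if __name__ == "__main__":
    for request in ("/images/logo.png", "/index.php?id=3"):
        print(request, "->", choose_pool(request))
```

A balancer working purely at the IP level never sees the path, so it has nothing to base this kind of decision on.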

David Brian Chait

4 Mar

12:51 a.m.

...

Can anyone help me hash out how best to load balance a website that is getting considerable traffic? In the past I only have experience with BigIP where you have a load balancing device that keeps track and send traffic to the best server possible at the time. This was a proprietary system that I think was something Dell rebranded.

Does the app require session state maintenance (sticky connections)? If so then round robin DNS would essentially break your application. There are a lot of low cost load balancing solutions out there that would do a far better job in a production environment...see:

http://www.kemptechnologies.com

and

http://www.loadbalancer.org/

Both offer devices in the sub-$2,000 range.

-David

James A. Peltier

1:01 a.m.

While not CentOS or even GNU/Linux related, you can also have a look at OpenBSD's relayd. See

http://www.openbsd.org/cgi-bin/man.cgi?query=relayd&apropos=0&sektio...

-- James A. Peltier IT Services - Research Computing Group Simon Fraser University - Burnaby Campus Phone : 778-782-6573 Fax : 778-782-3045 E-Mail : jpeltier@sfu.ca Website : http://www.sfu.ca/itservices http://blogs.sfu.ca/people/jpeltier

aurfalien＠gmail.com

1:04 a.m.

...

http://www.openbsd.org/cgi-bin/man.cgi?query=relayd&apropos=0&sektio...

I wouldn't be surprised if that was, in part, what drives those low-cost appliance load balancers.

Cool find, a definite bookmark.

- aurf

Brian Mathis

3:09 p.m.

On Thu, Mar 3, 2011 at 6:43 PM, Todd slackmoehrle.lists@gmail.com wrote:

...


You have a lot of issues here, and some unanswered questions. Is the load on your site mostly bandwidth use? Do you have users who need to log in to a system? Is the application designed to run with multiple front-ends? It's easy to get very basic load balancing, but your app will most likely require "sticky sessions" to ensure the user goes to the same backend server every time, and many solutions don't have this feature.

Of the free options already listed, here are the problems with them:

- Round Robin DNS: Provides no additional features other than very poor "load spreading" across servers. As soon as you talk about load balancing there are usually features you need that this cannot provide, like automatic failover, dynamic adding/removing hosts, etc... Sticky sessions are simply not possible. RR DNS should not be used except in extremely basic situations.

- Linux LVS: This is a good idea on the face of it, but it can open up some tricky issues with routing and IP address handling. Also, sticky sessions are based on subnet of the IP address, which for many corporations using proxies will not work. I have seen companies that spread their proxy load across multiple /8 networks, so there's no way to sticky them.
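The sticky-session problem in the LVS item can be shown with a toy persistence table. This is not how LVS is implemented; it only models the idea of remembering a backend per key, and the class, function names, addresses, and session ID are made up for the example.

```python
import ipaddress
import itertools


class PersistenceTable:
    """Remember which backend each key was first sent to."""

    def __init__(self, backends):
        self._rotation = itertools.cycle(backends)
        self._table = {}

    def backend_for(self, key):
        if key not in self._table:
            self._table[key] = next(self._rotation)
        return self._table[key]


def subnet_key(client_ip, prefix=24):
    """Reduce a client address to its enclosing network, e.g. 10.1.2.3 -> 10.1.2.0/24."""
    return str(ipaddress.ip_network(f"{client_ip}/{prefix}", strict=False))


if __name__ == "__main__":
    by_subnet = PersistenceTable(["web1", "web2"])
    # One user whose requests leave through proxies in two different /8 networks.
    print(by_subnet.backend_for(subnet_key("10.1.2.3")))
    print(by_subnet.backend_for(subnet_key("11.4.5.6")))

    by_cookie = PersistenceTable(["web1", "web2"])
    # The same user identified by a session cookie instead of source address.
    print(by_cookie.backend_for("session-abc"))
    print(by_cookie.backend_for("session-abc"))
```

Keyed on source subnet, the two requests from the same user end up on different backends. Keyed on a session cookie, which only a layer 7 balancer can read, they stay together no matter which proxy the request came through.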

OK, so what's good? For my requirements, HAProxy is excellent. It handles sticky sessions well, performs monitoring of each host, and allows dynamic adding/removing of servers, as well as maintenance modes. It's very easy to install and configure. I'm using it as the backend to Apache, which is acting as an SSL termination point. It's been very high performing for us, and I know a lot of big sites use it as well. The only question I would have with it is handling of video, as we only use it for typical web traffic, not high-bandwidth stuff like that.

Also, make sure any load balancer you have is redundant and has some kind of failover, using something like pacemaker, heartbeat, etc...

Tim Dunphy

4:17 p.m.


I second the vote for HAProxy. It's one excellent free (as in beer) load balancer that is very easy to set up and configure.

One big site that uses it is 37signals (the makers of Basecamp and Campfire, among other things). HAProxy is capable of handling a lot of traffic, apparently. I use it with a shared docroot living on an NFS mount. Works really great! It balances two CentOS VMs as primary, with a physical FreeBSD host acting as a fallback.

Other good choices include nginx with the upstream fair plugin and Pound from Apsis.

http://www.apsis.ch/pound/
http://wiki.nginx.org/LoadBalanceExample

Any of the above (Pound, nginx or HAProxy) will handle sticky sessions skillfully.

As to hardware load balancers, I think that Netscaler by Citrix deserves an honorable mention:

http://deliver.citrix.com/go/citrix/WWAD0111Q1NSGOOGLECLOUDWP?gclid=CNDzzIan...

But like any hardware LB, they're certainly not cheap!! I remember that when my last company was considering which load balancer to go with, the contenders were Zeus, F5 and Citrix Netscaler.

I think they're all good products, but I remember when the F5 salesman came by, part of his sales pitch was "Ok, if you don't go with us I can understand why you would go with Netscaler. But Zeus? Really, guys?"

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos

-- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

Tim Dunphy

4:39 p.m.

Also, I forgot to mention: for heartbeat I use keepalived

http://www.keepalived.org/

I found heartbeat a little difficult to implement, but keepalived by comparison is a breeze to set up. Forget about multiple A records. That's a naive approach and entirely unnecessary. As others have pointed out, just set up a virtual IP using keepalived (or heartbeat or maybe something similar) and point your A record to that virtual IP.
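If the virtual IP idea is new to you, this small Python sketch shows the rule of thumb: among the load balancers that are still alive, the one with the highest priority holds the virtual IP. It is only a model of the behaviour, assuming preemption is enabled; the node names and priority values are made up, and it does not reflect keepalived's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int       # higher wins, as with VRRP-style priorities
    alive: bool = True  # would be driven by real health checks


def vip_owner(nodes):
    """Return the name of the node that should hold the virtual IP, or None."""
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.priority).name


lb1 = Node("lb1", priority=150)  # hypothetical preferred balancer
lb2 = Node("lb2", priority=100)  # hypothetical standby
normal = vip_owner([lb1, lb2])   # lb1 answers on the virtual IP

lb1.alive = False
failover = vip_owner([lb1, lb2])  # lb2 takes over the same address

lb1.alive = True
recovered = vip_owner([lb1, lb2])  # with preemption, lb1 takes it back
```

The A record never changes in any of these cases, which is why a single record pointing at the virtual IP is enough.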


Todd

5:34 p.m.

Brian,

Thanks for all of the great words here. I appreciate the detail in your reply.


Can you outline a bit specs for building a homemade box to run HAProxy? The HAProxy site is very extensive, but I did not see ideal specs at a quick glance. I will read in depth this weekend.

Minimal specs and then excellent specs, if you have thoughts. I really don't have an idea how intensive a task like this is. Nobody needs to log into the box; it would simply be used for this purpose.

-Jason

James Nguyen

7:18 p.m.


You want two boxes that run both haproxy + keepalived. This way you get the load balancing (HAProxy) plus the high availability (Keepalived) using a shared virtual IP for your two boxes. You can do maintenance on either one while traffic still remains active.

I don't have metrics to spec out the boxes, but given the traffic load you mentioned, you don't need hefty boxes at all. Just get yourself a box with some Gigabit interfaces, which I'm sure they all have these days. A single socket with 4 cores is more than enough. You can probably even do with 2 cores. Someone can correct me on that if they think the solution requires a lot of CPU. Memory-wise, I think machines come with at least 4GB these days. That should do. You can probably get both boxes for around 2k?

You already know how much F5 or any of those guys cost per device. =)

Best,

-- James H. Nguyen CallFire :: Systems Architect http://www.callfire.com 1.949.625.4263

m.roth＠5-cent.us

7:25 p.m.

James Nguyen wrote:

<snip>

if they think the solution requires a lot of CPU. Memory-wise, I think machines come with at least 4GB these days. That should do. You can probably get both boxes for around 2k?

You already know how much F5 or any of those guys cost per device. =)

Hmmm... when the job I was at went with Radware, their price was significantly lower than F5, and I was impressed with the appliance. Nice little 1u box, not even a pizza box deep, as I recall.

mark

James Nguyen

8:04 p.m.


Keep in mind you'd want at least two, whether they are appliances, devices or server boxes. Two is the minimum for high availability. That's assuming your power and internet routes are already highly redundant as well. ;)

-- James H. Nguyen CallFire :: Systems Architect http://www.callfire.com 1.949.625.4263

Rudi Ahlers

7:28 p.m.

On Fri, Mar 4, 2011 at 9:18 PM, James Nguyen james@callfire.com wrote:


You want two boxes that run both haproxy + keepalived. This way you get the load balancing (HAProxy) plus the high availability (Keepalived) using a shared virtual IP for your two boxes. You can do maintenance on either one while traffic still remains active.

I don't have metrics to spec out the boxes, but given the traffic load you mentioned, you don't need hefty boxes at all. Just get yourself a box with some Gigabit interfaces, which I'm sure they all have these days. A single socket with 4 cores is more than enough. You can probably even do with 2 cores. Someone can correct me on that if they think the solution requires a lot of CPU. Memory-wise, I think machines come with at least 4GB these days. That should do. You can probably get both boxes for around 2k?

You already know how much F5 or any of those guys cost per device. =)


How well will this setup work as a load balancer for a couple of web servers, running cPanel / VirtualMin and a few hundred websites sharing the same IP on each server?

-- Kind Regards Rudi Ahlers SoftDux Website: http://www.SoftDux.com Technical Blog: http://Blog.SoftDux.com Office: 087 805 9573 Cell: 082 554 7532

Les Mikesell

7:45 p.m.

On 3/4/2011 1:18 PM, James Nguyen wrote:


You want two boxes that run both haproxy + keepalived. This way you get the load balancing (HAProxy) plus the high availability (Keepalived) using a shared virtual IP for your two boxes. You can do maintenance on either one while traffic still remains active.

I don't have metrics to spec out the boxes, but given the traffic load you mentioned, you don't need hefty boxes at all. Just get yourself a box with some Gigabit interfaces, which I'm sure they all have these days. A single socket with 4 cores is more than enough. You can probably even do with 2 cores. Someone can correct me on that if they think the solution requires a lot of CPU. Memory-wise, I think machines come with at least 4GB these days. That should do. You can probably get both boxes for around 2k?

You already know how much F5 or any of those guys cost per device. =)

F5s are one of those things where if you have to ask the price you probably can't afford it... But they do provide a very nice web interface to control the pool members and virtual interfaces, something I haven't seen on free alternatives, and if you are big enough to have multiple locations they can propagate their server state info to their global DNS servers (also expensive) to control balancing/failover across sites.

For a couple of boxes that can work independently, I'd just use round robin DNS and also use heartbeat to float the IPs to the backup on outages. That way you normally share the load for performance, but if one fails or is shut down gracefully, the other one will still handle things for both IP targets. If your application maintains any session state, you'll need to work out a way to keep it in sync or, after the initial connection, redirect to a specific machine and live with what happens when it goes down (which might not be that bad, maybe just a new login when they try to come back).
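Roughly, the floating-IP arrangement looks like this in Python. This is only a sketch of the placement logic, not anything heartbeat actually runs; the host names are hypothetical and the addresses come from the documentation range.

```python
def assign_ips(preferred, up):
    """Decide which host should answer for each service IP.

    preferred maps ip -> (primary_host, backup_host);
    up is the set of hosts currently alive.
    """
    placement = {}
    for ip, (primary, backup) in preferred.items():
        if primary in up:
            placement[ip] = primary   # normal case: each box serves its own IP
        elif backup in up:
            placement[ip] = backup    # outage: the IP floats to the survivor
        else:
            placement[ip] = None      # nothing left to serve this IP
    return placement


# Both IPs are published in DNS; each box is primary for one of them.
preferred = {
    "192.0.2.10": ("web1", "web2"),
    "192.0.2.11": ("web2", "web1"),
}
both_up = assign_ips(preferred, {"web1", "web2"})
web1_down = assign_ips(preferred, {"web2"})
```

With both boxes up, DNS round robin splits visitors across the two addresses; with one box down, the survivor answers on both, so clients holding either cached address still get through.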

-- Les Mikesell lesmikesell@gmail.com

Brian Mathis

9:20 p.m.

On Fri, Mar 4, 2011 at 12:34 PM, Todd slackmoehrle.lists@gmail.com wrote:


Can you outline a bit specs for building a homemade box to run HAProxy? The HAProxy site is very extensive, but I did not see ideal specs at a quick glance. I will read in depth this weekend. Minimal specs and they excellent specs if you have thoughts.. I really don't have an idea how intensive a task like this is. Nobody needs to log into the box, simply use the box for this purpose. -Jason

The servers I use were brand new Dell R610s as of 2 years ago, with the lowest CPU I could get (dual core) and 8GB RAM (currently only 2.5GB used). However, my site only handles a high load once in a while, though I haven't seen any haproxy related problems with performance.

I would start with low-end servers and then monitor and add as you need to. If you set up the redundancy right, you can even skimp on things like dual power supplies, etc...


discuss@lists.centos.org

33 comments

18 participants


participants (18)

aurfalien＠gmail.com
Brian Mathis
Charles Polisher
David Brian Chait
Iain Morris
James A. Peltier
James Nguyen
Jay Leafey
John R Pierce
Les Mikesell
Lucian
m.roth＠5-cent.us
Nico Kadel-Garcia
Rudi Ahlers
Ryan Ordway
Sean Hart
Tim Dunphy
Todd