[CentOS] clustering and load balancing Apache, using nginx

Thu Feb 12 17:04:49 UTC 2009
Les Mikesell <lesmikesell at gmail.com>

Sergej Kandyla wrote:

>>>
>>> In the preforking mode apache creates a child for each incoming 
>>> request, so it's too expensive in resource usage.
>>>     
>> Have you actually measured this?  Preforking apache doesn't fork per 
>> request, it forks enough instances to accept the concurrent connection 
>> count plus a few spares.  Each child would typically handle thousands of 
>> requests before exiting and requiring a new fork - the number is 
>> configurable.
>>
>>   
> Sorry for the bad explanation.
> I meant that apache creates a child (above MinSpareServers) to serve 
> each new unique client.

That's actually for each concurrent connection, not each unique client. 
Browsers may fire off many simultaneous connections, but HTTP 
connections typically have a very short life, so unless users are 
downloading big files, streaming data, or have low-bandwidth connections 
(or your back end service is slow), you shouldn't have that much 
concurrency.

> I measured nginx in real life :)
> On one server (~15k unique hosts per day, ~100k pageviews, and 1-3k 
> concurrent "established" TCP connections) with a frontend (nginx) - 
> backend (apache + PHP FastCGI) architecture, I turned off nginx 
> proxying and the server went away for a minute... apache forked up to 
> MaxClients (500) and took all the memory.

There are many factors that can affect it, but that seems like too many 
concurrent connections for that amount of traffic.  The obvious thing to 
check is whether you have keepalives on and, if so, what timeout you 
use.  On a busy internet site you want it off or very short.  Also, I'm 
not sure the fastcgi interface gives the same buffering/decoupling 
effect that you get with a proxy.  With a proxy, the heavyweight backend 
is finished and can accept the next request as soon as it has sent its 
output to the proxy, which may take much longer to deliver to slow 
clients.  The fastcgi interface might keep the backend tied up until the 
output is delivered.  If that is the case, you would get much of the 
same effect with apache as a front end proxy.  Running apache as a proxy 
might work with less memory in threaded mode too.
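For reference, the keepalive tuning described above might look like 
this in httpd.conf for a prefork apache; the values are illustrative, 
not a recommendation:

```apache
# Illustrative prefork settings for a busy Internet-facing apache;
# actual values depend on your traffic and available RAM.

# Either turn keepalives off entirely...
KeepAlive Off

# ...or keep them, but with a very short timeout so idle children
# are freed quickly (the default of 15 seconds is far too long on
# a busy public site):
# KeepAlive On
# KeepAliveTimeout 2

# Cap the number of forked children so a connection flood can't
# exhaust memory:
MaxClients 150
```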

> Also nginx helped me protect against low-to-medium DDoS. When apache 
> forked to MaxClients, nginx could still serve many thousands of 
> concurrent connections.  So I wrote shell scripts to parse the nginx 
> logs and put the IPs of the bots into a firewall table.

Basically if your backend can't deliver the data at the rate the 
requests come in you are fried anyway.
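A minimal sketch of that log-parsing idea (log path, threshold, and 
rule are all hypothetical; it only prints the iptables commands rather 
than running them, so you can review the list first):

```shell
#!/bin/sh
# ban_cmds LOGFILE THRESHOLD
# Count requests per client IP (field 1 of the default nginx log
# format) and print an iptables DROP rule for every IP that made more
# than THRESHOLD requests.  Printing instead of executing lets you
# inspect the list before piping it to sh as root.
ban_cmds() {
    awk -v t="$2" '
        { n[$1]++ }
        END { for (ip in n)
                  if (n[ip] > t)
                      print "iptables -I INPUT -s " ip " -j DROP" }
    ' "$1"
}

# Example: ban_cmds /var/log/nginx/access.log 1000 | sh
```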

> Therefore I find nginx (lighttpd is also a good choice) efficient 
> enough, at least for me. Of course, you should understand what you 
> expect from nginx, what it can do and what it can't.
> 
> If you want real-world measurements or examples of nginx on heavily 
> loaded sites, please google. You can also ask on the nginx at 
> sysoev.ru mailing list (EN).

Thanks, I hadn't found much about it in English.

>>> Also, apache spends about 15-30KB of memory serving each TCP 
>>> connection, while nginx needs only 1-1.5KB. If you have, for 
>>> example, about 100 concurrent connections from different IPs, that's 
>>> nearly 100 apache forks... it's too expensive.
>>>     
>> A freshly forked child should have nearly 100% memory shared with its 
>> parent and other child instances. 
> Please tell me how many resources you would need for reverse proxying 
> with apache for, say, 1k-2k unique clients?
> What CPU load and memory usage would you have?

I'm not sure there are good ways to measure the shared copy-on-write RAM 
of forked processes.  But 15k/connection doesn't sound unreasonable, 
keeping in mind that you have to buffer all unacknowledged data somewhere.
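One rough way to look at it on Linux is to sum the Shared_* and 
Private_* lines of /proc/PID/smaps; this is only a sketch, and 
copy-on-write pages are counted as shared until they are written, so 
the split is approximate:

```shell
#!/bin/sh
# smaps_summary SMAPS_FILE
# Sum the Shared_* and Private_* lines of a /proc/<pid>/smaps file to
# get a rough shared vs. private memory split for one process.
# Copy-on-write pages appear as shared until a write separates them,
# so this over-counts sharing for freshly forked apache children.
smaps_summary() {
    awk '/^Shared/  { shared  += $2 }
         /^Private/ { private += $2 }
         END { print "shared: " shared " kB, private: " private " kB" }' \
        "$1"
}

# Example: smaps_summary /proc/1234/smaps
```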

> I think that apache is great software. It's very flexible and 
> feature-rich, and it's especially good as a backend for dynamic 
> applications (mod_php, mod_perl, etc.).
> If you need to serve many thousands of concurrent connections you 
> should look at nginx, lighttpd, squid, etc.
> IMHO.

I've been using F5 load balancers for the hard part of this for a 
while, but I'd still wonder why you have that much concurrency instead 
of delivering the page and dropping the connection.

-- 
   Les Mikesell
    lesmikesell at gmail.com