On 7/1/11, Les Mikesell <lesmikesell@gmail.com> wrote:
The principle is the same but the way to control it would be different. Spamassassin is a perl program that uses a lot of memory and takes a lot of resources to start up. If you run a lot of copies at once, expect the machine to crawl or die.
I had experienced this before, which is why the mail processes are usually the first thing I look at.
MimeDefang, being mostly perl itself, runs spamassassin in its own process and has a way to control the number of instances - and does it in a way that doesn't tie a big perl process to every sendmail instance. Other systems might run the spamd background process and queue up the messages to scan. The worst case is something that starts a new process for every received message and keeps the big perl/spamassassin process running for the duration - you might also see this with spamassassin runs happening in each user's .procmailrc. One thing that might help is to order the spam/virus checks so that the cheapest ones, and the ones most likely to cause a rejection, run first, so spamassassin doesn't have to run as often.
I already have greylisting and similar checks in place to reject as much mail as possible before spamd runs, so there's probably not much more I can do on that side without learning to write Exim configuration.
The same principle applies there, especially if you have big cgi programs or mod_perl, mod_python, mod_php (etc.) modules that use a lot of resources. You are probably running in pre-forking mode, so those programs quickly stop sharing memory in the child processes (perl is particularly bad about this since variable reference counts are always being updated). Even if you handle normal load, you might have a problem when a search engine indexer walks your links and fires off more copies than usual. You can get an idea of how much of a problem you have here by looking at the RES size of the httpd processes in top. If they are big and fairly variable, you have some pages/modules/programs that consume a lot of memory. You can limit the number of concurrent processes (MaxClients with the prefork MPM), and in some cases it might help to shorten their lifetime (MaxRequestsPerChild).
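If you want numbers rather than eyeballing top, a rough sketch along these lines (Python, Linux-only) reads `VmRSS` from `/proc/<pid>/status`, which is roughly the figure top shows as RES. It assumes the processes are named `httpd` as on CentOS (Debian-style systems call them `apache2`), and the helper names are made up for this example. Summing RSS overcounts pages the children still share, so the min/max spread is more telling than the total.

```python
import os


def parse_vmrss_kb(status_text):
    """Return the VmRSS value (kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line looks like "VmRSS:\t   48212 kB"
            return int(line.split()[1])
    return None


def rss_by_process(name="httpd", proc="/proc"):
    """Map pid -> resident size in kB for processes whose comm equals name."""
    sizes = {}
    if not os.path.isdir(proc):
        return sizes
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                if f.read().strip() != name:
                    continue
            with open(os.path.join(proc, entry, "status")) as f:
                rss = parse_vmrss_kb(f.read())
        except OSError:
            continue  # process exited or is not readable
        if rss is not None:
            sizes[int(entry)] = rss
    return sizes


def summarize(sizes):
    """Return (count, min, max, total) in kB; a wide min-max spread hints at memory-hungry pages."""
    values = list(sizes.values())
    if not values:
        return (0, 0, 0, 0)
    return (len(values), min(values), max(values), sum(values))


if __name__ == "__main__":
    count, lo, hi, total = summarize(rss_by_process("httpd"))
    print(f"{count} httpd processes, RES {lo}-{hi} kB, total {total} kB")
```

Running it a few times during normal traffic and again while a crawler is hitting the site should show whether MaxClients times the largest child size fits in the VM's memory.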
I'll keep this in mind in case the current fix (no ballooning, more starting memory for the VM) doesn't hold up, though so far it appears to.
Oh, one other thing... Do the web programs use mysql for anything? I've seen mysql do some really dumb things on a 3-table join, like make a temporary table containing all the join possibilities, sort it, then return the small number of rows you asked for with a LIMIT. Maybe it is better these days, but that used to happen even when there were indexes on the fields involved, and if any of the tables were big it would take a huge amount of disk activity.
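To illustrate why that plan hurts, here is a toy Python version of the same query shape over three made-up in-memory tables (`posts`, `authors`, `comments` and their columns are hypothetical). One function builds every joined row, sorts them all and then applies the LIMIT; the other keeps only the top n while scanning. Both return the same rows; the difference is how much has to be held and sorted. This is only an analogy to what MySQL does with a temporary table. On the real server, running EXPLAIN on the query and looking for "Using temporary; Using filesort" in the Extra column is the way to spot it.

```python
import heapq


def join_rows(posts, authors, comments):
    """Join hypothetical tables: post.author_id = author.id, comment.post_id = post.id."""
    authors_by_id = {a["id"]: a for a in authors}
    for p in posts:
        a = authors_by_id.get(p["author_id"])
        if a is None:
            continue  # inner join: drop posts without a matching author
        for c in comments:
            if c["post_id"] == p["id"]:
                yield (p["id"], a["name"], c["id"], c["date"])


def newest_materialized(posts, authors, comments, n):
    # Temp-table style: build every joined row, sort all of them, then LIMIT n.
    rows = list(join_rows(posts, authors, comments))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows[:n]


def newest_streaming(posts, authors, comments, n):
    # Same result, but only n candidate rows are kept in memory during the scan.
    return heapq.nlargest(n, join_rows(posts, authors, comments), key=lambda r: r[3])
```

Even the streaming version still walks the whole join; inside the database the usual fix is an index that lets it read rows already in ORDER BY order and stop after LIMIT rows.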
Most of the apps run off mysql. The likely culprit could be the WordPress corporate blog they have, since that probably invites all kinds of spambots and whatnot. It's definitely not our customized apps: we have an audit trail of every single command issued to the system, so although I don't have the relevant httpd logs because of the logrotate error, I'm certain no cron jobs ran and nobody was accessing it at those times.