On 6/30/2011 12:39 PM, Emmanuel Noobadmin wrote:
But, odds are that the source of the problem is starting too many mail delivery programs at once, especially if they, or the user's local procmail, start a spamassassin instance per message. Look at the mail logs around a problem time to see if you had a flurry of messages coming in. Sendmail/MimeDefang is fairly good at queuing the input and controlling the number of processes running at once, but even with that you may have to throttle the concurrent sendmail processes.
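One rough way to spot a flurry in the logs is to count log lines per minute and look at the busiest minutes. The following is only a sketch: the function names and the `must_contain` filter are made up for illustration, and it assumes each log line carries an HH:MM:SS timestamp near the start (as syslog-style mail logs usually do). Several log lines are normally written per message, so the counts are a proxy for load, not an exact message count.

```python
import re
from collections import Counter

# Captures everything up to and including HH:MM of the first HH:MM:SS
# on the line, e.g. "Jun 30 12:39" from a syslog-style line.
_MINUTE = re.compile(r"^(.*?\d{2}:\d{2}):\d{2}\b")


def lines_per_minute(log_lines, must_contain=""):
    """Count log lines per minute, keyed by the timestamp truncated to the minute.

    must_contain is an optional (hypothetical) substring filter, useful for
    counting only the lines your MTA writes when a message arrives.
    """
    counts = Counter()
    for line in log_lines:
        if must_contain not in line:
            continue
        match = _MINUTE.match(line)
        if match:  # lines without a recognizable timestamp are skipped
            counts[match.group(1)] += 1
    return counts


def busiest_minutes(log_lines, top=5, must_contain=""):
    """Return the `top` busiest minutes as (minute, line_count) pairs."""
    return lines_per_minute(log_lines, must_contain).most_common(top)
```

Feed it the lines of the mail log (the path varies by system) and compare the busiest minutes against the times the machine became unresponsive.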
Does it make a difference if I'm running Exim instead of sendmail/MimeDefang?
The principle is the same but the way to control it would be different. Spamassassin is a perl program that uses a lot of memory and takes a lot of resources to start up. If you run a lot of copies at once, expect the machine to crawl or die. MimeDefang, being mostly perl itself, runs spamassassin in its own process and has a way to control the number of instances - and does it in a way that doesn't tie a big perl process to every sendmail instance. Other systems might run the spamd background process and queue up the messages to scan. The worst case is something that starts a new process for every received message and keeps the big perl/spamassassin process running for the duration - you might also see this with spamassassin runs happening in each user's .procmailrc. One thing that might help is to make sure the spam/virus checks run in an order that starts with the ones that use the least resources and are most likely to cause rejection, so spamassassin doesn't have to run as often.
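To illustrate the ordering idea, here is a small sketch that sorts checks so the cheap ones that reject the most mail run first, and stops at the first rejection. Everything in it is hypothetical: the `Check` type, the check names, and the cost and reject-rate numbers are invented for the example, and the cost divided by reject rate ordering is just one reasonable heuristic. How you actually order checks depends on your MTA's configuration.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Check(NamedTuple):
    name: str
    cost: float          # relative resource cost (made-up units)
    reject_rate: float   # assumed fraction of mail this check rejects
    rejects: Callable[[dict], bool]


def order_checks(checks: List[Check]) -> List[Check]:
    """Put cheap, frequently rejecting checks first (lowest cost per rejection)."""
    return sorted(checks, key=lambda c: c.cost / max(c.reject_rate, 1e-9))


def run_checks(message: dict, checks: List[Check]) -> Tuple[Optional[str], List[str]]:
    """Run checks in order and stop at the first one that rejects.

    Returns (name of the rejecting check or None, names of the checks that ran).
    """
    ran = []
    for check in order_checks(checks):
        ran.append(check.name)
        if check.rejects(message):
            return check.name, ran
    return None, ran


# Hypothetical checks; the lambdas stand in for real lookups and scanners.
CHECKS = [
    Check("content_scan", cost=50.0, reject_rate=0.30,
          rejects=lambda m: m.get("spammy", False)),
    Check("virus_scan", cost=20.0, reject_rate=0.05,
          rejects=lambda m: m.get("infected", False)),
    Check("blocklist_lookup", cost=1.0, reject_rate=0.60,
          rejects=lambda m: m.get("listed", False)),
]
```

With these made-up numbers, a message from a listed host is rejected by `blocklist_lookup` alone and the expensive content scan never starts.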
Right now it doesn't look like a mail run, more like an httpd run, because it's starting to look like a large number of httpd threads were spawned just before that.
The same principle applies there, especially if you have big cgi programs or mod_perl, mod_python, mod_php (etc.) modules that use a lot of resources. You are probably running in pre-forking mode, so those programs quickly stop sharing memory between the child processes (perl is particularly bad about this since variable reference counts are always being updated). Even if you handle normal load, you might have a problem when a search engine indexer walks your links and fires off more copies than usual. You can get an idea of how much of a problem you have here by looking at the RES size of the httpd processes in top. If they are big and fairly variable, you have some pages/modules/programs that consume a lot of memory. You can limit the number of concurrent processes (MaxClients), and in some cases it might help to reduce their lifetime (MaxRequestsPerChild).
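As a back-of-the-envelope aid for picking a process limit, the sketch below summarizes resident sizes and divides a memory budget by a per-process size. The function names and the budget figure are assumptions for illustration. It expects one RSS value in KiB per line, which is what something like `ps -C httpd -o rss=` prints on Linux. Using the largest observed size is deliberately pessimistic, since some memory is still shared between children.

```python
def summarize_rss(ps_output):
    """Summarize RSS values (KiB), one per line, e.g. from `ps -C httpd -o rss=`."""
    sizes = [int(tok) for tok in ps_output.split() if tok.isdigit()]
    if not sizes:
        raise ValueError("no RSS values found")
    return {
        "count": len(sizes),
        "min_kib": min(sizes),
        "max_kib": max(sizes),
        "mean_kib": sum(sizes) / len(sizes),
    }


def max_children(budget_mib, rss_kib):
    """How many processes of rss_kib each fit in budget_mib (at least 1)."""
    if rss_kib <= 0:
        raise ValueError("rss_kib must be positive")
    return max(1, (budget_mib * 1024) // rss_kib)
```

For example, if you are willing to give httpd 1024 MiB and the largest child is around 100 MiB resident, `max_children(1024, 102400)` suggests capping the number of processes at about 10.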