[CentOS] Find reason for heavy load

Thu Dec 31 19:00:18 UTC 2009

I initiated the service shutdown as previously planned, and once the
external services like exim, dovecot, httpd and crond (because it kept
restarting these services) were stopped, the problem child stood out
like a sore thumb.

There were two exim instances that didn't go away despite "service
exim stop". Once I killed these two PIDs, the load average started
dropping rapidly. After a minute or so, the server went back to a
happy 0.2~0.3 load and disk activity became almost negligible.
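
A minimal sketch of how leftover processes like these could be spotted
after stopping a service, assuming a procps-style "ps -eo pid,args"
listing. The function names and the sample listing in the usage note
are illustrative only and not taken from the server discussed here.

```python
import os
import subprocess


def find_lingering(ps_output, name="exim"):
    """Return (pid, command line) pairs whose executable is called `name`.

    `ps_output` is text in the layout printed by `ps -eo pid,args`:
    a header line, then one "PID COMMAND..." line per process.
    """
    hits = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # Skip the header and any line without both a PID and a command.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = int(parts[0]), parts[1]
        # Compare the executable's basename so "/usr/sbin/exim" matches
        # but an unrelated command that merely mentions exim does not.
        if os.path.basename(args.split()[0]) == name:
            hits.append((pid, args))
    return hits


def snapshot():
    """Capture the current process list (assumes a Linux procps `ps`)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_lingering(snapshot()) right after "service exim stop"
would list any exim PIDs that are still around. Whether to kill them is
left to the admin; the sketch only reports them.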

I think these (orphaned? zombied?) exim instances were related to a
mail loop problem I discovered earlier today, where one of my clients
on holiday had a full mailbox and kept bouncing mails from a contact
whose site was suspended. Although I terminated that loop, it seems
that exim had left those two instances stuck in limbo, sucking up
processing power and hitting the disk somewhere unknown, since they
weren't showing up in my exim logs.

After observing for a while, I brought the services back, and once
exim got started, my load went back to 2.x ~ 3.x. Unfortunately, while
I was typing this email, I realized it didn't stop there. I'm up to a
4.x ~ 5.x load level by now.

So the application causing the load is definitely exim, or more
specifically, I think, SpamAssassin: now that the mail log entries are
coming in slowly, I can read the spamd details, and mails are taking
between 3 and 8 seconds to be checked.
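
A rough sketch of how those per-message scan times could be pulled out
of the mail log rather than read by eye. It assumes spamd result lines
of the form "spamd: clean message (1.2/5.0) for user:500 in 3.4
seconds, 2048 bytes." (or "identified spam" in place of "clean
message"); the log path in the usage note, the 3-second threshold and
the function names are assumptions, not details from this server.

```python
import re

# Matches the scan-time part of an assumed spamd result line.
SCAN_RE = re.compile(
    r"spamd: (?:clean message|identified spam) \([^)]*\) "
    r"for \S+ in (\d+(?:\.\d+)?) seconds"
)


def scan_times(lines):
    """Return the scan time in seconds from every spamd result line."""
    times = []
    for line in lines:
        match = SCAN_RE.search(line)
        if match:
            times.append(float(match.group(1)))
    return times


def summarize(times, slow_after=3.0):
    """Summarize scan times; `slow_after` is an arbitrary threshold."""
    if not times:
        return {"count": 0, "mean": 0.0, "max": 0.0, "slow": 0}
    return {
        "count": len(times),
        "mean": sum(times) / len(times),
        "max": max(times),
        "slow": sum(1 for t in times if t >= slow_after),
    }
```

Usage would be along the lines of
summarize(scan_times(open("/var/log/maillog"))), with the path adjusted
to wherever spamd actually logs on the system in question.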

Thanks again to everybody who has offered suggestions and advice, and
do have a Happy New Year :)

On 1/1/10, Noob Centos Admin <centos.admin at gmail.com> wrote:
> Hi,
>
>> I do not know about now but I had to unload the modules in question.
>> Just clearing the rules was not enough to ensure that the netfilter
>> connection tracking modules were not using any cpu at all.
>
> Thanks for pointing this out. Being a noob admin as my pseudonym
> states, I'd assumed stopping apf and restarting iptables was
> sufficient. I'll have to look up unloading modules later.
>
>> /me shrugs. When I was the mta admin at Outblaze Ltd. (messaging
>> business now owned by IBM and called Lotus Live) spammers always ensured
>> I got called. All they do is just press the big red button (aka start
>> the script/system) and then go and play while I would have to deal with
>> whatever was started.
>
> Based on the almost precise timing of around 9:30 to 5:30 India time,
> I'm inclined to think in my case it wasn't so much a spammer pressing
> a red button but a compromised machine in an office starting up when
> the user gets into the office and shutting down when they knock off
> on time at 5:30 :D
>
>> I remember only one occasion when the spams were
>> launched but neutralized very soon because they were pushing a website
>> and I found a sample real early and so the anti spam system could just
>> dump the spams and knock out accounts being used to send the crap.
>
> Could I ask how I would knock out the accounts sending the crap if
> they are not within my systems?
>
>> First, try rmmod'ing the netfilter modules after you have cleared away
>> the state related rules to make sure that you are only using static
>> rules in netfilter...unless you have done that already..
>
> I think I'm only using static rules because after I restart iptables,
> I would then do a service iptables status to check my rules were in,
> and that list was very short compared to when APF was active.
>
> The good news is, I think I've fixed the big problem after doing my
> shutdown tests, and I'm back to the original problem.
>