[CentOS] Find reason for heavy load
John R Pierce
pierce at hogranch.com
Wed Dec 30 04:59:48 UTC 2009
Noob Centos Admin wrote:
> My Centos 5 server has seen its average load jump through the roof
> recently despite having no major additional clients placed on it.
> Previously, I was looking at an average of less than 0.6 load, I had a
> monitoring script that sends an email warning me if the current load
> stayed above 0.6 for more than 2 minutes. This script used to trigger
> perhaps once an hour during peak periods. Even so, I seldom see
> numbers higher than 1.x
> On 4th Dec, somebody from an Indian IP range started hammering my SMTP
> service, attempting to use it as an open relay. Naturally that didn't
> work and only ended up bloating my typical 400KB daily log report into
> 2MB~4MB affairs.
> After observing for a few days to determine the IP range, I started
> blocking the Indian subnet with apf. Initially I had problems
> getting apf to work properly but after a couple of days managed to get
> the block working and my daily log went back down to the expected size
> when all those connection attempts disappeared from exim's log.
> Now this is when my server load started to shoot through the roof with
> figures like 8.64 5.90 3.62 being reported by my monitoring script,
> triggering so often. I had to raise my threshold to 1.6 to keep my own
> script from spamming myself.
> I've tried changing several things on the server, since initially it
> seemed like the high load might be due to I/O wait. So I tried turning off
> non-essential services like OpenNMS to see if that had any effect. I
> also turned off apf and inserted rules manually into iptables to
> reduce the number of iptables rules the system has to process.
> None of that seems to help much, I'm still getting consistent
> server loads in the 2.x to 3.x range almost all the time.
> The problem is that in top, none of my processes are showing abnormal
> CPU%, most are well under 5%, and manually adding them up doesn't come
> close to the 200% to 300% the load figures of 2.x and 3.x are indicating.
> Even top's own summary says user CPU% is in the 20~30% range, what's
> worrying is the System% is also in the same range. I have no idea what
> "system" is doing since it appears that anything running inside the
> kernel is lumped under "system". Even totalling both % up, I
> would expect 50~60% to translate to a load of 0.5~0.6, yet the
> system load stats are 5x what's expected.
> I've installed utilities like dstat to try to see if I can figure out
> which process is making the system calls that are clogging up the
> server but either I don't understand it or it's not the right tool.
> So I'd appreciate some advice on how/what I should do next to
> identify the cause. Thanks in advance!
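A note on the mismatch between load and CPU% described above: on Linux the load average counts tasks that are runnable plus tasks in uninterruptible sleep (state D, usually waiting on disk or other I/O), so it does not have to line up with the CPU percentages top reports. A rough sketch of how one might count processes by state is below. It assumes a reasonably recent Python 3 is available on the box; the function names are made up for illustration and this is not any standard tool.

```python
import glob
from collections import Counter


def parse_state(stat_text):
    """Return the one-letter process state from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST ')' instead of on whitespace.
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_states(stat_texts):
    """Count how many processes are in each state (R, S, D, Z, ...)."""
    return Counter(parse_state(text) for text in stat_texts)


def read_proc_stats():
    """Read every /proc/<pid>/stat that is still readable (hypothetical helper)."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as fh:
                texts.append(fh.read())
        except OSError:
            continue  # the process exited between listing and reading
    return texts


if __name__ == "__main__":
    counts = count_states(read_proc_stats())
    # R and D are the two states that feed the Linux load average.
    print("runnable (R):", counts.get("R", 0))
    print("uninterruptible (D):", counts.get("D", 0))
```

Running it a few times while the load is high should show whether the load is made up mostly of R tasks (CPU contention) or D tasks (I/O wait). It only samples a single instant, so short-lived processes can easily be missed.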
Last time I saw something like that, it was a bunch of Chinese 'bots'
hammering on my public services like ssh. Another admin had turned
pop3 on too. This created a very heavy load, yet they didn't show up in
top (bunches of pop3 and ssh processes showed up in ps -auxww, however,
plus netstat -an).
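To make the netstat check less of an eyeball exercise, something like the following could tally connections per remote address and local port. It is only a sketch: it assumes the usual Linux `netstat -an` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and Python 3.7 or later, and the function names are my own.

```python
import subprocess
from collections import Counter


def count_connections(netstat_text):
    """Count TCP connections per (remote IP, local port) in `netstat -an` output."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed TCP columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state == "LISTEN":
            continue  # listening sockets are not client connections
        # Split on the last ':' so IPv6-style addresses keep their colons.
        local_port = local.rsplit(":", 1)[1]
        remote_ip = remote.rsplit(":", 1)[0]
        counts[(remote_ip, local_port)] += 1
    return counts


def run_netstat():
    """Return the text output of `netstat -an` (netstat must be on PATH)."""
    result = subprocess.run(["netstat", "-an"], capture_output=True, text=True)
    return result.stdout
```

Calling `count_connections(run_netstat()).most_common(10)` would list the ten busiest remote address and port pairs. A single address holding dozens of connections to port 22, 25 or 110 would point at the kind of hammering described above.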