Noob Centos Admin wrote:
> My Centos 5 server has seen the average load jump through the roof
> recently despite having no major additional clients placed on it.
> Previously, I was looking at an average of less than 0.6 load. I had a
> monitoring script that sends an email warning me if the current load
> stayed above 0.6 for more than 2 minutes. This script used to trigger
> perhaps once an hour during peak periods. Even so, I seldom saw
> numbers higher than 1.x
>
> On 4th Dec, somebody from an Indian IP range started hammering my SMTP
> service, attempting to use it as an open relay. Naturally that didn't
> work and only ended up bulging my typical 400KB daily log report into
> 2MB~4MB affairs.
>
> After observing a few days to determine the IP range, I started
> blocking the Indian subnet with apf. Initially I had problems with
> getting apf to work properly but after a couple of days managed to get
> the block working and my daily log went back down to expected size
> when all those connection attempts disappeared from exim's log.
>
> Now this is when my server load started to shoot through the roof with
> figures like 8.64 5.90 3.62 being reported by my monitoring script,
> triggering so often that I had to raise my threshold to 1.6 to keep my
> own script from spamming myself.
>
> I've tried changing several things on the server, since initially it
> seemed like the high load may be due to I/O wait. So I tried turning off
> non-essential services like OpenNMS to see if that had any effect. I
> also turned off apf and inserted rules manually into iptables to
> reduce the number of iptables rules the system has to process.
>
> All that doesn't seem to help much, I'm still getting consistent
> server loads in the 2.x to 3.x range almost all the time.
>
> The problem is using top, none of my processes are showing abnormal
> CPU%, most are well under 5%, and manually adding them up doesn't equate
> to the 200% to 300% the load figures of 2.x and 3.x are indicating.
>
> Even top's own summary says CPU % is in the 20~30% range, what's
> worrying is the System% is also in the same range. I have no idea what
> "system" is doing since it appears that anything running inside the
> kernel is lumped under "system". Or why, even totalling both % up, I
> would expect 50~60% to translate to the expected load of 0.5~0.6 yet
> the system load stats are 5x what's expected.
>
> I've installed utilities like dstat to try to see if I can figure out
> which process is making the system calls that are clogging up the
> server but either I don't understand it or it's not the right tool.
>
> So I'll appreciate some advice on how/what should I do next to
> identify the cause. Thanks in advance!

Last time I saw something like that, it was a bunch of Chinese 'bots'
hammering on my public services like ssh. Another admin had turned
pop3 on too. This created a very heavy load, yet they didn't show up in
top (bunches of pop3 and ssh processes showed up in ps -auxww, however,
plus netstat -an
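
If it helps, here is a rough sketch of how the netstat -an output could be tallied per remote address, to see whether a few hosts are holding most of the connections to ssh, smtp or pop3. It assumes Python 3 and the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function name count_remote_hosts and the SAMPLE text are made up for illustration; the addresses come from the reserved documentation ranges, not from any real log.

```python
from collections import Counter

# Made-up sample in the Linux net-tools `netstat -an` layout (assumption).
SAMPLE = """\
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN
tcp        0      0 192.0.2.10:22               203.0.113.5:40112           ESTABLISHED
tcp        0      0 192.0.2.10:22               203.0.113.5:40113           ESTABLISHED
tcp        0      0 192.0.2.10:110              203.0.113.5:40200           TIME_WAIT
tcp        0      0 192.0.2.10:25               198.51.100.7:51000          SYN_RECV
tcp        0      0 192.0.2.10:80               198.51.100.9:51001          ESTABLISHED
udp        0      0 0.0.0.0:123                 0.0.0.0:*
"""


def count_remote_hosts(netstat_output, local_ports=None):
    """Count non-listening TCP connections per remote IP.

    netstat_output: text as printed by `netstat -an`.
    local_ports: optional set of local port strings to restrict the count to.
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Keep only TCP rows that carry all six columns, including State.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, foreign, state = fields[3], fields[4], fields[5]
        if state == "LISTEN":
            continue
        # The port is whatever follows the last colon (also true for IPv6 rows).
        local_port = local.rsplit(":", 1)[-1]
        if local_ports and local_port not in local_ports:
            continue
        remote_ip = foreign.rsplit(":", 1)[0]
        counts[remote_ip] += 1
    return counts


# Tally connections to ssh (22), smtp (25) and pop3 (110), busiest host first.
for ip, n in count_remote_hosts(SAMPLE, {"22", "25", "110"}).most_common():
    print(n, ip)
```

To try it on a live box, the text of netstat -an would be passed in place of SAMPLE. A remote address with dozens of entries against port 22, 25 or 110 would be the first thing to look at.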