Noob Centos Admin wrote:
> Hi, I have been experiencing a problem on our dedicated server running
> CentOS 5, and have been unable to track it down.
>
> About 6 days ago, I noticed a spike in load/CPU utilization, which went
> from a typical 0.2x-0.3x to 3.x. At the same time, average traffic also
> went up, and so did log usage. Before this, the server was working fine,
> and there had been no changes to the configuration.
>
> Initially, I narrowed it down to the mail system: Exim was generating
> significantly more log data than usual. This was eventually traced to our
> server and another server apparently playing ping-pong between two users
> who were coincidentally both on vacation and both had full mailboxes.
> This caused an endless loop of "Message Undelivered" and "Auto-reply"
> messages.
>
> Once this was identified and cleared up, I expected things to go back to
> normal. However, load and traffic remained high.
>
> Looking at the output of "top", I noted that %sys was as high as, and
> often much higher than, %user. However, the individual processes' %CPU
> values didn't add up to the total top was reporting. Top reports 160~170
> sleeping tasks and only 4 active most of the time, mostly exim, followed
> by httpd/mysql/php.
>
> top Snapshot
> ============
> top - 17:25:03 up 7 days, 19:16,  1 user,  load average: 2.03, 2.84, 3.04
> Tasks: 168 total,   4 running, 164 sleeping,   0 stopped,   0 zombie
> Cpu(s): 26.5%us, 50.3%sy,  0.0%ni, 16.6%id,  6.1%wa,  0.0%hi,  0.5%si,  0.0%st
> Mem:   1915208k total,  1880256k used,    34952k free,   142100k buffers
> Swap: 16777208k total,    66140k used, 16711068k free,  1276564k cached
>
>
> iostat Snapshot
> ===============
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           18.96    0.00   25.57    5.16    0.01   50.30
>
> Device:      tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
> sda        54.19        63.31      2460.80    42689802  1659234904
> sdb        55.12        76.41      2460.80    51521720  1659234904
> md1       315.95       139.72      2442.00    94207644  1646554216
> md0         0.01         0.00         0.02        1422       14736
> dm-0       39.13        65.85       292.50    44399402   197219496
> dm-1      267.18        36.18      2110.08    24398010  1422756072
> dm-2        9.64        37.68        39.42    25408576    26578648
> fd0         0.00         0.00         0.00          16           0
> sr0         0.00         0.00         0.00         136           0
>
> Searching around for ways to interpret the output, I tried sar/iostat, and
> the information I found online essentially indicates there isn't a disk
> problem: %iowait was relatively low, and mdadm shows the RAID 1 disks
> working perfectly fine. Since %sys is consistently the highest, it appears
> that the kernel is doing something out of the ordinary.
>
> The problem is that I have no idea what else to do to determine what that
> "something" is.
>
> I've looked at netstat, and there don't appear to be excessive
> connections. The logwatch summary also doesn't give any clue, as there
> are no records of unusual failed login attempts.
>
> Please advise what else I can look into or check. Thanks in advance!

Well... top says you have 4 processes running. If that is consistent
(4 processes always in a run state), then you should be able to identify
the running processes with the command:

ps -ef r

(I think). I would expect one of the always-running processes to be the
one taking up the CPU time.

Also, while in top, <Shift>-H might show some hidden threads in the output.
Maybe those will help to find the processes that are taking the CPU time.
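If you want to script the same check, here is a minimal sketch (not something from the thread) that does roughly what `ps -ef r` combined with top's per-thread view does: it walks `/proc/<pid>/task/<tid>/stat` and prints every thread whose state is `R` (running), along with its cumulative user and system CPU time in clock ticks. It assumes a Linux procfs laid out as documented in proc(5) and a Python 3 interpreter (CentOS 5's system Python is 2.4, so it would need adapting there); the function names `parse_stat` and `running_tasks` are hypothetical.

```python
import os


def parse_stat(line):
    """Parse one /proc/<pid>/stat or /proc/<pid>/task/<tid>/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so split on the LAST ')' rather than on spaces.
    Returns (tid, comm, state, utime, stime); times are in clock ticks.
    """
    head, _, tail = line.rpartition(")")
    tid_str, _, comm = head.partition(" (")
    fields = tail.split()
    # After the ')' come state(3), ppid(4), ..., utime(14), stime(15),
    # so 1-based field N from proc(5) is fields[N - 3].
    state = fields[0]
    utime = int(fields[11])
    stime = int(fields[12])
    return int(tid_str), comm, state, utime, stime


def running_tasks(proc_root="/proc"):
    """Yield (pid, tid, comm, utime, stime) for every thread in state R."""
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited after we listed /proc
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat"),
                          errors="replace") as f:
                    tid_num, comm, state, utime, stime = parse_stat(f.read())
            except OSError:
                continue  # thread exited while we were reading it
            if state == "R":
                yield int(pid), tid_num, comm, utime, stime


if __name__ == "__main__":
    # Run this several times in a row: a task that keeps showing up is a
    # good candidate for what is eating the CPU. The script itself will
    # usually appear too, since it is running while it reads /proc.
    print("%6s %6s %-16s %10s %10s" % ("PID", "TID", "COMMAND", "UTIME", "STIME"))
    for pid, tid, comm, utime, stime in running_tasks():
        print("%6d %6d %-16s %10d %10d" % (pid, tid, comm, utime, stime))
```

A task with a large and growing STIME relative to UTIME is spending its time in the kernel, which may be worth correlating with the high %sy that top reports. Note that this is only a point-in-time snapshot; comparing two runs a few seconds apart gives a better picture than a single sample.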