[CentOS] Help in troubleshoot cause of high kernel activity
Johnny Hughes
johnny at centos.org
Sat Mar 29 10:37:28 UTC 2008
Noob Centos Admin wrote:
> Hi, I have been experiencing a problem on our dedicated server running
> CentOS 5 and have been unable to track down the cause.
>
> About 6 days ago, I noticed a spike in load/CPU utilization, which went
> from a typical 0.2x-0.3x to 3.x. At the same time, average traffic went up
> and so did the volume of log data. Prior to this, the server was working
> fine and there had been no changes to the configuration.
>
> Initially, I narrowed it down to the mail system. Exim was generating
> significantly more log data than usual. This eventually turned out to be
> our server and another server playing ping pong between two users who
> coincidentally were both on vacation with full mailboxes, which caused
> an endless loop of "Message Undelivered" and "Auto-reply" messages.
>
> Once this was identified and cleared up, I had expected things to go back to
> normal. However, load/traffic remained high.
>
> Looking at "top" output, I noted that %sys was as high as, and often much
> higher than, %user. However, the individual process %CPU figures just didn't
> add up to the total top was reporting. Top reports 160~170 sleeping tasks and
> only 4 running most of the time, mostly exim, followed by httpd/mysql/php.
>
> top Snapshot
> ==========
> top - 17:25:03 up 7 days, 19:16, 1 user, load average: 2.03, 2.84, 3.04
> Tasks: 168 total, 4 running, 164 sleeping, 0 stopped, 0 zombie
> Cpu(s): 26.5%us, 50.3%sy, 0.0%ni, 16.6%id, 6.1%wa, 0.0%hi, 0.5%si, 0.0%st
> Mem: 1915208k total, 1880256k used, 34952k free, 142100k buffers
> Swap: 16777208k total, 66140k used, 16711068k free, 1276564k cached
>
>
> iostat Snapshot
> ============
> avg-cpu:   %user   %nice %system %iowait  %steal   %idle
>            18.96    0.00   25.57    5.16    0.01   50.30
>
> Device:      tps  Blk_read/s  Blk_wrtn/s  Blk_read    Blk_wrtn
> sda        54.19       63.31     2460.80  42689802  1659234904
> sdb        55.12       76.41     2460.80  51521720  1659234904
> md1       315.95      139.72     2442.00  94207644  1646554216
> md0         0.01        0.00        0.02      1422       14736
> dm-0       39.13       65.85      292.50  44399402   197219496
> dm-1      267.18       36.18     2110.08  24398010  1422756072
> dm-2        9.64       37.68       39.42  25408576    26578648
> fd0         0.00        0.00        0.00        16           0
> sr0         0.00        0.00        0.00       136           0
>
> Searching around for ways to interpret the output, I tried sar/iostat.
> From what I could find on the net, there doesn't appear to be a disk
> problem: %iowait is relatively low and mdadm shows the RAID 1 disks working
> perfectly fine. Since %sys is consistently the highest figure, it appears
> that the kernel is doing something out of the ordinary.
>
> The problem is I have no idea what else to do to determine what "something"
> is.
>
> I've looked at netstat and there don't appear to be excessive connections.
> The logwatch summary does not give any clue either, as there are no
> records of unusual failed login attempts.
>
> Please advise what else I can look into or check. Thanks in advance!
Well .. top says you have 4 processes running ... if that is consistent
(4 processes always in a run state) then you should be able to determine
the running processes with the command:
ps -ef r
(I think)
I would think one of the always-running processes is the one that is
taking up the CPU time.
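Since a single ps run is only a snapshot, one way to check whether the same
processes really are always in a run state is to sample a few times and
tally the results. A rough sketch, assuming procps ps; the function names
tally_running and sample_running are made up for illustration, and the
10 samples at one-second intervals are arbitrary:

```shell
# tally_running: reads "STATE COMMAND" lines on stdin and prints
# "count command" for every command seen in state R, busiest first.
# Only the first word of the command name is used.
tally_running() {
    awk '$1 ~ /^R/ { n[$2]++ } END { for (c in n) print n[c], c }' | sort -rn
}

# sample_running: takes 10 ps samples, one second apart, and tallies them.
# The trailing "=" on each column name suppresses the header line.
# The ps process itself will normally show up in the tally as well.
sample_running() {
    i=0
    while [ "$i" -lt 10 ]; do
        ps -eo stat=,comm=
        sleep 1
        i=$((i + 1))
    done | tally_running
}
```

Calling sample_running should then list the commands that were in state R
most often during the sampling window.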
Also while in top, <Shift>-H might show some hidden threads in the output.
Maybe those will help to find the processes that are taking the CPU time.
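If you would rather get the per-thread view without sitting in top, ps can
list threads with -L. A minimal sketch, again assuming procps ps;
rank_threads is a hypothetical helper name. Bear in mind that the %CPU
figure from ps is CPU time averaged over the life of the process, not the
instantaneous figure top shows, so treat the ranking as a rough guide only:

```shell
# rank_threads: reads "PCPU PID LWP COMMAND" lines on stdin and prints the
# N lines with the highest %CPU (default 10), highest first.
rank_threads() {
    sort -rn | head -n "${1:-10}"
}

# Intended use (one line per thread; LWP is the thread ID):
#   ps -eLo pcpu=,pid=,lwp=,comm= | rank_threads 10
```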