[CentOS] Find reason for heavy load

Thu Dec 31 09:26:12 UTC 2009
Noob Centos Admin <centos.admin at gmail.com>

Hi,


> Dstat could at least tell you if your problem is CPU or I/O.

This is the output of the following command, which I came across while
reading up about two weeks ago, when I started investigating the
abnormal server behaviour.

dstat -c --top-cpu -d --top-bio --top-latency
usr sys idl wai hiq siq|  cpu process   | read  writ| latency process
  4   1  93   2   0   0|mysqld       0.0|  80k   82k|khelper         8
 42  46   0  12   0   0|httpd         12| 648k    0 |ksoftirqd/0   111
 26  37  12  26   0   0|httpd        1.5| 520k   11M|ksoftirqd/1    75
 23  49   8  19   0   0|exim         1.0| 652k   16k|ksoftirqd/0    44
 26  44   3  28   0   0|exim         1.0| 652k 1296k|ksoftirqd/0    44
 32  41   4  23   0   0|exim         1.5| 620k   16k|ksoftirqd/0    50
 28  52   3  16   0   0|exim         1.5| 700k    0 |ksoftirqd/1    47
 21  41  11  28   0   0|exim         1.0| 556k   11M|ksoftirqd/0    79
 27  46   3  24   0   0|exim         1.5| 684k   16k|ksoftirqd/1    40
 29  45   2  24   0   0|exim         1.0| 672k  944k|ksoftirqd/0    25
 28  33   3  37   0   0|httpd         14| 852k 5992k|ksoftirqd/1    39
 36  39   2  23   0   0|httpd        5.0|1024k    0 |ksoftirqd/0    84
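This short Python sketch counts how often each process appears in dstat's --top-cpu and --top-latency columns across the samples above. The rows are pasted from that output. Splitting each row on "|" assumes the column layout produced by this particular dstat command line.

```python
from collections import Counter

# Rows pasted from the dstat output above (header lines omitted).
DSTAT_ROWS = """\
  4   1  93   2   0   0|mysqld       0.0|  80k   82k|khelper         8
 42  46   0  12   0   0|httpd         12| 648k    0 |ksoftirqd/0   111
 26  37  12  26   0   0|httpd        1.5| 520k   11M|ksoftirqd/1    75
 23  49   8  19   0   0|exim         1.0| 652k   16k|ksoftirqd/0    44
 26  44   3  28   0   0|exim         1.0| 652k 1296k|ksoftirqd/0    44
 32  41   4  23   0   0|exim         1.5| 620k   16k|ksoftirqd/0    50
 28  52   3  16   0   0|exim         1.5| 700k    0 |ksoftirqd/1    47
 21  41  11  28   0   0|exim         1.0| 556k   11M|ksoftirqd/0    79
 27  46   3  24   0   0|exim         1.5| 684k   16k|ksoftirqd/1    40
 29  45   2  24   0   0|exim         1.0| 672k  944k|ksoftirqd/0    25
 28  33   3  37   0   0|httpd         14| 852k 5992k|ksoftirqd/1    39
 36  39   2  23   0   0|httpd        5.0|1024k    0 |ksoftirqd/0    84
"""


def tally_top_processes(text):
    """Count how often each process name appears in the top-cpu column
    (field 2) and the top-latency column (field 4) of each row."""
    top_cpu = Counter()
    top_latency = Counter()
    for line in text.splitlines():
        fields = line.split("|")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        top_cpu[fields[1].split()[0]] += 1
        top_latency[fields[3].split()[0]] += 1
    return top_cpu, top_latency


if __name__ == "__main__":
    cpu, latency = tally_top_processes(DSTAT_ROWS)
    print("top-cpu:", cpu.most_common())
    print("top-latency:", latency.most_common())
```

In these 12 rows, exim is the top CPU process 7 times and httpd 4 times. ksoftirqd/0 and ksoftirqd/1 together account for 11 of the 12 top-latency entries.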


> Even better, run
>
> vmstat 2 10
>
> Look at the first two columns.  Which column has higher numbers?  If r,
> you're CPU-bound.  If b, you're I/O bound.

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 8  1   3092 131460 100692 833668    0    0    40    21    1    0  4  1 92  2  0
 9  1   3092 130708 100700 835016    0    0   578   206  577 1420 32 50  3 15  0
 7  1   3092 128324 100716 836148    0    0   546  2866  594 1465 31 44  7 18  0
 4  1   3092 126860 100724 837268    0    0   540   256  596 1505 28 43  6 23  0
 7  2   3092 125600 100740 838564    0    0   620   234  661 1442 30 41  2 26  0
 5  1   3092 124028 100756 839752    0    0   570  2692  635 1430 24 45  6 25  0
 6  0   3092 122040 100784 840964    0    0   584  1464  682 1434 27 44  2 28  0
 6  1   3092 120588 100792 842232    0    0   602   278  624 1562 32 46  2 20  0
 2  3   3092 120556 100840 843064    0    0   440  2908  603 1299 22 35  6 37  0
 3  1   3092 119832 100876 844088    0    0   430  1104  605 1348 23 36  1 40  0

Based on this, am I correct to conclude that I'm CPU-bound and that the
system is busy with some unknown processing?
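This minimal Python sketch applies the quoted rule of thumb to the samples above. It compares the average of r (runnable processes) with the average of b (processes in uninterruptible sleep) and also averages us, sy, id and wa. It skips the first row because vmstat's first report gives averages since the last reboot, not the 2-second interval. The column order comes from the header shown above and may differ in other vmstat versions.

```python
# Samples pasted from the "vmstat 2 10" output above (headers omitted).
VMSTAT_ROWS = """\
 8  1   3092 131460 100692 833668    0    0    40    21    1    0  4  1 92  2  0
 9  1   3092 130708 100700 835016    0    0   578   206  577 1420 32 50  3 15  0
 7  1   3092 128324 100716 836148    0    0   546  2866  594 1465 31 44  7 18  0
 4  1   3092 126860 100724 837268    0    0   540   256  596 1505 28 43  6 23  0
 7  2   3092 125600 100740 838564    0    0   620   234  661 1442 30 41  2 26  0
 5  1   3092 124028 100756 839752    0    0   570  2692  635 1430 24 45  6 25  0
 6  0   3092 122040 100784 840964    0    0   584  1464  682 1434 27 44  2 28  0
 6  1   3092 120588 100792 842232    0    0   602   278  624 1562 32 46  2 20  0
 2  3   3092 120556 100840 843064    0    0   440  2908  603 1299 22 35  6 37  0
 3  1   3092 119832 100876 844088    0    0   430  1104  605 1348 23 36  1 40  0
"""

# Column names in the order of the header line above.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()


def summarize(text, skip_first=True):
    """Average selected columns and apply the r-versus-b rule of thumb."""
    rows = [dict(zip(COLUMNS, map(int, line.split())))
            for line in text.splitlines() if line.strip()]
    if skip_first:
        rows = rows[1:]  # first vmstat report is the since-boot average
    n = len(rows)
    avg = {key: sum(row[key] for row in rows) / n
           for key in ("r", "b", "us", "sy", "id", "wa")}
    verdict = "CPU-bound" if avg["r"] > avg["b"] else "I/O-bound"
    return avg, verdict


if __name__ == "__main__":
    averages, verdict = summarize(VMSTAT_ROWS)
    for key, value in averages.items():
        print(f"{key}: {value:.1f}")
    print("rule of thumb says:", verdict)
```

Over these nine interval samples, r averages about 5.4 and b about 1.2, so the quoted rule classifies the load as CPU-bound. The same averages put sy (about 43%) above us (about 28%) and wa at about 26%. The r/b rule on its own does not show either of these.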

> Did you check if you have a defective disk or a rebuilding array?  That
> could be the cause.

I usually run "cat /proc/mdstat" whenever I log into the server to
check my MD RAID status. So far the array appears OK. There are no
disk warnings when I run "dmesg". smartctl also reports no errors logged
and a passed health check for both disks, although no self-test has been
run. Is it safe to conclude that the disks are OK and not part of the
problem?
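The by-eye /proc/mdstat check can also be scripted. In /proc/mdstat, each array's status line ends with a bracketed member map such as [UU], where an underscore in place of a U marks a missing or failed member. A rebuild shows up as a "recovery =" or "resync =" progress line. This Python sketch flags both cases. The sample text and device names are hypothetical, made up for illustration rather than taken from this server. The script does not replace a SMART self-test.

```python
import re

# Hypothetical /proc/mdstat contents for illustration only: md1 is degraded.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      488279488 blocks [2/1] [U_]

unused devices: <none>
"""

ARRAY_RE = re.compile(r"^(md\d+)\s*:")          # start of an array stanza
MEMBER_MAP_RE = re.compile(r"\[([U_]+)\]\s*$")  # e.g. [UU] or [U_]
REBUILD_RE = re.compile(r"\b(recovery|resync)\s*=")


def check_mdstat(text):
    """Return {array: [problems]} for arrays that look degraded or rebuilding."""
    problems = {}
    current = None
    for line in text.splitlines():
        match = ARRAY_RE.match(line)
        if match:
            current = match.group(1)
            continue
        if current is None:
            continue  # lines before the first array (e.g. Personalities)
        match = MEMBER_MAP_RE.search(line)
        if match and "_" in match.group(1):
            problems.setdefault(current, []).append("degraded " + match.group(0).strip())
        if REBUILD_RE.search(line):
            problems.setdefault(current, []).append("rebuilding: " + line.strip())
    return problems


if __name__ == "__main__":
    # On the server itself, read the live file instead:
    # with open("/proc/mdstat") as f:
    #     print(check_mdstat(f.read()))
    print(check_mdstat(SAMPLE_MDSTAT))
```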

Thanks again to everybody for the suggestions and help so far.