[CentOS] Weird performance problem

Thu Apr 16 13:12:10 UTC 2009
Ugo Bellavance <ugob at lubik.ca>

Hi,

I'm running a CentOS 4 server and I sometimes run into a weird
performance problem.  Here is how I discovered it.

This server runs OpenVZ virtual machines, and one of them is an Asterisk 
server for my personal use.  The first symptom was that the voice quality 
became flaky.  I logged on to the server to see what could be eating CPU 
cycles, but when I ran top, it took almost a minute before top actually 
displayed anything.  Another hint: when I run dstat (a monitoring utility 
that combines iostat, vmstat and other stats), I often get "missed xx 
ticks" messages, where xx is a number.

Example (current) (sorry for the wrap):

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
   3   2  93   2   0   0| 106k  273k|   0     0 | 0.2   0.4 |1039   389
   3   6  91   0   0   0|   0  6416k| 276k  275k|   0     0 |2160  6822 
  missed 55 ticks
   4  10  84   2   0   0|1200k 1992k|  82k   93k|   0     0 |1188  6275 
  missed 29 ticks
   1   0  99   0   0   0|   0  1312k|  65k   66k|   0     0 |1050  1114 
  missed 38 ticks
   2   1  96   0   0   0|   0  1168k|  57k   59k|   0     0 | 491   877 
  missed 13 ticks
   3   1  94   1   0   0|   0  6016k| 181k  176k|   0     0 |2169  5996 
  missed 50 ticks
   4   2  91   1   0   0|  28k 8744k| 216k  214k|   0     0 |2159  5438 
  missed 37 ticks
   1   1  98   0   0   0|   0  2632k|  93k   91k|   0     0 | 983  1381 
  missed 34 ticks
   1   1  98   1   0   0|   0  5624k| 113k  110k|   0     0 |1569  2643 
  missed 52 ticks
   1   1  98   1   0   0|   0  2432k|  29k   28k|   0     0 | 679   647 
  missed 12 ticks
   0   0 100   0   0   0|   0     0 |  60B  374B|   0     0 |  13    15
   2   3  94   0   0   0|   0  1872k| 209k  210k|   0     0 |1375  3590 
  missed 30 ticks
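
To put a number on how often this happens, the dstat output can be saved to a file and tallied.  The following is only a rough sketch: the function name summarize_missed_ticks is made up for illustration, and it assumes the plain-text layout shown above (data rows start with a number and contain "|", and misses appear as "missed N ticks" on their own line).

```python
import re

# Matches dstat's "missed N ticks" notice lines, as seen in the excerpt above.
MISSED_RE = re.compile(r"^\s*missed (\d+) ticks\s*$")


def summarize_missed_ticks(lines):
    """Return (sample_rows, missed_events, total_missed_ticks).

    Hypothetical helper: assumes data rows start with a digit and contain
    the '|' column separator, which is how the excerpt above looks.
    """
    samples = 0
    missed = []
    for line in lines:
        match = MISSED_RE.match(line)
        if match:
            missed.append(int(match.group(1)))
        elif "|" in line and line.strip()[:1].isdigit():
            samples += 1
    return samples, len(missed), sum(missed)
```

Calling it as summarize_missed_ticks(open("dstat.log")) on a saved capture (the file name is just an example) gives a quick way to compare a "bad" period with the state right after a reboot.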



The problem is currently occurring, but it doesn't seem to be affecting 
voice quality for now, so I have some time to look for the cause. 
The only fix I've found so far is to reboot... But hey, this 
isn't a Windows 98 machine :)!

I tried restarting the VZ system, which restarts all the VMs, but it 
didn't solve the problem.  I can't tell whether the problem occurs on a 
stock CentOS kernel: the server hosts production (but non-critical) 
virtual machines, so it is always running the OpenVZ kernel.

Here is what I've checked so far:

- top shows a load of about 0.4

- vmstat 1 10 shows this:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0    592 191092 381720 537956    0    0    53    68    4     3  3  2 93  2
 0  0    592 190720 381720 537956    0    0     0     0   32    60  1  1 98  0
 0  0    592 191092 381720 537956    0    0     0     0   41    59  0  0 100  0
 1  0    592 191092 381728 537948    0    0     0  2584  311    96 10  4 66 19
 0  0    592 189968 381732 537944    0    0     0  2080  222   174  2  3 79 16
 0  1    592 189968 381732 537944    0    0     0  3244  170    73 10  4 73 12
 0  0    592 190216 381732 537944    0    0     0   136   76   113  1  2 93  4
 0  0    592 189844 381732 537944    0    0     0     0   33    69  1  1 98  0
 0  0    592 189844 381732 537944    0    0     0     0   24    32  0  0 100  0
 0  0    592 190340 381732 537944    0    0     0     0   28    42  0  0 100  0
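
For anyone who wants to crunch a longer vmstat run, here is a minimal sketch that turns the raw output into rows keyed by column name and reports the highest I/O wait.  The function names are hypothetical, and it assumes raw (unwrapped) vmstat output with a column-name line starting with "r".  The first data row is skipped because vmstat's first report is an average since boot.

```python
def parse_vmstat(lines):
    """Parse raw `vmstat` text output into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column-name line: r b swpd free buff cache si so bi bo in cs ...
            header = fields
        elif header and fields[0].isdigit() and len(fields) == len(header):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def peak_iowait(rows):
    """Highest `wa` value, ignoring the first row (averages since boot)."""
    live = rows[1:]
    return max((row["wa"] for row in live), default=0)
```

On the ten samples above this reports the 19% I/O wait seen during the burst of writes in the fourth sample.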

- iostat -x 1 shows this (excerpt):

Device:    rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda          0.00 171.00  0.00 124.00    0.00 2368.00     0.00  1184.00    19.10     0.14    1.13   0.02   0.20
sdb          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdc          0.00 171.00  0.00 124.00    0.00 2368.00     0.00  1184.00    19.10     0.17    1.35   0.02   0.30
sdd          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md0          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2          0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md1          0.00   0.00  0.00 294.00    0.00 2352.00     0.00  1176.00     8.00     0.00    0.00   0.00   0.00
dm-0         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-2         0.00   0.00  0.00 294.00    0.00 2352.00     0.00  1176.00     8.00     0.30    1.01   0.02   0.50
dm-3         0.00   0.00  0.00 294.00    0.00 2352.00     0.00  1176.00     8.00     0.30    1.01   0.02   0.50
dm-4         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-6         0.00   0.00  0.00  0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
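
The same kind of rough sketch works for the iostat -x output, to pick out the busiest device from a capture.  Again the function names are invented for illustration.  The column names are read from the "Device:" header line rather than hard-coded, on the assumption that the raw output has one device per line as iostat prints it.

```python
def parse_iostat_x(lines):
    """Parse raw `iostat -x` text into {device: {column: value}}.

    If a device appears in several reports, the latest report wins.
    """
    header = None
    stats = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            # Column names follow the "Device:" label on the header line.
            header = fields[1:]
        elif header and len(fields) == len(header) + 1:
            try:
                values = [float(value) for value in fields[1:]]
            except ValueError:
                continue  # not a device row (e.g. some other text line)
            stats[fields[0]] = dict(zip(header, values))
    return stats


def busiest(stats, column="%util"):
    """Name of the device with the highest value in `column`, or None."""
    if not stats:
        return None
    return max(stats.items(), key=lambda item: item[1][column])[0]
```

Passing column="await" instead of the default ranks devices by average wait time rather than utilization.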

Any insight would be greatly appreciated.  It is not critical, but I'd 
be glad to finally pinpoint and solve the problem.

Hardware: HP Netserver, software RAID, SCSI disks, 1.7 GB RAM.

I can provide more information if needed.

Thanks,

Ugo