[CentOS] Weird performance problem

Thu Apr 16 13:20:00 UTC 2009
Ugo Bellavance <ugob at lubik.ca>

Ugo Bellavance wrote:
> Hi,
> 
> I'm running a CentOS 4.  server and I sometimes face a weird problem. 
> It is a weird performance problem, and here is how I discovered it.
> 
> This server runs OpenVZ virtual machines, and one of them is an asterisk 
> server for my personal use.  The first symptom of the problem is that 
> the voice quality became flaky.  So I logged on to the server to see what 
> could be eating CPU cycles.  When I ran top, it took almost one minute 
> before top actually showed anything.  Another hint is that when I run dstat (a 
> monitoring utility that is a mix of iostat and vmstat and other stats), 
> I often get a "missed xx ticks", where xx is a number.

Another hint is that pings are really slow.  Even pinging localhost is 
very slow.  The first reply is fast, but the second takes ages to arrive.

It seems to be blocking here:

recvmsg(3, 0xbfbf84b0, MSG_DONTWAIT)    = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1239887784, 389347}, NULL) = 0
poll(

The rest comes as soon as there is another response:

[{fd=3, events=POLLIN|POLLERR}], 1, 999) = 0
gettimeofday({1239887903, 119727}, NULL) = 0
gettimeofday({1239887903, 119791}, NULL) = 0
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\10\0\335\2018)\0\4\0370\347I\357\323\1\0\10\t\n\v\f\r\16\17\20\21\22\23\24\25\26\27"..., 64}], msg_controllen=0, msg_flags=0}, MSG_CONFIRM) = 64
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"E\0\0T\26\264\0\0@\1e\363\177\0\0\1\177\0\0\1\0\0\345\2018)\0\4\0370\347I"..., 192}], msg_controllen=20, {cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 84
write(1, "64 bytes from hn01.domain"..., 82) = 82
recvmsg(3, 0xbfbf84b0, MSG_DONTWAIT)    = -1 EAGAIN (Resource temporarily unavailable)
gettimeofday({1239887903, 120785}, NULL) = 0
poll(

Then it blocks again...
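
The two gettimeofday values on either side of that poll give a rough idea of how long it really blocked.  Here is a quick sketch of the arithmetic in Python.  The timestamps and the 999 ms timeout are copied from the trace above; the helper name to_us is made up for illustration, and I am assuming both excerpts come from the same uninterrupted strace run.

```python
# Timestamps copied from the gettimeofday() calls just before and just
# after the blocking poll() in the strace output.
before_poll = (1239887784, 389347)  # (seconds, microseconds)
after_poll = (1239887903, 119727)
poll_timeout_ms = 999  # third argument of poll() in the trace


def to_us(tv):
    """Convert a (seconds, microseconds) pair to whole microseconds."""
    sec, usec = tv
    return sec * 1_000_000 + usec


elapsed_s = (to_us(after_poll) - to_us(before_poll)) / 1_000_000
print(f"poll() timeout requested: {poll_timeout_ms / 1000:.3f} s")
print(f"wall-clock time elapsed:  {elapsed_s:.3f} s")
```

Under that assumption, a poll that was asked to wait at most 999 ms, and that returned 0 (timed out), came back roughly 119 seconds later by the wall clock.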

This confuses Nagios, which is running in a VM on this server.

Could 'gettimeofday' be the problem?  'date' runs without delay.
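
One way to narrow this down might be to check whether timeouts in general fire late, independently of ping.  Below is a minimal sketch, not something that has been run on this server: it waits on an empty poll set (so poll can only time out) and compares the wall clock before and after.  The function name measure_overshoot and its parameters are made up for illustration, and it assumes a Linux Python where select.poll is available.

```python
import select
import time


def measure_overshoot(timeout_ms=999, rounds=3):
    """Return, for each round, how many seconds late a poll() timeout fired."""
    poller = select.poll()  # no fds registered, so poll() can only time out
    late = []
    for _ in range(rounds):
        start = time.time()  # wall clock, comparable to gettimeofday()
        poller.poll(timeout_ms)
        late.append(time.time() - start - timeout_ms / 1000)
    return late


if __name__ == "__main__":
    for i, seconds in enumerate(measure_overshoot(), 1):
        print(f"round {i}: timeout fired {seconds:+.3f} s late")
```

On a healthy machine I would expect the overshoot to be a few milliseconds at most.  If it is in the tens of seconds here, that would suggest the timeout or timer handling is what is off, rather than ping or the gettimeofday call itself.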

Thanks,

Ugo