[CentOS] stuck process continues under strace

Sat Nov 24 10:39:52 UTC 2007
Jure Pečar <pegasus at nerv.eu.org>

Hello,

I have a few VPSes running CentOS 5 on which I experience a strange phenomenon:

Every now and then a random network-communicating process (lynx, sendmail,
httpd, ...) starts eating 100% of CPU while doing nothing. That slows the
VPS down a lot.

When examined with lsof, all such processes have at least one TCP
connection in the CLOSE_WAIT state. Touching such a process with strace -p
makes it spring back to life, and it goes on working normally. The sendmail
straces are especially interesting, because I can see it wants to continue
the SMTP dialogue from somewhere in the middle. I suspect the CPU usage I
see is the process trying to read from a TCP socket to which the other end
has nothing more to write. But as this is not happening in userspace (where
I would see recv() in the strace output), I expect the problem lies higher
up, in Virtuozzo.
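
To illustrate what I mean by reading from a socket whose other end is done
writing, here is a minimal Python sketch. It is only an assumption about the
general mechanism, not code from any of the affected programs, and the
function name recv_after_peer_close is made up for the demo. It opens a TCP
connection over localhost, closes one end, and reads from the other end,
which the kernel then holds in CLOSE_WAIT:

```python
import socket


def recv_after_peer_close():
    """Return what recv() yields on a socket whose peer has already closed."""
    # Listener on an ephemeral localhost port (demo setup only).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # The peer closes first. Our end stays open, so the kernel keeps it
    # in CLOSE_WAIT until we close it as well.
    client.close()

    # Safety timeout so the demo cannot hang if something goes wrong.
    server_side.settimeout(5.0)

    # recv() reports end-of-stream as an empty bytes object, and keeps
    # doing so on every later call instead of blocking.
    results = [server_side.recv(4096) for _ in range(3)]

    server_side.close()
    listener.close()
    return results


print(recv_after_peer_close())
```

This prints [b'', b'', b'']. A read loop that does not treat the empty
result as end-of-stream would spin at full speed, but that spinning would
show up in strace as a stream of read calls, which is exactly what I do not
see here.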

I would like to understand what is going on and, if possible, reproduce the
problem, but I don't know enough about the mechanisms involved.
Specifically, what does strace do to a process that knocks it out of the
CPU-consuming loop?
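
One thing that might help narrow this down is checking whether a stuck
process burns its CPU time in user mode or in kernel mode. The following is
a rough Python sketch, assuming a Linux /proc filesystem is visible inside
the VPS. The helper name cpu_ticks is my own. It reads the utime and stime
counters (fields 14 and 15 of /proc/<pid>/stat, in clock ticks):

```python
import os


def cpu_ticks(pid):
    """Return (utime, stime) in clock ticks for the given pid."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces, so split only what follows the last ')'.
    rest = data[data.rfind(")") + 2:].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    return int(rest[11]), int(rest[12])


if __name__ == "__main__":
    user, kernel = cpu_ticks(os.getpid())
    print("user ticks:", user, "kernel ticks:", kernel)
```

Sampling a stuck pid twice, a few seconds apart, and comparing the two
counters should show which one is growing. If it is mostly stime, that
would fit the idea that the loop is not in userspace.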

I'd be happy if you can give me some pointers.


-- 

Jure Pečar
http://jure.pecar.org/