CentOS 5/6 random system freezes - Discuss

15 Apr 2014


I am developing a high-load daemon that listens on UDP and processes
packets. Over the last few months I have noticed a strange issue: sometimes
it takes 500-700 ms to answer a packet, while it usually takes 20 ms. I ran
strace on all daemon processes and found this:
13:35:36.979887 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0
13:35:36.979944 write(3, "[2014-04-15 13:35:36,979] WARNING    WORKER 26 - [pkt#105132/AUTH] loadPresets - memory used: 0 kb\n", 99) = 99
13:35:37.599793 sendto(10, "Q\0\0\0\rSELECT 1\0", 14, MSG_NOSIGNAL, NULL, 0) = 14
13:35:37.599865 poll([{fd=10, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=10, revents=POLLIN}])
You can see that around 600 ms passed between the write and the sendto. At
that time the server was not overloaded (load average 0.4 on 16 cores).
There was free memory, and there was no load on the disks.
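A pause like this can also be located automatically instead of by eye. The following is only a minimal sketch in Python, assuming the trace carries microsecond wall-clock timestamps at the start of each line, as in the excerpt above; the function name `find_gaps` and the 100 ms default threshold are illustrative choices, not part of the original setup.

```python
import re
from datetime import datetime, timedelta

# Leading wall-clock timestamp (HH:MM:SS.microseconds) followed by the call text.
TS = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6})\s+(.*)")


def find_gaps(lines, threshold_ms=100.0):
    """Return (gap_ms, previous_call, next_call) for each pause >= threshold_ms.

    The threshold is an assumed value; tune it to the daemon's normal latency.
    """
    gaps = []
    prev_time = None
    prev_text = None
    for line in lines:
        match = TS.match(line)
        if not match:
            # Wrapped continuation lines carry no timestamp; skip them.
            continue
        now = datetime.strptime(match.group(1), "%H:%M:%S.%f")
        if prev_time is not None:
            delta = now - prev_time
            if delta < timedelta(0):
                # The trace has no date, so a negative step means midnight passed.
                delta += timedelta(days=1)
            gap_ms = delta.total_seconds() * 1000.0
            if gap_ms >= threshold_ms:
                gaps.append((gap_ms, prev_text, match.group(2)))
        prev_time, prev_text = now, match.group(2)
    return gaps
```

Passing the lines of one trace file to `find_gaps` yields one tuple per pause, so every process's trace can be checked the same way.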
So I took straces of the other daemon processes and of the database
processes, and then ran:
grep '13:35:37.4' *
grep '13:35:37.3' *
grep '13:35:37.2' *
None of the commands returned any lines, so I guess the system was simply
doing nothing during those 600 ms.
Is there any way to diagnose this issue? What might it be?
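Note that the three greps only cover 13:35:37.2 through 13:35:37.4, while the pause runs from 13:35:36.979944 to 13:35:37.599793. A rough Python sketch that checks the whole window could look like the following; it assumes the same timestamp format as the traces above, and `lines_in_window` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Leading wall-clock timestamp (HH:MM:SS.microseconds) at the start of a line.
TS = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6})\s")
FMT = "%H:%M:%S.%f"


def lines_in_window(lines, start, end):
    """Return the trace lines whose timestamp lies strictly between start and end."""
    lo = datetime.strptime(start, FMT)
    hi = datetime.strptime(end, FMT)
    hits = []
    for line in lines:
        match = TS.match(line)
        if match and lo < datetime.strptime(match.group(1), FMT) < hi:
            hits.append(line.rstrip("\n"))
    return hits
```

Calling it on each trace file with `"13:35:36.979944"` and `"13:35:37.599793"` as the bounds returns an empty list when no system call started inside the pause.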
-- 
Andrii Zinchenko
mail@zinok.org