I'm going to guess that your storage controller or drives was performing some internal operation or reset during that time, which blocked IO to the kernel and likely had disk-dependent tasks in iowait.
On Tue, Apr 15, 2014 at 5:25 PM, Andrii Zinchenko mail@zinok.org wrote:
I am working on high load daemon development, which listens on UDP and processes packets. Last few months I noticed some strange issue when it takes 500-700 ms to answer packet, while usually it takes 20 ms. I've run strace on all daemon processes and found this thing:
13:35:36.979887 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2309, ...}) = 0 13:35:36.979944 write(3, "[2014-04-15 13:35:36,979] WARNING WORKER 26 - [pkt#105132/AUTH] loadPresets - memory used: 0 kb\n", 99) = 99 13:35:37.599793 sendto(10, "Q\0\0\0\rSELECT 1\0", 14, MSG_NOSIGNAL, NULL, 0) = 14 13:35:37.599865 poll([{fd=10, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=10, revents=POLLIN}])
You can see that between write and sendto passed around 600 ms. At this time server was not overloaded (LA = 0.4, 16 cores). There were free memory. There were no load on disks. So I took straces of other daemon processes plus database processes. And then did:
grep '13:35:37.4' * grep '13:35:37.3' * grep '13:35:37.2' *
None of commans show any lines. So I guess the system was just doing nothing during this 600 ms.
Is there any way to diagnose this issue? What it might be?
-- Andrii Zinchenko mail@zinok.org _______________________________________________ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos