On Tue, May 24, 2011 at 02:22:12PM -0400, R P Herrold wrote:
> On Mon, 23 May 2011, Mag Gam wrote:
>
> > I would like to confirm Matt's claim. I too experienced larger
> > latencies with Centos 5.x compared to 4.x. My application is very
> > network sensitive and its easy to prove using lat_tcp.
>
> > Russ,
> > I am curious about identifying the problem. What tools do you
> > recommend to find where the latency is coming from in the application?
>
> I went through the obvious candidates:
>     system calls
>         (loss of control of when if ever the
>         scheduler decides to let your process run again)

This is almost certainly what it is for us. But in this situation,
these calls are limited to mutex operations and condition variable
signaling.

>     polling v select
>         polling is almost always a wrong approach when
>         latency reduction is in play
>         (reading and understanding: man 2 select_tut
>         is time very well spent)

We are using select(). However, that is only for the networking part
(basically using select() to wait on data from a socket). Here, my
concern isn't network latency; it's "intra process" latency.

>     choice of implementation language -- the issue here
>         being if one uses a scripting language, one cannot
>         'see' the time leaks

C/C++ here.

> Doing metrics permits both 'hot spot' analysis, and moves the
> coding from 'guesstimation' to software engineering. We use
> graphviz, and gnuplot on the plain text 'CSV-style' timings
> files to 'see' outliers and hotspots

We're basically doing that. We pre-allocate a huge 2D array for keeping
"stopwatch" points throughout the program. Each column represents a
different stopwatch point, and each row represents a different
iteration through these measured points. After a lot of iterations
(usually at least 100k), the numbers are dumped to a file for analysis.

Basically, the standard deviation from one iteration to the next is
fairly low.
It's not like there are a few outliers driving the average
intra-process latency up; it's just that, in general, going from point
A to point B takes longer with the newer kernels.

For what it's worth, I tried a 2.6.39 mainline kernel (from elrepo),
and the intra-process latencies get worse still. It appears that
whatever changes are being made to the kernel, they're bad for our kind
of program. I'm trying to figure out, at a conceptual level, what those
changes are. I'm looking for an easier way to understand than reading
the kernel source and change history. :)
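
For anyone who wants to reproduce this kind of measurement, here is a
minimal sketch in C of the stopwatch table described above. It is an
illustration only, not the code from the application under discussion:
the names (sw_table, sw_init, sw_mark, sw_delta, sw_dump, sw_free), the
number of stopwatch points, and the choice of
clock_gettime(CLOCK_MONOTONIC) as the time source are all assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical: number of stopwatch points (columns) per iteration. */
enum { SW_POINTS = 4 };

/* Pre-allocated table: one row per iteration, one column per point. */
typedef struct {
    size_t   rows;  /* iterations the table can hold */
    int64_t *ns;    /* rows * SW_POINTS timestamps, in nanoseconds */
} sw_table;

/* Allocate the whole table up front so the hot path never allocates. */
static int sw_init(sw_table *t, size_t rows)
{
    t->ns = calloc(rows * SW_POINTS, sizeof *t->ns);
    t->rows = t->ns ? rows : 0;
    return t->ns ? 0 : -1;
}

/* Record "now" for one stopwatch point of one iteration. */
static inline void sw_mark(sw_table *t, size_t row, int point)
{
    struct timespec ts;

    assert(row < t->rows && point >= 0 && point < SW_POINTS);
    clock_gettime(CLOCK_MONOTONIC, &ts);  /* assumed time source */
    t->ns[row * SW_POINTS + point] =
        (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Elapsed nanoseconds between two points of the same iteration. */
static int64_t sw_delta(const sw_table *t, size_t row, int from, int to)
{
    return t->ns[row * SW_POINTS + to] - t->ns[row * SW_POINTS + from];
}

/* Dump the first `used` rows as plain-text CSV, one iteration per line. */
static int sw_dump(const sw_table *t, size_t used, FILE *out)
{
    for (size_t r = 0; r < used && r < t->rows; r++) {
        for (int p = 0; p < SW_POINTS; p++)
            fprintf(out, p ? ",%lld" : "%lld",
                    (long long)t->ns[r * SW_POINTS + p]);
        fputc('\n', out);
    }
    return ferror(out) ? -1 : 0;
}

static void sw_free(sw_table *t)
{
    free(t->ns);
    t->ns = NULL;
    t->rows = 0;
}
```

Typical use would be to call sw_init() once at startup, sw_mark() at
each measured point inside the loop, and sw_dump() after the run, so
that the only cost on the measured path is one clock read and one
store. The resulting CSV can then be fed to gnuplot or similar to
compare per-point deltas across kernels.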