[CentOS] scheduling differences between CentOS 4 and CentOS 5?

Mon May 30 22:07:57 UTC 2011
Matt Garman <matthew.garman at gmail.com>

On Tue, May 24, 2011 at 02:22:12PM -0400, R P Herrold wrote:
> On Mon, 23 May 2011, Mag Gam wrote:
> 
> > I would like to confirm Matt's claim. I too experienced larger
> > latencies with CentOS 5.x compared to 4.x. My application is very
> > network sensitive and it's easy to prove using lat_tcp.
> 
> > Russ,
> > I am curious about identifying the problem. What tools do you
> > recommend to find where the latency is coming from in the application?
> 
> I went through the obvious candidates:
>  	system calls
>  		(loss of control of when if ever the
>  		scheduler decides to let your process run again)

This is almost certainly what it is for us.  But in this situation,
these calls are limited to mutex operations and condition variable
signaling.
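
A minimal sketch of how one such handoff could be timed in isolation,
assuming POSIX threads and clock_gettime().  Every name here
(handoff_latency_ns, waiter, ready) is hypothetical and this is not
the application's actual code; the 1 ms pause is a crude way to let
the waiter block first.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical shared state for one signaler-to-waiter handoff. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int ready;
static struct timespec t_signal;  /* taken just before pthread_cond_signal() */
static struct timespec t_wake;    /* taken once the waiter holds the lock again */

static void *waiter(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!ready)                        /* guards against spurious wakeups */
        pthread_cond_wait(&cond, &lock);
    clock_gettime(CLOCK_MONOTONIC, &t_wake);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns nanoseconds from "about to signal" to "waiter running again",
 * or -1 if the waiter thread could not be created. */
long long handoff_latency_ns(void)
{
    pthread_t tid;
    struct timespec pause = { 0, 1000000 };   /* 1 ms */

    ready = 0;
    if (pthread_create(&tid, NULL, waiter, NULL) != 0)
        return -1;

    /* Crude: give the waiter time to block in pthread_cond_wait(). */
    nanosleep(&pause, NULL);

    pthread_mutex_lock(&lock);
    ready = 1;
    clock_gettime(CLOCK_MONOTONIC, &t_signal);
    pthread_cond_signal(&cond);
    pthread_mutex_unlock(&lock);

    pthread_join(tid, NULL);
    return (long long)(t_wake.tv_sec - t_signal.tv_sec) * 1000000000LL
         + (t_wake.tv_nsec - t_signal.tv_nsec);
}
```

Build with "gcc -pthread" (older glibc also needs -lrt for
clock_gettime()).  Calling handoff_latency_ns() in a loop on each
kernel and comparing the distributions would show whether the wakeup
path itself accounts for the difference.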

>  	polling v select
>  		polling is almost always a wrong approach when
>  		latency reduction is in play
>  		(reading and understanding: man 2 select_tut
>  		 is time very well spent)

We are using select().  However, that is only for the networking
part (basically using select() to wait on data from a socket).
Here, my concern isn't with network latency---it's with "intra
process" latency.

>  	choice of implementation language -- the issue here
>  		being if one uses a scripting language, one cannot
>  		'see' the time leaks

C/C++ here.

> Doing metrics permits both 'hot spot' analysis, and moves the 
> coding from 'guesstimation' to software engineering.  We use 
> graphviz, and gnuplot on the plain text 'CSV-style' timings 
> files to 'see' outliers and hotspots

We're basically doing that.  We pre-allocate a huge 2D array for
keeping "stopwatch" points throughout the program.  Each column
represents a different stopwatch point, and each row represents a
different iteration through these measured points.  After a lot of
iterations (usually at least 100k), the numbers are dumped to a file
for analysis.
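
The layout described above could be sketched roughly as follows.
This is an illustration only: the type and function names
(stopwatch_table, stopwatch_mark, and so on), the choice of four
points, and the use of CLOCK_MONOTONIC are all assumptions, not the
real program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

enum { N_POINTS = 4 };  /* hypothetical number of stopwatch points (columns) */

/* One row per iteration, one column per stopwatch point. */
typedef struct {
    size_t n_iters;
    struct timespec (*t)[N_POINTS];
} stopwatch_table;

/* Allocate everything up front so the measured path never allocates. */
int stopwatch_init(stopwatch_table *sw, size_t n_iters)
{
    sw->t = calloc(n_iters, sizeof *sw->t);
    sw->n_iters = sw->t ? n_iters : 0;
    return sw->t ? 0 : -1;
}

/* Record "now" for stopwatch point `point` in iteration `iter`. */
static inline void stopwatch_mark(stopwatch_table *sw, size_t iter, size_t point)
{
    assert(iter < sw->n_iters && point < N_POINTS);
    clock_gettime(CLOCK_MONOTONIC, &sw->t[iter][point]);
}

/* Nanoseconds from point `a` to point `b` within one iteration. */
long long stopwatch_delta_ns(const stopwatch_table *sw, size_t iter,
                             size_t a, size_t b)
{
    const struct timespec *ta = &sw->t[iter][a];
    const struct timespec *tb = &sw->t[iter][b];
    return (long long)(tb->tv_sec - ta->tv_sec) * 1000000000LL
         + (tb->tv_nsec - ta->tv_nsec);
}

/* Dump one CSV row per iteration: deltas between consecutive points. */
int stopwatch_dump(const stopwatch_table *sw, const char *path)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    for (size_t i = 0; i < sw->n_iters; i++) {
        for (size_t p = 1; p < N_POINTS; p++)
            fprintf(f, "%s%lld", p > 1 ? "," : "",
                    stopwatch_delta_ns(sw, i, p - 1, p));
        fputc('\n', f);
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

The hot path only calls stopwatch_mark(); formatting and file I/O
happen once at the end, so the measurement itself adds little beyond
the cost of clock_gettime().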

Basically, the standard deviation from one iteration to the next is
fairly low.  It's not like there are a few outliers driving the
average intra-process latency up; it's just that, in general, going
from point A to point B takes longer with the newer kernels.
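
One way to check "general shift" versus "a few outliers" on the
dumped numbers is to compare the median with a high percentile.
This swaps in quantiles for the standard deviation mentioned above;
quantile_ns is a hypothetical helper, and the truncating index rule
is just one simple convention.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort() comparator for long long, written to avoid overflow. */
static int cmp_ll(const void *a, const void *b)
{
    long long x = *(const long long *)a;
    long long y = *(const long long *)b;
    return (x > y) - (x < y);
}

/* Sorts `v` in place and returns the sample at quantile `q`
 * (0.0 <= q <= 1.0), using index floor(q * (n - 1)). */
long long quantile_ns(long long *v, size_t n, double q)
{
    assert(n > 0 && q >= 0.0 && q <= 1.0);
    qsort(v, n, sizeof *v, cmp_ll);
    return v[(size_t)(q * (double)(n - 1))];
}
```

If quantile_ns(v, n, 0.5) and quantile_ns(v, n, 0.99) both move by a
similar amount between kernels, the whole distribution has shifted;
if only the 0.99 value moves, a small number of slow iterations is
responsible.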

For what it's worth, I tried a 2.6.39 mainline kernel (from elrepo),
and the intra-process latencies get worse still.  It appears that
whatever changes are being made to the kernel, they are bad for our
kind of program.  I'm trying to figure out, at a conceptual level,
what those changes are.  I'm looking for an easier way to understand
them than reading the kernel source and change history.  :)