Re: [CentOS] scheduling differences between CentOS 4 and CentOS 5?

30 May 2011


On Tue, May 24, 2011 at 02:22:12PM -0400, R P Herrold wrote:
...
On Mon, 23 May 2011, Mag Gam wrote:
...
I would like to confirm Matt's claim. I too experienced larger
latencies with CentOS 5.x compared to 4.x. My application is very
network sensitive, and it's easy to prove using lat_tcp.
...
Russ,
I am curious about identifying the problem. What tools do you
recommend to find where the latency is coming from in the application?
I went through the obvious candidates:
   system calls
   	(loss of control over when, if ever, the
   	scheduler decides to let your process run again)
This is almost certainly what it is for us.  But in this situation,
these calls are limited to mutex operations and condition variable
signaling.
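
To look at the mutex/condition-variable handoff in isolation, one option is a tiny two-thread test that timestamps just before pthread_cond_signal() and again just after the waiter resumes. The following is only a minimal sketch, assuming POSIX threads and clock_gettime(CLOCK_MONOTONIC); it is not code from the application discussed here, and the names handoff, waiter, now_ns and measure_handoff_ns are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

/* Hypothetical shared state for one signal -> wake-up handoff. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             ready;      /* predicate, guarded by lock */
    long long       signal_ns;  /* taken just before pthread_cond_signal() */
    long long       wake_ns;    /* taken just after the waiter resumes */
} handoff;

static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void *waiter(void *arg)
{
    handoff *h = arg;
    pthread_mutex_lock(&h->lock);
    while (!h->ready)                       /* loop guards against spurious wake-ups */
        pthread_cond_wait(&h->cond, &h->lock);
    h->wake_ns = now_ns();
    pthread_mutex_unlock(&h->lock);
    return NULL;
}

/* Returns the signal-to-wake latency in nanoseconds, or -1 on error. */
static long long measure_handoff_ns(void)
{
    handoff h;
    pthread_t tid;
    struct timespec pause = { 0, 1000000 };  /* 1 ms, so the waiter is likely blocked */

    h.ready = 0;
    h.signal_ns = 0;
    h.wake_ns = 0;
    if (pthread_mutex_init(&h.lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&h.cond, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, waiter, &h) != 0)
        return -1;

    nanosleep(&pause, NULL);

    pthread_mutex_lock(&h.lock);
    h.ready = 1;
    h.signal_ns = now_ns();
    pthread_cond_signal(&h.cond);
    pthread_mutex_unlock(&h.lock);

    pthread_join(tid, NULL);
    pthread_cond_destroy(&h.cond);
    pthread_mutex_destroy(&h.lock);
    return h.wake_ns - h.signal_ns;
}
```

Calling measure_handoff_ns() many times on each kernel and comparing the resulting distributions would show whether the wake-up path alone accounts for the difference. The figure includes the time the waiter needs to reacquire the mutex, so it is an upper bound on pure scheduler wake-up latency.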
...
polling v select
   	polling is almost always a wrong approach when
   	latency reduction is in play
   	(reading and understanding: man 2 select_tut
   	 is time very well spent)
We are using select().  However, that is only for the networking
part (basically using select() to wait on data from a socket).
Here, my concern isn't with network latency---it's with "intra
process" latency.
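
For reference, the "wait on data from a socket" pattern mentioned above usually looks something like the following. This is a generic sketch, not the application's code; wait_readable is a hypothetical helper name, and the descriptor is assumed to be below FD_SETSIZE.

```c
#include <sys/select.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error (including EINTR). */
static int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);                      /* fd must be < FD_SETSIZE */
    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    return (rc > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}
```

The process sleeps in the kernel until data arrives or the timeout expires, so the time between data arrival and the return from select() is itself subject to scheduler wake-up latency.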
...
choice of implementation language -- the issue here
   	being if one uses a scripting language, one cannot
   	'see' the time leaks
C/C++ here.
...
Doing metrics permits both 'hot spot' analysis, and moves the 
coding from 'guesstimation' to software engineering.  We use 
graphviz, and gnuplot on the plain text 'CSV-style' timings 
files to 'see' outliers and hotspots
We're basically doing that.  We pre-allocate a huge 2D array for
keeping "stopwatch" points throughout the program.  Each column
represents a different stopwatch point, and each row represents a
different iteration through these measured points.  After a lot of
iterations (usually at least 100k), the numbers are dumped to a file
for analysis.
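
A stopwatch table of the kind described above might be sketched as follows. This is an illustration only: the post does not show its code, so sw_table, sw_init, sw_mark, sw_dump, sw_free and the column count SW_POINTS are all hypothetical, and CLOCK_MONOTONIC is an assumed choice of clock. On older glibc versions (such as those shipped with CentOS 4 and 5), clock_gettime() needs -lrt at link time.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum { SW_POINTS = 4 };   /* columns: stopwatch points per iteration (assumed) */

typedef struct {
    size_t     rows;      /* iterations the table can hold */
    long long *ns;        /* rows * SW_POINTS timestamps, in nanoseconds */
} sw_table;

/* Allocate and touch everything up front, so the measured path never
 * calls malloc() and never takes a first-touch page fault. */
static int sw_init(sw_table *t, size_t rows)
{
    t->rows = rows;
    t->ns = malloc(rows * SW_POINTS * sizeof *t->ns);
    if (t->ns == NULL)
        return -1;
    memset(t->ns, 0, rows * SW_POINTS * sizeof *t->ns);
    return 0;
}

/* Record the current time for one stopwatch point of one iteration. */
static void sw_mark(sw_table *t, size_t row, int point)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    t->ns[row * SW_POINTS + point] =
        (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* After the run: write one CSV line per iteration. */
static int sw_dump(const sw_table *t, size_t used, FILE *out)
{
    size_t r;
    int p;
    for (r = 0; r < used && r < t->rows; r++)
        for (p = 0; p < SW_POINTS; p++)
            fprintf(out, "%lld%c", t->ns[r * SW_POINTS + p],
                    p + 1 < SW_POINTS ? ',' : '\n');
    return ferror(out) ? -1 : 0;
}

static void sw_free(sw_table *t)
{
    free(t->ns);
    t->ns = NULL;
    t->rows = 0;
}
```

The measured loop would call sw_mark(&t, i, k) at stopwatch point k of iteration i, and sw_dump() only once the run is over. Keeping the file I/O out of the loop means the instrumentation adds little beyond the clock_gettime() call itself.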
Basically, the standard deviation from one iteration to the next is
fairly low.  It's not like there are a few outliers driving the
average intra-process latency up; it's just that, in general, going
from point A to point B takes longer with the newer kernels.
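
The "low standard deviation, higher mean" observation can be checked from the dumped numbers with a single pass over the A-to-B deltas. The sketch below uses Welford's algorithm, which is a choice made here for numerical stability and is not something the post says it uses; lat_stats and summarize are hypothetical names. It returns the sample variance; the standard deviation is its square root (sqrt() from <math.h>, linked with -lm).

```c
#include <stddef.h>

typedef struct {
    double mean;      /* mean A-to-B latency, in the unit of the input */
    double variance;  /* sample variance (n - 1 in the denominator) */
} lat_stats;

/* Welford's single-pass mean and variance over n latency deltas. */
static lat_stats summarize(const long long *delta_ns, size_t n)
{
    lat_stats s = { 0.0, 0.0 };
    double m2 = 0.0;   /* running sum of squared deviations from the mean */
    size_t i;

    for (i = 0; i < n; i++) {
        double x = (double)delta_ns[i];
        double d = x - s.mean;
        s.mean += d / (double)(i + 1);
        m2 += d * (x - s.mean);
    }
    if (n > 1)
        s.variance = m2 / (double)(n - 1);
    return s;
}
```

Running this over the same pair of stopwatch columns from a CentOS 4 run and a CentOS 5 run would put the two means side by side, with the variances indicating whether outliers play any role.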
For what it's worth, I tried a 2.6.39 mainline kernel (from elrepo),
and the intra-process latencies get even worse.  It appears that
whatever changes are being made to the kernel, they are bad for our
kind of program.  I'm trying to figure out, at a conceptual level,
what those changes are.  I'm looking for an easier way to understand
them than reading the kernel source and change history.  :)
