[CentOS] scheduling differences between CentOS 4 and CentOS 5?

We have several latency-sensitive "pipeline"-style programs that show
a measurable performance degradation when run on CentOS 5.x versus
CentOS 4.x.

By "pipeline" program, I mean one that has multiple threads.  The
multiple threads work on shared data.  Between each pair of threads, there is a
queue.  So thread A gets data, pushes into Qab, thread B pulls from
Qab, does some processing, then pushes into Qbc, thread C pulls from
Qbc, etc.  The initial data is from the network (generated by a 3rd
party).
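
To make the shape concrete, here is a minimal sketch of that structure.
It is an illustration only, not the actual application: Python, the
standard queue.Queue, the SENTINEL marker, and the names stage and
run_pipeline are all assumptions made for the example.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def stage(inbox, outbox, work):
    """Pull items from inbox, process each one, push the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(work(item))


def run_pipeline(items):
    """Feed items through A -> Qab -> B -> Qbc -> C and collect the output."""
    q_ab, q_bc, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=stage, args=(q_ab, q_bc, lambda x: x + 1)),   # thread B
        threading.Thread(target=stage, args=(q_bc, q_out, lambda x: x * 2)),  # thread C
    ]
    for t in threads:
        t.start()

    # The caller plays the role of thread A: it gets data and pushes into Qab.
    for item in items:
        q_ab.put(item)
    q_ab.put(SENTINEL)

    for t in threads:
        t.join()

    results = []
    while True:
        r = q_out.get()
        if r is SENTINEL:
            break
        results.append(r)
    return results
```

With one thread per stage and FIFO queues, items leave the pipeline in
the order they entered it.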

We basically measure the time from when the data is received to when
the last thread performs its task.  In our application, we see an
increase of anywhere from 20 to 50 microseconds when moving from
CentOS 4 to CentOS 5.
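
The measurement can be sketched roughly as follows: timestamp each item
at "receipt", carry the timestamp through the queue, and take the
difference when the last thread handles it.  This is a simplified
stand-in, not our real instrumentation; the function name and the
single-queue setup are assumptions, and Python's own overhead will
dwarf a 20 to 50 microsecond difference.

```python
import queue
import threading
import time


def measure_latencies_us(n_items):
    """Return per-item latency in microseconds from enqueue to final handling."""
    q = queue.Queue()
    latencies = []

    def last_stage():
        for _ in range(n_items):
            received_ns = q.get()
            # Latency = time the last thread sees the item minus receipt time.
            latencies.append((time.perf_counter_ns() - received_ns) / 1000.0)

    t = threading.Thread(target=last_stage)
    t.start()
    for _ in range(n_items):
        q.put(time.perf_counter_ns())  # timestamp taken at "receipt"
    t.join()
    return latencies
```

Comparing the distribution of these values (not just the mean) between
the two systems is what shows the shift.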

I have used a few methods of profiling our application, and determined
that the added latency on CentOS 5 comes from queue operations (in
particular, popping).

However, I can improve performance on CentOS 5 (to be the same as
CentOS 4) by using taskset to bind the program to a subset of the
available cores.
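
For reference, the binding amounts to restricting the CPU affinity
mask, e.g. something like "taskset -c 0,1 ./program" (the core list and
program name are placeholders).  The same effect can be sketched from
inside a process on Linux; the helper name below is hypothetical, and
it assumes n is at least 1.

```python
import os


def pin_to_first_cores(n):
    """Restrict the caller to the first n CPUs it is currently allowed to use."""
    allowed = sorted(os.sched_getaffinity(0))  # pid 0 means "the caller"
    subset = set(allowed[:n])
    os.sched_setaffinity(0, subset)
    return os.sched_getaffinity(0)
```

Threads started after the call inherit the restricted mask, so this
would need to run before the pipeline threads are created.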

So it appears to me that, between CentOS 4 and 5, there was some change
(presumably to the kernel) that caused threads to be scheduled
differently (and this difference is suboptimal for our application).

While I can "solve" this problem with taskset, my preference is to not
have to do this.  I'm hoping there's some kind of kernel tunable (or
maybe collection of tunables) whose default was changed between
versions.

Anyone have any experience with this?  Perhaps some more areas to investigate?

Thanks,
Matt