I'd like to know what the cause of a particular DB server's slowdown might be. We've ruled out disk IOPS (~20%) and raw CPU load (top shows perhaps half of the cores busy), but the system slows to a crawl.
We're suspecting that we're simply running out of memory bandwidth but have no way to confirm this suspicion. Is there a way to test for this? Think: iostat but for memory bandwidth instead of disk IO.
So far, searching has turned up intel-cmt-cat-master, which isn't supported on our CPU, and oprofile, which *sounds* from its website like it does what I want, but I can't get any output that tells me what the bandwidth usage is.
Any idea?
On 2/2/2016 5:34 PM, Benjamin Smith wrote:
> I'd like to know what the cause of a particular DB server's slowdown might be. We've ruled out IOPs for the disks (~ 20%) and raw CPU load (top shows perhaps 1/2 of cores busy, but the system slows to a crawl.
> We're suspecting that we're simply running out of memory bandwidth but have no way to confirm this suspicion. Is there a way to test for this? Think: iostat but for memory bandwidth instead of disk IO.
Memory bandwidth saturation would show up as CPU busy; there's no distinction.
50% of your cores 100% busy? How many cores are there, and how many waiting database tasks? Typically, with most database servers, one user connection == one core at a time. So if you have 16 cores and only 8 busy/active database connections, that will tie up those 8 cores and leave the other 8 free. The 8 processes will probably get bounced around between the cores, so it could end up looking like all 16 cores are 50% busy when averaged over some sample interval, but that's the same net effect.
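One way to tell "8 of 16 cores pegged" apart from "all 16 cores half busy" is to look at per-core numbers over a short interval instead of the average. A rough sketch, assuming a Linux /proc/stat with the usual column order (user, nice, system, idle, iowait, ...); the function name per_core_busy is made up for illustration:

```shell
# Print per-core busy percentage between two snapshots of /proc/stat.
# Usage:
#   cat /proc/stat > before; sleep 5; cat /proc/stat > after
#   per_core_busy before after
per_core_busy() {
    awk '
        # Only per-core lines ("cpu0", "cpu1", ...), not the aggregate "cpu" line.
        /^cpu[0-9]+ / {
            total = 0
            # Sum user..steal (fields 2-9); guest time is already part of user.
            for (i = 2; i <= 9 && i <= NF; i++) total += $i
            idle = $5 + $6              # idle + iowait
            if (FNR == NR) {            # first file: remember the baseline
                t0[$1] = total; i0[$1] = idle
            } else {                    # second file: print the delta
                dt = total - t0[$1]; di = idle - i0[$1]
                if (dt > 0) printf "%s %.0f%%\n", $1, 100 * (dt - di) / dt
            }
        }
    ' "$1" "$2"
}
```

If a handful of cores sit near 100% while the rest are idle, that points at a few saturated connections rather than an evenly loaded box.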
Hello,
Try installing collectd and check its RAM metrics.
Best regards,
On 03/02/2016 at 2:51 a.m., "John R Pierce" pierce@hogranch.com wrote:
> On 2/2/2016 5:34 PM, Benjamin Smith wrote:
>> I'd like to know what the cause of a particular DB server's slowdown might be. We've ruled out IOPs for the disks (~ 20%) and raw CPU load (top shows perhaps 1/2 of cores busy, but the system slows to a crawl.
>> We're suspecting that we're simply running out of memory bandwidth but have no way to confirm this suspicion. Is there a way to test for this? Think: iostat but for memory bandwidth instead of disk IO.
> memory bandwidth would show up as CPU busy, there's no distinction.
> 50% of your cores 100% busy, how many cores and how many waiting database tasks are there? typically with most database servers, one user connection == one core at a time. so if you have 16 cores, and only 8 busy/active database connections, that will tie up those 8 cores and leave the other 8 free. now the 8 processes will probably get bounced around between the cores, so it could end up looking like all 16 cores are 50% busy averaged over some sample rate, but thats the same net difference..
> -- john r pierce, recycling bits in santa cruz
CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
On 02/02/2016 05:34 PM, Benjamin Smith wrote:
> We've ruled out IOPs for the disks (~ 20%)
How did you measure that? What filesystem are you using? What is the disk / array configuration? Which database?
If you run "iostat -x 2" what does a representative summary look like?
> and raw CPU load (top shows perhaps 1/2 of cores busy, but the system slows to a crawl.
Define "busy"?
On Tue, Feb 2, 2016 at 7:34 PM, Gordon Messmer gordon.messmer@gmail.com wrote:
> On 02/02/2016 05:34 PM, Benjamin Smith wrote:
>> We've ruled out IOPs for the disks (~ 20%)
> How did you measure that? What filesystem are you using? What is the disk / array configuration? Which database?
> If you run "iostat -x 2" what does a representative summary look like?
>> and raw CPU load (top shows perhaps 1/2 of cores busy, but the system slows to a crawl.
> Define "busy"?
Yeah.
It'd be nice to see the output from top so we can see what is consuming most of the CPU, or anything consuming less than it should because it's waiting for something slower. It might also be useful to see 'perf top': if perf isn't installed, install it, reproduce the problem, let perf top run for a minute, then post the output on fpaste or pastebin so the formatting stays semi-sane.
On Tue, 2 Feb 2016 at 20:34 -0000, Benjamin Smith wrote:
> Any idea?
Wild guess... How old is the system? A ~5-year-old Nehalem? If so, try:
echo 0 > /proc/sys/vm/zone_reclaim_mode
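Before changing it, it may be worth checking what the current value is; 0 means zone reclaim is already off. A small sketch (the function name and the optional path argument are my own, the latter only so it can be pointed at a test file):

```shell
# Report the current vm.zone_reclaim_mode setting.
# Usage: zone_reclaim_status            (reads /proc/sys/vm/zone_reclaim_mode)
#        zone_reclaim_status SOME_FILE  (reads an alternate file, for testing)
zone_reclaim_status() {
    f="${1:-/proc/sys/vm/zone_reclaim_mode}"
    mode=$(cat "$f") || return 1
    if [ "$mode" -eq 0 ]; then
        echo "zone_reclaim_mode=0 (off)"
    else
        echo "zone_reclaim_mode=$mode (on)"
    fi
}
```

Note that a value written with echo does not survive a reboot. 'sysctl -w vm.zone_reclaim_mode=0' does the same thing, and a line 'vm.zone_reclaim_mode = 0' in /etc/sysctl.conf makes it persistent.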
For some memory performance diagnostics, try 'sar':
sar -B 10
There are lots of other sar options which might be useful.
Stuart
Benjamin Smith <lists@...> writes:
> So far, searching has found intel-cmt-cat-master which isn't supported on our
> CPU and oprofile which *sounds* like it does what I want from their website but
> I can't seem to get output that, in any way, tells me what the bandwidth usage
> is.
> Any idea?
Perhaps the Intel Performance Counter Monitor tool can help here: https://software.intel.com/en-us/articles/intel-performance-counter-monitor
A quick check of the CPU model on ark.intel.com will show its maximum memory bandwidth.
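For a rough cross-check of the ark figure: peak theoretical bandwidth per socket is about channels x transfer rate x 8 bytes, since each DDR channel is 64 bits wide. A sketch with made-up example numbers (4 channels of DDR3-1600; substitute whatever your CPU and DIMMs actually are), where peak_bw_gbs is a hypothetical helper name:

```shell
# peak_bw_gbs CHANNELS MEGATRANSFERS_PER_SEC
# Prints theoretical peak memory bandwidth in GB/s for one socket.
peak_bw_gbs() {
    # 8 bytes per transfer on a 64-bit channel; MT/s * bytes / 1000 = GB/s
    awk -v ch="$1" -v mts="$2" 'BEGIN { printf "%.1f\n", ch * mts * 8 / 1000 }'
}

peak_bw_gbs 4 1600   # prints 51.2
```

Sustained bandwidth in practice is lower than this theoretical peak, so measured numbers approaching it would already suggest the memory bus is the bottleneck.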