[CentOS] Finding i/o bottleneck

Fri Sep 23 13:39:24 UTC 2011
Ross Walker <rswwalker at gmail.com>

On Sep 21, 2011, at 9:33 AM, "Nicolas Ross" <rossnick-lists at cybercat.ca> wrote:

>> Hi Nicolas,
>> 
>> While this doesn't exactly answer your question, I was wondering what
>> scheduler you were using on your GFS2 (Note: I have not used this file
>> system before) block. You can find this by issuing 'cat /sys/block/<insert
>> block device>/queue/scheduler' ?
>> 
>> By default the system uses cfq, which will show up as [cfq] when catting
>> the scheduler as I showed above. This is not the most optimal scheduler
>> for a webserver. In most cases you'd be better off with deadline or noop.
>> Not being familiar with GFS2 myself, I did skim this article, which makes
>> me think noop would be the better choice:
>> 
>> http://www.redhat.com/archives/linux-cluster/2010-June/msg00027.html
>> 
>> This could be why you are seeing the processes waiting on I/O.
>> 
> 
> In my case, /sys/block/dm-9/queue/scheduler shows "none" and 
> /sys/block/sdb/queue/scheduler shows "noop anticipatory deadline [cfq]".
> 
> Since this is a production cluster, I do not want to make changes to it just 
> now. I will ask advice from RHEL support for setting this.
> 
> But that seems logical.
> 
> In the meantime, I'd still like to find a tool to know what files are 
> requested from the filesystem and which ones are being waited on...

You could try iotop. I am told it's good at showing which processes are generating heavy I/O or are stuck in I/O wait.
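
As far as I know iotop reports per process rather than per file, so to get closer to "which files" you can combine process state with the open file descriptors under /proc. Below is a rough sketch in Python, not a polished tool: it looks for processes in uninterruptible sleep (state D in /proc/<pid>/stat, which usually means waiting on I/O) and lists what they have open. The function names are mine, you need root to read other users' fd directories, and an open file is only a hint; it does not prove the process is blocked on that particular file.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces
    or ')', so split on the first ' (' and the last ') '.
    """
    pid_part, _, rest = stat_text.partition(" (")
    comm, _, after = rest.rpartition(") ")
    state = after.split()[0]
    return int(pid_part), comm, state


def open_files(pid, proc_root="/proc"):
    """Resolve the fd symlinks of one process; skip anything unreadable."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    paths = []
    try:
        for fd in sorted(os.listdir(fd_dir)):
            try:
                paths.append(os.readlink(os.path.join(fd_dir, fd)))
            except OSError:
                continue  # fd closed in the meantime, or no permission
    except OSError:
        pass  # process exited, or no permission on its fd directory
    return paths


def blocked_processes(proc_root="/proc"):
    """List (pid, comm, open files) for every process currently in state D."""
    result = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away while we were looking
        if state == "D":
            result.append((pid, comm, open_files(pid, proc_root)))
    return result
```

Calling blocked_processes() a few times in a row while the load is high, and seeing which paths keep coming back, should give a first idea of where the waiting is.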

Switching to 'deadline' for a cluster file system (or any file server) is always a good idea. CFQ is designed to give equal weight to the processes running on a system; kernel processes, remote processes and disk arrays were not factored into the equation.
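
If you want to check several nodes or devices before touching anything, the scheduler file is easy to read from a script. This is a minimal read-only sketch in Python; the function names are my own, and it assumes the sysfs layout mentioned earlier in the thread (/sys/block/<device>/queue/scheduler, with the active scheduler in square brackets).

```python
from pathlib import Path


def parse_scheduler(text):
    """Return (active, available) from the contents of a queue/scheduler file.

    The active scheduler is the name in square brackets. A file that holds
    only "none" (as on the dm device above) has no bracketed entry, so
    active comes back as None.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available


def read_scheduler(device, sysfs_root="/sys/block"):
    """Read and parse the scheduler file for one block device, e.g. 'sdb'."""
    path = Path(sysfs_root) / device / "queue" / "scheduler"
    return parse_scheduler(path.read_text())
```

With the output quoted above, read_scheduler("sdb") would give cfq as the active scheduler and all four names as available. Changing it at runtime is done by writing the new name into that same file as root, and that does not survive a reboot.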

-Ross