[CentOS] Finding i/o bottleneck

Mon Sep 26 21:00:52 UTC 2011
Dennis Jacobfeuerborn <dennisml at conversis.de>

On 09/26/2011 09:02 PM, Nicolas Ross wrote:
>>>> Not sure what those do, but lsof should show what files are open, and
>>>> 'strace -p process_id' would show the system calls issued by a
>>>> process.
>>>
>>> Thanks, that might be useful. I'll just have to find a way to strace
>>> multiple processes at once and find the useful info among that load of
>>> data...
>>
>> Note that if what is really happening is that different processes are
>> frequently accessing the same disk in different locations (a fairly
>> likely scenario) the time will be mostly taken by the head seeks in
>> between and may be hard to pin down.
>
> I found that the -f option to strace follows the forked children of a
> parent process, so I will be using that in my debugging in conjunction
> with -e to filter out only the calls I want to see...
>
> But indeed, that might be hard to find. In one case, I want to see what
> files are opened / accessed on a gfs2 volume over a fiber channel link to a
> raid-1 array, and the controller is supposed to be intelligent enough to
> distribute the read access across the 2 disks. And in the other case, it's
> an ssd, so seek time should be 0.
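
For the strace route, something along these lines might serve as a starting
point. It is only a sketch: pid_args and trace_io are helper names I made up,
the log path under /tmp is arbitrary, and the syscall list is just an example
(newer systems often open files through openat, so you may need to add that).
You need to be root or own the target processes.

```shell
# Turn a list of PIDs into repeated "-p PID" options for strace.
pid_args() {
    for pid in "$@"; do
        printf '%s %s ' -p "$pid"
    done
}

# Attach to every process with the given name (hypothetical helper).
# -f follows forked children, -e trace=... keeps only the listed syscalls,
# -tt and -T add timestamps and per-call durations, -o writes to a log file.
trace_io() {
    pids=$(pgrep -x "$1") || { echo "no process named $1" >&2; return 1; }
    # $pids and the pid_args output are left unquoted on purpose so that
    # they split into separate words.
    strace -f -tt -T -e trace=open,close,read,write \
        -o "/tmp/strace-$1.log" $(pid_args $pids)
}

# Example (process name is a placeholder):
#   trace_io httpd
```

With -T the time spent in each call is appended to every line of the log, so
slow reads and writes can be picked out afterwards with grep or sort.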

You could try systemtap:
http://sourceware.org/systemtap/examples/

In your case this script could be useful:
http://sourceware.org/systemtap/examples/io/iotime.stp

"The script watches each open, close, read, and write syscall on the
system. For each file the script observes being opened it accumulates the
amount of wall clock time spent in read and write operations and the number
of bytes read and written. When a file is closed the script prints out a pair
of lines for the file. Both lines begin with a timestamp in microseconds,
the PID number, and the executable name in parentheses. The first line with
the "access" keyword lists the file name and the attempted number of bytes
for the read and write operations. The second line with the "iotime" keyword
lists the file name and the number of microseconds accumulated in the read
and write syscalls."
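
Since the script prints a pair of lines every time a file is closed, the
output gets long quickly. A rough way to total the "iotime" lines per file
could look like the sketch below. It assumes the line layout described above,
i.e. "timestamp pid (exe) iotime filename time: N", and that neither the file
name nor the executable name contains spaces. summarize_iotime is a name I
made up, and so is the log path in the usage comment.

```shell
# Sum the microseconds reported on "iotime" lines per file name and
# print the totals, largest first.
summarize_iotime() {
    awk '$4 == "iotime" { total[$5] += $7 }
         END { for (f in total) printf "%d %s\n", total[f], f }' | sort -rn
}

# Example: collect for 60 seconds as root, then summarize.
#   stap iotime.stp -c 'sleep 60' > /tmp/iotime.log
#   summarize_iotime < /tmp/iotime.log
```

Writing to a log first, rather than piping stap straight into awk, means that
interrupting stap does not also kill the summary step before it prints.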

Regards,
   Dennis