[CentOS] Disk Elevator

Sat Jan 6 22:36:38 UTC 2007
Ross S. W. Walker <rwalker at medallion.com>

> -----Original Message-----
> From: centos-bounces at centos.org 
> [mailto:centos-bounces at centos.org] On Behalf Of Jim Perrin
> Sent: Friday, January 05, 2007 7:24 PM
> To: CentOS mailing list
> Subject: Re: [CentOS] Disk Elevator
> 
> On 1/5/07, Matt <lm7812 at gmail.com> wrote:
> > Can anyone explain how the disk elevator works and if there is any way
> > to tweak it?  I have an email server which likely has a large number
> > of read and write requests and was wondering if there was any way to
> > improve performance.
> 
> Reasonably decent writeup. Gives a good overview, but I'm not sure how
> much detail you'd like.
> http://www.redhat.com/magazine/008jun05/features/schedulers/

The disk elevators, or I/O schedulers, are there to minimize head seeks
by re-ordering requests and merging those that read or write data in
neighboring areas of the disk.
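
To make the re-ordering and merging concrete, here is a toy sketch in shell. It is not how the kernel implements any scheduler; the request list and the merge_requests name are made up for illustration. Each request is a "start_sector length" pair, and the function sorts the queue by sector and folds adjacent requests together.

```shell
#!/bin/sh
# Toy model of an elevator pass over a request queue.
# Input: one "start_sector length" pair per line (illustrative values only).
# Output: the same requests sorted by sector, with adjacent ones merged.
merge_requests() {
    sort -n -k1,1 | awk '
        NR == 1   { start = $1; end = $1 + $2; next }
        $1 <= end { if ($1 + $2 > end) end = $1 + $2; next }  # adjacent or overlapping, so merge
                  { print start, end - start; start = $1; end = $1 + $2 }
        END       { if (NR > 0) print start, end - start }
    '
}

# Five requests arrive out of order; after the pass only three remain.
printf '%s\n' '800 8' '100 8' '108 8' '808 16' '5000 8' | merge_requests
```

Fewer, larger requests in ascending sector order mean the head sweeps across the disk once instead of bouncing back and forth.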

There are some tweaks to improve performance, but the performance gains
are minimal on a raid array (the elevators do not know the stripe size,
as they were implemented with single-spindle drives in mind).
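
If you want to see which elevator a device is using before tweaking anything, a sketch along these lines may help. It assumes your kernel exposes /sys/block/<dev>/queue/scheduler, where the active scheduler is shown in square brackets; on kernels that do not, the elevator= boot parameter described in the Red Hat article is the way to choose one. The active_scheduler helper is just a name I made up.

```shell
#!/bin/sh
# Pull the active scheduler (the bracketed name) out of a line such as
# "noop anticipatory deadline [cfq]". Prints nothing if no name is bracketed.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the scheduler for every block device that exposes the sysfs file.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue          # skip if the glob matched nothing
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done

# To switch one device at runtime (as root; sda is only an example name):
#   echo deadline > /sys/block/sda/queue/scheduler
```

Test any change under your real mail load; as noted, the difference on a raid array is likely to be small.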

The biggest performance gain you can achieve on a raid array is to make
sure you format the volume aligned to your raid stripe size. For example,
if you have a 4 drive raid 5 using 64K chunks, each full stripe is 256K
across the four drives (192K of data plus 64K of parity). mke2fs takes
the stride as the chunk size in filesystem blocks, so given a 4K
filesystem block size you would have a stride of 16 (64/4). So when you
format your volume:

mke2fs -E stride=16 <other needed options> /dev/XXXX

Other needed options: -j for ext3, -N <# of inodes> for an extended # of
i-nodes, -O dir_index to speed up directory searches with a large # of
files.
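
A rough sketch of the arithmetic, in case you want to plug in your own array. It assumes the definitions in the current mke2fs man page, where stride is the chunk size in filesystem blocks and the stripe width is the stride times the number of data-bearing disks. Older man pages worded this differently, so check the one for your version. The variable names are mine, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Example values from this thread; adjust for your own array.
chunk_kb=64                          # raid chunk size per disk, in KB
block_kb=4                           # filesystem block size, in KB
disks=4                              # total drives in the raid 5 set
data_disks=$((disks - 1))            # raid 5 uses one chunk per stripe for parity

stride=$((chunk_kb / block_kb))          # filesystem blocks per chunk
stripe_width=$((stride * data_disks))    # filesystem blocks of data per full stripe

echo "stride=$stride stripe_width=$stripe_width"

# Print the command instead of running it; /dev/XXXX is a placeholder.
echo "mke2fs -j -O dir_index -E stride=$stride /dev/XXXX"
```

Newer versions of mke2fs also take a stripe width extended option next to stride; whether yours does is something to confirm in its man page.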

By aligning the file system to the array stripe size you can minimize
short write penalties on your array, which will speed up writes. The
-O dir_index option speeds up directory lookups a fraction, but by
minimizing the write penalties reads will gain performance anyway.

A short write penalty occurs when the data written to an array is
shorter than a full stripe: the remaining blocks must then be read from
the stripe in order to compute its new parity. If the OS knows the
stripe size, each stripe can be cached beforehand by read-ahead, so when
a write comes it should have all the data it needs to write the full
stripe to disk. It can also give hints to the page cache for combining
separate I/O that falls in the same stripe.
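
To illustrate which writes pay the penalty, here is a small sketch that classifies a write by its offset and length. The classify_write name and the sample numbers are hypothetical. The stripe size passed in should be the amount of data per stripe; for a 4 drive raid 5 with 64K chunks that is 192K, since the fourth chunk in each stripe holds parity.

```shell
#!/bin/sh
# Classify a write against the stripe boundaries of an array.
# Arguments: offset_kb length_kb stripe_kb (data per stripe, all in KB).
# Prints "full-stripe" if the write starts on a stripe boundary and covers
# whole stripes only, otherwise "short" (old data must be read for parity).
classify_write() {
    offset=$1; length=$2; stripe=$3
    if [ "$length" -gt 0 ] &&
       [ $((offset % stripe)) -eq 0 ] &&
       [ $((length % stripe)) -eq 0 ]; then
        echo "full-stripe"
    else
        echo "short"
    fi
}

classify_write 0 192 192     # aligned, exactly one stripe
classify_write 0 64 192      # aligned, but only one chunk of the stripe
classify_write 64 192 192    # right length, wrong starting offset
```

The last case is the one that alignment fixes: the write is the right size but straddles two stripes, so both need a read before the parity can be updated.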

-Ross


