[CentOS] Disk Elevator

Mon Jan 8 18:44:21 UTC 2007
Ross S. W. Walker <rwalker at medallion.com>

> -----Original Message-----
> From: centos-bounces at centos.org 
> [mailto:centos-bounces at centos.org] On Behalf Of Ross S. W. Walker
> Sent: Monday, January 08, 2007 1:15 PM
> To: CentOS mailing list
> Subject: RE: [CentOS] Disk Elevator
> 
> > -----Original Message-----
> > From: centos-bounces at centos.org 
> > [mailto:centos-bounces at centos.org] On Behalf Of Aleksandar Milivojevic
> > Sent: Monday, January 08, 2007 1:00 PM
> > To: centos at centos.org
> > Subject: RE: [CentOS] Disk Elevator
> > 
> > Quoting "Ross S. W. Walker" <rwalker at medallion.com>:
> > 
> > > The biggest performance gain you can achieve on a raid array is to make
> > > sure you format the volume aligned to your raid stripe size. For example,
> > > if you have a 4 drive raid 5 and it is using 64K chunks, your stripe
> > > size will be 256K. Given a 4K filesystem block size you would then have
> > > a stride of 64 (256/4), so when you format your volume:
> > >
> > > mke2fs -E stride=64 (other needed options: -j for ext3, -N <# of inodes>
> > > for an extended # of inodes, -O dir_index to speed up directory searches
> > > for a large # of files) /dev/XXXX
> > 
> > Shouldn't the argument for the stride option be how many file system
> > blocks there are per stripe? After all, there's no way for the OS to
> > guess what RAID level you are using. For a 4 disk RAID5 with 64k chunks
> > and 4k file system blocks you have only 48 file system blocks per stripe
> > ((4-1)x64k/4k=48). So it should be -E stride=48 in this particular
> > case. If it was a 4 disk RAID0 array, then it would be 64
> > (4x64k/4k=64). If it was a 4 disk RAID10 array, then it would be 32
> > ((4/2)x64k/4k=32). Or at least that's the way I understood it by
> > reading the man page.
> 
> You are correct: leave one of the chunks off for the parity, so for a 4
> disk raid5 stride=48. I had just counted all 4 chunks as part of the
> stride.
> 
> BTW, that parity chunk still needs to be in memory to avoid the read on
> it, no? In that case wouldn't a stride of 64 help? And if the stride
> leaves out the parity chunk, won't successive read-aheads continuously
> wrap around the stripe, negating the effect of the stride because the
> complete stripe is never cached?
> 

Let me follow up on my last post by saying that Aleksandar is absolutely
correct. The stride is the number of file system blocks per stripe and has
nothing to do with read-ahead, so it should be calculated from the number
of chunks in a stripe minus the parity chunk.
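The stride arithmetic from this thread (file system blocks per stripe, counting only data chunks) can be sketched in Python. This is a minimal sketch that assumes the data-disk counts used above: RAID0 uses every disk for data, RAID5 loses one chunk per stripe to parity, and RAID10 loses half its chunks to mirror copies. The function name and the RAID-level strings are hypothetical and not part of mke2fs.

```python
def blocks_per_data_stripe(disks: int, chunk_kib: int, block_kib: int, level: str) -> int:
    """File system blocks per stripe, counting only data (not parity/mirror) chunks."""
    # How many chunks in one stripe hold distinct data (assumed per RAID level).
    data_disks = {
        "raid0": disks,        # pure striping: every chunk is data
        "raid5": disks - 1,    # one chunk per stripe holds parity
        "raid10": disks // 2,  # half of the chunks are mirror copies
    }[level]
    return data_disks * chunk_kib // block_kib


if __name__ == "__main__":
    # 4 disks, 64K chunks, 4K file system blocks, as in the thread.
    for level in ("raid5", "raid0", "raid10"):
        print(level, blocks_per_data_stripe(4, 64, 4, level))
```

This prints 48, 64 and 32, matching Aleksandar's figures. Check the mke2fs man page for the e2fsprogs version you are running: later releases document stride as the chunk size in file system blocks (64K/4K = 16 here) and add a separate stripe_width option for stride times the number of data disks (48 for the RAID5 case).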

For read-ahead, set it with blockdev --setra X /dev/YY, where X is a
multiple of the number of 512-byte sectors in a stripe. A 256K stripe is
512 sectors, so set the read-ahead to 512, 1024 or 2048, depending on
whether the I/O is mostly random or mostly sequential (larger for
sequential, smaller for random).
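The read-ahead arithmetic can be sketched the same way. blockdev --setra takes its value in 512-byte sectors, so a 256K stripe is 256 x 1024 / 512 = 512 sectors. This is a minimal sketch: the helper names, the default multiples (1, 2, 4) and the device name /dev/sdX are hypothetical placeholders.

```python
SECTOR_BYTES = 512  # blockdev --setra / --getra count 512-byte sectors


def stripe_sectors(stripe_kib: int) -> int:
    """Number of 512-byte sectors in one stripe of stripe_kib kilobytes."""
    return stripe_kib * 1024 // SECTOR_BYTES


def readahead_candidates(stripe_kib: int, multiples=(1, 2, 4)) -> list:
    """Read-ahead values (in sectors) that are whole multiples of the stripe."""
    return [stripe_sectors(stripe_kib) * m for m in multiples]


if __name__ == "__main__":
    device = "/dev/sdX"  # hypothetical device name
    for ra in readahead_candidates(256):
        print(f"blockdev --setra {ra} {device}")
```

The current setting can be read back with blockdev --getra before and after tuning.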

-Ross


