[CentOS] Disk Elevator

Tue Jan 16 14:37:20 UTC 2007
Jim Perrin <jperrin at gmail.com>

> > > Quoting "Ross S. W. Walker" <rwalker at medallion.com>:
> > >
> > > > The biggest performance gain you can achieve on a raid
> > > array is to make
> > > > sure you format the volume aligned to your raid stripe
> > > size. For example
> > > > if you have a 4 drive raid 5 and it is using 64K chunks,
> > your stripe
> > > > size will be 256K. Given a 4K filesystem block size you
> > > would then have
> > > > a stride of 64 (256/4), so when you format your volume:
> > > >
> > > > mke2fs -E stride=64 (other needed options -j for ext3, -N
> > > <# of inodes>
> > > > for extended # of i-nodes, -O dir_index speeds up directory
> > > searches for
> > > > large # of files) /dev/XXXX
> > >
> > > Shouldn't the argument for the stride option be how many file
> > > system blocks there are per stripe?  After all, there's no way
> > > for the OS to guess
> > > what RAID level you are using.  For 4 disk RAID5 with 64k
> > chunks and
> > > 4k file system blocks you have only 48 file system blocks
> > per stripe
> > > ((4-1)x64k/4k=48).  So it should be -E stride=48 in this
> > particular
> > > case.  If it was 4 disk RAID0 array, than it would be 64
> > > (4x64k/4k=64).  If it was 4 disk RAID10 array, than it would be 32
> > > ((4/2)*64k/4k=32).  Or at least that's the way I understood it by
> > > reading the man page.
> >
> > You are correct, leave one of the chunks off for the parity, so for 4
> > disk raid5 stride=48. I had just computed all 4 chunks as part of the
> > stride.
> >
> > BTW that parity chunk still needs to be in memory to avoid the read on
> > it, no? Wouldn't a stride of 64 help in that case? And if the stride
> > leaves out the parity chunk, won't successive read-aheads cause a
> > continuous wrap of the stripe, negating the effect of the stride by
> > not having the complete stripe cached?
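The stride arithmetic settled above can be sketched as follows (using the thread's assumed geometry: 4-disk RAID5, 64K chunks, 4K filesystem blocks; /dev/XXXX stays a placeholder as in the original post):

```shell
# Assumed geometry from the thread; adjust for your array.
CHUNK_KB=64    # RAID chunk size
NDISKS=4       # total disks in the RAID5 set
BLOCK_KB=4     # ext3 filesystem block size

# One chunk per stripe holds parity, so only NDISKS-1 chunks carry data.
STRIDE=$(( (NDISKS - 1) * CHUNK_KB / BLOCK_KB ))
echo "stride=$STRIDE"    # stride=48 for this geometry

# The format command would then be (not run here; /dev/XXXX is a placeholder):
# mke2fs -j -O dir_index -E stride=$STRIDE /dev/XXXX
```

For RAID0 all chunks carry data (stride=64 with the same numbers); for RAID10 half the disks do (stride=32), matching the calculations earlier in the thread.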

> For read-ahead, you would set this through blockdev --setra X /dev/YY,
> and use a multiple of the # of sectors in a stripe, so for a 256K
> stripe, set the read-ahead to 512, 1024, or 2048, depending on whether
> the IO is mostly random or mostly sequential (bigger for sequential,
> smaller for random).


To follow up on this (even if it is a little late), how is this
affected by LVM use?
I'm curious to know how (or if) this math changes with ext3 sitting on
LVM on the raid array.
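One thing worth checking on that front (a sketch, not an answer from the thread; /dev/md0 is a placeholder): LVM stores its metadata at the start of the physical volume, so the first physical extent may not begin on a stripe boundary. If pe_start is a multiple of the full stripe, the stride math above should carry through to the filesystem on the LV unchanged.

```shell
# Show where LVM places the first physical extent on the PV.
pvs -o pv_name,pe_start --units k /dev/md0

# Older LVM2 defaults pe_start to 192K, which is not a multiple of a
# 256K stripe; padding the metadata area at pvcreate time can realign it
# (pe_start is rounded up to the next 64K boundary past the metadata):
# pvcreate --metadatasize 250k /dev/md0
```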

-- 
During times of universal deceit, telling the truth becomes a revolutionary act.
George Orwell