-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Jim Perrin
Sent: Tuesday, January 16, 2007 9:37 AM
To: CentOS mailing list
Subject: Re: [CentOS] Disk Elevator
Quoting "Ross S. W. Walker" rwalker@medallion.com:
The biggest performance gain you can achieve on a raid array is to make sure you format the volume aligned to your raid stripe size. For example, if you have a 4 drive raid 5 and it is using 64K chunks, your stripe size will be 256K. Given a 4K filesystem block size you would then have a stride of 64 (256/4), so when you format your volume:

    mke2fs -E stride=64 /dev/XXXX

(Other needed options: -j for ext3, -N <# of inodes> for an extended number of i-nodes, and -O dir_index to speed up directory searches with a large number of files.)
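Put together, the full invocation would look something like the sketch below (the device name and inode count are placeholders carried over from above, and the stride value itself is questioned for raid5 just below):

    # assembles the options mentioned above; substitute real values for the placeholders
    mke2fs -j -O dir_index -N <# of inodes> -E stride=64 /dev/XXXX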
Shouldn't the argument for the stride option be how many file system blocks there are per stripe? After all, there's no way for the OS to guess what RAID level you are using. For a 4 disk RAID5 with 64k chunks and 4k file system blocks you have only 48 file system blocks per stripe ((4-1) x 64k / 4k = 48). So it should be -E stride=48 in this particular case. If it was a 4 disk RAID0 array, then it would be 64 (4 x 64k / 4k = 64). If it was a 4 disk RAID10 array, then it would be 32 ((4/2) x 64k / 4k = 32). Or at least that's the way I understood it by reading the man page.
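To make that arithmetic explicit, a quick shell sketch using the example numbers from this thread (4 disks, 64k chunks, 4k blocks):

    chunk_kb=64; block_kb=4; disks=4
    echo "raid5:  $(( (disks - 1) * chunk_kb / block_kb ))"    # prints 48
    echo "raid0:  $((  disks      * chunk_kb / block_kb ))"    # prints 64
    echo "raid10: $(( (disks / 2) * chunk_kb / block_kb ))"    # prints 32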
You are correct, leave one of the chunks off for the parity, so for a 4 disk raid5 stride=48. I had just computed all 4 chunks as part of the stride.

BTW, that parity chunk still needs to be in memory to avoid the read on it, no? Wouldn't a stride of 64 help in that case? And if the stride leaves out the parity chunk, won't successive read-aheads continuously wrap around the stripe, negating the effect of the stride by never having the complete stripe cached?
For read-ahead, you would set this through blockdev --setra X /dev/YY, and use a multiple of the number of sectors in a stripe. So for a 256K stripe, set the read-ahead to 512, 1024 or 2048 sectors, depending on whether the io is mostly random or mostly sequential (bigger for sequential, smaller for random).
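For example (the device name is a placeholder, and --setra counts 512-byte sectors, so 512 sectors is one full 256K stripe):

    blockdev --setra 1024 /dev/XXXX    # two full stripes of read-ahead, leaning toward sequential io
    blockdev --getra /dev/XXXX         # confirm the value now in effect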
To follow up on this (even if it is a little late), how is this affected by LVM use? I'm curious to know how (or if) this math changes with ext3 sitting on LVM on the raid array.
Depends is the best answer. It really depends on LVM and the other block layer devices. As the io requests descend through the different layers they will enter multiple request_queues, and each request_queue has an io scheduler assigned to it (either the system default, one of the others, or one of the block device's own), so it is hard to say. Only by testing can you know for sure. In my tests LVM is very good, with unnoticeable overhead going to hardware RAID, but if you use MD RAID then your experience might be different.
    Ext3
      |
     VFS
      |
    Page Cache
      |
    LVM request_queue (io scheduler)
      |
     LVM
      |
    MD request_queue (io scheduler)
      |
     MD
      |
     -----------------
     |   |   |   |   |
    Que Que Que Que Que  (io scheduler)
     |   |   |   |   |
    sda sdb sdc sdd sde
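If you want to see which io scheduler a given request_queue in that stack is using, a sketch assuming a 2.6 kernel with sysfs (sda is just an example device, substitute your own):

    cat /sys/block/sda/queue/scheduler                 # the active scheduler is shown in brackets
    echo deadline > /sys/block/sda/queue/scheduler     # switch only this queue to deadline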
Hope this helps clarify.