-----Original Message-----
From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Jim Perrin
Sent: Tuesday, January 16, 2007 9:37 AM
To: CentOS mailing list
Subject: Re: [CentOS] Disk Elevator
Quoting "Ross S. W. Walker" rwalker@medallion.com:
The biggest performance gain you can achieve on a raid array is to make sure you format the volume aligned to your raid stripe size. For example, if you have a 4 drive raid 5 and it is using 64K chunks, your stripe size will be 256K. Given a 4K filesystem block size you would then have a stride of 64 (256/4), so when you format your volume:

    mke2fs -E stride=64 /dev/XXXX

(Other needed options: -j for ext3, -N <# of inodes> for an extended number of i-nodes, and -O dir_index to speed up directory searches with a large number of files.)
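Put together, the full invocation would look something like the sketch below (the device name and inode count are placeholders carried over from above, and the stride value itself is questioned for raid5 just below):

    # assembles the options mentioned above; substitute real values for the placeholders
    mke2fs -j -O dir_index -N <# of inodes> -E stride=64 /dev/XXXX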
Shouldn't the argument for the stride option be how many file system blocks there are per stripe? After all, there's no way for the OS to guess what RAID level you are using. For a 4 disk RAID5 with 64k chunks and 4k file system blocks you have only 48 file system blocks per stripe ((4-1) x 64k / 4k = 48). So it should be -E stride=48 in this particular case. If it was a 4 disk RAID0 array, then it would be 64 (4 x 64k / 4k = 64). If it was a 4 disk RAID10 array, then it would be 32 ((4/2) x 64k / 4k = 32). Or at least that's the way I understood it by reading the man page.
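To make that arithmetic explicit, a quick shell sketch using the example numbers from this thread (4 disks, 64k chunks, 4k blocks):

    chunk_kb=64; block_kb=4; disks=4
    echo "raid5:  $(( (disks - 1) * chunk_kb / block_kb ))"    # prints 48
    echo "raid0:  $((  disks      * chunk_kb / block_kb ))"    # prints 64
    echo "raid10: $(( (disks / 2) * chunk_kb / block_kb ))"    # prints 32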
You are correct, leave one of the chunks off for the parity, so for a 4 disk raid5 stride=48. I had just computed all 4 chunks as part of the stride.

BTW, that parity chunk still needs to be in memory to avoid the read on it, no? Wouldn't a stride of 64 help in that case? And if the stride leaves out the parity chunk, won't successive read-aheads continuously wrap around the stripe, negating the effect of the stride by never having the complete stripe cached?
For read-ahead, you would set this through blockdev --setra X /dev/YY, and use a multiple of the number of sectors in a stripe. So for a 256K stripe, set the read-ahead to 512, 1024 or 2048 sectors, depending on whether the io is mostly random or mostly sequential (bigger for sequential, smaller for random).
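For example (the device name is a placeholder, and --setra counts 512-byte sectors, so 512 sectors is one full 256K stripe):

    blockdev --setra 1024 /dev/XXXX    # two full stripes of read-ahead, leaning toward sequential io
    blockdev --getra /dev/XXXX         # confirm the value now in effect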
To follow up on this (even if it is a little late), how is this affected by LVM use? I'm curious to know how (or if) this math changes with ext3 sitting on LVM on the raid array.
Depends is the best answer. It really depends on LVM and the other block layer devices. As the io requests descend through the different layers they will enter multiple request_queues, and each request_queue has an io scheduler assigned to it (either the system default, one of the others, or one of the block device's own), so it is hard to say. Only by testing can you know for sure. In my tests LVM is very good, with unnoticeable overhead going to hardware RAID, but if you use MD RAID then your experience might be different.
    Ext3
      |
     VFS
      |
    Page Cache
      |
    LVM request_queue (io scheduler)
      |
     LVM
      |
    MD request_queue (io scheduler)
      |
     MD
      |
     -----------------
     |   |   |   |   |
    Que Que Que Que Que  (io scheduler)
     |   |   |   |   |
    sda sdb sdc sdd sde
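If you want to see which io scheduler a given request_queue in that stack is using, a sketch assuming a 2.6 kernel with sysfs (sda is just an example device, substitute your own):

    cat /sys/block/sda/queue/scheduler                 # the active scheduler is shown in brackets
    echo deadline > /sys/block/sda/queue/scheduler     # switch only this queue to deadline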
Hope this helps clarify.