Quoting "Ross S. W. Walker" rwalker@medallion.com:
The biggest performance gain you can achieve on a raid array is to make sure you format the volume aligned to your raid stripe size. For example, if you have a 4 drive raid 5 and it is using 64K chunks, your stripe size will be 256K. Given a 4K filesystem block size you would then have a stride of 64 (256/4), so when you format your volume:
mke2fs -E stride=64 (other needed options: -j for ext3, -N <# of inodes> for extended # of i-nodes, -O dir_index speeds up directory searches for large # of files) /dev/XXXX
Shouldn't the argument for the stride option be how many file system blocks there are per stripe? After all, there's no way for the OS to guess what RAID level you are using. For a 4 disk RAID5 with 64k chunks and 4k file system blocks you have only 48 file system blocks per stripe ((4-1)x64k/4k=48). So it should be -E stride=48 in this particular case. If it was a 4 disk RAID0 array, then it would be 64 (4x64k/4k=64). If it was a 4 disk RAID10 array, then it would be 32 ((4/2)x64k/4k=32). Or at least that's the way I understood it by reading the man page.
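The arithmetic above can be checked with a short sketch. The function name and the RAID level strings are made up for illustration; the only inputs are the disk count, chunk size, and filesystem block size quoted in this thread. One caveat worth verifying against your own man page: newer mke2fs versions describe stride as the number of blocks per chunk (16 here) and take the per-stripe data block count through a separate stripe_width extended option, so the value computed below may belong under a different option name depending on your e2fsprogs version.

```python
def data_blocks_per_stripe(level, disks, chunk_kib, block_kib):
    """Filesystem blocks of data in one full stripe (parity and mirror copies excluded)."""
    if level == "raid0":
        data_disks = disks            # every disk holds data
    elif level == "raid5":
        data_disks = disks - 1        # one chunk per stripe is parity
    elif level == "raid10":
        data_disks = disks // 2       # half the disks are mirror copies
    else:
        raise ValueError("unsupported RAID level: " + level)
    blocks_per_chunk = chunk_kib // block_kib
    return data_disks * blocks_per_chunk


# 4 disks, 64k chunks, 4k filesystem blocks, as in the examples above
for level in ("raid5", "raid0", "raid10"):
    print(level, data_blocks_per_stripe(level, 4, 64, 4))
# raid5 48
# raid0 64
# raid10 32
```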
You are correct: leave one of the chunks off for the parity, so for a 4 disk raid5, stride=48. I had just computed all 4 chunks as part of the stride.
BTW that parity chunk still needs to be in memory to avoid the read on it, no? Wouldn't a stride of 64 help in that case? And if the stride leaves out the parity chunk, won't successive read-aheads cause a continuous wrap of the stripe, which will negate the effect of the stride by not having the complete stripe cached?
For read-ahead, you would set this through blockdev --setra X /dev/YY, and use a multiple of the # of sectors in a stripe. So for a 256K stripe (512 sectors of 512 bytes), set the read-ahead to 512, 1024, or 2048, depending on whether the I/O is mostly random or mostly sequential (bigger for sequential, smaller for random).
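As a rough sketch of where those read-ahead numbers come from: blockdev --setra counts 512-byte sectors, so a 256K stripe is 512 sectors, and the suggested values are 1x, 2x, and 4x that. The helper name and the choice of multiples are illustrative assumptions, not a recommendation from any tool.

```python
SECTOR_BYTES = 512  # blockdev --setra takes its value in 512-byte sectors


def readahead_candidates(stripe_kib, multiples=(1, 2, 4)):
    """Read-ahead values (in sectors) that are whole multiples of one stripe."""
    sectors_per_stripe = stripe_kib * 1024 // SECTOR_BYTES
    return [sectors_per_stripe * m for m in multiples]


print(readahead_candidates(256))  # [512, 1024, 2048]
```

The smaller values would suit mostly random I/O and the larger ones mostly sequential I/O, as described above.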
To follow up on this (even if it is a little late), how is this affected by LVM use? I'm curious to know how (or if) this math changes with ext3 sitting on LVM on the raid array.