[CentOS] Best mkfs.ext2 performance options on RAID5 in CentOS 4.2

Tue Nov 1 15:05:17 UTC 2005
Aleksandar Milivojevic <alex at milivojevic.org>

Quoting Sean Staats <sstaats at questia.com>:

> I can't seem to get the read and write performance better than
> approximately 40MB/s on an ext2 file system.  IMO, this is horrible
> performance for a 6-drive, hardware RAID 5 array.  Please have a look at
> what I'm doing and let me know if anybody has any suggestions on how to
> improve the performance...
> Output of using bonnie++:
[snip]
> ---------------------------
> $ /usr/local/bonnie++/sbin/bonnie++ -d /d01/test -r 6144 -m anchor_ext2_4k_64s
> Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> anchor_ext2_4k_ 12G 41654  96 41937  11 30537   8 40676  88 233015  27 426.6   1

Correct me if I'm wrong, but you got 233MB/s for reads (the block read test).
Assuming your disks can do 50MB/s sustained transfer rate each, you are pretty
darn close to the theoretical maximum of (6 - 1) * 50MB/s = 250MB/s for a
6-disk RAID5.
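
The arithmetic above can be checked with a few lines of Python.  This is only a
back-of-the-envelope sketch: the 50MB/s per-disk figure is the assumption made
in the text, not a measurement, and the function name is made up for
illustration.

```python
def raid5_read_ceiling(disks: int, per_disk_mb_s: float) -> float:
    """Best-case sequential read rate: one disk's worth of every stripe is parity."""
    return (disks - 1) * per_disk_mb_s

# Block read column from the bonnie++ output, in K/sec.
measured_k_per_s = 233015
# Treat K as 1000 bytes, the same rounding the text uses (233015 K/sec ~ 233MB/s).
measured_mb_s = measured_k_per_s / 1000

# ASSUMPTION: 50MB/s sustained per disk, as guessed in the text.
ceiling = raid5_read_ceiling(disks=6, per_disk_mb_s=50.0)
fraction = measured_mb_s / ceiling

print(f"ceiling {ceiling:.0f} MB/s, measured {measured_mb_s:.0f} MB/s, "
      f"{fraction:.0%} of ceiling")
```

If the disks actually sustain more or less than 50MB/s, the ceiling moves with
them, so treat the percentage as a rough indication only.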

RAID5 as such is a bad choice for file systems that will have more than about
30% writes (out of total I/O).  If most of the I/O will be writes, and you care
about performance, you should use RAID-10.  Remember, writing to a dumb,
unoptimized RAID5 implementation is slower than writing to a single disk.  This
is generic RAID wisdom, nothing to do with any particular implementation.  In
the worst-case scenario, a write operation on a 6-disk RAID5 volume involves
reading a data block from 5 drives, calculating the XOR, and writing back one
block of data and one block of XOR parity.  Whichever way you do it, it ain't
gonna be fast.
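
To make the extra reads concrete, here is a toy sketch of the XOR arithmetic
for a single-block write on one stripe (5 data blocks plus 1 parity block).
The 4-byte blocks and the helper name are made up for illustration, and this is
not how any particular controller is implemented.  It shows two ways to arrive
at the new parity; both need reads before the two writes can happen.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a 6-disk array: 5 data blocks and their parity (toy 4-byte blocks).
data = [bytes([i] * 4) for i in (1, 2, 3, 4, 5)]
parity = xor_blocks(*data)

new_block = bytes([9] * 4)  # the block we want to write in place of data[0]

# Option 1: read the other data blocks of the stripe and rebuild parity from scratch.
parity_rebuilt = xor_blocks(new_block, *data[1:])

# Option 2: read only the old data block and the old parity, then patch the parity.
parity_patched = xor_blocks(parity, data[0], new_block)

# Both routes give the same parity, and it is consistent with the updated stripe.
assert parity_rebuilt == parity_patched
data[0] = new_block
assert xor_blocks(*data) == parity_patched
```

Either way, a write that a single disk would finish in one operation turns into
several reads followed by two writes.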

For large sequential writes, RAID5 implementations can do a lot of
optimizations (reducing the number of reads for each write operation).  But
they still need to generate and write that additional XOR parity block, so it
is going to be slower than reading from that same volume.

Random writes to RAID5 volumes are always going to be terribly slow, since a
RAID5 implementation can't optimize them very well.  If they are limited to
small areas of data, a large battery-backed on-controller cache might help
(since the blocks needed to recalculate the XOR parity might already be in the
cache, and the actual writes can be delayed in the hope that there'll be enough
data to write in the future to reduce the number of needed reads).  If they are
spread all over a 1TB volume, you are screwed; no (reasonable) amount of cache
is going to save ya.

Back to reads: you got 40MB/s for per-chr reads and 233MB/s for block reads.
The difference between these two cases is not 3ware related, at least not in
your case it seems.  The per-chr test reads one byte at a time, and it is
influenced by three factors: how well the C library is optimized (and how good
it is at buffering), the CPU speed, and the disk speed.  If you look at the CPU
column, you'll see that your CPU was 88% busy during this test (probably most
of the time spent in bonnie's loop that executes 12 billion getc() calls, and
in the C library itself).  So no matter how fast your drives are, you'd max out
in the per-chr read test at maybe 45-50MB/s with the CPU you have in the box.
Setting a larger read-ahead (as Joe suggested) might help to squeeze a couple
of MB/s more out of benchmark tests, but it's probably not really worth it in
real-world applications.
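
A rough way to see where the 45-50MB/s estimate comes from, using the per-chr
read column of the bonnie++ output.  This is a sketch under two assumptions
that are mine, not bonnie's: that "12G" means 12 GiB, and that throughput would
scale linearly with CPU time until the CPU is 100% busy.

```python
# ASSUMPTION: the "12G" file size in the bonnie++ output means 12 GiB.
size_bytes = 12 * 2**30

rate_k_per_s = 40676   # per-chr sequential input, K/sec
cpu_busy = 0.88        # %CP reported for the same column

# The per-chr test reads one byte per call, so calls == bytes.
getc_calls = size_bytes

# ASSUMPTION: linear scaling up to 100% CPU; real behaviour will not be this tidy.
cpu_bound_ceiling_mb_s = rate_k_per_s / cpu_busy / 1000

print(f"{getc_calls / 1e9:.1f} billion getc() calls")
print(f"CPU-bound ceiling of roughly {cpu_bound_ceiling_mb_s:.0f} MB/s")
```

The point is only that the per-chr number is capped by the CPU, not by the
array, so faster disks would not move it much.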

