[CentOS] Hardware vs Kernel RAID (was Re: External SATA enclosures: SiI3124 and CentOS 5?)

Wed Jun 3 00:52:53 UTC 2009
nate <centos at linuxpowered.net>

Chan Chung Hang Christopher wrote:
> Complete bollocks. The bottleneck is not the drives themselves, as
> disk drive performance, whether SATA or PATA, has not changed much,
> which is why 15k RPM disks are still king. The bottleneck is the bus,
> be it PCI-X or PCIe 16x/8x/4x, or at least the latencies involved due
> to bus traffic.

In most cases the bottleneck is the drives themselves; there are
only so many I/O requests per second a drive can handle. Most workloads
are random rather than sequential, so the amount of data you can
pull from a particular drive can be very low depending on what
your workload is.

Taking a random 7200RPM SATA-II drive from my storage array (which
evenly distributes I/O across every spindle in the system), over
the past month it has averaged:

Read IOPS: 24
Write IOPS: 10
Read KBytes/second: 861
Write KBytes/second: 468
Read I/O size: 37 kB
Write I/O size: 50 kB
Read Service time: 23 milliseconds
Write Service time: 47 milliseconds

Averaging the I/O size out to 43.5kB, that means this disk can
sustain roughly 3,915 kilobytes per second (assuming 90 IOPS for
a 7200RPM SATA disk), though the service times would likely be
unacceptably high for any sort of real-time application. Lower
the I/O size and you can get better response times, though you'll
get less data through the drive at the same time. On the lower-end
storage array I had at my previous company, a 47 millisecond
sustained write service time would have meant an outage in the
databases; this newer, higher-end array is much better at
optimizing I/O than the lower-end box was.
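
If anyone wants to repeat the back-of-envelope math, here's a
rough Python sketch. The 90 IOPS figure is just the rule of thumb
for a 7200RPM SATA disk mentioned above, not something measured
on this array:

  # Per-drive estimate using the averages quoted above.
  read_io_kb = 37
  write_io_kb = 50
  assumed_iops = 90   # rule-of-thumb ceiling for a 7200RPM SATA disk

  avg_io_kb = (read_io_kb + write_io_kb) / 2.0    # 43.5 kB
  sustained_kb = avg_io_kb * assumed_iops         # ~3,915 kB/second

  print("average I/O size: %.1f kB" % avg_io_kb)
  print("sustained throughput: %.0f kB/second" % sustained_kb)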

With 40 drives in a drive enclosure currently connected via a
2x4Gbps (active/active) Fibre Channel point-to-point link,
that means the shelf of drives can run up to roughly
150MB/second out of the 1024MB/second available to it on the
link. The system is upgradable to 4x4Gbps (active/active)
point-to-point Fibre Channel links per drive enclosure. I
can use SATA, 10k FC, or 15k FC in the drive cages, though
I determined that SATA would be more than enough for our
needs. The array controllers have a tested limit of about
1.6 gigabytes/second of throughput to the disks (and
corresponding throughput to the hosts), or 160,000 I/O requests
per second to the disks with 4 controllers (4 high-performance
ASICs for data movement and 16 Xeon CPU cores for everything
else).
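
Same sort of rough math for the shelf, treating 2x4Gbps as
1024MB/second of raw link bandwidth (encoding overhead ignored)
and reusing the per-drive estimate above:

  # Shelf-level estimate, reusing the ~3,915 kB/second per-drive figure.
  drives_per_shelf = 40
  per_drive_kb = 3915

  shelf_mb = drives_per_shelf * per_drive_kb / 1024.0   # ~153 MB/second

  # 2 x 4Gbps FC links, raw bits divided by 8, encoding overhead ignored
  links = 2
  link_gbps = 4
  link_mb = links * link_gbps * 1024 / 8.0              # 1024 MB/second

  print("shelf throughput estimate: %.0f MB/second" % shelf_mb)
  print("link bandwidth available : %.0f MB/second" % link_mb)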

Fortunately the large caches (12GB per controller, mirrored with
another controller) on the array buffer the higher response
times on the disks, resulting in host response times of
around 20 milliseconds for reads and 0-5 milliseconds for
writes, which by most measures is excellent.

nate