[CentOS] Serial ATA hardware raid.

Sat Apr 16 07:03:35 UTC 2005
Pasi Pirhonen <upi at iki.fi>

Hi,


On Fri, Apr 15, 2005 at 10:44:32PM -0700, Bryan J. Smith wrote:
> 
> It depends.
> "Raw" ATA sucks for multiple operations because it has no I/O queuing.
> AHCI is trying to address that, but it's still unintelligent.
> 3Ware queues reads/writes very well, and sequences them as best as it can.
> But it's still not perfect.

Yeah, I knew that; I left it out on purpose. Smarter command
queueing is the reason why the A1000 beats these much faster
sequential-I/O beasts hands down. There is a place for those A1000
boxen on my network too, but not as my NFS server, which mostly
handles files over 1MB.


> 
> > Even for /dev/mdX.
> 
> Now with MD, you're starting to tax your interconnect on writes.
> E.g., with Microcontroller or ASIC RAID, you only push the data you write.
> With software (including "FRAID"), you push 2x for RAID-1.
> That's 2x through your memory, over the system interconnect into the I/O and out the PCI bus.
> 
> When you talk RAID-3/4/5 writes, you slaughter the interconnect.
> The bottleneck isn't the CPU.
> It's the fact that for each stripe,
> you've gotta load from memory through the CPU back to memory - all over the system interconnect, before even looking at I/O.
> For 4+ modern ATA disks, you're talking a roundtrip that costs you an aggregate percentage of your system interconnect time beyond 30%.
> 
> On a dynamic web or other CPU computational intensive server, it matters little.
> The XOR operations actually use very little CPU power.
> And the web or computational streams aren't saturating the interconnect.
> But when you are doing file server I/O, and the system interconnect is used for raw bursts of network I/O as much as storage, it kills.

That is true too. I don't mind taxing my PCI. I have dual Opterons
doing the crunching, and dual PCI-X too. The machine constantly sits
at a load average of 4+ because it's running several Hercules
emulator instances at +15 niceness. That doesn't change the fact
that it can still sustain some 50+MB/s over Gbit LAN, in and out.

I know very well about taxing the PCI bus. I have all this hardware
with a 2GHz+ Athlon64, dual-channel DDR memory and only a puny
little 32bit/33MHz PCI bus, which gets you nowhere. I actually
tried an Athlon64 2800+ and a RocketRAID 1820A w/ 8x200GB SATA -> some
50MB/s, at which point the PCI was _saturated_.
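
To put rough numbers on the bus ceiling, here is a small Python sketch.
The function names are made up for illustration. The peak figure is the
theoretical one (bus width times clock), which real transfers never
reach, and the write-traffic helpers assume plain mirroring and
full-stripe RAID-5 writes done in software, so that the extra copies
and the parity cross the bus as well.

```python
def pci_peak_mb_per_s(bus_width_bits, clock_mhz):
    """Theoretical peak: one bus-width transfer per clock, in MB/s."""
    return bus_width_bits / 8 * clock_mhz


def mirror_bus_traffic(write_mb_per_s, copies=2):
    """Software RAID-1 pushes every written block once per mirror copy."""
    return write_mb_per_s * copies


def raid5_full_stripe_bus_traffic(write_mb_per_s, disks):
    """Full-stripe software RAID-5 write: data plus one parity chunk
    per (disks - 1) data chunks goes out over the bus."""
    if disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return write_mb_per_s * disks / (disks - 1)


plain_pci = pci_peak_mb_per_s(32, 33.33)    # roughly 133 MB/s
pci_x = pci_peak_mb_per_s(64, 133.33)       # roughly 1067 MB/s
print(round(plain_pci), round(pci_x))
print(mirror_bus_traffic(50))
print(round(raid5_full_stripe_bus_traffic(50, 8), 1))
```

On a shared 32bit/33MHz bus the disk traffic also competes with
whatever else sits on that bus, so the usable figure is well below
the theoretical peak.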

You have a very valid point there, though. I/O saturation counts for
a lot when you build servers. Most people don't even realize this
at all. I am sure, too, that you 'know your shit much better than I
do'.

The point is just that 3ware is _SLOW_ compared to almost anything
these days. I have two 9500S-8 here too.


> 
> > Puny oldish A1000 can beat those with almost factor of ten for
> > random I/O, but being limited to max. 40MB/s transfers by its
> > interface (UW/HVD).
> 
> Or more like the i960 because, after all, RAID should stripe some operations across multiple channels.

A1000 is actually powered by a P100. I don't remember seeing an i960
in it, but there definitely is some ASIC on board.

It's just so much faster for any random I/O operation than any IDE/SATA
setup I've tested so far.

> 
> 
> It has nothing to do with CPU cycles but interconnect. XOR puts no
> strain on modern CPUs, it's the added data streams being fed from
> memory to CPU. Furthermore, using async I/O, MD can actually be
> _faster_ than hardware RAID. Volume management in an OS will
> typically do much better than a hardware RAID card when it comes to
> block writes.

Actually, it does matter for CPU cycles too. Initialization at a
speed of 60MB/s (i.e. the MD driver doing the parity calculation at
a speed of hundreds of MB/s) eats one 1.4GHz Opteron almost
totally.
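
For anyone who hasn't looked at what the parity calculation actually
is, here is a minimal Python sketch of RAID-5 style XOR parity. It is
an illustration only, not how the MD driver implements it (the kernel
uses optimized routines), and the function names are hypothetical.

```python
def xor_parity(chunks):
    """Return the byte-wise XOR of equally sized chunks."""
    if not chunks:
        raise ValueError("need at least one chunk")
    length = len(chunks[0])
    if any(len(c) != length for c in chunks):
        raise ValueError("all chunks must have the same length")
    parity = bytearray(length)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_missing(surviving_chunks, parity):
    """XOR of the survivors and the parity gives back the lost chunk."""
    return xor_parity(list(surviving_chunks) + [parity])


stripe = [b"\x01\x02", b"\x0f\xf0", b"\xaa\x55"]
parity = xor_parity(stripe)
# Pretend the middle disk died and rebuild its chunk.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
print(recovered == stripe[1])
```

Every byte of every data chunk has to pass through the CPU once per
stripe, which is why the cost shows up both as cycles and as memory
traffic.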

It's also true that HT is making it all fly. A PCI-X enabled
P4/Xeon at 2.6GHz just can't get even near the speeds of the dual
Opteron.

It's also true that the kernel itself knows best what the queueing
policy is and how the data should be treated.

> 
> Of course the 9500S is still maturing. Which is why I still prefer to
> use 4 and 8 channel 7506/8506 cards with RAID-0+1. Even the AccelATA
> and 5000 left much to be desired before the 6000 and later 7000/8000
> series.
> 

Once again: maturing won't make its parity engine go over 100MB/s.
It's quite a dead end AFAIK in that area. Then again, 100MB/s might
be enough for someone, but from my testing/flying by feel, one needs
approx. 2x the I/O bandwidth locally to serve 1x over NFS, or even
get near it. The same seems to hold for iSCSI too, though.
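
As a back-of-envelope version of that rule of thumb, a tiny Python
sketch. The 2x ratio is only my seat-of-the-pants figure from above,
not a measured constant, and the names are made up for illustration.

```python
# Raw Gbit line rate in MB/s, before any protocol overhead.
GBIT_LAN_MB_PER_S = 1000 / 8


def required_local_mb_per_s(nfs_target_mb_per_s, local_to_nfs_ratio=2.0):
    """Local I/O bandwidth needed to serve a given NFS rate."""
    return nfs_target_mb_per_s * local_to_nfs_ratio


def nfs_ceiling_mb_per_s(local_mb_per_s, local_to_nfs_ratio=2.0):
    """NFS rate one can expect from a given local I/O bandwidth."""
    return local_mb_per_s / local_to_nfs_ratio


# A parity engine topping out at 100MB/s locally.
print(nfs_ceiling_mb_per_s(100))
# Local bandwidth needed to fill a Gbit link under the same assumption.
print(required_local_mb_per_s(GBIT_LAN_MB_PER_S))
```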

As a conclusion, I was only trying to make the point that a software
solution might be pretty good for someone (for me it is, at least for
now). The 3ware was good for me on a dual PIII, which can't get even
near its speeds with a software solution. With a dual Opteron the
situation is quite different. The 3ware still saturates at its limits,
but the software goes much faster on the capable box.



-- 
Pasi Pirhonen - upi at iki.fi - http://iki.fi/upi/