From: Pasi Pirhonen
This is as good an insertion point as any for what I am going to say.
3ware is working like a champ, but slowly. Tuning won't make it magically go over 100MB/s sustained writes.
I am bumping up against 100MB/s on my bonnie write benchmarks on an older 6410, though that's RAID-0+1, not RAID-5.
Random I/O sucks (from what I've seen) for any SATA setup.
It depends. "Raw" ATA sucks for multiple operations because it has no I/O queuing. AHCI is trying to address that, but it's still unintelligent. 3Ware queues reads/writes very well and sequences them as best it can, but it's still not perfect.
Even for /dev/mdX.
Now with MD, you're starting to tax your interconnect on writes. E.g., with microcontroller or ASIC RAID, you only push the data you write. With software (including "FRAID"), you push 2x for RAID-1. That's 2x through your memory, over the system interconnect into the I/O, and out the PCI bus.
When you talk RAID-3/4/5 writes, you slaughter the interconnect. The bottleneck isn't the CPU; it's the fact that for each stripe, you've got to load data from memory through the CPU and back to memory - all over the system interconnect - before even looking at I/O. For 4+ modern ATA disks, you're talking a round trip that can eat an aggregate 30% or more of your system interconnect time.
On a dynamic web server or other CPU-intensive server, it matters little. The XOR operations actually use very little CPU power, and the web or computational streams aren't saturating the interconnect. But when you are doing file server I/O, and the system interconnect carries raw bursts of network I/O as much as storage, it kills.
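To put rough numbers on that, here's a back-of-envelope sketch in Python. The multipliers are illustrative assumptions about how many bytes cross the host's memory/PCI interconnect per byte the application writes (hardware RAID pushes the data once; software RAID-1 twice; software RAID-5 several times, depending on full-stripe vs. read-modify-write); they aren't measurements from any particular box.

# Back-of-envelope model: bytes crossing the host interconnect per byte
# of application data written.  The ratios are simplifying assumptions
# for illustration only.

def interconnect_bytes_per_user_byte(level, data_disks=4):
    """Rough bus-traffic multiplier for writing one byte of user data."""
    if level == "hw":             # hardware RAID: host pushes the data once,
        return 1.0                # the card generates mirrors/parity itself
    if level == "sw-raid1":       # software mirror: every byte goes out twice
        return 2.0
    if level == "sw-raid5-full":  # full-stripe write: data plus parity go out,
        out = (data_disks + 1) / data_disks   # writes to the member disks
        xor = (data_disks + 1) / data_disks   # memory reads/writes for the XOR pass
        return out + xor
    if level == "sw-raid5-rmw":   # partial-stripe write (read-modify-write):
        return 4.0                # read old data + parity, write new data + parity
    raise ValueError(level)

if __name__ == "__main__":
    for level in ("hw", "sw-raid1", "sw-raid5-full", "sw-raid5-rmw"):
        mult = interconnect_bytes_per_user_byte(level)
        # e.g. what a 50MB/s application write stream turns into on the bus:
        print(f"{level:14s} ~{mult:.1f}x -> {50 * mult:.0f} MB/s of bus traffic")

Run it and a modest 50MB/s write stream already turns into well over 100MB/s of bus traffic for software RAID-5, which is the kind of load the 30%+ figure above is pointing at.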
A puny, oldish A1000 can beat those by almost a factor of ten for random I/O, though it's limited to max. 40MB/s transfers by its interface (UW/HVD).
Or, more likely, limited by the i960 - because, after all, RAID should stripe some operations across multiple channels.
But what I am going to say is that for my CentOS devel work (i.e., my NFS server), I just recently moved my 1.6TB RAID under /dev/md5 with a HighPoint RocketRAID 1820. I don't care that it's NOT hardware RAID. The /dev/mdX setup beats the formerly used 3ware 9500S-8 hands down when you have 'spare CPU cycles to let the kernel handle the parity operations'.
It has nothing to do with CPU cycles, but with the interconnect. XOR puts no strain on modern CPUs; it's the added data streams being fed from memory to the CPU. Furthermore, using async I/O, MD can actually be _faster_ than hardware RAID. Volume management in an OS will typically do much better than a hardware RAID card when it comes to block writes.
Of course, the 9500S is still maturing, which is why I still prefer to use 4- and 8-channel 7506/8506 cards with RAID-0+1. Even the AccelATA and 5000 left much to be desired before the 6000 and later 7000/8000 series.
Hi,
On Fri, Apr 15, 2005 at 10:44:32PM -0700, Bryan J. Smith wrote:
It depends. "Raw" ATA sucks for multiple operations because it has no I/O queuing. AHCI is trying to address that, but it's still unintelligent. 3Ware queues reads/writes very well and sequences them as best it can, but it's still not perfect.
Yeah, I knew that; I left it out on purpose. The smarter command queueing is the reason the A1000 beats these much faster sequential-I/O beasts hands down. There is a place for those A1000 boxen on my network too, but it's not my NFS server, which mostly handles files of 1MB+.
Even for /dev/mdX.
Now with MD, you're starting to tax your interconnect on writes. E.g., with microcontroller or ASIC RAID, you only push the data you write. With software (including "FRAID"), you push 2x for RAID-1. That's 2x through your memory, over the system interconnect into the I/O, and out the PCI bus.
When you talk RAID-3/4/5 writes, you slaughter the interconnect. The bottleneck isn't the CPU; it's the fact that for each stripe, you've got to load data from memory through the CPU and back to memory - all over the system interconnect - before even looking at I/O. For 4+ modern ATA disks, you're talking a round trip that can eat an aggregate 30% or more of your system interconnect time.
On a dynamic web server or other CPU-intensive server, it matters little. The XOR operations actually use very little CPU power, and the web or computational streams aren't saturating the interconnect. But when you are doing file server I/O, and the system interconnect carries raw bursts of network I/O as much as storage, it kills.
That is true too, but I don't mind taxing my PCI. I have dual Opterons doing the crunching, and dual PCI-X as well. The machine constantly sits above a load average of 4 because it's running several Hercules emulator instances at +15 niceness, yet it can still sustain some 50+MB/s over gigabit LAN, in and out.
I know very well about taxing the PCI bus. I have plenty of hardware with a 2GHz+ Athlon 64 and dual-channel DDR memory but only a puny 32-bit/33MHz PCI bus, which gets you nowhere. I actually tried an Athlon 64 2800+ with a RocketRAID 1820A and 8x200GB SATA: some 50MB/s, with the PCI bus _saturated_.
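For the curious, the arithmetic behind that saturation is simple. The amplification factors below are the same kind of illustrative assumption as in the sketch earlier in the thread, and ~133MB/s is just the theoretical 32-bit/33MHz PCI burst rate:

# Quick sanity check of the 32-bit/33MHz PCI observation.  The 1x/2x/4x
# write-amplification multipliers are assumptions (hw RAID, sw RAID-1,
# sw RAID-5 read-modify-write); the bus figure is the theoretical burst rate.

PCI_32_33_MB_S = 4 * 33.33          # 4 bytes per cycle at 33.33MHz ~= 133 MB/s

for amplification in (1.0, 2.0, 4.0):
    ceiling = PCI_32_33_MB_S / amplification
    print(f"{amplification:.0f}x amplification -> at most ~{ceiling:.0f} MB/s of user data")

Add real-world PCI overhead and other devices sharing the bus, and a software RAID-5 array topping out around 50MB/s of user writes on 32/33 PCI is about what this rough model predicts.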
You have a very valid point there, though. I/O saturation matters a great deal when you build servers, and most people don't even realize it. I am also sure that you 'know your shit much better than I do'.
The point is just that 3ware is _SLOW_ compared to almost anything these days. I have two 9500S-8 cards here too.
A puny, oldish A1000 can beat those by almost a factor of ten for random I/O, though it's limited to max. 40MB/s transfers by its interface (UW/HVD).
Or, more likely, limited by the i960 - because, after all, RAID should stripe some operations across multiple channels.
The A1000 is actually powered by a P100. I don't remember seeing an i960 in it, but there is definitely some ASIC on board.
It's just so much faster for any random I/O operation than any IDE/SATA setup I've tested so far.
It has nothing to do with CPU cycles, but with the interconnect. XOR puts no strain on modern CPUs; it's the added data streams being fed from memory to the CPU. Furthermore, using async I/O, MD can actually be _faster_ than hardware RAID. Volume management in an OS will typically do much better than a hardware RAID card when it comes to block writes.
Actually, CPU cycles do matter too: initialization at 60MB/s (i.e., the MD driver doing the parity calculation over hundreds of MB/s of member data) pretty much eats one 1.4GHz Opteron completely. A quick way to watch that number is sketched below.
It's also true that HyperTransport makes it all fly. A PCI-X-enabled P4/Xeon at 2.6GHz can't get anywhere near the speeds of the dual Opteron.
It's also true that the kernel itself knows best what the queueing policy is and how the data should be treated.
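If you want to watch that resync speed yourself, a rough monitor like the one below does the job. It only scrapes the speed=...K/sec field the md driver prints in /proc/mdstat during a resync/recovery, so the regex is an assumption about that exact text format; the resync ceiling itself is tunable via /proc/sys/dev/raid/speed_limit_max.

# Minimal sketch: print the MD resync/rebuild speeds reported in /proc/mdstat.
# The "speed=NNNNK/sec" pattern is what the md driver prints during a
# resync/recovery; adjust the regex if your kernel formats it differently.

import re
import time

SPEED_RE = re.compile(r"speed=(\d+)K/sec")

def md_resync_speeds():
    """Return the resync speeds (in KB/s) currently listed in /proc/mdstat."""
    with open("/proc/mdstat") as f:
        return [int(m.group(1)) for m in SPEED_RE.finditer(f.read())]

if __name__ == "__main__":
    while True:
        speeds = md_resync_speeds()
        if speeds:
            print(", ".join(f"{s / 1024:.1f} MB/s" for s in speeds))
        else:
            print("no resync in progress")
        time.sleep(5)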
Of course, the 9500S is still maturing, which is why I still prefer to use 4- and 8-channel 7506/8506 cards with RAID-0+1. Even the AccelATA and 5000 left much to be desired before the 6000 and later 7000/8000 series.
Once again: maturing won't make its parity engine go over 100MB/s; AFAIK it's pretty much a dead end in that area. Then again, 100MB/s might be enough for someone, but from my testing and gut feeling, you need roughly 2x the I/O bandwidth locally to serve 1x over NFS, or even close to it. The same seems to hold for iSCSI too.
In conclusion, I was only trying to make the point that a software solution might be pretty good for some people (it is for me, at least right now). The 3ware was good for me on a dual PIII, which couldn't get anywhere near those speeds with a software solution. With the dual Opteron the situation is quite different: the 3ware still saturates at its limits, but the software goes much faster on the more capable box.
After reviewing my needs, my budget, and the fact that the budget has to account for a mail server, a DNS server, an IDS, and a heap of other hardware, I've decided that cost and data redundancy are the two big criteria I need to look at. I'll settle for just mirroring as long as it's hardware mirroring. After reading all of the info provided on this list, I've decided that setting up a decent SATA-based hardware RAID is not something I should jump into without taking it slow, and I need something soon.
I also looked on eBay Australia and found a ton of two-channel SATA "winraid" cards, all of which are useless for my purposes.
Then I found one of these: http://www.lsilogic.com/products/megaraid/megaraid_i4.html
Its description: Low-cost, hardware-based data protection for cost-sensitive server and workstation environments
And it lists RHEL 2 & 3 under supported software (and thereby probably 4 as well, right?)
If I can get by with mirroring until the hardware budget is in better shape, I can upgrade to a 3ware and SATA later on.
Ironically, I could not see much in the way of decent SATA hardware RAID controllers on eBay Australia (no 3ware at all, actually), but there were tons of SCSI 360 RAID cards and drives going cheap. How funny is that.
Does anyone have any comments on the MegaRAID i4?
rgds
Franki
Franki wrote:
Its description: Low-cost, hardware-based data protection for cost-sensitive server and workstation environments
And it lists RHEL 2 & 3 under supported software (and thereby probably 4 as well, right?)
That is a fair assumption.
There is no need to download drivers from LSI Logic's web site.
According to their support website it uses the same MegaRAID driver as their SATA and SCSI RAID solutions.
With RHEL2.1/CentOS 2.1 it'll use megaraid.
With RHEL3/CentOS 3 it'll use either megaraid or megaraid2 (the former will be the default)
RHEL4/CentOS 4 should work, but you may have an issue with PCI IDs - the new driver (megaraid_mbox) works with older cards such as this one, but lacks the PCI IDs to properly recognize them.
If I can get by with mirroring until the hardware budget is in better shape, I can upgrade to a 3ware and SATA later on.
Ironically, I could not see much in the way of decent SATA hardware RAID controllers on eBay Australia (no 3ware at all, actually), but there were tons of SCSI 360 RAID cards and drives going cheap. How funny is that.
Does anyone have any comments on the MegaRAID i4?
Dell used to use them as their CERC cards before they switched to SATA.
Phil Brutsche wrote:
Franki wrote:
Its description: Low-cost, hardware-based data protection for cost-sensitive server and workstation environments
And it lists RHEL 2 & 3 under supported software (and thereby probably 4 as well, right?)
That is a fair assumption.
There is no need to download drivers from LSI Logic's web site.
According to their support website it uses the same MegaRAID driver as their SATA and SCSI RAID solutions.
With RHEL2.1/CentOS 2.1 it'll use megaraid.
With RHEL3/CentOS 3 it'll use either megaraid or megaraid2 (the former will be the default)
RHEL4/CentOS 4 should work, but you may have an issue with PCI IDs - the new driver (megaraid_mbox) works with older cards such as this one, but lacks the PCI IDs to properly recognize them.
Hi again,
Phil, according to this:
http://www.ibiblio.org/peanut/Kernel-2.6.10/scsi/ChangeLog.megaraid (note the kernel version)
The PCI IDs for the i4 have been added to megaraid:
Release Date : Fri Jul 23 15:22:07 EDT 2004 - Atul Mukker atulm@lsil.com
Current Version : 2.20.2.0 (scsi module), 2.20.1.0 (cmm module)
Older Version : 2.20.1.0 (scsi module), 2.20.0.0 (cmm module)
ii. Add PCI ids for I4
Considering the date, I imagine that this change was old enough that it would have been included in RHEL4.
So I've got a tentative green light for support. I'll be making a bid, but it looks like this auction is being populated by newbies, as the price seems to be heading up rather quickly, especially considering what the card actually is.
The PCI vendor and device IDs: 1000 0522 (MegaRAID 522 i4 133 RAID Controller).
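One quick (if crude) way to see whether the running kernel's megaraid driver claims that ID is to look it up in modules.pcimap. The script below is only a sketch: it assumes the usual depmod-generated column layout (module, vendor, device, subvendor, subdevice, ...) and that 0xffffffff is used as the wildcard.

# Hedged sketch: which module claims PCI vendor 0x1000, device 0x0522
# (the MegaRAID i4) according to this kernel's modules.pcimap?

import os

VENDOR, DEVICE = 0x1000, 0x0522

def modules_claiming(vendor, device):
    pcimap = f"/lib/modules/{os.uname().release}/modules.pcimap"
    claimed = set()
    with open(pcimap) as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.split()
            module, ven, dev = fields[0], int(fields[1], 16), int(fields[2], 16)
            # 0xffffffff acts as a wildcard (PCI_ANY_ID) in this table
            if ven in (vendor, 0xFFFFFFFF) and dev in (device, 0xFFFFFFFF):
                claimed.add(module)
    return claimed

if __name__ == "__main__":
    print(modules_claiming(VENDOR, DEVICE) or "no module claims 1000:0522")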
rgds
Franki
Franki wrote:
Hi again,
Phil, according to this:
http://www.ibiblio.org/peanut/Kernel-2.6.10/scsi/ChangeLog.megaraid (note the kernel version)
The PCI IDs for the i4 have been added to megaraid:
<checks again>
Ah, so it has.
The comments in the .c files in 2.6.11 are out of date then - the MegaRAID I4 isn't in the list of compatible controllers in megaraid_mbox.c.
ii. Add PCI ids for I4
Considering the date, I imagine that this change was old enough that it would have been included in RHEL4.
Again, a fair assumption.
I would still download the kernel SRPM for RHEL 4 to double check.
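A rough way to automate that double check, once the SRPM's kernel source is unpacked, is simply to search the megaraid driver sources for the i4's device ID (0x0522, from the changelog quoted above). The extraction path below is just a placeholder; point it at wherever the source actually lands on your box.

# Sketch: grep the megaraid driver sources in an unpacked kernel tree for
# the i4's PCI device ID.  KERNEL_TREE is a hypothetical path.

from pathlib import Path

KERNEL_TREE = Path("/usr/src/linux-2.6.9")   # placeholder extraction path
DEVICE_ID = "0x0522"

for entry in sorted(KERNEL_TREE.glob("drivers/scsi/megaraid*")):
    files = [entry] if entry.is_file() else sorted(entry.glob("*.[ch]"))
    for f in files:
        for lineno, line in enumerate(f.read_text(errors="replace").splitlines(), 1):
            if DEVICE_ID.lower() in line.lower():
                print(f"{f}:{lineno}: {line.strip()}")

If the ID shows up in the megaraid_mbox sources, the new driver should recognize the card out of the box; if not, you'd know where you stand.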
Phil Brutsche wrote:
Franki wrote:
Hi again,
Phil, according to this:
http://www.ibiblio.org/peanut/Kernel-2.6.10/scsi/ChangeLog.megaraid (note the kernel version)
The PCI IDs for the i4 have been added to megaraid:
<checks again>
Ah, so it has.
The comments in the .c files in 2.6.11 are out of date then - the MegaRAID I4 isn't in the list of compatible controllers in megaraid_mbox.c.
ii. Add PCI ids for I4
Considering the date, I imagine that this change was old enough that it would have been included in RHEL4.
Again, a fair assumption.
I would still download the kernel SRPM for RHEL 4 to double check.
Noted, but even if it isn't, I can probably roll my own with support if need be. (I stopped rolling my own kernels years ago where possible, just because most of the time there isn't any need.)
thanks again
rgds
Franki
Just to throw in my 2 cents worth...
I've used both the 3ware 850x SATA raid card and currently have an Adaptec 2410 SATA raid card. The machines are database servers with fairly large databases (75MM records+)
I'm not as deeply technical as many of the others who have responded to this issue, but having done no tuning of any sort, both cards perform about the same. In addition, neither of them has failed or been a problem.
Like I said, just my 2 cents worth.
Michael
On Sat, 2005-04-16 at 07:46, Franki wrote:
After reviewing my needs, my budget, and the fact that the budget has to account for a mail server, a DNS server, an IDS, and a heap of other hardware, I've decided that cost and data redundancy are the two big criteria I need to look at. I'll settle for just mirroring as long as it's hardware mirroring.
Note that software mirroring has some advantages too and is not a big performance hit as long as the underlying hardware uses DMA and works independently (i.e. don't do it with drives on the same IDE controller cable). I consider it a big plus to be able to take a drive from a mirrored pair, plug it into just about any computer without worrying about having exactly the same brand of controller, and recover the data - or take over the service the broken computer was providing.
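As a rough illustration of that recovery path (the device names are hypothetical, and it simply shells out to mdadm, which has to be installed on the rescue machine):

# Minimal sketch of pulling one half of an md mirror into another box:
# inspect the md superblock, then assemble a degraded array from it.
# /dev/sdb1 and /dev/md0 are hypothetical names for this example.

import subprocess

DISK = "/dev/sdb1"     # the partition pulled from the mirrored pair
ARRAY = "/dev/md0"     # where to assemble it on the new machine

# Show what the md superblock says (RAID level, UUID, member role, ...)
subprocess.run(["mdadm", "--examine", DISK], check=True)

# Assemble from the single surviving member; --run lets it start degraded
# instead of waiting for the missing mirror half.
subprocess.run(["mdadm", "--assemble", "--run", ARRAY, DISK], check=True)

Once it's assembled, you mount /dev/md0 as usual and either copy the data off or keep running the service from it.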
And by the way, there's nothing wrong with running a DNS server and mail server on the same machine - but you should have a 2nd DNS server, which can also provide other services. You might need a bit more RAM, but DNS typically is not a big CPU load.