-=- starting as new thread as it is off topic from controller thread -=-
Ross Walker wrote:
The real key is the controller though. Get one that can do hardware RAID1/10, 5/50, 6/60, if it can do both SATA and SAS even better and get a battery backed write-back cache, the bigger the better, 256MB good, 512MB better, 1GB best.
I've read a lot of different reports that suggest at this point in time, kernel software raid is in most cases better than controller raid.
The basic argument seems to be that CPUs are fast enough now that the limitation on throughput is the drive itself, and that SATA resolved the bottleneck that PATA caused with kernel raid. The arguments then go on to give numerous examples where a failing hardware raid controller CAUSED data loss, where a raid card died and an identical raid card had to be scrounged from eBay to even read the data on the drives, etc. - problems that apparently don't happen with kernel software raid.
The main exception I've seen to using software raid is high-availability setups where a separate external unit ($$$) provides the same hard disk to multiple servers. Then the raid can't really be in the kernel but has to be in the hardware.
I'd be very interested in hearing opinions on this subject.
Michael A. Peters wrote:
I'd be very interested in hearing opinions on this subject.
I mainly like hardware raid (good hardware raid, not hybrid software/hardware raid) because of the simplicity: the system can easily boot from it, in many cases drives are hot-swappable, and you don't have to touch the software or driver; you just yank the disk and put in a new one.
In the roughly 600 server-class systems I've been exposed to over the years I have seen only one or two bad RAID cards; one of them I specifically remember was caught being bad during a burn-in test so it never went live, and I think the other went bad after several years of service. While problems certainly can happen, the raid card seems not to be an issue provided you're using a good one. I recall the one that was "DOA" was a 3Ware 8006-2 and the other one was an HP, I believe in a DL360 G1.
The craziest thing I've experienced on a RAID array was on some cheap shit LSI Logic storage systems where a single disk failure somehow crippled its storage controllers (both of them), knocking the entire array offline for an extended period of time. I think the drive spat out a bunch of errors on the fiber bus, causing the controllers to flip out. The system eventually recovered on its own. I have been told similar stories about other LSI Logic systems (several big companies OEM them), though I'm sure the problem isn't limited to them; it's an architectural problem rather than an implementation issue.
The only time in my experience where we actually lost data (that I'm aware of) due to a storage/RAID/controller issue was back in 2004, with an EMC CLARiiON CX600, where a misconfiguration by the storage admin caused a catastrophic failure of the backup controller when the primary controller crashed. We spent a good 60 hours of downtime the following week rebuilding corrupt portions of the database as we came across them. More than a year later we still occasionally found corruption from that incident. Fortunately the data on the volumes that suffered corruption was quite old and rarely accessed. Ideally the array should have made the configuration error obvious, or better yet prevented the error from occurring in the first place. Those old-style enterprise arrays were overly complicated (and yes, that CX600 ran embedded Windows NT as its OS!).
For servers, I like 3Ware for SATA and HP for SAS, though these days the only thing that sits on internal storage is the operating system. All important data is on enterprise-grade storage systems, which for me means 3PAR (not to be confused with 3Ware): they get upwards of double the usable capacity of any other system while still being dead easy to use and the fastest arrays in the world (priced pretty well too), and the drives have point-to-point switched connections; they don't sit on a shared bus. Our array can recover from a failed 750GB SATA drive in (worst case) roughly 3.5 hours with no performance impact to the system. Our previous array would take more than 24 hours to rebuild a 400GB SATA drive, with a major performance hit to the array. I could go on all day about why their arrays are so great!
My current company has mostly Dell servers, and so far I don't have many good things to say about their controllers or drives (the drives themselves are "OK", though Dell doesn't do a good enough job on QA with them; we had to manually flash dozens of drive firmwares because of performance problems, and the only way to flash the disk firmware is to boot to DOS, unlike flashing the BIOS or controller firmware). I believe the Dell SAS/SATA controllers are LSI Logic. I have seen several kernel panics that seem to point to the storage array on the Dell systems.
HP is coming out with their G6 servers tomorrow, and the new SmartArray controllers sound pretty nice, though I have had a couple of incidents with older HP arrays where a failing drive caused massive performance problems on the array and we weren't able to force-fail the drive remotely; we had to send someone on site to yank it out. No data loss though. Funny that the controller detected the drive was failing, but didn't give us the ability to take it offline. Support said it was fixed in a newer version of the firmware, which of course required downtime to install.
nate
On Jun 1, 2009, at 9:52 PM, Michael A. Peters wrote:
I've read a lot of different reports that suggest at this point in time, kernel software raid is in most cases better than controller raid.
I manage systems with both.
I like hardware RAID controllers. Yes, they do cost money up front, but when you have a failure you can get a replacement drive, give it to a low level tech, and say "Go to server A41, pull the drive with the solid red light and plug this one in." Then the controller will take over, format the drive and put it back into service.
With software RAID, you have to have a sysadmin log in to the box and do rootly things that require careful thought :-)
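(For comparison, the "rootly things" for a software-RAID swap usually boil down to something like this; a rough sketch only, with made-up array and device names:)

    cat /proc/mdstat                             # confirm which member failed
    mdadm --manage /dev/md0 --fail /dev/sdb1     # mark it failed if md hasn't already
    mdadm --manage /dev/md0 --remove /dev/sdb1   # drop it from the array
    # physically swap the drive, copy the partition table from the good disk, re-add:
    sfdisk -d /dev/sda | sfdisk /dev/sdb
    mdadm --manage /dev/md0 --add /dev/sdb1      # rebuild runs in the background
    watch cat /proc/mdstat                       # follow the rebuild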
When these events are happening in the wee hours and there are other possible human factors like fatigue or stress, the first scenario is less risky and costly in the long run.
--Chris
On Mon, Jun 1, 2009 at 10:52 PM, Michael A. Peters mpeters@mac.com wrote:
-=- starting as new thread as it is off topic from controller thread -=-
Ross Walker wrote:
The real key is the controller though. Get one that can do hardware RAID1/10, 5/50, 6/60, if it can do both SATA and SAS even better and get a battery backed write-back cache, the bigger the better, 256MB good, 512MB better, 1GB best.
I've read a lot of different reports that suggest at this point in time, kernel software raid is in most cases better than controller raid.
The basic argument seems to be that CPUs are fast enough now that the limitation on throughput is the drive itself, and that SATA resolved the bottleneck that PATA caused with kernel raid. The arguments then go on to give numerous examples where a failing hardware raid controller CAUSED data loss, where a raid card died and an identical raid card had to be scrounged from eBay to even read the data on the drives, etc. - problems that apparently don't happen with kernel software raid.
The main exception I've seen to using software raid is high-availability setups where a separate external unit ($$$) provides the same hard disk to multiple servers. Then the raid can't really be in the kernel but has to be in the hardware.
I'd be very interested in hearing opinions on this subject.
The real reason I use hardware RAID is the write-back cache. Nothing beats it for sheer write performance.
Hell, I don't even use the on-board RAID. I just export the drives as individual RAID0 disks, readable with a straight SAS controller if need be, and use ZFS for RAID. ZFS only has to resilver the existing data and not the whole drive on a drive failure, which reduces the double-failure window significantly, and the added parity checking on each block gives me peace of mind that the data is uncorrupted. The 512MB of write-back cache makes the ZFS logging fly without having to buy into expensive SSD drives.
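A rough sketch of that layout (made-up device names; each disk is exported from the controller as its own single-drive RAID0 unit):

    zpool create tank raidz2 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0
    zpool status tank                  # resilver progress shows up here after a swap
    zpool replace tank c0t3d0 c0t8d0   # only the live data gets resilvered onto the replacement
    zpool add tank log c0t7d0          # the separate (SSD) log device mentioned below; once added it can't yet be removed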
I might explore using straight SAS controllers and MPIO with SSD drives for logging in the future once ZFS gets a way to disassociate a logging device from a storage pool after it's been associated in case the SSD device fails.
But now things are way off topic.
-Ross
On 06/01/2009 07:52 PM, Michael A. Peters wrote:
I've read a lot of different reports that suggest at this point in time, kernel software raid is in most cases better than controller raid.
There are certainly a lot of people who feel that way. It depends on what your priorities are. Hardware RAID has the advantage of offloading some calculations from the CPU, and has a write cache which can decrease memory use. However, both of those are relatively expensive, and there's no clear evidence that your money is better put into the RAID card than into faster CPU and more memory. Another important consideration is that hardware RAID will (must!) have a battery backup so that any scheduled writes can be completed later in the case of power loss. If you decide to use software RAID, I would strongly advise you to use a UPS, and to make sure your system monitors it and shuts down in the event of power loss.
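(On CentOS the monitoring piece can be as simple as apcupsd or NUT; a minimal sketch, assuming an APC unit on USB and the apcupsd package from a third-party repo such as EPEL:)

    yum install apcupsd
    # key directives in /etc/apcupsd/apcupsd.conf (illustrative values):
    #   UPSCABLE usb
    #   UPSTYPE  usb
    #   DEVICE
    #   TIMEOUT  300        # force a clean shutdown after ~5 minutes on battery
    chkconfig apcupsd on && service apcupsd start
    apcaccess status        # confirm the daemon can actually talk to the UPS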
On Tue, Jun 2, 2009 at 12:59 PM, Gordon Messmer yinyang@eburg.com wrote:
On 06/01/2009 07:52 PM, Michael A. Peters wrote:
I've read a lot of different reports that suggest at this point in time, kernel software raid is in most cases better than controller raid.
There are certainly a lot of people who feel that way. It depends on what your priorities are. Hardware RAID has the advantage of offloading some calculations from the CPU, and has a write cache which can decrease memory use. However, both of those are relatively expensive, and there's no clear evidence that your money is better put into the RAID card than into faster CPU and more memory. Another important consideration is that hardware RAID will (must!) have a battery backup so that any scheduled writes can be completed later in the case of power loss. If you decide to use software RAID, I would strongly advise you to use a UPS, and to make sure your system monitors it and shuts down in the event of power loss.
I'd advise anybody who manages server equipment to always put it on a UPS. It's not just power losses that can ruin your day; a power spike can take out a power supply just as easily, and a UPS conditions the power so the output level is constant.
-Ross
Gordon Messmer wrote:
On 06/01/2009 07:52 PM, Michael A. Peters wrote:
I've read a lot of different reports that suggest at this point in time, kernel software raid is in most cases better than controller raid.
There are certainly a lot of people who feel that way. It depends on what your priorities are. Hardware RAID has the advantage of offloading some calculations from the CPU, and has a write cache which can decrease memory use. However, both of those are relatively expensive, and there's no clear evidence that your money is better put into the RAID card than into faster CPU and more memory. Another important consideration is that hardware RAID will (must!) have a battery backup so that any scheduled writes can be completed later in the case of power loss. If you decide to use software RAID, I would strongly advise you to use a UPS, and to make sure your system monitors it and shuts down in the event of power loss.
Yes - my home systems are all on UPS with automated shutdown after 5 minutes of no power. The display is on there too, so that in the event of a power outage while I'm working, I can save all my work.
I guess from the discussion that hardware raid is definitely still the way to go for servers, where the guy at the colo can simply swap out a dead drive if need be w/o any serious downtime etc.
What I'm personally interested in doing is building an Amanda server for my home network, backing up /home and /etc from my 3 other computers, but using virtual tapes (disk images) instead of real tapes. Once Blu-ray media becomes cheap enough I'll burn the virtual tapes as a secondary backup, but I primarily want the virtual tapes stored on a redundant raid so that recovering will be easier (no need to go to Blu-ray unless the raid failed).
I'm guessing software raid is probably good enough for that, because the unit will be idle most of the time and CPU cycles won't be a needed commodity. In fact, I may even want something that sleeps and wakes on LAN activity so that it doesn't waste as much power when it's just sitting there.
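The Amanda side of the vtape idea would look roughly like this (paths, sizes and slot counts are made up, and the chg-disk syntax varies a bit between Amanda versions):

    mkdir -p /raid/vtapes/daily/slot{1,2,3,4,5,6,7}
    # in amanda.conf for the "daily" config:
    #   define tapetype HARDDISK { length 50 gbytes }
    #   tapetype  HARDDISK
    #   tpchanger "chg-disk"
    #   tapedev   "file:/raid/vtapes/daily"
    #   tapecycle 7
    amlabel daily daily-1 slot 1    # label each vtape once
    amcheck daily                   # sanity-check the config before the first run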
Michael A. Peters wrote:
I guess from the discussion that hardware raid is definitely still the way to go for servers, where the guy at the colo can simply swap out a dead drive if need be w/o any serious downtime etc.
On the flip side, you generally have to install some vendor-specific tool to monitor the status of the drives in a hardware raid if you aren't in a position to look at the lights on the front, whereas a simple 'cat /proc/mdstat' will show software raid status. And it's not all that hard to ssh in and run an mdadm command after the 'hands on' colo guy swaps the drive in for you. You do need swappable drive carriers, and SCSI controllers usually need another command to re-probe a device.
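(The monitoring and re-probe bits are all stock commands; roughly, with made-up host and device names:)

    cat /proc/mdstat                                 # array status at a glance
    mdadm --monitor --scan --mail=root --daemonise   # mail alerts when an array degrades
    echo "- - -" > /sys/class/scsi_host/host0/scan   # re-probe a SCSI/SAS host after the swap
    # then the usual mdadm --add to hand the new disk back to md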
What I'm personally interested in doing is building an Amanda server for my home network, backing up /home and /etc from my 3 other computers, but using virtual tapes (disk images) instead of real tapes. Once Blu-ray media becomes cheap enough I'll burn the virtual tapes as a secondary backup, but I primarily want the virtual tapes stored on a redundant raid so that recovering will be easier (no need to go to Blu-ray unless the raid failed).
I'd recommend looking at backuppc instead of amanda if you mostly want on-line storage. Its storage scheme will hold a much longer history in the same amount of space and it has a handy web interface for browsing and restores. It can generate a tar-type archive output for tape/dvd, etc., but it is really designed around the on-line storage. And yes, you do want it on some kind of raid. You might even want to plan to periodically break the raid and swap a member drive offsite since the storage format packs so many files with hard links that it is difficult to copy the whole thing in other ways.
Les Mikesell wrote:
I'd recommend looking at backuppc instead of amanda if you mostly want on-line storage. Its storage scheme will hold a much longer history in the same amount of space and it has a handy web interface for browsing and restores.
I'd rather have something that has a client side daemon that just does it w/o users needing to initiate it.
I'm not worried about longer history; anything I need history on I already keep in svn.
Michael A. Peters wrote:
I'd recommend looking at backuppc instead of amanda if you mostly want on-line storage. Its storage scheme will hold a much longer history in the same amount of space and it has a handy web interface for browsing and restores.
I'd rather have something that has a client side daemon that just does it w/o users needing to initiate it.
BackupPC is as fully automatic as it gets - and it doesn't need a client-side daemon. It has options to use smb (for Windows shares), tar over ssh, rsync over ssh, or rsync in standalone daemon mode (which solves some Windows problems) to collect the data. Rsync over ssh is usually the best approach where possible since it detects new files with old timestamps, deletions, and old files under renamed directories in incremental runs.
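The per-host setup for rsync over ssh is only a couple of lines; a rough sketch (host name and file locations are made up, and you still need to give the backuppc user's ssh key root access on the client):

    # in the host's config file, e.g. /etc/BackupPC/pc/mybox.pl:
    #   $Conf{XferMethod}     = 'rsync';
    #   $Conf{RsyncShareName} = ['/home', '/etc'];
    sudo -u backuppc ssh -q -x -l root mybox whoami   # should print 'root' with no password prompt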
I'm not worried about longer history; anything I need history on I already keep in svn.
It is tunable. But it compresses files and pools all instances of files with identical content with hardlinks (whether from previous runs on the same target or from different machines) so only new/changed files take up more space over time.
But the big difference vs. amanda (which is also pretty self-sufficient once installed) is when you want to restore a file. With backuppc you can use the web interface to find the version you want and download it directly through the browser (or a zip/tar of several files/directories) or specify where you want it restored. There are command line tools also, of course, but you'd probably only use them to generate a complete tar image to rebuild a machine.
Michael A. Peters wrote:
I've read a lot of different reports that suggest at this point in time, kernel software raid is in most cases better than controller raid.
Let me define 'most cases' for you. Linux software raid can perform better or the same if you are using raid0/raid1/raid1+0 arrays. If you are using raid5/6 arrays, the more disks are involved, the better hardware raid (those with sufficient processing power and cache; a long time ago software raid5 beat the pants off hardware raid cards based on Intel i960 chips) will perform.
I have already posted on this and there are links to performance tests on this very subject. Let me look for the post.
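(If you want to reproduce that kind of test yourself, the software raid5 side is easy to throw together; a rough sketch with made-up devices - bonnie++ or iozone are better than dd for anything serious:)

    mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[bcde]1
    mkfs.ext3 /dev/md0 && mount /dev/md0 /mnt/test
    dd if=/dev/zero of=/mnt/test/big bs=1M count=4096 oflag=direct   # crude sequential write test
    # watch the parity cost in top (the md0_raid5 kernel thread) while it runs;
    # stripe_cache_size is the main md raid5/6 tunable:
    echo 4096 > /sys/block/md0/md/stripe_cache_size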
The basic argument seems to be that CPUs are fast enough now that the limitation on throughput is the drive itself, and that SATA resolved the bottleneck that PATA caused with kernel raid. The arguments then go on
Complete bollocks. The bottleneck is not the drives themselves; whether it is SATA or PATA, disk drive performance has not changed much, which is why 15k RPM disks are still king. The bottleneck is the bus, be it PCI-X or PCIe 16x/8x/4x, or at least the latencies involved due to bus traffic.
to give numerous examples where a failing hardware raid controller CAUSED data loss, where a raid card died and an identical raid card had to be scrounged from eBay to even read the data on the drives, etc. - problems that apparently don't happen with kernel software raid.
Buy extra cards. Duh. Easy solution for what can be a very rare problem.
Chan Chung Hang Christopher wrote:
I've read a lot of different reports that suggest at this point in time, kernel software raid is in most cases better than controller raid.
Let me define 'most cases' for you. Linux software raid can perform better or the same if you are using raid0/raid1/raid1+0 arrays. If you are using raid5/6 arrays, the more disks are involved, the better hardware raid (those with sufficient processing power and cache; a long time ago software raid5 beat the pants off hardware raid cards based on Intel i960 chips) will perform.
not if you're doing committed random writes such as a transactional database server... this is where a 'true' hardware raid controller with significant battery backed write cache will blow the doors off your software raid.
John R Pierce wrote:
Chan Chung Hang Christopher wrote:
I've read a lot of different reports that suggest at this point in time, kernel software raid is in most cases better than controller raid.
Let me define 'most cases' for you. Linux software raid can perform better or the same if you are using raid0/raid1/raid1+0 arrays. If you are using raid5/6 arrays, the more disks are involved, the better hardware raid (those with sufficient processing power and cache; a long time ago software raid5 beat the pants off hardware raid cards based on Intel i960 chips) will perform.
not if you're doing committed random writes such as a transactional database server... this is where a 'true' hardware raid controller with significant battery backed write cache will blow the doors off your software raid.
See my reply to nate. If you are using boards with 12GB of cache, software raid is not even on the radar.
On Jun 2, 2009, at 9:53 PM, Christopher Chan <christopher.chan@bradbury.edu.hk> wrote:
John R Pierce wrote:
Chan Chung Hang Christopher wrote:
I've read a lot of different reports that suggest at this point in time, kernel software raid is in most cases better than controller raid.
Let me define 'most cases' for you. Linux software raid can perform better or the same if you are using raid0/raid1/raid1+0 arrays. If you are using raid5/6 arrays, the more disks are involved, the better hardware raid (those with sufficient processing power and cache; a long time ago software raid5 beat the pants off hardware raid cards based on Intel i960 chips) will perform.
not if you're doing committed random writes such as a transactional database server... this is where a 'true' hardware raid controller with significant battery backed write cache will blow the doors off your software raid.
See my reply to nate. If you are using boards with 12GB of cache, software raid is not even on the radar.
True, but I feel an important point is being missed here.
In a multi-user environment, whether it be file or database, the data will NOT stay contiguous for long after the initial build, so all I/O will go random. That's why defragging will always be important, except for, say, COW-based file systems, which are random by nature; there it's their transactional log, as with databases, that makes 'em or breaks 'em.
In order to avoid a lot of the random I/O, file systems use the page cache to combine I/O operations and transaction logs to write it out sequentially before committing it in the background later. But the ability of the disks to handle a large amount of random I/O is also a big factor: if the commits can't execute fast enough the log won't empty fast enough to keep up, and then you need an ever bigger log.
-Ross
See my reply to nate. If you are using boards with 12GB of cache, software raid is not even on the radar.
True, but I feel an important point is being missed here.
In order to avoid a lot of the random I/O, file systems use the page cache to combine I/O operations and transaction logs to write it out sequentially before committing it in the background later. But the ability of the disks to handle a large amount of random I/O is also a big factor: if the commits can't execute fast enough the log won't empty fast enough to keep up, and then you need an ever bigger log.
/me shrugs. It is not as if a hardware raid card is the only solution. One can try external journaling devices for the filesystem, like a Gigabyte i-RAM drive or a UMEM card, and still use md devices. Although I suspect I would not bother with a software raid setup if I had lots of disk arrays.
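Putting an ext3 journal on such a device while keeping the data on md is straightforward; a sketch with made-up device names (the filesystem has to be unmounted to switch journals):

    mke2fs -O journal_dev /dev/sdx1           # turn the i-RAM/UMEM device into an external journal
    mkfs.ext3 -J device=/dev/sdx1 /dev/md0    # new filesystem using that journal
    # or retrofit an existing filesystem:
    tune2fs -O ^has_journal /dev/md0
    tune2fs -J device=/dev/sdx1 /dev/md0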
Chan Chung Hang Christopher wrote:
Complete bollocks. The bottleneck is not the drives themselves; whether it is SATA or PATA, disk drive performance has not changed much, which is why 15k RPM disks are still king. The bottleneck is the bus, be it PCI-X or PCIe 16x/8x/4x, or at least the latencies involved due to bus traffic.
In most cases the bottleneck is the drives themselves; there are only so many I/O requests per second a drive can handle. Most workloads are random, rather than sequential, so the amount of data you can pull from a particular drive can be very low depending on what your workload is.
Taking a random drive from my storage array (which evenly distributes I/O across every spindle in the system), a 7200RPM SATA-II disk, over the past month it has averaged:
Read IOPS: 24
Write IOPS: 10
Read KBytes/second: 861
Write KBytes/second: 468
Read I/O size: 37 kB
Write I/O size: 50 kB
Read service time: 23 milliseconds
Write service time: 47 milliseconds
Averaging the I/O size out to 43.5kB, that means this disk can sustain roughly 3,915 kilobytes per second (assuming 90 IOPS for a 7200RPM SATA disk), though the service times would likely be unacceptably high for any sort of real-time application. Lower the I/O size and you can get better response times, though you'll get less data through the drive at the same time. On the lower-end storage array I had at my last company, a 47 millisecond sustained write service time would have meant an outage in the databases; this newer higher-end array is much better at optimizing I/O than the lower-end box was.
With 40 drives in a drive enclosure, currently connected via a 2x4Gbps (active/active) fiber channel point-to-point link, that means the shelf of drives can run up to roughly 150MB/second out of the 1024MB/second available to it on the link. The system is upgradable to 4x4Gbps (active/active) point-to-point fiber channel links per drive enclosure, and I can use SATA, 10k FC, or 15k FC in the drive cages, though I determined that SATA would be more than enough for our needs. The array controllers have a tested limit of about 1.6 gigabytes/second of throughput to the disks (and corresponding throughput to the hosts), or 160,000 I/O requests per second to the disks, with 4 controllers (4 high-performance ASICs for data movement and 16 Xeon CPU cores for everything else).
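(A quick sanity check of that arithmetic:)

    awk 'BEGIN { sz = (37 + 50) / 2; iops = 90;
                 printf "%.1f kB avg I/O x %d IOPS = %d kB/s per drive\n", sz, iops, sz * iops }'
    # 43.5 kB x 90 IOPS ~= 3,915 kB/s per drive; 40 drives x ~3.9 MB/s ~= 156 MB/s per shelf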
Fortunately the large caches (12GB per controller, mirrored with another controller) on the array buffer the higher response times on the disks, resulting in host response times of around 20 milliseconds for reads, and 0-5 milliseconds for writes, which by most measures is excellent.
nate
nate wrote:
Chan Chung Hang Christopher wrote:
Complete bollocks. The bottleneck is not the drives themselves; whether it is SATA or PATA, disk drive performance has not changed much, which is why 15k RPM disks are still king. The bottleneck is the bus, be it PCI-X or PCIe 16x/8x/4x, or at least the latencies involved due to bus traffic.
In most cases the bottleneck is the drives themselves; there are only so many I/O requests per second a drive can handle. Most workloads are random, rather than sequential, so the amount of data you can pull from a particular drive can be very low depending on what your workload is.
Which is true whether you are running hardware or software raid 0/1/1+0. However, when it comes to software raid, given enough disks, the bottleneck moves from the disk to the bus, especially for raid5/6.
Fortunately the large caches (12GB per controller, mirrored with another controller) on the array buffer the higher response times on the disks, resulting in host response times of around 20 milliseconds for reads, and 0-5 milliseconds for writes, which by most measures is excellent.
Haha, yeah, if you have such large-scale setups, nobody would even bother comparing software raid.