[CentOS] Software RAID complete drives or individual partitions

Wed Mar 6 19:27:25 UTC 2013
Gordon Messmer <yinyang at eburg.com>

On 03/06/2013 08:00 AM, Mark Snyder wrote:
> - Avoid software RAID5 or 6, only use it for RAID1 or 10. Software
> RAID5 performance can be abysmal, because of the parity calculations
> and the fact that each write to the array requires that all drives be
> read and written.

My understanding of Linux mdadm RAID5 is that a partial-stripe write 
reads the block being overwritten and the corresponding parity block.  
The new parity can be calculated from just those two blocks plus the new 
data, and then both are written back.  That's two extra reads and one 
extra write per write, plus the parity calculation -- not a read and 
rewrite of every drive in the array.
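
The reason that works is that the parity is plain XOR, so new parity = 
old parity XOR old data XOR new data.  A toy check with shell arithmetic 
(made-up values; this is just the XOR identity, not anything md actually 
runs):

    $ old_data=0x5A; new_data=0x3C; other_data=0xF0
    $ old_parity=$(( old_data ^ other_data ))
    $ new_parity=$(( old_parity ^ old_data ^ new_data ))
    $ printf '%#x %#x\n' "$new_parity" $(( new_data ^ other_data ))
    0xcc 0xcc

Updating via the old parity and recomputing from the full stripe give 
the same result, which is why only the target block and the parity block 
need to be touched.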

I'm quite certain that I've seen some hardware RAID arrays that will 
read the entire stripe to do a write.

RAID5 will always write more slowly than RAID1 or RAID10, but that can 
sometimes be acceptable if capacity is more important than performance.
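
As a rough sketch of that trade-off (device names are placeholders): 
with four 1TB member partitions, RAID10 gives you roughly 2TB usable and 
RAID5 roughly 3TB, from the same mdadm invocation apart from --level, 
e.g. one of:

    $ mdadm --create /dev/md0 --level=10 --raid-devices=4 \
          /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
    $ mdadm --create /dev/md0 --level=5 --raid-devices=4 \
          /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1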

> Older hardware raid controllers can be pretty cheap
> on eBay, I'm using an old 3Ware on my home CentOS server.

If there's anything to avoid, it'd be old 3ware hardware.  Those cards 
are often less reliable than the disks they're attached to, and that's 
saying something.


> Avoid
> hostraid adapters, these are just software raid in the controller
> rather than the OS.

All hardware RAID is "just software raid in the controller rather than 
the OS".  The advantages of hardware RAID are offloading the parity 
calculations to dedicated hardware, so the host CPU doesn't have to do 
them, and a battery-backed write cache.

The battery-backed write cache is what keeps in-flight writes to the 
array safe across a power loss, and it can greatly improve performance, 
provided that you don't write data faster than the cache can be flushed 
to disk.

The host CPU is very often faster at parity calculations than the 
dedicated hardware, which is why Alan Cox has been quoted as saying that 
the best RAID controllers in the world are made by Intel and AMD.  
However, if you really need the couple of percent of CPU cycles that 
software RAID would have used, you might prefer the hardware solution.
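
If you're curious what your own CPU can do, the kernel benchmarks its 
xor and raid6 routines when the relevant md modules load and logs the 
results; something like this will show them (the exact message format 
varies between kernel versions):

    $ dmesg | grep -iE 'xor:|raid6:'

The throughput figures it reports give a sense of how much headroom the 
host CPU has for this work.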

> If you are using drives over 1TB, consider partitioning the drives
> into smaller chunks, say around 500MB, and creating multiple arrays.
> That way if you get a read error on one sector that causes one of the
> raid partitions to be marked as bad, only that partition needs to be
> rebuild rather than the whole drive.

If a bad sector turns up on a disk, it's time to replace that disk no 
matter how your partitions are laid out.  Drives reserve a pool of spare 
sectors for remapping sectors that are detected as bad.  If your OS ever 
sees a bad sector, it's because that reserve has been exhausted.  More 
sectors will continue to go bad, and you will lose data.  Always replace 
a drive as soon as the OS reports bad sectors, or earlier, based on 
SMART data.
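
smartmontools makes that easy to watch; a minimal check (assuming the 
smartmontools package is installed, and /dev/sda is a placeholder):

    $ smartctl -H /dev/sda
    $ smartctl -A /dev/sda | grep -iE 'Reallocated|Pending|Uncorrect'

Non-zero reallocated or pending sector counts are the hint to order a 
replacement before the OS ever sees a failure.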

Partitioning into many smaller chunks is probably a waste of time.  Like 
most of the other participants in this thread, I create software RAID 
sets of one or two partitions per disk and use LVM on top of that.
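
For reference, a minimal sketch of that layout (device, VG and LV names 
are only examples):

    $ mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
    $ pvcreate /dev/md0
    $ vgcreate vg0 /dev/md0
    $ lvcreate -L 100G -n srv vg0

Growing, shrinking and carving out additional volumes then all happen at 
the LVM layer, instead of by juggling dozens of small md arrays.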

Hopefully BTRFS will simplify this even further in the near future. :)